Trying Out Stable Diffusion 1.5

This post is for readers who have already run Stable Diffusion at least once.

If you have not tried it yet, please work through the following first.

Go to the page below and accept the terms. (This assumes you already have a Hugging Face account.)

https://huggingface.co/runwayml/stable-diffusion-v1-5

Accept the terms shown on that page.

Start Colab and create a notebook.

Enter the following command. (As of 2023/01/29 this no longer works, so see the updated version later in this post.)

!pip install diffusers==0.5.1 transformers scipy ftfy

If you see the following, the install succeeded.

Successfully installed diffusers-0.5.1 ftfy-6.1.1 huggingface-hub-0.10.1 tokenizers-0.13.1 transformers-4.23.1

Check your Hugging Face access token.

You can check it at the following page.

In Colab, enter and run the following.

YOUR_TOKEN = "your Hugging Face access token"
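
Pasting the token straight into a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, and not something the original steps do, the token could be read from an environment variable instead. The variable name HF_ACCESS_TOKEN and the helper load_token are hypothetical names chosen for illustration.

```python
import os


def load_token(env_var="HF_ACCESS_TOKEN"):
    """Return the access token stored in the given environment variable.

    HF_ACCESS_TOKEN is a hypothetical variable name used only in this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

With the variable set, `YOUR_TOKEN = load_token()` would take the place of the hard-coded string.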

Next, run the following. (As of 2023/01/29 this no longer works, so see the updated version later in this post.)

from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision="fp16",use_auth_token=YOUR_TOKEN)
pipe = pipe.to(device)

When I wrote this post the code above worked, but it has since stopped working and now fails with the following error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-e2de96012da6> in <module>
----> 1 pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16, revision="fp16",
      2                                                use_auth_token=YOUR_TOKEN)
      3 pipe = pipe.to(DEVICE)

/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    492                         class_obj()
    493 
--> 494                     raise ValueError(
    495                         f"The component {class_obj} of {pipeline_class} cannot be loaded as it does not seem to have"
    496                         f" any of the loading methods defined in {ALL_IMPORTABLE_CLASSES}."

ValueError: The component <class 'transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor'> of <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> cannot be loaded as it does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': ['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.

Fix it as follows and run it again. You need to disconnect the Colab session once before doing so. The diffusers version is pinned to 0.9.0.

!pip install diffusers==0.9.0 transformers scipy ftfy
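
Since the fix depends on which diffusers version is actually loaded after the session restart, it can help to confirm it before continuing. The following is a small sketch using the standard library's importlib.metadata. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if the package is not installed in this runtime
print("diffusers:", installed_version("diffusers"))
```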

The only argument passed to from_pretrained is now model_id.

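from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
device = "cuda"  # needs a GPU runtime in Colab
# Download the v1.5 weights and build the pipeline
pipe = StableDiffusionPipeline.from_pretrained(model_id)
# Move the pipeline to the GPU
pipe = pipe.to(device)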

Once all the downloads have finished, you are ready to go.

Running the following generates cat.png.

prompt = "cute cat paly with ball"
# The generated images are exposed through the .images attribute of the result
image = pipe(prompt).images[0]
image.save("cat.png")

The earlier version of the command was the following.

prompt = "cute cat paly with ball"
image = pipe(prompt)["sample"][0]
image.save(f"cat.png")
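
The two snippets differ only in how the first image is pulled out of the pipeline result: `.images[0]` in the newer version and `["sample"][0]` in the older one. As a rough sketch, a small helper could accept either shape. The name first_image is hypothetical, and the two access patterns are taken only from the snippets above.

```python
def first_image(output):
    """Return the first generated image from either result style.

    Newer results expose an `images` attribute; older ones are indexed
    with the "sample" key, as in the two snippets above.
    """
    if hasattr(output, "images"):
        return output.images[0]
    return output["sample"][0]
```

With this, `image = first_image(pipe(prompt))` would read the same under either version.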

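Here is the generated image.

And here is the image from last time. It may just be chance, but the new one looks better to me.

I'll also try the prompt I used last time with version 1.5.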

prompt = "cute cat paly with ball, professionally retouched, soft lighting, wide angle, 8 k high definition, intricate, elegant, art by brian miller, peter mohrbacher, Lens flare effect"

image = pipe(prompt, height=512, width=768).images[0]

image.save(f"cat.png")
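
The height and width passed to the pipeline need to be divisible by 8, and the pipeline raises an error otherwise. Both 512 and 768 qualify. As a minimal sketch for trying other sizes, a helper could round a requested size down to the nearest valid value. The name snap_to_multiple is hypothetical.

```python
def snap_to_multiple(value, multiple=8):
    """Round value down to the nearest positive multiple of `multiple`."""
    if value < multiple:
        raise ValueError(f"value must be at least {multiple}")
    return (value // multiple) * multiple


height = snap_to_multiple(512)  # already a multiple of 8, unchanged
width = snap_to_multiple(770)   # rounded down to 768
print(height, width)
```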

For this one, I prefer the result from last time.

Next, I'll try generating something like an anime character.

It kind of looks the part, lol.

prompt="cute girl, anime, japan, high quality, warrior"
image = pipe(prompt, height=512, width=768).images[0]
image.save(f"girl.png")

One more attempt.

prompt="cute girl, anime, japan, high quality, sword warrior, whole body, face, 4k, 8k"
image = pipe(prompt, height=512, width=768).images[0]
image.save(f"girl5.png")
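
Both anime prompts are comma-separated keyword lists, and the second mostly extends the first. When iterating on prompts like this, one option is to keep the keywords in a list and join them. The sketch below is only an illustration, and build_prompt is a hypothetical helper name.

```python
def build_prompt(keywords):
    """Join non-empty keywords into a comma-separated prompt, skipping duplicates."""
    seen = set()
    parts = []
    for keyword in keywords:
        cleaned = keyword.strip()
        # Compare case-insensitively so "Anime" and "anime" count as one keyword
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            parts.append(cleaned)
    return ", ".join(parts)


base = ["cute girl", "anime", "japan", "high quality"]
prompt = build_prompt(base + ["sword warrior", "whole body", "face", "4k", "8k"])
print(prompt)
```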

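Not bad at all, I think.

It really does all come down to the prompt (the "spell"), but it's fun to play around with.

When I regenerated with the same commands on 2023/01/29, the images came out like this.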
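
One likely reason the same command gives a different picture is that each run starts from fresh random noise. The library version also changed between the two runs, which may matter too. StableDiffusionPipeline accepts a `generator` argument, so a seeded torch.Generator can make the starting noise repeatable. The following is a sketch under those assumptions. The helper names and the injectable make_generator parameter are hypothetical, and the seed value in the usage note is arbitrary.

```python
def make_cuda_generator(seed):
    """Build a seeded torch.Generator on the GPU (torch is imported lazily)."""
    import torch

    return torch.Generator("cuda").manual_seed(seed)


def generate_with_seed(pipe, prompt, seed, make_generator=make_cuda_generator, **kwargs):
    """Run the pipeline with a seeded generator so the starting noise is repeatable."""
    generator = make_generator(seed)
    return pipe(prompt, generator=generator, **kwargs).images[0]
```

For example, `image = generate_with_seed(pipe, prompt, seed=42, height=512, width=768)` could replace the direct `pipe(...)` call. Identical results are still not guaranteed across library versions or hardware.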