
Content

More and more people are asking how to enhance faces generated by deep learning models. The answer is simple: MOAR AI!

I've fine-tuned an OpenAI diffusion model to help me make better faces for my tarot card deck (or maybe I'm just too lazy and unskilled to draw them manually).

This post is a starting point: it explains how to train your own models and, most importantly, how to plug them back into DiscoDiffusion and other awesome notebooks.

I've used this repo: https://github.com/openai/improved-diffusion, but you can use this one as well: https://github.com/openai/guided-diffusion

I tried fine-tuning the smaller model from vanilla DiscoDiffusion (256x256), but it was too large to fit into the Colab GPU even with a batch size of 1.

So we can either fine-tune a smaller model that fits our rig, or train one from scratch; a minimal training command is sketched below.
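For reference, here is roughly what a training run looks like with the improved-diffusion repo, using the flags documented in its README. Treat it as a sketch: the data folder below is a placeholder of my own, and you'll want model flags small enough to fit your GPU.

# a minimal sketch of a Colab cell, assuming you've cloned and installed
# improved-diffusion and cd'd into the repo root;
# /content/my_faces is a hypothetical folder of training images
%env OPENAI_LOGDIR=/content/drive/MyDrive/deep_learning/ddpm/v2
MODEL_FLAGS = "--image_size 64 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS = "--diffusion_steps 4000 --noise_schedule linear"
TRAIN_FLAGS = "--lr 1e-4 --batch_size 4"  # small batch size to fit a Colab GPU
!python scripts/image_train.py --data_dir /content/my_faces $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

Checkpoints (model*.pt, ema*.pt, opt*.pt) get written to OPENAI_LOGDIR, which is what the resume question in the FAQ below relies on.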


Code

I've moved it to Colab so as not to scare you with Patreon formatting :D


Colab

A Colab link to start with: https://colab.research.google.com/drive/1Xfd5fm4OnhTd6IHPMGcoqw54uhGT3HdF?usp=sharing

That should do it!


FAQ

Q: Do we need text captions or text-image pairs to fine-tune the model?
A: No, only images. We are fine-tuning an unconditional generator, so we don't need text; a flat folder of images is enough (see the layout below).
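In improved-diffusion terms, the dataset is simply whatever --data_dir points at, e.g. a hypothetical folder like this:

data_dir = '/content/my_faces'  # plain images only; no captions or metadata files
# my_faces/
#     face_0001.jpg
#     face_0002.jpg
#     ...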

Q: How do we plug our model into DiscoDiffusion?
A: We just swap in our checkpoint and change the model settings to the ones we used while fine-tuning; see the sketch below.
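For illustration, here is roughly what that looks like in a DiscoDiffusion-style notebook, which builds its model with guided-diffusion's script_util helpers. Every value below is an assumption (match them to your training flags), and the checkpoint path is hypothetical.

import torch
from guided_diffusion.script_util import model_and_diffusion_defaults, create_model_and_diffusion

model_config = model_and_diffusion_defaults()
model_config.update({
    'image_size': 64,       # must match the size you fine-tuned at
    'num_channels': 128,
    'num_res_blocks': 3,
    'learn_sigma': True,    # only if you trained with --learn_sigma True
    'use_fp16': True,
})
model, diffusion = create_model_and_diffusion(**model_config)
# hypothetical path: point this at your own checkpoint
model.load_state_dict(torch.load('/content/drive/MyDrive/deep_learning/ddpm/v2/ema_0.9999_052000.pt', map_location='cpu'))
model.requires_grad_(False).eval().cuda()
if model_config['use_fp16']:
    model.convert_to_fp16()

Loading the ema*.pt weights rather than the raw model*.pt usually gives noticeably better samples.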

Q: How do we resume training?
A: You need all 3 checkpoint files from the same training step: opt*.pt, model*.pt, and ema*.pt.
Then specify the model file in your command line when resuming, like this:
--resume_checkpoint /content/drive/MyDrive/deep_learning/ddpm/v2/model052000.pt
(don't forget to replace it with the actual path to your model :D)
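As a Colab cell, the full resume command might look like this, reusing the flag variables from the training sketch above; they must be the exact flags of the original run:

# resume training: flags must match the original run; the matching
# opt*.pt / ema*.pt are picked up automatically from the checkpoint's folder
!python scripts/image_train.py --data_dir /content/my_faces $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS --resume_checkpoint /content/drive/MyDrive/deep_learning/ddpm/v2/model052000.pt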


Comments

Snifferson

Love the card... What kind of workflow do you have to bring all of that together?

sxela

Hi! I made it with DiscoDiffusion, then ran its face through my comics-faces model (using the face as the init image), and pasted the result back.

Nancy Hao

Hi! I really like your tutorial and have generated some high-quality animated face images! I'm new to diffusion models and have a question about the relationship between CLIP and the diffusion model. When we fine-tune the diffusion model, do we need to worry about training CLIP? Do we need to provide text-image pairs when fine-tuning the diffusion model on a new dataset? You mentioned that the fine-tuned model is going to be used with CLIP inside DiscoDiffusion. But should we give CLIP any prompts?