
Hey guys, so I have been trying to figure out the best combination of settings for LoRA model training, and I ran some thorough image-generation experiments that I thought you might appreciate if you are aiming to train your own models.

I use this software for my LoRA model training:
https://github.com/bmaltais/kohya_ss

Here are the key details for the baseline (control) training session:

  • ~50 images
  • 2 repeats = 100 images per epoch
  • 25 epochs
  • batch size 4
  • learning rate: 1e-4
  • LR scheduler: constant
  • network dim and alpha = 4

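To make the baseline concrete, here is a quick sketch of the step arithmetic that follows from the list above (50 images, 2 repeats, batch size 4, 25 epochs) — the same math kohya_ss does internally when it reports steps per epoch:

```python
import math

# Numbers from the baseline list above.
num_images = 50
repeats = 2
epochs = 25
batch_size = 4

images_per_epoch = num_images * repeats                     # 100
steps_per_epoch = math.ceil(images_per_epoch / batch_size)  # 25
total_steps = steps_per_epoch * epochs                      # 625

print(images_per_epoch, steps_per_epoch, total_steps)  # 100 25 625
```

So the control run is 625 optimizer steps total, which is useful to keep in mind when comparing against the longer runs later.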
Here is a screenshot of the training configuration in detail, and I have attached the config.json so you can load it easily.

So I wanted to experiment with two configuration variables: Network Rank and Network Alpha (set together), and the LR Scheduler.

Using the base training configuration, I tried Network Rank and Alpha at these settings:

  • 8
  • 16
  • 32
  • 64

And I set the scheduler to:

  • Constant
  • Cosine
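For reference, the two schedules differ like this: constant holds the base LR for the whole run, while cosine decays it along a half-cosine toward zero. A minimal sketch of the curves (the `lr_at` helper is mine, just illustrating the standard formulas; kohya_ss uses the diffusers schedulers internally):

```python
import math

def lr_at(step, total_steps, base_lr=1e-4, scheduler="constant"):
    """LR at a given step; cosine is the usual half-cosine decay to zero."""
    if scheduler == "constant":
        return base_lr
    if scheduler == "cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    raise ValueError(scheduler)

total = 625  # baseline: 25 steps/epoch * 25 epochs
for step in (0, total // 2, total):
    print(step, lr_at(step, total, scheduler="cosine"))
# cosine starts at the full 1e-4, is near 5e-5 at the halfway point,
# and decays to ~0 by the final step
```

So with cosine, the late epochs are trained with a much smaller LR, which is one plausible reason the cosine grids below look more stable late in training.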

This is the prompt we are testing:

watermark, 8k, hires, masterpiece, highest quality, facing camera, realistic digital painting photo of a godly royal samurai (dark elf:1.2) (girl female:1.2), far shot, medium curly black hair, (Dungeon and Dragon:1.3), fire smoke and ashes particles, ivory full plate armor with fire and Blast-off bronze, engrave runic in Splendid details, (The Isle of the Dragon:1.2), (light particle:1.1), (shiny skin:1.2), (Blade Runner 2049 movie:1.2),(game of thrones style:1.2),(studio ghibli anime style:1.2), (depth of field:1.3), global illumination, art by hoang lap and fuji choko and artgerm and greg rutkowski and viktoria gavrilenko,

Negative prompt: 3D, cartoon, anime, illustration, multipanel

Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 2952911054, Size: 768x768, Model hash: c51d7fde66, Model: Bastard_v5.9_LiveAction2, ENSD: 31337

This is the concept we are using as our training example:

ahegao, 1girl, solo, long_hair, blonde_hair, brown_eyes, pointy_ears, uvula, close-up

In theory, if we can get this facial expression to come out, we can train any facial expression, and maybe even any face in general. So let's try!

We train, then generate XY grids.

These are the Constant-scheduler grids at network rank and alpha 8, 16, 32, 64:

These are the Cosine-scheduler grids at network rank and alpha 8, 16, 32, 64 (don't mind the washed-out look; I forgot to set a VAE for this generation set):

Looking at Constant vs. Cosine, cosine gives a much more stable image transformation, doesn't it? The constant-scheduler images are just all over the place. I wonder whether this is a seed thing, a prompt thing, or a training thing; generating more and training more should reveal it.

It also looks like the lower rank and alpha of 8 is not worth it. At 16, there appears to be one good result at epoch 22 for constant, but otherwise it doesn't look good. The same goes for 32: the feature only comes out at 20+ epochs at full strength, and barely even then. But it does appear, so 32 is probably the minimum rank and alpha you want to train at. For 64, the feature we are looking for starts to come through around the 0.6-0.7 strength range. By this point, though, constant varies wildly, while cosine stays consistent in what it generates and in how training changes the image. Cosine appears to be the more stable choice for training features.

Here are some additional training sessions I ran that played with a different configuration variable: the learning rate.

I set the learning rates 10x slower:

  • Learning rate: 1e-5
  • Text Encoder learning rate: 5e-6
  • Unet learning rate: 1e-4

And I trained for 250 epochs, the idea being that 10x slower learning needs 10x longer training to match the results. Let's see.
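The "10x slower, 10x longer" intuition can be checked with a crude learning-budget heuristic: base LR times total steps. This ignores scheduler and optimizer dynamics, so it is only a first-order sanity check, but the two runs do come out matched:

```python
steps_per_epoch = 25  # 100 images / batch size 4, from the baseline run

# "Learning budget" = base LR * epochs * steps per epoch.
baseline_budget = 1e-4 * 25 * steps_per_epoch    # original run
slow_budget = 1e-5 * 250 * steps_per_epoch       # 10x lower LR, 10x epochs
print(baseline_budget, slow_budget)
# the two budgets are equal, so total "learning" is matched on paper
```

Of course, as the grids below show, a matched budget does not mean identical results; the path the optimizer takes matters too.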

This first one is at rank and alpha 64 with the Constant scheduler.

This second one is at rank and alpha 128 with the Constant scheduler.

Now some things quickly stick out. At this learning rate, we hit much stronger target-feature generation much earlier in the training process, and the features are more uniform; perhaps a little too uniform. Signs of overtraining? We will have to dig deeper to find out.

It's also interesting to note that, comparing rank and alpha 64 vs. 128, 128 creates more flexible feature generation at later training epochs.
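That extra flexibility at rank 128 is not free: doubling the rank doubles the trainable parameters in every LoRA pair. A quick count for a hypothetical 768-wide projection layer (the dimensions are illustrative, not from the actual model):

```python
def lora_param_count(rank, in_dim, out_dim):
    """Trainable params in one LoRA pair: A is rank x in, B is out x rank."""
    return rank * in_dim + out_dim * rank

# Hypothetical 768-wide projection, just to compare scale between the runs.
for rank in (64, 128):
    print(rank, lora_param_count(rank, 768, 768))
# 64  -> 98304 params per layer
# 128 -> 196608 params per layer
```

So the 128 run has twice the capacity per adapted layer, which fits the observation that it keeps more flexibility late into training.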


Here are the best ones from the XYs:

I turned these two specific strengths into LoRAs for you guys to play with.

ahegao_v0.4.22.078.safetensors (top photo)
https://pixeldrain.com/u/616SWH1b 

ahegao_v1.4.22.09.safetensors (bottom photo)
https://pixeldrain.com/u/UqC2tF1C

I am not done with this yet: it's a small, focused, well-tagged dataset, and I really want to see where the optimal training configurations are hidden.
