

Check the Patreon exclusive posts index to find our scripts easily

Join our Discord to get help, chat, and discuss, and tell me your Discord username to get your special rank: SECourses Discord

Please also Star, Watch, and Fork our Stable Diffusion & Generative AI GitHub repository and join our subreddit

Tips And Golden Information For LoRA Training

  • Network Rank - Larger networks, all else equal, need a lower learning rate to stay stable. This relationship seems to hold at scale, i.e. LoRAs usually need learning rates ~10x higher than the original model.

  • Network Alpha - This is literally just a scalar on the effective learning rate, so any suggested learning rate from someone else is meaningless unless they also provide the alpha and the rank. Your chosen learning rate is effectively multiplied by (alpha/rank) to get your "real" learning rate (see the sketch after this list).

  • Optimizer - Learning rates are not interchangeable across optimizers. Different optimizers require different learning rates, so a value that works for Adafactor will not be the same for AdamW.

  • Batch Size - Increasing the batch size (or the gradient accumulation steps, which effectively multiply both the batch size and the time per optimizer update step) decreases overall gradient noise by sampling the dataset more representatively, and those less noisy, more useful gradients let you use marginally higher learning rates. Adam mostly diminishes this effect, though. In practice, a higher batch size reduces the impact of the learning rate: if you get overtraining at batch size 1, you may not get overtraining at batch size 8 with the same learning rate.

  • Precision - If you change the LoRA weight dtype from FP32, you will probably have to adjust the learning rate. BF16 has low precision and high range, and compared to FP16 or even FP32 it needs a higher learning rate for the update steps to actually do anything. In short, the learning rate changes with the precision used: FP32, FP16, and BF16 will each require different learning rates. Therefore, either follow my workflows exactly or do more training experimentation of your own.
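To make the alpha/rank and batch-size points above concrete, here is a minimal arithmetic sketch in plain Python. The numbers are illustrative examples, not recommended settings, and the exact scaling convention can differ slightly between trainers:

```python
# Minimal sketch: how alpha/rank and batch size interact with the learning rate.
# All values below are illustrative, not recommendations.

def effective_lr(base_lr: float, network_alpha: float, network_rank: int) -> float:
    """LoRA trainers scale the configured learning rate by (alpha / rank)."""
    return base_lr * (network_alpha / network_rank)

def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Gradient accumulation multiplies the effective batch per optimizer step."""
    return batch_size * grad_accum_steps

# Same base LR, very different real update sizes:
print(effective_lr(1e-4, network_alpha=128, network_rank=128))  # 1e-4    (alpha == rank)
print(effective_lr(1e-4, network_alpha=1,   network_rank=128))  # ~7.8e-7 (alpha = 1)

# Batch size 1 vs. batch size 2 with 4 accumulation steps:
print(effective_batch_size(1, 1))  # 1
print(effective_batch_size(2, 4))  # 8 -> less noisy gradients, tolerates a slightly higher LR
```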

As for identifying a learning rate:

There is no easy way to do this. All you can do is run trainings at various learning rates until one works. I like to sweep 1e-7, 1e-6, 1e-5, 1e-4, and 1e-3 first to see which are stable, then go halfway between the two most stable results and repeat until I am satisfied (see the sketch after this list). Some things to look out for when sweeping learning rates:

  • A learning rate that is too low will make little to no progress.

  • A learning rate that is too high will diverge, producing oversaturated, ugly, or generally non-representative samples that do not even appear to be moving in the direction of your dataset.

  • There is a limit to the learning rates that will let your model converge in a stable manner. Once you have identified it (ideally the learning rate that performs best on a short test run), run that learning rate and instead increase the length of the training until it converges on a result you think fits well enough. I usually aim for 150 epochs when training people, and this generalizes well to almost everything.
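One way to run such a sweep is a simple loop over candidate learning rates. This is only a sketch: train_lora.py and its flags are placeholders for whatever trainer you actually use (Kohya sd-scripts, OneTrainer, etc.), so adapt the command to its real arguments:

```python
# Minimal LR-sweep sketch. "train_lora.py" and its flags are placeholders for
# whatever trainer you actually use; only the sweep structure matters here.
import subprocess

candidate_lrs = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3]  # coarse first pass

for lr in candidate_lrs:
    subprocess.run(
        [
            "python", "train_lora.py",       # placeholder trainer entry point
            "--learning_rate", str(lr),
            "--max_train_epochs", "10",      # short run: just enough to judge stability
            "--output_name", f"lr_sweep_{lr:.0e}",
        ],
        check=True,
    )

# Inspect samples from each run, pick the two most stable learning rates, then
# repeat the loop over values halfway between them (e.g. 5e-5 between 1e-5 and 1e-4).
```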

Other notes you may find helpful:

  • If you aren't already, use Min-SNR gamma. It is pretty much a free lunch, and a value of 5 (the default) or 1 (birch-san's recommendation for latent models like Stable Diffusion, stable in my own testing) will let your training converge faster (see the sketch after this list).

  • Weight Decay (0.01) usually gives better results when doing DreamBooth / fine-tuning, but it may depend on the optimizer used. Give it a try too.
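For reference, this is roughly the weighting Min-SNR gamma applies to the per-timestep loss for epsilon-prediction models; a minimal sketch, assuming your trainer exposes the per-timestep SNR as a tensor (variable names here are illustrative):

```python
import torch

def min_snr_gamma_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """Per-timestep loss weight: clamp the SNR at gamma, then divide by the SNR.
    Noisy timesteps (low SNR) keep weight 1, while easy timesteps (high SNR) are
    down-weighted, which is why training tends to converge faster."""
    return torch.clamp(snr, max=gamma) / snr

# Illustrative SNR values for a few timesteps:
snr = torch.tensor([0.5, 1.0, 5.0, 20.0, 100.0])
print(min_snr_gamma_weight(snr, gamma=5.0))  # tensor([1.00, 1.00, 1.00, 0.25, 0.05])
print(min_snr_gamma_weight(snr, gamma=1.0))  # tensor([1.00, 1.00, 0.20, 0.05, 0.01])
```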

Comments

Anduvo

Are you already researching SD3 training? Do you maybe have any tips already? I am doing training tests with OneTrainer, and so far with SD3 I haven't even reached SDXL quality. When doing LoRA or DreamBooth, do we get the benefits of the new 16-channel VAE by default, or do we have to set up some parameters?

Furkan Gözükara

Yes, SD3 is really more powerful, but I am waiting for OneTrainer or Kohya to merge it into main. They are still developing it and it still has errors.

Manpreet Singh

Do you think personalized image generation using LoRA is better than SDXL or SD1.5 DreamBooth?