Patreon exclusive posts index

Join the Discord and tell me your Discord username to get a special rank: SECourses Discord

I have done 22 new trainings (I had done 150+ before) after the latest changes to Kohya

23 May 2024 Update

  • For 8400-step (200 repeats) training on RunPod, the full configuration image: runpod_8400_Steps.png

  • For Windows, only the folder paths would change

21 May 2024 Update

  • 20 example images are shared here, please consider upvoting: https://www.reddit.com/r/StableDiffusion/comments/1cwuxeb/newest_kohya_sdxl_dreambooth_hyper_parameter/

  • Extensively compared different learning rates for both the Text Encoder and the U-Net

  • This was mandatory since recent Kohya updates changed how LR is applied

  • The new config files are tested on Version 24.1.4 and commit id 5ed56238b2c5c93e1510876c20524d391793161d

  • My newest strategy is setting 200 repeats and 1 epoch, then saving checkpoints every N steps to compare

  • This way the training is able to use the maximum number of regularization images

  • If you don't use reg images, you can directly set the number of epochs

  • I have also prepared 20 amazing prompts to compare checkpoints

  • I am using x/y/z checkpoint comparison and Prompt S/R

  • Check How-I-Do-Test.png to see how I am testing

  • Check test_prompts.txt to see prompts

  • You can use their Prompt S/R version from prompt_SR_test_prompts.txt

  • I am also sharing 2 comparison files (comparison 1, comparison 2) with you so you can see the difference between learning rate values

  • The comparison files are huge: 24346 x 22772 pixels

  • The naming of the comparisons works as follows: 8e_6_TE_2e_6 means 8e-06 (0.000008) is the U-Net learning rate and 2e-06 (0.000002) is the Text Encoder 1 learning rate. Text Encoder 2 is never trained.

  • I have used RealVis XL version 4 for trainings : https://huggingface.co/SG161222/RealVisXL_V4.0/resolve/main/RealVisXL_V4.0.safetensors

  • You can also use SDXL Base but for realism I prefer RealVis XL 4

  • The Tier1_24_GB_Slower.json config file currently uses 16.3 GB VRAM - the same-quality config on OneTrainer uses only 10.3 GB since it has Fused Back Pass: https://www.patreon.com/posts/96028218

  • The Tier1_48_GB_Faster.json config is the same quality as the 24 GB one but faster, since it doesn't use gradient checkpointing. Gradient checkpointing reduces VRAM usage but also makes training slower at the same quality

  • Sadly there is no lower-VRAM option anymore, since neither xFormers nor SDPA attention reduced VRAM usage

  • However, you can still try the 12 GB card config: Tier2_LowVRAM.json

  • I strongly suggest you use OneTrainer if you don't have a 24 GB GPU

  • I personally find 8e-06 for the U-Net and 3e-06 for Text Encoder 1 best for flexibility and resemblance

  • If you need more of an anime / 3D style, you can reduce the learning rate to 6e-06 (0.000006) for both Text Encoder 1 and the U-Net; you lose some resemblance but obtain more styling: click to download comparison

  • Currently, save every N steps is set to 451

  • You need to set it according to your total number of steps

  • So after setting your Pretrained model name or path, Trained Model output name, Image folder (containing the training image subfolders), Output directory for trained model, and Regularisation directory (optional, containing regularisation images), click the Print training command button and look for max_train_steps

  • Then you can divide it by the number of checkpoints you want to obtain

  • In my case I had 15 training images, 150 repeats, batch size 1, and used our very best ground-truth regularization images: https://www.patreon.com/posts/massive-4k-woman-87700469

  • So my max_train_steps was 15 x 150 x 2 = 4500 steps

  • Therefore I set save every n epochs = 1, train epochs = 1, and save every N steps = 451 (we add +1 so we don't save twice at the last step), and thus I got 10 checkpoints - see the sketch after this list

  • The experiments are made on the training dataset below, with 15 training images, 150 repeats, and 1 epoch

  • I didn't use any captioning. Only ohwx man for training images and man for reg images, both taken from the folder names

  • Hopefully I will record a new Kohya tutorial later for Windows, RunPod, Massed Compute and Kaggle

  • You can see image captioning effect comparison here : https://www.patreon.com/posts/compared-effect-101206188

  • You can also see a comparison of different Text Encoder learning rates by downloading this zip file: https://huggingface.co/MonsterMMORPG/SECourses/resolve/main/text_encoder_comparison.zip

  • I think the best strategy would be training 2 models

  • The first with 8e-06 U-Net and 3e-06 Text Encoder 1, the second with 6e-06 U-Net and 6e-06 Text Encoder 1; then generate images with both of them and use the best ones, since each model will perform best on individual prompts.

  • As a sampler I find that UniPC is best; the schedule type doesn't matter. Use 40 steps for generation and 70 steps for ADetailer with 0.5 denoise
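
A minimal sketch (plain Python) of the step arithmetic described in the list above, using the 15-image example from this post; the variable names are illustrative, not actual Kohya parameter names.

    # Step arithmetic for DreamBooth training with regularization (class) images
    train_images = 15        # number of training images
    repeats = 150            # folder repeat count, e.g. a "150_ohwx man" folder
    epochs = 1
    reg_factor = 2           # 2 when regularization images are used, otherwise 1
    checkpoints_wanted = 10

    max_train_steps = train_images * repeats * reg_factor * epochs   # 15 * 150 * 2 = 4500
    save_every_n_steps = max_train_steps // checkpoints_wanted + 1   # 451; +1 avoids a duplicate save at the final step

    print(max_train_steps, save_every_n_steps)  # 4500 451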

20 example images, please also upvote: https://www.reddit.com/r/StableDiffusion/comments/1cwuxeb/newest_kohya_sdxl_dreambooth_hyper_parameter/


Comments

Anonymous

Hi Furkan, this is great information as I'd been struggling to get Dreambooth to produce anything decent on SDXL. Would you suggest any major changes for training on a larger imageset besides the formula = train_imgs * 2(if class img used) * repeating_count * number_of_epochs? Keep up the great work!

Anonymous

How is realism of photos and consistency of person between photos of SDXL compared to dreambooth 1.5 stable diffusion training?

San Milano

Amazing work! I've been waiting for this, collecting images and I want to test it very soon

So Sha

Great result! So, how long did it take for you to train, and what was your hardware configuration? I tested your setup on 4090 24GB and 64GB of RAM. Here are the results: 135 hours without Transformers and gradient checkpoint. 19 hours with gradient checkpoint and Transformers. Screenshot: https://ibb.co/kKZW0nh It means I wasn't able to continue the training process. Therefore, we need to take hardware configurations into account when selecting our parameters.

Furkan Gözükara

unfortunately 4090 is having major problems. for example 3090 is getting 1 it / second when doing training with xformers enabled fast config. how much it / s you are getting? by the way 24 gb GPU has to use xformers. otherwise it will bottleneck the GPU VRAM. so with xformers enabled gradient checkpoint disabled 7200 steps taking 2 hours for RTX 3090

So Sha

For /it I sent you an screenshot through discord. For me it showed 15 hr at start but it took 2:30 hours finally. But didn't get good result.

Jorge Reverte Sevillano

Is there gonna be a step by step video on this? Because i feel absolutely lost 🙄

Anonymous

For the regularization images, how many should you strive for (with the 40 training images you mentioned). Also should the regularization images be 1024x1024 or can they be multi sizes.

Anonymous

P.s. - I don't understand 'The very best found command is as below' - do you use that in addition to the .json config file? Or if you use the config file, is it baked in? (sorry, I'm a dumb video editor)

Furkan Gözükara

1st just download the links i shared. they should be sufficient. number is calculated as number of repeat x your training images count. make all of your training images 1024x1024 then you can use 1024x1024 class images

Furkan Gözükara

you can ignore it. use the gradio interface training button. the gradio interface will execute same command as that for you. just make sure you have prepared your dataset folders properly via dataset preparation tab of kohya

Ec Jep

What platform do you recommend for performing 48gb training. Kaggle? I don't mind paying for a few trainings to get the quality.

Ec Jep

I tried doing kaggle 24gb DB training using your suggested parameters (bf16) but I get this error. Maybe I need runpod instead? "ValueError: bf16 mixed precision requires PyTorch >= 1.10 and a supported device."

Anonymous

I've noticed that you don't use bucketing in those configs, is there a reason?

Furkan Gözükara

since all of my images are 1024x1024. also bucketing system had errors in past so i dont enable. moreover bucketing causes more VRAM usage. but if you want to train different aspect ratios you have to enable bucketing which i dont suggest until you become experienced

Anonymous

Thanks, that is interesting. I've been using dreambooth quite a lot over the last few months and I usually use Bucketing. That said, I've noticed that some trainings just don't work as well as others and I can't always tell why. Is it possible that bucketing has something to do with it? I remember reading somewhere that if your buckets are unbalanced, some have just one image for example, then it can bias the training. Is that the kind of problem you are trying to avoid here?

Furkan Gözükara

yep 100% could be related to the bucketing system bug. even in 1 case different aspect ratios were causing training to be completely broken and error

Ec Jep

The kaggle options available on my free account are T4x2, P100 & VM v3-8. I was using T4x2

Anonymous

Where can I find regularisation images?

Anonymous

what ratio do you recommend for training images : class images when doing lora or dreambooth training of a character with unique token?

Furkan Gözükara

I suggest this. make your all training images 1024x1024. use the 1024x1024 class images. if man , ohwx man, if woman ohwx woman. so ohwx is rare token man or woman are class tokens

Keith F

if fine tuning on runpod, which gpu is best for the best configuration?

Anonymous

Thank you for answering. So I should always use all the class images you have provided on this page, regardless of how many training images I have for the subject? What I mean is like if I have 15 training images or 30 training images, I should always use all the 3000+ training images for dreambooth training?

Keith F

Thanks man, these jsons save so much time, appreciate your work

Anonymous

I've been trying to run this configs during the last week on py pc (Windows) and I had no success at all. Using the generated Loras has no effect at all. Using the same training images with but creating the configuration from zero using the kaggle training tutorial works perfectly (local or in kaggle). Can you spot the error in my configuration? accelerate launch --num_cpu_threads_per_process=4 "./sdxl_train_network.py" --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" --train_data_dir="E:/training2\img" --reg_data_dir="E:/training2\reg" --resolution="1024,1024" --output_dir="E:/training2\model" --logging_dir="E:/training2\log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --network_dim=8 --output_name="best_DreamBooth" --lr_scheduler_num_cycles="8" --no_half_vae --full_bf16 --learning_rate="1e-05" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="12160" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --gradient_checkpointing --bucket_no_upscale --noise_offset=0.0 I'm running this in a Windows pc with a 4060 ti 16gb. Have tried to change bf16 to f16 with no success at all

Furkan Gözükara

because your config is wrong. you have --network_dim=8 which means you are training LoRA however you are using learning rate of DreamBooth :)

Furkan Gözükara

Use best_settings_24_gb_VRAM_config_no_xformers.json . if you get out of vram error or if it uses shared vram enable xformers too

Samuel

160 hours for 13K steps in a 3090... what am I doing wrong?

Furkan Gözükara

your pc must be using shared vram. so you must have a config error somewhere. are you on pc or runpod? if on runpod did you kill auto1111 web ui instance?

Anonymous

Thanks, I used these settings on lora with default LR and got flexible yet fairly accurate results. Right now I have 8GB VRAM so I cannot do dreambooth training yet.

Anonymous

Thank you for this valuable information. Just a few questions: How many images (minimum) should I provide? How long does it take to finish the training? If I want to enable gradient checkpointing, will it reduce the quality?

Furkan Gözükara

1: between 10 and 20. 2: it depends on the GPU and number of images; on Windows I get about 1.5 seconds/it with an RTX 3090. 3: quality will be the same; it saves VRAM but slows down training

Anonymous

how about not using reg images? will it give realistic results?

JS

Very best one : 48_gb_VRAM_config_best.json What would be used to achieve 48gb Vram? I am using a 4090 what is the best config file for it? Also is this the latest method for training SDXL or is there another post that you are working on?

Furkan Gözükara

use this version. it is same as 48 gb a little bit slower best_settings_24_gb_VRAM_config_no_xformers.json

Furkan Gözükara

I am also working on text encoder trained version. currently we don't train it. I will update this post once I have even better config

JS

Is there a video or text tutorial on how to get this all to work with SDXL?

JS

I guess I will wait for the video you mention in this response on your YouTube channel: @SECourses 4 days ago thank you so much. the biggest speed up would come from higher batch size like 2 or 3 and lesser number of steps. both way things require a precisely found new learning rate. moreover I suggest you to try my dreambooth workflow which is 10x better literally in terms of quality and speed almost same : https://www.patreon.com/posts/very-best-for-of-89213064 - a video for this workflow coming soon hopefully

Anonymous

So I am running best_settings_24_gb_VRAM_config_no_xformers.json on dreambooth the only change I did was I edited 8 epochs to 1, this means I will only have one .safetensor model to train, do I lose anything lowering down this value? I did this because the training time was to dramatic, now with 1 epoch It is one hour. Second question would be I was monitoring GPU stats for this training and it rarely used it, always taking power of CPU and Memory, is this right? Thanks for the grind, excited about this upcoming dreambooth for sdxl tutorial. Cheers.

Anonymous

If I do 160 repeating count which I believe is equal to Repeats on Dataset preparation field, it just increases me the training to +100 hours. I have a Nvidia 4090 and I'm using best_settings_24_gb_VRAM_config_no_xformers, what suggestions would you give me? accelerate launch --num_cpu_threads_per_process=4 "./sdxl_train.py" --pretrained_model_name_or_path="C:/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1 .0.safetensors" --train_data_dir="C:/Users/alexr/Documents/Stable Diffusion/Models/Alexandre Belorio/Output-160\img" --reg_data_dir="C:/Users/alexr/Documents/Stable Diffusion/Models/Alexandre Belorio/Output-160\reg" --resolution="1024,1024" --output_dir="C:/Users/alexr/Documents/Stable Diffusion/Models/Alexandre Belorio/Output-160\model" --logging_dir="C:/Users/alexr/Documents/Stable Diffusion/Models/Alexandre Belorio/Output-160\log" --save_model_as=safetensors --full_bf16 --output_name="negao-160" --lr_scheduler_num_cycles="1" --max_data_loader_n_workers="0" --learning_rate="1e-05" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="4160" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --gradient_checkpointing --bucket_no_upscale --noise_offset=0.0

Furkan Gözükara

that means your system is using shared RAM due to VRAM insufficiency. how much vram your computer is using when you don't do training? 4160 steps is normal count that you should train

Anonymous

"./sdxl_train.py" - where can I find this script?? Also, how many input images did you upload?

Furkan Gözükara

it is directly inside kohya folder : https://github.com/bmaltais/kohya_ss - please also watch this tutorial : https://youtu.be/sBFGitIvD2A

Doc Snyder

The config shows 4 epochs but your .txt says 8 epochs. Which one should be used?

Anonymous

error: unrecognized arguments: --train_text_encoder Is there a newer kohya script that can handle this command?

Anonymous

when we should use the text encoder one and when not ?

Đạt Nguyễn

The formula = train_imgs * 2(if class img used) * repeating_count * number_of_epochs In my case : 13 * 2 * 40 * 4 = 4160 Sorry for not understanding, can you explain more clearly why it is multiplied by 2?

Anonymous

Is there a tutorial for using this to train dreambooth model using Kohya?

Hassan Alhassan

should we keep these the same ? Network Rank (Dimension) 8 Network Alpha 1

Furkan Gözükara

you can increase. like 32 64 128. as you increase it will learn subject more but the model general knowledge will get reduced. 8 is pretty low. try 32 first . by the way DreamBooth don't have rank. only LoRA has rank. Don't get confused. DreamBooth trains entire model and it is better .

JS

SECourses: Tutorials, Guides, Resources, Training, MidJourney, Voice Clone, TTS, ChatGPT, GPT, LLM, Scripts by Furkan Gözükara 1w Tutorial very soon hopefully Has this video been created yet?

Anonymous

I will ask the most innocent question of your entire channel, once I finish the training in runpod, how can we initialize the SD UI? In the future when we want to use our trained models, is it okay for us to use the webUI that runpod gives us by default in its SDxl template? Thank you so much !!

Anonymous

The accelerate command for 48 GB setup at the end doesn't work. It always says 0 train images and 0 reg images even when the directory has images. Please update the documentation. It is misleading.

Furkan Gözükara

Hello. you must have a config error. i used them so many times. can you show me your executed command and how did you setup your data folders and their paths? please watch this to understand : https://www.youtube.com/watch?v=EEV8RPohsbw

Anonymous

Yeah, it was an issue with me setting up the data folders. It works fine now. What is the repeat count ?

Furkan Gözükara

i used 40 repeat count and trained up to 8 checkpoints. then compared checkpoints and found that 4th checkpoint was best. used 13 training images.

Anonymous

Do you think this would work for multiple rare tokens in the same class? I have used your config from last month and also compared with LoRA, Dreambooth is still the best. Thanks for this update I will try it on my pod!

Ec Jep

Fantastic collection of parameters for everyone. Thank you very much for the research and sharing.

So Sha

Good Job Dr, I’ll try it and let you know the results. 👍

Anonymous

What's your recommendation for regularization images – how many, is there a particular set you've found best, captions or not?

Furkan Gözükara

captions still testing. i haven't made a new video about this yet but my newest suggestion will be make 1 epoch 200 repeat and get checkpoints based on number of steps

Anonymous

Thanks. What about the regularization set? Have you seen better results with AI generated images, as many seem to recommend, or real photos such as your "Massive 4K Resolution Woman & Man" set?

Furkan Gözükara

don't use AI generated images unless you have to. because the model was not trained with AI images but it was trained with real images

Anonymous

I tried this yesterday but the rare tokens all mixed in to the class token (man) and one of the rare tokens ended up dominant. It was possible to repair this with negative prompts but not to a high enough level. Very interesting outcome but obviously not the desired one!

Anonymous

I get CUDA out of memory using the A5000 on runpod if I use the 24gig, I have to remove the VAE and the additional parameters to get it to run. Any help to use properly?

Furkan Gözükara

you need to upload relauncher.py restart pod then before start training kill web ui with fuser -k 3000/tcp

Anonymous

hello sir, i appreciate all your work, but when I am training lora or dreambooth following your instructions, i am facing the problem of likeness, i am using training 12+ images and 2000+ reg images and i am using sdxl 1.0 model, do you have any tips to improve likeness ?

Anonymous

Hi Furkan, thanks for your work on his. What settings do you use when you are prompting your trained models? sampling method, step count, cfg scale. Are you changing any other settings in auto1111?

So Sha

Furkan, I wanted to setup your recommended text encoder for kohya on ubuntu, but I couldn't understand what should I do : Currently text encoder training of SDXL is only supported in sd-scripts-dev branch So open a cmd and do git pull in your Kohya GUI folder Then do git checkout sd-scripts-dev Then do another git pull Would you please explain this part more?

Furkan Gözükara

I prefer to use ADetailer for faces, 40 steps, CFG 7 and the DPM++ 2M SDE Karras sampler. No other settings are changed. As the ADetailer prompt for faces I use photo of ohwx man or photo of ohwx woman. Sorry for the late reply

Furkan Gözükara

you just need to do basic git branch switch. i explained in this video thoroughly : https://youtu.be/kvxX6NrPtEk 18:08 How to switch to dev branch of Automatic1111 SD Web UI for SDXL TensorRT usage

Arcon Septim

Can you please make a full video tutorial with all the steps and in detail? Like from choosing the best training images and then fine tuning afterwards, thank you!

DAVID PEREZ

Great news. Also, if you could make a GitHub update or an update on the Kohya Kaggle Patreon page, that would be awesome

Dallin Mackay

great params. have you tried training on finetuned base models? with loras I didn't find any advantage training on finetunes but I used them for inference for a guaranteed improvement in quality. but I figured DB might be different

Dallin Mackay

will do. also I tried extracting a Lora with your recent settings but I run out of memory. not sure if its ram or vram but i have 48gb and 24gb respectively. any idea?

Meito

even with lora extraction i get - Text encoder is same. Extract U-Net only.

Dallin Mackay

from "Extract Lora" tab on kohya, with 192 dim and alpha. And using a lower dim and alpha made no difference. I also get the same message Meito posted ^^ before it OOMs

Anonymous

Do you have regularization images of dogs?

Khoa Vo

I see that the number of epochs for the 24GB config is set to 8. Isn't that really high? I'm using around 15 images.

Furkan Gözükara

yes it is high. you don't have to train that much. usually around 150 total repeats per image is good. so if you use 50 repeats, train 3 epochs; if you use 150 repeats, train 1 epoch.

Khoa Vo

Can you elaborate on that a bit more? If I have 15 images are you saying I should repeat 150 times and just do 1 epoch? Right now I am doing 15 images and 40 repeats and 8 epochs. Which takes quite a long time to train.

Furkan Gözükara

do like this. 200 repeat. 1 epoch. so it will make 3000*2 = 6000 steps. save checkpoints every 1201 steps. so you will get 5 checkpoints to compare

Anonymous

Hello, could you share settings, I mean a .json file but for stable diffusion 1.5 models too?

Anonymous

I appreciate all the work you've put into this! I'm trying to understand whether or not to use text encoders and unfortunately the links you have on your page are dead. Should I use them? Happy New Year! https://twitter.com/GozukaraFurkan/status/1710995764162216205 https://twitter.com/GozukaraFurkan/status/1720942143143895357 https://twitter.com/GozukaraFurkan/status/1721845175478083958

Furkan Gözükara

Hello. You should use text encoder it really does improve. I will remove them from the post and add notice. thank you for support

Felix Rockwell

In this video https://www.youtube.com/watch?v=EEV8RPohsbw you say: save the lora file into /workspace/stable-diffusion-webui/models/Stable-diffusion/model But the lora files are here: workspace/stable-diffusion-webui/models/Lora So I don't understand why are you not saving it directly to Lora folder?

Anonyme pas trop anonyme

Is there ANY parameter to make a way longer training and get better results ? if yes, what dataset size and how many repeats ?

Furkan Gözükara

There is no parameter; we are already using the very best ones. So to make it better you need to improve your training dataset. Let's say you have collected 50 images, then do this: 5200 / 50 = 104, so do 104 repeats and 2 epochs and save 10 checkpoints. To save 10 checkpoints, set save every 50 * 104 * 2 * 2 / 10 + 1 = 2081 steps, so save once every 2081 steps. Usually 200 repeats and 1 epoch is good, but you can train longer and compare more checkpoints

Anonymous

How many repeats should I use if I have a dataset of 319 images? And how many if there are 100 images? Is there a formula, or how do I calculate how many repeats I should use? I'm using your reg images.

Furkan Gözükara

sadly there is no formula. but as the image count increases you need to reduce number of epochs. lets say 319 images and we have 5200 reg images. so make repeat 16, make 2 epoch training and save every 639 steps.

Anonymous

why 801? how do you calculate the num of steps? and why 2 epoch?

Furkan Gözükara

those comes with experience. you can also set other numbers. just do more experiments. i gave rough numbers with comparing my 15 images 150 repeat 1 epoch

Anonymous

mm im having trouble to decide how many should i choose as i have other dataset of 41 images and another of 50 images..how could i calculate the number of steps, at least approximately

Furkan Gözükara

hard to decide i agree. you can do more training and more frequent checkpoint saving to see where it starts overfitting

Anonymous

I tried every single step of your video, from signing up to your Patreon and trying the JSON here, to using only 14 images. My computer still says it's going to take 27 hours. I have a 4070 Ti. Do you have a possible resolution to this issue?

Furkan Gözükara

Hello. You have 12 GB GPU. Therefore SDXL DreamBooth will run very slow. I suggest you to try OneTrainer. It has better VRAM usage. We have config here : https://www.patreon.com/posts/96028218

Anonymous

Thank you. I actually have 16GB in my GPU, i'll try this method.

mike oxmaul

Have you done any trainings on other models, instead of the SDXL 1.0 or 0.9 base? I know we used to train on Realistic Vision when doing SD 1.5. Looking at realism in particular.

Furkan Gözükara

yes i tried and they didn't perform better for realism for me so far. but for stylization i got better results

mike oxmaul

Are there particular models that are suitable for DreamBooth training compared to others? What makes them suitable? I'm going to do some 48 GB text encoder trainings today. I could test a specific model for you compared to base if you'd like.

Anonymous

Hi Furkan, with the latest Kohya I am getting this error with your 24GB text encoder parameters: sdxl_train_network.py: error: unrecognized arguments: --train_text_encoder It seems they removed or changed something? Could you please take a look?

Furkan Gözükara

hello. you are trying to train LoRA. This config is for DreamBooth. I just tested and verified and working

Anonymous

You are right! I was training a lora, sorry my bad for the false alarm. Thank you!

Anonymous

Hi Furkan, since Kohya SS doesn't have inpainting training, which other tool would you recommend to train an inpaint model?

Anonymous

I see that you are not training the second text encoder. Can you let me know why?

Furkan Gözükara

I tested it thoroughly and it doesn't increase likeliness a lot but causes model to overfit. So model generalization degrades.

Anonymous

This is what I get when trying both the no text encoder and text encoder 24gb (3090) The following values were not passed to `accelerate launch` and had defaults used instead: `--num_processes` was set to a value of `1` `--num_machines` was set to a value of `1` `--mixed_precision` was set to a value of `'no'` `--dynamo_backend` was set to a value of `'no'` To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. usage: sdxl_train_network.py [-h] [--v2] [--v_parameterization] [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH] [--tokenizer_cache_dir TOKENIZER_CACHE_DIR] [--train_data_dir TRAIN_DATA_DIR] [--shuffle_caption] [--caption_separator CAPTION_SEPARATOR] [--caption_extension CAPTION_EXTENSION] [--caption_extention CAPTION_EXTENTION] [--keep_tokens KEEP_TOKENS] [--caption_prefix CAPTION_PREFIX] [--caption_suffix CAPTION_SUFFIX] [--color_aug] [--flip_aug] [--face_crop_aug_range FACE_CROP_AUG_RANGE] [--random_crop] [--debug_dataset] [--resolution RESOLUTION] [--cache_latents] [--vae_batch_size VAE_BATCH_SIZE] [--cache_latents_to_disk] [--enable_bucket] [--min_bucket_reso MIN_BUCKET_RESO] [--max_bucket_reso MAX_BUCKET_RESO] [--bucket_reso_steps BUCKET_RESO_STEPS] [--bucket_no_upscale] [--token_warmup_min TOKEN_WARMUP_MIN] [--token_warmup_step TOKEN_WARMUP_STEP] [--dataset_class DATASET_CLASS] [--caption_dropout_rate CAPTION_DROPOUT_RATE] [--caption_dropout_every_n_epochs CAPTION_DROPOUT_EVERY_N_EPOCHS] [--caption_tag_dropout_rate CAPTION_TAG_DROPOUT_RATE] [--reg_data_dir REG_DATA_DIR] [--in_json IN_JSON] [--dataset_repeats DATASET_REPEATS] [--output_dir OUTPUT_DIR] [--output_name OUTPUT_NAME] [--huggingface_repo_id HUGGINGFACE_REPO_ID] [--huggingface_repo_type HUGGINGFACE_REPO_TYPE] [--huggingface_path_in_repo HUGGINGFACE_PATH_IN_REPO] [--huggingface_token HUGGINGFACE_TOKEN] [--huggingface_repo_visibility HUGGINGFACE_REPO_VISIBILITY] [--save_state_to_huggingface] [--resume_from_huggingface] [--async_upload] [--save_precision {None,float,fp16,bf16}] [--save_every_n_epochs SAVE_EVERY_N_EPOCHS] [--save_every_n_steps SAVE_EVERY_N_STEPS] [--save_n_epoch_ratio SAVE_N_EPOCH_RATIO] [--save_last_n_epochs SAVE_LAST_N_EPOCHS] [--save_last_n_epochs_state SAVE_LAST_N_EPOCHS_STATE] [--save_last_n_steps SAVE_LAST_N_STEPS] [--save_last_n_steps_state SAVE_LAST_N_STEPS_STATE] [--save_state] [--resume RESUME] [--train_batch_size TRAIN_BATCH_SIZE] [--max_token_length {None,150,225}] [--mem_eff_attn] [--xformers] [--sdpa] [--vae VAE] [--max_train_steps MAX_TRAIN_STEPS] [--max_train_epochs MAX_TRAIN_EPOCHS] [--max_data_loader_n_workers MAX_DATA_LOADER_N_WORKERS] [--persistent_data_loader_workers] [--seed SEED] [--gradient_checkpointing] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--mixed_precision {no,fp16,bf16}] [--full_fp16] [--full_bf16] [--ddp_timeout DDP_TIMEOUT] [--clip_skip CLIP_SKIP] [--logging_dir LOGGING_DIR] [--log_with {tensorboard,wandb,all}] [--log_prefix LOG_PREFIX] [--log_tracker_name LOG_TRACKER_NAME] [--log_tracker_config LOG_TRACKER_CONFIG] [--wandb_api_key WANDB_API_KEY] [--noise_offset NOISE_OFFSET] [--multires_noise_iterations MULTIRES_NOISE_ITERATIONS] [--ip_noise_gamma IP_NOISE_GAMMA] [--multires_noise_discount MULTIRES_NOISE_DISCOUNT] [--adaptive_noise_scale ADAPTIVE_NOISE_SCALE] [--zero_terminal_snr] [--min_timestep MIN_TIMESTEP] [--max_timestep MAX_TIMESTEP] [--lowram] [--sample_every_n_steps SAMPLE_EVERY_N_STEPS] [--sample_every_n_epochs SAMPLE_EVERY_N_EPOCHS] [--sample_prompts SAMPLE_PROMPTS] 
[--sample_sampler {ddim,pndm,lms,euler,euler_a,heun,dpm_2,dpm_2_a,dpmsolver,dpmsolver++,dpmsingle,k_lms,k_euler,k_euler_a,k_dpm_2,k_dpm_2_a}] [--config_file CONFIG_FILE] [--output_config] [--metadata_title METADATA_TITLE] [--metadata_author METADATA_AUTHOR] [--metadata_description METADATA_DESCRIPTION] [--metadata_license METADATA_LICENSE] [--metadata_tags METADATA_TAGS] [--prior_loss_weight PRIOR_LOSS_WEIGHT] [--optimizer_type OPTIMIZER_TYPE] [--use_8bit_adam] [--use_lion_optimizer] [--learning_rate LEARNING_RATE] [--max_grad_norm MAX_GRAD_NORM] [--optimizer_args [OPTIMIZER_ARGS ...]] [--lr_scheduler_type LR_SCHEDULER_TYPE] [--lr_scheduler_args [LR_SCHEDULER_ARGS ...]] [--lr_scheduler LR_SCHEDULER] [--lr_warmup_steps LR_WARMUP_STEPS] [--lr_scheduler_num_cycles LR_SCHEDULER_NUM_CYCLES] [--lr_scheduler_power LR_SCHEDULER_POWER] [--dataset_config DATASET_CONFIG] [--min_snr_gamma MIN_SNR_GAMMA] [--scale_v_pred_loss_like_noise_pred] [--v_pred_like_loss V_PRED_LIKE_LOSS] [--debiased_estimation_loss] [--weighted_captions] [--no_metadata] [--save_model_as {None,ckpt,pt,safetensors}] [--unet_lr UNET_LR] [--text_encoder_lr TEXT_ENCODER_LR] [--network_weights NETWORK_WEIGHTS] [--network_module NETWORK_MODULE] [--network_dim NETWORK_DIM] [--network_alpha NETWORK_ALPHA] [--network_dropout NETWORK_DROPOUT] [--network_args [NETWORK_ARGS ...]] [--network_train_unet_only] [--network_train_text_encoder_only] [--training_comment TRAINING_COMMENT] [--dim_from_weights] [--scale_weight_norms SCALE_WEIGHT_NORMS] [--base_weights [BASE_WEIGHTS ...]] [--base_weights_multiplier [BASE_WEIGHTS_MULTIPLIER ...]] [--no_half_vae] [--cache_text_encoder_outputs] [--cache_text_encoder_outputs_to_disk] sdxl_train_network.py: error: unrecognized arguments: --train_text_encoder Traceback (most recent call last): File "C:\Users\thriv\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\thriv\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\thriv\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in File "C:\Users\thriv\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "C:\Users\thriv\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command simple_launcher(args) File "C:\Users\thriv\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\Users\\thriv\\AppData\\Local\\Programs\\Python\\Python310\\python.exe', './sdxl_train_network.py', '--pretrained_model_name_or_path=E:/SD 1/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors', '--train_data_dir=E:/2.SD Tools/Fine Tuniing\\img', '--reg_data_dir=E:/2.SD Tools/Fine Tuniing\\reg', '--resolution=1024,1024', '--output_dir=E:/2.SD Tools/Fine Tuniing\\model', '--logging_dir=E:/2.SD Tools/Fine Tuniing\\log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--network_dim=8', '--output_name=Julie_Hadi_1.0XL', '--lr_scheduler_num_cycles=8', '--no_half_vae', '--full_bf16', '--learning_rate=1e-05', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=3120', '--save_every_n_epochs=1', 
'--mixed_precision=bf16', '--save_precision=bf16', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Adafactor', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--gradient_checkpointing', '--bucket_no_upscale', '--noise_offset=0.0', '--max_grad_norm=0.0', '--no_half_vae', '--train_text_encoder', '--vae=stabilityai/sdxl-vae']' returned non-zero exit status 2.

Anonymous

I am trying "How To Do Stable Diffusion XL (SDXL) DreamBooth Training (Full Fine Tuning) On Windows and RunPod" I am training with 8 (1024 x 1024) images, getting my regularisation images, with download_man_reg_imgs, and using 24GB_TextEncoder for my config file on a runpod RTX 3090 26 vCPU 93 GB RAM, with 150 GB Disk. But imaterial of the disk size I use (I have tried 10, 20, 50, 150), the training dies at 38% with this messsage, after running 6 epochs. shutil.Error: [('/workspace/train/DSC_0190.JPG', '/workspace/stable-diffusion-webui/models/Stable-diffusion/img/40_ohwx man/DSC_0190.JPG', '[Errno 122] Disk quota exceeded') Any feedback would be appreciated.

Furkan Gözükara

well that means you are out of storage. do this, 120 gb volume disk 200 repeat, 1 epoch, 8 images, save every 8 * 1 * 200 * 2 / 10 + 1 = 321 steps. it will save 10 checkpoints for you and it will work with our 24 GB text encoder config

Anonymous

Hi Sir, I would greatly appreciate guidance on fine tuning my steps/epochs/repeats. Currently I've been doing my training on Kohya SS using Prodigy following this guide here >> https://civitai.com/articles/3522/valstrixs-crash-course-guide-to-lora-and-lycoris-training I've been playing with a couple checkpoints, EpicRealismXL V1 and NightVisionXL (going for pure realism). Do you recommend base SDXL, or a checkpoint already tailored for my use? Also I'm having trouble wrapping my head around steps etc. and overbaking. My dataset consists of 179 photos of 1 subject (wife); I have a mixture of dynamic full body/half/closeup face. I was using WD14 captioning but as I'm typing this I'm using your personal tagger setup, BLIP2 and the bg something lol, from your YT video. Anyway, do you recommend captions if I only have 1 subject? Also I downloaded your regularization woman set, both uncropped and 1536x1536. Appreciate any guidance! Thank you. "TLDR: 179 images in dataset, 1 subject. Captions yes or no? How many epochs and how many repeats, e.g. 20_woman etc.? What settings with your female dataset?"

Furkan Gözükara

hello. you have too many images. i dont know their quality. if their quality is good and high then ok so here my suggestions. you should go step by step experiment and compare my first suggestion : pick 15 very good images like the dataset i have shown. you can even pick better one : use SDXL base 1.0 : train 150 repeat , 1 epoch, our regularization images : make all training images 1024x1024 so your training becomes 1024x1024 : use our 1024x1024 woman reg images dataset after you obtain some base results with this do this : use your 179 images dataset : make sure all cropped to 1024:1024 - we have auto cropper too : use repeating 29 : use our 1024x1024 reg images : train 2 epochs : save every : 179 * 29 * 2 / 10 + 1 = 1039 steps this will give you 10 checkpoints. compare them with x/y/z do not use captions. train with only ohwx woman after all these done let me know the results

Anonymous

ok I'll do that. my training images are a mix of 768x768 and 1200x1800, minus some weird ones where I cropped any oddities. I was reading through your pdf and downloaded both the 24 GB text and no-text config files for Kohya. Thank you for the help; LoRA training on SDXL I have found to be trickier than 1.5.

Furkan Gözükara

ye lora harder. before using mixed resolutions with bucketing, crop all to 1024x1024 and do that way training. after that try different resolutions with bucketing and compare. so you will have baselines for each case

Anonymous

Thank you for your super quick response - I will try that. Thanks for all the videos and effort.

Anonymous

Clarifying where the changes need to be made... where would the 321 steps go? Dreambooth/Parameters/Basic - Epoch = 1; Dreambooth/Dataset Preparation/Dreambooth/LoRA Preparation/Training Images Repeats = 200

Anonymous

Hello, which configuration should I use for the NVIDIA RTX A4000 -20GB VRAM and 64GB RAM, 16GB or 24GB? With TextEncoder and without it?

Anonymous

I get an image for a prompt like "photo of ohwx man", but for prompts below I do not get the character (after trying to generate 20-25 times). Is the issue with that my training images need to show enough to match the prompt? (like a full-body image). Also in your YT video there is a ADetailer section - does the model need to be downloaded into a certain directory for it to show up? "High-resolution, full-body photograph of an (ohwx man:1.1) suitable for a popular Instagram post etc. etc." or "photo of ohwx man walking in new york city, shot on Fujifilm Superia 400 ..."

Furkan Gözükara

that is accurate prompt. in after detailer use photo of ohwx man as prompt. also test just solo prompt ohwx man and see if you are getting your training images face

Arcon Septim

Is this getting outdated or still the best? ^^

Franco Acosta Diaz

Hi, I don't understand this, how do I implement it in Runpod?

Davit Sharian

Hi, if I have 30 images for training, do I need to change repeats count? or it will work for me with 40 repeats too?

Furkan Gözükara

well for 30 images i suggest this. do 150 repeat, 1 epoch and save every 30 * 150 * 2 / 10 +1 = 901 steps. so you will have 10 check points and you can compare all

Davit Sharian

train_imgs * 2(if class img used) * repeating_count * number_of_epochs according to your formula, is there any difference if I use 10 epochs with 15 repeats, and 1 epoch with 150 repeats? in two cases I'll have 9000 steps for 30 images

Davit Sharian

which one is better more epochs less repeats or more repeats 1 epoch, or they are the same, for the same steps count, thank you

Furkan Gözükara

it depends. if you are using our reg images, which are highest quality, more repeats better since it will use more variety of reg images. if you are not using reg images, then 1 repeat and more epochs better.

Anonymous

Any tips on the parameters to set when generating images with the trained model? I've imported the model into Automatic1111 but can't seem to get the quality and sharpness shown in the SDXL examples.

Furkan Gözükara

hello . sure. you can download images here. they have png info data. so you can use them in png info of automatic1111 and see all parameters : https://civitai.com/user/SECourses/images

Anonymous

Nice thank you, I've been using the juggernaut model for a while now and it's really high quality compared to these. Any ideas on how to get as good of quality in terms of realism. For example: https://civitai.com/images/7022338

Furkan Gözükara

it looks good but it gives the vibes of 3d render. currently i did training on RealVisXL V4 and testing the results in terms of realism

Anonymous

Do you think it makes sense to train on top of juggernaut with the same settings as in this post? Happy to try it out and share my results with the community, any tips before I start the training would be appreciated!

Anonymous

Hi, If I have 100 images for training, should I use all of your Reg images (5200 images) ? If I use less Reg images, should I use more Repeat for Reg images?

Anonymous

Looks like training other models isn't as good as the base, so will stick with the original method which is giving the best results.

Anonymous

When the training is done, should the final output model be used? or do we need to look at intermediate checkpoints to try and find the best model (like the xyz analysis in this post)

Furkan Gözükara

if you have 100 images make repeating 52. it will use all reg images. so it will be total 5200 steps * 2 = 10400. if you need more than 1 epoch you can increase the count. to save checkpoints use save every N steps. like for 10400 use 1041 to save 10 checkpoints
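
A rough sketch of the arithmetic in the reply above, assuming the 5200-image regularization set mentioned in this post; the variable names are illustrative.

    # Pick a repeat count that consumes the full regularization set, then derive the save interval
    reg_images = 5200         # size of the regularization image set used in this post
    train_images = 100
    epochs = 1
    checkpoints_wanted = 10

    repeats = reg_images // train_images                        # 52, so every reg image is seen once
    total_steps = train_images * repeats * 2 * epochs            # *2 because reg images double the steps -> 10400
    save_every_n_steps = total_steps // checkpoints_wanted + 1   # 1041

    print(repeats, total_steps, save_every_n_steps)  # 52 10400 1041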

Furkan Gözükara

yes do x/y/z checkpoint comparison. that is the best way. the number of checkpoints you are going to get depends on you. my newest strategy of using higher repeat count and 1 epoch explained in this video : https://youtu.be/16-b1AjvyBE

Dallin Mackay

Could you in theory drop the second text encoder from an existing finetune when extracting lora from it by modifying extract_lora.py?

AmunRaw

Will you please update the x/twitter links in your post? They are no longer valid in this section. With the current volatility of your x hosted material it may be beneficial to use some other hosting or patreon directly: The text encoder on and off difference is huge .e.g : https://twitter.com/GozukaraFurkan/status/1721845175478083958 The text encoder enabled training uses same VRAM but a little bit slower : https://twitter.com/GozukaraFurkan/status/1720942143143895357

Kadir Nar

The links here are not working. https://huggingface.co/MonsterMMORPG/SECourses/resolve/main/comparison1.jpg https://huggingface.co/MonsterMMORPG/SECourses/resolve/main/comparison2.jpg

Kadir Nar

The image file may not open because it is large. These links may be correct. https://huggingface.co/MonsterMMORPG/SECourses/blob/main/comparison1.jpg https://huggingface.co/MonsterMMORPG/SECourses/blob/main/comparison2.jpg

Daniel Alderson Smith

all the links to the old one redirect here. i am still on the old kohya and need the old presets. can you post a link to those ones?

Furkan Gözükara

for old preset load this latest one. change unet learning rate to 1e-05 and Text Encoder-1 learning rate to 3e-06. it should work same. we dont train text encoder 2 so it is set to 0

Pew

Dr. Gözükara, pardon my forwardness here; the fact is I don't know what I'm doing and therefore follow your guidance with much success. I'm a fan and supporter, though I would like to ask the following. I've read many times that training over several epochs produces higher quality outcomes versus same steps over the course of a single epoch. Can you please speak to this? I understand with your process, you still arrive at 10 checkpoints, though is that at a reduced benefit to say actually training 10 full epochs set to a defined number of steps each? I look forward to your insight and commentary!

Furkan Gözükara

you need to understand the logic of epochs and steps. 1 step = 1 GPU cycle; with batch size 1 it processes 1 image. 1 epoch = processing every image in the training dataset once. so if you have 100 images in total, 100 steps will get you 1 epoch. i can give you a private lecture if you need, so you can ask any questions
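
A quick illustration of the step/epoch relationship described above; the numbers are hypothetical.

    # Step vs. epoch at a given batch size
    images_per_epoch = 100   # training images x repeats (plus reg images if used)
    batch_size = 1

    steps_per_epoch = images_per_epoch // batch_size
    print(steps_per_epoch)   # 100 -> at batch size 1, 100 steps complete exactly 1 epoch
    # At batch size 2 the same epoch would take 50 steps, since each step processes 2 images.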

Jannik

Hello, can you make your 15 training images available? I just want to reproduce your results :) Thanks!

Furkan Gözükara

hello, i can't make them available and it is not necessary either. i just trained a client and it worked even better than my training because his dataset was better

Pew

Dr. Gözükara, do you happen to know when, by date or version, Kohya_ss made the change to the learning rate and how it impacts training? I have tried looking at the release notes, however, I don't see it mentioned and knew about this only when you brought it forward to the community. Context: I would like to download a prior version to archive and I'm not sure when the change happened. Thank you.