
Content

INDEX:
SD Character LoRA Training Tutorial (Part 01 - Installation)
SD Character LoRA Training Tutorial (Part 02 - Preparation)
SD Character LoRA Training Tutorial (Part 04 - Testing) 

This can be the fun part for some people or the most tedious one.

SUMMARY

1. Image Gathering
2. Image Tagging
3. Final Preparation and Adjustments

1. Image gathering

You need to gather images for your character. If you want to do it like me, you should gather only from the same source. In my case it is usually the anime.

You need to gather good images with different lighting and expressions. Try to aim at 100 images, but in many cases, like this one, you will not be able to find 100 good ones. It's better to have fewer images than to have a lot of bad ones.

Checklist that I usually do for images:

  • Only resolutions at least as large as my training setup (576px on the smallest side).
  • No other characters on the images. You should clean it in an editing software if needed. It's better if you don't have even a single hair string from another character, but sometimes it's unavoidable.
  • Try to pick simpler backgrounds. Having some backgrounds is not bad, and can even help with the style, but a flat color background is better for training a character.
  • No weird impossible faces. Stable Diffusion will not make that meme face work and it will only hurt the model.
  • No weird impossible poses. Stable Diffusion can barely make hands, imagine something crazy with twisted arms.
  • No blurry/out of focus character.
  • No signatures. (You can clean or cut it)
  • Don't screenshot in too many different resolutions. Try fullscreen, square and a few variations if you need to cut something out.
  • Image should be JPG or PNG. No WEBP.
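The format and resolution rules from the checklist can be sketched as a small Python helper. This is a hypothetical filter, not part of any training tool: the 576px minimum and the no-WEBP rule come from this tutorial, and reading the actual pixel dimensions from disk would need a library such as Pillow, so here the dimensions are passed in directly.

```python
from pathlib import Path

MIN_SIDE = 576                       # smallest side must be at least this
ALLOWED = {".jpg", ".jpeg", ".png"}  # no WEBP

def passes_checklist(filename: str, width: int, height: int) -> bool:
    """Return True if the image meets the format and resolution rules."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED:
        return False
    return min(width, height) >= MIN_SIDE

# A 1920x1080 PNG passes; a 500x800 JPG and any WEBP do not.
print(passes_checklist("frame01.png", 1920, 1080))   # True
print(passes_checklist("frame02.jpg", 500, 800))     # False (short side < 576)
print(passes_checklist("frame03.webp", 1024, 1024))  # False (wrong format)
```

The subjective rules (no other characters, no impossible poses, no blur) still need your eyes.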

What to do if you are trying to make a character from old anime that has low resolution?

You can try your Stable Diffusion web UI: go to the Extras tab and select R-ESRGAN 4x+ like this:

Set the resolution to a reasonable size; don't go overboard. If the image is 360x360, for example, you can use 3x on the Resize value and it will generate a higher resolution image. Keep in mind that this is only to save you some trouble with very difficult characters. This is the last resort.
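As a rough sketch of "don't go overboard", you could compute the smallest whole upscale factor that brings the shortest side up to the 576px training minimum. This is just illustrative arithmetic, not a feature of the web UI; note the author picks a slightly higher factor (3x for a 360px image) for extra headroom.

```python
import math

def upscale_factor(width: int, height: int, target_min_side: int = 576) -> int:
    """Smallest integer factor that lifts the short side to the minimum."""
    short = min(width, height)
    if short >= target_min_side:
        return 1  # already big enough, no upscaling needed
    return math.ceil(target_min_side / short)

print(upscale_factor(360, 360))  # 2 -> 720x720 already clears the minimum
print(upscale_factor(240, 320))  # 3 -> 720x960
print(upscale_factor(600, 800))  # 1 -> no upscaling needed
```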

Okay, so you gathered all the images you could.

Check again to see if there isn't any weird mistake. Place them in a folder that you prefer for training. I usually have a "source" folder together with my other training folders so I can have a backup in there.

If you have ".JPG" and ".PNG" files in the same folder, DO NOT give them the same name. You shouldn't have character.png and character.jpg in the same folder.
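A quick way to catch the same-name problem is to group the files by base name and flag any name that appears with more than one extension. This is a hypothetical helper for illustration, not part of kohya_ss:

```python
from collections import defaultdict
from pathlib import Path

def find_name_collisions(filenames):
    """Map each base name that appears with multiple extensions to those extensions."""
    by_stem = defaultdict(set)
    for name in filenames:
        p = Path(name)
        by_stem[p.stem.lower()].add(p.suffix.lower())
    return {stem: sorted(exts) for stem, exts in by_stem.items() if len(exts) > 1}

files = ["character.png", "character.jpg", "pose01.png", "pose02.png"]
print(find_name_collisions(files))  # {'character': ['.jpg', '.png']}
```

In practice you would feed it `os.listdir()` of your source folder and rename anything it reports.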

If you want to train a difficult character and you don't care too much about the style, you can search for fanart by other artists, but keep in mind that this will break the style. I will show two ways of training, one of which also breaks the style in favor of flexibility. You should use that one in this case, since style isn't as important.

2. Image Tagging

Now let's get back to kohya ss.

Access the Utilities tab. We will only care for 2 tabs in here. Basic Captioning and WD14 Captioning.

We will now tag our images for training. Each image will get a description similar to what you use to generate an image, like:

1girl, green hair, medium hair, breasts, yellow shirt...

The Basic Captioning tab will only be used to correct small things at the end of the process. Sometimes it's not even used.

Select the WD14 Captioning tab.

The interface here is simple.

Image folder to caption. Select the folder with your images by clicking on the folder icon.

You can leave the rest of the settings as they are; the only changes are these:

These settings tell the AI how it should automatically tag your images. For characters it's a good idea to increase the character threshold a bit and reduce the general threshold. It will give you a bit more work excluding useless tags, but it will help catch more details.

Okay, now you COULD click Caption Images. But I have a suggestion of undesired tags and prefixes to apply first. I will explain how tagging works before that.

For example, I'm training Lily Enstomach from One Piece.
It was requested that she be flexible, so more description is good, but there are some things that give this character her own characteristics. She has green/aqua hair, blue eyes, dark skin and lipstick. She also has low-tied hair that could be considered characteristic, but I will treat it as something that is not natural. The rest is clothing, which can be changed, so it's not natural either.

Okay, after guessing what is natural and what is not, I should type them in the UNDESIRED TAGS field:

IMPORTANT NOTE: Leave no spaces between commas and words.

These are tags that shouldn't be on the model, because when you call it, it should always have them, right? It's basically creating a package. Now we should add a few more things to avoid errors and unnecessary tags. I'll list them here for you to copy; they will be useful in other cases too.

1girl,1boy,2girls,2boys,multiple girls,multiple boys,meme,parody,style parody,reference,scene reference,anime coloring,anime screencap

The reason I list 1girl here is to save some time: every image will have 1girl, so we will add it together with the lily enstomach tag. So in the Prefix to add to WD14 caption field we should add the character tag and 1girl. It could also have any other tag you have customized, but in this case it's just these two. I want to call her lily enstomach; if she had the same name as another famous character, I suggest you change it. For example, if her name were "naruto", I suggest you use a different tag like "nrto" or something like that.

So we use: lily enstomach, 1girl,

NOTE: This time we should use spaces and a comma at the end of the tags (leave no spaces at the end). The reason for that is because of how the software works to tag and remove tags.
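Conceptually, the two fields work together like this sketch, assuming the captioner roughly prepends the prefix to every caption and drops any tag on the undesired list (the tag names are from this tutorial; the exact internals of the WD14 captioner may differ):

```python
# Tags that should be absorbed into the character "package" or avoided entirely
UNDESIRED = {
    "1girl", "green hair", "blue eyes", "dark skin", "lipstick",
    "meme", "parody", "anime screencap",
}
# Note the comma-space between tags and the trailing comma-space,
# mirroring the spacing rule described above
PREFIX = "lily enstomach, 1girl, "

def build_caption(raw_tags):
    """Drop undesired tags, then prepend the character prefix."""
    kept = [t for t in raw_tags if t not in UNDESIRED]
    return PREFIX + ", ".join(kept)

tags = ["1girl", "green hair", "yellow shirt", "smile", "meme"]
print(build_caption(tags))
# lily enstomach, 1girl, yellow shirt, smile
```

The character's natural traits vanish from the caption, so the model bakes them into the `lily enstomach` tag, while the clothing tags stay describable and swappable.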

There is other software you can use to do this, but again, I'm telling you how I do it.

Ok, now click on CAPTION IMAGES. This may take a while, depending on your PC specs and how many images you have. You may watch the progress in your CMD window like this:

Your computer will freeze a little bit, so don't do anything heavy with it.

After it's done it will show a lot of the tags that were added and their frequency.

Your folder will also have a lot of text files.
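If you want to recompute that tag-frequency summary yourself, a few lines of Python over the caption files will do it. This is a hypothetical helper, assuming the standard layout of one comma-separated `.txt` caption per image; the inline demo uses caption strings instead of files so it runs anywhere:

```python
from collections import Counter
from pathlib import Path

def tag_frequencies(folder):
    """Count comma-separated tags across every .txt caption file in a folder."""
    counts = Counter()
    for txt in Path(folder).glob("*.txt"):
        tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
        counts.update(t for t in tags if t)
    return counts

# Inline demo on caption strings instead of files:
demo = ["lily enstomach, 1girl, smile", "lily enstomach, 1girl, open mouth"]
c = Counter()
for line in demo:
    c.update(t.strip() for t in line.split(","))
print(c["lily enstomach"], c["1girl"], c["smile"])  # 2 2 1
```

Sorting the counter makes the rare, probably-wrong tags easy to spot before the cleanup step below.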

Okay, now we should do some hard work here and clean the wrong or useless tags.
The first thing to pay attention to is the style tags. I prefer the character to keep a bit of the style, so I remove those tags and the style will be "pruned" into the character too. Basically, everything that we don't describe gets attached to the character. One-off mistakes will not be included, because the AI will not pay attention to them if we don't describe them; it pays attention to things that are more repetitive.

So I will remove tags like official style, 1980s (style), digimon (creature) and any other weird thing, plus anything describing the character's colors. For example, if it says "purple eyes" or something like that, you should also remove it.

The way I remove a lot of the tags is by captioning again, but this time I include what I don't want in the undesired tags field. I added these tags to it:

retro artstyle,1980s (style),blurry,digimon (creature),male focus

Then Caption Again.

I left a mistake on purpose. It's the tag "blue hair". We should fix this using the Basic Captioning tab.

On Basic Captioning, to fix this, the text needs to be exactly as it's written in the caption files. So it's SPACE+blue hair, yes, with the comma. The replacement text should be empty so it will only erase the tag.

Mark this option or it will not work. Then click Caption images. It will be super fast.
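Under the hood this is presumably just a plain substring replacement over every caption file, which is why the search string has to match exactly, space and comma included. A sketch of the same fix in Python (the folder name is an assumption):

```python
from pathlib import Path

FIND = " blue hair,"  # exact match: leading space, trailing comma
REPLACE = ""          # empty replacement just erases the tag

def fix_caption(text: str) -> str:
    return text.replace(FIND, REPLACE)

caption = "lily enstomach, 1girl, blue hair, green hair, smile"
print(fix_caption(caption))  # lily enstomach, 1girl, green hair, smile

# Applying it to a whole folder would look like (assumed "source" layout):
# for txt in Path("source").glob("*.txt"):
#     txt.write_text(fix_caption(txt.read_text(encoding="utf-8")), encoding="utf-8")
```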
Congrats, we are done tagging. We can come back here if we find any problem later.
Let's go and make the final adjustments for our model.

3. Final Preparation and adjustments

Now we should think about how many steps our model will have for training and what style of training we should do.

From experience, a model is good from 1800-3600 steps. Sometimes more than that, but almost never less than 1600.

How do you know how many steps your model will have?
The answer is a formula: Number of images x Number of epochs x Number of repetitions.

You can set how many repetitions your model will have by each epoch by typing the number on the folder's name. For example:

You should name your folder with a number, an underscore, and the name of the subject. So if we have 35 images, 10 repetitions and 10 epochs, it will have a total of 3500 steps. You should copy this folder to your image_dir, like this:

After the training is complete, we will have a total of 10 models to test, because we are saving 1 model every epoch. The reason for that is so we can test multiple models, see at which point the training is good, and discard the ones that are too strong or too weak. You get a better choice if you increase the number of epochs and reduce the number of repetitions. The ideal number of repetitions per epoch would be 1, but that's too crazy for testing.
As a rule of thumb I try to leave 200-800 steps per epoch, ideally something close to 300 or 600.
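The step math above can be sketched in a few lines, with the repetitions read from the `N_name` folder prefix that kohya uses. This is illustrative arithmetic only; in real kohya runs the batch size also divides the step count, which is ignored here:

```python
def parse_repeats(folder_name: str) -> int:
    """Read the repetition count from a kohya-style 'N_name' folder name."""
    return int(folder_name.split("_", 1)[0])

def total_steps(num_images: int, epochs: int, repeats: int) -> int:
    """Number of images x number of epochs x number of repetitions."""
    return num_images * epochs * repeats

repeats = parse_repeats("10_lily_enstomach")
print(total_steps(35, 10, repeats))        # 3500, the tutorial's example
print(total_steps(35, 10, repeats) // 10)  # 350 steps per epoch, inside the 200-800 range
```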

I also don't like to have too few steps for complex characters, so I usually aim at around 5400 steps with the highest number of epochs. For this character we could name the folder 15_lily_enstomach or increase the number of epochs to 15. I prefer to have more epochs, so I will increase the number of epochs in the parameters.
Like this:

Now before finally starting the training we have two types of training with this method.

Regularized training or Non-Regularized training.

Sometimes Regularized training can save you and sometimes it can cause more problems. If you want more flexibility you should try regularized training before the other.
If you want more stylized training you should try with no regularization.

What is this?

It's a comparison the training does between your training images and other images generated from the base model.
For example, you can compare 1girl from NovelAI with the 1girl from the training. It helps the model understand the differences. However, the style will be affected. It's good because the model takes longer to overfit, so it's a safer training, but again, if you need style, don't do it this way.

This is how it should be in class_dir, there will be a folder with the topic you want to train. This is 1_1girl, which means repeat 1 time and the prompt of the image is 1girl.

This folder has over 2500 images which I generated from NAI with the tag 1girl and nothing else. You can make your own by going to your Stable Diffusion web UI and generating a lot of 512x512 images with the positive prompt 1girl and an empty negative prompt.
If you are using AnyLoRA checkpoint, be sure to generate them with it instead of NAI.
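Generating that many images by hand is tedious; the AUTOMATIC1111 web UI exposes a `/sdapi/v1/txt2img` endpoint when started with the `--api` flag, which you could loop over instead. This sketch only builds the request payload; treat the exact field names as assumptions and check them against your own install before relying on them.

```python
import json

def reg_payload(prompt="1girl", count=1, seed=-1):
    """Build a txt2img request for 512x512 regularization images."""
    return {
        "prompt": prompt,
        "negative_prompt": "",  # empty negative prompt, as described above
        "width": 512,
        "height": 512,
        "batch_size": count,
        "seed": seed,           # -1 = random seed each call
    }

payload = reg_payload()
print(json.dumps(payload))
# You would POST this to http://127.0.0.1:7860/sdapi/v1/txt2img in a loop
# (e.g. with urllib.request) and decode the base64 images in the response.
```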

I will use this method of training instead of non-regularized, because it was requested to be more flexible.
If you are using regularization images, you should cut your step target in half, because regularization doubles the total. I was aiming at 5400, so I should aim for 2700. An easy way to calculate how many repetitions you need is to divide 2700 by 35 (the number of images), which is ~77 in total. So if we want 15 epochs, we should have 5-6 repetitions per epoch.

There is another thing. If you are using regularization images you should name your folder this way: 5_lily_enstomach 1girl

Yes SPACE 1girl at the end. If you were training a boy you could use space 1boy and use 1boy for the regularization folder.

If you are NOT using regularization, don't name it like this, and remove the address from the Regularization field on the Folder tab, this one:

Yes, remove the address and leave it empty.

Now the last thing, we should name our model properly.
It's common sense to name it the same as the prompt you use to activate it, so it's easier to know. My models are usually artkoikoi_prompt, but for this example I'll do it without the artkoikoi.

Now we should finally hit the training button. Be sure to check if everything is set, like base model, parameters, folders, etc.

BE SURE TO BE ON LORA TAB

This will start the training process, you can check it on your CMD window.

If you did something wrong and want to interrupt the training you can click this button:

This is how it will look if things are working correctly:

After the process is done you will have your epochs saved in output_dir like this:


Now for testing I'll do a Part 04.

Part 04 Link HERE 
