

Quick guide on how to do AI animation with AUTOMATIC1111. This is for Stable Diffusion 1.5 models (the most common models). I haven't tried it on SDXL models (bigger models that require more resources), but it should work with those too.

The process is very simple: basically you install a couple of extensions, and then whenever you want a GIF or video, you enable the AnimateDiff box and hit Generate to get a GIF/video instead of a single image. So don't let the wall of text discourage you, I just tried to cover all the stuff that might be useful.

OLDER POST: About AUTOMATIC1111 & ControlNet https://www.patreon.com/posts/ai-art-vam-to-80441941

STEP 1 - Install extensions

In AUTOMATIC1111 go to Extensions > Available and click the Load from: button. From the list, click Install for each of the following:

  • sd-webui-animatediff
  • Deforum (it will appear as deforum-for-automatic1111-webui after it's installed)


Wait for them to install (the screen goes white for a few seconds and a message is printed above the table). Then go to Extensions > Installed and click Apply and restart UI.


STEP 2 - Download an animation model

Download the file temporaldiff-v1-animatediff.safetensors and put it in the AUTOMATIC1111 folder extensions\sd-webui-animatediff\model, then restart AUTOMATIC1111.
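
For reference, assuming AUTOMATIC1111 is installed at C:\stable-diffusion-webui and the file landed in your Downloads folder (both paths are just examples, adjust them to your setup), moving it from a command prompt looks like this:

    rem example paths only - point these at your actual download location and A1111 install folder
    move "%USERPROFILE%\Downloads\temporaldiff-v1-animatediff.safetensors" "C:\stable-diffusion-webui\extensions\sd-webui-animatediff\model\"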

STEP 3 - Generate

Now in the options on the left you'll see a new box named AnimateDiff:

You have to pick the motion module and check Enable AnimateDiff. Now when you hit Generate, instead of generating one image, AUTOMATIC1111 will generate a GIF (or the other formats you picked). For videos you'll very likely need to install ffmpeg (tutorial) if you don't have it already.
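
To check whether ffmpeg is already installed, you can run this in a command prompt (assuming Windows); if it prints version info, you're good:

    rem prints version info if ffmpeg is installed and on the PATH
    ffmpeg -version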

OPTIONS

You can read more about the AnimateDiff extension on GitHub, particularly the WebUI parameters section. Here's a quick & rough explanation:

  • Number of frames and FPS - here you'll have to do a bit of math to set the duration you want, since duration in seconds = number of frames / FPS (see the small example after this list). In my example I put 12 frames per second for a total of 60 frames, resulting in a 60 / 12 = 5 second GIF
  • Frame interpolation - when enabled (FILM), it uses the 2nd extension installed (Deforum) to make the GIF smoother by adding extra frames. Basically, for each original frame it generates additional in-between frames, see the option below.
  • Interp X - when Frame interpolation above is enabled, this controls how many frames are generated out of each original frame. I set it to 2, so for each frame AnimateDiff generated, 2 frames were added in the final GIF. The animation will look smoother but may seem a bit slower at times.
  • In Settings > AnimateDiff enabling Calculate the optimal GIF palette should make the GIFs look better
  • In Settings > Face restoration you might want to increase or decrease that to get better face consistency. I haven't figured it out yet, but I think if faces are small in the image, having face fix set lower will result in more shapeshifter-like faces. I need to experiment more with it, but I'm letting you know about it.
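
If you prefer to work backwards from a target duration, the math is just frames = duration x FPS. A trivial sketch you could run in a Windows command prompt, using example numbers:

    rem hypothetical example: how many frames for a 5-second clip at 12 FPS?
    set DURATION=5
    set FPS=12
    set /a FRAMES=DURATION*FPS
    echo Set "Number of frames" to %FRAMES%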

Everything else can be left at its default and I think it's unlikely that you'll need to change it.

IMAGE SIZE & TESTS

The time it takes to generate the GIF depends on the number of frames and your A1111 settings. For a first test I recommend starting with something like:

  • Size: 320 x 512
  • Number of frames: 16
  • FPS: 8
  • Frame interpolation: Off

to see how long it takes to generate on your PC, then increase from there.

CONTROLNET

You can also give it a video as input (drag it into the video source section). Now if you enable ControlNet, the GIF will be generated based on the input video, frame by frame. So it's important to give it small videos of a few seconds as input, maybe cropped in case the subject of the video is a person. If there's a lot of background and the person covers a small percentage of the video, the person might end up in low quality. You want the person to cover as much of the video as possible.

In the ControlNet box you don't have to add any image, just enable it. The video from the AnimateDiff box will be used as input. In my YouTube video above, the second & third example animations were made using ControlNet with a VAM video as input. I recorded a few seconds directly from VAM (using OBS) in portrait size.

When using input videos, you'll most likely need to optimize them first to get quicker output videos. One way to do it is with ffmpeg (installation guide here, see the first answer). FFMPEG is a library & command-line utility that allows very advanced video conversion. Lots of software uses it, and you might have it installed already if you used my TextAudioTool, where it's required for speech recognition. Once you have FFMPEG installed, you can use one of my attached .bat files to quickly convert videos. How it works:

  • put the .bat files in a folder where you have videos that you want to convert
  • run one of the .bat files and it will convert all the videos in that folder to videos with a height of 512 px (you don't need a large video resolution for ControlNet) and a framerate of 24 FPS or 8 FPS. You can edit the files in a text editor and make your own variations with different framerates or sizes; the "-an" part removes the audio from the output (see the sketch after this list)
  • 24 FPS is best for good-quality movies, 8 FPS is better suited for GIFs and quicker generation
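
If you'd rather write your own .bat instead of using the attached ones, here's a minimal sketch of what such a script does (assumptions: ffmpeg is on your PATH, the videos are .mp4 files, and the "optimized_" output prefix is just an example):

    @echo off
    rem convert every .mp4 in this folder: height 512 (width scaled automatically), 8 FPS, audio removed
    for %%f in (*.mp4) do (
        ffmpeg -i "%%f" -vf scale=-2:512 -r 8 -an "optimized_%%~nf.mp4"
    )

Change -r 8 to -r 24 for the 24 FPS variant.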

You can find attached an example of such an optimized video, the one I used for the GIFs in the video. This one was at 24 FPS.

I recommend testing ControlNet on a small video like my example first. I did a ControlNet test for this tutorial to show you my exact settings:

and it generated this after a few minutes:

So it's a bit slow! In retrospect I shouldn't have used 30 sampling steps; using 20 likely wouldn't have made a huge difference visually. I should also have changed the sampling method from Euler to DPM++ 2M SDE Karras or another one, so the images look better after fewer steps. Also, 24 FPS is A LOT for GIFs, that's better suited for movies. Something like 12 or 8 should look ok too and take only a fraction of that time. Using a LORA or adding a person's name to the prompt should improve face consistency. With the prompt I used in this example, the AI model doesn't have a lot to work with and ends up shapeshifting through lots of random faces. So there's plenty of room for improvement for both quality and time.

AI ART

You can get the AI to build very trippy dream-like videos by giving it a prompt and letting it generate TONS of frames (which might take forever, beware). For inspiration, here's someone doing outrageously cool stuff like that: https://www.youtube.com/watch?v=PkLSguDNGiw

For one of my basic test examples:

I used this prompt: renaissance painting, ancient rome, centurion, ancient greek, medieval

and set it to generate 36 frames at 12 FPS with FILM enabled (the interpolation X value set to 2), so about 60 frames in total. Since the prompt is very generic, the AI ends up 'hallucinating' a lot and builds a sort of dream that in some ways reflects the AI model's "knowledge". So most models out there will drift towards porn here and there. If you don't want that, you can add "porn, nudity" to the negative prompt.

One trick to get the generated video to 'react' to a song is to first make a music visualization video (like the Milkdrop effects in music players), then use that video as input and enable ControlNet, as in the VAM example above. I think the generated animation should look somewhat in sync with the music, but I haven't tested it yet.

For people who want to go further with this, there's the prompt travel extension, which allows setting different prompts at different frames, meaning you can make a video that starts like my example in ancient Rome and ends up sci-fi, in space. I haven't tested it yet though, it's a bit too complicated for me.

If you have any questions or need any help with this, just let me know!

Files

AI: making animation with AUTOMATIC1111

tutorial & more info at https://www.patreon.com/spqr_aeternum
Music by INOSSI - Listen: bit.ly/3mIA24Z - Watch: youtu.be/mQLeMfRLIG8 - https://soundcloud.com/inossi/awaken-free-download
"Awaken" (free download) by INOSSI is licensed under a Creative Commons License.

Comments

chris r

OMG this is SOOOOO F N cool, SPQR, the thing out of all your provided things I'm most excited about and extremely happy to be a supporter. Thank you for this. Going to work on getting this setup today!!!

chris r

Oh boy, right out of the gate I see I need help - is there a guide to install AUTOMATIC1111 ? Can we use this easy install method or does this need something else? https://www.youtube.com/watch?v=HK3pluCh9Cs

chris r

So sick of problems man. Right away after using this (https://www.youtube.com/watch?v=HK3pluCh9Cs) to install 111 I go to find the deforum-for-automatic1111-webui in the list, sd-webui-animatediff is there but no sign of deforum-for-automatic1111-webui - why is this **** so difficult - so now what @spqr ? My load from says this: https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui-extensions/master/index.json Should it be something else?

SPQRAeternum

Check just "Deforum". I don't know why, but in some versions I think there's a UI bug regarding names. After you install it, it will appear in the Installed tab as deforum-for-automatic1111-webui instead; it's something mixed between the extension name and the extension's pretty-name. Yeah, this AI stuff is a bit messy, but it's developing extremely fast in real-time. Last year around this time we were just discovering that images are possible; now it's the same with videos. By the end of next year we'll maybe have AI 3D stuff directly in VAM lol, spawning objects. It's all bleeding-edge and everyone contributing is focusing on the next step for their own experience. People don't get paid to develop this, so they don't care for company-like practices or making it accessible and pleasant for the mass audience. So there's gonna be a bit of a learning curve and things breaking, that's the cost of moving so fast. But IMHO figuring it out is way easier than figuring out basic VAM.