
Project Description: 

In this video, I used artificial intelligence to generate an animated music video for the song Canvas by Rezonate. This tool lets anyone generate beautiful images using only text as the input. My question was: if I used song lyrics as the input to the AI, could I make perfectly music-synchronized videos automatically, with the push of a button? Let me know how you think the AI did in this visual interpretation of the song.

After getting caught up in the excitement around DALL·E 2 (the latest and greatest AI image system; it's INSANE), I searched for any way I could use similar image generation for music synchronization. Since DALL·E 2 is not yet available to the public, my search led me to VQGAN + CLIP (Vector Quantized Generative Adversarial Network and Contrastive Language–Image Pre-training), before I settled more specifically on Disco Diffusion V5.2 Turbo. If you don't know what any of these words or acronyms mean, don't worry, I was just as confused when I first started learning about this technology. I believe we're reaching a turning point where entire industries are about to shift in reaction to this new process (which is essentially magic!).

Important note: While this AI is impressive, it still required additional input beyond just the song lyrics to achieve the music video I was looking for. For example, I added keyframes for camera motion through the generated world, and I manually synchronized those keyframes to the beat. I also specified changes to the art style at different moments in the song. Since many of the lyrics are quite non-specific, even a human illustrator would have a hard time turning them into visuals. To make the lyrics more digestible for the AI, I sometimes modified the phrasing to be more concrete, such as by specifying a setting or atmosphere.
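To make the keyframing concrete: Disco Diffusion's animation mode takes text prompts keyed by frame number, and camera parameters as keyframe strings, so syncing to the beat is mostly a matter of converting beat timestamps into frame numbers. Here's a minimal sketch of that conversion; the beat times, prompts, and motion values below are invented for illustration (my real values are in the attached settings file).

# Minimal sketch: turning beat timestamps into Disco Diffusion frame keys.
# All numbers and prompts here are placeholders, not my actual settings.
FPS = 15  # render frame rate (I doubled it to 30 later with Flowframes)

beat_times = [0.0, 7.5, 15.0, 30.0]  # hypothetical beat markers, in seconds

prompts = [
    "a blank canvas in an empty artist's studio, soft morning light",
    "colors bleeding across the canvas, vibrant watercolor style",
    "a painted landscape coming to life, trending on artstation",
    "an endless painted sky, dreamy digital art",
]

# Disco Diffusion takes text_prompts as a dict keyed by frame number.
text_prompts = {round(t * FPS): [p] for t, p in zip(beat_times, prompts)}
print(text_prompts)  # {0: [...], 112: [...], 225: [...], 450: [...]}

# Camera motion uses keyframe strings in "frame: (value)" form, e.g. a
# forward push (translation_z) that kicks in on each beat:
translation_z = ", ".join(
    f"{round(t * FPS)}: ({z})" for t, z in zip(beat_times, [0, 4, 0, 4])
)
print(translation_z)  # 0: (0), 112: (4), 225: (0), 450: (4)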

This was my first time working with DDV5, and I'm very happy with the results! There were many times my jaw dropped at what the AI came up with. I haven't felt this sense of wonder from technology since I first experienced an HD video game as a child.

Things I wish I knew before getting started: 

1. This Disco Diffusion cheat sheet is an absolute godsend.

2. You don't need an expensive GPU to get started! Here is a link to a Google Colab where you can get free access to a GPU through Google and run all the code in a simple browser. Simply run the code blocks one by one and set the parameters to your desired values, consulting the cheat sheet above whenever you need further explanation (see the settings sketch after this list for an idea of what those parameters look like).

3. If you want to make animations longer than 15 seconds, you'll likely want to run the code locally on your machine, since Google will eventually throttle your GPU usage if you use it too much. Shoutout to MohamadZeina for this helpful repository. To run things locally, it's easiest to work in a Linux environment. If you're on a Windows machine, you can use WSL (Windows Subsystem for Linux), which lets you run a virtual Linux environment inside of Windows 11.

4. With WSL, you can avoid filling up your C: drive by storing all the processed images on a separate drive. Here are the resources I used to mount a hard drive in my WSL environment:

- https://docs.microsoft.com/en-us/windows/wsl/wsl2-mount-disk 

- https://phoenixnap.com/kb/mount-ntfs-linux 

5. Flowframes is the best FPS multiplier (frame interpolator) I've found so far. I doubled the frame rate from 15 to 30 FPS for this video.
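As promised in item 2, here is a rough sketch of what "setting the parameters" looks like in the notebook's settings cells. The parameter names below are from the Disco Diffusion 5.x Colab as best I recall, and the values simply echo numbers mentioned elsewhere in this post; treat the cheat sheet, not this snippet, as the authority.

# Illustrative Disco Diffusion notebook settings (Python cells in the Colab).
# Values mirror the ones discussed in this post; tune them for your own run.
batch_name = "CanvasFinalV2"   # folder name for the output frames
steps = 250                    # diffusion steps per frame (see the FAQ below)
width_height = [1216, 640]     # each dimension must be a multiple of 64
clip_guidance_scale = 5000     # how strongly CLIP steers images toward the prompt
animation_mode = "3D"          # enables camera keyframes between frames
max_frames = 1800              # total frames to render (video length in s * 15 fps)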

FAQ: 

Q: How long did this video take to render?

A: The answer varies based on your hardware and the multitude of parameter settings you've chosen for the render. I started out testing with a 1000 step count, which would have taken about 8 days of computing 24/7 on my RTX 3090. I ended up going with a 250 step count, which took about 2 days.
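If you want a rough estimate for your own setup: render time is approximately (number of frames) x (seconds per frame), and seconds per frame scales close to linearly with the step count. A back-of-the-envelope sketch, where the song length and per-frame timing are placeholder assumptions rather than measured values:

# Back-of-the-envelope render-time estimate. Measure a few frames on your
# own GPU and substitute your real per-frame time.
fps = 15
song_length_s = 3.5 * 60              # assumed ~3.5 minute song
frames = int(song_length_s * fps)     # 3150 frames

sec_per_frame = 55                    # assumed RTX 3090-class time at 250 steps
hours = frames * sec_per_frame / 3600
print(f"{frames} frames -> ~{hours:.0f} h (~{hours/24:.1f} days) at 250 steps")
# Step count scales ~linearly, so 1000 steps would take roughly 4x as long.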

Q: What <setting_name> value did you use in this video?

A: For your convenience, I've attached the full settings file from my final render below as "CanvasFinalV2(0)_settings.txt".


Files

I asked AI to make a Music Video... the results are trippy

If you would like to learn more about how this video was made, try this yourself, or ask me any questions, I'll post a more detailed explanation of how to get started on Patreon (link below). The post is free to the public, no need to pay. If you do want to support me and become a member, that would be much appreciated; you'll also automatically be entered into the end-screen minigames, where you earn points on each video and move up the leaderboard!

Join on Patreon to automatically have your name included in the next video: https://www.patreon.com/doodlechaos
Twitter: https://twitter.com/doodlechaos
Discord: https://discord.com/invite/7FCrWAzDY7
Tiktok: https://www.tiktok.com/@doodlechaos
Shorts Channel: https://www.youtube.com/channel/UCMqgJk1o2eWE7WeNtRIfnpg
Instagram Shorts: https://www.instagram.com/doodlechaos_shorts/
Email: contact@doodlechaos.com
Music: [Indie Dance] - Rezonate - Canvas [Monstercat EP Release]: https://www.youtube.com/watch?v=i0Ew3cl1gyc

Comments

Normi13

Hi, how do I turn the generated images into an animation? I'm very new to Disco Diffusion and would like to make a similar-style video.

doodlechaos

There are multiple ways to turn an image sequence into a video. An easy way is to use video editing software like Premiere Pro and import the images as a sequence. If you're familiar with ffmpeg, I like to use this command: ffmpeg -r 60 -f image2 -s 1920x1080 -i pic%04d.png -vcodec libx264 -crf 25 -pix_fmt yuv420p test.mp4 (-r sets the output frame rate, pic%04d.png matches zero-padded frame filenames like pic0001.png, -crf controls quality with lower being better, and -pix_fmt yuv420p keeps the file playable in most players).

Brian H

I have a question about aspect ratio. I see in the settings that you generated the frames at 1216x640. I know DD requires dimensions that are multiples of 64, but I'm wondering if there's a particular reason you chose an aspect ratio that's not 16:9, instead of something like 1024x576 for example. Since YouTube displays at 16:9, I'm curious how that worked when you uploaded the final video. Did you adjust the aspect ratio at some point in the process? Thanks!

doodlechaos

Hi Brian! There's no particular reason I ended up with that aspect ratio. You're correct that each dimension should be a multiple of 64. My later upscaling with Flowframes normalized the aspect ratio to 16:9.
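For anyone picking their own resolution: since both dimensions must be multiples of 64, you can brute-force the sizes that are exactly 16:9. A quick sketch:

# Find resolutions up to ~2K wide where both sides are multiples of 64
# and the aspect ratio is exactly 16:9.
for h in range(64, 1153, 64):
    w = h * 16 / 9
    if w.is_integer() and w % 64 == 0:
        print(f"{int(w)}x{h}")
# -> 1024x576, 2048x1152 (so 1024x576 is the natural sub-1080p pick)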