Turn Noise into Butterflies

Noise! It’s all around us—static, random bits of information floating across the Earth, colliding, separating, and reforming. Our atmosphere creates chaotic radio symphonies as the sun’s radiation dances across the ionosphere. Beyond the shell of our crystal blue globe, our galaxy hisses with low-level radio noise, silently bombarding us with its celestial signal. And just outside the milky arms of our galactic mother, the low-level cosmic background radiation sings an unending anthem about the birth of all creation. The universe has a dial tone.

Growing up, I recall watching TV via an aerial antenna. Often, many of the channels would have static—a snowy, gritty, confusing wash that would show up in waves. At times, it would completely take over the TV show you were watching, and all you’d get was a screen full of static. To get a good picture, you needed a strong signal. Otherwise, the picture was buried in the noise.

This past weekend, I started building my own AI diffusion models. I wanted to see how to train an AI to create images from nothing. Well, it doesn’t work. It can’t create anything from a blank sheet. It needs noise. No joke! Turn up the static! I discovered that the way to create an AI model that generates images is to feed it noise. A lot of noise, as a matter of fact!

In a recent talk, GenAI Large Language Models – How Do They Work?, I covered how we use the science behind biological neurons to create mathematical models that we can train. Fundamentally, these are signal processors with inputs and outputs. Weights are connected to the inputs, amplifying or attenuating each signal before the neuron determines whether it should pass the result along to other connected neurons (the nerdy name for that decision is the activation function).
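Here's a minimal sketch of that idea in Python—the function name `neuron` and the specific weights are just illustrative, and I'm using a sigmoid as the activation function:

```python
import math

def neuron(inputs, weights, bias):
    """A toy artificial neuron: weighted sum of inputs, then an activation."""
    # Each weight amplifies or attenuates its input signal; the bias shifts the sum.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: sigmoid squashes the signal into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# A strongly, positively weighted signal produces an output near 1,
# which downstream neurons can treat as "pass it along."
output = neuron([1.0, 0.5], [2.0, 2.0], 0.0)
```

Real networks stack thousands of these, but each one is doing exactly this: weigh, sum, decide.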

One technique we use to train neural networks is called backpropagation. Essentially, we create a training set that includes input and output target data. The input is fed into the model, and we measure the output. The difference between what we wanted to see and what we actually got is called the “loss.” (I often thought it should be called the “miss,” but I digress.) Since the neural network is a chain of differentiable math functions, we can compute the derivative of the loss with respect to each parameter in the network. That derivative gives us the slope, or “gradient”—it tells us how nudging each parameter would change the loss. To force the network to “learn,” we backpropagate these gradients and adjust each parameter by a small step, scaled by a learning rate, in the direction that reduces the loss, slowly edging the model toward producing the correct output for a given input. This is called gradient descent.
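To make that concrete, here's gradient descent on the smallest possible "network"—a single weight `w` that we want to learn so that `w * x` matches the target `3 * x`. The data and learning rate are made up for illustration:

```python
# Training set: inputs paired with target outputs (here, targets follow y = 3 * x).
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # start with an untrained parameter
learning_rate = 0.01  # the "tiny step" that nudges w each update

for epoch in range(200):
    for x, y in data:
        prediction = w * x
        # Squared-error loss is (prediction - y)^2; its derivative w.r.t. w
        # is the gradient that tells us which way to nudge the weight.
        gradient = 2 * (prediction - y) * x
        w -= learning_rate * gradient  # step downhill along the slope
# After training, w has edged its way toward 3.0.
```

Backpropagation is this same bookkeeping, just applied through many layers of neurons via the chain rule.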

Who cares? I’m sorry, I got lost in the math for a minute there. Basically, it turns out that to create a model to generate images, what you really want is a model that knows how to take a noisy image and make it clean. So, you feed it an image of a butterfly with a little bit of noise (let’s call that image “a”). It learns how to de-noise that image. You then give it an even noisier image of the butterfly (image “b”) and teach it to turn it into the less noisy one (image “a”). You keep adding noise until you arrive at a screen full of static. Do that with many images, and the model learns what image structure looks like underneath the noise. Then you run the chain in reverse: hand the model pure static and let it de-noise step by step until a brand-new image emerges. From its standpoint, all creation comes from noise, and it’s ready to create!
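Here's a toy sketch of how those training pairs get built. The four-pixel "image," the `add_noise` blend, and the noise levels are all hypothetical simplifications—real diffusion models use Gaussian noise and many more steps:

```python
import random

def add_noise(image, amount):
    """Blend a clean image toward pure static: amount=0 is clean, amount=1 is all noise."""
    return [(1 - amount) * pixel + amount * random.random() for pixel in image]

clean = [0.2, 0.8, 0.5, 0.9]          # a stand-in "butterfly" (4 pixels)
levels = [0.25, 0.5, 0.75, 1.0]       # progressively more static, ending at pure noise

# Image "a" is the least noisy, "b" noisier, and so on up to full static.
noisy_versions = [add_noise(clean, a) for a in levels]

# Each training pair maps a noisier image (input) to the slightly
# cleaner one before it (target); the first pair targets the clean image.
targets = [clean] + noisy_versions[:-1]
training_pairs = list(zip(noisy_versions, targets))
```

Train a network on enough pairs like these, and it learns the one skill generation requires: take any noise level and step it back toward clean.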

I took 1,000 images of butterflies from the Smithsonian butterfly dataset, ran them through an image pipeline that added different levels of noise, and used the resulting dataset to train a model with the diffusion method (see https://arxiv.org/abs/2006.11239). After running the training set through four training iterations, this is what it thought butterflies looked like:
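For the curious, the paper linked above (the DDPM paper) gives a closed-form recipe for that noising pipeline: pick a schedule of small noise amounts (betas), and any noise level t can be produced in one shot as x_t = √(ᾱ_t)·x_0 + √(1 − ᾱ_t)·ε. A pure-Python sketch, using the paper's linear schedule—my function names are illustrative, not from any library:

```python
import math
import random

T = 1000
# Linear beta schedule from the DDPM paper: 1e-4 up to 0.02 over T steps.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t is the running product of (1 - beta); it tracks how much of the
# original signal survives at step t (near 1 early on, near 0 at full static).
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def noisy_pixel(x0, t):
    """Jump straight to noise level t: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    eps = random.gauss(0.0, 1.0)
    xt = math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1 - alpha_bars[t]) * eps
    return xt, eps  # the model is trained to predict eps given xt and t
```

By the last step, almost none of the original signal remains—the "screen full of static" the model learns to work back from.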

Yes, a work of art. I confess, my 3-year-old self probably made butterflies like that too. But after running it through 60 iterations, about 30 minutes later on a 3090 GPU, the model had a slightly better understanding of butterflies. Here’s the result:

Yes, those are much better. Not perfect, but they’re improving.

Well, there you have it folks—we just turned noise into butterflies. Imagine what else we could do?!