To me, among the many baffling concepts in the current stream of interest? concern? reaction? to generative AI, especially more recently weighted towards the systems that generate written content, is how the concept of “training” confounds our past understanding of reuse. Even more so because, for the most part, we have no way of knowing how the content is actually created.
And on the side of how we can use, and help others use, generative content, our notions of attribution and licensing are stretched.
It’s a good thing Creative Commons is around. In a series of posts on AI under the umbrella of the CC campaign for “better sharing”, I found helpful insight in this recent post by Stephen Wolfson, which looks at how fair use might be applied to AI content (though it is never as simple as a “rule”).
Among many good points, I want to key in on the parts where Stephen gives me a better awareness of what is going on behind the black curtain when Midjourney or Stable Diffusion spits out an image from a given text prompt. It gets interesting because there are no images in the LAION training dataset, just information about images, and nothing seems copied from the original in the way we usually think of copying:
Stability AI used a dataset called LAION to train Stable Diffusion, but this dataset does not actually contain images. Instead, it contains over 5 billion weblinks to image-text pairs. Diffusion models like Stable Diffusion and Midjourney take these inputs, add “noise” to them, corrupting them, and then train neural networks to remove the corruption. The models then use another tool, called CLIP, to understand the relationship between the text and the associated images. Finally, they use what are called “latent spaces” to cluster together similar data. With these latent spaces, the models contain representations of what images are supposed to look like, based on the training data, and not copies of the images in their training data.
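To make the “add noise, then train to remove it” idea more concrete for myself, here is a toy sketch in Python. This is purely my own illustration, not anything from the actual Stable Diffusion code, and it collapses what is really a whole schedule of noise levels into a single step:

```python
# Toy sketch of the diffusion training idea: corrupt data with noise,
# then train a small network to predict (and thus remove) that noise.
# The "images" here are just random vectors so the example stays tiny.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in training data: 256 "images," each flattened to 16 numbers.
images = torch.randn(256, 16)

# A tiny network that learns to guess the noise that was added.
denoiser = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

for step in range(200):
    noise = torch.randn_like(images)    # the corruption
    noisy = images + noise              # corrupted "images"
    predicted = denoiser(noisy)         # model's guess at the corruption
    loss = ((predicted - noise) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# What survives training is only the network's weights -- a statistical
# picture of the data, not copies of any training image.
```

The point being, as I read it, that what the model keeps is a set of weights encoding patterns, while the original inputs are discarded.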
I am fuzzy on what “latent spaces” means, but it feels like an effort to create a statistically similar result from a mixture of sources, not from any single one (?). If it helps, my loose mental model is that everything gets encoded as a vector of numbers, and similar things land near each other, as in the sketch below.
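Here is a made-up sketch of that clustering idea (the names and numbers are invented by me, and this is not real CLIP output):

```python
# Made-up sketch of a latent space: items become vectors, and similar
# items sit close together. The numbers are invented for illustration.
import numpy as np

latents = {
    "tabby cat":  np.array([0.9, 0.1, 0.0]),
    "black cat":  np.array([0.8, 0.2, 0.1]),
    "sports car": np.array([0.1, 0.9, 0.3]),
    "race car":   np.array([0.2, 0.8, 0.4]),
}

def cosine(a, b):
    """Similarity of two vectors: 1.0 means pointing the same way."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Cats cluster with cats, cars with cars: the space holds a
# representation of what things are like, not the things themselves.
for name, vec in latents.items():
    nearest = max(
        (other for other in latents if other != name),
        key=lambda other: cosine(vec, latents[other]),
    )
    print(f"{name}: nearest neighbor is {nearest}")
```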
Also:
Turning back to fair use, this method of using image-text combinations to train the AI model has an inherently transformative purpose from the original images and should support a finding of fair use. While these images were originally created for their aesthetic value, their purpose for the AI model is only as data. For the AI, these image-text pairs are only representations of how text and images relate. What the images are does not matter for the model — they are only data to teach the model about statistical relationships between elements of the images and not pieces of art.
And again, to help understand how what these systems are doing is so different from our conceptions of putting images together:
The models do not store copies of the works in their datasets and they do not create collages from the images in its training data. Instead, they use the images only as long as they must for training.
I really encourage you to read the full post, and let us know: does this add clarity, or raise more questions? We still cannot say with certainty that images one creates with Midjourney or Stable Diffusion truly fall under fair use (which always means leaving it to a court to arbitrate).
This feels like the right approach to me, but it does not preclude the chance that something these systems create will have a high degree of similarity to original works.
I am staying tuned to the Creative Commons series on AI and urge anyone interested to weigh in on their next series of public forums (happening in 3 time zones tomorrow).