Type “Teddy bears working on new AI research on the moon in the 1980s” into any of the recently released text-to-image generators, and within seconds the sophisticated software will produce an eerily fitting image.
Seemingly bound only by your imagination, this latest trend in synthetic media has delighted many, inspired others, and frightened some.
Google, the research firm OpenAI and the AI vendor Stability AI have each developed a text-to-image generator powerful enough that some observers are questioning whether, in the future, people will be able to trust the photographic record.
As a computer scientist who specializes in image forensics, I’ve been thinking a lot about this technology: what it’s capable of, how each of the tools has been rolled out to the public, and what lessons can be learned as the technology continues its ballistic trajectory.
Although their digital precursors date back to 1997, the first synthetic images arrived on the scene just five years ago. In their original incarnation, so-called generative adversarial networks (GANs) were the most common technique for synthesizing images of people, cats, landscapes and anything else.
A GAN consists of two main parts: a generator and a discriminator. Each is a type of large neural network, which is a set of interconnected processors roughly analogous to neurons.
Tasked with synthesizing an image of a person, the generator starts with a random assortment of pixels and passes this image to the discriminator, which determines whether it can distinguish the generated image from real faces. If it can, the discriminator provides feedback to the generator, which modifies some pixels and tries again. These two systems are pitted against each other in an adversarial loop. Eventually, the discriminator is no longer able to distinguish the generated image from real images.
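The adversarial loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any production image generator is built: the "images" are single numbers, the real data are draws from a bell curve centered at 0, the generator learns only a single offset (`shift`, starting at 3.0), and the discriminator is a one-input logistic classifier. The function name `train_toy_gan` and every parameter value are hypothetical choices made for this sketch.

```python
import math
import random


def sigmoid(t):
    # Clamp the input so math.exp never overflows.
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=2000, batch=64, lr=0.05, seed=0):
    """Toy 1-D GAN. All names and values here are illustrative assumptions."""
    rng = random.Random(seed)
    shift = 3.0      # generator parameter: fake sample = noise + shift
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c), "probability x is real"

    for _ in range(steps):
        real = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + shift for _ in range(batch)]

        # Discriminator step: descend the loss -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            err = sigmoid(w * x + c) - 1.0  # wants D(real) -> 1
            grad_w += err * x
            grad_c += err
        for x in fake:
            err = sigmoid(w * x + c)        # wants D(fake) -> 0
            grad_w += err * x
            grad_c += err
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: descend -log D(fake), using the discriminator's
        # feedback (through w) to decide which way to move the fakes.
        grad_shift = sum((sigmoid(w * x + c) - 1.0) * w for x in fake) / batch
        shift -= lr * grad_shift

    return shift, w, c


if __name__ == "__main__":
    shift, w, c = train_toy_gan()
    print("generator offset:", round(shift, 3))
    print("discriminator weight and bias:", round(w, 3), round(c, 3))
```

With these settings the generator's offset should drift from 3 toward 0, where its fakes overlap the real data. At that point the discriminator's weight shrinks and its guesses hover near 50-50, mirroring the stalemate described above.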
Text to image
Just as people were starting to grapple with the consequences of GAN-generated deepfakes, including videos that show someone doing or saying something they didn’t, a new player emerged on the scene: text-to-image deepfakes.
In this latest incarnation, a model is trained on a massive set of images, each captioned with a short text description. The model progressively corrupts each image until only visual noise remains, and then trains a neural network to reverse this corruption. Repeating this process hundreds of millions of times, the model learns how to convert pure noise into a coherent image from any caption.
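The corruption half of that process is simple enough to sketch. The snippet below is a minimal illustration, assuming a linear noise schedule of 1,000 steps running from 0.0001 to 0.02; these values are common in published diffusion work but are an assumption here, not the settings of any product named in this article. The function names are hypothetical, the "image" is a short list of numbers, and the neural network that learns to reverse the corruption is deliberately omitted.

```python
import math
import random


def linear_betas(steps=1000, start=1e-4, end=0.02):
    """Per-step noise amounts (assumed linear schedule, illustrative values)."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def signal_fraction(betas):
    """Share of the original pixel amplitude that survives after each step."""
    fractions, keep = [], 1.0
    for beta in betas:
        keep *= 1.0 - beta
        fractions.append(math.sqrt(keep))
    return fractions


def corrupt(image, betas, rng):
    """Apply every corruption step in turn: shrink the pixels a little,
    then add a little fresh Gaussian noise."""
    pixels = list(image)
    for beta in betas:
        pixels = [
            math.sqrt(1.0 - beta) * p + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for p in pixels
        ]
    return pixels


if __name__ == "__main__":
    betas = linear_betas()
    fractions = signal_fraction(betas)
    print("signal left after first step:", fractions[0])
    print("signal left after last step:", fractions[-1])
    print("corrupted pixels:", corrupt([1.0] * 4, betas, random.Random(0)))
```

Under this schedule less than 1 percent of the original amplitude remains by the final step, so the result is essentially pure noise. Training then amounts to showing a network a partially corrupted image along with its caption and asking it to predict the noise that was added, so that the steps can later be run in reverse.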
While GANs are only capable of creating an image of a general category, text-to-image synthesis engines are more powerful. They are capable of creating nearly any image, including images with complex and specific interactions between people and objects, such as “The President of the United States burning classified documents while sitting around a bonfire on the beach at sunset.”
OpenAI’s text-to-image generator, DALL-E, took the internet by storm when it was unveiled on January 5, 2021. A beta version of the tool had been made available to 1 million users by July 20, 2022. Users around the world have found seemingly endless ways to prompt DALL-E, yielding delightful, bizarre and fantastical imagery.
However, many people, from computer scientists to legal scholars and regulators, have weighed in on the potential for misuse of the technology. Deepfakes have already been used to create nonconsensual pornography, commit small- and large-scale fraud, and fuel disinformation campaigns. These even more powerful image generators could add jet fuel to these misuses.
Three image generators, three different approaches
Aware of the potential for abuse, Google declined to release its text-to-image technology. OpenAI took a more open, and yet still cautious, approach when it initially released its technology to only a few thousand users (myself included). They also placed guardrails on allowable text prompts, including no nudity, hate, violence or identifiable people. Over time, OpenAI has expanded access, lowered some of the guardrails and added more features, including the ability to semantically modify and edit real photographs.
Stability AI took yet a different approach, opting for a full release of its Stable Diffusion with no guardrails on what can be synthesized. In response to concerns about potential abuse, the company’s founder, Emad Mostaque, said, “Ultimately, it’s peoples’ responsibility as to whether they are ethical, moral and legal in how they operate this technology.”
However, the second version of Stable Diffusion removed the ability to render NSFW content and images of children because some users had created child abuse images. In response to complaints of censorship, Mostaque pointed out that because Stable Diffusion is open source, users are free to add these features back at their discretion.
The genie is out
Regardless of what you think of Google’s or OpenAI’s approach, Stability AI made their decisions largely irrelevant. Shortly after Stability AI’s open-source announcement, OpenAI lowered its guardrails on generating images of recognizable people. When it comes to this type of shared technology, society is at the mercy of the lowest common denominator, which in this case is Stability AI.
Stability AI boasts that its open approach wrests powerful AI technology away from the few and places it in the hands of the many. I suspect that few would be so quick to celebrate an infectious disease researcher publishing the formula for a deadly airborne virus created from kitchen ingredients, while arguing that this information should be widely available. Image synthesis does not, of course, pose the same direct threat, but the continued erosion of trust has serious consequences, ranging from people’s confidence in election outcomes to how society responds to a global pandemic and climate change.
Going forward, I believe technologists will need to consider both the positives and negatives of their technology and develop mitigation strategies before predictable harms occur. I and other researchers will have to continue to develop forensic techniques to distinguish real images from fake ones. Regulators will have to start looking more seriously at how these technologies are being weaponized against individuals, societies, and democracies.
And people will have to learn to be more discerning and critical about how they consume information online.
Citation: Text-to-image AI: Powerful, easy-to-use technology for making art - and fakes (2022, December 6), retrieved December 6, 2022 from https://techxplore.com/news/2022-12-text-to-image-who-strong-easy-to-use-technology.html