OpenAI, the San Francisco-based company best known for its massive GPT-3 natural language model, announced on Wednesday that it is releasing a second version of its text-to-image AI model.
Like its predecessor, DALL-E 2 is a neural network that creates images based on natural-language phrases entered by the user. But where the original DALL-E's images were low-resolution and conceptually simple, the images generated by DALL-E 2 are five times more realistic and accurate, OpenAI researchers tell Fast Company. Notably, the second DALL-E is actually a smaller neural network. (OpenAI declined to specify the size of DALL-E 2 in parameters.)
DALL-E 2 is also a multimodal neural network, meaning it can process both natural language and visual images. For example, you can show the model two different images and ask it to create images that combine aspects of the source images in different ways.
And the creativity the system displays is, well, a little disconcerting. During a demonstration on Monday, DALL-E was given two pictures, one that looked like street art, the other something akin to Art Deco. It quickly created a set of about 20 images, arranged in a grid, each distinct from its neighbor. The system combined different visual aspects of the source images in different ways. In some, it appeared to let the dominant style of one source image express itself fully while suppressing the style of the other. Taken together, the new images had a design language distinct from that of the source images.
“It’s really fascinating to see how these images are generated using mathematics,” says Prafulla Dhariwal, a researcher at OpenAI. “And it’s very beautiful.”
OpenAI engineers have taken pains to explain the steps they take to prevent the model from creating unwanted or harmful images. They removed any images with nudity, violence, or gore from the training dataset, says OpenAI researcher Mark Chen. As a result, Chen says, it is “extremely unlikely” that DALL-E would inadvertently produce such things. People at OpenAI will also monitor the images users create with DALL-E. “Adult, violent, or political content is not allowed on the platform,” says Chen.
OpenAI plans to gradually expand access to the new model to groups of “trusted” users. “Ultimately, we hope to be able to offer access to DALL-E 2 via an API [application programming interface],” says Dhariwal. Developers could then create their own apps based on the AI model.
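At the time of writing, no such API had been published, so its details are unknown; as a purely illustrative sketch, a text-to-image request to a JSON-over-HTTP service might be assembled like this (the endpoint, parameter names, and size values below are all hypothetical assumptions, not OpenAI's interface):

```python
import json

# Hypothetical endpoint -- a placeholder, not a real OpenAI URL.
API_URL = "https://api.example.com/v1/images/generations"

def build_generation_request(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Assemble the JSON payload a client might POST to a text-to-image API.

    All field names here are illustrative guesses about what such an
    interface could accept: a text prompt, a count of images to generate,
    and a requested output resolution.
    """
    if not prompt:
        raise ValueError("prompt must be non-empty")
    return {"prompt": prompt, "n": n, "size": size}

# Example: request four variations on a prompt inspired by the demo above.
payload = build_generation_request("street art in an Art Deco palette", n=4)
print(json.dumps(payload, sort_keys=True))
```

A real client would POST this payload with an authentication header and receive image data (or URLs) in the response; the sketch stops at payload construction since the actual contract was not yet public.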
Looking at practical applications of the model, Dhariwal and Chen both envision DALL-E 2 being helpful for graphic designers, who could use the tool to open up new creative avenues. And the developers who eventually access DALL-E 2 through the API are likely to find novel uses for the technology.
Chen says DALL-E 2 could be an important tool because while creating speech feels natural to humans, creating images isn’t quite as easy.
But DALL-E 2 may be worthwhile even without any immediate practical application. As a multimodal AI, it has fundamental research value that may benefit other AI systems in the years to come.
“Vision and language are both key elements of human intelligence; building models like DALL-E 2 connects these two domains,” says Dhariwal. “It’s a very important step for us as we try to teach machines to perceive the world the way humans do, and then eventually develop general intelligence.”
Source: OpenAI’s DALL-E AI is shaping up to be a scary good graphic artist, https://www.fastcompany.com/90738554/openais-dall-e-ai-is-becoming-a-scary-good-graphic-artist