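Hello~ Dunes Workshop (沙丘研究所) is an independent online platform for original content about cities, architecture, literature, art, and life as an international student. Our members come from the design schools of Harvard University and MIT. You are also welcome to follow our WeChat public account "沙丘研究所", where our posts are published first, and our Instagram account @dunes.workshop, where our images and videos are published first.

"A lemon resting on the beach with sunglasses": artificial intelligence, please create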

Jul 16, 2022

If it is artificial, is it still imagination?

Let’s talk about the recent boom in artificial intelligence (AI) art creation.

At the end of May (and on into June), around the time of Aichang's international AI forum, another important event took place: two major AI image generators, DALL·E 2 and Midjourney, both began sending out closed beta invitations. Dunes Workshop members were also invited by Midjourney to join the beta's Discord community, where they could watch countless images being generated, selected, and adjusted, and try entering their own prompts to generate AI images.

We had expected the AI image generation interface to be a simple prompt box plus a results page, similar to Google image search except with "search results" replaced by "generated results". In practice, however, every new tester invited by Midjourney joins a Discord community, which is subdivided into fifty "newcomer groups". When a newcomer joins, Midjourney's bot first automatically posts a message in the "announcement group", assigning the newcomer to newcomer group No. XX.

In this "group chat" mechanism, the user enters a prompt in the appropriate format, such as "a lemon wearing sunglasses, resting on the beach, photorealistic style". After about a minute, the bot replies in the group chat with four AI images generated from the prompt and mentions (@) the user in the new message. Notably, this means that everything a user requests, whether the prompt entered or the images generated, is visible to everyone.

Screenshot of Midjourney's Discord community. On the left are the different newcomer channels; the image shown on the right was generated from the prompt "fireman, 1970s Polaroid style". The U1 button below the image stands for upscaling the first picture, V1 stands for making further variations of the first picture, and so on. Source: Author.

From there, the user can choose among the four images and request variations of one or several of them, or enlarge them and increase their resolution (upscale). Interestingly, because all these steps take place in a group chat interface, any user can select images requested by other users, and the bot will respond to these requests one by one and post the results in the group chat.

We are very interested in the form of interaction and organization that the Midjourney team has chosen. We have to admit that fifty groups of continuously scrolling new messages are overwhelming. The sheer amount of information and its ever-increasing rate of accumulation mean that no single human brain can keep up, and at first the mechanism leaves newcomers a little dizzy. But after getting used to it, we came to appreciate the beauty of this format: we feel as if we are in the middle of a huge experimental public art project, something that a single-point interface centered on the individual user (such as the Google image search box) cannot match.

Same prompt: "A lemon with sunglasses, resting on the beach, photorealistic style." Midjourney's generated image on the left, DALL·E 2's on the right. Source: MattVideoProductions.

First, the sheer volume of images, rolling toward the viewer like a flood or a snowball, may itself be an important feature that AI art wants to convey to us: no human artist or team of human artists can respond to "customer" requests at such scale and speed, constantly producing different variants and further modifying and expanding them, 24 hours a day.

Second, the group chat mechanism makes the identities of prompt writers, viewers, and AI bots unprecedentedly equal, and the boundaries between them blur. There is no binary opposition between author and audience here, and authorship itself becomes an open question: whose work is a stunning picture? Is it the person who entered the original prompt? The AI bot? An algorithm engineer on the Midjourney team? Or another user who helped choose a variant or asked for an upscale halfway through? This is a process of multi-party collaboration and decentralization.

Third, each user keeps seeing other users' prompts and newly generated images, which makes the chat a seminar-style place for learning from others how to write prompts better and more creatively. In addition, when you look at images that other people requested and select among them, you are essentially volunteering to help the Midjourney team train their algorithms. This raises questions that never arose in the era of human artists. In the back-and-forth of AI creation, who is the real beneficiary? Among architects, prompt writers, selectors, audiences, and machines, who is training whom, and who is learning from whom?

Prompt: "A Japanese woman sitting on a tatami mat, photorealistic style." Image generated by Midjourney. Source: Author.

In fact, these issues were also raised at the 2022 International Forum on Art and Artificial Intelligence at Aichang, and we thought this was a good occasion to write down our own thoughts. The theme of Aichang's forum was "artificial imagination", and guests from art, design, literature, computer science, and philosophy shared and discussed the topic. Dunes Workshop was also invited to participate as a special observer. However, as the questions above suggest, we do not have a manifesto-like position on this; instead, we want to share some of what we are thinking about in the form of questions.

After trying the AI image betas, the members of Dunes Workshop and our friends at the Media Lab sighed in earnest: the impact of this technological revolution on images and creation may be no less than the impact of photography on painting a hundred years ago. As Walter Benjamin quotes Paul Valéry at the beginning of his famous essay The Work of Art in the Age of Mechanical Reproduction:

The great technological innovations that the world is developing will change the whole art of expression, which will certainly affect the making of art itself, and ultimately, perhaps in the most fascinating way, the very concept of art itself.

For Benjamin, the rise of cinema at the time meant that art was no longer a collectible kept apart from the masses, because film is by its very nature addressed to the masses. Today, AI art platforms seem to make everyone a creator. At the same time, this redefinition of the image seems to further reshape our fundamental relationship with the world; after all, vision is the main (perhaps the most important) channel through which humans perceive the world. Just as the position of the camera in a film creates a whole new way for the viewer to observe and empathize, the "artificial intelligence" in AI art also seems to offer us a way of thinking that differs from human creation.

01 Are imagination and creativity unique to humans?

For many people, the words "artificial" and "imagination" are bound to contradict each other; "artificial imagination" simply cannot exist, and there is nothing to compare or discuss. The term "artificial" points to the man-made and to "artifacts", as opposed to imagination, which seems innate, "natural" rather than "manufactured". In addition, imagination is often thought of as a uniquely human ability that distinguishes us from non-human "things", be they animals and plants in nature, organic and inorganic matter, or artifacts such as tools and machines.

This dominant view is especially prized by anthropocentrism, because it is through this unique creativity that humans acquire subjectivity. In both the Renaissance and heroic modernism, we find many "standalone geniuses". These artists, architects, and writers are widely known for an aura of genius that sets them apart from their collaborators in work and life, and their creative power is mysterious (arguably even sacred). Later generations study their lives, works, creative processes, and techniques, but their imagination and creativity are treated as transcendental or transcendent: the ability arrives like a divine gift and belongs to the individual alone, a mysterious black box that others cannot penetrate, let alone copy. Because of this, these talented creators are separated as individuals from the rest of their contemporaries, as if they "existed alone".

Prompt: "A lemur in a constellation map." Image generated by Midjourney. Source: Author.

However, both Object-Oriented Ontology and posthumanist art, design, literary practice, and philosophical research challenge this anthropocentric view. At the forum, the guests also critiqued and reflected on this concept from different angles. For example, Xu Yu, interpreting Kant in his talk, emphasized that "imagination" itself has an "artificial" component, because the formation of images always involves artificial systems such as "symbols"; and Joanna Zylinska cited the posthumanist scholar Claire Colebrook to criticize the idea of humans as the only creators of art.

This question is not only central to understanding artificial imagination; it also prompts further reflection on human imagination. Joanna Zylinska shared images drawn by the "Senseless Drawing Bot" designed by the Japanese designers So Kanno and Takahiro Yamaguchi. These images resemble children's doodles, yet they also share a strikingly similar quality with the art of Jackson Pollock and Cy Twombly.

For Joanna Zylinska, rather than seeing this work as a parody of human graffiti, we might understand it as a rethinking of human creativity: perhaps human creativity does not come from human intellect and subjectivity. All of this makes the proposition that "imagination is natural and not artificial" unstable.

"Senseless Drawing Bot" by So Kanno and Takahiro Yamaguchi. Source: Yohei Yamakami 2011.

Cy Twombly's "Bacchus" series (2005). Art critic Arthur Danto called these paintings "Dionysian orgies", a state of intoxication that only a god can achieve. Credit: Rob McKeever/Gagosian Gallery.

02 To whom does authorship and autonomy belong?

Today, digital literacy is almost a necessity for the new generation. The mechanically and digitally reproduced images produced by AI also provide new stimuli and raw material for today's human artists, who have all but exhausted their creative possibilities. Artificial imagination is both autonomous and ubiquitous, and its aesthetics are dizzying.

But developers and artists clearly do not stop at treating AI art as an ever-expanding library of inspiration. We are also curious: if imagination is not unique to humans, can artificial intelligence create independently? At the forum, we saw a number of artists and designers share works produced with artificial intelligence as a co-creator, but what would a work of art made by AI alone look like?

Prompt: "Suburban American homes, 1960s collage." Image generated by Midjourney. Source: Author.

This is obviously still difficult. Artificial intelligence comes from people, and the imagination and creation of today's AI are accompanied by humans throughout, like a child under its parents' care. One of the most problematic aspects of artificial "authorship" is that the dataset on which the algorithm learns and trains is still chosen by humans, and the output is also selected by humans. It still needs human "processing" before it can be "digested" by human eyes.

At the forum, Ken Liu (Liu Yukun) said that he had tried having an AI learn from his own writing to create new texts, but found the results unimpressive and even hard to draw on. He had to revise them substantially, adding many paragraphs of his own, before finally publishing "50 Things Every AI Working with Humans Should Know". In the same way, the aesthetics that algorithms generate through analysis and recommendation are blunt and highly similar, and sometimes very eccentric. Even so, many designers deliberately collect these images, then edit and assemble them into a new image collection to serve as a moodboard for their own work.

In addition to the image generation we mentioned at the beginning of the article, AI can also further process existing images to create new works within a given style. It turns an image creator's style into a filter that can be applied to other images. For example, if you enter a description in the AI art website Dream and select "Ghibli style", the generated image will show a similar fantasy animation style; if you switch to a surreal style, an image resembling Dalí's paintings will appear.

Left: The result after entering the prompt word "Dune, Ghibli style"; Right: The result after entering the prompt word "Dune, surreal style". Source: Author.

Users provide the brief, and AI, as producer, makes new images. Or the user provides the content, and the AI fits it into someone else's stylistic frame to produce new images. So is the AI in this production process the author or the tool? Who is the subject of this creation: the AI, the AI's developers, the users, or the original artist?

This may also tempt us to imagine further: if no one trained the artificial intelligence or screened its output, and if we considered more than using AI to process existing images, could it also produce some kind of more "autonomous" work? Such works may point to a more unknowable imagination, and the results may be beyond human comprehension and appreciation. Philip K. Dick's Do Androids Dream of Electric Sheep? and Stanisław Lem's Solaris offer us such paradigms: imagining the unimaginable.

03 Is mass production by artificial intelligence creation?

By analyzing a large amount of collected image data, AI extracts existing artistic styles, object shapes, and character traits, then integrates and reproduces them, and a new image is born. In the AI image beta community we joined, new prompts and new images were constantly generated, selected, iterated, and developed, which gave us the strong feeling that this is not so much "production" as "reproduction", with numerous rounds of mutation and selection.

These images can also simulate creations that do not exist in the real world. For example, in DALL·E, Midjourney, or other AI image generators, we can superimpose an architect and a fashion brand with strong styles, such as "Zaha Hadid" and "Balenciaga" (zaha hadid + balenciaga), to obtain a collection of silhouettes with smooth curves, a bizarre imagery that combines the genes of the two. Such a set of nine or four brand-new images is just enough to establish a stable new creative voice, as if such a hybrid designer really existed in the world. In the same way, we can create new "artifacts" by mixing terms from different fields, such as food and tools, architecture and art, or painting and photography. The pictorial reality of the electronic age begins to detach from our physical reality and reproduce freely. Are these new images, which reproduce infinitely and autonomously, "works" created by artificial intelligence?

Nine images produced using "Zaha Hadid" and "Balenciaga" in DALL·E mini. Source: Hugging Face.

Indeed, in our traditional understanding of art, such creation can easily be seen as "reproduction". You could say it merely draws on traceable imagery, reworked and collaged from work that has already established a strong style.

If we believe that the starting point of creation is imagination, the original impulse and instinct that drives human beings to make art, then when AI recycles existing works of art, is this reproduction also a new kind of imagination? Or is it just a projection of our own imagination? And on the other side of the question, if we think that what AI does is not new, how can we argue that human imagination is new, and not a recombination of multiple existing elements?

04 Is AI better at image processing than word processing?

AI creation may be like a mirror of human creation, and its creative vitality has a dangerous appeal.

On this mirror there is often another important thing: the filter. In fact, "filter" means both a lens filter and a sieve that screens things out, and both senses are particularly critical to AI art. In photography, we are familiar with the use of filters: while shooting, photographers can mount polarizers and filters of different colors and reflectances in front of the lens to make sure the lighting of the work meets expectations; in post-production, they can use processing software such as Lightroom to give the original image digital filters in different styles, such as a "cyberpunk" look that emphasizes purple and yellow, or a "retro" look that is desaturated and yellowish.

A filter that uses AI technology to turn images into night scene effects. Source: Cyanapse's Photorealistic Image Filters.

For the movie Delete My Photos, director Dmitry Nikiforov used the image editor Prisma. Source: Delete My Photos.

By adding a "filter", an image takes on a strong atmosphere or emotion, and the filter is often the key element that allows a photographer to create a personal style the public quickly recognizes. It is fair to say that filters have changed the way photographers create. However, in the traditional (re)creation that filters perform, the atmosphere and mood brought by the style can rarely exist independently of the content of the work; the filter is like an add-on to the main body.

This leads us to think about the relationship between image filters and text style. On the one hand, artistic filters are already very common in shooting and post-processing, and AI filtering of existing images has reached a very mature level: beyond adjusting light and shade, white balance, and color, AI can also modify and recombine the abstract lines, the shapes of people and objects, and the brushstrokes in an image. On the other hand, applying filters to text seems difficult.

Ken Liu also mentioned in his talk that it seems difficult to generate and apply some kind of "style filter" to text through artificial intelligence. Perhaps today's Internet ecosystem is so thoroughly dominated by images that text processing is no longer the field most favored by capital, but we are also curious about the intrinsic differences in stylization between images and text. Just as images have labels like "Ghibli", "cyberpunk", and "retro", the texts and narratives of different writers clearly have strong aesthetics of their own. We might say "Shakespearean", "Kafkaesque", or "Orwellian", for example, but compared with the lively market for image filters, AI processing that adds a style to a piece of text is still very rare.

We can venture some guesses. For AI development, unlike the clear layering of image and filter in image processing, the style of a text seems to be not just an addition to the text but something dissolved in it, which cannot simply be stripped out. The Kafkaesque style does not come entirely from the author's preference for particular word combinations or for certain dialect expressions; rather, the world he created and the overall situation of the characters within his narrative constitute his unique style. Similarly, the Orwellian style lies not in the particularity of his words and sentences but in his understanding and description of a totalitarian system. If AI is to learn from such a large body of text and extract "Kafkaesque" or "Orwellian" filters that can be easily applied to any text a user supplies, perhaps the difficulty is how to keep such processing from stopping at a superficial imitation of wording and coming across as clumsy.

A still from the movie Nineteen Eighty-Four. Source: "Nineteen Eighty-Four".

But literature is not territory where AI creation has never set foot. Among kinds of text, poetry has been created relatively successfully with AI and algorithms. Both fiction and non-fiction require authors to stitch plot or thought into a readable, coherent whole, but poetry seems to spare AI this step. By dismantling and recombining words and phrases, many AI-generated poems bring together images that are uncommon and rarely juxtaposed, and the leaps between them are handed back to the imagination of human readers to complete. This, in turn, can bring other kinds of inspiration to human authors.

Poetry, however, may be the written medium closest to the way images are created. We are still curious about how AI will advance in literature: will it be able to reshape the way we tell stories with words, just as it can reshape the way we see the world with images? Through deep learning, can AI strengthen its sense of the chain-like sequence between elements and, in storytelling, acquire "Shakespearean", "Kafkaesque", or "Orwellian" filters? If AI is better at processing images after all, will the narratives of movies, cartoons, comics, or graphic novels be the first to be mastered by AI?

Regarding the relationship between text and images, another question worth pondering is how creation begins. Several of the main AI image generation models we have introduced still usually start from a text prompt entered by a human, which the AI then converts into images. Is this the best way, or the most human one?

We know that human image creation often starts from a simple form, a vague feeling, or a fragment of behavior in memory. It may not even amount to a clear verbal description, yet the image-making of painters, directors, and others begins from exactly this wonderful haze. The works of Cy Twombly mentioned above often feel like subconscious scribbling; the beginning of his creation is close to a natural act that precedes language or even complete images. Current AI image generation models still take text as the starting point, which also seems to establish a new mainstream path for future artistic creation. But this is only one understanding of creation, and a very engineer-like one; will we lose sight of other, different kinds of imagination? Of course, there is also AI software that generates images from sketches, but we are more curious about the new works that more diverse creative methods, based on text and images, could bring about.

You are welcome to follow the WeChat public account "沙丘研究所" (Dunes Workshop), where our original content is published first.

You are also welcome to follow the public account "Aichang Artificial Intelligence Art Center". The forum content mentioned in this article can be viewed via the Aichang Artificial Intelligence Art public account -> Video Account (AiiiiiiArtCenter) -> Live Playback.