Hal Berghel: I have been critical of what I call multi-mediocrity in the digital arts for decades. In particular, I find much, if not most, of the online content, including commercial websites, gratuitous and tiresome. They lead to unnecessary viewing, scanning, scrolling, linking, etc. Even U.S. entertainment seems to engage in the same abuse in emphasizing special effects over thoughtful scripting. But I don't have the background in graphics, animation, multimedia, and film that you do. What is your impression about the content of modern media? Do you disagree with me?
Judson Rosebush: When we produced the Isaac Asimov CD-ROM, or the Gahan Wilson CD-ROMs which were distributed by Microsoft, one of the lessons I learned was the importance of minimizing click count. I have been a ‘click counter’ ever since and must agree that user interface design today is seldom up to the standards of information design pioneer Edward Tufte, either structurally/graphically or navigationally. One might observe that some of the highest-ranking sites are completely utilitarian in design and function, low bar and easy to use vs. pretty and vapid.
Many sites also come out of turnkey boxes, with some twiddling that is often left to young raconteurs, who are encouraged to add flash to the site. This delights the owners, but it ‘exit clicks’ a percentage of potential visitors who don't have the patience to sort through inconsequential content and unproductive navigation. Many professional sites appear to be designed by committees who decide which branches of their business get displayed where and on what menu, rather than what might interest or help a user. Further, many website designers are untrained in how to guide the human eye as it moves about a screen.
The problem is exacerbated by the fact that corporations see web interfaces as a way to reduce their customer service costs by automating humans out of the loop, sometimes with the introduction of so-called AI chat robots (“Hi, my name is Igor and I'm here…”). This approach is not without merit, but often has insufficient human-factors design, such as a decision tree to handle the easy and fast 90% of the calls, chat, and help. But automation is only partly able to do this, and since it is a decision tree, it works quite well by forcing the caller to push buttons. It doesn't need to struggle to recognize your voice. While efficient from the point of view of the corporations, it is a huge time-sink for customers.
There exists a semantics of what used to be called user-machine interface (aka human-computer interface), and there remain to this day things like modal dialogs, menus and sub-menus, command keys, and a hierarchy or network of pages with links. Business-oriented sites that are run by or for billion-dollar corporations often display an awkward clustering of functions, represented by different styles and inconsistently mixed through several layers of interface. At best, this approach requires users to learn the quirks; unfortunately, this trend could well develop into a genuine hatred of the technology, when the hatred should be directed at the nameless decision makers who thrust these poorly conceived user interface experiences upon their “valued customers.”
I appreciate your impatience with fanfares and splashes. My pet peeve is websites that push competing animated GIFs. One job of the interface designer is to guide the eye. One job of the researcher is to measure what people click and when. Once upon a time, social scientists did time and motion studies to measure interface behavior, but somehow this expertise is thin at many levels of corporate design of online resources today.
As to the loud, special-effects-driven movie franchises featuring comic book heroes, guys with guns, and lots of flashy stunts and explosions, I would agree that all too often I find the fantasy tiring and the story development too slow. The default seems to be that screen time equates with ticket value. The regrouping of armies, the fatigue of endless combat, the dispirited subplots of romance, the endless chases, and a great deal of unnecessary filming are all regrettable.
We are in an awkward moment in society right now. Social media threatens the establishment press. It can propagate formalized messaging by individuals and corporations, but it can also spawn and propagate anti-social misinformation. AI adds uncertainty to what we are already seeing, reading, and hearing. Of course, people have been fed fake media forever. AI can make critical thinkers more analytical of what they digest. But it can also make sub-critical thinkers believe they're critical and also provide powerful tools to those who seek to manipulate people.
HB: Do you see a role for AI in the arts and humanities?
JR: We are all curious to see to what extent AI will be able to innovate and create. Might it put together ideas that weren't put together before, reveal concepts we don't even know about?
We know that AI software has some pattern recognition utility, with media applications including speech recognition and closed captioning, facial recognition (type casting), location analysis, and audience analysis. Perhaps AI could examine rock cracks and predict how collapses might occur, or analyze weather data and make accurate predictions, or show how to make synthetic river rapids, or optimize traffic control over large urban areas.
In a more generative way we can dictate storyboard frames and the AI can visualize them. This can happen now, but the question is to what extent it will create unique storyboard frames. How would such an activity expand beyond the input data? How might an automated storyboard project into the future? Might this involve a regression calculation based on one million possible future scenes? We know that AI can assist with image manipulation at a tactical level; again, this is not a new capability, only a different approach. But can it manipulate media at a satisfying strategic level? Maybe AI can get good enough to ‘put some noir lighting on my virtual reality character’ and maybe not, but can it write stories and cast acts with compelling players? I doubt it has much capacity here.
But in the creative realm I find AI to be much more tactically useful than strategic. My experience with text-to-image AI suggests it is able to fabricate a convincing likeness of human models, but it tends to produce a less convincing context (scene). It is good at creating fashions and hair styles, which is tactically useful. But this is a very tactical activity, very constrained and based on pre-determined rules. The operative question is whether this tactical activity is worthy of the label “intelligently designed.”
What we know how to do well is construct images, both with real actors and sets and with synthetic actors and sets. And what we also know how to do well is tell stories, sing songs and dance, and gather around a storyteller. The end goal of creating media is not to demonstrate technical competence; the goal is to tell a story, communicate an idea, and by guiding the eye and the ear, one seeks to guide the mind. By story I think of a narrative, possibly a linear one as complex as The Wizard of Oz or an apparently static image like the Last Supper. A painting might guide the eye with windows and doorways, guide the eye with the gaze of the people in the painting, draw the eye to props, and portray not just the people in a setting, but situational relationships, personal as well as power exchanges. Obviously these kinds of media require a lot of planning and continuity.
People who have stories to tell, and who share a storyteller/singer tradition, are motivated to provide feelings for the moment, reflections of the past, and opportunism for the future. But what would AI motivations amount to beyond ad hoc, “put-together,” and keyword or pattern-based constructs? It can predict the next word in a sentence based upon histories and graphically fabricate the image. One can throttle the AI's statistical variations and censor its vocabulary. But that's not the sense of creativity that we normally use in the arts.
Further, storyline, image-making, song and dance, and acting styles are reflective of the world surrounding us. It is uncertain if AI will be as socially alert to something it has never seen before, like the emergence of rock and roll music in the 1950s and the soul of Sam Phillips.
Rare humans are able to look at previously intractable problems and gain insights toward solutions. Gauss created a formula for the sum of a series of numbers that uses one addition, one multiplication, and one division, no matter how big the series. Do we expect AI will replicate this kind of insight? And if it does, how does it determine, or who determines, what problems need solving? And how does it tell us if orange is better than purple or not?
HB: We hear a lot about AI Art. What are your thoughts?
JR: It has been said that LLM-based neural network generative images can't truly originate anything because they are trained on existing datasets. But AI is rather good at digestion, and we have seen this year that AI produces photorealistic results, and it can produce images that appear realistic and contain styling that you have never seen before. So it is definitely creative in some sense. Text-to-image AI is also able to fabricate products which at first glance appear integral, but upon closer inspection contain surrealist mannerisms, to use a loose term, and at more extremes the AI blesses us with chimeras, Siamese twins, mutants and the like. Sometimes AI regurgitates its training inputs with agility, but it struggles with the balance between getting finger count right and holding a great pose. Generating images with AI does require work with an “intelligence,” but that intelligence often has “a mind of its own.”
So to whatever extent AI attempts to be predictive, using it to create images for storybook sequences is difficult. Working with AI and trying to be dominant – telling it what you want – often only makes it more slippery and prone to visually coagulate. Conversely, under-definition may be greeted with unbounded creative fabrications, which can be welcomed in the experimental realm, but are not entirely welcome when one's goal is to form ideas and concepts into a narrative message. Making storybook frames and sequences requires a great deal of patience, diligence, experimentation, and attention to detail. What is rewarding is the unpredictability of what it makes next, and that is very engaging.
Fun as it is to create AI Art, the results resemble a surreal 3D kaleidoscope, fascinating to the eye, non-repeatable, and exhausting because before one can digest the particulars of the image displayed, another creation is on the way. Creating stories with AI seems to be a two-way struggle: not only must the creative artist guide the AI, but the AI also teases the artist with possibilities and opportunities.
Right now it is extremely hard to direct AI image generation and have it produce what you desire. On the other hand, one of the great charms of AI is the unpredictable images it produces, and so I think we do have a new category of Art, as revolutionary as photography and computer graphics, and a new way of image synthesis. Right now it is able to cast a convincing someone on a sandy beach, but, left to its own devices, it is also able to conjure up assemblages that fuse the real and the unreal (at one extreme) and engulf the image space with a brain wash of structure and pattern, reminiscent of some of the messages the surrealists and cubists were conveying, yet remarkably of our own era, catching our eyes, and amazing us. The unique mass production, and the aesthetic randomness yet structure, will no doubt saturate the mediascape.
Right now, we are in a golden era of AI Art, first generation, first explorations. It remains to be seen what happens next, although realism, filtering, and specialization seem not far ahead. Animated GIFs may well be within reach, but narrative action may be harder to digest. The fact that AI art cannot be copyrighted must be a concern to galleries hot to identify stars they can market, but it also invites some new ways of thinking about ownership and the sharing of ideas visually.
This ability of AI to fabricate counterfactuals is most likely a concern of its masters and gatekeepers, who seek to keep guardrails on their products. They want to protect their audience from unseemly words, unseemly visuals, unseemly thoughts. They are very controlling, worried that their ingestion of intellectual property, including trademarks, logos, faces, and the like, might invite litigation. Many want to avoid being accused of propagating extremist opinions on gender, race, national origin, and so on. Many are aware that the chat bots can be repetitive and misinformative, and ingest their own outputs, akin to a genetic regression I suppose.
There is also randomness to generative AI text-to-image (and text-to-video) systems today. They are somewhat repeatable but not exactly so. I suspect systems can be built which retain parts of the assembled image, but right now my attempts to direct anything tactical are way above the generator's head: “make a hard fist” and “touch your nose” aren't understood yet, although “sit” and “stand” are for the most part. This is a different problem than hallucinations. If there weren't any hallucinations, making pictures using AI would be boring. Often lacking discrimination, such AI images amalgamate visual components in unexpected and jarring ways. It is in this way somewhat creative.
HB: And will AI replace humans in the entertainment industry? Will there be an AI-generated Pop Star?
JR: Following this period of infatuation, we might see several rapid cycles of craft skill (details, resolution, lighting, vocabulary), such as we saw in the rapid increase in the development of digital sensors and screens. But beyond storyboards and greeting cards I'm not sure. Image cleanup and retouching are areas of exploration. Perhaps one might use it to help draw comic books, but I suspect someone still needs to write the story. I think the guardrails will continue to be frustrating, both for users and providers. I would suspect that AI assistants will need tactical training to increase their usefulness, and competition may emerge. And we have to be vigilant in embracing the answers it gives to our questions and the visuals it fabricates for us, because they are sometimes fabrications presented with authority.
AI is not the first tool that can fabricate disinformation.
I suspect its impact in graphics will be similar to the introduction of picture manipulation software, video editing software, CGI software, and so on. AI will become one of the tools we use to fabricate images.
Your question as to whether it can go from staging models in locations to creating a Pop Star is the more dramatic question because it tests the envelope. Certainly we have had successful synthetic heroes before (Bugs Bunny, James Bond, Superman), but they have always been the result of story and performance attitude. These synthetic Stars can have lifetimes measured in decades.
So synthetic characters, no matter how you design them, have been around for a long time, and audiences can form bonds with the character as well as with the actor or actress beneath. So might AI assist in a digital reconstruction of Dorothy and “learn” to play her role in a sequel to The Wizard of Oz?
Realize that prior to any media creation, there are a minimum of three layers of performance: There is the character being played (Dorothy), the actor or actress (Judy Garland), and the real person (Frances Ethel Gumm). And after learning to become Dorothy, would our AI be able to replicate the actress (Judy Garland), and equate her abilities to play many different roles in many different performances? And how do we cast and direct this sometimes contrite answer-giver?
Performance includes a bond between the performer and the audience, and the uniqueness of an actress or actor across many roles or many songs. Actors bring characters to life, they step into the mind and body of the character, but because they are human animals, they bring an individual personality to each posture and action. Some actors resonate with audiences more than others. The public seeks role models, idols, stars. I suspect AI lacks the functionality to compete with the next rising star or storyline. Furthermore, that relationship, e.g., between the Grateful Dead and its fans, extends not only to the songs, but to how the interpretation changes from night to night. This will be difficult for an AI to achieve, and even if it can create drone music, drum machines, or a sonification of solar winds, there's quite a gap between this and engaging music. And a lot of fans search for performers who demonstrate musical ability with empathy, spirit, and soul. No two nights at the opera are identical.
So we are left with questions: Will AI-generated pop and movie stars become real and conquer the entertainment venues? Will AI spawn multiple Stars, or will different AI systems compete for market share, like the movie studios, networks, and websites do (or did)? Now that the AI giants are starting to think big, like acquiring their own nuclear power generators to power their data centers, clouds, and augmented reality apps, are they the next Hollywood, and will human performers compete?
At least some of us realize that the voice talking to us in our car isn't a real person. But how will we react to a fancy contrivance that is linked to a vast infrastructure of databases, graphical display technologies, a substantial amount of rule-based computation, and locally determined position data? So some of us may be able to distinguish AI-bots from the human kind, but the voice in the car inspires confidence.
Is it possible that an AI could make a hit movie? At present it is humans who have stories to tell, stories about their own lives, stories about failures and achievements, stories to shock us, stories to bring us together. Many people have many stories to tell. Will AI systems have stories to tell about how they view the world? There are hints of that in the 2D/3D world of image synthesis. Right now AI has limited ability to “hold a thought” (a set, a costume, an action); but it can be full of consolidations and endless new configurations - some apparently insightful, some loony.
HB: You seem to suggest that Artificial Intelligence is incapable of creation yet is in a sense creative. Explain.
JR: The problem with creation, including the creating of Art, is that Art is motivated by something. It may be motivated by the spinning of Earth, night and day, by the seasons, the visual recording of a ceremonial event (Signing of the Declaration of Independence), and by the beat of a drum. Social forces motivate Art (Gustave Courbet), science motivates Art (Georges Seurat), mythology motivates Art (Jean-Léon Gérôme), war motivates Art (Pablo Picasso). But AI art is not motivated by any of these forces. It reacts to a prompt by applying a formulary that results from a synthesis of what it has seen/read before, based on statistical probabilities and a whole bunch of rules. Good for some things, but even if it assesses every bit of collective memory on the planet, that still accounts for only the tiniest sliver of reality. One doubts it is able to keep track of the location of individual butterflies of the great Monarch migrations, let alone the activity of their individual ribosome factories busy manufacturing new proteins as they fly along. So AI systems don't really know what is going on, and, looking backward, they only sample history. Not that what AIs have learned is all wrong, only that what they have learned and do learn can only be a fraction of reality. AI can't appreciate the food that we eat, the water we drink, and the air we breathe. Nature supplies us with a rich source of alternatives, and there exist competing strategies for processes that are man-made (gasoline production) as well as more natural (farming and herding). So I am not quite sure how AI could lead the next Art movement or music revolution. It would be disappointing if one of the greater legacies of AI would be marked by energy consumption and
Art is also selective. Art often deconstructs and simplifies reality (Edward Hopper). It can sanitize and glorify yet simultaneously break the bounds of gravity (Marcel Duchamp). Johannes Vermeer paints Dutch interiors with mirrors that create layered perspectives. J.M.W. Turner painted ships and water and weather in a loose brush style that balanced realism and impressions. Sergei Eisenstein provided realistic political drama. Jackson Pollock throws and drips paint on the canvas, so that the construction becomes a recording of his physical ethos toward the canvas. Each of these artists remains interesting because their styles encapsulate the world around them, transmogrified from reality onto canvas or film, always biased by the artists' decisions to alter the course of events by capturing light in a particular way or by reflecting it in a reconstituted manner. Paintings are functional artifacts in space-time projected into the current moment.
HB: Would you care to speculate on the future of AI in the arts?
JR: One assumes that as text-to-visual AI generation evolves it will become more familiar with vocabulary and more experienced in the visual analogs. I would expect to see a larger vocabulary produce more sophisticated images. Face mapping seems to be a tactic of some systems, and assembly strategies for handling real bodies and heads appear to have arrived. Willful positioning of AI actors is in its infancy, and it remains to be determined just how advanced the process will be once action is involved.
I think we can use AI to help us solve existing problems, but many of the problems we need to solve in the future haven't presented themselves yet. And, as we saw in Arthur C. Clarke's Space Odyssey series, goal-directed approaches can produce unintended results. And we know from real life how competing and conflicting goals can engender nonsense.
[Judson Rosebush, PhD, is a director and producer of multimedia products, a widely published author, an artist and media theorist. He is the founder of both Digital Effects Inc and the Judson Rosebush Company, and is the former editor of Pixel Vision magazine. He has worked in radio and TV, film and video, and hypermedia, including contributing to Walt Disney's TRON. Rosebush has produced many successful DVDs, including Gahan Wilson's The Ultimate Haunted House (Microsoft, 1994), Ocean Voyager (Smithsonian/Times Mirror Magazines, 1995), The War in Vietnam (CBS News and The New York Times, 1996), Look What I See (Metropolitan Museum of Art, 1996 and 2000), and Landmines: Clearing the Way (Rockefeller Foundation and the US Departments of State and Defense, 2002).]