AI in Advertising: Entering the wonderful world of generative video... or not

Welcome to the AI in Advertising digest. This month we turn our attention to the latest developments in generative video – in some ways the holy grail of the generative AI industry.

By Jarred Cinman

26 Nov 2024

Source: © NBC News. The Toys R Us ad made with generative video was depressing, but that will change, and fast

What makes generative video hard?
Will AI kill the video star?
Does it work?
Are there practical reasons why this technology will never deliver?
Major players
Winners (or losers) gallery
Final word

What makes generative video hard?
To state the obvious, a video is a series of still images shown in rapid succession. Typically, it runs at 24 frames per second: literally 24 different static images every second that create the illusion of motion.
While this is nothing new, it is particularly relevant in the generative video space. A couple of years ago, when tools like Dall-E came about, we were all mind-boggled by the ability of an algorithm to magic up an image based on a mere text prompt.
Generative video has to perform that same feat 24 times for each second of content as well as accurately progress objects in the image to create movement.
The computing power involved is almost unimaginable and explains why current generative video models only offer around 10-second clips per output (although Sora can reportedly do up to a minute).
As with all computing trends, this limitation will be overcome in due course, and we aren't many years away from feature films being entirely generated in this way.
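To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only counts frames at the 24 frames per second mentioned above; it says nothing about how any particular model actually works internally, and the 90-minute feature length is my own assumption for illustration.

```python
# Frame counts at 24 frames per second (the figure used in this article).
FRAMES_PER_SECOND = 24


def frames_needed(seconds: int, fps: int = FRAMES_PER_SECOND) -> int:
    """Return how many still images make up a clip of the given length."""
    return seconds * fps


ten_second_clip = frames_needed(10)        # typical clip length today
one_minute_clip = frames_needed(60)        # Sora's reported upper limit
feature_film = frames_needed(90 * 60)      # assumed 90-minute feature

print(f"10-second clip: {ten_second_clip} frames")
print(f"1-minute clip:  {one_minute_clip} frames")
print(f"90-minute film: {feature_film} frames")
```

Even a short clip means hundreds of images that all have to agree with one another, which goes some way to explaining the current length limits.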

Will AI kill the video star?
Ever since motion pictures were invented in the late 1800s, the basic process of making films has been the same: “Lights, camera, action”, essentially. In parallel, the visual effects industry grew from hand animations to complex computer graphics, which eventually led to endless tedious superhero movies.
AI marks a fundamental break from both. In one move, it replaces the need for people to write scripts, build sets on location, cast and direct actors and capture film footage – as well as the painstaking work that makes it look like Spider-Man is swinging between tall buildings.
From the perspective of a computer algorithm, it doesn't matter whether the subject in a film is a superhero, Brad Pitt or Mickey Mouse. And it doesn't matter if Brad is in New York or on the planet Tatooine.
Once you've figured out how to generate images from scratch, they can be of anything, and generative video can simulate the entire endeavour.
Most ads do not have the production budgets of Hollywood. This makes them even easier to replicate with AI and frees the imagination of the creative director without the limits of location costs, cast royalties or CGI armies.
Compared with the impact on writers that ChatGPT and Gemini foretell, or the impact that Dall-E and Stable Diffusion have on graphic designers, generative video sweeps away a much larger industry.
For most clients, the ability to turn a multi-million-rand production into a few minutes of computing time will be irresistible.

Does it work?
The biggest splash in this space was made by OpenAI earlier this year when they announced Sora – their text-to-video platform.
They are far from the only game in town, however, with Google's Lumiere, Meta's Make-A-Video and Runway's Gen-3, and many others, all debuting this year.
Neither Sora nor Lumiere has been released to the public although both OpenAI and Google have made them available to some filmmakers and creatives to experiment with.
The depressing Toys R Us ad, released earlier this year and made with Sora, was among the unfortunate first results.
Example videos from all of these platforms are impressive and imaginative.
They look like they were shot with high-end cameras and had the best computer artists working on them. So, at least in theory, the situation described above has arrived.
In reality, working with the publicly available tools is still very hit-and-miss and it's not possible to go under the hood and find out why the model interpreted your prompt that way.
Bear in mind, though, that these are version-one technologies, and many are not even out of beta. We are seeing the birth of a whole new category of toolset and 99% of the improvement is still ahead.

Are there practical reasons why this technology will never deliver?
Sceptics argue that there are three primary flaws with these tools (and AI in general):
1. They cannot be truly "creative" since they are merely pattern-matching algorithms that mimic what has been done before.
2. AI is running out of training data and thus will plateau long before it threatens human work.
3. Consumers, especially young consumers, want authenticity, and machine-generated work may give them the creeps rather than the positive emotion brands hope for.
I think "creativity" needs to be carefully unpacked.
Much of what we call "creative" is based on prior art applied to a new situation. Many films – and, certainly, adverts – are variations on a theme rather than remarkably fresh and unique.
Whether AI will run out of training data remains an open question.
If the copyright suits against the AI companies move quickly enough and turn out in favour of the rights holders, they may prevent these companies from training their generative video models on every film ever made and the entirety of YouTube.
But since one of the companies vying for this prize is Google, this seems improbable.
The "uncanny valley" problem is largely a product of these technologies being immature.
In the end, a video is pixels on a screen – if Marvel can make your pulse race with CGI renders there is no reason why AI can't do the same.
It now seems almost inevitable that we will have the first feature film written, acted, soundtracked and produced completely by AI within five years.
The director of that film will still have a job at that point.
Everyone else, sadly, may well be a relic.
Major players
Here are some real-world tests with the available video generators out there. I am using this image to prompt each model along with this simple prompt:
Drive this go-kart off the screen and have a black cat walk on and smile at us.
Top contender
Runway
Time taken to generate: Less than a minute
Accuracy to prompt: 6/10
Video quality: 6/10
Cost: About 5 credits per second of video. A free account gets you 125 credits; $12/month gets you a standard account with 625 credits. That’s around US10c per second of video. A typical blockbuster movie costs about $18k per second.
Verdict: By far the most sophisticated and production-ready platform. The results were impressive, albeit rather creepy (apologies to my wife). Having said that, they vary wildly and even with infinite patience you can’t generate exactly the video you want. Yet.
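For anyone who wants to check the cost comparison above, here is the arithmetic as a small Python sketch. The plan price, credit allowance, credits per second and blockbuster figure all come from the Runway entry; the variable and function names are mine, and the sum assumes every credit is used and every clip comes out right first time, which it will not.

```python
# Figures quoted in the Runway entry above.
STANDARD_PLAN_USD = 12.0           # monthly price of a standard account
STANDARD_PLAN_CREDITS = 625        # credits included in that plan
CREDITS_PER_SECOND = 5             # approximate credits per second of video
BLOCKBUSTER_USD_PER_SECOND = 18_000


def usd_per_second(plan_usd: float, plan_credits: int, credits_per_second: int) -> float:
    """Cost of one second of generated video on a given plan."""
    return plan_usd / plan_credits * credits_per_second


runway_usd_per_second = usd_per_second(
    STANDARD_PLAN_USD, STANDARD_PLAN_CREDITS, CREDITS_PER_SECOND
)
ratio = BLOCKBUSTER_USD_PER_SECOND / runway_usd_per_second

print(f"Runway: about ${runway_usd_per_second:.3f} per second of video")
print(f"A blockbuster costs roughly {ratio:,.0f} times more per second")
```

In practice you burn credits on rejected takes, so the real cost per usable second is higher, but the gap is large enough that the comparison survives a lot of retries.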

Runners up

Imagine.art

Time taken to generate: About 5 minutes
Accuracy to prompt: 1/10
Video quality: 2/10
Cost: Free for limited generations and functionality; from $8/month for standard; from $13/month for premium
Verdict: This tool is far behind Runway (and, indeed, Sora) in that it struggles to produce photorealism and has real problems with understanding prompts and physics. It's more adept at simple transitions and illustrative-style animations but has a long way to go.

Luma AI

Time taken to generate: About 5 minutes
Accuracy to prompt: 4/10
Video quality: 4/10
Cost: From free with limitations to $400/month for premier, with other options in between.
Verdict: Handles basic motion and animation fairly well but still fails to follow the prompt completely and gets super confused about some of the objects in the scene. It also tended to look less photo-realistic the further into the film it got.

Also-rans

Pika Labs AI

There is no option to start from a static image, so I adapted the prompt to describe the scene I wanted. The quality was extremely poor and far more suitable for cartoony animations or animated GIFs than any serious video generation.

Plazmapunk

This tool is specifically for music videos. You upload a song, it analyses the music and lyrics and creates a music video, supposedly. I tested it with one of my own pieces of music and asked for an "abstract photo-realistic video". I'd say music video producers are safe, for now.

Honourable mention

I have focused here on text-to-video generation but, of course, many commercial products from Adobe, Apple, Avid, Da Vinci, Synthesia and others have embedded AI that can handle specific tasks during editing and post-production and are currently making a bigger contribution to the content out in the market.

Winners (or losers) gallery

These are recent real and concept ads that have heavily used AI in their production (via @minchoi). You can decide for yourself whether these prove the potential or just creep you out.

Thank you to Vincent Maher for pointing me to this thread and examples.

Mammoth Munchies Dog Food By Anton Polinski

Tools used: Luma, Midjourney, Dreammachine, Elevenlabs

Luminelle (Concept Commercial)

Tools used: unknown

Coca-Cola Christmas Ads

Tools used: Leonardo, Luma, Runway, Kling, Sora, Minimax (and a lot of human intervention)

All Purpose Everything Sauce (Parody Commercial)

Tools used: unknown

Final word
There is every reason to be impressed, excited and worried about the role of generative video in the future of the entertainment and advertising industries.
If you are a film producer, actor or, indeed, a brand or agency, you can relax for the time being.
Having said that, things are going to change, fast. It's not a case of "if" anymore, but "when".
Even if the eventual cost of making a Hollywood movie with AI is many times the per-second cost quoted in this article, it will still come in at a tiny fraction of what it does today.
And the costs that will be removed will be measured in humans.

About the ACASA

The ACASA is the official industry body for advertising agencies and professionals in South Africa, counting most major agencies among our members. The ACASA Future Industry committee comprises Jarred Cinman, Vincent Maher, Musa Kalenga, Haydn Townsend, Matthew Arnold, Antonio Petra and Imke Dannhauser.

About Jarred Cinman

Jarred Cinman authored this article as an ACA board member and a member of the ACASA Future Industry committee, which also includes Vincent Maher, Musa Kalenga, Haydn Townsend, Matthew Arnold and Antonio Petra.