How to bring your dystopian vision to life with Google Flow
AI training consultant Shaun Davies is a former product lead at Microsoft overseeing AI content moderation. In the first of a series of hands-on tutorials for Mumbrella, he looks at how to use Google's new AI models to create long-form video.
There’s a type of executive in Silicon Valley who brushes off people’s fears about AI job displacement with a mix of optimism, self-interest and condescension. Typically the argument goes something like this: “The industrial revolution generated so many types of jobs that people in the 1800s never would have dreamed of, the impact of AI will be exactly the same, and if you disagree you’re a luddite who should read some history.”
I’m an AI trainer and consultant, but I’m also a content guy, and Google’s new Veo 3 video model excites and scares me in equal measure. I do think it’s likely that there will be new types of jobs in the future, but there will also be a huge amount of disruption, and a lot of creative people could lose out. In my view the disruption is now unavoidable (the genie is out of the bottle and it’s not going back in), but the argument that “there will just be new types of jobs” is too cute by half, particularly coming from tech bros whose aim is to generate unlimited profits by automating away work that others depend on and love.
So for this, my inaugural Mumbrella column on AI tools of interest, I decided to leverage Veo 3 and its companion studio product, Flow, to bring a dystopian vision of our creative future to life. I stitched together 10 videos to depict a future where actors find themselves displaced from television and film work, and instead spend their days as “pet enrichment specialists” for the AI-rich elite. Check it out and I think you’ll agree that this model has staggering capabilities. Then I’ll share what I learned about getting the most from this new technology.
What is Google Flow (and Veo 3)?
Let’s start with Veo 3. Google’s new model is a leap forward for generative video, outputting incredible audio and video together from simple text prompts. If you’re on TikTok, Instagram or YouTube you’ve definitely come across its outputs – all these platforms are seeing a flood of hyper-realistic videos ranging from terrible AI slop to smartly produced viral content generated by Veo 3.
Google Flow is a Google Labs product that is designed to streamline AI video creation. This is the landing page.
Here you can see your previous projects and start a new one. There’s also a link to Flow TV, which is totally worth checking out. It’s like a cable television network from an AI dimension.
Click on New Project and you’ll see an interface that resembles a chatbot’s, but with some different kinds of options. In the top right-hand corner you can see your account details, including your credits. Keeping an eye on your credits is essential as they can disappear very fast! As a Google AI Pro subscriber ($32.99 a month) I get 1000 credits a month, but really committed users can pay $409.99 for Google AI Ultra and get 12,500 credits a month.
This matters because the highest-quality Veo 3 outputs require 100 credits per generation. That’s just 10 rolls of the dice on my plan. Thankfully there’s a Fast version of Veo 3 that’s 20 credits per generation, and a Fast version of Veo 2 (no sound) that’s just 10 credits. You can select your model using the settings button. The default option is a text-to-video prompt, but you can also use a frames-to-video prompt, which incorporates an image as a reference for the model, but for some reason means that audio cannot be generated.
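To put those numbers in perspective, here’s a quick back-of-the-envelope calculation (a Python sketch using the pricing quoted above) of how many generations each plan buys you per month:

```python
# Monthly credit allowances and per-generation costs as quoted above.
PLANS = {"Google AI Pro": 1000, "Google AI Ultra": 12500}
COSTS = {"Veo 3 Quality": 100, "Veo 3 Fast": 20, "Veo 2 Fast": 10}

def generations_per_month(plan: str, model: str) -> int:
    """Whole generations a plan's monthly credits cover for a given model."""
    return PLANS[plan] // COSTS[model]

for plan in PLANS:
    for model in COSTS:
        print(f"{plan} / {model}: {generations_per_month(plan, model)}")
```

On the Pro plan that works out to 10 top-quality rolls of the dice, 50 on Veo 3 Fast, or 100 on Veo 2 Fast — which is why model choice matters so much.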
My process began in Gemini Pro, where I created a 10-part script using the 2.5 Flash model. This involved outlining the narrative arc, from the initial displacement of actors by AI to the more unusual specifics of their new roles. Prompting for video is best done with very specific language about the type of shot, camera positioning, movement, lighting and so on. I find that working with an LLM speeds up the creation of the prompts substantially.
In the next screenshot, you can see an example of one of my prompts and the video it generated. Note that in addition to the top-level description of the scene, there is a much longer prompt that explains the scenario and characters to help ensure some level of consistency between generations. I used a version of this high-level prompt in every single video I generated. You’ll also see that I clicked an option on the video labelled Add to Scene, which takes me into a new UI that shows the video on a timeline.
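The structure of that approach is easy to sketch: a reusable “world” block that travels with every generation, plus a per-shot description. The wording below is illustrative only — it is not my actual project prompt — but it shows the shape:

```python
# A reusable high-level block, repeated in every generation for consistency.
# (Illustrative wording only, not the actual project prompt.)
WORLD = (
    "Setting: a near-future city where displaced actors work as pet "
    "enrichment specialists for a wealthy elite. "
    "Main character: a weary middle-aged former TV actor in a bright uniform."
)

def build_prompt(shot_description: str) -> str:
    """Combine the per-shot description with the shared world block."""
    return f"{shot_description} {WORLD}"

print(build_prompt(
    "Wide establishing shot, slow dolly in, golden-hour light: "
    "the actor performs a puppet show for a bored labradoodle."
))
```

The per-shot half is where the camera, lighting and movement language goes; the shared half is what keeps your characters recognisable from clip to clip.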
As I chain videos together with Add to Scene, I construct a narrative from my various generations. There are some rudimentary editing features (primarily trimming the start and end of clips, and moving them around), but for now the editor is very basic, though I’m sure it will become more fully featured down the track.
The inevitable errors
This is astounding technology, but like all generative AI, it comes with problems.
A notable challenge involved maintaining consistent audio accents across the generated clips. Despite my specific prompt instructions for an Australian accent, Veo 3’s interpretations varied. Sometimes they sounded American, sometimes British.
This led me to use Eleven Labs for the narrator’s voiceover, which delivered the intended tone and accent far more reliably. Integrating this external audio then required additional editing in Microsoft Clipchamp, adding a layer to the production workflow.
Visual consistency also proved to be an area requiring some persistence. While the overall look-and-feel and character prompts were largely effective, the main character did occasionally exhibit minor visual deviations across scenes. This necessitated regenerating certain clips to achieve the desired continuity, which, of course, consumed more credits.
A particularly recurrent (and annoying) issue with Veo 3 was the unsolicited generation of subtitles. Despite including clear instructions like “no subtitles” within my prompts, the AI frequently added text overlays. It took some trial and error to discover that this instruction needed to be placed at the very beginning of the prompt to be consistently effective. This detail, while minor in isolation, resulted in several wasted generations and underscored the need for precise prompt structuring.
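If you assemble your prompts with a script, that fix is trivial to automate — prepend the instruction rather than append it. A minimal sketch (the function name is mine):

```python
def with_no_subtitles(prompt: str) -> str:
    """Place the 'no subtitles' instruction at the very start of the prompt,
    where (in my testing) Veo 3 respects it most consistently."""
    instruction = "No subtitles."
    if prompt.startswith(instruction):
        return prompt
    return f"{instruction} {prompt}"

print(with_no_subtitles("Close-up shot: the actor sighs at a goldfish."))
```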
Practical lessons in dystopian dreams
Despite these operational quirks, Google Flow and Veo 3’s capabilities are undeniably impressive. The ability to transform a detailed script into a visually cohesive 10-part video series, even with supplementary audio and visual adjustments, in approximately two hours is quite remarkable.
This project was, frankly, quite satisfying. No, this video won’t reshape the creative world, but I produced something both entertaining and with a message, which simply wouldn’t have been possible for me without this tool. Even my children found it tolerable. While it’s certainly no masterpiece by traditional filmmaking standards, that’s often how disruptive technology operates. What starts out as a raw, sometimes frustrating tool quietly accrues power, eventually enabling entirely new forms of media that can become surprisingly potent – just look at YouTube’s impact on television.
So if you are working in the creative industries, I urge you to have a play with this insane tool. Maybe it’s the only way to avoid a horrible future making pet enrichment videos.
How to Access Google Flow
URL: https://labs.google/fx/tools/flow
Subscription: Google AI Pro ($32.99 per month / 1000 credits) or Google AI Ultra ($409.99 per month / 12,500 credits)
How to Use Flow and Veo 3
The Generation Page
- Prompt box: This is the default UI you’ll encounter once you open or start a new project. Look for the prominent text input field at the bottom of the screen where you can type or paste your detailed prompt. This is where you describe your desired scene, characters, actions, and aesthetic.
- Select generation method: In the top left-hand corner of the box, choose between Text to Video, Frames to Video (a mix of text and image, but no audio even with Veo 3) and Ingredients to Video (a cool Ultra-only feature that lets you mash up a mix of images in one prompt).
- Settings: In the top right-hand corner of the prompt box there’s a button that allows you to select the best model and number of outputs per prompt. Before hitting generate, pay attention to the indicated credit cost for the generation. Veo 3 – Fast Text to Video (which I used) costs just 20 credits for a single video, 40 for two videos, and so on. Veo 3 – Quality costs 100 credits per video, so unless you’re an Ultra user, deploy with care.
- Edit and regenerate videos: Once you generate a video it will appear in a stream on the same page as the prompt box. If you don’t like the first result (it’s always a bit different), you can easily generate a new video with the same prompt, or tweak the prompt for better results, by clicking the Reuse Prompt button (an icon with three lines and a return arrow).
Scenebuilder:
- After generating individual video clips, Scenebuilder lets you combine them into a cohesive narrative. You can add a single clip to Scenebuilder by clicking the Add to Scene button that appears above any video, or click on Scenebuilder in the top left-hand corner breadcrumb navigation.
- The Scenebuilder area allows you to arrange your generated clips in sequence, much like a traditional video editor, though with simpler controls. This is where your individual eight-second clips can be assembled into a longer story.
- You can trim the start and end of individual videos or arrange them into a different order. You can also generate videos within this UI using the plus (+) button next to the clips. The Extend button is particularly cool, as it allows you to seamlessly extend clips with the same characters and look – but it only supports Veo 2 generation at this stage, which means no audio.
Accessing Generated Content:
- All your projects are accessible (labelled by date) from the Flow landing page.
As you begin, don’t be afraid to experiment with your prompts. The system responds best to clear, concise, and descriptive language. And, as a friendly reminder from my own trials, always place “no subtitles” at the very start of your prompt if you prefer your visuals unencumbered by unwanted text.