Nano Banana: The AI that’s ripe for image editing

In this “how to” on Google’s new Flash Image model, AI training consultant and former Microsoft product manager Shaun Davies takes us through a tool that promises character consistency in image generation: “Nano Banana is shaping up to be useful and, most importantly, predictable.”

Two weeks ago, a mysterious model called “Nano Banana” started generating a buzz on LMArena, a blind-testing platform where anonymous AI models are pitted against each other. Users on Reddit forums dedicated to AI imagery were frothing over its capabilities, particularly for tasks involving image editing. 

Different flavours of Shaun Davies (Gemini)

The smart money was on a stealth project from Google, and these rumours turned out to be true. “Nano Banana” was officially unveiled last week as Google’s Gemini 2.5 Flash Image model, and it’s now widely available in Gemini and Google AI Studio. While it’s early days, this feels like a significant model update of great interest to marketers, creatives and anyone who’s struggled with getting consistent and usable image outputs from AI. 

What is Nano Banana?

The primary claim of Google’s new top banana is that it solves one of generative AI’s most persistent problems: image consistency.

Up until now, image models have not been good at keeping the details in photos the same from one generation to the next. Let’s say you upload an image of yourself as a seed for a prompt. The first image the model spits out will likely have made a few noticeable changes to your appearance. The next image will mutate a little more, and before you know it you’re looking at a freakish, fun-house-mirror version of yourself.

Google’s claim is that Nano Banana goes a long way to fixing that, with three key improvements.

  • Character and Style Consistency: The model can maintain the appearance of a person, object, or artistic style across a whole series of prompts and edits.
  • Precise, Conversational Editing: Often called “multi-turn editing”, this allows you to have a back-and-forth conversation to refine an image. You can make specific, targeted changes (like blurring a background or altering a pose) using natural language, without the model losing track of the original request. (There’s a minimal code sketch of this after the list.)
  • Multi-image Fusion: The model can blend elements from multiple source images into a single, cohesive visual.
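
If you’d rather drive this from code than from the Gemini app, the sketch below shows what that multi-turn loop looks like against the API. Treat it as a minimal sketch rather than gospel: it assumes the google-genai Python SDK, an API key in your environment, and the preview model identifier “gemini-2.5-flash-image-preview” that was current at the time of writing. The file names and prompts are mine.

# Multi-turn editing sketch. Assumptions: google-genai SDK installed
# (pip install google-genai pillow), GEMINI_API_KEY set in the environment,
# and the preview model name below -- check Google's docs for the current one.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # picks up the API key from the environment

# A chat session carries state, so each edit builds on the previous result.
chat = client.chats.create(model="gemini-2.5-flash-image-preview")

chat.send_message([Image.open("headshot.jpg"),
                   "Give this man a full and lustrous head of hair."])
response = chat.send_message("Make the hair a mohawk dyed in multiple "
                             "colours (red, black, purple). Put me in a "
                             "leather jacket and torn T-shirt.")

# Generated images come back as inline-data parts alongside any text.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("mohawk.png")

The point of the chat object is that the second prompt never restates the first; the session is what remembers the image it just made.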

Do these claims hold up under scrutiny? Or is this just another round of hype from a tech company desperate to get you using its products? For the rest of this article, we’ll put Nano Banana under pressure to find out if it’s underripe, rotten to the core, or perfectly ripe for the picking.

Test 1: Headshots, mohawks and consistency

Recently I sat for a photo shoot with the brilliant photographer Neil Bennett and I was extremely happy with the result. But looking at this carefully constructed version of myself, I can’t help but think back to my younger days when I would cringingly declare that I was too “punk rock” for such corporate malarkey. So I decided to transform one of my portraits in a few different ways to see how well Nano Banana’s claims of consistency stood up.

Shaun Davies (Neil Bennett)

To make it a fair fight, I ran the exact same experiment on GPT-5. The goal was simple: could they make my headshot punk rock, while still remembering my actual face?

My first prompt was simple. I am bald. So I asked the models: “Give this man a full and lustrous head of hair.”

Both models handled this reasonably well, though GPT-5’s interpretation already began to stray, giving me a slightly fuller face and a different hair texture. 

“Hair” by ChatGPT

Nano Banana’s version felt more like a plausible version of me, although the hair itself looked like a wig and it introduced some strange artefacts into my blue suit. 

“Hair” by Gemini

One thing I immediately noticed was latency. GPT-5 took about 80 seconds to output its image, while Gemini 2.5 Flash was done within 30.

The next prompt upped the ante on the punk thing.

  • Make the hair a mohawk dyed in multiple colours (red, black, purple). Put me in a leather jacket and torn T-shirt.

This is where the wheels fell off for one of our contenders. Nano Banana diligently added the mohawk and leather jacket, keeping my facial structure intact. 

“Mohawk” by Gemini

But GPT lost the plot and generated someone else entirely.

“Mohawk” by ChatGPT

The final prompt was: “Make me thinner and 20 years younger.”

Nano Banana processed the request and delivered this:

“Thin young mohawk” by Gemini

It’s pretty good. It more or less looks like me in my late 20s, if I’d ever had the guts to grow a mohawk. 

GPT-5, on the other hand, went totally off the rails:

“Thin young mohawk” by ChatGPT

A fine-looking young punk, to be sure, but most definitely not me. He looks more like my long-lost nephew who plays bass in a band called Societal Collapse or Phlegm or something. For consistency across multiple generations, Nano Banana was in a different league.

Test 2: Conversational editing

Of course, a tool’s ability to remember a face through a punk rock makeover is one thing. But how does it handle inserting an entirely new object into a picture, while maintaining a consistent landscape? For the second test, I used a photo of the sky at sunset—one of those moments that feels profound in person but looks like a dull smudge on your phone—and gave both models a two-part challenge.

First, the prompt:

  • Change this scene so that instead of the cloud, there is a gigantic space monster that looks like an interstellar version of a lion’s mane jellyfish hovering in the skyline… The effect should be realistic, slice of life, camera phone.

This round went to GPT-5. Its jellyfish had a more cinematic, almost ominous quality, blending into the hazy atmosphere in a way that felt genuinely surreal. 

“Jellyfish” by ChatGPT

Nano Banana’s version was more vibrant and ethereal, but it was also very literally a jellyfish: a beautiful but less subtle take on the brief.

“Jellyfish” by Gemini

For the next prompt, I decided to see how the models handled a physics-defying concept.

  • Make the jellyfish into an upside down bowl of ramen noodles. Literally make the tentacles into noodles that are yellow, thick and curly.

I preferred Nano Banana’s version here, a chaotic tumble of noodles cascading from a porcelain bowl. 

“Ramen” by Gemini

GPT-5’s attempt was tamer, its noodles more like decorative squiggles under a plain brown dome. Honestly, though, there wasn’t a lot in it.

“Ramen” by ChatGPT

But here’s the observation that matters more than the aesthetics of alien jellyfish or flying pasta. While GPT-5 subtly altered the landscape with each generation, Nano Banana’s consistency was flawless. Every house, tree, and distant building in the background remained identical across the edits. It didn’t just add the requested object; it did so while preserving the integrity of the original image.

This should grab the attention of any marketing or creative professional. The ability to lock down a background and make specific, iterative changes is a genuine step forward. Imagine mocking up a product in a dozen different real-world settings without the AI deciding to creatively reinterpret the footpath or remove a competitor’s store from the background. It’s a level of control that could move these tools from the realm of amusing toys to practical, reliable workhorses.
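
To make that workflow concrete, here’s a rough sketch of what a batch mock-up run might look like via the API. Everything in it is illustrative rather than official: the product image, the settings and the prompt wording are hypothetical, and the same SDK and model-name assumptions from the earlier sketch apply.

# Hypothetical batch mock-up loop: one product shot, several settings,
# one generation per setting. Same assumptions as the earlier sketch.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()
product = Image.open("product.jpg")  # hypothetical source image

settings = [
    "on a cafe table in soft morning light",
    "on a beach towel under harsh midday sun",
    "on an office desk under fluorescent light",
]

for i, setting in enumerate(settings):
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[product,
                  f"Place this product {setting}. Keep the product itself, "
                  "and everything printed on it, unchanged."],
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(f"mockup_{i}.png")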

Test 3: Xenomorph ski buddy

The final test was for “multi-image fusion,” another of Nano Banana’s key features. This allows the model to blend elements from two separate images. Having recently been skiing and also watching the new Alien: Earth series, I thought it could be fun to create a xenomorph ski buddy. I fed both models a photo of me on the slopes and an image of H.R. Giger’s finest monster.

The prompt was specific:

  • Extend the picture of the man in the snow. He is skiing with his best friend, an alien. Insert the alien into the photo with the man. They are both smiling and happy. They have their arms/claws around each other.
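
In API terms, for what it’s worth, fusion is simply a matter of passing both source images in the same request. A minimal sketch, with hypothetical file names and the same SDK and model-name assumptions as before:

# Multi-image fusion sketch: two source images plus one instruction,
# all in a single request. Same assumptions as the earlier sketches.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        Image.open("me_skiing.jpg"),   # hypothetical file names
        Image.open("xenomorph.jpg"),
        "Extend the picture of the man in the snow. He is skiing with his "
        "best friend, an alien. Insert the alien into the photo with the "
        "man. They are both smiling and happy. They have their arms/claws "
        "around each other.",
    ],
)
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("ski_buddies.png")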

ChatGPT produced a far better-composed image. Its xenomorph was more dynamic, the framing was more interesting, and the overall scene felt more like a photograph. But what the hell did it do to my face? It looks like a rubber Halloween mask.

“Buddies” by ChatGPT

Nano Banana’s composition was static: there was an extra pair of skis, and the xenomorph looked more like a statue I was awkwardly posing with. But the “me” in the photo was still me. It had faithfully preserved my likeness from the source image.

“Buddies” by Gemini

A follow-up prompt to “Put the alien in retro ski gear” resulted in some truly magnificent 80s ski fashion. GPT produced a hilarious xenomorph with a huge grin, but it mangled my face further, giving me a weird elongated chin and plastic skin, like the star of an Aphex Twin video clip. 

“Retro buddies” by ChatGPT

Nano Banana diligently kept my image as is, but continued with the awkward staging.

Reliability versus artistic flair

There’s a clear contrast between these models. Nano Banana is the dependable, if slightly literal, creative partner. It doesn’t always have the most artistic flair, but it listens, it remembers, and it doesn’t go rogue. In every test, Nano Banana prioritised faithfulness to the source material. My face remained my face; my balcony view remained my balcony view.

Professionals don’t always need a tool to have a dazzling vision of its own; often it’s more important that the idea is executed precisely and consistently.

On the other hand, GPT-5’s image generation tends to produce more imaginative and aesthetically pleasing results. But in my testing it simply couldn’t keep details consistent across more than a single generation.

Bonus: Vibe coding with Nano Banana

Recognising that not everyone wants to engage in a multi-turn conversation to get a result, Google has also packaged this technology into a series of pre-built apps within AI Studio. There are tools for retouching photos and one particularly neat app called Past Forward that can place your photo into different decades, complete with era-appropriate fashion and film stock effects. The image at the top of this article is a composite of Past Forward outputs using my headshot. 

These apps make the power of the model accessible for specific, common tasks. But using AI Studio’s Build function, you can also alter these apps to suit your own purposes, or even vibe code a whole new app with a simple series of text prompts. I will add a note of caution – vibe coding without engineering skills means you can’t tell if your code is inefficient or has massive security loopholes in it. But it sure is fun to play around with.

Ultimately, the choice of AI tool will always come down to the job at hand. If you’re in the blue-sky phase of a campaign, brainstorming wild and unexpected concepts, a more creatively unpredictable model might be your best bet. But when it’s time to execute—to place a product, maintain a brand aesthetic, or ensure a CEO’s headshot doesn’t suddenly sprout a stranger’s face—reliability is paramount.

It’s well-priced too, at about 3 US cents per image via the API (at that rate, a hundred variants cost around $3 and a thousand around $30), which opens up the possibility of producing variants quickly and at low cost, while maintaining that same consistency. Nano Banana, with its steadfast memory and literal interpretation, is shaping up to be useful and, most importantly, predictable.

Shaun Davies is the founder and principal of The AI Training Company.
