minimaxir 2 hours ago

Everyone is sleeping on Gemini 2.5 Flash Image / Nano Banana. As shown in the OP, it's substantially more powerful than most other models at the same price per image, and thanks to its text encoder it can handle significantly larger and more nuanced prompts to get exactly what you want. I open-sourced a Python package for generating images with it, with examples (https://github.com/minimaxir/gemimg), and am currently working on a blog post with even more representative examples. Google also allows free generations with aspect ratio control in AI Studio: https://aistudio.google.com/prompts/new_chat

That said, I am surprised Seedream 4.0 beat it in these tests.
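
For reference, here's roughly what a raw generation call looks like with Google's google-genai SDK. This is a minimal sketch; the model ID and response layout are assumed from the current docs, not taken from gemimg itself:

  from google import genai

  # Assumes GEMINI_API_KEY is set in the environment.
  client = genai.Client()

  response = client.models.generate_content(
      model="gemini-2.5-flash-image",  # ID may differ, e.g. a -preview suffix
      contents="A photorealistic cat wearing a tiny top hat, studio lighting",
  )

  # Image output comes back as inline binary parts alongside any text.
  for part in response.candidates[0].content.parts:
      if part.inline_data is not None:
          with open("cat.png", "wb") as f:
              f.write(part.inline_data.data)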

  • cosama 42 minutes ago

    I was trying to use Gemini 2.5 Flash Image / Nano Banana to tidy up a picture of my messy kitchen, and it failed horribly on my first attempt. I was quite surprised how much trouble it had with such a simple task (similar to cleaning up the street in the post). On my second attempt I first had it analyze the image and point out all the items cluttering the space, then in a second prompt had it remove all of those items. That worked much better, which shows how important prompt engineering is.
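
    In code, that two-pass approach looks roughly like this (a sketch using Google's google-genai SDK; the prompts and model IDs are illustrative, not what I actually typed):

      from google import genai
      from PIL import Image

      client = genai.Client()
      kitchen = Image.open("kitchen.jpg")

      # Pass 1: enumerate the clutter as a text response
      # (a plain text model is enough for the analysis step).
      inventory = client.models.generate_content(
          model="gemini-2.5-flash",
          contents=[kitchen, "List every item that clutters this kitchen."],
      )

      # Pass 2: feed that list back as an explicit edit instruction.
      edit = client.models.generate_content(
          model="gemini-2.5-flash-image",
          contents=[kitchen, "Remove these items from the photo: " + inventory.text],
      )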

  • daemonologist an hour ago

    I don't think people are really sleeping on it - nano-banana more or less went viral when it first came out. I'd argue that, aside from the capabilities built into ChatGPT (with the Ghibli craze and whatnot), it's the best-known image editing model.

  • herval an hour ago

    Gemini is great when it gets it right, but in my experience it sometimes gives you completely unexpected results and won't get it right no matter what. You can see that in some of the examples (e.g. the Girl with a Pearl Earring one). I'm constantly surprised by how good Flux is, but the tragedy is that most people (me included) will just default to whatever they normally use (ChatGPT and Gemini, in my case), so it doesn't really matter that it's better.

    • dimitri-vs 41 minutes ago

      Agreed, to the point where I built my own UI where I can generate three images simultaneously and see a before/after. Most often only one of the three is what I actually wanted.
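
      The fan-out itself is simple enough to sketch; here's a minimal version with the google-genai SDK (the model ID and prompt are placeholders, and my actual UI does more than this):

        from concurrent.futures import ThreadPoolExecutor
        from google import genai
        from PIL import Image

        client = genai.Client()

        def generate_edit(path: str, prompt: str):
            # Each call samples independently, so the three results differ.
            return client.models.generate_content(
                model="gemini-2.5-flash-image",
                contents=[Image.open(path), prompt],
            )

        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = [pool.submit(generate_edit, "input.jpg", "Remove the car.")
                       for _ in range(3)]
            candidates = [f.result() for f in futures]
        # Render the candidates next to the original and keep the best one.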

  • cpursley 26 minutes ago

    Meh, most Google AI products look great on paper but fail in real-world use. That ranges from their Claude Code clone to their buggy storybook thing, which I really wanted to like.

lxe an hour ago

This is vastly more useful than benchmark charts.

I've been using Nano Banana quite a lot, and I know that it absolutely struggles with exterior architecture and landscaping. Getting it to add or remove things like curbs, walkways, and gutters, or to match colors, is almost futile.

  • estetlinus an hour ago

    I am trying Qwen Image Edit for turning day photos into night, mostly architecture and the like. Most models struggle with it, and Nano Banana misses edges and such, making the pictures align poorly.

amelius 3 minutes ago

A cat's paw has only 4 fingers.

hackthemack 2 hours ago

I do not use AI image generation much lately. There was a burst of activity a year and a half ago around self-hosted models and localhost web GUIs, but now it seems like it is moving more and more to online hosted models.

Still, to my eye, AI-generated edits feel a bit off when applied to real-world photographs.

George's hair, for example, looks over the top, or brushed on.

The tree added to the photo of the person sleeping on the ground looks plastic, or too homogenized.

  • minimaxir an hour ago

    > But now it seems like it is moving more and more to online hosted models.

    It's mostly because image model size and required compute for both training and inference have grown faster than self-hosted compute capability for hobbyists. Sure, you can run Flux Kontext locally, but if you have to use a heavily quantized model and wait forever for the generation to actually run, the economics are harder to justify. That's not counting the "you can generate images from ChatGPT for free" factor.
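
    For a sense of what "running it locally" involves, here's a sketch of Flux Kontext through diffusers. This assumes the FluxKontextPipeline API from recent diffusers releases; even with CPU offload it wants a lot of VRAM and a lot of patience:

      import torch
      from diffusers import FluxKontextPipeline
      from diffusers.utils import load_image

      pipe = FluxKontextPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-Kontext-dev",
          torch_dtype=torch.bfloat16,  # quantizing further trades quality for memory
      )
      pipe.enable_model_cpu_offload()  # slower, but fits on smaller GPUs

      image = load_image("input.png")
      result = pipe(
          image=image,
          prompt="Give the man a full head of slicked-back hair",
          guidance_scale=2.5,
      ).images[0]
      result.save("output.png")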

    > George's hair, for example, looks over the top, or brushed on.

    IMO, the judge was being too generous with the passes for that test. The only one that really passes is Gemini 2.5 Flash Image:

    Flux Kontext: In addition to the hair looking too slick, it does not match the VHS-esque color grading of the image.

    Qwen-Image-Edit: The hair is too slick, and the sharpness/saturation of the face increases unnecessarily.

    Seedream 4: The color grading of the entire image changes, which is the case with most of the Seedream 4 edits shown in this post, and is why I don't like it.

keyle an hour ago

This was fun.

Some might critique the prompts and say this or that would have done better, but they were the kind of prompts your dad would type in, not knowing how to push the right buttons.

jimmyl02 2 hours ago

I think reve (https://reve.com) should be in the running and would be very curious to see the results!

joomla199 2 hours ago

Good effort, somewhat marred by poor prompting. Passing in "the tower in the image is leaning to the right," for example, is a big mistake: that context is already in the image, and restating it in the prompt only makes the model more apt to lean the tower in the result. A prompt that states the desired end state instead, e.g. "straighten the tower so it stands perfectly vertical," gives the model something to actually do.