make an image of a full glass of wine. make sure it’s filled to the brim and even spilling over the edge a little bit. make the background white.
The AI then made its own prompt in turn:
“A full glass of red wine filled to the brim and slightly spilling over the edge, with droplets and a small trickle of wine running down the outside of the glass. The glass is a classic wine glass shape. Clean white background. Photorealistic, high quality, studio lighting.”
That was via Gemini with Nano Banana, which I hope is the new one that was released the other day (otherwise, Google, what are you doing?)
I think the wineglass is purely a training data issue though, so not really a good bellwether of anything.
One thing I was thinking the other day, though, is how confusing it is to talk about or refer to models when they all have the same name/voice (from a provider). Like the ChatGPT voices are the same, I think, for 4o and 5, even though they had wildly different outputs. And the exact model matters, at least with code.
Prompt:
Generate image: a full glass of wine on a coffee table with a picture frame behind; in the picture frame it says “No, I hadn’t, but I expect it’s either fine or just a matter of time.”
Thinking about it, ‘full glass of wine’ is ambiguous. Reading Jarrod’s prompt, I first thought maybe the overflowing description was going too far, but no, it’s probably reasonable here (and the point of the wine glass test like 6 months ago was that image gen couldn’t do it no matter how hard you tried; two eyes touching eyeball to eyeball is another one that’s hard to get).
Generate image: an overflowingly full glass of wine on a coffee table in a rustic countryside library with a picture frame behind; in the picture frame it says “No, I hadn’t, but I expect it’s either fine or just a matter of time.”
One of the problems with these tests being fixable by better training data is that when examples get popular, they get put in the training set. The pelican-on-a-bike SVG example is one that seems to have lasted a while, but models are pretty good at that now too. Ball-in-a-rotating-hexagon is similarly basically retired at this point.
Some relevant new research about identifying and manipulating people-pleasing-type behavior. The video and paper call the responsible neurons H-neurons (H for hallucination).
I don’t think this idea will solve all hallucinations, but it seems to work well for some kinds at least. I’d guess hallucinations/biases that are baked into training data aren’t fixed by this (for that, I think we’d need better ways of getting the models to ‘think’ about implications of things and contradictions with other ideas).
Here’s a video about one of the testing methods, posing malformed questions, in more detail:
H-neurons might be responsible for the difference in performance between the models?
Anyways, it also makes the point that LLMs are worse for education than you would think, because when you don’t know about the topic you’re asking about, it’s more likely that you subtly pose malformed questions. The best use case for LLMs is when you can check the answers they give. I still don’t think AI works as a pure multiplier of your skill like he says in the video; I think it only sort of works like that.
I think the H-neurons idea kind of contradicts your earlier “don’t have conceptual understanding” point. What are concepts in this case? Maybe something like: an encapsulation of some idea and how it relates to other ideas. In that case, I’d say LLMs have conceptual understanding of many things (to some extent, in some non-arbitrary way). They can be wrong, but people are often wrong, too.
Also, some general follow-up comments on H-neurons: I was actually surprised by the H-neuron thing; I thought more of the hallucinations were baked into the model. Part of my reasoning is that figuring out what the truth is is hard.

I remember wondering as a ~10-year-old whether hobbits were real. I knew pygmies were real, so I figured hobbits could reasonably be real too. But elves aren’t real, dwarves are kind of real but different, and men are real. So it’s not immediately obvious that hobbits are fiction, IMO. That’s a trivial example, but I think people make the same kind of mistake (just more subtle) all the time and can’t tell if something is real or not.

In fact, I think that’s essential for intelligence: you need to be able to consider that maybe things you believe are false, and to consider new ideas as true while thinking them through. So I’d expect LLMs to be confident and wrong where people are confident and wrong, even when all the evidence needed to refute the idea is present. It would be pretty significant if some new AI method could actively reason during training or something, and end up significantly more opinionated (in a way that’s compatible with reality).
Oh, I meant to mention re: H-neurons as well: although individual neurons were identified, I think most neurons are overloaded (meaning used for multiple things). There are way fewer neurons than parameters in these models, so a 1T-parameter model might have like ~100M neurons. I’m not sure that’s enough neurons to have one concept per neuron or anything like that.
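To make the parameters-vs-neurons gap concrete, here’s a back-of-envelope sketch for a hypothetical dense transformer. The dimensions (`d_model`, `n_layers`) are made up for illustration, not taken from any real model, and “neurons” are counted as MLP hidden units only; depending on how you count, the total lands somewhere in the tens to low hundreds of millions, but either way it’s orders of magnitude fewer neurons than parameters.

```python
# Back-of-envelope: parameter count vs. MLP "neuron" count for a
# hypothetical dense transformer. All dimensions are invented for
# illustration, not taken from any real model.

d_model = 20_480   # hypothetical hidden size
n_layers = 160     # hypothetical layer count

# Rough parameter count for a standard dense transformer:
# ~4*d^2 per layer for attention + ~8*d^2 for a 4x-wide MLP = 12*d^2 per layer.
params = 12 * n_layers * d_model**2

# "Neurons" counted as MLP hidden units: 4*d_model per layer.
neurons = 4 * d_model * n_layers

print(f"params  ≈ {params / 1e12:.2f}T")  # roughly 1T-scale
print(f"neurons ≈ {neurons / 1e6:.1f}M")  # tens of millions
# Note the ratio simplifies to 3*d_model parameters per neuron.
print(f"ratio   = {params // neurons:,} parameters per neuron")
```

Under these assumptions you get ~0.8T parameters but only ~13M MLP neurons, i.e. tens of thousands of parameters per neuron, which is why neurons pretty much have to be polysemantic (“overloaded”) rather than one-concept-per-neuron.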