ChatGPT and Current AIs Are Dumb

AI agents are really useful for software development now. I just had Codex fix a bug in Ghost (the open source project that the CF blog uses). It did it largely by itself, based only on me describing the bug and giving it a screenshot. It took a while: besides figuring out the actual code changes to make, it downloaded code, installed some tools, did some workarounds to get stuff running on my computer, tested that its changes worked, etc. It also wrote the explanation of the changes and pushed it to GitHub from my account for me.

I paid a bit of attention, reviewed its changes, and approved some commands. I’m not running it fully autonomously. There are security options like using a separate user account, separate computer, cloud server, git worktrees, etc. I think you can run it more autonomously than I did and still have reasonable security, but I don’t know the full details.

Meanwhile on a different project I’m having Codex ssh into a staging server (not used by customers) and debug a high memory use issue. The server can be rebuilt if it breaks something, but it’s unlikely to break anything.

Re Gemma 4: I haven’t tried it yet, but I heard it’s about as good as Claude Sonnet 4.6, which is decent and pretty usable though not the best.

For other types of productivity besides software development, I think AI is often more questionable but can be useful; it varies. And it can certainly be misused and get bad results for software.
