AI Predictions

With the AI discussions going on, I thought I’d post some predictions I have about AI and where it will go or how it will affect us. Feel free to post your own, too.

Mostly this is focused on coding since we are at the start of the S curve ramp and I’ve been thinking a lot about it.

I have some more (including some related to CF) that aren’t in this list. Will reply with them in the future.

AI Predictions: (I bolded some words to make scanning easier)

  • Coding as a practice moves toward error-correction processes, higher-level architecture, operating on a codebase in parallel, etc. Software dev becomes much more accessible, but quality drops.
    • Longer term: language design shaped by LLMs (not all languages, though). For example: emphasis on very fast, highly parallel type checking / linting of saved files; easy, explicit typing so that compilers catch errors (but the compiler needs to be fast); smarter testing frameworks that handle more tests better (eg via memoization and caching, depending on whether the explicit code paths for those tests have changed); more emphasis on functional purity to avoid side effects and hard-to-predict breaking changes.
  • Coding becomes largely remote (from the machine with the actual files). Running multiple agents on a large local codebase chugs. Unlike normal dev, you end up needing to run N× the compilers, the tests, the LSPs, etc. Design will improve but with resources constrained for consumers (eg RAM situation), it only makes sense to move to ephemeral cloud instances or something.
  • Software licenses are dead if you can point an LLM at an example repo, port it to another language, and publish under a different license.
  • Software proofs become a lot more common. A common workflow is a human directing feature implementation + a background agent automatically constructing proofs of behavior and patching issues. I’ve already had some success with formal verification (hard to tell if it matters yet, but current SOTA models can write proofs in Coq/Rocq and use them with OCaml).
  • Rehab specifically for vibe coding by the end of 2026 (and other psychological/addiction based products)
  • Quality Human Time, Personal AI assistants & remotely directing coding agents: People will constantly be attending to small things with their agents, even while watching a movie with friends. It will be pervasive. Constant notifications and associated anxiety.

correct:

  • software licenses are dead: now there’s a startup doing this (it took less than a month from me having the idea to someone making a fully working product), plus a Primagen video discussing it, with Malus featured.

wrong:

  • AI coding agents being as good as they are. I thought there’d be more issues as things got more complex and that my job (programming on the more complex end) as it used to exist would be secure for longer. I was saying this 6-8 months ago.

Personally I think https://malus.sh/ should be illegal in some way. If it’s legal, then I should be able to take a shitty photocopier, copy your art, and then say it’s legally distinct. However, this is a copyright issue then, not a software licensing issue.

Further implication: even though software patents are near-universally hated, they aren’t vulnerable in the way software licensing is. IDK how that will play out, but I think it’ll bring some new fuel to that debate.

I think personal code will get a lot more common, too. This might actually have some benefits for security.

For example, I have been vibe coding an AI personal assistant in OCaml (like openclaw, but not shit). If openclaw has a massive vulnerability, it’s unlikely to work for mine. I don’t know if I’ll release it. While there’s a lot of code and a decent feature-set, there are dozens of these kinds of projects at this point. Not to mention the effort of supporting users and all the bugs that affect them (but not me). The main benefit mine has is there’s a bunch of proofs about the code’s behavior – but what good are they if I don’t understand what they’re proving and no one ever checks? (To clarify: the proofs are valid proofs, but of what?)

I think most code in the future will be personal code. Or at least a very large chunk. Why buy a product when you can design it instead for pennies and add whatever features you want when you want? It goes for other things too that don’t need to be high quality: Why watch an average TV show when you can generate an average one instead? Why put on background music from a real artist when you can have an LLM listening to the games night you’re having with your friends and automatically generate tonally appropriate music for the moment in the story?

While there’s a lot of worry about the internet or whatever falling apart due to horrible vibe-coded bugs, personal code works in the opposite direction (in some ways). Some antifragile properties emerge. But if all of those vibed solutions use the same library, that can become fragile again.


Like 9 months ago I had an idea about co-training an LLM on both DNA and English and then just asking it about what the DNA does. (I don’t have a verifiable record of this.)

Anyway, it seems like someone is on the way to doing this with evo2. It’s not there yet – it’s only the DNA half – but with similarly sized training datasets (9T base pairs vs ~10-30T tokens), I think we probably aren’t going to be constrained by current technical limits (SOTA models are like 1 step away from being able to handle both via multimodal training). Another interesting thing is that evo2 has a large context window: 1M tokens (or base pairs, not exactly sure). This is where SOTA models are atm, and it means we’re pretty close to being ‘big enough’ to handle many genome-related problems. When DNA becomes a modality, it means synergy with other AI features like tool use, which can reduce context load by a lot.

Also now we need to worry about someone ‘jailbreaking’ an AI to get some super-pathogenic virus and synthesizing it.

https://www.nature.com/articles/s41586-026-10176-5

I think there will be a new field of epistemology soon to study how knowledge works in LLMs and similar transformer based AIs. Clearly there’s a lot of meaning that is embedded in the NN parameters, and I guess this will be an encoding more similar to DNA than to how ideas in our heads work (a more chaotic emergent structure), but who knows.

Researchers in this field might not consider themselves to be doing any philosophy work (kind of like how AI researchers are inductivists and don’t really think about epistemology much).

Do you think LLMs will be able to generate average TV shows any time soon?

I’m skeptical. AI is fine at generating average single images. The more complexity you add to a single image, the worse AI is at making an average one. And even “good” AI art is very distinct and obvious to anyone with a little bit of artistic training/skill.

Even a short TV show is exponentially more complex than a single still image. And approximately everyone is familiar with narrative structure, even if only implicitly. So I think the issue with AI art that mostly only applies to people who know art will apply to roughly all viewers of an AI TV show.

Is there any reason to think LLMs are close to being able to create more coherent long form narratives? My understanding is that even among AI advocates it is pretty well known that AI generated narrative writing needs extensive editing/rewriting in order to be coherent. But maybe I’m missing something.

Yeah – within the next few years. It might be possible to generate a full TV show by the end of this year (but expensive). Also we have video/image generators specifically for Anime now too (not just fine-tuned to prefer it from what I gathered), and it might be a lot easier/cheaper to generate Anime shows than live-action ones.

Have you seen those like micro-dramas that are becoming a thing with short form video? Each episode is like a minute long. I think we’re at or close to being able to generate those.

What are the impediments to generating longer TV slop? SOTA models today have issues with character and detail persistence, but those are being worked on actively. The script and stuff like camera direction, voice/actor direction, setting, all that can be done in text and therefore generated. TTS and voice cloning are pretty good now, and getting to the point of adding laughing and the like. Multimodal LLMs could provide feedback and edit instructions directly from video or ADR. We already have text-based photoshop, so text based video editing isn’t far behind. And if the characters look bad, well with deep-fake face swapping much of that can be solved in post. So I think we’re pretty close to being able to generate a TV show at all, then the only question is how long before it’s on demand.

They already can, I think. At least coherent enough. You might not be able to do this via chatgpt’s main UI, though. Really you want a coding agent (they’ll happily generate narratives) because you can easily do stuff like have overarching plot structure, world building docs, rounds of feedback (and instruct the LLM to impersonate various authors to give specific kinds of feedback), iteration for consistency… hell why not add an agent for boringness detection to reorganize scenes, change PoVs, set something up sooner to add tension, etc.

Just to be explicit: the bar I’m setting isn’t very high. It would still obviously be AI to anyone who is familiar with it, and a lot of people might not be able to stand watching or consuming the content. There can be plot holes, just not more egregious than those in average TV shows (and there are plenty of those).

I don’t know much about that side of things (the public discussion) besides hearing authors complain about it (on YT), and some experiments I’ve run. I agree that chat apps aren’t very good (though I got a passable hour-long audiobook out of gemini 2.5 – it was good enough to half-listen to while driving).

Maybe we have different standards of coherence? If you mean that there are no contradictions, yeah maybe we’re a ways away.

But if an ‘author’ (or book-orchestrator) has a good background doc for a character and can check each scene against that (× n characters), I don’t see why those things can’t be ironed out. It’s just a matter of time, cost, etc. (Feeding a long book into claude could cost a few bucks each time, so doing that per character per scene might get really expensive, but all those costs will drop soon anyway.)
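A rough sense of that per-character, per-scene cost. All the numbers here are illustrative assumptions (scene counts, token sizes, and the $/token rate), not real pricing:

```python
# Cost of checking every scene against every character background doc.
# Every number below is an assumption chosen for illustration.

def review_cost(scenes, characters, tokens_per_check, usd_per_m_tokens):
    """Total checks = scenes x characters; token cost scales linearly."""
    checks = scenes * characters
    total_tokens = checks * tokens_per_check
    return checks, total_tokens * usd_per_m_tokens / 1e6

# eg 120 scenes, 6 main characters, ~5k tokens per check
# (scene text + character doc), at a hypothetical $3 per million tokens
checks, usd = review_cost(120, 6, 5_000, 3.0)
# 720 checks over 3.6M tokens – about $10.80 per full consistency pass
```

So even naive scene-by-character checking is tens of dollars per pass, not thousands, and it parallelizes trivially since each check is independent.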

I saw this (from a university student) and it occurred to me that unis could, or will, go back through papers from 2022 onwards and rescan them for AI. A lot of people who used simple methods to cheat (like ‘rewrite this in X style’) are going to get caught. Detection methods keep updating, but a plagiarized essay uploaded to the e-learning portal is forever.

https://www.reddit.com/r/unsw/comments/1ru07cu/i_got_flagged_for_ai/

I mean I’m not gonna lie I pretty much used AI to do the whole thing ( I took out all the m dashes & words that were too big bc who knows what tf a heteroskeplasticity is) but they still caught me and gave me a 0

Tf do I do now imma lowkey fail this course

Gotcha, I think we have different standards for what we mean by “average.”

As an aside: I also think the term “plot hole” is overused to the point of almost becoming meaningless as a result of poor media literacy across culture. Like, plot holes are a thing that exists, but most things people call plot holes aren’t. And most people don’t really understand what they’re complaining about when they complain about plot holes. So I don’t have much confidence about your assessment about those being the main issue with AI narratives.

Yeah, different standards for coherence as well as different standards for “average” I think.

What you’re describing sounds really glaringly bad to me. But I don’t have any idea about the quality of like, the mathematical average of all TV shows or something. Such a thing might be quite bad/incoherent, or it might not, I have no clue. There are way more TV shows than I have time to watch or make judgements about, so I think determining what some true mathematical average is like is approximately impossible for me.

I was thinking of my own standards for average, which is something like: I find out about a show that in some way gets my curiosity/attention enough that I might want to watch it. Then I watch it and I do not think it is amazing, but I also don’t think it’s bad. I usually finish watching it and consider it okay and not a waste of time.

That’s what I mean by an “average” TV show. My suspicion is that is what most people will mean when they say “average TV show” whether they are consciously aware of their meaning or not.

By that standard, the types of shows you are describing sound way below average to me. But maybe I’m out of the loop on the quality of AI. You said you think this is achievable within a year… do you have any examples of AI generated narratives that you would consider close? Just a bit below average?

Ahh, I see one source of confusion. I’ll answer quickly first: no I don’t have any examples for narratives.

I should have made this more obvious, but my standards for what could be done this year are lower than ‘average’.

Specifically, I think by the end of this year we’ll be able to generate a tv show at all. My standards for that are more technical: enough consistency and control to generate characters and a world that is superficially a TV show. It’s on the order of difficulty of generating a movie’s worth of consistent footage. So at that point, with a human writing/directing (AI assisted or not), we could see a ‘fully’ generated TV show.

AI generating a complete ‘average’ TV show on demand I think is a few years away (1-3), assuming we don’t hit some kind of wall. I think we can probably do it with models the size we have now (~1T params) but fine-tuned to be better at writing fiction or creative direction or whatever. If we stop being able to make models bigger, the next logical thing is improving training (which is already going on with all the post-training being done with like opus 4.5 → 4.6, and gpt 5 → … → 5.4).

Maybe I think too little of ‘average’ TV. I’ve never watched much, so maybe I’m not the best person to estimate it. Maybe I’m thinking of more like the 20th percentile or something.

Well you’d need to convert to a single unit and that probably doesn’t work well and would be hard to agree on (I’m reminded of MFDMM etc). But we might agree more about like buckets of TV shows (eg 5 buckets from really bad to really good) and that kind of breakpoint-y agreement is good enough.

https://www.reddit.com/r/ExperiencedDevs/comments/1rzp29a/this_week_ai_has_killed_one_more_thing_my_passion/

I thought this was interesting and relevant to what happens with AI in the future.

Re AI and creating TV shows, not about plots/writing but:

This lets you say “hey claude, trim all silences and add a cut at every scene change” and it will do that in Final Cut Pro for you.

Video editing is an overly manual industry. The major editing programs are not designed for use by programmers who want heavy automation. They’re designed for people with a different skill set.
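As an example of the kind of scripting that’s awkward inside the editors themselves: ffmpeg’s silencedetect filter (run as `ffmpeg -i in.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`) logs silence intervals that a small parser can turn into cut points. A sketch, assuming that log format; the thresholds above are arbitrary:

```python
import re

def silence_intervals(ffmpeg_log):
    """Parse silencedetect log lines into (start, end) pairs in seconds.

    The filter writes lines like:
      [silencedetect @ 0x55] silence_start: 3.2
      [silencedetect @ 0x55] silence_end: 5.0 | silence_duration: 1.8
    An edit list (or a 'trim all silences' agent) can then cut these spans.
    """
    starts = [float(s) for s in re.findall(r"silence_start: ([\d.]+)", ffmpeg_log)]
    ends = [float(s) for s in re.findall(r"silence_end: ([\d.]+)", ffmpeg_log)]
    return list(zip(starts, ends))
```

An agent wrapping something like this is doing text-in, text-out work end to end, which is exactly where LLM automation is already comfortable.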

Why will costs drop? The AI datacenters are already at a big enough scale for efficiency. They’re making multiple big data centers instead of even bigger ones because even bigger isn’t clearly better. GPUs and other parts are already being mass produced at scale. Where will the big gains come from? Silicon design? Better training algorithms? I’m not saying it’s impossible but I wouldn’t assume it in the same way it’s a pretty safe assumption that new products selling 5000 units per year get a lot cheaper when they scale up to 5 million units a year.


That makes sense to me.

Just to be clear: I am way less skeptical of the idea that large swathes of TV show creation could theoretically be streamlined with AI tools. I view that as fundamentally a very different question than the one of AI generating entire decent quality narrative shows with minimal human guidance.

The reddit post got removed by mods. First time I got to use cf-archiver for its purpose and I almost forgot it. Thanks @Max!


Uh yeah good question. I have a contradictory prediction that right now is maybe the cheapest inference we’ll see for a while due to the amount of subsidization happening – so planning around cheap inference might not be safe. At some point someone will hit the brakes and Anthropic will wind back the like 25x you get through the $200/mo plan to something more reasonable (on the $200/mo plan you can do up to like $5k/mo priced at API costs).

There are some reasons I can think of that LLMs might get cheaper, but these are post-hoc and I wasn’t thinking about them at the time:

With chips, we have a few new chips coming online (see Groq (with a q) and Cerebras), so that might be one avenue. Another is architectural changes to models and how inference is done.

I think we’ll see small models continue to improve, both from fine tuning and distillation (training smaller models from outputs of bigger ones). There are also a few new architectures that are promising at the small scale (which may or may not pan out).