AI Predictions

About nine months ago I had an idea about co-training an LLM on both DNA and English text and then just asking it what the DNA does. (I don't have a verifiable record of this.)

Anyway, it seems someone is on the way to doing this with evo2. It's not there yet – it's only the DNA half – but the training datasets are comparable in size (9T base pairs vs. the ~10-30T tokens used for frontier LLMs), so I don't think current technical limits will be the constraint: SOTA models are about one step away from handling both via multimodal training. Another interesting thing is that evo2 has a large context window of 1M tokens (it tokenizes at single-nucleotide resolution, so that's 1M base pairs). That's roughly where SOTA models are right now, which means we're pretty close to being 'big enough' for many genome-related problems. And once DNA becomes a modality, it gets synergy with other AI features like tool use, which can cut context load considerably.

Also, now we need to worry about someone 'jailbreaking' an AI into designing some super-pathogenic virus and then synthesizing it.

https://www.nature.com/articles/s41586-026-10176-5