AI Predictions

Two recent things I heard about that might drop costs, both related to the KV cache:

  • Google’s recent TurboQuant research.
  • a way of eliminating the KV cache and using a residual vector instead. A home researcher on YT had a video about this on a Mac Studio; I think he got a DeepSeek model with a large context window from around 4 tok/s to around 20 tok/s, and it no longer needed tens of GB of memory for the KV cache (rough arithmetic on that memory cost just after this list). (I can try finding the video again if anyone is interested.)
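For a sense of why the cache eats tens of GB in the first place, here’s a back-of-envelope sketch. All the dimensions are made-up assumptions for a plain multi-head-attention transformer, not the real DeepSeek architecture (which uses MLA to shrink the cache):

```python
# Back-of-envelope KV cache size for a plain multi-head-attention
# transformer. All dimensions below are illustrative assumptions, not
# the real DeepSeek architecture (which uses MLA to compress the cache).

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values; one cached K and V per layer per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 60-layer model, 8 KV heads of dim 128, 128k context, fp16:
gb = kv_cache_bytes(60, 8, 128, 128_000) / 1e9
print(f"~{gb:.1f} GB of KV cache per sequence")  # ~31.5 GB
```

Quantizing that to fewer bits shrinks it proportionally; eliminating it entirely is what makes the single-user speedups above plausible.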

Edit: the second one might not work well with batching, which would make it good for home users but not useful for providers. Or I might be mixing it up with a third, similar technique I heard about elsewhere that was also about KV cache efficiency; can’t quite remember.

I think that, pretty soon, people will start thinking about an AI’s maximum level of indirection. I haven’t heard anyone discuss this, but it occurs to me that it’s (maybe) responsible for some of the leaps and bounds we’ve been seeing, including mythos’s ability to break out of sandboxes (which, presumably, is something it can decide to pursue on its own if it gets stuck).

By indirection I mean the same thing CF means by it when it comes to goals and indirect goals.

A higher max level of indirection means keeping more of the current ‘stack’ of goals in mind, and lets the model go off autonomously and do complex things in service of the main goal. For example, if you ask it to fix your microphone on Linux, it might end up cloning pulseaudio and fixing a bug there, or fixing a driver bug in some kernel module and setting it up for you. (There’s a toy sketch of the goal-stack framing below.)
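Here’s what I have in mind, with indirection depth as an explicit budget. Every name here is a hypothetical illustration, not any real agent framework’s API:

```python
# Toy sketch of the 'goal stack' framing: the agent pushes a subgoal
# each time it hits an obstacle, bounded by a maximum indirection depth.
# Everything here is a hypothetical illustration, not a real agent API.

from dataclasses import dataclass, field

@dataclass
class GoalStack:
    max_indirection: int                  # deepest allowed subgoal nesting
    stack: list = field(default_factory=list)

    def push(self, subgoal: str) -> bool:
        if len(self.stack) >= self.max_indirection:
            return False                  # too indirect: stop and ask the user
        self.stack.append(subgoal)
        return True

    def pop(self) -> str:
        return self.stack.pop()           # subgoal done: resume the one above

goals = GoalStack(max_indirection=4)
for g in ["fix microphone on linux",
          "diagnose pulseaudio",
          "clone and build pulseaudio",
          "patch the resampler bug"]:
    goals.push(g)
print(goals.stack)                        # depth 4: right at the budget
```

A model with a low max indirection would refuse or lose the plot at depth 2 or 3; a high one can wander several levels deep and still come back and finish the original task.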

This is interesting. Is maximum level of indirection the same thing agent researchers are calling planning depth or task horizon, or are you pointing at something different?

And do you think frontier models actually have this built in somewhere internally, or does the apparent depth mostly come from prompt chains, tool loops, the agent harness, and other scaffolding keeping the goal stack for them?

Maybe it’s the same thing as planning depth, but time horizon is different: time horizon would be improved by better planning depth and/or a higher max indirection.

But it’s also different from ‘deep planning’ (like deep research): this isn’t necessarily about task decomposition.

I’m not familiar with what ‘planning depth’ means outside of explicit planning/task decomposition, so maybe they are the same thing.

By max level of indirection I mean the kind of thing where a model can ‘see’ a number of steps into a problem, which lets it link non-obvious things or know to go in an unintuitive direction. Before it prints out ‘oh, I could try XYZ’, where does the XYZ come from? The context plays into it, but isn’t sufficient to explain where the idea to link XYZ comes from.

Yeah, I think there is something internal going on. There’s definitely some apparent depth that comes from the context (prompts and tools), but it’s not about context size itself, and it seems somewhat independent of how much is in context.

My guess is that the residual vector (living in a very high-dimensional space) carries a lot of ‘raw’ information below the level of tokens: information about the stack above the current step (larger goals, position in the goal stack, etc.), but also about the epistemic context around it (a kind of simultaneous ‘awareness’ of many different facts and explanations, and an ability to find/detect relationships between them). There’s a toy numerical sketch of that intuition below.
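To make that concrete: in a high-dimensional space, random directions are nearly orthogonal, so one residual vector can superpose many independent ‘facts’ that can each be read back out. The width and the ‘goal direction’ names here are made up for illustration:

```python
# Toy sketch of the intuition: the residual stream is one high-dimensional
# vector per position, and in high dimensions nearly-orthogonal random
# directions can carry independent 'facts' at once. Numbers are made up.

import numpy as np

rng = np.random.default_rng(0)
d_model = 4096                             # assumed residual width

subgoal_dir = rng.standard_normal(d_model) # hypothetical 'current subgoal' direction
parent_dir = rng.standard_normal(d_model)  # hypothetical 'parent goal' direction

# Superpose both pieces of information in a single residual vector.
residual = 2.0 * subgoal_dir + 0.5 * parent_dir

def read(direction, vec):
    # Project the residual onto a direction to read that 'fact' back out.
    return vec @ direction / (direction @ direction)

print(read(subgoal_dir, residual))         # ~2.0 (cross-talk is tiny)
print(read(parent_dir, residual))          # ~0.5
```

If something like this is going on, the ‘where did XYZ come from’ question becomes: the residual was already carrying XYZ-adjacent directions before any token about XYZ got printed.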