Discussion about this post

Thomas Larsen:

Thanks for the post!

I agree with a lot of what you're saying at a high level -- in fact my median timeline to Superhuman Coder from AI 2027 is more like 2031.

I disagree that data is likely to be the key bottleneck, and am more sold on compute and algorithms as the main bottlenecks. Thought experiment: suppose the number of high-quality internet tokens were 10x or 100x smaller. Would timelines lengthen a lot because we have even more of a data bottleneck? I don't think so.
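To put rough numbers on it: under naive Chinchilla-style scaling (tokens ≈ 20 × params, training FLOPs ≈ 6 × params × tokens -- standard approximations, and the token budgets below are made up), shrinking the corpus 10x shrinks the compute-optimal model 10x and the compute-optimal training run 100x. But labs already train past this regime with repeated epochs, synthetic data, and RL compute, which is part of why I don't expect the naive calculation to bind:

```python
# Back-of-the-envelope: what a 10x or 100x smaller token budget means
# under naive Chinchilla-style scaling. The 20-tokens-per-parameter
# ratio and the ~6*N*D training-FLOPs rule are standard approximations
# (Hoffmann et al. 2022); the 15T-token baseline is illustrative.

TOKENS_PER_PARAM = 20        # approximate compute-optimal ratio
FLOPS_PER_PARAM_TOKEN = 6    # approximate training FLOPs per param per token

def compute_optimal(tokens: float) -> tuple[float, float]:
    """Given a fixed token budget seen once, return the (params,
    training FLOPs) of the compute-optimal run."""
    params = tokens / TOKENS_PER_PARAM
    flops = FLOPS_PER_PARAM_TOKEN * params * tokens
    return params, flops

for budget in (15e12, 1.5e12, 0.15e12):   # baseline, 10x less, 100x less
    params, flops = compute_optimal(budget)
    print(f"{budget:.1e} tokens -> {params:.1e} params, {flops:.1e} FLOPs")
```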

A few specific comments:

> The Problem with Extrapolation

I think there are two arguments you might be making here.

1. My first interpretation was: To get AGI, you need to have broad capabilities. The time horizon trends depend on domain, and so naively extrapolating them isn't a reliable way to forecast AGI.

I think that's incorrect, because our model is that AI research gets automated first, and then all of the other tasks fall (e.g. vision, driving, math, persuasion, etc.). So we only care about the time horizon and capability level wrt coding/AI research, not the others.

2. Another thing you might be saying is: to get a superhuman coder that can automate research, you need to be good at all sorts of domains, not just coding.

But I also disagree with this -- I think there's only a pretty small set of domains that we care about. For example, the Epoch post cites chess iirc. But I don't think the chess time horizon will be a limiting factor for automating coding; I think reasoning, coding, creativity, etc. are what will be necessary.

> Moore’s Law & Paradigms

I'd be quite surprised if "RL with verifiable rewards doesn’t elicit new capabilities". Huge if true. But idk, it's unclear what this even means; for example, OAI's and DeepSeek's training runs show huge performance gains from RL. I haven't read the paper yet, so I might be misunderstanding.
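For concreteness, my reading of "RL with verifiable rewards" is the setup where the reward comes from an automatic checker rather than a learned reward model -- roughly something like this toy sketch (not any lab's actual pipeline):

```python
import os
import subprocess
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Binary reward from an automatic checker: 1.0 if the model's
    code passes the task's unit tests, else 0.0. 'Verifiable' because
    no learned reward model is involved. Illustrative only; real RLVR
    pipelines add sandboxing, resource limits, and anti-reward-hacking
    checks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "attempt.py")
        with open(path, "w") as f:
            f.write(candidate_code + "\n\n" + test_code)
        try:
            result = subprocess.run(
                ["python", path], capture_output=True, timeout=10
            )
        except subprocess.TimeoutExpired:
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0

# e.g. verifiable_reward("def add(a, b):\n    return a + b",
#                        "assert add(2, 3) == 5")
```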

I like the overlap of the two sigmoids; I think it's a helpful illustration that this trend is much, much less of a steady curve than Moore's law. I don't think we really have enough data to be confident in the extrapolation.
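You can see why extrapolation is dicey by just adding two logistic curves -- in the overlap region the sum looks like one clean trend. All parameters here are made up for illustration:

```python
import numpy as np

def sigmoid(t, midpoint, rate, ceiling):
    """Logistic capability curve for one paradigm."""
    return ceiling / (1.0 + np.exp(-rate * (t - midpoint)))

t = np.linspace(2015, 2035, 201)
# Hypothetical parameters: an earlier paradigm (e.g. pretraining
# scaling) saturating while a later one (e.g. RL on agentic tasks)
# ramps up. None of these numbers come from the post.
early = sigmoid(t, midpoint=2021.0, rate=0.8, ceiling=1.0)
late = sigmoid(t, midpoint=2028.0, rate=0.8, ceiling=2.0)
total = early + late

# In the overlap region the sum looks like one smooth trend; a curve
# fit there says nothing about whether a third sigmoid shows up.
for year in (2020, 2024, 2028, 2032):
    print(year, round(float(np.interp(year, t, total)), 3))
```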

> Relevant Data is Key → Workflow Data is Key

Yeah, this is a classic crux. The million-dollar question re: timelines is "to what extent do we need long-horizon training samples?" My guess is that the bottleneck is less data and more compute -- see the classic BioAnchors debates between Daniel and Ajeya. The short-timelines view basically has to rely on huge amounts of generalization. Why might that be a reasonable view?

One intuition driving the short-timelines view for me is that long-horizon tasks can be factored into short-horizon tasks: you can break a long-horizon task into figuring out the first step, executing it, then reorienting and thinking about what the next step is.
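Schematically, the claim is that the long horizon can live in an outer loop while every individual model call stays short-horizon. The interface below is hypothetical, just to illustrate the decomposition:

```python
from typing import Any

def run_long_horizon_task(goal: str, model: Any, max_steps: int = 100) -> dict:
    """Factor a long-horizon task into short-horizon model calls:
    pick the next step, execute it, reorient on the new state, repeat.
    Each call only needs short-horizon competence; the long horizon
    lives in the outer loop. Hypothetical interface, illustrating the
    decomposition argument rather than any lab's agent scaffold."""
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        step = model.plan_next_step(state)    # short-horizon: "what's next?"
        result = model.execute(step)          # short-horizon: "do it"
        state["history"].append((step, result))
        if model.is_done(state):              # short-horizon: "reorient"
            break
    return state
```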

Harjas Sandhu:

This is an excellent post. To add to the argument:

> A fifth of new businesses fail within the first year, and the majority fail within a decade—unwittingly training on these processes seems undesirable, especially if you’re interested in creating a highly capable agent.

Even worse, it’s possible that some of these businesses were executed well and just got unlucky, whereas other, successful businesses might have used bad processes and gotten carried by structural factors like lobbying or capital.

This is also assuming that there are in fact generalizable lessons to take from business processes. I think that’s probably true, but it depends on your views about why businesses succeed and fail. How do you teach an AI to stop “resulting” when its entire training paradigm is probabilistic?
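A toy simulation of why outcome labels ("did this business succeed?") are a noisy signal for process quality -- all numbers made up:

```python
import random

random.seed(0)

def observed_success_rate(p_success: float, n: int = 10_000) -> float:
    """Fraction of n runs of a process that happen to succeed."""
    return sum(random.random() < p_success for _ in range(n)) / n

# Made-up numbers: a good process that succeeds 60% of the time and
# a bad process that succeeds 35% of the time.
good = observed_success_rate(0.60)   # ~40% of good-process runs look "bad"
bad = observed_success_rate(0.35)    # ~35% of bad-process runs look "good"

print(f"good process, observed successes: {good:.0%}")
print(f"bad process, observed successes:  {bad:.0%}")
# Training on "whatever succeeded" keeps the lucky bad processes and
# drops the unlucky good ones -- outcome labels are noisy proxies for
# process quality.
```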
