Context Window

In December 2025, Alex Zhang and collaborators at MIT published a paper called "Recursive Language Models" (RLM). The core result: GPT-5 scores 0% on a retrieval task over 1000 documents (10M+ tokens). The same GPT-5 wrapped in an RLM scores 91.3%. Same model. The difference is how context is managed. That result stuck with me, and I’ve been building recursive agent systems on top of the idea since. This post is about what RLM actually is, why it works, and what I’ve learned running it on real tasks. ...