Excited for the future!

I’m Sabareesh, a curious researcher exploring Large Language Models (LLMs) and reinforcement learning. By studying the inner workings of LLMs, I’m working to better understand their capabilities, uncover insights, and contribute meaningfully to these transformative fields.

Let’s create something extraordinary together!

Teaching Qwen3-4B to Trade: From Hold-Collapse to +9.4% Returns

Can you turn an LLM into a profitable trader? I spent three months finding out. This post covers the full arc: a 5-stage supervised fine-tuning pipeline on Qwen3-4B, a catastrophic failure mode I had to diagnose and fix, the checkpoint that hit +9.4% returns with perfect format validity, and why supervised learning hit a ceiling that only RL can break through.

The Setup

The model is Qwen3-4B. The task: given 30 days of OHLCV data plus 20+ quantitative features (RSI, MACD, volatility, beta, etc.) for a stock, output a structured JSON trade plan inside <think>...</think><answer>{plan}</answer> tags. The plan includes decision (enter/hold), side (long/short), stop loss, take profit, holding days, and position size. ...
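To make the output format concrete, here is a minimal sketch of parsing and validating a response in the `<think>...</think><answer>{plan}</answer>` shape described above. The example response text, JSON key names, and value types are assumptions inferred from the field list in the summary, not the post's actual schema.

```python
import json
import re

# Hypothetical model output in the format described above.
raw = (
    "<think>RSI is oversold and volatility is contracting.</think>"
    '<answer>{"decision": "enter", "side": "long", "stop_loss": 0.97,'
    ' "take_profit": 1.06, "holding_days": 5, "position_size": 0.10}</answer>'
)

# Extract the JSON plan from the <answer> tag and check the required keys
# (this key set is an assumption based on the fields named in the summary).
match = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
plan = json.loads(match.group(1))
required = {"decision", "side", "stop_loss", "take_profit",
            "holding_days", "position_size"}
assert required <= plan.keys()
print(plan["decision"])  # -> enter
```

A check like this is also what "perfect format validity" implies: every sampled response must parse through this path without error.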

March 13, 2026 · 6 min · Sabareesh

MCP Compact: Keep Agent Context Lean

The problem: MCP agents return bulky tool outputs (screenshots, DOM dumps, network traces) and quickly blow past context limits. Downstream steps stall or get fuzzy because the signal is buried.

TL;DR: MCP Compact sits between your agent and your MCP server, summarizes noisy tool outputs per-tool, and keeps context lean (e.g., a 109k-token DOM dump -> 8.9k tokens) without changing agent code.

What MCP Compact does: it sits between your agent and the upstream MCP server, forwards every tool call, and summarizes the response with an LLM. You set per-tool rules (token budget, what to preserve), and the proxy enforces them automatically. ...
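A rough sketch of the per-tool-rule idea, assuming a simple mapping from tool name to budget. The tool names, the `TOOL_RULES` structure, and the `compact` function are all hypothetical, and crude character truncation stands in for the LLM summarization step the post actually describes:

```python
# Hypothetical per-tool rules: a token budget plus a note on what the
# summarizer should preserve. The real proxy would pass "preserve" hints
# to an LLM; here we only enforce the budget by truncation.
TOOL_RULES = {
    "browser.get_dom": {"token_budget": 9000, "preserve": "form fields, links"},
    "browser.screenshot": {"token_budget": 500, "preserve": "visible text"},
}

def compact(tool_name: str, response: str) -> str:
    """Enforce a tool's budget on its response (~4 chars per token)."""
    rule = TOOL_RULES.get(tool_name)
    if rule is None:
        return response  # no rule configured: forward unchanged
    budget_chars = rule["token_budget"] * 4
    if len(response) <= budget_chars:
        return response
    return response[:budget_chars] + " [truncated]"

out = compact("browser.screenshot", "x" * 10_000)
print(len(out))  # well under the original 10,000 characters
```

The key design point is that the agent never changes: it still sees an ordinary tool response, just a smaller one.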

November 20, 2025 · 2 min · Sabareesh

All You Need is 4x 4090 GPUs to Train Your Own Model

How I built an ML rig for training LLMs locally, exploring hardware choices, setup tricks, and lessons learned along the way.

December 28, 2024 · 6 min · Sabareesh

Defining AGI

A thoughtful exploration of Artificial General Intelligence (AGI) through three fundamental concepts.

December 28, 2024 · 1 min · Sabareesh

Embarking on My Journey into LLM

Join a curious engineer’s quest into the fascinating world of Large Language Models (LLMs). From tinkering with GPUs to unraveling the mysteries of architectures like Llama2, this journey is filled with challenges, breakthroughs, and the relentless pursuit of understanding AI’s limitless potential.

December 27, 2024 · 5 min · Sabareesh