Sabareesh Subramani
https://sabareesh.com | LinkedIn | [email protected]
Summary
Machine learning engineer building end-to-end AI trading systems: pre-training LLMs from scratch, fine-tuning Qwen3-4B on proprietary financial datasets, and training trading decision models with GRPO reinforcement learning. Operates custom GPU infrastructure (4x RTX 4090 + 2x RTX PRO 6000 Blackwell) for continuous experimentation. Developed a multi-stage waterfall pipeline (equity knowledge injection → instruct alignment → stock prediction → trade execution) achieving 100% format validity and a +9.4% portfolio return. Designed microagent architectures for autonomous equity research and a multi-agent personal AI runtime. 14 years of production software engineering experience, most recently as CTO, with deep expertise in scalable systems, Kubernetes, and cloud infrastructure.
Technical Skills
- LLM Training: PyTorch, torchtune, torchao (float8/4-bit quantized training), TRL, NeMo-RL (GRPO/DAPO), DDP, FSDP/FSDP2, torchrun, vLLM, llama.cpp, Cut Cross-Entropy
- Model Architectures: Qwen3-4B, Llama 2/3, DeepSeek R1, GPT-2/NanoGPT, custom Transformer encoders
- RL for LLMs: GRPO, DAPO, PPO, TorchRL, volatility-normalized reward functions, counterfactual opportunity regret, hold-penalty scheduling, reward shaping for trading
- Agent Systems: OpenAI Agents SDK, MCP (Model Context Protocol), multi-agent orchestration, context compaction, semantic memory, WandB Weave observability
- ML Infrastructure: WandB, HuggingFace Hub, CUDA, Ray, multi-GPU training (4x RTX 4090, 2x RTX PRO 6000 Blackwell), checkpoint management, vLLM evaluation pipelines
- Languages: Python, Java, Swift, C, C++, TypeScript, C#
- Backend & Infrastructure: FastAPI, Spring Boot/Cloud, SQLAlchemy, Docker, Kubernetes, Azure, AWS, SQL Server, Kafka, Elasticsearch
AI & ML Projects
Qwen3-4B Financial Trading Model — SFT + RL Pipeline
Multi-stage waterfall fine-tuning and reinforcement learning pipeline for training an autonomous equity trading decision model on Qwen3-4B.
Supervised Fine-Tuning (5-stage waterfall):
- Built dataset pipelines exporting 5,176 equity reports, stock prediction data (103K records across 94 symbols), and trading decisions from SQL Server into structured JSONL.
- Trained through 5 stages: equity knowledge injection, Alpaca instruct alignment, stock prediction (achieved ~5-6% MAPE), stateful trading decisions, and distributed training.
- Developed weighted CCE loss (using Apple’s Cut Cross-Entropy — 4.7x faster, 3.2x less memory) with conditional per-field, per-sample weights. Enter-critical fields (decision, side, stop_loss) weighted 4-8x; hold-null fields suppressed.
- Diagnosed and solved a hold-collapse failure mode in which the model learned a trivial all-hold policy due to class imbalance. Identified the epoch-12 checkpoint as the optimal recovery base over epochs 15 and 18.
- Key result, best checkpoint (v2 weighted CCE): 100% format validity, 13.3% enter rate, 136 trades, $109,428 final equity (+9.4% return) on a 1,024-record portfolio evaluation.
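The conditional per-field weighting above can be sketched as follows. This is a minimal, illustrative version: field names, weight values, and the plain per-token cross-entropy are assumptions for exposition; the actual pipeline uses Cut Cross-Entropy rather than a hand-rolled loss.

```python
import math

# Illustrative weights (hypothetical values): enter-critical fields are
# upweighted 4-8x; fields that are null on "hold" samples are suppressed.
FIELD_WEIGHTS = {"decision": 8.0, "side": 4.0, "stop_loss": 4.0, "other": 1.0}
HOLD_NULL_FIELDS = {"side", "stop_loss"}  # null when decision == "hold"

def token_ce(logits, target):
    """Cross-entropy of one token: -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def weighted_ce(token_logits, targets, token_fields, is_hold):
    """Weighted mean of per-token CE, scaled by each token's JSON field."""
    total, weight_sum = 0.0, 0.0
    for logits, tgt, field in zip(token_logits, targets, token_fields):
        # Suppress loss on fields that carry no signal for hold decisions.
        w = 0.0 if (is_hold and field in HOLD_NULL_FIELDS) else FIELD_WEIGHTS[field]
        total += w * token_ce(logits, tgt)
        weight_sum += w
    return total / max(weight_sum, 1.0)
```

Suppressing hold-null fields keeps the dominant hold class from drowning out the enter-critical tokens, which is one way to counter the class imbalance behind hold-collapse.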
Reinforcement Learning (current phase):
- Pivoting from SFT to GRPO after SFT hit its ceiling: the model imitates the oracle at an 80% match rate but captures <10% of oracle alpha (the oracle uses future information).
- Designed a volatility-normalized, bounded reward function in [-2, +2]: 0.8 * pnl_score - 0.2 * dd_penalty, using tanh(return / sigma_h) for enters; counterfactual opportunity regret for holds (penalizes missing >1.25-sigma moves).
- Research survey covering Trading-R1 (Sharpe 2.72 on NVDA), FLAG-Trader (LLM as RL policy with LoRA + PPO), Alpha-R1, HCAPO (hindsight critic), and DAPO at FinRL Contest 2025 (230% cumulative return).
- Designed “Living Trader” architecture with 4 loops: real-time inference, daily experience collection, weekly RL training, continuous self-research.
- Scientific methodology: maintained an experiment journal with 13+ dated entries, formal scientific reports, and per-experiment leaderboards with 12 ranked runs; defined an alpha thesis with explicit falsification criteria and automated phase gates for SFT → RL → regime adaptation → paper deployment.
- Technologies: torchtune, TRL, NeMo-RL (Ray + vLLM), Cut Cross-Entropy, vLLM, llama.cpp, PyTorch distributed, SQL Server
- Hardware: 2x NVIDIA RTX PRO 6000 Blackwell (96GB each), 4x RTX 4090
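The reward shaping described in the RL bullets can be sketched as below. Only the functional form is taken from the description above; the drawdown normalization and the exact shape of the hold regret are assumptions for illustration.

```python
import math

SIGMA_MOVE_THRESHOLD = 1.25  # holds are penalized for missing >1.25-sigma moves

def clip(x, lo=-2.0, hi=2.0):
    """Bound the reward to [-2, +2]."""
    return max(lo, min(hi, x))

def enter_reward(trade_return, sigma_h, drawdown_frac):
    """Volatility-normalized enter reward: 0.8 * pnl_score - 0.2 * dd_penalty."""
    pnl_score = math.tanh(trade_return / sigma_h)   # squash return in units of sigma
    dd_penalty = drawdown_frac / sigma_h            # plausible normalization (assumed)
    return clip(0.8 * pnl_score - 0.2 * dd_penalty)

def hold_reward(missed_return, sigma_h):
    """Counterfactual opportunity regret: penalize holds that miss large moves."""
    z = abs(missed_return) / sigma_h
    if z <= SIGMA_MOVE_THRESHOLD:
        return 0.0  # small moves cost nothing to sit out
    return clip(-math.tanh(z - SIGMA_MOVE_THRESHOLD))
```

The tanh normalization keeps a lucky trade in a high-volatility regime from dominating the gradient, which is the point of volatility-normalized rewards.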
Autonomous Equity Research Agent
- Designed and built an autonomous equity research and watchlist curation system using the OpenAI Agents SDK with MCP browser automation.
- Implemented a microagent architecture (Topic Scanner → Symbol Assessor → Job Orchestrator) to solve context overflow problems in monolithic agent designs.
- Integrated confidence-scored watchlist management, social sentiment analysis, and market cap filtering with full observability via WandB Weave.
- Technologies: OpenAI Agents SDK, MCP, SQL Server, SearXNG, WandB Weave, asyncio
Jarvis — Multi-Agent Personal AI Runtime
- Built a personal AI assistant runtime with event-driven Signal messaging, SQL-backed semantic memory, and a layered persona/identity system (SOUL, IDENTITY, BOOTSTRAP).
- Features heartbeat-driven periodic automation, local skill execution (E*TRADE integration, linked-account transactions, health data explorer), and multi-provider AI support (Claude, OpenAI, Codex).
- Deployed as systemd services with Signal CLI daemon for real-time communication.
- Technologies: Python, SQLAlchemy, Signal CLI (JSON-RPC + SSE), systemd, OpenAI/Claude/Codex APIs
Agent SDK — Reusable AI Agent Framework
- Built a clean, extensible SDK for AI agents with tool-calling capabilities, MCP integration, and automatic semantic context compaction at 80% token limits.
- A factory pattern enables multi-agent systems through specialist agent composition; the SDK powers the equity research agent and other projects.
- Technologies: Python, OpenAI API, MCP, Playwright (browser automation via MCP), Logfire
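The 80%-threshold compaction described above can be sketched as a simple trigger. This is a hypothetical outline, not the SDK's implementation: `count_tokens` and `summarize` stand in for a tokenizer and an LLM summarization call.

```python
COMPACTION_THRESHOLD = 0.8  # compact once usage crosses 80% of the token limit

def maybe_compact(messages, count_tokens, summarize, token_limit, keep_recent=4):
    """Replace older messages with a single summary when over budget.

    count_tokens: callable(list) -> int, total tokens in the history
    summarize:    callable(list) -> message, e.g. an LLM summarizer
    """
    if count_tokens(messages) < COMPACTION_THRESHOLD * token_limit:
        return messages  # under budget: leave the history untouched
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    if not head:
        return messages  # nothing old enough to fold into a summary
    return [summarize(head)] + tail
```

Keeping the most recent turns verbatim while folding older ones into a semantic summary is what lets long agent sessions stay inside the context window.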
LlamaCraft — LLM Pre-Training from Scratch
- Pre-trained Llama 2 models from scratch on FineWeb-Edu using a custom 4x RTX 4090 workstation.
- Implemented quantized training with torchao (float8, AdamW8bit, AdamW4bit, AdamWFp8) and CPU offloading. Trained with DDP via torchrun.
- Published results on Weights & Biases and exported models to HuggingFace.
- Technologies: PyTorch, torchao, DDP/torchrun, FSDP, HuggingFace, WandB
torchtitan — Custom Distributed Training Platform
- Extended PyTorch’s torchtitan with checkpoint utilities for cross-world-size resumption, gradient accumulation memory optimization, and custom finance dataset integration (FNSPID).
- Built HuggingFace inference pipelines and model conversion tooling.
- Technologies: PyTorch, FSDP2, Tensor/Pipeline/Context Parallel, Float8, DCP
MCP Compact — Context Optimization Proxy
- Open-source MCP proxy that applies LLM-based summarization to tool call responses, reducing context window consumption by up to 97% in agentic workflows.
- Technologies: Python, MCP SDK, OpenAI-compatible APIs, Docker
Foundary — Neural Transaction Classifier
- Production text classification microservice with a custom PyTorch Transformer encoder for financial transactions. Supports online learning, batch inference, and multi-backend (CUDA/MPS/CPU) serving.
- Technologies: PyTorch, FastAPI, Docker
Ember Pulse — iOS Health Data Platform
- Built the full stack: Swift iOS app collecting HealthKit telemetry with background sync (HKAnchoredObjectQuery, HKObserverQuery), and a FastAPI Python backend with WebAuthn passkey authentication and JWT rotation.
- Technologies: Swift, iOS 26 SDK, HealthKit, SwiftUI | Python, FastAPI, SQLAlchemy, WebAuthn/FIDO2, Docker
Karpathy LLM Training Contributions
- llama2.c (14 commits): Added FineWeb/Dolphin dataset training, GPU data loading and buffering optimizations, hyperparameter tuning for 4090, WandB logging integration.
- nanoGPT: Added torch.compile, pin_memory, and async data loading optimizations.
- llm.c: Containerized C/CUDA training with Docker.
- build-nanogpt: Fixed PyTorch autocast device type for non-CUDA backends.
RL Trading Experiments (Early Research)
- Built PPO reinforcement learning environments for automated stock trading using custom PyTorch models and TorchRL. Progressed from CartPole to custom stock trading environments with MLP and Transformer architectures.
- Technologies: PyTorch, TorchRL, PPO, custom RL environments
Professional Experience
GuidedChoice — Reno, NV
Chief Technology Officer | May 2022 – Present
- Lead architecture, engineering, security, and infrastructure teams. Maintain 99.9%+ uptime across all services.
- Migrated from Docker Swarm to Kubernetes, enhancing scalability and resilience.
- Technologies: Java, Spring Cloud, Docker, Kubernetes, Azure, Kafka, SQL Server, Elasticsearch, React
Product Manager, Architect, Lead Developer | Sep 2017 – May 2022
- Architected and built a scalable microservices platform. Migrated legacy systems to modern architectures.
- Managed Oracle to MS SQL data migration. Mentored team on Spring Cloud and microservices patterns.
- Technologies: Java, Azure, Spring Cloud, Docker, SQL Server, React.js
ARS National Service — Escondido, CA
Enterprise Architect & Senior Developer | Aug 2013 – Sep 2017
- Introduced Docker container architecture and Docker Swarm across data centers. Implemented WSO2 API Manager.
- Built REST APIs for legacy CRM, vendor integration systems, ETL tools, and document management systems.
- Technologies: Java, Spring Boot, C#, Docker, SQL Server, MongoDB
CSUSM — San Marcos, CA
iOS Developer | Sep 2012 – Mar 2013
- Built augmented reality iOS app overlaying 3D objects in real-time camera views. (Objective-C, iOS)
National Science Foundation | Sep 2011 – Jun 2012
- Enhanced web application with Google Earth SDK for interactive geography learning. (PHP, MongoDB)
Education
Master of Science in Computer Science, California State University, San Marcos — 2013
Bachelor of Technology in Information Technology, Easwari Engineering College, Anna University, Chennai, India — 2010