Writing

Technical. Opinionated. High-signal.

Writing across distributed systems, streaming infrastructure, cloud-native platforms, and AI-native systems — published on serverless.fyi and snackonai.com.

serverless.fyi — 2 Minute Serverless
snackonai.com — Snack On AI
serverless.fyi · Streaming · Jul 2024

Stream Processing: The Future of the Modern Data Stack

The real-time data revolution is rewriting the rules of the modern data stack. Batch is dead — or at least, it should be. A deep dive into why stream processing is the foundational shift that defines how modern software handles data.

Read on serverless.fyi

AI + Infra · snackonai.com

The Truth About Feature Stores in Production ML

Feature stores are often oversold. Here's what they actually do, where they fail, and how to think about the real-time feature engineering layer in production ML systems.

snackonai.com →
AI + Infra · snackonai.com

LMCache + SGLang: The KV Cache Stack Your LLM Inference Deserves

The KV cache is the most underappreciated layer in LLM inference. LMCache and SGLang show what a proper caching infrastructure for large models actually looks like.

snackonai.com →
AI + Infra · snackonai.com

RAG: The Foundation Layer

Retrieval-Augmented Generation isn't a pattern — it's the foundational data access layer for AI systems. Here's how to think about building it right.

snackonai.com →
AI + Infra · snackonai.com

Gas Town: Orchestrating an Army of AI Agents

What does it actually take to coordinate multiple AI agents in production? Gas Town explores the orchestration layer — routing, state, and reliability at agent scale.

snackonai.com →
AI + Infra · snackonai.com

The Ralph Loop: Why a Bash While Loop Is One of the Most Honest Architectures in Agentic AI

The simplest agentic architecture is often the best one. Why a tight feedback loop beats elaborate orchestration frameworks for most real-world agent tasks. A minimal sketch of the loop follows below.

snackonai.com →
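
A minimal sketch of the pattern, transposed to Python (the post itself centers on a literal bash while loop). `run_agent`, `goal_met`, and the `echo` stand-in are hypothetical placeholders, not code from the article:

```python
import subprocess

def run_agent(task: str) -> str:
    # Hypothetical single agent step. A real loop would shell out to an
    # actual coding-agent CLI here; `echo` is just a stand-in command.
    return subprocess.run(
        ["echo", f"worked on: {task}"], capture_output=True, text=True
    ).stdout

def goal_met(output: str) -> bool:
    # Hypothetical completion check. Real versions run the test suite,
    # grep logs, or ask a model to judge whether the task is finished.
    return "tests passing" in output

task = "fix the failing test suite"
attempts = 0
# The whole architecture: re-run the same agent on the same task until
# the check passes. Durable state lives in the repo and filesystem, not
# in an orchestration framework.
while not goal_met(run_agent(task)) and attempts < 10:  # cap so the sketch halts
    attempts += 1
```

The design point survives the translation: the loop itself is the orchestrator, and everything durable lives outside it.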
AI + Infra · snackonai.com

TensorRT-LLM: NVIDIA's Inference Compiler Is Not What You Think It Is

TensorRT-LLM is widely misunderstood. It's not just a quantization tool — it's a full inference compiler that fundamentally changes how LLMs run on GPU infrastructure.

snackonai.com →

"The hardest part of systems design isn't choosing the right architecture.
It's knowing which tradeoffs you're actually making."

— Mohinish