Demystifying LLM Architecture: From Attention to Production
A deep dive into transformer internals, attention mechanisms, KV-cache optimization, and serving LLMs at scale.
Designing multi-stage retrieval + ranking pipelines — embeddings, intent extraction, LLM reasoning across 10K+ products.
Ensemble ML pipeline with LLM-powered explainability — reducing fraud with human-readable explanations.
What I assumed, what broke, and what I changed — real production scars from a recommender system that died on launch day.
Why Bayesian forecasting wins in business — uncertainty quantification and real deployment patterns.
Battle-tested patterns for ML systems — feature stores, model serving, and monitoring infrastructure.