I architect production ML systems that scale, from RAG pipelines and recommendation systems to forecasting platforms and agentic AI orchestration, delivering measured gains of 10–90% across end-to-end systems. 7+ years shipping AI that drives real business impact.
I'm Abhilash Ganji, a Senior Data Science Engineer at EPAM Systems in Hyderabad and a former Amazon engineer. Over the past 7+ years, I've designed and deployed machine learning systems that operate at massive scale — from recommendation engines serving millions of users to transformer-based NLP models predicting business-critical outcomes.
My work sits at the intersection of research and engineering. I don't just build models; I build complete ML systems: data pipelines, training infrastructure, serving layers, and monitoring. Whether it's designing agentic AI workflows with multi-agent orchestration, fine-tuning LLMs, building RAG architectures, or shipping real-time recommender systems, I obsess over taking AI from prototype to production.
"The best AI systems are invisible — they just make things work better. My philosophy is to build AI that's reliable, scalable, and creates measurable impact. Not impressive demos, but systems that survive Monday morning traffic."
Beyond engineering, I'm a national-level racer and competition winner at IIM Bangalore — I bring the same competitive intensity and precision to every ML problem I tackle.
Agentic AI workflows, RAG pipelines, LLM-powered tools, recommender systems, AI forecasting platforms, fraud detection
BERT fine-tuning, CNN/ResNet computer vision, 10B+ row pipelines, anomaly detection, forecasting
Sales prediction, customer churn models, NLP web scraping, $1.4M cost optimization
Top 5 percentile · Data Science & Engineering foundations
Top 10 percentile · Engineering foundations
Quantified results from production ML systems I've built and deployed
Real systems I've built — structured as Problem → Architecture → Trade-offs → Impact
Click any component to see how production ML pipelines work end-to-end
Select any node in the pipeline above to see a detailed explanation of that component, its role, and key engineering decisions.
How I think through ambiguity — real decisions from production systems
Balances exploration and personalization from day one. Content-based filtering alone creates filter bubbles; a popularity prior ignores individual preferences. LightFM's feature-sum approach means new offers immediately get a meaningful embedding vector, and the re-ranking layer ensures relevance at serving time.
Higher infra cost (feature store + per-market models + re-ranking) and more complex cold-start fallback logic. But the +12% redemption lift at 110 QPS justified the complexity.
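A minimal sketch of that feature-sum idea in LightFM (toy users, offers, and metadata tags; not the production pipeline): because an item's embedding is the sum of its feature embeddings, a brand-new offer with known metadata starts with a usable vector.

```python
import numpy as np
from lightfm import LightFM
from lightfm.data import Dataset

# Toy universe: users, offers, and offer metadata tags (illustrative values).
dataset = Dataset()
dataset.fit(
    users=["u1", "u2"],
    items=["offer_a", "offer_b", "offer_new"],
    item_features=["cat:coffee", "cat:travel", "brand:x"],
)

# Redemption events; note offer_new has no interactions yet (cold start).
interactions, _ = dataset.build_interactions([("u1", "offer_a"), ("u2", "offer_b")])

# Describe each offer by its metadata; offer_new shares tags with offer_a.
item_features = dataset.build_item_features([
    ("offer_a", ["cat:coffee", "brand:x"]),
    ("offer_b", ["cat:travel"]),
    ("offer_new", ["cat:coffee", "brand:x"]),
])

# WARP loss optimizes ranking. An item embedding is the sum of its feature
# embeddings, so offer_new inherits a meaningful vector from its tags
# before it has a single redemption.
model = LightFM(loss="warp", no_components=64)
model.fit(interactions, item_features=item_features, epochs=10)

# Score every offer for user 0, including the cold-start one; the
# re-ranking layer then reorders the top-K at serving time.
scores = model.predict(0, np.arange(3), item_features=item_features)
```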
Multi-stage pipeline separates fast retrieval (embeddings) from expensive reasoning (LLM). Intent extraction narrows the search space before LLM sees candidates. Re-ranking adds 200ms but lifts precision@5 by 35%. Each stage is independently scalable and debuggable.
More pipeline complexity and latency vs single-shot approaches. But dramatically better relevance across 10K+ product lines, and the conversational UX enabled discovery patterns impossible with keyword search.
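The shape of that pipeline, sketched with sentence-transformers for the retrieval stage. The encoder choice and catalog are illustrative, and the intent-extraction and re-ranking functions are placeholders for the real LLM calls:

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice

CATALOG = ["espresso machine", "pour-over kettle", "insulated travel mug"]  # toy data
CATALOG_EMB = encoder.encode(CATALOG, convert_to_tensor=True)  # indexed once, offline

def extract_intent(query: str) -> str:
    # Stage 1 (placeholder): a cheap LLM call distills a conversational
    # query into a focused search intent before retrieval.
    return query

def retrieve(intent: str, k: int = 50) -> list[str]:
    # Stage 2: fast embedding search cuts 10K+ product lines to k candidates.
    query_emb = encoder.encode(intent, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, CATALOG_EMB, top_k=k)[0]
    return [CATALOG[hit["corpus_id"]] for hit in hits]

def rerank(query: str, candidates: list[str], k: int = 5) -> list[str]:
    # Stage 3 (placeholder): the expensive LLM reasons only over the
    # shortlist; this is the +200ms step that lifts precision@5.
    return candidates[:k]

def search(query: str) -> list[str]:
    # Stages are kept separate so each is independently scalable and debuggable.
    return rerank(query, retrieve(extract_intent(query)))
```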
RAG grounds the LLM in real policy templates, reducing hallucinations by ~70%. Hybrid retrieval plus re-ranking balances retrieval accuracy against latency to achieve <800ms end-to-end. GPT-3.5 is 10x cheaper than GPT-4 and, with good retrieval context, achieves comparable quality for structured policy JSON output.
ChromaDB index needs maintenance as new cloud services launch. Hybrid retrieval + re-ranking adds complexity. But <800ms latency, 80% cost reduction, and grounded outputs made this the clear winner.
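Stripped to its core, the generation path looks roughly like the sketch below. Only dense retrieval is shown (the hybrid keyword stage and re-ranker are elided), and the template text is invented for illustration:

```python
import chromadb
from openai import OpenAI

chroma = chromadb.Client()
templates = chroma.create_collection("policy_templates")
templates.add(
    ids=["s3-public-001"],  # invented example template
    documents=["S3 buckets must block public access unless tagged approved-public."],
)
llm = OpenAI()

def generate_policy(request: str) -> str:
    # Retrieval grounds the model in real templates instead of free
    # generation; this is where the ~70% hallucination reduction comes from.
    hits = templates.query(query_texts=[request], n_results=3)
    context = "\n---\n".join(hits["documents"][0])
    resp = llm.chat.completions.create(
        model="gpt-3.5-turbo",  # 10x cheaper than GPT-4; retrieval closes the gap
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Emit a policy as JSON, grounded ONLY in the templates provided."},
            {"role": "user", "content": f"Templates:\n{context}\n\nRequest: {request}"},
        ],
    )
    return resp.choices[0].message.content
```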
Detection needs deterministic speed (<100ms); LLMs are too slow and variable. But compliance needs human-readable justifications, not SHAP feature importance scores. Two-system approach: the ML ensemble handles detection reliably, while the LLM generates explanations asynchronously, where latency tolerance is higher.
Two systems to maintain and monitor. Explanation quality depends on LLM prompt engineering. But fraud dropped from 8% to 1.2%, and compliance team can now justify every flagged transaction in plain language.
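The split in code, roughly. The classifier and toy training data stand in for the actual fitted ensemble, explain_via_llm is a placeholder for the real completion call, and the threshold is an illustrative value:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy fit so the sketch runs; the real ensemble is trained offline.
X = np.random.rand(200, 4)
y = (X[:, 0] > 0.8).astype(int)
detector = GradientBoostingClassifier().fit(X, y)

explainers = ThreadPoolExecutor(max_workers=4)  # async lane for slow LLM work

def explain_via_llm(txn_id: str, score: float, features: list[float]) -> None:
    # Placeholder: prompt an LLM to turn the flag into the plain-language
    # justification compliance needs, then persist it for review.
    ...

def score_transaction(txn_id: str, features: list[float]) -> bool:
    # Fast path: deterministic ensemble inference keeps detection <100ms;
    # no LLM ever sits in the request path.
    score = detector.predict_proba([features])[0, 1]
    flagged = score >= 0.9  # threshold tuned offline (illustrative value)
    if flagged:
        # Slow path: the explanation runs off the request path, where
        # latency tolerance is much higher.
        explainers.submit(explain_via_llm, txn_id, score, features)
    return flagged
```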
LLMs handle multilingual content natively — no per-language model training. They capture nuance, sarcasm, and cultural context that keyword-based systems miss entirely. Root-cause extraction (not just sentiment) gives leadership actionable insights: "delivery packaging complaints spiked 3x in Germany this week."
Higher cost per review than traditional NLP. LLM latency means historical analysis runs in batch, while incoming reviews are processed in real time via FastAPI. But the quality of insights across 25+ markets made it worth the cost: leadership acts on root causes, not sentiment scores.
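The real-time path, as a minimal FastAPI sketch. The model name, endpoint path, and JSON schema are illustrative; the batch backfill runs the same prompt offline:

```python
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
llm = OpenAI()

class Review(BaseModel):
    market: str  # e.g. "DE"; one prompt covers all 25+ markets
    text: str    # any language; no per-language model training needed

PROMPT = (
    "From this customer review, extract JSON with keys: sentiment, "
    "root_cause (the operational issue, e.g. 'delivery packaging'), severity (1-5)."
)

@app.post("/reviews/analyze")
def analyze(review: Review) -> dict:
    # Incoming reviews take this real-time path; historical analysis
    # batches the same prompt where latency doesn't matter.
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any multilingual chat model works
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": f"{PROMPT}\n\nReview: {review.text}"}],
    )
    return {"market": review.market, "analysis": resp.choices[0].message.content}
```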
Failure stories, system deep dives, and production lessons learned
Open to opportunities, collaborations, and interesting conversations about AI
7 questions. Your brutally honest personality archetype.