✳ Data Analytics · Predictive Modeling

Predicting Walmart Store Underperformance

A segmentation & early warning approach — analyzing 143 weeks of sales data across 45 stores and 81 departments to flag store-department pairs at risk of underperforming before the decline shows up in reporting.

421K+
Observations
45
Stores Analyzed
3,331
Store × Dept Pairs
19.7%
Flagged at Risk

What I Contributed

As part of a 5-person analytics team, I led the underperformance definition framework and the feature engineering pipeline — designing the logic that separates true performance failure from structural or seasonal noise. My focus was building a model that gives Walmart store managers a meaningful early warning signal, not just a ranking of who sold the least.

I also drove the segmentation analysis to distinguish localized department-level issues from store-wide declines, and contributed to the final stakeholder recommendations.

Tools & Methods
Python Pandas NumPy Scikit-learn Matplotlib Seaborn CART (Decision Tree) Feature Engineering Binary Classification Rolling Statistics Time-Series Splitting

Ranking Isn't Diagnosis

Walmart's existing approach identified bottom-performing stores by raw sales rank — but "who sold the least" is not the same as underperformance. That method had three critical blind spots.

📊
Always Excludes Some Stores
Someone is always in the bottom 20%, even if every store is healthy. It's just a rank, not a signal.
🌨️
Ignores Seasonality & Context
A Neighborhood Market will almost always fall below a Supercenter in absolute sales — that's structural, not poor performance.
📉
Blind to Momentum
A store in the bottom 20% but trending up looks identical to one that has been declining for 12 months.
Core Gap: No baseline. No context. No way to separate expected from unexpected.

Explore. Predict. Intervene.

We designed a three-phase framework to move Walmart from reactive reporting to proactive, context-aware flagging of store-department combinations at risk.

Phase 01
Explore
Analyzed 143 weeks of historical sales data across 45 stores to map seasonality patterns, store type differences, and department-level variance. Identified an $80.9M peak holiday week and structural size gaps between store types.
Phase 02
Predict
Engineered 11 custom features capturing momentum, volatility, peer-relative positioning, and external shocks. Trained a CART binary classifier on an 80/20 temporal split to predict next-week underperformance for each store × department pair.
Phase 03
Intervene
Translated predictions into a ranked risk heatmap across the top 10 stores × 10 departments, enabling managers to investigate early and adjust inventory and promotional strategies before declines worsen.

About Our Dataset

Sourced from Kaggle's Walmart Recruiting: Store Sales Forecasting competition — covering nearly 3 years of anonymized weekly sales, store characteristics, and macro-economic features.

sales.csv: Weekly sales by store and department. 421,570 observations · Feb 2010 – Oct 2012
stores.csv: Store type (A/B/C) and physical size. 45 stores · Types: Supercenter, Discount, Neighborhood
features.csv: Promotional markdowns and macro-economic indicators (CPI, fuel prices, temperature). 5 markdown fields · Holiday flags · Macro indicators
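
To make the join concrete, here is a minimal pandas sketch, assuming the file names above and the standard Kaggle column schema (Store, Dept, Date, Weekly_Sales, IsHoliday; Type, Size; Temperature, Fuel_Price, MarkDown1–5, CPI, Unemployment):

```python
import pandas as pd

# Load the three source files (column names follow the Kaggle schema)
sales = pd.read_csv("sales.csv", parse_dates=["Date"])
stores = pd.read_csv("stores.csv")
features = pd.read_csv("features.csv", parse_dates=["Date"])

# One row per store x dept x week, enriched with store attributes and macro features
df = (sales
      .merge(stores, on="Store", how="left")
      .merge(features.drop(columns=["IsHoliday"]), on=["Store", "Date"], how="left")
      .sort_values(["Store", "Dept", "Date"])
      .reset_index(drop=True))
```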

Building the Early Warning Signal

We engineered 11 features across four conceptual groups — each designed to capture a different dimension of underperformance risk that raw sales data misses.

Momentum & Trend Deterioration
Rolling Stats (4w Mean, 13w Std/CV)
Captures short-term momentum and scales volatility across departments.
Sales MoM Growth & Acceleration
Identifies slowing growth rates as an early warning signal before absolute declines appear.
Drop_4w
Flags sudden, sharp declines — stronger predictors of failure than gradual shifts.
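
A minimal pandas sketch of the momentum group, continuing from the merged frame df above; the 20% drop cutoff in Drop_4w is an illustrative assumption, not the exact production rule:

```python
# Rolling statistics per store x dept series
g = df.groupby(["Store", "Dept"])["Weekly_Sales"]
df["Rolling_Mean_4w"] = g.transform(lambda s: s.rolling(4, min_periods=4).mean())
df["Rolling_Mean_13w"] = g.transform(lambda s: s.rolling(13, min_periods=13).mean())
df["Rolling_Std_13w"] = g.transform(lambda s: s.rolling(13, min_periods=13).std())
df["CV_13w"] = df["Rolling_Std_13w"] / df["Rolling_Mean_13w"]  # scale-free volatility

# Growth over a 4-week lag and its week-to-week change (acceleration)
df["MoM_Growth"] = g.transform(lambda s: s.pct_change(4))
df["MoM_Accel"] = df.groupby(["Store", "Dept"])["MoM_Growth"].diff()

# Drop_4w: a sudden fall below the recent baseline (illustrative 20% cutoff)
df["Drop_4w"] = (df["Weekly_Sales"] < 0.8 * df["Rolling_Mean_4w"]).astype(int)
```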
Relative Positioning
Sales vs. Peer (Residual_Z)
Isolates local issues (staffing, inventory) from broader market trends using z-scored residuals.
Dept_Sales_Share
Distinguishes localized departmental weakness from general store-wide decline.
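
A sketch of the positioning features; Residual_Z is shown here as a simple z-score against department peers in the same week, one plausible reading of "sales vs. peer" (a residual from a fitted baseline would slot in the same way):

```python
# Residual_Z: how far this store-dept's sales sit from its department peers this week
peer = df.groupby(["Dept", "Date"])["Weekly_Sales"]
df["Residual_Z"] = (df["Weekly_Sales"] - peer.transform("mean")) / peer.transform("std")

# Dept_Sales_Share: the department's share of its store's total sales that week
store_week = df.groupby(["Store", "Date"])["Weekly_Sales"]
df["Dept_Sales_Share"] = df["Weekly_Sales"] / store_week.transform("sum")
```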
External Shocks
Macro Shocks (Fuel & CPI Deviation)
Captures inflation and fuel price spikes that suppress disposable income and purchasing power.
Context (Weeks_to_Holiday, Temp_Deviation)
Adjusts expectations for high-demand windows and weather-sensitive seasonal departments.
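
A sketch of the shock and context features; the 13-week baseline window and the store-level temperature norm are assumptions:

```python
# Macro shocks: deviation from a 13-week rolling baseline (prices are store-level,
# so a per-dept rolling window recovers the store's weekly baseline)
for col in ["Fuel_Price", "CPI"]:
    base = df.groupby(["Store", "Dept"])[col].transform(
        lambda s: s.rolling(13, min_periods=4).mean())
    df[f"{col}_Deviation"] = df[col] - base

# Weeks_to_Holiday: weeks until the next flagged holiday week
holidays = df.loc[df["IsHoliday"], "Date"].drop_duplicates().sort_values().tolist()
df["Weeks_to_Holiday"] = df["Date"].map(
    lambda d: min(((h - d).days // 7 for h in holidays if h >= d), default=0))

# Temp_Deviation: gap vs. the store's long-run average temperature
# (a month-of-year norm would be a finer seasonal baseline)
df["Temp_Deviation"] = df["Temperature"] - df.groupby("Store")["Temperature"].transform("mean")
```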
Operational Signals
Markdown Intensity (Has_Markdown, Ratio)
Heavy discounts relative to baseline sales signal potential inventory distress or weak organic demand.
Revenue Efficiency (Sales_Per_SqFt)
Normalizes performance across different store footprints, enabling fair cross-store comparison.
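
A sketch of the operational group, where Size is the store's square footage from stores.csv:

```python
import numpy as np

# Markdown intensity: any active markdown, and total markdown relative to recent sales
md_cols = [f"MarkDown{i}" for i in range(1, 6)]
md_total = df[md_cols].fillna(0).sum(axis=1)
df["Has_Markdown"] = (md_total > 0).astype(int)
df["Markdown_Ratio"] = md_total / df["Rolling_Mean_4w"].replace(0, np.nan)

# Revenue efficiency: normalize sales by store footprint for fair cross-store comparison
df["Sales_Per_SqFt"] = df["Weekly_Sales"] / df["Size"]
```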

What the Data Revealed

Before modeling, the exploratory analysis surfaced three critical patterns that shaped our entire prediction strategy.

📅
Seasonality Dominates
Sales are heavily dominated by year-end holiday peaks. Any underperformance signal must account for this seasonal baseline to avoid false positives during off-peak periods.
$80.9M peak week · Dec 2010
🏪
Store Type Creates Structural Gaps
Type A Supercenters (150k+ sq ft) generate fundamentally different sales volumes than Type C Neighborhood Markets. Comparing raw sales across types is misleading — size-normalized metrics are essential.
3 distinct store formats identified
⚠️
Underperformance is a Minority Signal
~19.7% of store-department combinations were flagged as underperforming in any given week, validating that our 20% threshold is selective rather than overly aggressive.
1 in 5 combinations at risk weekly
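
The exact flagging logic was part of our definition framework; purely to illustrate how a roughly 20% weekly base rate can arise, one simple construction flags the bottom quintile of peer-adjusted residuals each week, then shifts the flag to form a next-week prediction target:

```python
# Illustrative label only, not the exact framework definition: bottom 20% of
# peer-adjusted residuals (Residual_Z) within each week
cutoff = df.groupby("Date")["Residual_Z"].transform(lambda s: s.quantile(0.20))
df["Underperf"] = (df["Residual_Z"] <= cutoff).astype(int)

# Target: does this store x dept underperform NEXT week? (shift within each series)
df["Target_Next_Week"] = df.groupby(["Store", "Dept"])["Underperf"].shift(-1)
```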

Binary Classification with CART

We trained a Decision Tree classifier (CART) on a temporal 80/20 split — preserving the time sequence rather than randomizing — to predict whether each store × department pair would underperform the following week.
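
A sketch of the split and base model, assuming the engineered columns and the Target_Next_Week label from the sketches above; the feature list and tree depth are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Rolling_Mean_4w", "Rolling_Std_13w", "CV_13w", "MoM_Growth", "MoM_Accel",
            "Drop_4w", "Residual_Z", "Dept_Sales_Share", "Fuel_Price_Deviation",
            "CPI_Deviation", "Weeks_to_Holiday", "Temp_Deviation", "Has_Markdown",
            "Markdown_Ratio", "Sales_Per_SqFt"]

# Temporal split: train on the first ~80% of weeks, test on the rest (no shuffling)
model_df = df.dropna(subset=FEATURES + ["Target_Next_Week"])
weeks = model_df["Date"].sort_values().unique()
cut = weeks[int(len(weeks) * 0.8)]
train = model_df[model_df["Date"] < cut]
test = model_df[model_df["Date"] >= cut]

cart = DecisionTreeClassifier(max_depth=6, random_state=42)  # depth is an assumption
cart.fit(train[FEATURES], train["Target_Next_Week"])
```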

Base Model Performance

Accuracy 81.5%
Precision 62.6%
Recall 5.95%
F1 Score 10.9%
Class imbalance caused the base model to miss ~94% of actual underperformers. Threshold tuning and class weighting were applied to improve recall.
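
Both remedies are standard in scikit-learn; a sketch, where the 0.30 decision threshold is an assumption that would be tuned on a validation window:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Re-weight the minority class, then lower the threshold to trade precision for recall
cart_w = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=42)
cart_w.fit(train[FEATURES], train["Target_Next_Week"])

proba = cart_w.predict_proba(test[FEATURES])[:, 1]
preds = (proba >= 0.30).astype(int)  # 0.30 is illustrative, not the tuned value
print(classification_report(test["Target_Next_Week"], preds, digits=3))
```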
🎯 Top Predictors

Residual_Z — highest importance (0.30). Detects when sales fall below peer expectations, flagging localized failure vs. market trends.

Rolling_Std_13w / CV_13w — the second- and third-ranked features. Volatile departments face significantly higher underperformance risk.

Drop_4w — sudden sharp declines proved stronger predictors than gradual drift.

🗺️ Risk Heatmap Output

The final model outputs a store × department risk probability matrix. Store 43 – Dept 52 showed a 0.96 predicted probability — the highest in the test period, enabling proactive management intervention.
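
Mechanically, the heatmap is a pivot of predicted probabilities; a sketch continuing from the weighted model above:

```python
# Store x department risk matrix for the most recent test week
scored = test.assign(Risk=proba)
latest = scored[scored["Date"] == scored["Date"].max()]
risk_matrix = latest.pivot_table(index="Store", columns="Dept", values="Risk")

# Ranked list of the riskiest cells, e.g. for a manager-facing top-10 view
top10 = latest.nlargest(10, "Risk")[["Store", "Dept", "Risk"]]
```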

What This Means for Stakeholders

For Store Managers

A weekly early warning list of their highest-risk store-department combinations, enabling proactive inventory adjustments and targeted promotions before sales decline becomes visible in reporting.

For Regional Operations

Portfolio-wide risk visibility that distinguishes systemic regional issues from isolated store problems — enabling smarter resource allocation and escalation decisions.

For Analytics Teams

A reproducible, extensible feature engineering framework that can incorporate new data signals (geographic enrichment, department name mapping) to further improve prediction quality.

Key Lesson

Moving from reactive ranking to proactive classification requires context-aware baselines. Without accounting for store type, seasonality, and momentum, you're not measuring performance — you're measuring size.

What We'd Do With More Data

Gap: No rural/suburban/urban classification in the dataset
Impact: Can't determine whether underperformance is location-driven.
Want: Census / geo enrichment

Gap: Store type labels (A, B, C) are inferred; the dataset has no official definitions
Impact: The model can't use store format as a meaningful signal.
Want: Additional store attributes

Gap: Department IDs exist, but names are missing
Impact: Can't pinpoint which merchandise category is hurting store performance.
Want: Department name mapping

Gap: Markdown fields lack event-level context
Impact: Can't explain why Markdown 1 and Markdown 4 correlate positively; there's no causal story.
Want: Markdown event logs