Baggage Unloading Process in Airline Operations

Statistical Analysis of Baggage Delivery Performance

📊 Data Analysis
✈️ Operations Analytics
📈 Statistical Modeling

Executive Summary

This analysis examined baggage delivery times across 99,174 flight records, focusing on data quality, performance trends, and probability modeling. Cleaning removed 15,057 problematic records (15.18%) affected by duplicates, unrealistic timestamps, or invalid routing, leaving 84,117 flights for analysis.

Average Delivery Time: 16.1 min
Under 21 Minutes (Empirical): 81.27%
Under 21 Minutes (Theoretical): 78.55%
Data Issues Identified: 15.18%

Data Quality Assessment

Issues Identified in Dataset

Total problematic records: 15,057 out of 99,174 (15.18%)

Duplicate Records

8,380 duplicate entries removed to prevent double-counting in performance metrics and average time calculations.

Unrealistic BagDropDurations

Records showing bag drop durations under 10 seconds with bag counts greater than 10, indicating scanning or recording errors.

Invalid Processing Times

AverageTimePerBag values of 0 or 1 second indicate implausibly fast processing rates from incorrect timestamps or missing intervals.

Same Origin & Destination

Some entries had identical airport codes for origin and destination, likely test records not representing real flights.

FirstBagDropTime Anomalies

First bag logged less than 1 minute after arrival, which is operationally unlikely given typical deplaning and unloading times.
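
The five checks above can be sketched as a small rule-based filter. This is an illustration in Python, not the Excel workflow used in the analysis. The field names BagCount, Origin, and Destination are hypothetical, all durations are assumed to be in seconds, and FirstBagDropTime is assumed to be stored as seconds after arrival.

```python
def first_failed_check(rec, seen):
    """Return the name of the first check a record fails, or None if it passes."""
    key = tuple(sorted(rec.items()))  # exact-duplicate rows share the same key
    if key in seen:
        return "duplicate"
    seen.add(key)
    # Under 10 seconds of bag drop for more than 10 bags
    if rec["BagDropDuration"] < 10 and rec["BagCount"] > 10:
        return "unrealistic_bag_drop_duration"
    # AverageTimePerBag of 0 or 1 second
    if rec["AverageTimePerBag"] <= 1:
        return "invalid_processing_time"
    if rec["Origin"] == rec["Destination"]:
        return "same_origin_destination"
    # First bag less than 1 minute after arrival
    if rec["FirstBagDropTime"] < 60:
        return "first_bag_too_early"
    return None


def clean(records):
    """Split records into kept rows and (reason, row) rejections."""
    seen, kept, rejected = set(), [], []
    for rec in records:
        reason = first_failed_check(rec, seen)
        if reason is None:
            kept.append(rec)
        else:
            rejected.append((reason, rec))
    return kept, rejected


# Made-up records with placeholder airport codes, one per rule.
good = {"Origin": "AAA", "Destination": "BBB", "BagCount": 40,
        "BagDropDuration": 600, "AverageTimePerBag": 15, "FirstBagDropTime": 480}
records = [
    good,
    dict(good),                       # exact duplicate
    {**good, "Destination": "AAA"},   # same origin and destination
    {**good, "BagDropDuration": 5},   # 40 bags in 5 seconds
    {**good, "AverageTimePerBag": 1}, # 1 second per bag
    {**good, "FirstBagDropTime": 30}, # first bag 30 seconds after arrival
]
kept, rejected = clean(records)
reasons = [reason for reason, _ in rejected]
```

Recording a reason for each rejected row makes it possible to count how many records each rule removes, which helps guard against over-filtering.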

Data Cleaning Approach

Tools Used: Excel for data cleaning and preparation, StatsTool for statistical analysis

Key Steps:

Statistical Analysis

Descriptive Statistics

Sample Size: 84,117 flights (after cleaning)

Metric | Value | Interpretation
Mean | 16:06 (16.1 minutes) | Average delivery time
Median | 15:22 (15.37 minutes) | 50th percentile delivery time
Standard Deviation | 6:30 (6.5 minutes) | Typical variation from mean
Minimum | 0:16 (0.27 minutes) | Fastest delivery observed
Maximum | 39:59 (39.98 minutes) | Longest delivery observed
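
The decimal-minute figures can be checked against the mm:ss values with a small helper. This is a minimal sketch; the function name mmss_to_minutes is illustrative and not part of the original workbook.

```python
def mmss_to_minutes(text):
    """Convert an 'mm:ss' string to decimal minutes."""
    minutes, seconds = text.split(":")
    return int(minutes) + int(seconds) / 60


# mm:ss values taken from the descriptive statistics.
reported = {
    "Mean": "16:06",
    "Median": "15:22",
    "Standard Deviation": "6:30",
    "Minimum": "0:16",
    "Maximum": "39:59",
}
decimal_minutes = {name: round(mmss_to_minutes(v), 2) for name, v in reported.items()}
for name, value in decimal_minutes.items():
    print(f"{name}: {value} minutes")
```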

Percentile Analysis

Percentile | Time | Interpretation
1% | 03:41 | 1% of deliveries take less than ~4 minutes
10% | 08:39 | 10% take less than ~9 minutes
50% | 15:22 | Median delivery time
90% | 24:39 | 90% are under ~25 minutes
99% | 35:46 | Top 1% take longer than ~36 minutes

Key Observation: The mean (16.1 min) is slightly higher than the median (15.37 min), suggesting right-skewness in the distribution. Most deliveries are relatively fast, but a long tail of slower deliveries pulls the average upward.
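
As an illustration of how percentiles and the mean-versus-median check can be computed outside a spreadsheet, the sketch below uses Python's statistics module on a small made-up sample. The values are not drawn from the flight dataset, and the default interpolation method of statistics.quantiles may differ slightly from the one used by the original tools.

```python
import statistics

# Made-up delivery times in minutes, sorted, with one slow outlier.
sample_min = [8.0, 10.5, 12.0, 13.5, 15.0, 15.5, 17.0, 19.0, 24.0, 36.0]

# quantiles(..., n=100) returns the 99 cut points between percentiles 1 and 99.
cuts = statistics.quantiles(sample_min, n=100)
p10, p50, p90 = cuts[9], cuts[49], cuts[89]

mean = statistics.fmean(sample_min)
# A mean above the median is a quick hint of right skew, not a formal test.
right_skew_hint = mean > p50

print(f"P10={p10:.2f}  P50={p50:.2f}  P90={p90:.2f}  mean={mean:.2f}")
```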

Probability Analysis: Deliveries Under 21 Minutes

Comparative Results

Two methods were used to estimate the probability that baggage delivery time is less than 21 minutes:

📊 Empirical Approach

Result: 81.27%

Direct calculation counting actual observations where delivery time < 21 minutes.

= COUNTIF(CleanData, "<0:21:00") / 84117

Advantages:

  • No distributional assumptions
  • Simple to compute
  • Reflects actual observed frequencies

Disadvantages:

  • Sensitive to sample size
  • Cannot extrapolate beyond observed data
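
The empirical estimate and its sampling uncertainty can be sketched in a few lines. This assumes delivery times are available as decimal minutes and treats flights as independent observations, which is a simplification. The function names are illustrative.

```python
import math


def empirical_share(times_min, threshold_min):
    """Fraction of observations strictly below the threshold."""
    hits = sum(1 for t in times_min if t < threshold_min)
    return hits / len(times_min)


def standard_error(p, n):
    """Binomial standard error of a proportion p estimated from n observations."""
    return math.sqrt(p * (1 - p) / n)


# Toy input to show the counting step.
toy_share = empirical_share([10, 20, 21, 30], 21)

# Reported share (81.27%) and cleaned sample size (84,117 flights).
se = standard_error(0.8127, 84117)
print(f"standard error: {se:.4%}")
```

With a sample this large the standard error is on the order of a tenth of a percentage point, so sampling noise alone is small at the full-dataset level. It grows quickly when the same calculation is run per airport or per time window.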

📈 Theoretical Approach (Log-Normal)

Result: 78.55%

Used log-normal distribution based on observed right-skewness in the data.

Parameters:

  • LN Mean = 2.690
  • LN SD = 0.448

= LOGNORM.DIST(21, 2.690, 0.448, TRUE)

Advantages:

  • Generalizes to unseen data
  • Accounts for skewness better than normal distribution
  • Enables predictive calculations

Disadvantages:

  • Assumes data fits chosen distribution
  • Requires parameter estimation
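
The LOGNORM.DIST calculation can be reproduced with Python's standard library. For a log-normal variable, P(T < t) equals the standard normal CDF evaluated at (ln t - LN Mean) / LN SD. The sketch below uses the parameters listed above; the function name is illustrative.

```python
import math
from statistics import NormalDist


def lognormal_cdf(t, mu, sigma):
    """P(T < t) when ln(T) is normal with mean mu and standard deviation sigma."""
    z = (math.log(t) - mu) / sigma
    return NormalDist().cdf(z)


p_under_21 = lognormal_cdf(21, 2.690, 0.448)
print(f"P(delivery < 21 min) = {p_under_21:.2%}")
```

The result should agree with the reported 78.55% up to rounding of the two parameters.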

Why the 2.72-Percentage-Point Difference?

The two estimates differ because the fitted log-normal curve does not match the observed distribution exactly. The model puts 21.45% of deliveries at or above 21 minutes, while 18.73% of the cleaned records are that slow, so the model assigns more probability to the slow tail than the data shows.

  • The empirical method counts the records as they are, including any anomalies that survived cleaning
  • The log-normal model has an unbounded right tail, whereas the longest observed delivery is 39:59, so it places some probability on delays that never occur in the cleaned data
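
One way to probe the gap is to compare what the fitted parameters imply with the observed summary statistics. The sketch below uses the standard log-normal formulas for the median and mean, and computes how much probability the model places beyond 40 minutes, just above the longest observed delivery (39:59). Variable names are illustrative.

```python
import math
from statistics import NormalDist

MU, SIGMA = 2.690, 0.448  # LN Mean and LN SD from the fitted model

model_median = math.exp(MU)                 # median of a log-normal
model_mean = math.exp(MU + SIGMA ** 2 / 2)  # mean of a log-normal

# Probability the model assigns to deliveries longer than 40 minutes.
z40 = (math.log(40) - MU) / SIGMA
model_tail_over_40 = 1 - NormalDist().cdf(z40)

print(f"model median: {model_median:.2f} min (observed 15:22)")
print(f"model mean:   {model_mean:.2f} min (observed 16:06)")
print(f"model P(T > 40 min): {model_tail_over_40:.2%}")
```

Under these parameters the model's median falls below the observed median and roughly 1% of its probability lies beyond the observed maximum. Both are consistent with the model giving a lower share under 21 minutes than the direct count.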

Bottom line: Both methods indicate that roughly 80% of flights meet the 21-minute target (81.27% empirical, 78.55% modeled), which is operationally useful.

Recommendations

  • Set a Reliability KPI Benchmark: Adopt 21 minutes as the key operational benchmark for baggage delivery, since ~80% of flights already meet this threshold. Establish regular monitoring to track performance against this KPI across airports and time windows.
  • Implement Real-Time Monitoring Dashboards: Build dashboards that flag outlier cases (>30 minutes) and surface trends by airport, airline, or time of day. Provide visibility to both operations teams and executives for proactive intervention.
  • Address Data Quality at the Source: Standardize data entry processes to reduce duplicate or unrealistic records (~15% of dataset was invalid). Introduce automated validation rules to catch errors earlier.
  • Policy Adjustments for Consistency: Focus resources on the long-tail delays (top 10–20% of flights) where performance diverges significantly from the median. Pilot policy or staffing adjustments at airports with higher variance.
  • Continuous Improvement via A/B Testing: Use geo-based holdouts to test operational changes (e.g., staffing models, unloading processes) at select airports. Compare treatment vs. control performance to validate the impact of interventions before full rollout.
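
For the treatment-versus-control comparison, one simple option is a two-proportion z-test on the share of flights delivered in under 21 minutes. The sketch below is a possible starting point rather than a prescribed method. The counts are hypothetical, and the test assumes independent flights, which airport-level clustering may violate.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical pilot: 850 of 1,000 treatment flights under 21 minutes
# versus 800 of 1,000 control flights.
z, p_value = two_proportion_z_test(850, 1000, 800, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```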

Challenges & Learnings

Key Challenges

  • Messy data with timing errors and logical inconsistencies that required careful filtering
  • Some records lacked context (like flights with the same origin and destination) forcing informed assumptions
  • The data's skewed distribution meant a normal model was inappropriate, requiring log-normal fitting
  • Balancing data cleaning without over-filtering and losing valid insights

Key Insights

The analysis highlighted the importance of data quality: approximately 15% of records contained errors that would have distorted results if not addressed. The average baggage delivery time of 16.1 minutes, with about 80% of flights meeting the 21-minute benchmark, demonstrates solid operational performance. However, the right-skewed distribution reveals opportunities to address the longer tail of delays. The empirical and log-normal estimates agree to within about three percentage points, which supports the roughly 80% figure and shows the value of checking a fitted model against direct counts before using it for operational decisions.