Statistical Analysis of Baggage Delivery Performance
This analysis examined baggage delivery times across 99,174 flight records, focusing on data quality, performance trends, and probability modeling. After 15,057 problematic records (15.18%) were identified and removed (duplicates, unrealistic timestamps, and invalid routing), 84,117 records remained, giving a more reliable basis for the analysis.
Total problematic records: 15,057 out of 99,174 (15.18%)
8,380 duplicate entries were removed to prevent double-counting in performance metrics and average-time calculations.
Records showing bag drop durations under 10 seconds with bag counts greater than 10, indicating scanning or recording errors.
AverageTimePerBag values of 0 or 1 second indicate implausibly fast processing rates from incorrect timestamps or missing intervals.
Some entries had identical airport codes for origin and destination, likely test records not representing real flights.
Records where the first bag was logged less than 1 minute after arrival, which is operationally unlikely given typical deplaning and unloading times.
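The five cleaning rules above can be sketched as a single filter pass. This is a minimal illustration, not the author's Excel workflow: the `FlightRecord` fields and the helper names `problem_reason` and `clean` are hypothetical, and "duplicate" is assumed to mean an exact copy of an earlier record.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)  # frozen -> hashable, so exact duplicates can be found with a set
class FlightRecord:
    # Hypothetical field names; the real export's columns may be named differently.
    origin: str
    destination: str
    arrival: datetime
    first_bag: datetime
    bag_drop_seconds: float
    bag_count: int
    avg_time_per_bag_seconds: float


def problem_reason(r: FlightRecord) -> Optional[str]:
    """Return the first rule the record violates, or None if it passes all of them."""
    if r.bag_drop_seconds < 10 and r.bag_count > 10:
        return "bag drop < 10 s with > 10 bags"
    if r.avg_time_per_bag_seconds <= 1:  # the report's "0 or 1 second" (assumes whole seconds)
        return "AverageTimePerBag of 0 or 1 s"
    if r.origin == r.destination:
        return "origin equals destination"
    if r.first_bag - r.arrival < timedelta(minutes=1):
        return "first bag < 1 min after arrival"
    return None


def clean(records: Iterable[FlightRecord]) -> Tuple[List[FlightRecord], Counter]:
    """Drop exact duplicates and rule violations, counting each removed record once."""
    seen = set()
    kept: List[FlightRecord] = []
    removed: Counter = Counter()
    for r in records:
        if r in seen:
            removed["duplicate"] += 1
            continue
        seen.add(r)
        reason = problem_reason(r)
        if reason is None:
            kept.append(r)
        else:
            removed[reason] += 1
    return kept, removed
```

Because a record that breaks several rules is counted only under the first matching one, the per-rule counts always sum to the total number of removed records.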
Tools Used: Excel for data cleaning and preparation, StatsTool for statistical analysis
Key Steps:
Sample Size: 84,117 flights (after cleaning)
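The cleaned sample size and the removal rate follow directly from the record counts reported above:

$$
\begin{aligned}
n_{\text{clean}} &= 99{,}174 - 15{,}057 = 84{,}117 \\
\text{share removed} &= \frac{15{,}057}{99{,}174} \approx 0.1518 = 15.18\%
\end{aligned}
$$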
| Metric | Value | Interpretation |
|---|---|---|
| Mean | 16:06 (16.1 minutes) | Average delivery time |
| Median | 15:22 (15.37 minutes) | 50th percentile delivery time |
| Standard Deviation | 6:30 (6.5 minutes) | Typical variation from mean |
| Minimum | 0:16 (0.27 minutes) | Fastest delivery observed |
| Maximum | 39:59 (39.98 minutes) | Longest delivery observed |
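The table mixes minutes:seconds values with decimal minutes, so a small conversion helper makes the two columns easy to cross-check. The sketch below assumes the delivery times are available as `"mm:ss"` strings; the function names are illustrative, and `statistics.stdev` (sample standard deviation, like Excel's `STDEV.S`) is an assumption about which variant was used, although with 84,117 records the sample and population versions are practically identical.

```python
import statistics
from typing import Dict, List


def mmss_to_seconds(text: str) -> int:
    """'16:06' -> 966. Assumes times are recorded as minutes:seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_to_mmss(total: float) -> str:
    """966 -> '16:06' (rounded to the nearest whole second)."""
    whole = round(total)
    return f"{whole // 60}:{whole % 60:02d}"


def describe(times_mmss: List[str]) -> Dict[str, float]:
    """Summary statistics in seconds, one entry per row of the table above."""
    secs = [mmss_to_seconds(t) for t in times_mmss]
    return {
        "mean": statistics.fmean(secs),
        "median": statistics.median(secs),
        "stdev": statistics.stdev(secs),  # sample SD; needs at least 2 values
        "min": min(secs),
        "max": max(secs),
    }


# Decimal-minute column: 15:22 is 15 + 22/60 minutes.
print(round(mmss_to_seconds("15:22") / 60, 2))  # prints 15.37
```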
| Percentile | Time | Interpretation |
|---|---|---|
| 1% | 03:41 | 1% of deliveries take under ~4 minutes |
| 10% | 08:39 | 10% take less than ~9 minutes |
| 50% | 15:22 | Median delivery time |
| 90% | 24:39 | 90% are under ~25 minutes |
| 99% | 35:46 | The slowest 1% take longer than ~36 minutes |
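Percentiles like these are usually computed by linear interpolation between sorted observations. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which follows the same interpolation rule as Excel's `PERCENTILE.INC`; that the report used `PERCENTILE.INC` is an assumption, and `percentile_table` is a hypothetical helper name.

```python
import statistics
from typing import Dict, Iterable, Sequence


def percentile_table(values: Sequence[float],
                     levels: Iterable[int] = (1, 10, 50, 90, 99)) -> Dict[int, float]:
    """Percentiles by linear interpolation between order statistics.

    method="inclusive" places percentile p at position p * (n - 1) in the sorted
    data (0-based), the same rule as Excel's PERCENTILE.INC.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points: P1..P99
    return {p: cuts[p - 1] for p in levels}


# Toy data 0..100: each percentile equals its own level (P90 is 90.0, and so on).
print(percentile_table(list(range(101))))
```

Applied to the cleaned delivery times in seconds, the result could then be formatted back to mm:ss for a table like the one above.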
Key Observation: The mean (16.1 min) is slightly higher than the median (15.37 min), suggesting right-skewness in the distribution. The percentiles point the same way: the median sits 20:24 below P99 (35:46) but only 11:41 above P1 (03:41), so the upper tail is the longer one. Most deliveries are relatively fast, but a long tail of slower deliveries pulls the average upward.
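The right-skew claim can be checked numerically using only the summary figures already reported. This sketch computes two standard skewness measures, Pearson's second (median) coefficient and Kelly's percentile-based coefficient; neither appears in the original analysis, and the inputs are the rounded table values, so the results are approximate. Positive values indicate right skew; values near zero indicate symmetry.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def pearson_median_skewness(mean: float, median: float, sd: float) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


def kelly_skewness(p10: float, p50: float, p90: float) -> float:
    """Kelly's percentile skewness: ((P90 - P50) - (P50 - P10)) / (P90 - P10)."""
    return ((p90 - p50) - (p50 - p10)) / (p90 - p10)


# Values taken from the two tables above.
mean, median, sd = to_seconds("16:06"), to_seconds("15:22"), to_seconds("6:30")
p10, p50, p90 = to_seconds("08:39"), to_seconds("15:22"), to_seconds("24:39")
print(f"Pearson median skewness: {pearson_median_skewness(mean, median, sd):.2f}")  # 0.34
print(f"Kelly skewness:          {kelly_skewness(p10, p50, p90):.2f}")              # 0.16
```

Both measures come out positive, which agrees with the mean/median comparison.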
Two methods were used to estimate the probability that baggage delivery time is less than 21 minutes:
Result: 81.27%
Direct calculation: the number of observed deliveries under 21 minutes divided by the number of cleaned records.
= COUNTIF(CleanData, "<0:21:00") / 84117
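For readers working outside Excel, the same empirical estimate is a one-line proportion. This is a minimal sketch assuming the cleaned times are available in decimal minutes; the helper names are hypothetical. It also adds a binomial standard error, which is not part of the original analysis and assumes the records are independent, to show how much sampling noise a proportion estimated from 84,117 records carries.

```python
import math
from typing import Sequence


def empirical_prob_below(times_minutes: Sequence[float], threshold: float = 21.0) -> float:
    """Share of observations strictly below the threshold, like COUNTIF(range, "<...") / n."""
    if not times_minutes:
        raise ValueError("need at least one observation")
    return sum(1 for t in times_minutes if t < threshold) / len(times_minutes)


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent records."""
    return math.sqrt(p * (1 - p) / n)


# With the report's figures (p = 0.8127, n = 84,117):
se = binomial_standard_error(0.8127, 84_117)
print(f"{se:.4f}")  # about 0.0013, i.e. roughly 0.13 percentage points
```

A standard error this small suggests the gap between the two methods is driven by the modeling choice rather than by sampling noise.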
Advantages:
Disadvantages:
Result: 78.55%
A lognormal distribution was fitted, chosen because of the right skew observed in the data.
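Parameters: μ = 2.690 and σ = 0.448, the mean and standard deviation of ln(delivery time in minutes), as passed to LOGNORM.DIST: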
= LOGNORM.DIST(21, 2.690, 0.448, TRUE)
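`LOGNORM.DIST(x, mean, standard_dev, TRUE)` returns P(X ≤ x) when ln X is normally distributed, so the same value can be reproduced with Python's `statistics.NormalDist` applied to ln x. How 2.690 and 0.448 were estimated is not stated; `fit_lognormal` below assumes the common choice of the mean and sample standard deviation of the log times, and both helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev
from typing import Sequence, Tuple


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) when ln X ~ Normal(mu, sigma); mirrors LOGNORM.DIST(x, mu, sigma, TRUE)."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def fit_lognormal(times_minutes: Sequence[float]) -> Tuple[float, float]:
    """Assumed estimator: mean and sample SD of ln(time in minutes)."""
    logs = [math.log(t) for t in times_minutes]
    return fmean(logs), stdev(logs)


p = lognormal_cdf(21, 2.690, 0.448)
print(f"{p:.4f}")  # ≈ 0.786, close to the report's 78.55%
```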
Advantages:
Disadvantages:
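The empirical estimate is 2.72 percentage points higher than the lognormal one. The count uses the observed data directly, without assuming a distribution. The lognormal model captures the skew but smooths the data, and the fitted curve places more probability above 21 minutes (21.45%) than the data actually show (18.73%). This is consistent with the model's right tail extending without limit while the slowest observed delivery was 39:59. As a result, the model slightly underestimates the probability of meeting the target.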
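One way to see where the lognormal fit departs from the data is to compare its quantiles with the observed percentile table. The sketch below uses the report's parameters and observed percentiles; the comparison itself is an illustration added here, not part of the original analysis. With these inputs the model's median (about 14.7 min) falls below the observed 15.37 min, while its P99 (about 41.8 min) lies well above the observed 35:46, so the model shifts probability into the upper tail.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

MU, SIGMA = 2.690, 0.448  # parameters from the LOGNORM.DIST call above
OBSERVED = {1: "03:41", 10: "08:39", 50: "15:22", 90: "24:39", 99: "35:46"}  # percentile table


def to_minutes(mmss: str) -> float:
    """Convert a 'mm:ss' string to decimal minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60


def compare_quantiles() -> List[Tuple[int, float, float]]:
    """(percentile, observed minutes, lognormal-model minutes) for each reported level."""
    log_dist = NormalDist(MU, SIGMA)  # distribution of ln(delivery time in minutes)
    return [(p, to_minutes(t), math.exp(log_dist.inv_cdf(p / 100))) for p, t in OBSERVED.items()]


for p, obs, model in compare_quantiles():
    print(f"P{p:<2}  observed {obs:5.2f} min   lognormal {model:5.2f} min")

# Probability the model assigns to deliveries over 40 minutes, though none were observed.
print(f"P(X > 40 min) under the model: {1 - NormalDist(MU, SIGMA).cdf(math.log(40)):.3f}")
```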
Bottom line: Both methods put the share of flights meeting the 21-minute target at roughly 80% (78.55% to 81.27%), a consistent figure that is useful for operational planning.
The analysis highlighted the importance of data quality: about 15% of records were duplicates or contained errors that would have distorted the results if left in. The average baggage delivery time of 16.1 minutes, with about 80% of flights meeting the 21-minute benchmark, indicates solid operational performance. However, the right-skewed distribution points to an opportunity: reducing the long tail of slow deliveries. Comparing the empirical and lognormal approaches cross-checked the headline figure and showed how the choice of statistical model affects the estimate used for operational decisions.