The M5 Competition
The M5 Competition, the latest of the M Competitions, will run from 2 March to 30 June 2020. It differs from the previous four in five important ways, some of them suggested by the discussants of the M4 Competition.
- It uses hierarchical sales data, generously made available by Walmart, starting at the item level and aggregating to the levels of departments, product categories, and stores in three geographical areas of the US: California, Texas, and Wisconsin.
- Besides the time series data, it also includes explanatory variables that affect sales, such as price, promotions, day of the week, and special events (e.g. Super Bowl, Valentine’s Day, and Orthodox Easter), which can be used to improve forecasting accuracy.
- The distribution of uncertainty is assessed by asking participants to provide the median and four indicative prediction intervals.
- For the first time, it focuses on series that display intermittency: the majority of the more than 42,840 time series show sporadic sales, including zeros.
- Instead of a single competition covering both the point forecasts and the uncertainty distribution, there will be two parallel tracks using the same dataset: the first requires 28-day-ahead point forecasts, and the second requires 28-day-ahead probabilistic forecasts for the median and four prediction intervals (50%, 67%, 95%, and 99%).
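To make the probabilistic track and the notion of intermittency more concrete, here is a minimal Python sketch. It assumes the four prediction intervals are central (equal probability in each tail), which the announcement does not state, and it uses the share of zero-sales days as one simple, illustrative measure of intermittency. The function names and the sample series are hypothetical and are not part of the competition materials.

```python
from fractions import Fraction

# Coverage levels (in percent) of the four prediction intervals named above.
COVERAGES = [50, 67, 95, 99]


def interval_to_quantiles(coverage_percent):
    """Return the (lower, upper) quantile levels of a prediction interval.

    Assumption: the interval is central, so the probability left outside
    it is split equally between the two tails.
    """
    tail = (Fraction(100) - Fraction(coverage_percent)) / 200
    return float(tail), float(1 - tail)


def quantile_levels(coverages):
    """Collect the median plus both endpoints of every interval, sorted."""
    levels = {0.5}  # the median
    for coverage in coverages:
        levels.update(interval_to_quantiles(coverage))
    return sorted(levels)


def zero_share(sales):
    """Fraction of periods with zero sales, a rough intermittency measure."""
    if not sales:
        raise ValueError("sales must contain at least one observation")
    return sum(1 for s in sales if s == 0) / len(sales)


if __name__ == "__main__":
    print(quantile_levels(COVERAGES))
    # Hypothetical daily unit sales for one item, for illustration only.
    print(zero_share([0, 0, 3, 0, 1]))
```

Under the central-interval assumption, the median and the four intervals together correspond to nine quantile levels per series and forecast day.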
The aim of the M5 Competition is similar to that of the previous four: to identify the most appropriate method(s) for different types of situations requiring predictions and uncertainty estimates. Its ultimate purpose is to advance the theory of forecasting and improve its use by business and non-profit organizations. A further goal is to compare the accuracy/uncertainty of ML and DL methods with those of standard statistical ones, and to weigh possible improvements against the extra complexity and higher costs of using the various methods.
Expectations & Methods Content
Given the success of the previous four M Competitions, the considerable number of participants they attracted, and the significant contributions they made, which fundamentally changed the field of forecasting, similar or even greater achievements are expected from the M5 Competition. It is aimed at the fast-growing data science community, which will have easy access to the M5 dataset. It will be run on the Kaggle platform, and the number of participants is expected to be in the several thousands.