The official guidelines for the M4 competition have now been published, and there have been several developments since my last post on this.
There is now a prize for prediction interval accuracy using a scaled version of the Mean Interval Score. If the $100(1-\alpha)$% prediction interval for time $t$ is given by $[L_t, U_t]$, for $t=1,\dots,h$, then the MIS is defined as
$$\text{MIS} = \frac{1}{h}\sum_{t=1}^{h}\left[(U_t - L_t) + \frac{2}{\alpha}(L_t - Y_t)\mathbf{1}(Y_t < L_t) + \frac{2}{\alpha}(Y_t - U_t)\mathbf{1}(Y_t > U_t)\right],$$
where $Y_t$ is the observation at time $t$ and $\mathbf{1}(\cdot)$ is the indicator function. The competition will use 95% prediction intervals, so $\alpha=0.05$. This penalizes wide intervals (since $U_t-L_t$ will be large) and also penalizes non-coverage, with observations well outside the interval being penalized more heavily. So it deals with both sharpness and calibration. See Gneiting & Raftery (2007) for further details.
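To make the score concrete, here is a minimal Python sketch of the unscaled interval score of Gneiting & Raftery (2007): each time point contributes the interval width, plus 2/α times the distance by which the observation falls outside the interval, and the contributions are averaged over the horizon. The function name `mean_interval_score` and the example numbers are my own illustrations, not the competition's benchmark code, and the scaling step used for the prize is not shown.

```python
def mean_interval_score(y, lower, upper, alpha=0.05):
    """Unscaled Mean Interval Score for 100(1 - alpha)% prediction intervals.

    y, lower, upper are equal-length sequences holding the observations and
    the lower and upper interval limits for t = 1, ..., h.
    """
    if len(y) == 0 or not (len(y) == len(lower) == len(upper)):
        raise ValueError("y, lower and upper must be non-empty and equal in length")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")

    total = 0.0
    for y_t, l_t, u_t in zip(y, lower, upper):
        score = u_t - l_t  # width term: rewards sharp (narrow) intervals
        if y_t < l_t:
            score += (2.0 / alpha) * (l_t - y_t)  # observation below the interval
        elif y_t > u_t:
            score += (2.0 / alpha) * (y_t - u_t)  # observation above the interval
        total += score
    return total / len(y)


# Illustrative (made-up) values: the third observation lies above its interval.
observed = [10.0, 12.0, 20.0]
lower_limits = [8.0, 9.0, 10.0]
upper_limits = [12.0, 15.0, 18.0]
mis = mean_interval_score(observed, lower_limits, upper_limits, alpha=0.05)
```

With α = 0.05 the multiplier 2/α is 40, so the single miss of 2 units in the third period adds 80 to that period's score, far more than its width of 8. That asymmetry is what makes the score reward calibration as well as sharpness.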
There is now a strong emphasis on reproducibility, with benchmark R code posted on GitHub, and most competitors will be required to post their code on GitHub as well.
There will be weekly, daily and hourly series included, so there will almost certainly be some series with multiple seasonality.
I am grateful to Spyros Makridakis for taking account of my concerns. I think the M4 competition is much improved as a result, and I am excited to see the submissions and results.
Spyros has provided an overview of the competition on the IIF blog.
I have agreed to publish a selection of articles on the M4 competition in the IJF. More details about this will be announced at a later date.
For now, please register to participate. The more forecasters who get involved, the better.