Demand Forecasting Best Practices: How to Prepare Your Datasets for Forecasts

Author: Magdalena Foltyn, PhD

To predict demand successfully, you need to understand and prepare your historical data. By "understanding", I mean learning about the features of the data, both business and quantitative. "Preparation" means adjusting the data according to these features, which includes cleansing it of anomalies. Anomalies are events that deviate from the general pattern of the data for a specific period.

In this article, I step back from machine learning to explore the topic of demand forecasting, focusing specifically on the data processing methods used in this field. My goal is to give you insight into the various procedures and their significance within the overall demand forecasting process, so that you are well informed about what should be done and why.

Data preparation for demand forecasting

Let's begin with a mental exercise. Imagine that you are a data scientist working on demand forecasting. Your primary task is to predict future demand based on the provided data. The first step is to analyze the given time series and ask yourself a few key questions, among them:

  • Do I have enough historical data?
  • Are the data forecastable?
  • Should I fill the gaps? If so, how do I fill them?
  • How do I detect and remove outliers?

Let's now break them down to get a clear picture, starting with the first question in our roster:

1. Initial data assessment: Do I have enough historical data?

In time series forecasting, you first need to determine the minimum amount of history required for a reliable forecast. Usually, we work on monthly data. In that case, the minimum is two full yearly cycles, that is, 24 months, because within such an interval it is possible to detect yearly seasonality.

  • What if I operate on an interval that lasts only 12 months?

In such a case, forecasting is still possible. However, if the yearly seasonality is a data feature, there is a chance that such a prediction can be inaccurate.

  • What about an interval that lasts only 6 months?

Such a history is considered too short for accurate demand forecasting. I would suggest changing the data granularity to weekly intervals and forecasting over a shorter horizon.

  • Is it good to have more data, e.g., 10 years?

As a rule of thumb, machine learning benefits from a vast quantity of training data. However, the business context for time series forecasting is a slightly different case. Why?

Let me illustrate this with... the last decade of our history. This timeframe includes the years 2020-2021. As you are probably aware, these were the years most affected by the COVID-19 pandemic. In most industries, the data recorded in this period looks significantly different from the rest.

  • Now, the question is: does it make sense to account for this period during prediction?

My experience in data analysis allows me to say that the answer is: probably not, as it constitutes an anomaly.  Such an event stands out from the data and is an obstacle to creating a reliable forecast.

  • Knowing that, another question is—what should we do in such a case?

My recommendation is to forecast using only the data points from the latest coherent period. In this case, "coherent" refers to having a consistent trend or seasonality. If the trend in a given interval changes drastically, it's a sign that you should truncate the data. Although this will result in a shorter historical dataset, the predictions made from it will be more reliable.

2. Initial data assessment: Are the data forecastable?

To check whether the data can be forecast, perform an ADI / CV² analysis. This method of classifying demand relies on the following two metrics:

  1. ADI (Average Demand Interval) is a metric that measures the regularity of demand. It is calculated as the ratio of the total number of periods to the number of periods in which demand occurred:

ADI = number of periods / number of periods with demand

  2. CV² (Squared Coefficient of Variation) measures the variability of demand quantities, that is, how much the demand values fluctuate. It is calculated as:

CV² = (standard deviation / mean)² 
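As a minimal sketch of the two formulas above, the following Python computes ADI and CV² for a demand history. The function name `adi_cv2` and the sample data are hypothetical. Two choices are assumptions that the article does not state: a period counts as "with demand" when its value is greater than zero, and CV² is computed over the non-zero demand values only, using the population standard deviation.

```python
from statistics import mean, pstdev


def adi_cv2(demand):
    """Return (ADI, CV²) for a list of per-period demand quantities."""
    nonzero = [d for d in demand if d > 0]
    if not nonzero:
        raise ValueError("no periods with demand")
    # ADI = number of periods / number of periods with demand
    adi = len(demand) / len(nonzero)
    # CV² = (standard deviation / mean)², over non-zero demand only (assumption)
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2
    return adi, cv2


# Hypothetical 12-month history: demand occurs in 6 of the 12 months.
history = [10, 0, 12, 0, 0, 8, 10, 0, 10, 10, 0, 0]
adi, cv2 = adi_cv2(history)
print(f"ADI = {adi:.2f}, CV² = {cv2:.3f}")
```

Here demand occurs in every second month on average, so ADI is 2, while the quantities themselves barely vary, so CV² is close to zero.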


Product forecastability: the categories of demand profiles

Using these two metrics, we can classify a demand series into one of the following four categories:

a) Smooth demand - ADI < 1.32, CV² < 0.49

Smooth demand indicates regularity in both time and quantity. This is a premise for good forecastability with univariate forecasting.

b) Intermittent demand - ADI >= 1.32, CV² < 0.49

The intermittent demand pattern indicates that there is little variation in the quantity. However, when we examine the time intervals between two demand occurrences, we notice significant variation. As a result, the margin of error is considerably higher. I recommend evaluating the results of univariate forecasting, considering the possibility that multivariate forecasting may be more reliable in this case.

c) Erratic demand - ADI < 1.32, CV² >= 0.49

In this situation, we are dealing with regular occurrences but with significant variations in quantity, which contrasts with intermittent demand. When facing this category of demand, achieving forecast accuracy can be challenging. Start by using univariate forecasting, but also consider exploring multivariate forecasting, as it may provide better accuracy.

d) Lumpy demand - ADI >= 1.32, CV² >= 0.49

Your demand features a significant variation in both quantity and time. This makes producing a reliable forecast for a single product extremely difficult, no matter the tools. I recommend creating a hierarchical forecast for the whole group of products instead of a univariate forecast.
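The four categories can be sketched as a small lookup on the two thresholds given above (1.32 for ADI and 0.49 for CV²). The function name `classify_demand` and its parameter names are hypothetical; only the cut-off values and category names come from the text.

```python
def classify_demand(adi, cv2, adi_cut=1.32, cv2_cut=0.49):
    """Map (ADI, CV²) to one of the four demand profiles."""
    if adi < adi_cut:
        # Regular in time: quantity variation decides smooth vs. erratic
        return "smooth" if cv2 < cv2_cut else "erratic"
    # Irregular in time: quantity variation decides intermittent vs. lumpy
    return "intermittent" if cv2 < cv2_cut else "lumpy"


for adi, cv2 in [(1.0, 0.2), (2.0, 0.2), (1.0, 0.8), (2.0, 0.8)]:
    print(adi, cv2, classify_demand(adi, cv2))
```

Values that sit exactly on a threshold fall into the ">=" category, matching the definitions above.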

3. Data preparation: Data gaps

After addressing these two questions, it's time to take care of the gaps in the data. To do this, we need to consider three aspects:

  1. Differentiate between NaN (Not a Number) points and zeros.
  2. Decide whether it makes sense to fill the gaps.
  3. Choose a method for filling the gaps.

NaN points and zeroes: what is the difference?

Be aware of the difference between zero and NaN (Not a Number). A value of zero indicates that there was no demand, and it is a valid measurement. In contrast, NaN signifies a lack of data, representing a genuine absence of information.
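A short illustration of the distinction, using hypothetical data in which `math.nan` marks months with no record at all. This is only a sketch of how the two cases can be told apart in plain Python.

```python
import math

# Hypothetical monthly demand; math.nan marks months without any record.
demand = [5.0, 0.0, math.nan, 7.0, 0.0, math.nan]

# A zero is a valid measurement: there was no demand in that month.
zero_months = [i for i, d in enumerate(demand) if d == 0]
# NaN means the information is missing; it must be tested with isnan().
missing_months = [i for i, d in enumerate(demand) if math.isnan(d)]

print("no demand:", zero_months)
print("no data:  ", missing_months)
```

Note that NaN never compares equal to anything, including itself, so a check such as `d == math.nan` would silently miss every gap.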

Should I fill the gaps?

When evaluating your data, consider how many NaN points you have. For instance, if you have 21 NaN points over 24 months, filling the gaps is pointless. In such cases, the filled values would dominate the series and could lead to artificial results.

However, if you have 22 valid data points out of 24 months, it is acceptable to fill in the two missing values. The errors introduced by this approach will likely be negligible.
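The decision can be sketched as a check on the share of missing points. The article gives only the two examples (21 of 24 missing is too many, 2 of 24 is acceptable), so the 10% cut-off below is purely an illustrative assumption, as are the function names.

```python
import math


def missing_share(demand):
    """Fraction of periods whose value is NaN (missing)."""
    return sum(math.isnan(d) for d in demand) / len(demand)


def worth_filling(demand, max_missing_share=0.1):
    # max_missing_share is an illustrative cut-off, not a value from the article
    return missing_share(demand) <= max_missing_share


sparse = [math.nan] * 21 + [4.0, 5.0, 6.0]        # 21 NaN points in 24 months
mostly_complete = [5.0] * 22 + [math.nan] * 2      # 22 valid points in 24 months

print(worth_filling(sparse))
print(worth_filling(mostly_complete))
```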

How do I fill the gaps?

There are various methods to consider. The simplest approach is interpolation, for example averaging the points immediately before and after the gap. A more sophisticated method is to forecast the missing points from the data that precedes them.
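A minimal sketch of the simplest approach: linear interpolation between the nearest valid neighbours. For a single missing point this is the same as averaging the points before and after it. The function name `fill_gaps_linear` and the sample series are hypothetical, and gaps at the very start or end of the series are deliberately left untouched because they have no neighbour on one side.

```python
import math


def fill_gaps_linear(values):
    """Fill interior NaN runs by linear interpolation between valid neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        # Constant step between two consecutive known points
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


series = [10.0, math.nan, 14.0, 15.0, math.nan, math.nan, 21.0]
print(fill_gaps_linear(series))
```

The single gap between 10 and 14 becomes 12, and the two-point gap between 15 and 21 becomes 17 and 19.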


4. Data preparation: Outliers

Determining the type of distribution is essential to identify outliers—data points that signify anomalies and do not conform to the overall data pattern. Based on experience, demand time series data (or its stationary equivalent) is most often characterized by a normal or log-normal distribution. If this is not the case, these distributions usually serve as good approximations.

Example: in the case of a normal distribution, it is typically assumed that points within the range of μ ± 3σ are valid, while any points falling outside this range are considered outliers.

How do you remove them? Replace each outlier with the nearest limit value, interpolate it, average the points before and after it, or forecast it from the preceding data.
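The μ ± 3σ rule combined with the first removal option (replacing an outlier with the limit value) can be sketched as follows. The function name `clip_outliers`, the parameter `k`, and the sample data are hypothetical, and the sketch assumes the series is roughly stationary and normally distributed.

```python
from statistics import mean, pstdev


def clip_outliers(values, k=3.0):
    """Replace points outside mean ± k·std with the nearest limit."""
    mu, sigma = mean(values), pstdev(values)
    lower, upper = mu - k * sigma, mu + k * sigma
    # Indices of points that fall outside the accepted range
    flagged = [i for i, v in enumerate(values) if v < lower or v > upper]
    # Clip every point into [lower, upper]; valid points are unchanged
    cleaned = [min(max(v, lower), upper) for v in values]
    return cleaned, flagged


# Hypothetical 24-month history with one spike in month index 10.
values = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0] * 4
values[10] = 300.0

cleaned, flagged = clip_outliers(values)
print("outlier positions:", flagged)
print("replacement value:", round(cleaned[10], 1))
```

One caveat worth keeping in mind: the outlier itself inflates the mean and standard deviation used to set the limits, so on very short series a single spike may not cross the 3σ boundary at all.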

Data preparation in demand forecasting. Business assessment

From the previous section, we understand that data can be statistically tested, and significant conclusions can be drawn from the results. A lack of knowledge about statistical features can have serious implications for predictions; an incorrectly chosen model may yield good results with test data but perform poorly in the future.

  • However, is statistical assessment sufficient?

While it may be in some cases, it often isn't for many businesses.

Example: Demand patterns can be influenced by events such as promotions. If the promotional pattern is not cyclic, it may indicate an anomaly. Such anomalies can be detected through statistical tests if they occur for just one month.

However, if the unusual pattern lasts for three months, it could easily be overlooked and treated as normal.

Therefore, it is crucial to analyze data in the context of known past events. Demand data can be adjusted manually by planners—who have an understanding of past occurrences—or this process can be automated if quantitative data about such events is available.

Demand forecasting: important features to consider post-data assessment and preparation

If you have completed the data assessment and preparation process and the results allow for forecasting, it is time to identify the key features relevant to forecasting. These include, among others:

  • stationarity,
  • trends,
  • seasonality,
  • randomness of data.

Allow me to break them down one by one for you to gain clarity on the issue.

Feature 1. The stationarity of data

Let's begin with the definition: stationary data are data that do not exhibit trends or seasonality. Stationarity also matters for outlier detection based on the distribution.

Why is this the case?

When data contains a trend, a large value is not necessarily an anomaly; it could simply be a consequence of the trend rather than a deviation from the distribution. Such values can be valid if they align with an underlying growth trend, such as an exponential trend. Data is typically made stationary through techniques like differencing, logarithmic transformation, or taking a square root.
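Two of these transformations can be sketched in a few lines. The helper name `difference` and the sample series are hypothetical; the point is only to show how differencing removes a linear trend and how taking logarithms first turns exponential growth into a constant step.

```python
import math


def difference(values, lag=1):
    """Differencing: y[t] = x[t] - x[t - lag]."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# A linear trend becomes a constant series after one round of differencing.
linear_trend = [100, 110, 120, 130, 140]
print(difference(linear_trend))

# Exponential growth: take logs first, then difference.
exponential = [100.0, 200.0, 400.0, 800.0]
log_values = [math.log(v) for v in exponential]
print(difference(log_values))
```

With monthly data, calling `difference(values, lag=12)` subtracts the same month of the previous year, which is a common way to remove yearly seasonality.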

Feature 2. Market trends

In demand forecasting, the word "trend" is used in several different ways, so it is worth stating the meaning used here. In this case, I refer to an increasing or decreasing tendency of a variable over time.

Example: the price of gold has tended to rise over the long term.

Feature 3. Seasonality

Another factor, seasonality, refers to fluctuations in sales volume that recur within a precisely defined timeframe. Such patterns may apply to your whole inventory or only to specific products.

Example: sales of Christmas lights are always highest in December.

Feature 4. Randomness

Randomness refers to data that does not exhibit any discernible pattern.

Example: the sales history of hair dryers can be considered random. Hair dryers are essential beauty tools found in nearly every household and are used daily, making their sales relatively independent of seasonal changes. Additionally, there have been no recent innovations in the hair dryer industry, resulting in no observable trends.

  • Is there any way to facilitate the assessment of those features?

Statistical tests are available for each of these features, so there is no need to assess them solely by examining the data. Tests for stationarity, trend, seasonality, and randomness are essential for selecting the appropriate prediction model; not all models are suitable for data that exhibits seasonality or trends. If the data is random, multivariate forecasting methods should be considered.
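The article does not name specific tests, so the following is only one possible illustration for the trend feature: a plain-Python sketch of the Mann-Kendall trend test. It uses the normal approximation and omits the correction for tied values, both of which are simplifying assumptions; the function name `mann_kendall` and the sample data are hypothetical. Tests for stationarity, seasonality, and randomness are not shown here.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test; returns (S, z, two-sided p-value)."""
    n = len(values)
    # S = (number of increasing pairs) - (number of decreasing pairs), i < j
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S, no tie correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the standard normal approximation
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


rising = list(range(1, 13))  # hypothetical 12 months of steadily growing demand
s, z, p = mann_kendall(rising)
print(f"S = {s}, z = {z:.2f}, p = {p:.6f}")
```

A small p-value (for example below 0.05) suggests a monotonic trend, which in turn rules out models that assume a trend-free series.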

Forecast customer demand best practices: the conclusion

Understanding the statistical characteristics of data and its business context is crucial for effective data cleaning and selecting the appropriate model. When it comes to data cleaning and outlier detection, we must consider two types of anomalies: random anomalies that cannot be explained by business context, and planned events that are understood within the business framework. Both types need to be taken into account when identifying outliers.

In BiModal Forecasting, we have automated data cleaning based on statistical tests. However, correcting business data remains a significant challenge since many enterprises do not collect the necessary data for this purpose. Additionally, our tool uses a two-stage process for model selection: first, we identify potential models based on statistical analysis, eliminating any that are not valid; second, we apply all the viable models to determine which one performs the best.

We do not leave anything to chance. Our approach is transparent, and we do not offer a black box solution. Every step we take is backed by statistics and business rationale.


