
The exponential smoothing method. Forecasting by exponential smoothing (ES)

Exponential smoothing is a method of smoothing time series whose computational procedure processes all previous observations while taking into account the obsolescence of information as it recedes from the forecast period. In other words, the "older" the observation, the less it should affect the value of the forecast estimate. The idea of exponential smoothing is that decreasing weights are attached to observations as they "age".

This forecasting method is considered very effective and reliable. Its main advantages are the ability to weight background information, the simplicity of the computational operations, and the flexibility in describing various dynamic processes. The exponential smoothing method makes it possible to obtain estimates of trend parameters that characterize not the average level of the process but the trend prevailing at the time of the last observation. The method has found its widest application in medium-term forecasting. The key point of the exponential smoothing method is the choice of the smoothing parameter (smoothing constant) and of the initial conditions.

Simple exponential smoothing of a time series containing a trend leads to a systematic error associated with the lag of the smoothed values behind the actual levels of the series. To take the trend in non-stationary series into account, a special two-parameter linear exponential smoothing is used. Unlike simple exponential smoothing with one smoothing constant (parameter), this procedure smooths both the random disturbances and the trend, using two different constants (parameters). The two-parameter method (Holt's method) includes two equations: the first smooths the observed values, the second smooths the trend:

S_t = A·Y_t + (1 − A)·(S_{t−1} + b_{t−1}),
b_t = B·(S_t − S_{t−1}) + (1 − B)·b_{t−1},

where t = 2, 3, 4, ... are the smoothing periods; S_t is the smoothed value for period t; Y_t is the actual value of the level for period t; S_{t−1} is the smoothed value for period t − 1; b_t is the smoothed trend value for period t; b_{t−1} is the smoothed trend value for period t − 1; A and B are smoothing constants (numbers between 0 and 1).

The smoothing constants A and B characterize the weighting of the observations. Usually A, B < 0.3. Since (1 − A) < 1 and (1 − B) < 1, the weights decrease exponentially as an observation recedes from the current period t; hence this procedure is called exponential smoothing.

An equation for smoothing the trend is added to the general procedure. Each new trend estimate is obtained as a weighted sum of the difference between the last two smoothed values (the current trend estimate) and the previous smoothed trend estimate. This equation makes it possible to significantly reduce the influence of random disturbances on the trend over time.

Forecasting with exponential smoothing is similar to the "naive" forecasting procedure, in which tomorrow's forecast estimate is taken to be equal to today's value. In this case, the forecast one period ahead is the smoothed value for the current period plus the current smoothed trend value:

Ŷ_{t+1} = S_t + b_t.

This procedure can be used to forecast any number of periods ahead, for example m periods:

Ŷ_{t+m} = S_t + m·b_t.

The forecasting procedure begins by setting the smoothed value S_1 equal to the first observation Y_1, i.e. S_1 = Y_1.

There remains the problem of determining the initial value of the trend b_1. There are two ways to estimate b_1.

Method 1. Set b_1 = 0. This approach works well for a long initial time series: after a small number of periods the smoothed trend approaches the actual trend value.

Method 2. A more accurate estimate of b_1 can be obtained from the first five (or more) observations of the time series. Based on them, the equation Y_t = a + b·t is fitted by the method of least squares, and the value of b is taken as the initial trend value.
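The two-parameter procedure above can be sketched in a few lines of Python; the function names and the default initialization b1 = 0 (method 1) are choices of this sketch, not part of the original text.

```python
def holt_smooth(y, A, B, b1=0.0):
    """Two-parameter (Holt) exponential smoothing.

    y  -- observed levels Y_1..Y_n
    A  -- smoothing constant for the observed values (0 < A < 1)
    B  -- smoothing constant for the trend (0 < B < 1)
    b1 -- initial trend estimate (method 1 above: b1 = 0)
    """
    S = [y[0]]                  # S_1 = Y_1, as in the text
    b = [b1]
    for t in range(1, len(y)):
        s_new = A * y[t] + (1 - A) * (S[-1] + b[-1])   # smooth the observations
        b_new = B * (s_new - S[-1]) + (1 - B) * b[-1]  # smooth the trend
        S.append(s_new)
        b.append(b_new)
    return S, b

def holt_forecast(S, b, m=1):
    """Forecast m periods ahead: last smoothed value plus m trend steps."""
    return S[-1] + m * b[-1]
```

With A = B = 0.5 and the toy series [10, 20], one smoothing step gives S_2 = 15 and b_2 = 2.5, so the one-step forecast is 17.5.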

How much better Forecast NOW! performs than exponential smoothing (ES) models can be seen in the chart below: the X axis shows the item number, the Y axis the percentage improvement in forecast quality. A description of the model, a detailed study, and the results of the experiments follow below.

Model description

Exponential smoothing is one of the simplest forecasting methods. A forecast can be obtained only one period ahead: if forecasting is carried out by days, then only one day ahead; if by weeks, then one week.

For comparison, forecasting was carried out a week ahead for 8 weeks.

What is exponential smoothing?

Let the series C represent the original sales series to be forecast:

C(1) is first-week sales, C(2) second-week sales, and so on.

Figure 1. Sales by week, series C

Likewise, the series S represents the exponentially smoothed sales series. The coefficient α lies between zero and one. The smoothed series is computed as follows, where t is a point in time (day, week):

S(t+1) = S(t) + α·(C(t) − S(t))     (1)

Large values of the smoothing constant α speed up the response of the forecast to jumps in the observed process, but can lead to unpredictable outliers, because smoothing will be almost absent.

At the very start of observations there is only one observed value C(1); the forecast S(1) does not exist and formula (1) cannot yet be applied, so C(1) should be taken as the forecast S(2).

The formula can easily be rewritten in a different form:

S(t+1) = (1 − α)·S(t) + α·C(t).

Thus, with an increase in the smoothing constant, the share of recent sales increases, and the share of smoothed previous sales decreases.
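The recurrence above can be sketched in Python; the function name and the return convention (a list of one-step-ahead forecasts starting with S(2) = C(1)) are assumptions of this sketch.

```python
def exp_smooth(c, alpha):
    """Exponentially smoothed sales series.

    c     -- original sales, c[0] = C(1)
    alpha -- smoothing constant, 0 < alpha < 1
    Returns [S(2), S(3), ...]: the forecast for each next period.
    """
    s = [c[0]]                                  # S(2) = C(1)
    for t in range(1, len(c)):
        # S(t+1) = (1 - alpha) * S(t) + alpha * C(t)
        s.append((1 - alpha) * s[-1] + alpha * c[t])
    return s
```

For c = [10, 20] and α = 0.5 this gives [10, 15.0]: the larger α, the larger the share of the most recent sales in each smoothed value.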

The constant α is chosen empirically. Usually several forecasts are made with different constants, and the constant that is optimal in terms of the selected criterion is chosen.

The criterion may be the accuracy of forecasting for previous periods.

In our study, we considered exponential smoothing models in which α takes the values 0.2, 0.4, 0.6 and 0.8. For comparison with Forecast NOW!, forecasts were made for each α for every product, and the most accurate forecast was chosen. In reality the situation is much more complicated: the user, not knowing the accuracy of the forecast in advance, must decide on the coefficient α, on which the quality of the forecast depends very strongly. A vicious circle.
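A selection of this kind can be sketched as a grid search; the mean-squared-error criterion and the helper names are assumptions of this sketch, not the criterion actually used in the study.

```python
def one_step_forecasts(c, alpha):
    """One-step-ahead exponential smoothing forecasts; f[k] forecasts c[k + 1]."""
    f = [c[0]]                                   # the forecast for period 2 is C(1)
    for t in range(1, len(c) - 1):
        f.append((1 - alpha) * f[-1] + alpha * c[t])
    return f

def best_alpha(c, grid=(0.2, 0.4, 0.6, 0.8)):
    """Pick the alpha that minimises mean squared error on the known history."""
    def mse(alpha):
        f = one_step_forecasts(c, alpha)
        return sum((c[k + 1] - f[k]) ** 2 for k in range(len(f))) / len(f)
    return min(grid, key=mse)
```

On a series that jumps to a new level and stays there, the search favours a large constant: best_alpha([10, 20, 20, 20, 20, 20]) returns 0.8.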

An illustration:

Figure 2. α = 0.2: the degree of exponential smoothing is high; real sales are weakly taken into account

Figure 3. α = 0.4: the degree of exponential smoothing is average; real sales are taken into account to an average degree

You can see that as the constant α increases, the smoothed series corresponds more and more closely to real sales, and if there are outliers or anomalies we will get a very inaccurate forecast.

Figure 4. α = 0.6: the degree of exponential smoothing is low; real sales are taken into account significantly

We can see that at α = 0.8 the series almost exactly repeats the original one, which means the forecast tends toward the rule "the same amount will be sold as yesterday".

Note that one must not judge by the approximation error to the original data alone: a perfect fit can be achieved while the forecast remains unacceptable.

Figure 5. α = 0.8: the degree of exponential smoothing is extremely low; real sales are taken into account strongly

Forecast examples

Now let's look at the forecasts made with different values of α. As Figures 6 and 7 show, the larger the smoothing coefficient, the more closely the forecast repeats real sales with a delay of one step. Such a delay can be critical in practice, so one cannot simply choose the maximum value of α; otherwise we end up saying that exactly as much will be sold as was sold in the previous period.

Figure 6. Prediction of the exponential smoothing method for α=0.2

Figure 7. Prediction of the exponential smoothing method for α=0.6

Let's see what happens when α = 1.0. Recall that S - predicted (smoothed) sales, C - real sales.

S(t+1) = (1 − α)·S(t) + α·C(t),

S(t+1) = C(t).

Sales on day t+1 are predicted to be equal to sales on the previous day. Therefore, the choice of a constant must be approached wisely.

Comparison with Forecast NOW!

Now let's compare this forecasting method with Forecast NOW!. The comparison was conducted on 256 products with different kinds of sales: with short-term and long-term seasonality, with "bad" sales, and with shortages, stock-outs and other outliers. For each product a forecast was built using the exponential smoothing model for various α, the best one was selected, and it was compared with the Forecast NOW! forecast.

In the table below you can see the forecast error for each item. The error here is the RMSE: the square root of the mean squared deviation of the forecast from reality. Roughly speaking, it shows by how many units of goods the forecast deviated. The improvement shows by what percentage Forecast NOW! is better if the number is positive, and worse if it is negative. In Figure 8 the x axis shows the goods and the y axis shows by how much Forecast NOW! beats the exponential smoothing forecast. As the graph shows, Forecast NOW! is almost always about twice as accurate and almost never worse. In practice this means that using Forecast NOW! makes it possible to halve stocks or reduce shortages.
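The error measure used here is easy to reproduce; a minimal sketch (the function names are mine) computes the RMSE and the percentage improvement of one forecast over another:

```python
from math import sqrt

def rmse(actual, forecast):
    """Root mean square error: roughly, by how many units of goods
    the forecast deviates from reality."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def improvement_pct(err_base, err_new):
    """Positive when the new forecast beats the baseline, negative when worse."""
    return 100.0 * (err_base - err_new) / err_base
```

For example, halving the error of a baseline forecast corresponds to improvement_pct(err, err / 2) = 50%.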

§ 5. The exponential smoothing method. Selecting a smoothing constant

When the least squares method is used to determine the predictive trend, it is assumed in advance that all retrospective data (observations) are equally informative. Obviously, it would be more logical to discount the initial information, that is, to recognize the unequal value of these data for developing a forecast. In the exponential smoothing method this is achieved by giving the last observations of the dynamic series (that is, the values immediately preceding the forecast lead period) more significant "weights" than the initial observations. The advantages of the method also include the simplicity of the computational operations and the flexibility in describing various process dynamics. The method has found its widest application in medium-term forecasting.

5.1. The essence of the exponential smoothing method

The essence of the method is that the time series is smoothed with a weighted "moving average" whose weights obey an exponential law. In other words, the farther a point lies from the end of the time series, the less "participation" it takes in developing the forecast.

Let the original dynamic series consist of levels (series components) y_t, t = 1, 2, ..., n. For each m successive levels of this series (m < n) an average is calculated; the smoothing interval slides along the dynamic series with a step equal to one. If m is an odd number (and it is preferable to take an odd number of levels, since then the calculated value lies at the center of the smoothing interval and can easily replace the actual value), the moving average is determined by the formula

y̅_t = ( Σ_{i = t−ξ}^{t+ξ} y_i ) / (2ξ + 1),

where y̅_t is the value of the moving average for moment t (t = 1, 2, ..., n); y_i is the actual value of the level at moment i; i is the ordinal number of the level in the smoothing interval. The value of ξ is determined by the length of the smoothing interval.

Since

m = 2ξ + 1

for odd m, it follows that

ξ = (m − 1) / 2.

The calculation of the moving average for a large number of levels can be simplified by computing successive values of the moving average recursively:

y̅_t = y̅_{t−1} + (y_{t+ξ} − y_{t−(ξ+1)}) / (2ξ + 1).
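The recursive update can be checked against the direct definition with a short sketch; the 0-based Python indexing and the function names are assumptions of the sketch.

```python
def moving_average_direct(y, xi, t):
    """Centered moving average over 2*xi + 1 levels around position t."""
    return sum(y[t - xi : t + xi + 1]) / (2 * xi + 1)

def moving_averages_recursive(y, xi):
    """All centered moving averages, each obtained from the previous one by
    adding the level that enters the window and dropping the one that leaves."""
    m = 2 * xi + 1
    out = [sum(y[:m]) / m]                      # first window, computed directly
    for t in range(xi + 1, len(y) - xi):
        out.append(out[-1] + (y[t + xi] - y[t - xi - 1]) / m)
    return out
```

For y = [1, 2, 3, 4, 5] and ξ = 1 both routes give the averages 2.0, 3.0, 4.0.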

But since the latest observations must be given more "weight", the moving average needs a different interpretation: the value obtained by averaging replaces not the central term of the averaging interval but its last term. Accordingly, the last expression can be rewritten as

M_i = M_{i−1} + (y_i − y_{i−m}) / m.     (5.1)

Here the moving average related to the end of the interval is denoted by the new symbol M_i. Essentially, M_i equals y̅_t shifted ξ steps to the right, that is, M_i = y̅_t with i = t + ξ.

Considering that M_{i−1} is an estimate of y_{i−m}, expression (5.1) can be rewritten in the form

M_i = (1/m)·y_i + (1 − 1/m)·M_{i−1},     (5.2)

where M_{i−1} is the estimate defined by expression (5.1). If the calculations by (5.2) are repeated as new information arrives and rewritten in a different form, we obtain the smoothed observation function:

Q_i = α·y_i + (1 − α)·Q_{i−1},

or in the equivalent form

Q_t = α·y_t + (1 − α)·Q_{t−1}.     (5.3)

Calculations carried out by expression (5.3) with each new observation are called exponential smoothing. In the last expression, to distinguish exponential smoothing from the moving average, the notation Q is introduced instead of M. The value α, the analogue of 1/m, is called the smoothing constant. The values of α lie in the interval [0, 1]. If α is represented as the series

α + α(1 − α) + α(1 − α)² + α(1 − α)³ + ... + α(1 − α)ⁿ,

it is easy to see that the "weights" decrease exponentially in time. For example, for α = 0.2 we get

0.2 + 0.16 + 0.128 + 0.102 + 0.082 + ...

The sum of the series tends to unity, and the terms of the sum decrease with time.
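The weights and their sum are easy to verify numerically; a small sketch (the truncation at 200 terms is arbitrary):

```python
alpha = 0.2
# the first five weights: alpha * (1 - alpha)**k
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print([round(w, 3) for w in weights])        # 0.2, 0.16, 0.128, 0.102, 0.082
# the partial sum of the series approaches unity
total = sum(alpha * (1 - alpha) ** k for k in range(200))
print(round(total, 6))
```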

The value Q_t in expression (5.3) is the first-order exponential average, that is, the average obtained directly by smoothing the observation data (primary smoothing). Sometimes, when developing statistical models, it is useful to compute exponential averages of higher orders, that is, averages obtained by repeated exponential smoothing.

The general recursive notation for the exponential average of order k is

Q_t^(k) = α·Q_t^(k−1) + (1 − α)·Q_{t−1}^(k).

The value of k varies within 1, 2, ..., p, p + 1, where p is the order of the predictive polynomial (linear, quadratic, and so on).

Based on this formula, the expressions for the exponential averages of the first, second and third orders are

Q_t^(1) = α·y_t + (1 − α)·Q_{t−1}^(1);
Q_t^(2) = α·Q_t^(1) + (1 − α)·Q_{t−1}^(2);
Q_t^(3) = α·Q_t^(2) + (1 − α)·Q_{t−1}^(3).
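Repeated smoothing can be sketched as follows; the function name and the default initialization Q_0^(k) = y_1 are assumptions of this sketch.

```python
def exp_averages(y, alpha, order=3, q0=None):
    """Exponential averages of orders 1..order, updated recursively:
    Q_t^(1) = alpha*y_t + (1-alpha)*Q_{t-1}^(1),
    Q_t^(k) = alpha*Q_t^(k-1) + (1-alpha)*Q_{t-1}^(k) for k > 1.

    q0 -- initial conditions [Q_0^(1), ..., Q_0^(order)];
          defaults to the first observation for every order.
    """
    q = list(q0) if q0 is not None else [y[0]] * order
    history = []
    for value in y:
        prev = value
        for k in range(order):
            q[k] = alpha * prev + (1 - alpha) * q[k]
            prev = q[k]                      # the k-th average feeds order k+1
        history.append(tuple(q))
    return history
```

For a single observation y = 10, α = 0.5 and zero initial conditions, one step gives Q^(1) = 5, Q^(2) = 2.5, Q^(3) = 1.25.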

5.2. Determining the parameters of the predictive model using the exponential smoothing method

Obviously, to develop predictive values from the dynamic series by the exponential smoothing method, the coefficients of the trend equation must be calculated through the exponential averages. The coefficient estimates are determined by the fundamental Brown–Meyer theorem, which relates the coefficients of the predictive polynomial to the exponential averages of the corresponding orders:

Q_t^(k) = Σ_{p=0}^{n} (−1)^p · â_p · (α^k / (p!·(k − 1)!)) · Σ_{j=0}^{∞} j^p · ((k − 1 + j)! / j!) · (1 − α)^j,

where â_p are the estimates of the coefficients of the polynomial of degree p.

The coefficients are found by solving the system of (p + 1) equations with (p + 1) unknowns.

So, for a linear model

â_0 = 2·Q_t^(1) − Q_t^(2);     â_1 = (α / (1 − α))·(Q_t^(1) − Q_t^(2));

for a quadratic model

â_0 = 3·(Q_t^(1) − Q_t^(2)) + Q_t^(3);

â_1 = (α / (2(1 − α)²))·[(6 − 5α)·Q_t^(1) − 2(5 − 4α)·Q_t^(2) + (4 − 3α)·Q_t^(3)];

â_2 = (α / (1 − α))²·[Q_t^(1) − 2·Q_t^(2) + Q_t^(3)].

The forecast is implemented according to the selected polynomial, respectively, for the linear model

ŷ_{t+τ} = â_0 + â_1·τ;

for a quadratic model

ŷ_{t+τ} = â_0 + â_1·τ + (â_2 / 2)·τ²,

where τ is the prediction step.
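For the linear model the whole chain (double smoothing, then coefficients, then forecast) fits in a few lines; initializing both averages to the first observation is an assumption of this sketch, not the initialization prescribed below.

```python
def brown_linear_forecast(y, alpha, tau=1):
    """Linear predictive model from first- and second-order exponential
    averages: a0 = 2*Q1 - Q2, a1 = alpha/(1 - alpha)*(Q1 - Q2),
    forecast y_hat(t + tau) = a0 + a1*tau."""
    q1 = q2 = y[0]                        # Q_0^(1) = Q_0^(2) = y_1 (assumed)
    for value in y:
        q1 = alpha * value + (1 - alpha) * q1    # first-order average
        q2 = alpha * q1 + (1 - alpha) * q2       # second-order average
    a0 = 2 * q1 - q2
    a1 = alpha / (1 - alpha) * (q1 - q2)
    return a0 + a1 * tau
```

On a constant series the two averages coincide, so â_1 = 0 and the forecast reproduces the constant for any lead time τ.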

It should be noted that the exponential averages Q_t^(k) can be calculated only with a known (chosen) parameter α and known initial conditions Q_0^(k).

Estimates of the initial conditions, in particular for the linear model, are

Q_0^(1) = a_0 − ((1 − α)/α)·a_1;
Q_0^(2) = a_0 − (2(1 − α)/α)·a_1;

for the quadratic model

Q_0^(1) = a_0 − ((1 − α)/α)·a_1 + ((1 − α)(2 − α)/(2α²))·a_2;
Q_0^(2) = a_0 − (2(1 − α)/α)·a_1 + ((1 − α)(3 − 2α)/α²)·a_2;
Q_0^(3) = a_0 − (3(1 − α)/α)·a_1 + (3(1 − α)(4 − 3α)/(2α²))·a_2,

where the coefficients a_0, a_1 and a_2 are calculated by the method of least squares.

The value of the smoothing parameter α is approximately calculated by the formula

α ≈ 2 / (m + 1),

where m is the number of observations (values) in the smoothing interval. The sequence of calculation of predictive values is shown in Fig. 5.1.

Calculation of coefficients of a series by the method of least squares

Determination of the smoothing interval

Calculation of the smoothing constant

Calculation of initial conditions

Computing exponential averages

Calculation of estimates a 0 , a 1 , etc.

Calculation of forecast values ​​of a series

Fig. 5.1. The sequence of calculation of forecast values

As an example, consider the procedure for obtaining a predictive value of a product's reliability, expressed as the time between failures.

The initial data are summarized in Table 5.1.

We choose a linear forecasting model of the form y_t = a_0 + a_1·τ.

The solution is feasible with the following initial values:

a_{0,0} = 64.2;  a_{1,0} = 31.5;  α = 0.305.

Table 5.1. Initial data (columns: observation number t; forecast step length τ; time between failures y, hours)

For these values, the calculated "smoothed" coefficients for y_2 are

â_0 = 2·Q_1^(1) − Q_1^(2) = 97.9;
â_1 = (α / (1 − α))·[Q_1^(1) − Q_1^(2)] = 31.9,

under the initial conditions

Q_0^(1) = a_{0,0} − ((1 − α)/α)·a_{1,0} = −7.6;
Q_0^(2) = a_{0,0} − (2(1 − α)/α)·a_{1,0} = −79.4

and with the exponential averages

Q_1^(1) = α·y_1 + (1 − α)·Q_0^(1) = 25.2;
Q_1^(2) = α·Q_1^(1) + (1 − α)·Q_0^(2) = −47.5.

The "smoothed" value y_2 is then calculated by the formula ŷ = â_0 + â_1·τ; the successive values Q_i^(1), Q_i^(2), a_{0,i}, a_{1,i} and ŷ_t are tabulated in Table 5.2.

Thus (Table 5.2), the linear predictive model has the form

ŷ_{t+τ} = 224.5 + 32·τ.

Let us calculate the predicted values of the product's time between failures for lead periods of 2 years (τ = 1), 4 years (τ = 2) and so on (Table 5.3).

Table 5.3. Forecast values ŷ_t of the regression equation ŷ_{t+τ} = 224.5 + 32·τ for lead times t + 2 (τ = 1), t + 4 (τ = 2), t + 6 (τ = 3), t + 8 (τ = 4) and t + 20 (τ = 10)
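The forecast values follow directly from the fitted equation; a one-line check (the pairing of lead times with τ follows the table headers):

```python
# y_hat(t + tau) = 224.5 + 32 * tau, the linear model obtained above
for tau in (1, 2, 3, 4, 10):
    print(f"tau = {tau:2d}: y_hat = {224.5 + 32 * tau}")
```

This prints 256.5, 288.5, 320.5, 352.5 and 544.5 hours for the five lead times.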

It should be noted that the total "weight" of the last m values of the time series can be calculated by the formula

c = 1 − (m − 1)/(m + 1).

Thus, for the last two observations of the series (m = 2), c = 1 − 1/3 ≈ 0.667.

5.3. Choice of initial conditions and determination of the smoothing constant

As follows from the expression

Q_t = α·y_t + (1 − α)·Q_{t−1},

when performing exponential smoothing, the initial (previous) value of the smoothed function must be known. In some cases the first observation can be taken as the initial value; more often the initial conditions are determined from expressions (5.4) and (5.5), with the values a_{0,0}, a_{1,0} and a_{2,0} determined by the method of least squares.

If we do not fully trust the chosen initial value, then by taking a large value of the smoothing constant α we will, after k observations, reduce the "weight" of the initial value to (1 − α)^k << α, and it will be practically forgotten. Conversely, if we are confident in the correctness of the chosen initial value and in the invariance of the model over a certain stretch of future time, α can be chosen small (close to 0).

Thus, the choice of the smoothing constant (or the number of observations in the moving average) involves a trade-off. Usually, as practice shows, the value of the smoothing constant lies in the range from 0.01 to 0.3.

Several approaches are known for finding an approximate estimate of α. The first follows from the condition that the moving average and the exponential average be equivalent:

α = 2 / (m + 1),

where m is the number of observations in the smoothing interval. Other approaches are associated with the accuracy of the forecast.

Thus, α can be determined from the Meyer relation

α ≈ S_y / S_1,

where S_y is the standard error of the model and S_1 is the mean square error of the original series.

However, the use of the latter ratio is complicated by the fact that it is very difficult to reliably determine S y and S 1 from the initial information.

Often the smoothing parameter, together with the coefficients a_{0,0} and a_{1,0}, is selected as optimal according to the criterion

S² = α·Σ_{j=0}^{∞} (1 − α)^j · [y_{t−j} − ŷ_{t−j}]² → min

by solving the algebraic system of equations obtained by setting the derivatives to zero:

∂S²/∂a_{0,0} = 0;  ∂S²/∂a_{1,0} = 0;  ∂S²/∂a_{2,0} = 0.

Thus, for the linear forecasting model the initial criterion is

S² = α·Σ_{j=0}^{∞} (1 − α)^j · [y_{t−j} − a_{0,0} − a_{1,0}·τ]² → min.

The solution of this system with the help of a computer does not present any difficulties.

For a reasoned choice of α one can also use the generalized smoothing procedure, which yields the following relations linking the forecast variance and the smoothing parameter; for the linear model

S_p² ≈ [α / (1 + β)³]·[1 + 4β + 5β² + 2α(1 + 3β)·τ + 2α²·τ²]·S_y²;

for the quadratic model

S_p² ≈ [2α + 3α³ + 3α²·τ]·S_y²,

where β = 1 − α, and S_y is the RMS error of approximation of the initial dynamic series.

Obviously, in the weighted moving average method there are many ways to set the weights so that their sum equals 1. One of these is exponential smoothing. In this scheme, for any t > 1 the forecast ŷ_{t+1} is the weighted sum of the actual sales y_t in period t and the forecast ŷ_t for period t. In other words,

ŷ_{t+1} = α·y_t + (1 − α)·ŷ_t.     (1)

Exponential smoothing has computational advantages over moving averages: to calculate ŷ_{t+1} it is only necessary to know y_t and ŷ_t (together with the value of α). For example, if a company needs to forecast demand for 5,000 items in each time period, it would need to store 10,001 data values (5,000 values of y, 5,000 values of ŷ, and the value of α), whereas a forecast based on an 8-node moving average would require 40,000 data values. Depending on the behavior of the data, a different α may have to be stored for each product, but even then the amount of stored information is much smaller than with a moving average. The good thing about exponential smoothing is that by keeping α and the last forecast, all previous forecasts are implicitly preserved.
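The constant-memory property can be made concrete with a tiny class; the names are mine, not from the text.

```python
class ExpSmoother:
    """Stores only alpha and the latest forecast, yet the forecast
    implicitly carries the weight of the whole observation history."""
    def __init__(self, alpha, first_obs):
        self.alpha = alpha
        self.forecast = first_obs          # forecast for the next period

    def update(self, actual):
        """Consume this period's actual value; return next period's forecast."""
        self.forecast = self.alpha * actual + (1 - self.alpha) * self.forecast
        return self.forecast
```

For instance, sm = ExpSmoother(0.5, 10) followed by sm.update(20) returns 15.0, and a second sm.update(20) returns 17.5, with only two numbers stored per product throughout.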

Let's consider some properties of the exponential smoothing model. First note that if t > 2, then in formula (1) t can be replaced by t − 1, i.e. ŷ_t = α·y_{t−1} + (1 − α)·ŷ_{t−1}. Substituting this expression into the original formula (1), we obtain

ŷ_{t+1} = α·y_t + α(1 − α)·y_{t−1} + (1 − α)²·ŷ_{t−1}.

Performing similar substitutions successively, we obtain the following expression for ŷ_{t+1}:

ŷ_{t+1} = α·y_t + α(1 − α)·y_{t−1} + α(1 − α)²·y_{t−2} + ... + α(1 − α)^{t−1}·y_1 + (1 − α)^t·ŷ_1.     (2)

Since 0 < α < 1 implies 0 < 1 − α < 1, the observation y_t has a greater weight than y_{t−1}, which in turn has a greater weight than y_{t−2}. This illustrates the basic property of the exponential smoothing model: the coefficients of y_{t−k} decrease as the observations recede into the past. It can also be shown that the sum of all the coefficients (including the coefficient of ŷ_1) equals 1.

From formula (2) it can be seen that ŷ_{t+1} is the weighted sum of all previous observations (including the last observation y_t). The last term of the sum (2) is not a statistical observation but an "assumption" (we can assume, for example, that ŷ_1 = y_1). Obviously, with increasing t the influence of ŷ_1 on the forecast decreases, and at some point it can be neglected. Even if the value of α is small enough (so that 1 − α is close to 1), the value (1 − α)^t decreases rapidly.

The value of the parameter α greatly affects the performance of the prediction model, since α is the weight of the most recent observation. A larger value of α should therefore be assigned when the last observation is the most predictive. If α is close to 0, this means almost complete trust in the previous forecast and ignoring the last observation.

Victor had a problem: how best to choose the value of α. Once again, the Solver tool will help. To find the optimal value of α (i.e. the one at which the forecast curve deviates least from the time series curve), do the following.

  1. Select the command Tools -> Search for a solution.
  2. In the Search for a solution dialog box that opens, set the target cell to G16 (see the Expo sheet) and specify that its value should be minimized.
  3. Specify cell B1 as the cell to be modified.
  4. Enter the constraints B1 > 0 and B1 < 1.
  5. Click the Run button; you will get the result shown in Fig. 8.

Again, as in the weighted moving average method, the best forecast is obtained by assigning the full weight to the last observation. Hence the optimal value of α is 1, with a mean absolute deviation of 6.82 (cell G16). Victor got a forecast he had already seen before.

The exponential smoothing method works well when the variable of interest behaves stationarily, with deviations from a constant value caused by random, non-regular factors. But regardless of the value of α, exponential smoothing cannot predict monotonically increasing or decreasing data (the forecasts will always be, respectively, below or above the observed values). It can also be shown that in a model with seasonal variations this method will not yield satisfactory forecasts either.

If the statistics change monotonically or are subject to seasonal changes, special forecasting methods, discussed below, are required.

Holt method (exponential smoothing with a trend)

L_{t+1} = α·y_{t+1} + (1 − α)·(L_t + T_t),
T_{t+1} = β·(L_{t+1} − L_t) + (1 − β)·T_t,
ŷ_{t+k} = L_t + k·T_t.

Holt's method allows forecasting k time periods ahead. As you can see, the method uses two parameters, α and β, whose values range from 0 to 1. The variable L indicates the long-term level, or base value, of the time series data. The variable T indicates the possible increase or decrease of the values over one period.

Let's see how this method works on a new example. Svetlana works as an analyst in a large brokerage firm. Based on the quarterly reports she has for Startup Airlines, she wants to forecast that company's earnings for the next quarter. The available data and the diagram built from them are in the Startup.xls workbook (Fig. 9). The data show a clear, almost monotonically increasing trend. Svetlana wants to use the Holt method to predict earnings per share for the thirteenth quarter. To do this, initial values for L and T must be set. There are several choices: 1) L equals the earnings per share for the first quarter and T = 0; 2) L equals the average earnings per share over the 12 quarters and T equals the average change over all 12 quarters. There are other options for the initial values of L and T, but Svetlana chose the first.

She decided to use the Search for a solution tool to find the optimal values of the parameters α and β, at which the mean absolute percentage error is minimal. The steps are as follows.

Select the command Tools -> Search for a solution.

In the Search for a solution dialog box that opens, set cell F18 as the target cell and indicate that its value should be minimized.

In the Changing cells field, enter the range B1:B2. Add the constraints B1:B2 > 0 and B1:B2 < 1.

Click the Run button.

The resulting forecast is shown in Fig. 10.

As can be seen, the optimal values turned out to be α = 0.59 and β = 0.42, with a mean absolute percentage error of 38%.
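Outside a spreadsheet, the same α, β search can be sketched with a crude grid; the initialization L = y_1, T = 0 follows option 1 above, while the grid step and the details of the MAPE formula are assumptions of this sketch.

```python
def holt_mape(y, alpha, beta):
    """Mean absolute percentage error of one-step Holt forecasts."""
    level, trend = y[0], 0.0                   # option 1: L = y_1, T = 0
    errors = []
    for actual in y[1:]:
        forecast = level + trend
        errors.append(abs(actual - forecast) / abs(actual))
        new_level = alpha * actual + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return 100.0 * sum(errors) / len(errors)

def best_holt_params(y, step=0.1):
    """Crude grid search over alpha, beta in (0, 1)."""
    grid = [round(step * i, 2) for i in range(1, int(round(1 / step)))]
    return min(((a, b) for a in grid for b in grid),
               key=lambda p: holt_mape(y, p[0], p[1]))
```

A Solver-style search differs in that it optimizes continuously rather than over a fixed grid, but the criterion being minimized is the same.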

Accounting for seasonal changes

Seasonal changes should be taken into account when forecasting from time series data. Seasonal changes are up-and-down fluctuations with a constant period in the values of a variable.

For example, looking at ice cream sales by month, one can see a higher level of sales in the warm months (June to August in the northern hemisphere) than in winter, and the same pattern every year; here the seasonal fluctuations have a period of 12 months. If weekly data are used, the seasonal structure repeats every 52 weeks. Another example analyzes weekly reports on the number of guests staying overnight in a hotel in a city business center: a large number of customers can presumably be expected on Tuesday, Wednesday and Thursday nights, the fewest on Saturday and Sunday nights, and an average number on Friday and Monday nights. Such a data structure, reflecting the number of customers on different days of the week, repeats every seven days.

The procedure for making a seasonally adjusted forecast consists of the following four steps:

1) Based on the initial data, the structure of the seasonal fluctuations and their period are determined.

2) The seasonal component is excluded from the data.

3) Based on the data from which the seasonal component has been excluded, the best possible forecast is made.

4) The seasonal component is added back to the resulting forecast.

Let's illustrate this approach with coal sales data (in thousands of tons) in the US over nine years. As a manager at Gillette Coal Mine, Frank needs to forecast coal demand for the next two quarters. He entered data for the entire coal industry into the Coal.xls workbook and plotted the data (Fig. 11). The graph shows that sales are above average in the first and fourth quarters (the winter months) and below average in the second and third quarters (the spring and summer months).

Exclusion of the seasonal component

First, the average of all deviations over one period of seasonal change must be calculated. To exclude the seasonal component within one year, data for four periods (quarters) are used; to exclude it from the entire time series, a sequence of moving averages over T nodes is computed, where T is the duration of the seasonal fluctuations. For the necessary calculations Frank used columns C and D, as shown in the figure below; column C contains the 4-node moving average based on the data in column B.

Now the resulting moving average values must be assigned to the midpoints of the data sequences from which they were calculated. This operation is called centering. If T is odd, the first moving average value (the average of points 1 through T) is assigned to point (T + 1)/2 (for example, if T = 7, the first moving average is assigned to the fourth point). Similarly, the average of points 2 through T + 1 is centered at point (T + 3)/2, and so on; the center of the n-th interval is at point (T + (2n − 1))/2.

If T is even, as in the case at hand, the problem is somewhat more complicated, because the central (middle) points lie between the points for which the moving averages were calculated. Therefore the centered value for the third point is calculated as the average of the first and second moving average values. For example, the first number in column D of centered means in Fig. 12 (left) is (1613 + 1594)/2 = 1603. Fig. 13 shows plots of the raw data and the centered averages.

Next, the ratios of the data values to the corresponding centered means are found. Since points at the beginning and end of the data sequence have no corresponding centered means (see the first and last values in column D), this step does not apply to them. These ratios show how far the data values deviate from the typical level defined by the centered means. Note that the ratios for the third quarters are less than 1 and those for the fourth quarters are greater than 1.
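The centered means and ratios for an even period can be sketched as follows; the 0-based index alignment is an assumption of the sketch.

```python
def centered_means(y, period=4):
    """Moving averages over `period` nodes, then averages of adjacent
    pairs, which centers the values for an even period."""
    ma = [sum(y[i:i + period]) / period for i in range(len(y) - period + 1)]
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

def seasonal_ratios(y, period=4):
    """Ratio of each observation to its centered mean; the first and last
    period/2 points have no centered mean and are skipped."""
    cm = centered_means(y, period)
    half = period // 2
    return [y[i + half] / cm[i] for i in range(len(cm))]
```

On a purely seasonal series such as [1, 2, 3, 4, 1, 2, 3, 4] every centered mean is 2.5 and the ratios simply restate the quarterly pattern.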

These relationships are the basis for creating seasonal indices. To calculate them, the calculated ratios are grouped by quarters, as shown in Fig. 15 in columns G-O.

Then the average of the ratios for each quarter is found (column E in Fig. 15). For example, the average of all the first-quarter ratios is 1.108. This value is the seasonal index for the first quarter, from which it can be concluded that first-quarter coal sales average about 110.8% of the quarterly average for the year.

A seasonal index is the average ratio of the data for one season (here the season is a quarter) to the overall average. If the seasonal index is greater than 1, the season's figures are above the yearly average; similarly, if it is less than 1, they are below the yearly average.

Finally, to exclude the seasonal component from the original data, the original values are divided by the corresponding seasonal indices. The results of this operation are shown in columns F and G (Fig. 16), and a plot of the data, which no longer contains a seasonal component, is shown in Fig. 17.
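
The deseasonalization steps above can be sketched in code. The quarterly sales figures, variable names, and the four-quarter cycle below are illustrative assumptions, not the article's actual worksheet data:

```python
# A minimal sketch of the quarterly seasonal-decomposition procedure:
# moving average -> centering -> ratios -> seasonal indices -> deseasonalize.
quarters = [1, 2, 3, 4] * 3
sales = [1900.0, 1500.0, 1300.0, 1700.0,
         1960.0, 1560.0, 1340.0, 1760.0,
         2020.0, 1600.0, 1380.0, 1820.0]
T = 4  # length of the seasonal cycle (four quarters)

# 4-node moving averages (column C in the article's worksheet).
moving_avg = [sum(sales[i:i + T]) / T for i in range(len(sales) - T + 1)]

# Centering for even T: average adjacent moving averages (column D).
centered = [(moving_avg[i] + moving_avg[i + 1]) / 2
            for i in range(len(moving_avg) - 1)]

# Ratios of actual values to centered means; the first T/2 and last T/2
# points have no centered mean, so they are skipped.
offset = T // 2
ratios = [sales[offset + i] / centered[i] for i in range(len(centered))]

# Seasonal index for each quarter: the mean of that quarter's ratios.
index = {}
for q in (1, 2, 3, 4):
    qr = [r for r, qq in zip(ratios, quarters[offset:]) if qq == q]
    index[q] = sum(qr) / len(qr)

# Deseasonalize: divide each observation by its quarter's index.
deseasonalized = [s / index[q] for s, q in zip(sales, quarters)]
```

Note that the four indices average out to roughly 1, as they should for a multiplicative seasonal model.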

Forecasting

A forecast is built from the data from which the seasonal component has been excluded, using a method appropriate to the behavior of the data (for example, whether the data has a trend or is relatively constant). In this example, the forecast is made using simple exponential smoothing, and the optimal value of the parameter α is found using the Solver tool. The graph of the forecast and the actual data with the seasonal component excluded is shown in Fig. 18.
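
This step can be sketched as follows; a simple grid search over α stands in for Excel's Solver, and the deseasonalized data values are illustrative, not the article's:

```python
# Simple exponential smoothing with a grid search for alpha
# (minimizing the mean absolute deviation, MAD).
def ses_forecasts(series, alpha):
    """One-step-ahead forecasts; F_0 is initialized to the first observation."""
    forecasts = [series[0]]
    for x in series[:-1]:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts

def mad(series, alpha):
    """Mean absolute deviation of the forecasts from the actual values."""
    f = ses_forecasts(series, alpha)
    return sum(abs(x - p) for x, p in zip(series, f)) / len(series)

data = [1603.0, 1620.0, 1589.0, 1641.0, 1672.0, 1658.0, 1701.0, 1693.0]
best_alpha = min((a / 100 for a in range(1, 100)), key=lambda a: mad(data, a))

# Forecast for the next (deseasonalized) period.
f = ses_forecasts(data, best_alpha)
next_forecast = best_alpha * data[-1] + (1 - best_alpha) * f[-1]
```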

Accounting for seasonal structure

Now the seasonal component must be incorporated into the forecast (1726.5). To do this, multiply 1726 by the first-quarter seasonal index of 1.108, which gives 1912. A similar operation (multiplying 1726 by the seasonal index 0.784) gives a forecast for the second quarter of 1353. The result of adding the seasonal structure to the forecast is shown in Fig. 19.
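
As a quick check of the arithmetic (using the rounded forecast of 1726, as in the text):

```python
# Reseasonalization: the deseasonalized forecast is scaled by each
# quarter's seasonal index (figures taken from the example above).
deseasonalized_forecast = 1726          # the text rounds 1726.5 before scaling
q1_forecast = deseasonalized_forecast * 1.108   # first-quarter index
q2_forecast = deseasonalized_forecast * 0.784   # second-quarter index
```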

Task options:

Task 1

Given a time series

t
x

  1. Plot the dependence x = x(t).
  2. Using a simple moving average over 4 nodes, predict demand at the 11th time point.
  3. Is this forecasting method suitable for these data? Why?
  4. Fit a linear function to the data using the method of least squares.

Task 2

Using the Startup Airlines Revenue Forecast Model (Startup.xls), do the following:

Task 3

For time series

t
x

perform the following:

  1. Using a weighted moving average over 4 nodes, and assigning weights 4/10, 3/10, 2/10, 1/10, predict demand at the 11th time point. More weight should be assigned to more recent observations.
  2. Is this approximation better than a simple moving average over 4 nodes? Why?
  3. Find the mean of absolute deviations.
  4. Use the Solver tool to find the optimal node weights. How much did the approximation error decrease?
  5. Use exponential smoothing to predict. Which of the methods used gives the best results?
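
The weighted moving average from item 1 can be sketched as follows; the demand series here is an illustrative stand-in for the task's data:

```python
# 4-node weighted moving average with the largest weight on the newest point.
weights = [1 / 10, 2 / 10, 3 / 10, 4 / 10]   # oldest ... newest
demand = [120.0, 132.0, 125.0, 141.0, 152.0,
          147.0, 160.0, 155.0, 169.0, 171.0]  # time points 1-10

def wma_forecast(series, w):
    """Forecast the next point from the last len(w) observations."""
    window = series[-len(w):]
    return sum(wi * xi for wi, xi in zip(w, window))

# Forecast for the 11th time point, built from observations 7-10.
forecast_11 = wma_forecast(demand, weights)
```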

Task 4

Analyze the time series

Time
Demand
  1. Use a 4-node weighted moving average with weights 4/10, 3/10, 2/10, 1/10 to get a forecast at times 5-13. More weight should be assigned to more recent observations.
  2. Find the mean of absolute deviations.
  3. Do you think this approximation is better than the 4-node simple moving average model? Why?
  4. Use the Solver tool to find the optimal node weights. By how much did you manage to reduce the error value?
  5. Use exponential smoothing to predict. Which of the methods used gives the best result?

Task 5

Given a time series

Task 7

The marketing manager of a small, growing company that operates a chain of grocery stores has sales data covering the entire lifetime of its most profitable store (see table).

Using a simple moving average over 3 nodes, predict the values at nodes 4 through 11.

Using a weighted moving average over 3 nodes, predict the values at nodes 4 through 11. Use the Solver tool to determine the optimal weights.

Use exponential smoothing to predict the values at nodes 2-11. Determine the optimal value of the parameter α using the Solver tool.

Which of the forecasts obtained is the most accurate and why?

Task 8

Given a time series

  1. Plot this time series. Connect the points with straight lines.
  2. Using a simple moving average over 4 nodes, predict demand for nodes 5-13.
  3. Find the mean of absolute deviations.
  4. Is it reasonable to use this forecasting method for the presented data?
  5. Is this approximation better than a simple moving average over 3 nodes? Why?
  6. Plot a linear and quadratic trend from the data.
  7. Use exponential smoothing to predict. Which of the methods used gives the best results?

Task 10

The Business_Week.xls workbook contains monthly car sales data from Business Week covering 43 months.

  1. Remove the seasonal component from these data.
  2. Determine the best forecasting method for the available data.
  3. What is the forecast for the 44th period?

Task 11

  1. A simple forecasting scheme in which the value for the last week is taken as the forecast for the next week.
  2. Moving average method (with the number of nodes of your choice). Try using several different node values.

Task 12

The Bank.xls workbook shows the performance indicators of a bank. Consider the following methods for predicting the values of this time series.

As a forecast, the average value of the indicator for all previous weeks is used.

Weighted moving average method (with the number of nodes of your choice). Try using several different node values. Use the Solver tool to determine the optimal weights.

Exponential smoothing method. Find the optimal value of the parameter α using the Solver tool.

Which of the forecasting methods proposed above would you recommend for predicting the values of this time series?


04/02/2011 - Man's desire to lift the veil of the future and foresee the course of events has as long a history as his attempts to understand the world. It is obvious that quite strong motives, both theoretical and practical, underlie the interest in forecasting. A forecast serves as one of the most important means of testing scientific theories and hypotheses. The ability to foresee the future is an integral part of consciousness, without which human life would be impossible.

The concept of "forecasting" (from the Greek prognosis: foresight, prediction) denotes the process of developing a probabilistic judgment about the state of a phenomenon or process in the future; it is knowledge of what does not yet exist but may come about in the near or distant future.

The content of a forecast is more complex than that of a prediction. On the one hand, it reflects the most probable state of the object; on the other, it determines the ways and means of achieving the desired result. On the basis of information obtained in this predictive way, decisions are made to achieve the desired goal.

It should be noted that the dynamics of economic processes under modern conditions are characterized by instability and uncertainty, which makes it difficult to use traditional forecasting methods.

Exponential smoothing and prediction models belong to the class of adaptive forecasting methods, whose main characteristic is the ability to continuously take into account the evolution of the dynamic characteristics of the processes under study and to adapt to this dynamics, in particular by assigning greater weight and higher informational value to observations the closer they are to the current moment. The meaning of the term is that adaptive forecasting allows forecasts to be updated with minimal delay using relatively simple mathematical procedures.

The exponential smoothing method was discovered independently by Brown (Brown R.G., Statistical Forecasting for Inventory Control, 1959) and Holt (Holt C.C., Forecasting Seasonals and Trends by Exponentially Weighted Moving Averages, 1957). Like the moving average method, exponential smoothing uses past values of the time series for forecasting.

The essence of the exponential smoothing method is that the time series is smoothed using a weighted moving average whose weights obey an exponential law. Such a weighted moving average characterizes the value of the process at the end of the smoothing interval, that is, it is an average characteristic of the most recent levels of the series. It is this property that is used for forecasting.

Ordinary exponential smoothing is applied when there is no trend or seasonality in the data. In this case the prediction is a weighted average of all available previous values of the series, with weights that decrease geometrically as we move into the past. Therefore (unlike the moving average method) there is no point at which the weights are cut off, i.e. set to zero. A simple exponential smoothing model can be written as follows (all formulas of the article can be downloaded from the link provided):
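
The formula itself is not reproduced in this extract; in the notation used later in the article (F_t for the forecast, X_t for the observed value) the standard simple exponential smoothing model is

```latex
F_{t+1} = \alpha X_t + (1 - \alpha)\, F_t, \qquad 0 < \alpha < 1 \tag{1}
```

where α is the smoothing (adaptation) parameter.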

Let us show the exponential nature of the decrease in the weights of the time-series values, from the current one to the previous, from the previous to the one before it, and so on:
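
Unfolding the recursion (a standard expansion, assuming an initial value F_0) makes the geometric decrease of the weights explicit:

```latex
F_{t+1} = \alpha X_t + \alpha(1-\alpha) X_{t-1} + \alpha(1-\alpha)^2 X_{t-2} + \dots
        = \alpha \sum_{k=0}^{t} (1-\alpha)^k X_{t-k} + (1-\alpha)^{t+1} F_0
```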

If the formula is applied recursively, each new smoothed value (which is also a prediction) is calculated as a weighted average of the current observation and the previous smoothed value. Obviously, the result of smoothing depends on the adaptation parameter alpha, which can be interpreted as a discount factor characterizing the measure of data devaluation per unit of time. The influence of data on the forecast decreases exponentially with the "age" of the data. The dependence of the data's influence on the forecast for different coefficients alpha is shown in Figure 1.
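
The exponential decay of the weights can be made concrete with a few lines of code (compare Figure 1):

```python
# Weight attached to an observation k periods old under exponential
# smoothing: w_k = alpha * (1 - alpha)**k.  A large alpha concentrates
# weight on recent data; a small alpha spreads it far into the past.
def es_weights(alpha, n):
    return [alpha * (1 - alpha) ** k for k in range(n)]

w_small = es_weights(0.1, 10)   # slow devaluation of old data
w_large = es_weights(0.7, 10)   # fast devaluation of old data
```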

Figure 1. Dependence of the influence of data on the forecast for different adaptation coefficients

It should be noted that the value of the smoothing parameter cannot be equal to 0 or 1, since in either case the very idea of exponential smoothing is abandoned. If alpha equals 1, the predicted value F t+1 coincides with the current value of the series X t, and the exponential model degenerates into the simplest "naive" model; forecasting becomes an absolutely trivial process. If alpha equals 0, the initial forecast value F 0 is simultaneously the forecast for all subsequent moments of the series, so the forecast looks like an ordinary horizontal line.

Now consider values of the smoothing parameter close to 1 or 0. If alpha is close to 1, previous observations of the time series are almost completely ignored; if alpha is close to 0, current observations are ignored. Values of alpha between 0 and 1 give intermediate results. According to some authors, the optimal value of alpha lies in the range 0.05 to 0.30; however, an alpha greater than 0.30 sometimes gives a better forecast.

In general, it is better to estimate the optimal alpha from the raw data (using a grid search) than to rely on artificial recommendations. However, if a value of alpha greater than 0.3 minimizes a number of special criteria, this indicates that another forecasting technique (one using a trend or seasonality) can provide even more accurate results. To find the optimal alpha (that is, to minimize the special criteria), a quasi-Newton likelihood-maximization algorithm is used, which is more efficient than ordinary enumeration on a grid.

Let us rewrite equation (1) in an alternative form that allows us to see how the exponential smoothing model "learns" from its past mistakes:
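
Equation (3) is not reproduced in this extract; rearranging the basic model F_{t+1} = αX_t + (1 − α)F_t gives the error-correction form that the next paragraph describes:

```latex
F_{t+1} = F_t + \alpha \left( X_t - F_t \right) \tag{3}
```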

Equation (3) clearly shows that the forecast for period t+1 is increased if the actual value of the time series in period t exceeds the forecast value, and, conversely, the forecast for period t+1 is decreased if X t is less than F t.

Note that when using exponential smoothing methods, an important issue is always the determination of the initial conditions (the initial forecast value F 0). The process of choosing the initial value of the smoothed series is called initialization or, in other words, "warming up" the model. The point is that the initial value of the smoothed process can significantly affect the forecast for subsequent observations; on the other hand, its influence decreases as the series lengthens and becomes uncritical for a very large number of observations. Brown was the first to suggest using the mean of the time series as the starting value; other authors suggest using the first actual value of the time series as the initial forecast.

In the middle of the last century, Holt proposed extending the simple exponential smoothing model by including a growth factor, otherwise known as a trend factor. As a result, the Holt model can be written as follows:
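
The Holt equations are missing from this extract; in standard notation (a reconstruction: S_t is the smoothed level, b_t the smoothed trend, α and β the smoothing constants, and m the forecast horizon) they read

```latex
\begin{aligned}
S_t &= \alpha X_t + (1-\alpha)\,(S_{t-1} + b_{t-1}) \\
b_t &= \beta\,(S_t - S_{t-1}) + (1-\beta)\, b_{t-1} \\
F_{t+m} &= S_t + m\, b_t
\end{aligned}
```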

This method allows you to take into account the presence of a linear trend in the data. Later, other types of trends were proposed: exponential, damped, etc.
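
A minimal sketch of two-parameter level-plus-trend smoothing in the spirit of Holt's method follows; the series and the constants alpha and beta are illustrative, not fitted values:

```python
# Holt-style smoothing: one constant (alpha) smooths the level,
# the other (beta) smooths the trend.
def holt_forecast(series, alpha, beta, horizon=1):
    level = series[0]
    trend = series[1] - series[0]          # a common initialization
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend         # extrapolate the linear trend

data = [10.0, 12.0, 13.5, 16.0, 17.5, 20.0]
forecast = holt_forecast(data, alpha=0.5, beta=0.3)
```

On perfectly linear data this scheme extrapolates the line exactly, which is the point of tracking the trend separately.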

Winters proposed improving the Holt model so that it could describe the influence of seasonal factors (Winters P.R., Forecasting Sales by Exponentially Weighted Moving Averages, 1960).

In particular, he extended the Holt model by including an additional equation describing the behavior of the seasonal component. The system of equations of the Winters model is as follows:
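
The system itself is not reproduced in this extract; the standard multiplicative Holt-Winters form (a reconstruction: I_t is the seasonal index, L the length of the seasonal cycle, γ the seasonal smoothing constant) is

```latex
\begin{aligned}
S_t &= \alpha\,\frac{X_t}{I_{t-L}} + (1-\alpha)\,(S_{t-1} + b_{t-1}) \\
b_t &= \beta\,(S_t - S_{t-1}) + (1-\beta)\, b_{t-1} \\
I_t &= \gamma\,\frac{X_t}{S_t} + (1-\gamma)\, I_{t-L} \\
F_{t+m} &= (S_t + m\, b_t)\, I_{t-L+m}
\end{aligned}
```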

The fraction in the first equation serves to exclude seasonality from the original series. After seasonality is excluded (in the manner of the Census I seasonal decomposition method), the algorithm works with "pure" data in which there are no seasonal fluctuations. Seasonality reappears in the final forecast (15), when the "pure" forecast, calculated almost exactly as in the Holt method, is multiplied by the seasonal component (the seasonality index).