# Regression analysis

## Introduction

This document refers to Macrobond 1.12 and later.

The Regression analysis implements a multiple linear regression model. Several models can be specified within one instance of the analysis.

The output consists of the coefficients of the linear model, the predicted series and a number of statistics. If there is sufficient data after the end of the estimation sample range, forecasts can be calculated.

## Estimation model

The Regression analysis attempts to find the linear combination of a number of *explanatory* series that best describes a *dependent* series.

The analysis uses the following model:

$y_t = \alpha + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 x_{t3} + \epsilon_t$

where y is the dependent series and x_i are the explanatory series.

If the option “No intercept” is selected, then the constant α is not included in the model.

The parameters `α` and `β` are estimated by minimizing the sum of the squared residuals `ϵ`. This method is known as Ordinary Least Squares (OLS).
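As an illustration of the OLS estimation described above, the following minimal sketch solves the normal equations with plain Python. It is not the application's implementation; the function names `solve_linear_system` and `ols_fit` and the sample data are assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory series are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(y, explanatory, intercept=True):
    """Return [alpha, beta_1, ...], or [beta_1, ...] when intercept=False."""
    rows = []
    for t in range(len(y)):
        row = [series[t] for series in explanatory]
        rows.append(([1.0] + row) if intercept else row)
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y minimizes the sum of squared residuals.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y[t] for t, r in enumerate(rows)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
coefficients = ols_fit(y, [x1, x2])
```

Because the sample data contain no noise, the fit should recover the generating coefficients up to rounding. Passing `intercept=False` corresponds to the “No intercept” option.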

The automatic estimation sample range will be the largest range where there is data for all the series. You can specify a smaller range if you want to limit the data used in the estimation.

The output from the analysis will include the predicted series calculated using the estimated parameters.

| Statistic | Description |
|---|---|
| Calculation range | The calculation range used for the analysis. |
| Observations | The number of observations used in this analysis. This includes all observations in the calculation range where there are values for all series. |
| Degrees of freedom | The number of observations minus the number of explanatory series and minus one for the constant parameter. |
| R² | Compares the variance of the estimation with the total variance. The better the result fits the data compared to a simple average, the closer this value is to 1. |
| F | The F-ratio is the ratio of the explained variability to the unexplained variability, each divided by the corresponding degrees of freedom. In general, a larger F indicates a more useful model. |
| P-value (F) | The p-value is the probability of obtaining a value of F that is at least as extreme as the one that was actually observed if the true values of all the coefficients are zero. |
| Sum of squared errors | The sum of the squares of the residuals. |
| Standard error of regression | The square root of the sum of squared errors divided by the degrees of freedom. This is an estimate of the standard deviation of the residuals. |
| Standard error of forecasts | The square root of the sum of squared forecast residuals divided by the number of residuals. |
| Durbin-Watson | The Durbin-Watson statistic is used to detect the presence of autocorrelation in the residuals. The value is in the range 0-4. A value close to 2 means that there is little autocorrelation. The result from this test is not useful if lagged versions of the dependent series are included in the model or if no intercept is included in the model. |
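The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `regression_statistics` is hypothetical, the model is assumed to include an intercept, and R² is computed as one minus the ratio of the sum of squared errors to the total sum of squares, which may differ in detail from what the application reports.

```python
import math


def regression_statistics(y, predicted, n_explanatory):
    """Summary statistics for a model with an intercept (illustrative only)."""
    n = len(y)
    residuals = [a - p for a, p in zip(y, predicted)]
    sse = sum(e * e for e in residuals)              # sum of squared errors
    mean_y = sum(y) / n
    sst = sum((a - mean_y) ** 2 for a in y)          # total sum of squares
    dof = n - n_explanatory - 1                      # minus one for the constant
    # Durbin-Watson: squared successive differences over squared residuals.
    dw_num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n))
    return {
        "observations": n,
        "degrees_of_freedom": dof,
        "r2": 1.0 - sse / sst,
        "f": ((sst - sse) / n_explanatory) / (sse / dof),
        "sse": sse,
        "standard_error_of_regression": math.sqrt(sse / dof),
        "durbin_watson": dw_num / sse,
    }


# Hypothetical observed and predicted values for one explanatory series.
stats = regression_statistics(
    y=[1.0, 2.0, 3.0, 4.0, 5.0],
    predicted=[1.1, 1.9, 3.2, 3.8, 5.0],
    n_explanatory=1,
)
```

In this made-up example the residuals alternate in sign, so the Durbin-Watson value lands well above 2, which would hint at negative autocorrelation.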

The information criteria are measures of the expected information loss. A lower value means that more information is captured. This can be used to compare models when the same data is used in the model.

| Criterion | Description |
|---|---|
| AIC | Akaike's information criterion. |
| HQ | Hannan and Quinn's information criterion. |
| Schwarz | Schwarz's information criterion. |
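The source does not state which normalization the application uses for these criteria. The sketch below uses one common textbook form based on the Gaussian log-likelihood of the residuals; the function name `information_criteria` and the sample numbers are assumptions, and the absolute values may not match the report. Only comparisons between models fitted to the same data are meaningful.

```python
import math


def information_criteria(sse, n_obs, n_params):
    """Textbook information criteria from the sum of squared errors.

    Assumes normally distributed residuals; n_params counts all estimated
    coefficients, including the constant.
    """
    log_likelihood = -0.5 * n_obs * (
        1.0 + math.log(2.0 * math.pi) + math.log(sse / n_obs)
    )
    return {
        "AIC": -2.0 * log_likelihood + 2.0 * n_params,
        "HQ": -2.0 * log_likelihood + 2.0 * n_params * math.log(math.log(n_obs)),
        "Schwarz": -2.0 * log_likelihood + n_params * math.log(n_obs),
    }


# Two hypothetical models on the same 50 observations: the larger model
# reduces the squared error only slightly while adding three parameters.
small_model = information_criteria(sse=12.0, n_obs=50, n_params=2)
large_model = information_criteria(sse=11.9, n_obs=50, n_params=5)
```

Here the small model gets the lower value on all three criteria, because the penalty for the extra parameters outweighs the marginal improvement in fit.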

| Statistic | Description |
|---|---|
| Coefficient | The estimated parameters. |
| Standard error | The standard error of the estimated parameters. |
| t | The estimated coefficient divided by the standard error. |
| P-value | The p-value is the probability of obtaining a value of t that is at least as extreme as the one that was actually observed if the true value of the coefficient is zero. |
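To make the per-coefficient statistics concrete, here is a minimal sketch for the special case of a single explanatory series with an intercept, where the standard errors have a simple closed form. The function name `simple_regression_table` and the data are assumptions for this example. The p-value is omitted because it requires the cumulative t-distribution, which is not in the Python standard library.

```python
import math


def simple_regression_table(x, y):
    """Coefficient, standard error and t for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    # Standard error of regression with n - 2 degrees of freedom.
    ser = math.sqrt(sse / (n - 2))
    se_beta = ser / math.sqrt(sxx)
    se_alpha = ser * math.sqrt(1.0 / n + mean_x ** 2 / sxx)
    return {
        "alpha": {"coefficient": alpha, "standard_error": se_alpha,
                  "t": alpha / se_alpha},
        "beta": {"coefficient": beta, "standard_error": se_beta,
                 "t": beta / se_beta},
    }


# Hypothetical data lying close to the line y = 2x.
table = simple_regression_table(
    x=[1.0, 2.0, 3.0, 4.0, 5.0],
    y=[2.1, 3.9, 6.2, 7.8, 10.0],
)
```

A large absolute t value, as for `beta` here, means the coefficient is estimated precisely relative to its size.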

#### Forecast

When there is data for all the explanatory series beyond the estimation sample, the estimated parameters can be used to calculate forecasts. This is done by selecting the option “Calculate forecasts”.

If no end point is specified, the analysis will calculate as many forecasted values as possible. You can specify an end point if you want to limit the length of the forecast.
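The forecast calculation can be sketched as follows. This is an illustration only; the function name `static_forecast`, the `end_point` argument expressed as a number of periods, and the sample values are assumptions. The forecast horizon is limited by the explanatory series with the least data beyond the estimation sample.

```python
def static_forecast(coefficients, future_explanatory, end_point=None):
    """Apply estimated coefficients to explanatory values beyond the sample.

    coefficients is [alpha, beta_1, ...]; future_explanatory holds, for each
    explanatory series, the values available after the estimation sample.
    """
    # Forecast as far as all explanatory series have data.
    horizon = min(len(series) for series in future_explanatory)
    if end_point is not None:
        horizon = min(horizon, end_point)
    alpha, betas = coefficients[0], coefficients[1:]
    return [
        alpha + sum(b * series[t] for b, series in zip(betas, future_explanatory))
        for t in range(horizon)
    ]


# Hypothetical model y = 1 + 2*x1 + 3*x2; x2 has only two future values.
all_forecasts = static_forecast([1.0, 2.0, 3.0], [[7.0, 8.0, 9.0], [1.0, 2.0]])
limited = static_forecast([1.0, 2.0, 3.0], [[7.0, 8.0, 9.0], [1.0, 2.0]], end_point=1)
```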

#### Dynamic forecast

Dynamic forecasting uses the values generated by the model as input to the model in order to calculate additional forecasts.

In this example we have included an explanatory variable that is a lag of the dependent series.

In the example, it is the lagged series that limits how far the forecast can be calculated. With dynamic forecasting, the predicted values can be used as input to extend the forecast further. The analysis will only attempt to do this if you select the “Allow dynamic forecast” option.

There is one special case to be aware of. If all your explanatory series are lagged versions of the dependent series, you can use dynamic forecast to calculate forecasted values forever. In this case you *must* specify an end point for the forecast since there is no way for the application to know when to stop.
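The special case can be illustrated with a model whose only explanatory series is the dependent series lagged by one period. The sketch below is an assumption-laden illustration, not the application's algorithm; the function name `dynamic_forecast` and the coefficient values are made up. Each forecast feeds the next step, so the loop needs an explicit end point.

```python
def dynamic_forecast(alpha, beta, last_observed, steps):
    """Forecast y_t = alpha + beta * y_(t-1), feeding forecasts back in."""
    if steps is None:
        # Nothing in the data limits the horizon, so a limit is required.
        raise ValueError("an end point must be specified")
    forecasts = []
    previous = last_observed
    for _ in range(steps):
        previous = alpha + beta * previous   # the prediction becomes the input
        forecasts.append(previous)
    return forecasts


# Hypothetical coefficients; the last observed value of the series is 4.
path = dynamic_forecast(alpha=1.0, beta=0.5, last_observed=4.0, steps=3)
```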


## Settings

#### Regression models

You can define one or more regression models. Each model has separate settings. When a new model is created, the settings of the current model are duplicated. Models can be renamed and deleted.

#### Output dependent series

Select this option in order to include the dependent series as a series in the output.

#### Output explanatory series

Select this option in order to include the explanatory series as series in the output.

#### Estimation sample range

Specify the limits of the estimation sample range. The default range will be the largest range where there is data for all the series.

#### No intercept

When this option is selected, the constant α is omitted from the model:

$y_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 x_{t3} + \epsilon_t$

#### Residuals

When this option is selected a time series containing the residuals will be calculated.

#### Residuals for forecasts

If this option is selected, the series of residuals will also contain residuals for the forecasted values. Such residuals can only be calculated when forecasts are calculated and there is an overlap between the forecasts and the dependent series.
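A minimal sketch of the overlap requirement, together with the *Standard error of forecasts* as it is defined in the report. The function names and the period-indexed dictionaries are assumptions chosen for this example.

```python
import math


def forecast_residuals(dependent, forecasts):
    """Residuals for the periods where forecasts and dependent data overlap.

    Both arguments map a period index to a value.
    """
    return {t: dependent[t] - f for t, f in forecasts.items() if t in dependent}


def standard_error_of_forecasts(residuals):
    """Square root of the mean squared forecast residual."""
    return math.sqrt(sum(e * e for e in residuals.values()) / len(residuals))


# Hypothetical data: the dependent series has values for periods 5 and 6,
# while forecasts exist for periods 5 to 7.
residuals = forecast_residuals(
    dependent={5: 10.0, 6: 12.0},
    forecasts={5: 9.0, 6: 13.0, 7: 14.0},
)
forecast_error = standard_error_of_forecasts(residuals)
```

Period 7 has no residual because there is no observed value of the dependent series to compare with.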

#### Uncertainty band

By selecting the option *Uncertainty band*, two additional time series will be calculated. These form a band around the predicted values by adding and subtracting a multiple of the *Standard error of regression*, as described above.
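The band can be sketched as the predicted values plus and minus a chosen multiple of the standard error. The function name `uncertainty_band`, the default multiple of 2 and the sample numbers are assumptions for illustration.

```python
def uncertainty_band(predicted, standard_error_of_regression, multiple=2.0):
    """Return (lower, upper) series around the predicted values."""
    offset = multiple * standard_error_of_regression
    lower = [p - offset for p in predicted]
    upper = [p + offset for p in predicted]
    return lower, upper


# Hypothetical predicted values and standard error of regression.
lower, upper = uncertainty_band([10.0, 12.0], standard_error_of_regression=0.5)
```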

#### Calculate forecasts

Forecasts will be calculated only if this option is selected and there is sufficient data, as explained in the section Forecast above.

#### End point

You can limit how far into the future forecasts will be calculated. If no end point is specified, forecasts will be calculated as far as possible.

In the special case when dynamic forecast is enabled and the model contains only lagged versions of the dependent variable, a limit must be specified.

#### Allow dynamic forecast

Allow the use of predicted values of the dependent series when calculating forecasts.

#### Confidence band

By selecting the option *Confidence band*, two additional series will be calculated. These form a confidence band around the forecasted values. The band is calculated so that the forecast is within the band with the specified probability assuming that the forecast values are t-distributed.

#### Include

Select if you want to include this series in the model.

#### Is dependent

Select which series is the dependent series. This must be specified.

#### Lag to/from

In order to include lagged versions of the series you can specify the range of lags to include.

When lags are specified for the dependent series, the lagged series will be used as explanatory series in the model. The dependent series will always be without lag.

#### Lag range

Here you can exclude lags from the range specified by the Lag to/from setting.
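How a lag range with exclusions expands into explanatory series can be sketched like this. The function names are hypothetical, and the sign convention is an assumption: here a lag of k means that the value at period t is taken from period t - k, with `None` marking periods where no value is available.

```python
def lagged(series, lag):
    """Shift a series so that position t holds the value from t - lag."""
    if lag < 0:
        raise ValueError("this sketch only handles non-negative lags")
    return ([None] * lag + list(series))[:len(series)]


def build_lagged_series(series, lag_from, lag_to, excluded=()):
    """One lagged copy per lag in [lag_from, lag_to], skipping excluded lags."""
    return {
        lag: lagged(series, lag)
        for lag in range(lag_from, lag_to + 1)
        if lag not in excluded
    }


# Hypothetical series with lags 1 to 3 requested and lag 2 excluded.
lags = build_lagged_series([1.0, 2.0, 3.0, 4.0], lag_from=1, lag_to=3, excluded=(2,))
```

The missing values at the start of each lagged copy are one reason the automatic estimation sample range shrinks when lags are added.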
