Principal components analysis

Overview

The Principal components analysis (PCA) allows you to calculate a set of linearly uncorrelated series, or components, from a set of possibly correlated series. As a dimension-reduction technique, PCA helps you reduce a set of series to a smaller set that contains most of the information of the larger set.

We provide a standard implementation of this analysis. The component series are calculated using an orthogonal transformation so that the first series captures the highest possible variance of the original set. Each successive series captures the highest possible remaining variance under the constraint that it is orthogonal to the preceding series. The analysis also outputs the eigenvectors and the eigenvalues.
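
As an illustration of the computation (outside Macrobond), here is a minimal PCA sketch in Python with NumPy; it assumes the input series are the columns of a complete matrix X with no missing values:

import numpy as np

def pca(X, normalize=False):
    # Center each series; also normalize when the correlation method is used
    Z = X - X.mean(axis=0)
    if normalize:
        Z = Z / Z.std(axis=0, ddof=1)
    # Eigendecomposition of the covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    # Sort descending so the first component captures the most variance
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]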

Settings

General

Do not include series used in calculations in the output

When checked, any series included in the calculation will be excluded from the output. Uncheck this setting if you want both the original series and the calculation result in the output.

Include new series automatically

When checked, any new series added to the Series list will automatically be included in the calculation.

Use legacy format

Checking this option enables the legacy output format, meaning the analysis won't group the outcome into lists. It will produce separate series for each component of each model. Please note that enabling this option will cause all following analyses to lose their settings.

Select method for creating matrix

Use correlation (normalize input)

The eigenvectors will be calculated from the correlation matrix. This means that the input is centered and normalized before the components are calculated. PCA is sensitive to the scale of the input. Use this setting if variables are of different units, e.g., currencies and indices.

Use covariance

The eigenvectors will be calculated from the covariance matrix. This means that the input is only centered before the components are calculated. Remember that if you choose covariance, the input is not normalized, and the analysis will be sensitive to the scale of the input.

Select series

Number of components

Here the number of component series is defined. These are the principal components that will be calculated and included in the output. This number of components cannot be greater than the number of series included in the analysis.

The components are sorted by how much of the original data set's variance they capture. If you select 'Greatest' you will get the most significant series, and selecting 'Smallest' will yield the least significant series.

Output series description

Specify the description of the output series or use the default description.

Include

Select what series to include in the calculation.

Output PCA elements

Eigenvectors/Matrix

This is a Matrix renamed to 'Eigenvectors'.

The matrix contains the eigenvectors of the correlation or covariance matrix. These vectors are orthonormal.

Eigenvalues/Category chart & table

This is a Category chart and Category table renamed to 'Eigenvalues chart' and 'Eigenvalues table'.

The analysis yields two category series, one with the eigenvalues and one with the cumulative proportion of the eigenvalues. The latter can be interpreted as how much of the original variance is captured by that principal component together with all preceding components.

Principal components/Time chart & table

The 'Number of components' setting specifies how many component series should be calculated. The series are either the most or least significant components. In the time series space, the components are projected as the eigenvectors scaled so that the variance is the same as the corresponding eigenvalue.

The projection is the inner product of the principal-component vector with the time series. By determining the eigenvectors of the covariance matrix corresponding to successive eigenvalues, we obtain the coefficients of the linear combinations that form the new principal components.

The eigenvectors specify only a direction, not a magnitude. To decide the magnitude of a component series, the common approach is to scale the resulting series so that its variance equals the corresponding eigenvalue.
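
Continuing the illustrative NumPy sketch from the overview (Z is the centered input; eigvals and eigvecs are the sorted eigendecomposition returned by pca), the projection and scaling step could look like this:

# Project the input onto the first eigenvector (inner product per observation)
scores = Z @ eigvecs[:, 0]
# Rescale so the variance of the component series equals the eigenvalue
scores = scores * np.sqrt(eigvals[0]) / scores.std(ddof=1)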

Example

Principal component UK Swap rates

The three main principal components of changes in the UK swap rates are identified using PCA.

Questions

I've checked 'Use correlation (normalize input)' - is it possible to show this normalized series?

The normalization is done on the matrix level, which corresponds to normalizing the series. We never explicitly calculate the normalized series, so it’s not possible to plot it.

I have many different sets of data, but the matrix tab only shows the output for one.

The matrix is built only for the first model in the sequence. To see all matrices, you'd need to create a separate PCA for each model.

What happens when series have different lengths?

The calculation is made in the interval where there is data in all series.

 

Cross sampling

Overview

The Cross sampling analysis is a tool for extracting particular values or metrics and comparing them across series. Use this analysis when you want to create a chart with categories, such as countries, along the x-axis, and columns of values on the y-axis. The Cross sampling analysis is a successor to the Scalar analysis with a new workflow that is especially tailored for lists.

Lists are pre-prepared data sets on which you can easily operate in Cross sampling (and in other analyses). To change data, you just go back to the Series list and replace or add a series without rebuilding the whole analysis. You can also create and share lists through the My list feature.

The Cross sampling analysis can perform a variety of calculations that result in one value per input series, such as the last value, the mean in a time range, or year to date performance. The output is always a category series, meaning that the time variable is replaced by a categorical variable. You can display this output in a Category chart, Bar chart, or Category scatter chart.

Working with Cross sampling analysis

Input - Lists

All input data in this analysis go in Lists - data sets defined under the Series list tab. For more information see: Lists of series. You can create and share lists with your colleagues through the My list feature.

Settings

The settings consist of four parts. In the screenshot below, there are two lists and two series that are not part of any list to the left. On the right-hand side, the two lists have been added and can be seen as two columns.

  1. Select what calculations to apply to each series. Details about the calculations can be found below.
  2. Select how the output should be generated.
  3. This is the list of all input series and lists.
  4. Here you organize the series that should be used for the output.

Calculations

Here, you can add one or more calculations that will be performed on all selected input series. The available calculations are:

Open, Close, High, Low

The first, highest, lowest, or last value of the specified range.

Mean, Median

The mean or median of the range.

Last

The last valid value of each series.

Last common

The value at the last point in time at which all the included series have values.

Last non-forecast

The last value that is not a forecast in the series.

Value at

The value at a specific point in time. If a series is missing a value for that date, the first available value before that date will be used.

Nth last value

The nth last value of a series, where a value of 1 gives the last value, 2 gives the second-to-last value, 3 gives the third-to-last value, etc.

Year, Quarter, Month, Week to date

The performance from the start of the period to the specified date. The performance is measured as the change compared to the last value of the previous period.*

Performance since

The performance between two specified dates. The performance is measured as the change compared to the last value of the previous period.*

The Performance analysis works a bit differently than this performance calculation. In Cross sampling and Scalar, the program finds the first non-missing value and uses that as the base value, while the Performance analysis gives an error if the specified start date is missing. You can use the 'Strict' box to enforce the selected date.

Note that since version 1.29 the ‘Strict’ option is removed in new documents as it is always turned on for calculation.

Years, Quarters, Months, Weeks back

The change from a selected number and type of periods before the specified date.*

For years and quarters, this is the same as using the 'Rate of change since' method and specifying the start of the range as '-1y' or '-1q'.

Rate of change since

The rate of change between two points in time.*

Rates of change as value, percentage or logarithmic are calculated in the following way:

  • value: $y_t - y_{t-n}$
  • percentage: $100\,\dfrac{y_t - y_{t-n}}{\lvert y_{t-n}\rvert}$
  • logarithmic: $100\,\ln\dfrac{y_t}{y_{t-n}}$
  • annualRateValue: $\dfrac{c}{h}\sum_{i=1}^{h} z_{t+1-i}$
  • annualRatePercent: $100\left(\left(\dfrac{z_t}{z_{t-h}}\right)^{c/h}-1\right)$

where c is the typical number of observations in one year.
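
For concreteness, a small Python sketch of these variants, where y (and z for the annualized forms) is a NumPy array of observations and c, h, n follow the definitions above:

import numpy as np

def roc_value(y, n):
    return y[-1] - y[-1 - n]

def roc_percent(y, n):
    return 100 * (y[-1] - y[-1 - n]) / abs(y[-1 - n])

def roc_log(y, n):
    return 100 * np.log(y[-1] / y[-1 - n])

def annual_rate_value(z, c, h):
    # Mean change over the last h observations, scaled to one year
    return c / h * np.sum(z[-h:])

def annual_rate_percent(z, c, h):
    return 100 * ((z[-1] / z[-1 - h]) ** (c / h) - 1)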

The Rate of change analysis works a bit differently than the 'Rate of change since' calculation. In Cross sampling and Scalar, the program finds the first non-missing value and uses that as the base value. You can use the 'Strict' box to enforce the selected date.

Note that since version 1.29 the ‘Strict’ option is removed in new documents as it is always turned on for calculation.

Percentage proportion

The percentage proportion of each series compared to the sum of all series at a specified point in time.

Standard deviation

The standard deviation of the range.

Percentile

The specified percentile of the selected range.

Lower, Upper tail mean

The mean of the values in the upper or lower percentile of the range.

Trimmed mean

The mean of the middle values as specified by the percentage.

Standardize

The mean divided by the standard deviation of the range.

Note that the formula Standardize() won't give the same outcome. The formula standardizes every value of the series as (value - mean)/stddev, while Cross sampling calculates one standardized value for the whole series (or a specified interval) as mean/stddev.
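
A short Python sketch of the distinction, with illustrative names:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
# Formula language Standardize(): one standardized value per observation
standardized_series = (x - x.mean()) / x.std(ddof=1)
# Cross sampling 'Standardize': a single scalar for the whole range
standardized_scalar = x.mean() / x.std(ddof=1)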

Settings for calculation methods
*Relative dates

Most Cross sampling calculations require either a point in time or a time interval to be specified. You can use specific dates, but you may want the dates to update when new data is added. In that case, leaving the date box blank or using relative dates, such as '-1y' can be useful. It’s important to understand what default dates are chosen when none is specified, and how relative dates work in each context.

Point in time

First, we'll talk about calculations that require only one point in time, such as 'Value at'. If the point in time box is left blank, the last valid value for each series will be used.

If you specify a relative date here, that date will be relative to the last calendar date, not relative to the last date for each series. If you would like the last calendar date to be used, even though not all series may have values, you should use the relative date '+0'.

Time intervals

If you leave the range start blank, the first available value for each series is used. If you leave the range end blank, the last available value for each series is used.

When you use a relative date for the range start and leave the range end blank, the end point will be the last valid value for each series and the starting point for each series will be relative to its last point, not the last calendar date.

If you use relative dates for both the range start and the range end, they will both be relative to the last calendar date.

Output mode

You can select one of two ways to create the output series.

One series per calculation

For each calculation defined, there will be one series per column defined in the organization pane. In this mode you can select what metadata to use for generating the labels by selecting Label generation:

Based on the example in the screenshot above, this means that there will be one series with the GDP values and one series with the unemployment numbers.

One series per input

In this case there will be one output series with all the defined calculations for each input series. For example, if you have an input like this:

you will get a category chart like this:

Organizing output

The analysis is tailored for lists of series. You organize the output by selecting a list of series on the left-hand side. In most cases, you drag a list over to the right-hand side, which adds the list to the output. If you want to change or add series, you need to do this on the list in the Series list.

Series will be paired automatically by sub-region metadata.
Note that you can only place lists side by side if the series belong to the same family or at least one of them is a list by region. The entries in the lists will be automatically aligned.

Groups

You can separate each data set in the analysis by creating separate Group tabs. The effect is the same as adding the lists to one Group, but it might be easier to keep track of them in separate Groups.

Individual series

You can also add individual series (or a group of them) with drag and drop. Mark the series and, on the right, navigate until you see a bolder horizontal line - then drop the series and it will be added to the group.

Note that when using series from a list you cannot drag individual series like this. Instead, go to the Lists tab and add the series there.

Order

The order will be determined by one of the columns that is based on a list. You can select which column decides the order by clicking on the button in the column header.

Missing input

If any of the lists have missing series, a red background will appear. A series may be missing if no series has been entered for that position in a plain list, or if two lists by region do not have the same set of regions.

You have a few options for handling such missing input; you can select the strategy in the setting called 'Missing inputs'.

Replacing individual series

You can replace individual series by dragging a series to a position in the table to the right. This can be used for making an exception in a list or for filling in a missing entry when you do not want to change the underlying list.

Any series in a list that has been replaced will have a yellow background and a button for reverting the change.

How to create a simple chart with Cross sampling and Lists?

  1. Copy/Cut series you want to use.
  2. Go to Series list > Lists tab, use 'Add new by region list' and paste series.
  3. Add list to Series list.
  4. Add Cross sampling analysis, select calculation and Output mode.
  5. Drag the list from 'Series' to 'Group'.
  6. Add chart or table.

How to have two different colors for one List?

If you want to add a second color to a chart you can wrap the selected series with flagforecast() and use a separate color setting to introduce another color. With a List, however, this cannot be done directly. See the steps below for a solution. Note that the partition of series into different colors doesn't have to be even.

  1. Add a random extra series or constant in the Series list that you do not need. In our example, we add a '0'. This series will be used only to construct groups of series and will be deleted at the end. In Cross sampling, first add this extra series as a new column. Then add the calculation 'Last' so we have values generated.
  2. Drag and drop the first part of the series under the 0 as we do below. Make sure that you see a 'special line' below 0, as in the video, so that the series stack up below 0.
  3. Add 0 as a new column again so we can create the second part. Drag and drop the second part of the series below the last red box.
  4. In the Group, switch 'Missing input' from 'Error' to 'Missing'.
  5. Go into the Series list and delete '0' (or the extra series you added). This removes the 0 from the Cross sampling and we are left with only the series we want.
  6. Add a Category chart. To get a column look, go into Graph layout (Ctrl+L), and then change Graph type to Stacked Column.

If you want to sort the series, add the entire List as a new column. Now add a Sorting analysis. Expand the 'Last'. First sort the previously established 'List' and give it a direction, then sort all the other series by that List. However, in Graph layout use only the first and second group of series, not the list. The list is only needed to sort all the series at once, since both parts have missing series.

For a ready-file see Different colors for two or more groups inside one list.

Using Transpose analysis after Cross sampling

With the Transpose analysis you can move data from the x-axis to the y-axis and vice versa without rearranging data or rebuilding the Cross sampling analysis. See Transpose for examples.

Examples

Single series column with average line

In this example, we calculated the average of GDP values for several countries and plotted it together with those values.

Cross sampling analysis on three indicators

In this document we worked on lists containing city-level series for three different indicators, which were then combined in one bubble chart to compare values.

Conditional formatting rules

We applied conditional formatting rules to the Bar chart table created with the Cross sampling analysis using lists.

Highlighting series

See how to use a formula to highlight chosen series, or series selected based on conditions.

Different colors for two or more groups inside one list

See how to divide series from one list into two (or more) groups with different colors for each.

Two by two stacked columns

Create two stacked columns with different calculations for each of two countries using Cross sampling and Transpose.

Questions

How to add a single series column (constant series)?

Sometimes you want to create a group with a constant series. For example, here a sum of the GDP series has been created and added by selecting it on the left and pressing 'Add selected series as new single series column.' See the file under Examples: Single series column with average line.

If new series are added to the lists, the single series column will be extended to include more rows of the same series.

How to show date(s) of observation?

When using the 'Last common' or 'Value at' calculation method you can select the metadata {s .ObservationDate}.

Example:

What is the difference between 'Add selected series as new column' and 'Add selected series as new single series column'?

If you are working with series not arranged in a list, you should use 'Add selected series as new column.'

If you want to have one series as a whole column select 'Add selected series as new single series column.'

What is the difference between the Rate of change analysis and selecting ‘Rate of change since’ when doing a Cross sampling analysis?

  • Rate of change analysis calculates the changes from the end of each time series while
  • 'Rate of change since' in the Scalar/Cross sampling analysis calculates it from the end of the whole calendar, meaning that if some series do not end at the same observation date, the calculation range will differ.

You can set the 'Range Start' and 'Range End' in the Scalar/Cross sampling analysis to make sure the calculation is done on the same range across all input series.

Example:

Why does annualization in the Rate of change analysis work differently than in the Cross sampling analysis?

Rate of change is by default set to 'Mode: Fixed period'. There is also another mode - 'Calendar date' - which is helpful when working with daily series or when using an annual rate. Annualization is done differently when you select calendar mode, since Macrobond then uses the actual length of the period. If you switch to calendar mode you will get the same value in both analyses.

Why can I see the 'Strict' option in one document but not in another?

Since version 1.29 the 'Strict' option is not available in new documents, as it is always turned on for the calculation. A file where you can see that option was created in an older version of Macrobond.

Transpose

Overview

This analysis changes the position of data - data from the x-axis will be shown on the y-axis and vice versa. You do not need to rearrange series or change the analysis' settings.

Transpose can be used with Category chart, Category scatter chart and Category table.

After which analyses can I use Transpose?

You can add it only after analyses whose outcome is presented as category data:

  • Cross sampling
  • Scalar
  • Slice
  • Seasonal adjustment MA
  • Yield Curve
  • Correlation
  • PCA
  • Regression

How to use it?

Add the Transpose analysis and check the box in the 'Include' column. The output labels can be automatically generated with the Title generation tool. Below, in the Analysis tree, add the chart/table.

Examples

Cross sampling with Transpose

In this example we performed a Cross sampling analysis. With Transpose, instead of having each country separately, we can have them grouped by region.

Slice with Transpose

Here we sliced the year 2020 from unemployment rate series. Then, with Transpose, we changed the layout - countries are on the x-axis and each month has its own color.

Distribution stack chart - New houses

See Transpose used to show the distribution of years (with date) by the number of new house construction starts.

Two by two stacked columns

Create two stacked columns with different calculations for each of two countries using Cross sampling and Transpose.

Rolling regression

Overview

The Rolling regression analysis implements a linear multivariate rolling window regression model. Just like ordinary regression, the analysis aims to model the relationship between a dependent series and one or more explanatory series. The difference is that in Rolling regression you define a window of a certain size that is kept constant throughout the calculation. The analysis performs a regression on the observations contained in the window, then the window is moved one observation forward in time and the process is repeated. Thus, many regressions will be performed as the window moves forward.
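
As a sketch of the mechanics (not Macrobond's implementation), a rolling OLS in Python with NumPy, where y holds the dependent series and X the explanatory series as columns:

import numpy as np

def rolling_ols(y, X, window):
    coefs = []
    for start in range(len(y) - window + 1):
        sl = slice(start, start + window)
        # Prepend a column of ones for the intercept
        A = np.column_stack([np.ones(window), X[sl]])
        beta, *_ = np.linalg.lstsq(A, y[sl], rcond=None)
        coefs.append(beta)
    return np.array(coefs)  # one coefficient vector per window position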

Estimation model

For more in-depth information regarding the estimation model see Regression analysis.

Working with Rolling regression analysis

Settings

Regression models

You can define one or more regression models. Each model has separate settings. When a new model is created, the settings of the current model are duplicated. Models can be renamed and deleted.

Output dependent series

Select this option to include the dependent series in the output.

Output explanatory series

Select this option to include the explanatory series in the output.

Date range

Specify the limits of the date range and window length. The default range will be the largest range where there is data for all the series.

No intercept

When this option is selected, the constant α is omitted from the model and it will be defined as:

$y_t = \beta_1 x_{t,1} + \beta_2 x_{t,2} + \beta_3 x_{t,3} + \epsilon_t$

Residuals

When this option is selected a series containing the residuals will be included in the output.

Durbin-Watson

The Durbin-Watson is a test statistic used to detect the presence of autocorrelation in the residuals. The value is in the range 0-4. A value close to 2 means that there is little autocorrelation. Values from 0 to less than 2 point to positive autocorrelation and values from 2 to 4 point to negative autocorrelation. The result from this test is not useful if any dependent series is included with several lags or if no intercept is included in the model.

For more information about this see Investopedia.
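
As a point of reference, the statistic itself is simple to compute for a residual series e (illustrative Python):

import numpy as np

def durbin_watson(e):
    # Squared successive differences over the sum of squares
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)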

Schwarz

The Schwarz information criterion takes overfitting into account and estimates the efficiency of the model in terms of predicting the data. The criterion yields a positive value, where a lower value is considered better when comparing different models based on the same data.

R2

The R2 value compares the variance of the estimation with the total variance. The better the result fits the data compared to a simple average, the closer this value is to 1.

Coefficient

The estimated parameters.

P-values

The p-value is the probability of obtaining a value of t that is at least as extreme as the one that was actually observed if the true value of the coefficient is zero.

T-values

The t-value measures the size of the difference relative to the variation in your sample data.

Series settings

Include

Select if you want to include this series in the model.

Is dependent

Select which series is the dependent series. This must be specified.

Diff

By selecting Diff, the first order differences of the series will be calculated. The result will then be converted back to levels. First order of differences means that the series is transformed to 'Change over value (one observation)' while expressing the result in levels. If you tick that option, the result will output the coefficients for intercept and diff(x1) rather than intercept and x1.

This setting does not affect the model itself. It only influences the step after the calculation of the model when the levels are calculated from the differences.

Lag to/from and Lag range

Here you specify the lags you would like to include for a specific series. When lagging a series, the values are delayed in time and the series stretches further into the future.

If you, for example, set 'Lag from' to 0 and 'Lag to' to 2, three series will be included: one with no lag, one with a lag of 1, and one with a lag of 2. This will automatically change the lag range to '0 to 2'. You may specify the desired lags using 'Lag to/from' or 'Lag range'; the result will be the same. If you set Lag range to a single digit or set 'Lag to' and 'Lag from' to the same value, a single lagged series will be included.

When lags are specified for the dependent series, the lagged series will be used as explanatory series in the model. The dependent series will always be without lag.
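
A sketch of how such a set of lagged regressors could be built outside Macrobond, using pandas (the column names are illustrative):

import pandas as pd

def add_lags(df, column, lag_from, lag_to):
    # One regressor per lag; shifting delays the values in time
    for k in range(lag_from, lag_to + 1):
        df[f'{column}_lag{k}'] = df[column].shift(k)
    return df

With lag_from=0 and lag_to=2 this yields three regressors, matching the example above.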

How to create a simple rolling regression model?

  1. Check box for 'Output the dependent series' or 'Output the explanatory series'.
  2. Select window length.
  3. Select Output indicators (they will appear on chart).
  4. Check 'Include' for at least two series and mark one as 'Is dependent'.
  5. Add Time chart.

Common errors

Degree of freedom is too low

You cannot fit the regression coefficients if there are no degrees of freedom. The degrees of freedom are the number of observations minus the number of parameters being estimated. The number of estimated parameters includes the intercept.

The number of observations must thus be larger than the number of independent (explanatory) series.

Forecast

It is not possible to calculate forecasts in the Rolling regression analysis. Such functionality would at some point base the forecast on itself, because the window keeps rolling.

As a workaround, we recommend using the simple Regression analysis. In 'Estimation sample range', type in the parameter '-window_length' (e.g., -5m and -2m; -50 and -10). This places the last non-forecasted value at the desired point in time. When you enable the 'Calculate forecast' box, the forecast will be calculated based on a regression over this narrowed range of observations.

Report

The fact that a rolling window is utilized has implications for the output. When using the Regression analysis, a report is generated. In Rolling regression, no such report is available. This is because, as explained in the overview, a rolling regression consists of many regressions, all of which yield individual statistics. The output of statistics, information criteria and parameters will thus all be time series. You have many options regarding what information to include in the result.

How to output indicators?

Simply mark the indicator in the panel and it will be available as output.

Calculating regression with formulas

To calculate α and β use:

Intercept(series1, series2, window)
Slope(series1, series2, window)

where series1 is the dependent series and series2 is the explanatory series. If you get different values than from the analysis, check the 'Estimation sample range' - it has to be calculated on an identical time range. To avoid adding the Cut() formula everywhere, you can set the data range in the Series list.

The formulas above calculate the regression between two series; if the Regression analysis contains more series, the models won't be comparable - you will get different outcomes.

Examples

Rolling regression

In this example, we used the model presented for the Regression analysis and created a new regression model generated on a 5-year rolling window. For the output, we've included the residuals and the R2.

Multivariable rolling regression

Here we calculate the share of each explanatory variable.

Questions

Why is the rolling regression's average residual not zero?

If you do a standard regression, the mean of the residuals is zero. With Rolling regression, a separate regression is performed at each point in time, so the residuals are not zero on average.

Why are the model's values different from those of the same model in rolling form?

The differences stem from different time ranges being taken into the calculation because of:

  1. Mon-Sun daily series vs Mon-Fri daily series
  2. what Macrobond takes into account when calculating a sample range

To ensure that you are looking at and comparing exactly the same time periods, use one of the methods below:

  • change frequency in the document to Daily (not Daily (highest) or Daily (lowest)) and set Observations to Monday-Friday
  • change the window size in Rolling regression from, for example 'X months', to number of observations in Regression i.e., '402'.

The first method will give closely similar results, but still not 100% the same, due to the way Macrobond sets the start range: '-18m' is not strictly 18 months but 18 months counting from the previous observation, so effectively '18 months + 1'. In that case, you may want to set the window length to a number of observations instead.
The latter approach will output the same values in both Regression and Rolling regression.

Regression

Overview

The Regression analysis implements a multiple linear regression model. The analysis aims to model the relationship between a dependent series and one or more explanatory series. Several models can be specified within one instance of the analysis. The output consists of the coefficients of the linear model, the predicted series, and several statistical indicators. If there is sufficient data after the end of the estimation sample range, forecasts can be calculated.

Estimation model

The Regression analysis can only calculate a static linear regression model. It attempts to find the linear combination of a number of explanatory series that best describes a dependent series.

The analysis uses the following Ordinary Least Squares (OLS) model:

$y_t = \alpha + \beta_1 x_{t,1} + \beta_2 x_{t,2} + \beta_3 x_{t,3} + \epsilon_t$

 

y: dependent series
x_i: explanatory series
α: intercept
β: slope (coefficient, beta)
ϵ: residual (error term); its squared sum is the sum of squared errors

If the option No intercept is selected in the analysis, then the constant α is not included in the model.

The parameters α and β are estimated by minimizing the sum of the squared residuals ϵ. The output from the analysis will include the predicted series calculated using the estimated parameters.
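
A minimal sketch of this estimation step in Python (NumPy's least squares minimizes the sum of squared residuals directly; drop the column of ones when 'No intercept' is selected):

import numpy as np

def ols(y, X):
    A = np.column_stack([np.ones(len(y)), X])   # intercept column for alpha
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ params
    residuals = y - predicted
    return params, predicted, residuals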

The automatic estimation sample range will be the largest range where there is data for all series. You can specify a smaller range if you would like to limit the data used in the estimation.

Working with Regression analysis

Settings

Regression models

You can define one or more regression models. Each model has separate settings. When a new model is created, the settings of the current model are duplicated. Models can be renamed and deleted.

Output dependent series

Select this option to include the dependent series in the output.

Output explanatory series

Select this option to include the explanatory series in the output.

Estimation sample range

Specify the limits of the estimation sample range. The default range will be the largest range where there is data for all the series.

No intercept

When this option is selected, the constant α is omitted from the model and it will be defined as:

$y_t = \beta_1 x_{t,1} + \beta_2 x_{t,2} + \beta_3 x_{t,3} + \epsilon_t$

Residuals

When this option is selected a series containing the residuals will be included in the output.

Residuals for forecasts

If this option is selected, the series of residuals will also contain residuals for the forecasted values. Such residuals can only be calculated when forecasts are calculated and there is an overlap between the forecasts and the dependent series.

Uncertainty band

By selecting the option Uncertainty band, two additional time series will be calculated. These time series form a band around the predicted values by adding and subtracting a number of standard deviations. The standard deviations are based on the Standard error of regression, calculated as the square root of the sum of squared errors divided by the degrees of freedom, as described in the section Report.

Calculate forecasts

Forecasts will be calculated only if this option is selected and there is sufficient data, as explained in the section Forecast.

End point

You can limit how far into the future forecasts will be calculated. If not specified, forecasts will be calculated as far as possible. In the special case when dynamic forecast is enabled and the model contains only lagged versions of the dependent variable, a limit must be specified.

Allow dynamic forecast

Allow the use of predicted values of the dependent series when calculating forecasts.

Confidence band

By selecting the option Confidence band, two additional series will be calculated. These time series form a confidence band around the forecasted values. The band is calculated so that the forecast is within the band with the specified probability assuming that the forecast values are t-distributed.

Series settings

Include

Select if you want to include this series in the model.

Is dependent

Select which series is the dependent series. This must be specified.

Diff

By selecting Diff, the first order differences of the series will be calculated. The result will then be converted back to levels. First order of differences means that the series is transformed to 'Change over value (one observation)' while expressing the result in levels. If you tick that option, the result will output the coefficients for intercept and diff(x1) rather than intercept and x1.

Diff->legacy

Calculate the predicted series by adding diffs to the dependent series (this was the default option in Macrobond version 1.26 and earlier).

This setting does not affect the model itself. It only influences the step after the calculation of the model when the levels are calculated from the differences.

Diff->agg

Calculate the predicted series by aggregating the predicted differentials.

Lag to/from and Lag range

Here you specify the lags you would like to include for a specific series. When lagging a series, the values are delayed in time and the series stretches further into the future.

If you, for example, set 'Lag from' to 0 and 'Lag to' to 2, three series will be included: one with no lag, one with a lag of 1, and one with a lag of 2. This will automatically change the lag range to '0 to 2'. You may specify the desired lags using 'Lag to/from' or 'Lag range'; the result will be the same. If you set Lag range to a single digit or set 'Lag to' and 'Lag from' to the same value, a single lagged series will be included.

When lags are specified for the dependent series, the lagged series will be used as explanatory series in the model. The dependent series will always be without lag.

How to create a simple regression model?

  1. Check boxes for 'Output the dependent series' and 'Output the explanatory series'.
  2. Check 'Include' for at least two series and mark one as 'Is dependent'.
  3. Add Scatter chart. Go there and open Graph layout (Ctrl+L).
  4. Pair series to generate a regression line. Make sure to set the right order of the series in the Graph layout window (note that when a series is lagged, the outcome won't be a straight line):

    a) Pair #1: the explanatory and the dependent series

    b) Pair #2: the explanatory and the predicted series

    Line A: series 1, series 2
    Line B: series 1, series 2 [predicted]
  5. Click on one of the lines, go to Presentation properties > Appearance, change Graph style to Custom. Then set Line to None and select Marker style.

How to add best fit line through different series' last values?

With a formula, output only the last value from each series. Lag each value and glue them together with Cross section, creating a fake time series which can be fed to the Regression analysis. See the file with an explanation under Best fit line for last values of group of series with and WITHOUT Regression analysis.

Common errors

Too few time series in graph

  1. Check if you have added Category scatter chart - use Scatter chart instead.
  2. In the left panel you should see at least 3 series in the output. Check if you have enabled 'Output dependent series' and 'Output explanatory series' at the top of the Regression analysis.
  3. Check if you have two pairs of series in Graph layout. If not, see how to pair them under How to create simple regression model?.

Degree of freedom is too low

You cannot fit the regression coefficients if there are no degrees of freedom. The degrees of freedom are the number of observations minus the number of parameters being estimated. The number of estimated parameters includes the intercept.
The number of observations must thus be larger than the number of independent (explanatory) series.

This might be caused, for example, by changing the document's frequency to a lower one (e.g., from Monthly to Annual), or by series that do not have enough overlapping observations.

Forecast

How does it work?

When there are data for all the explanatory series beyond the estimation sample, we can use the estimated parameters to calculate forecasts. This is done by checking the option 'Calculate forecast'. If no end point is specified, the analysis will calculate as many forecasted values as possible. You can specify an end point if you want to limit the length of the forecast. End point refers to 'up to, but not including'.

The forecast can't go further than the longest common part of the explanatory series.

See the 'Regression - how forecast is calculated' file under Examples.

Dynamic forecast


Dynamic forecasting uses the data generated by the model as input to the model to calculate additional forecasts. To enable it check box for 'Allow dynamic forecast' under Forecast panel in Regression analysis.

In the example above, we have included an explanatory variable that is a lag of the dependent series. It is the lagged series that limits how far we can calculate the forecast. Dynamic forecasting allows us to use the predicted data as input to calculate the forecast further. The analysis will only attempt to do this if you select the 'Allow dynamic forecast' option.

There is one special case to be aware of. If all the explanatory series are lagged versions of the dependent series, you can use dynamic forecast on the series infinitely many times. In this case you must specify an end for the forecast since there is no way for the application to know when to stop.
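
A sketch of that special case, where the single regressor is the dependent series lagged one step, so each forecast feeds the next (illustrative Python; alpha and beta are the estimated coefficients):

def dynamic_forecast(last_value, alpha, beta, steps):
    # 'steps' is the mandatory end point: this loop never runs out of input
    forecasts, y = [], last_value
    for _ in range(steps):
        y = alpha + beta * y   # use the predicted value as the next input
        forecasts.append(y)
    return forecasts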

Report

Overview

The Regression analysis automatically generates a report, which includes a variety of statistical information.

Calculation range

The calculation range used for the analysis.

Observations

The number of observations used in the analysis. This includes all observations in the calculation range where there are values for all series.

Degrees of freedom

The number of observations minus the number of explanatory series and minus one for the constant parameter.

R2

Compares the variance of the estimation with the total variance. The better the result fits the data compared to a simple average, the closer this value is to 1.

In the ordinary case when an intercept term is included, this value is calculated as the square of the correlation between the dependent series and the estimate. In this case R2 will always be between 0 and 1.

If the option "No intercept" has been selected, R2 is calculated in a different way since the dependent and estimated series can now have different mean values.

Please note that R2 for models with an intercept term cannot be compared with R2 for models without one. Typically, R2 for models without an intercept will be higher than for the corresponding model with an intercept, but that does not mean it is a better fit.

Adjusted R2

To overcome the issue that R2 always increases when you add more variables to your model, you often look at the adjusted R2, which is calculated in the following way:

1-(1-R2)*(n-1)/(df-1)

Where n is the length of the series and df is the degrees of freedom.

F

The F-ratio is the ratio of the explained variability and the unexplained variability, each divided by the corresponding degrees of freedom. In general, a larger F indicates a more useful model.

P-value (F)

The p-value is the probability of obtaining a value of F that is at least as extreme as the one that was actually observed if the true values of all the coefficients are zero.

Sum of squared errors

The sum of the square of the residuals.

Standard error of regression

The square root of the sum of squared errors divided by the degrees of freedom. This is an estimate of the standard deviation of residuals.

Standard error of forecasts

The square root of the sum of squared forecast residuals divided by the number of residuals.

Durbin-Watson

The Durbin-Watson is a test statistic used to detect the presence of autocorrelation in the residuals. The value is in the range 0-4. A value close to 2 means that there is little autocorrelation. Values from 0 to less than 2 point to positive autocorrelation and values from 2 to 4 point to negative autocorrelation. The result from this test is not useful if any dependent series is included with several lags or if no intercept is included in the model.

For more information about this see Investopedia.

Information criteria

The information criteria are measures of the expected information loss. A lower value means that more information is captured. This can be used to compare models when the same data is used in the models.

AIC

Akaike's information criterion.

HQ

Hannan and Quinn's information criterion.

Schwarz

Schwarz criterion, also known as the Bayesian information criterion.
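
As an illustration, one common parameterization of these criteria for a regression is sketched below; the exact constants and scaling vary between conventions, and this is an assumption rather than Macrobond's exact definition:

import numpy as np

def information_criteria(sse, n, k):
    # n observations, k estimated parameters, sse = sum of squared errors
    base = n * np.log(sse / n)
    return {'AIC': base + 2 * k,
            'HQ': base + 2 * k * np.log(np.log(n)),
            'Schwarz': base + k * np.log(n)}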

Coefficient

The estimated parameters

Standard error

The standard error of the estimated parameters

t

The estimated coefficient divided by the standard error

P-value

The p-value is the probability of obtaining a value of t that is at least as extreme as the one that was actually observed if the true value of the coefficient is zero
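
A sketch of how these three quantities relate, using SciPy's t-distribution for the two-sided p-value (coef and std_err are illustrative inputs):

from scipy import stats

def t_and_p(coef, std_err, dof):
    t = coef / std_err
    p = 2 * stats.t.sf(abs(t), dof)   # probability of a t at least this extreme
    return t, p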

How to output coefficients?

This analysis doesn't output the coefficients from the models. In the main application they are only available in the 'Regression report'.

You can access them through the Excel add-in. To do this:

  1. Right click on Series list and select 'Copy your series as Excel data set'. Paste it to Excel.
  2. Right-click on the red object and go to Edit. In the new window, as 'Select output', choose Regression, and in the next field, Category series.

You can also calculate them separately in main application - see paragraph below.

Errors

Sometimes the model cannot be calculated and you will instead see an error message indicating what is preventing the calculation.

Calculation range end date is before start date

One (or more) series is too short. Check them with a Time table before the Regression analysis and exclude them from the calculation.

There is a linear dependency between the independent series

This means the equation system cannot be solved, because arbitrary values can be assigned to one or more of the constants in the equation, so the residuals can't be calculated.

Usually it appears when you convert series from a lower frequency (e.g., Annual) to a higher one (e.g., Monthly), so those series have repeated values for an entire period. As a solution, go to the Conversion settings tab > To higher... and select 'Cubic interpolation'. Some further manipulation might be needed (e.g., deleting lags for some series), depending on the composition of your model.

Degrees of freedom is too low

You cannot fit the regression coefficients if there are no degrees of freedom. The degrees of freedom are the number of observations minus the number of parameters being estimated. The number of estimated parameters includes the intercept.
The number of observations must thus be larger than the number of independent (explanatory) series.

This might be caused, for example, by changing the document's frequency to a lower one (e.g., from Monthly to Annual), or by series that do not have enough overlapping observations.

Calculating regression with formulas 

To calculate α, β and R2 use:

Intercept(series1, series2)
Slope(series1, series2)
Pow(Correlation(series1, series2), 2)

where series1 is the dependent series and series2 is the explanatory series. If you get different values than from the analysis, check the 'Estimation sample range' - it has to be calculated on an identical time range. To avoid adding the Cut() formula everywhere, you can set the data range in the Series list.

The formulas above calculate the regression between two series; if the Regression analysis contains more series, the models won't be comparable - you will get different outcomes.

Examples

Regression model - multiple series

In the Regression analysis, we first defined the variables of the model, by:

  • Marking the Industrial Production as dependent variable
  • Specifying the lags for the explanatory series (these numbers are based on the Correlation analysis).
  • Defining the output we want to have in the chart: dependent series & residuals

We also decided to calculate forecasts. This is possible as all explanatory variables have been lagged, meaning we can calculate forecasts for the shortest number of lags defined, here 2 months.

Regression S&P and VIX scatter chart

In the Regression analysis, we checked as output both the dependent and explanatory variables. Both series, as well as the predicted series, will be needed in the Scatter chart to show the one-week change in both indices.

Regression - how forecast is calculated

An example showing how forecast with lagged series work.

Phillips curve

Estimation of the Phillips curve based on the observations, with a fit line created through Regression.

Best fit line for last values of group of series with and WITHOUT Regression analysis

See how to prepare series to show a best fit line for the last values of a group of series in the Regression analysis, and also how to do this without Regression at all - only through calculations.

Questions

Can I add non-linear regression?

Generally no, but under Examples you can find a Phillips curve built with Regression.

From where it is taking its residual?

The residual is the difference between the predicted series and the dependent series.

How to add dummy variable?

You can create binary series (0/1 series) using conditions in the formula language. These formulas need to be applied before the Regression analysis. In Regression, add such a series as explanatory.

For example: 

quarter()=1|quarter()=3

Returns 1 if the observation is Q1 or Q3, 0 otherwise.

quarter()=1 & year()=2020|quarter()=3 & year()=2020

Returns 1 if the observation is Q1 or Q3 of 2020, 0 otherwise. Each quarter condition must include '& year()=2020'; otherwise it will match quarters in every year.

DayOfWeek()=5

Returns 1 if the observation is a Friday, 0 otherwise.

Counter()=EndOfYear()

Returns 1 if the observation is the last one in a year, 0 otherwise.

Counter()=Date(2020, 4, 1)

Returns 1 if the observation is 1 April 2020, 0 otherwise.

Cop(usgdp, yearlength())<0

Returns 1 if the US GDP y/y growth rate is negative, 0 otherwise.

For more information see: Built-in formula functions

How to do a logarithmic regression?

You can calculate it with formulas by expressing the explanatory variable as Log(series).

For more information see: Built-in formula functions

How to do a regression against time (on one series)?

To perform a linear regression on one time series (where the independent variable is time), use Counter() in the Series list and perform the Regression analysis.

For more information see: Built-in formula functions

Why are the model's values different from those of the same model in rolling form?

The differences stem from different time ranges being taken into the calculation because of:

  1. Mon-Sun daily series vs Mon-Fri daily series
  2. what Macrobond takes into account when calculating a sample range

To ensure that you are looking at and comparing exactly the same time periods, use one of the methods below:

  • change frequency in the document to Daily (not Daily (highest) or Daily (lowest)) and set Observations to Monday-Friday
  • change the window size in Rolling regression from, for example 'X months', to number of observations in Regression i.e., '402'.

The first method will give closely similar results, but still not 100% the same, due to the way Macrobond sets the start range: '-18m' is not strictly 18 months but 18 months counting from the previous observation, so effectively '18 months + 1'. In that case, you may want to set the window length to a number of observations instead.
The latter approach will output the same values in both Regression and Rolling regression.

Rolling principal components analysis

Overview

The Rolling principal components analysis (Rolling PCA) allows you to calculate a set of linearly uncorrelated series, or components, from a set of possibly correlated series. Rolling PCA enables you to do a time-dependent calculation which uses a moving or expanding window.

For information about the non-rolling version of this analysis, see Principal components analysis.

Settings

General

Do not include series used in calculations in the output

When checked, any series included in the calculation will be excluded from the output. Uncheck this setting if you want both the original series and the calculation result in the output.

Include new series automatically

When checked, any new series added to the Series list will automatically be included in the calculation.

Select method for creating matrix

Use correlation (normalize input)

The eigenvectors will be calculated from the correlation matrix. This means that the input is centered and normalized before the components are calculated. PCA is sensitive to the scale of the input. Therefore, use this setting if variables are of different units, e.g., currencies and indices.

Use covariance

The eigenvectors will be calculated from the covariance matrix. This means that the input is only centered before the components are calculated. Remember that if you choose covariance, the input is not normalized, and the analysis will be sensitive to the scale of the input.

Select window type

Use expanding window

Observations will be added successively to the calculation, one at a time, from the start date to the last observation available.

The calculations will start when there are as many observations as there are components.

Use moving window

The analysis will be performed on a specified window of observations that moves forward one observation at a time. Check this setting if you want to set the length of the moving window.

The window size cannot be smaller than the number of components and the calculations will start when there are enough observations to fill one window.

Output

Output: Eigenvalues/Cumulative proportions

The output is either the eigenvalue of each principal component of each window as we 'roll' over the input series, or the cumulative proportions of the captured variance. The output will thus consist of as many time series as there are input series.
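
As an illustration (not Macrobond's implementation), a moving-window sketch that reuses the pca helper from the Principal components analysis section to produce the cumulative proportions:

import numpy as np

def rolling_explained_variance(X, window):
    out = []
    for start in range(X.shape[0] - window + 1):
        eigvals, _ = pca(X[start:start + window])
        # Cumulative proportion of variance captured per component
        out.append(np.cumsum(eigvals) / np.sum(eigvals))
    return np.array(out)  # one row per window position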

Output series description

Specify the description of the output series or use the default description.

Include

Select what series to include in the calculation.

Example

Expanding window in Rolling PCA

In this example, we use an expanding window to determine how much systemic variance was explained by our principal components before and after the financial crisis. We also compare two principal components from the 'Static' PCA with the components from the Rolling PCA.

Unit Root Test

Overview

The Unit Root Test provides you with a tool to test if a series is non-stationary. More specifically, it performs an Augmented Dickey-Fuller (ADF) test of the null hypothesis that a time series has a unit root, which would violate the underlying assumptions of many statistical models. The analysis produces a report in which you can read and interpret the results of the statistical test.

Estimation model

If a unit root is present in the system, the variance may be infinite overall and the data can be difficult to use in further analysis. The ADF test regression equation has the form:

$\Delta y_t = \alpha y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \gamma c + \delta t + e_t$

α is the coefficient of the endogenous variable y on level form and is used for the actual test. Viewed as an autoregressive process, AR(p), the β-coefficients are for the p lags of y on difference form, Δy. The ADF test (Dickey & Fuller, 1979) introduced lagged differenced variables to whiten the residuals, e, in the estimation. The lag length, p, should be chosen carefully, observing the change in the information criterion, and should apply to the AR model regardless of whether it has a unit root. (Sometimes t-values are used as criteria instead.) If an AR(1) process is assumed, all Δy-terms can be omitted by setting both min and max lag to zero, given that e is still trusted to be approximately white noise. γ is the coefficient for the constant term and δ for the linear trend.

If the series is assumed to be integrated, it can be differenced by checking the diff box. Note that trend and constant terms are not affected by this.

The null hypothesis is that the series has a unit root, so that α is equal to zero, or:

$H_0: \alpha = 0 \qquad H_1: \alpha < 0$

Evaluation is done with t-values from the sample statistics, $t_\alpha = \hat{\alpha}/\hat{\sigma}_\alpha$, after OLS estimation. Because the t-values of the ADF estimator are non-standard, both in sample and asymptotically, special tables and response surface calculations from MacKinnon (1996) are used (with permission) to calculate the p-value. The settings for the deterministic variables are in a drop-down box. The MacKinnon calculations will adjust to this (and to sample size) automatically but cannot account for other deterministic (or exogenous) variables, so Macrobond does not allow them in the ADF test.
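
Outside Macrobond, a comparable test can be run with statsmodels as a point of reference; a sketch with a fixed lag length (y holds the series values; regression='ct' includes intercept and linear trend, 'c' intercept only):

from statsmodels.tsa.stattools import adfuller

stat, pvalue, usedlag, nobs, crit = adfuller(y, maxlag=3, regression='ct', autolag=None)
# A small p-value rejects the null hypothesis of a unit root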

References

Dickey, D. A., and Fuller, W. A. (1979), “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, 74, 427-431

MacKinnon, James G, 1996. "Numerical Distribution Functions for Unit Root and Cointegration Tests," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 11(6), pages 601-618, Nov.-Dec.

(Programs and files: http://qed.econ.queensu.ca/pub/faculty/mackinnon/numdist/)

Settings

Start Lag/End Lag

To perform the analysis, you must specify the number of lags to include in the test. If you set 'Start lag' to 1 and 'End lag' to 3, lags of 1, 2 and 3 will be included in the test. It is also possible to run the test in a classical Dickey-Fuller setup, without any lagged differences, by setting both the min and max lag to zero.

Include

Select if you want to include this series in the analysis.

Diff

By selecting Diff, the first order differences of the series will be calculated. The result will then be converted back to levels. First order differences means that the series is transformed to 'Change in value (one observation)' while expressing the result in levels.

Deterministic variables

Select whether to include deterministic variables in the equation: intercept, or intercept and linear trend.

Report

The Unit Root Test automatically generates a report, which includes a variety of statistical information.

Examples

US GDP

In this example, we used the Unit Root Test on the U.S. GDP, expressed in logarithm. One version of the series has been expressed in differentials, to see how results would differ.

Questions

If I'm getting p < 0.05, does that mean the series has a unit root or not?

A p-value close to 1 indicates that there is likely a unit root. If there is a unit root, the series is not stationary.

A p-value closer to 0 means that we can likely reject the null hypothesis of a unit root, which suggests the series is stationary.

Scalar

Overview

The Scalar analysis is a tool for extracting particular values or metrics and comparing them across series. Use this analysis when you want to create a chart with categories, such as countries, along the x-axis, and columns of values on the y-axis.

The scalar analysis can perform a variety of calculations that result in one value per input series, such as the last value, the mean in a time range, or year to date performance. The output is always a category series, meaning that the time variable is replaced by a categorical variable. You can display this output in a Category chart, Bar chart, or Category scatter chart.

Settings

Input series

This is a list of the series in your document that you can include in the scalar analysis. When a series is 'included', the added calculations will be performed on it, and the resulting values will be included in the output series. You can also select whether new series that you add to your document should automatically be included in the calculations.

The order of the series in this list determines the order of the values in the output series. You can adjust the order by clicking and dragging series, or sort them by clicking 'sort'. Sorting is done by region, followed by maturity length and price type, and, if all else is equal or not available, alphabetically by title.

Automatic attributes for Value labels

By default, Value Labels are generated using the non-common elements of the series descriptions. This works well for series using harmonized descriptions.

In certain cases, you might end up with very long value labels. To avoid this, this setting allows you to pick, from a list of attributes, the ones you want to use to automatically generate the value labels.

Calculations

Here, you can add one or more calculations that will be performed on all selected input series. The available calculations are:

Open, Close, High, Low

The first, highest, lowest, or last value of the specified range.

Mean, Median

The mean or median of the range.

Last

The last valid value of each series.

Last common

The value at the last point in time at which all the included series have values.

Last non-forecast

The last value that is not a forecast in the series.  

Value at

The value at a specific point in time. If a series is missing a value for that date, the first available value before that date will be used.

Nth last value

The nth last value of a series, where 1 gives the last value, 2 the second to last, 3 the third to last, etc.

Year, Quarter, Month, Week to date

The performance from the start of the period to the specified date. The performance is measured as the change compared to the last value of the previous period.*

Performance since

The performance between two specified dates, measured as the change from the value at the start date.*

This performance calculation works a bit differently from the Performance analysis. In Cross sampling and Scalar, the application finds the first non-missing value and uses it as the base value, while the Performance analysis gives an error if the value at the specified start date is missing. You can check the 'Strict' box to require a value at the exact date.

Note that since version 1.29 the 'Strict' option is removed in new documents, as it is always turned on in the calculation.

Years, Quarters, Months, Weeks back

The change from a selected number and type of periods before the specified date.*

For years and quarters, this is the same as using the 'Rate of change since' method and specifying the start of the range as '-1y' or '-1q'.

Rate of change since

The rate of change between two points in time.*

This calculation works a bit differently from the Rate of change analysis. In Cross sampling and Scalar, the application finds the first non-missing value and uses it as the base value. You can check the 'Strict' box to require a value at the exact date.

Note that since version 1.29 the 'Strict' option is removed in new documents, as it is always turned on in the calculation.

Percentage proportion

The percentage proportion of each series compared to the sum of all series at a specified point in time.

Standard deviation

The standard deviation of the range.

Percentile

The specified percentile of the selected range.

Lower, Upper tail mean

The mean of the values in the upper or lower percentile of the range.

Trimmed mean

The mean of the middle values as specified by the percentage.

Standardize

The mean divided by the standard deviation of the range.

Note that the formula function Standardize() will not give the same outcome. The formula standardizes every value of the series, (value − mean)/stddev, while the Scalar calculation produces one standardized value for the whole series (or a specified interval), mean/stddev.
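A small numpy sketch may make the difference concrete (an illustration only, not the application's code):

# The formula Standardize() transforms every value; the Scalar
# calculation returns one number per series.
import numpy as np

series = np.array([2.0, 4.0, 6.0, 8.0])

formula_result = (series - series.mean()) / series.std()   # one value per observation
scalar_result = series.mean() / series.std()               # a single value for the series

print(formula_result)   # [-1.3416 -0.4472  0.4472  1.3416]
print(scalar_result)    # 2.2360...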

Settings for calculation methods

*Relative dates

Most scalar calculations require either a point in time or a time interval to be specified. You can use specific dates, but you may want the dates to update when new data is added. In that case, leaving the date box blank or using relative dates, such as '-1y', can be useful. It’s important to understand what default dates are chosen when none is specified, and how relative dates work in each context.

Point in time

First, we’ll talk about calculations that require only one point in time, such as 'Value at'. If the point in time box is left blank, the last valid value for each series will be used.

If you specify a relative date here, that date will be relative to the last calendar date, not relative to the last date for each series. If you would like the last calendar date to be used, even though not all series may have values, you should use the relative date '+0'.

Time intervals

If you leave the range start blank, the first available value for each series is used. If you leave the range end blank, the last available value for each series is used.

When you use a relative date for the range start and leave the range end blank, the end point will be the last valid value for each series and the starting point for each series will be relative to its last point, not the last calendar date.

If you use relative dates for both the range start and the range end, they will both be relative to the last calendar date.

Value labels

These are the categories of the output series produced. To get an idea of what your chart will look like, check the categories listed as value labels. They are also the labels that will appear on the x-axis of your category chart, or on the right side of your bar chart.

Output series

There are four possible ways of organizing your output. The one you should choose depends on:

  1. Whether you want to group your input series, and
  2. What categories you would like on the x-axis


These four options can be divided into two types based on whether or not you would like to group the input series.

One series per calculation & one series per input

Choose one of these two settings when you do not want to group your input series. The categories on the x-axis, then, are either the series names or scalar calculations.

  • One series per calculation: Use this setting when you want the input series names on the x-axis. It creates one category series per scalar calculation done, where the categories are the input series. Example:


  • One series per input: Use this setting when you would like the x-axis to contain the names of the calculations you’ve done in scalar. It produces one output series per input series, where the categories are the calculations done. Example:

New group after every n series & Partition into n series

Choose one of these settings when you do want to group the input series by some series descriptor, such as country. Switching between the two settings will switch which series descriptor is the category on the x-axis (value label), as illustrated by the example below.

  • New group after every

  • Partition into

The output of this setting also depends on the order of the series in the input series list. Pay attention to the group number that appears next to the series.

Series with the same group number make up the same output series. The application creates the output categories based on the descriptors that are not common within these groups.

Methods

Rates of change as value, percentage, or logarithmic are calculated in the following way:

  • value = y_t - y_{t-n}
  • percentage = 100 (y_t - y_{t-n}) / |y_{t-n}|
  • logarithmic = 100 ln(y_t / y_{t-n})
  • annualRateValue = (c/h) Σ_{i=1}^{h} z_{t+1-i}
  • annualRatePercent = 100 ((z_t / z_{t-h})^{c/h} - 1)

where c is the typical number of observations in one year and h is the length of the range in observations.
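A numpy sketch of these formulas, under the notation above (an illustration; the function names are our own, not the application's):

# y is the series, n the lag, c observations per year, h the range length.
import numpy as np

def roc_value(y, n):    return y[n:] - y[:-n]
def roc_percent(y, n):  return 100 * (y[n:] - y[:-n]) / np.abs(y[:-n])
def roc_log(y, n):      return 100 * np.log(y[n:] / y[:-n])

def annual_rate_percent(y, h, c):
    # compound change over h observations, annualized with c obs/year
    return 100 * ((y[h:] / y[:-h]) ** (c / h) - 1)

y = np.array([100.0, 102.0, 101.0, 105.0, 110.0])
print(roc_percent(y, 1))               # period-on-period % change
print(annual_rate_percent(y, 2, 12))   # 2-month change annualized, monthly data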

Examples

One series per input

In this example, we calculated the average GDP growth per decade by adding one 'Mean' calculation per decade. We also used the setting 'One series per input'. This means that one series will be created for each input series that we use. We have 6 countries, so 6 category series will be created, one per country.

One series per calculation

We used the scalar analysis to produce a single category series for the YTD performance. Here, 'One series per calculation' means that one category series will be created per calculation applied.

Partition into & New group after every

We took the last value of the number of females and males in the labor force. In the first example, the series were sorted by the total number of persons in each country, and the chart displays the female/male division.
The second example shows the total number of females and males in the labor force for all countries, divided by country.

Highlighting series

See how to use a formula to highlight chosen series, or series selected based on a condition.

Questions

How do I sort category series after Scalar?

You should use the Sorting analysis.  

With multiple category series, make sure that after having set the direction for the main series, you also set the direction of the remaining series using 'by [series name]', so that they follow the direction of the main series.

How to show date(s) of observation?

When using the 'Last common' or 'Value at' calculation method, you can display the date of observation by selecting the metadata expression {s .ObservationDate}.

Example:

What is the difference between the Rate of change analysis and selecting ‘Rate of change since’ when doing a Scalar analysis?

  • The Rate of change analysis calculates the changes from the end of each time series, while
  • 'Rate of change since' in the Scalar analysis calculates them from the end of the whole calendar. This means that if some series do not end at the same observation date, the calculation range will differ.

You can set the 'Range Start' and 'Range End' in the Scalar analysis to make sure the calculation is done on the same range across all input series.

Example:

Why can I see the 'Strict' option in one document but not in another?

Since version 1.29, the 'Strict' option is not available in new documents, as it is always turned on in the calculation. A file where you can still see that option was created in an older version of Macrobond.

Vector autoregression

Overview

The Vector autoregression analysis (VAR) estimates the linear dependencies among a few series. The analysis can produce fitted values and forecasts for those series. In addition to estimating a given system, you can also automatically test different models and let the analysis pick the best one based on information criteria. The VAR analysis also allows for modelling of cointegrated variables. By calculating VECM you can estimate the speed at which a dependent variable returns to equilibrium after a change in other variables. Finally, the VAR analysis has a feature for calculating impulse response, the response of one variable to an impulse in another.

Estimation model

The main difference from regression analysis is that in VAR you have several dependent variables instead of one. A VAR can be thought of as a system of linear regressions, but the emphasis is on using lagged values of the dependent variables to model a set of variables. There is an equation for each variable that explains its evolution based on its own lags and the lags of other variables in the model.

The analysis yields a report that contains the estimated parameters of the system as well as several statistics that can be used to test the system's validity and stability. The estimation is made using all common valid observations of the model series in the selected estimation range.

In the analysis, the dependent variables are called endogenous variables. There may also be exogenous variables. Such variables are only explanatory and are not modelled in the system. A model may be denoted as being of order p, called VAR(p), containing K endogenous variables. If there are 2 variables in a VAR(1) model, the system of equations can be written as:

y_t = v + A y_{t-1} + u_t

The expression can be written in expanded form as:

\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} u_{1,t} \\ u_{2,t} \end{pmatrix}

The equations can thus be explicitly written as:

y_{1,t} = v_1 + a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + u_{1,t}
y_{2,t} = v_2 + a_{21} y_{1,t-1} + a_{22} y_{2,t-1} + u_{2,t}

The present value of y depends on the intercept v, the lagged values of itself and the other variable, and the error term u. Each error term is assumed to be uncorrelated with all lags of itself and with all lags of the other error terms.

An arbitrary number of successive forecasts can be calculated, and you must specify an end date for the forecast calculation.

When a system contains exogenous variables, assume that these are included in the vector x together with their lags, possibly including lag 0 (contemporaneous variables), so that x contains s elements. The system of equations for a model called VARX(p, s) can then be written as:

y_t = v + Σ_{i=1}^{p} A_i y_{t-i} + Σ_{j=1}^{s} B_j x_j + u_t

When there are exogenous variables, forecasts can only be calculated as long as there is data available for all the exogenous variables. You might want to add forecasts to these variables before they are passed on to the VAR analysis.

For a symmetric system, where each equation contains the same explanatory variables and lags, OLS (ordinary least squares) is used as the estimation method. For asymmetric systems, GLS (generalized least squares) is used, which requires an iterative procedure. This is more computationally intense, and the system might not converge fast enough to find a solution for large systems.
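For readers who want to experiment outside the application, a symmetric system of this kind can be estimated with Python's statsmodels. The sketch below is an illustration, not the Macrobond implementation, and the simulated data is hypothetical:

# Sketch: fit a symmetric VAR by OLS, with the lag order chosen by AIC,
# then compute dynamic forecasts.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulate a stable two-variable VAR(1) as example data.
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.1], [0.2, 0.4]])
y = np.zeros((200, 2))
for t in range(1, 200):
    y[t] = A @ y[t - 1] + rng.normal(size=2)
data = pd.DataFrame(y, columns=['y1', 'y2'])

results = VAR(data).fit(maxlags=4, ic='aic')   # OLS, equation by equation
print(results.summary())

# Dynamic forecasts 8 steps ahead from the last k_ar observations.
fcst = results.forecast(data.values[-results.k_ar:], steps=8)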

Impulse Response

In order to examine a VAR system, an Impulse Response (IR) can be calculated between two given variables. While an econometrician may assume a system with several variables describes some economic relationship, it can still be interesting to isolate two of them and explore their particular dynamics, in one particular direction.

IR calculates the response of one variable to an impulse in another for some period later in time. IRs have also been called dynamic multipliers, because the simplest way to compute them is to multiply the reduced-form VAR matrix by itself i times for a horizon i. The effects of past values for all coefficients in the system are used in the calculation, but we only look at the accumulated effect that one variable has on another.

A way of understanding this is that by substituting the error vector with a one in the position where we investigate the impulse and zeros everywhere else, an econometrician can trace this unit shock to the given variable at each time period.

IR calculation only makes sense between endogenous variables. The Macrobond application supports unit residual IR and, as described below, Cholesky orthogonalized IR. The Macrobond user can investigate the residuals prior to interpreting the IR results. For simplicity, the IR is represented as a time series, with values starting at the forecast date. The response function has the same unit as the response variable, usually expressed in percentage points. The IR function output by the VAR analysis is in reduced (regular) form.

Note that there is no functionality for calculating error bars for the impulse response.

Cholesky's method for impulse response

A popular approach to IR has been Cholesky orthogonalization, where the model is first transformed by multiplying it with the Cholesky factor of the residual-covariance matrix, so that responses to orthogonal impulses are attained. This is a good idea as far as respecting the idea of isolated effects, but it also requires the residuals to have finite variance (Hannsgen, 2010).

When using the Cholesky method the unit (y-axis) of the impulse response is the same as the unit of the response variable.

The ordering of the variables matters for the Cholesky method. The order is determined by the Series list, not by the order in the VECM settings (where you can drag series up and down).
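Continuing the statsmodels sketch from the Estimation model section, both kinds of responses can be computed as follows (an illustration; statsmodels' orth_irfs corresponds to the Cholesky approach described here):

# Impulse responses over 10 periods.
irf = results.irf(10)
unit_irfs = irf.irfs        # responses to unit shocks in the reduced-form residuals
chol_irfs = irf.orth_irfs   # Cholesky-orthogonalized; variable ordering matters
print(chol_irfs[:, 0, 1])   # response of y1 to an orthogonal impulse in y2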

VECM

The VAR analysis also allows for modelling of cointegrated variables. With this assumption, the variables on differenced form are explained by vectors on level form in addition to the usual VAR form. The rationale for this is that some variables co-move in the long run by the force of some linear process, while having other dynamics in the short run.

This is often illustrated by the 'drunk and his dog'. The drunk walks home with great difficulty and often gets lost, but eventually makes it home. The dog runs around, but only so far away from its owner, and they both eventually make it home. So, in the long run the two are always moving together, despite the fact that the walk is quite random and unrelated in the short run. Economic interpretations are plentiful and allow for many relations at the same time, for example long- and short-run interest rates, consumption and price level, and consumption and investment.

The whole system is specified in Granger representation as:

Δy_t = Π y_{t-1} + Γ_1 Δy_{t-1} + Γ_2 Δy_{t-2} + … + Γ_p Δy_{t-p} + u_t

where Π = αβ^T is the low rank matrix in which the cointegrating relations β are loaded onto each equation by α. The rank r of these matrices is the cointegrating rank of the system. The "VAR part" contains the short-term variables, and the coefficients Γ_i are known as adjustment coefficients. This allows the model to capture some non-linearity that the regular VAR would miss.

The vectors Π y_{t-1} are the error correcting vectors. They are not errors in the sense of residuals, but refer to the long run effects compensating for what is not captured in the short run. If the rank is zero, so that Π = 0, the model is theoretically reduced to a VAR on change form, i.e., all variables differenced once. If Π is of full rank, the system reduces to a VAR on levels form. (Note that neither of these cases tells us anything about the stability of the dynamic system it represents.) Since these are not proper VEC models, the Macrobond application does not allow them and throws an exception if you try to model them (this includes the cases of automatic rank identification). Hence, getting the VEC model right can be cumbersome, and it is not necessarily superior to a regular VAR.

The rank statistics are determined by MacKinnon-Haug-Michelis (MHM, 1999) critical values. Each rank level of the matrix Π is tested. There are two options:

  1. The trace test at level r tests the null hypothesis H_0: rank = r against H_1: rank = k
  2. The maximum eigenvalue test at level r, on the other hand, tests the null hypothesis H_0: rank = r against H_1: rank = r + 1

With the Macrobond application, two approaches to VECM can be used: Johansen and Ahn-Reinsel-Saikkonen. Johansen's (1986) approach is solved in a maximum likelihood scheme. It starts by identifying the cointegrating vectors, which are then subtracted from the original dependent variables. This reduces the system to a regular VAR, which is solved using OLS. It is the best known and perhaps the most widely used VEC model. Since the rank reduction of the matrix Π is done by means of eigendecomposition, this model may be viewed as employing noise reduction to clean up Π.

Ahn-Reinsel-Saikkonen is based on least squares regression. It starts by estimating the whole system (the Granger representation above) so that Π is of full rank. Afterwards the proper rank reduction is made, decomposing Π to a lower rank, but without changing the short run coefficients Γ_i. It is suggested in Brüggemann & Lütkepohl (2004) that this approach is more robust in small samples than Johansen's.
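As an outside illustration, statsmodels provides a Johansen-type maximum likelihood VECM and a trace test for the rank. The sketch below is not the Macrobond implementation, and the simulated series are hypothetical:

# Sketch: cointegration rank test plus a Johansen-type ML VECM.
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

# Two cointegrated random walks: y2 follows y1 plus stationary noise.
rng = np.random.default_rng(2)
y1 = np.cumsum(rng.normal(size=300))
y2 = y1 + rng.normal(size=300)
levels = pd.DataFrame({'y1': y1, 'y2': y2})

rank_test = select_coint_rank(levels, det_order=0, k_ar_diff=1, method='trace')
print(rank_test.rank)   # estimated cointegrating rank (here: 1)

res = VECM(levels, k_ar_diff=1, coint_rank=1,
           deterministic='co').fit()   # 'co': intercept outside the CE
print(res.alpha)   # loading matrix α
print(res.beta)    # cointegrating vectors β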

Report

The VAR analysis automatically generates a report, which includes a variety of statistical information.

Settings

Estimation sample range

Specify the limits of the estimation sample range. The default range will be the largest range where there is data for all the series.

Output residuals

When this option is selected a time series containing the residuals will be calculated.

Output the endogenous series

Select this option to include the endogenous series in the output.

Output the exogenous series

Select this option to include the exogenous series in the output.

Calculate impulse response

Select this option to calculate the impulse response of the specified length. Select in which equation the impulse should be applied and for which variable.

Method

Unit - unit residual impulse response

Cholesky - Cholesky’s method for impulse response

Confidence band

Confidence bands for forecasts of each equation are computed using the VAR estimator covariance matrix. Since the VAR is ideally a stable linear dynamic system, the forecasted values are dynamically generated. This means that they converge toward some mean (zero if normalized). Therefore, the error bands must also converge to constant upper and lower bounds, respectively. Because not much is known about the small sample properties of the Feasible Generalized Least Squares (FGLS) estimator used in the VAR, only asymptotic errors are computed. This makes the estimated error terms less reliable in estimations from short time series. It can be shown that the estimator variance of the FGLS is lower than, or at least equal to, that of standard OLS.

Autocorrelation test lags

Select this option in order to include a Portmanteau autocorrelation test in the report. Specify the number of lags to include. The number of lags should be larger than the highest number of lags of the endogenous or exogenous variables.

Max endogenous lags

Specify the maximum number of lags to include for the endogenous variables. You can further refine which lags to include in the model on the 'Lag settings for endogenous variables in the equations' tab.

Max exogenous lags

Specify the maximum number of lags to include for the exogenous variables. You can further refine which lags to include in the model on the 'Lag settings for exogenous variables in the equations' tab.

Find best model based on max endogenous lags for information criteria

Select this option to let the system automatically test which combination of symmetric lags is optimal based on the selected information criteria.

You can select the minimum and maximum number of lags of the endogenous variables to test and also the minimum and maximum of different lags (regressors) to include in each round of tests.

Select the setting 'Require stable process' in order to disqualify any model where the roots of the characteristic equation indicate that the model is not stable.

Type

Select if a series should be included in the model as an endogenous or an exogenous variable.

Diff

By selecting Diff, the first order differences of the series will be calculated. The result will then be converted back to levels. First order differencing means that the series is transformed to the change in value from one observation to the next, while the result is expressed in levels.

Intercept

Select if the intercept should be included in the model for endogenous variables. This option is not available per variable for VECM.

Restrict to CE

In VECM, both the trend and the intercept can be restricted to the cointegrating relations. This means that they are treated as deterministic variables within Π, on level form. They occur either on level or on change form, never both. The intercept and trend variables are added in the VECM configuration box.

Equation name

Optionally specify the name of the equation to be used in the report.

Variable name

Optionally specify the name of the variable to be used in the report.

VECM

Enables VECM.

Configuration

  • Select method, Johansen or Ahn-Reinsel-Saikkonen
  • Include intercept adds an intercept variable
  • Include linear trend adds a trend variable

Automatic cointegration test

Select whether to automatically find best cointegration rank or to enter it manually. The settings for automatic rank selection are described in the section Estimation model.

Examples

Vector autoregression

A model of five endogenous variables is defined in the Vector autoregression analysis. It is set to calculate a forecast 1 month ahead. The model uses three lags for each variable, which is called a VAR(3) model.

VECM European rates

The endogenous series are swap rates with maturities of 10 and 5 years for Sweden, Denmark, and the Euro area. The short run dynamics of these data are known to be dominated by simultaneous, highly correlated shifts of all rates. The high correlation of short-term movements is explained by a stable relation between the levels of the rates: the slope of the yield curves.

IRF example

Questions

How to add and use dummy variable in VAR model?

Create a binary (0/1) series using conditions in the formula language, or use an in-house series. The series should be added as an exogenous variable.

Examples of dummy variables: 

quarter()=1 & year()=2020|quarter()=3 & year()=2020

Returns 1 if the observation is in Q1 or Q3 of 2020, 0 otherwise. Each quarter condition must include '& year()=2020', otherwise it will match that quarter in every year.

Counter()>=Date(2020, 1, 1)&Counter()<=Date(2022, 1, 1)

Returns 1 if the observation is between those dates, 0 otherwise.

Cop(usgdp, yearlength())<0

Returns 1 if the US GDP y/y growth rate is negative, 0 otherwise.
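For comparison, the same kind of date-range dummy can be built outside the application, for example with Python and pandas (an illustration; the date index is hypothetical):

# Sketch: a 0/1 dummy that is 1 between two dates, like the
# Counter()-based formula above.
import pandas as pd

idx = pd.date_range('2019-01-01', '2023-01-01', freq='MS')
dummy = pd.Series(((idx >= '2020-01-01') & (idx <= '2022-01-01')).astype(int),
                  index=idx)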

Forecast

Overview

The Forecast analysis enables you to add your own values to a series. You can both add future values and overwrite historical ones. Forecasts can also be added in the Series list, but then the forecasts are applied to the raw data prior to any analyses. With the Forecast analysis, values can be added as an intermediate step in the overall analysis.

Settings

Edit

When you press 'Edit', you will see a dialog where you can specify forecast values and how they should be added. 'Value method' lets you specify if the forecast should be added as an absolute value or be dependent on some other value, either as a difference or as a percentage change from that value.

'Date method' gives you the option of defining forecasts both at a specific point in time and at a specific time horizon forward from the last observation or forecast.

Value preference

If a row in the edit dialog contains both an observation and a forecast, this option lets you choose which value to include in the output.

Missing value method

Here, you specify how the calculation treats missing values. This is where you choose how to smooth the line when using a 'Date method' with a period longer than the document frequency (e.g., 'Relative quarterly' in a monthly document).

Adding forecast

You can add forecast values to a raw time series in the Series list, or after having applied some calculations to it.

Add a forecast at the beginning of the document

To add a forecast to the raw time series, before any calculations are made, click on Series list in the analysis tree. Open the tab called Forecasts and click Edit next to the series you'd like to add a forecast to.

Please note that these forecasts will be added in the original frequency of the time series, even if the document uses another frequency. Under 'Value preference' you can select which value (original or forecast) should appear when new original values come in.

Add a forecast after some calculations

To add the forecast after having applied some calculations to it, add the forecast as an analysis later in the analysis tree.

Please note that these forecasts will be added in the original frequency of the time series, even if the document uses another frequency. Under 'Value preference' you can select which value (original or forecast) should appear when new original values come in.

How to add forecast values?

  1. Select and add the series you want to work with, to the Series list
  2. In the Series list select the Forecasts tab or add Forecast analysis
  3. Click on the edit button to open the Edit forecast dialog box
  4. Select a Value method
    • absolute values
    • absolute differences relative to other values
    • percent changes relative to other values
  5. Select a Date method
    • points in time
    • number of years since the previous forecast or value
  6. Select the scale to make it easier to enter forecasts for series expressed in millions or billions.
  7. Add new values to the relevant columns shown.
  8. All added values will be shown as forecasts in the chart, using 50% opacity.

Keep in mind

  • Forecasts are added using the original frequency of the time series
  • Changing the frequency of a document will result in a frequency conversion of all series values, including forecasts
  • Define how empty values should be treated, under Missing Value Representation in the conversion settings tab

  • If you want to apply forecasts before performing any calculation on the series, go to the Forecasts tab in the Series list.

Example

Projected GDP of France

In this example, we added forecasts to the quarterly changes in the GDP of France.

Questions

How to extend series with forecast?

You can extend a series using the mechanism inside the Forecast analysis. You can do this by:

  • points in time
  • relatively

See more information under In-app features - Forecast tab.

How to remove a forecast?

Depending on how you want to proceed, there are two solutions. You can:

  • remove the forecast values entirely from a series
  • keep the values and remove the forecast indicator for them

To remove forecast values entirely, use the following expression in Series list. It replaces forecast values with null and returns only the non-forecast values of the series.

if(isForecast(series), Null(), series)

To remove the forecast indicator from a series, use the following expression in Series list. It returns the series values as non-forecast values when the condition (the second parameter) is false.

FlagForecast(series, 0)

Forecast values are usually displayed in a different color on charts, so removing the forecast indicator will make the entire series be graphed in the same color.

How to disable the special presentation of forecast values?

Forecast values are usually presented with a distinct style on charts. If you do not want to make a visual distinction of forecasted values, there are two main ways you can do this.

  • Remove the 'forecast flag' from all values in the series (see the question 'How to remove a forecast?' above for more information)
  • Change the graph presentation so that forecasted values look the same as those that are not forecasts

To do this select the graph in the chart you'd like to change. Under Presentation properties, change the graph style from Automatic to Custom, and click on Forecast. Here, you can change the forecast color to match that of non-forecast values.

How to hide forecast line in legend?

If you want to have one line for a series instead of default two, click on a legend and under Presentation properties > Appearance check 'Hide forecast'.

How to add forecast before the start of time series?

It's not possible. However, you can append values from another series with one of the join() formulas. For more information, see Joining with formula.

How to smooth line created with Forecast?

If you, for example, have a document in daily frequency and used 'Relative monthly', you will get 'ugly' steps on your chart. You can smooth the line by choosing a setting under 'Missing value method':