Regression

Overview

The Regression analysis implements a multiple linear regression model. The analysis aims to model the relationship between a dependent series and one or more explanatory series. Several models can be specified within one instance of the analysis. The output consists of the coefficients of the linear model, the predicted series, and several statistical indicators. If there is sufficient data after the end of the estimation sample range, forecasts can be calculated.

Estimation model

The Regression analysis can only calculate static linear regression models. The analysis attempts to find the linear combination of a number of explanatory series that best describes a dependent series.


The analysis uses the following Ordinary Least Squares (OLS) model:

yₜ = α + β₁xₜ₁ + β₂xₜ₂ + β₃xₜ₃ + ϵₜ

y - dependent series
x_i - explanatory series
α - intercept
β_i - slope (coefficient, beta)
ϵ - error term (residual)

If the option 'No intercept' is selected in the analysis, the constant α is not included in the model.

The parameters α and β are estimated by minimizing the sum of the squared residuals. The output of the analysis includes the predicted series calculated using the estimated parameters.
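
For a single explanatory series, the minimization has a closed-form solution. This plain-Python sketch (not Macrobond formula syntax; the data is made up for illustration) shows how the intercept and slope fall out of the means, variance, and covariance:

```python
# Plain-Python OLS sketch (not Macrobond syntax): estimate the intercept (alpha)
# and slope (beta) that minimize the sum of squared residuals for one
# explanatory series.
def ols(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # beta = cov(x, y) / var(x); alpha places the line through the means
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return alpha, beta

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]            # exactly y = 1 + 2x, so the fit is perfect
alpha, beta = ols(x, y)
print(alpha, beta)          # → 1.0 2.0
```

With more explanatory series the same idea generalizes to solving a linear equation system, which the analysis does for you.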

The automatic estimation sample range will be the largest range where there is data for all series. You can specify a smaller range if you want to limit the data used in the estimation.

Working with Regression analysis

Settings

Regression models

You can define one or more regression models. Each model has separate settings. When a new model is created, the settings of the current model are duplicated. Models can be renamed and deleted.

Output dependent series

Select this option to include the dependent series in the output.

Output explanatory series

Select this option to include the explanatory series in the output.

Estimation sample range

Specify the limits of the estimation sample range. The default range will be the largest range where there is data for all the series.

No intercept

When this option is selected, the constant α is omitted and the model is defined as:

yₜ = β₁xₜ₁ + β₂xₜ₂ + β₃xₜ₃ + ϵₜ

Residuals

When this option is selected a series containing the residuals will be included in the output.

Residuals for forecasts

If this option is selected, the series of residuals will also contain residuals for the forecasted values. Such residuals can only be calculated when forecasts are calculated and there is an overlap between the forecasts and the dependent series.

Uncertainty band

By selecting the option Uncertainty band, two additional time series will be calculated. These time series form a band around the predicted values by adding and subtracting a number of standard deviations. The standard deviations are based on the Standard error of regression, which is calculated as the square root of the sum of squared errors divided by the degrees of freedom, as described in the section Report.

Calculate forecasts

Forecasts will be calculated only if this option is selected and there is sufficient data, as explained in the section Forecast.

End point

You can limit how far into the future forecasts will be calculated. If not specified, forecasts are calculated as far as possible. In the special case when dynamic forecast is enabled and the model contains only lagged versions of the dependent variable, a limit must be specified.

Allow dynamic forecast

Allow the use of predicted values of the dependent series when calculating forecasts.

Confidence band

By selecting the option Confidence band, two additional series will be calculated. These time series form a confidence band around the forecasted values. The band is calculated so that the forecast is within the band with the specified probability assuming that the forecast values are t-distributed.

Series settings

Include

Select if you want to include this series in the model.

Is dependent

Select which series is the dependent series. This must be specified.

Diff

By selecting Diff, the first order differences of the series will be calculated and the result will then be converted back to levels. First order differences means that the series is transformed to 'Change over value (one observation)' while the result is expressed in levels. If you tick this option, the output will show coefficients for intercept and diff(x1) rather than intercept and x1.

Diff->legacy

Calculate the predicted series by adding diffs to the dependent series (this was the default in Macrobond 1.26 and earlier).

This setting does not affect the model itself. It only influences the step after the calculation of the model when the levels are calculated from the differences.

Diff->agg

Calculate the predicted series by aggregating the predicted differentials.

Lag to/from and Lag range

Here you specify the lags you would like to include for a specific series. When lagging a series, the values are delayed in time and the series stretches further into the future.

If you, for example, set 'Lag from' to 0 and 'Lag to' to 2, three series will be included: one with no lag, one with a lag of 1, and one with a lag of 2. This will automatically change the lag range to '0 to 2'. You may specify the desired lags using either 'Lag to/from' or 'Lag range'; the result will be the same. If you set 'Lag range' to a single value, or set 'Lag from' and 'Lag to' to the same value, a single lagged series will be included.

When lags are specified for the dependent series, the lagged series will be used as explanatory series in the model. The dependent series will always be without lag.
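
The effect of lagging can be sketched in plain Python (not Macrobond syntax; the helper and data are illustrative): a lag of k delays the values by k observations, and the 'Lag from'/'Lag to' pair expands into one series per lag.

```python
# Plain-Python lag sketch (not Macrobond syntax): a lag of k delays the values
# by k observations, so the first k points of the lagged series have no value.
def lag(series, k):
    if k == 0:
        return list(series)
    return [None] * k + series[:-k]

x = [10, 20, 30, 40]
# 'Lag from' 0 and 'Lag to' 2 expand into three series, one per lag
lags = {k: lag(x, k) for k in range(0, 3)}
print(lags[1])              # → [None, 10, 20, 30]
print(lags[2])              # → [None, None, 10, 20]
```

In Macrobond the delayed values also stretch beyond the end of the original series, which is what makes forecasting with lagged explanatory series possible.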

How to create a simple regression model?

  1. Check the boxes for 'Output dependent series' and 'Output explanatory series'.
  2. Check 'Include' for at least two series and mark one as 'Is dependent'.
  3. Add Scatter chart. Go there and open Graph layout (Ctrl+L).
  4. Pair series to generate a regression line. Make sure to set the right order of the series in the Graph layout window (note that when a series is lagged, the outcome will not be a straight line):

    a) Pair #1: the explanatory and the dependent series

    b) Pair #2: the explanatory and the predicted series

    Line A Line B
    series 1 series 1
    series 2  series 2 [predicted]
  5. Click on one of the lines, go to Presentation properties > Appearance, change Graph style to Custom. Then set Line to None and select Marker style.

How to add a best fit line through different series' last values?

Use a formula to output only the last value of each series. Lag each value and combine them with Cross section to create an artificial time series that can be fed into the Regression analysis. See the file with an explanation under Best fit line for last values of group of series with and WITHOUT Regression analysis.

Common errors

Too few time series in graph

  1. Check if you have added Category scatter chart - use Scatter chart instead.
  2. In the left panel you should see at least 3 series in the output. Check that you have enabled 'Output dependent series' and 'Output explanatory series' at the top of the Regression analysis.
  3. Check if you have two pairs of series in Graph layout. If not, see how to pair them under How to create simple regression model?.

Degrees of freedom is too low

You cannot fit the regression coefficients if there are no degrees of freedom. The degrees of freedom is the number of observations minus the number of parameters being estimated. The number of estimated parameters includes the intercept.
The number of observations must thus be larger than the number of independent (explanatory) series.

This can be caused, for example, by changing the document's frequency to a lower one (e.g., from Monthly to Annual), or by series that do not have enough overlapping observations.

Forecast

How does it work?

When there is data for all the explanatory series beyond the estimation sample, the estimated parameters can be used to calculate forecasts. This is done by checking the option 'Calculate forecasts'. If no end point is specified, the analysis will calculate as many forecasted values as possible. You can specify an end point if you want to limit the length of the forecast. The end point means 'up to, but not including'.

Forecasts cannot extend beyond the common range of the explanatory series.

See 'Regression - how forecast is calculated' file under: Examples

Dynamic forecast


Dynamic forecasting uses the data generated by the model as input to the model to calculate additional forecasts. To enable it, check the 'Allow dynamic forecast' box in the Forecast panel of the Regression analysis.

In the example above, we included an explanatory variable that is a lag of the dependent series. It is the lagged series that limits how far we can calculate the forecast. Dynamic forecasting allows us to use the predicted data as input to extend the forecast further. The analysis will only attempt to do this if you select the 'Allow dynamic forecast' option.

There is one special case to be aware of. If all the explanatory series are lagged versions of the dependent series, the dynamic forecast could continue indefinitely. In this case you must specify an end point for the forecast, since there is no way for the application to know when to stop.
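
The feedback loop can be sketched in plain Python (illustrative only; the coefficients are assumed, not taken from any real model): each forecasted value becomes the lagged input for the next step.

```python
# Plain-Python dynamic forecast sketch (illustrative; the coefficients are
# assumed, not from any real model). With y_t = alpha + beta * y_{t-1}, each
# forecast becomes the lagged input for the next step.
def dynamic_forecast(history, alpha, beta, steps):
    values = list(history)
    for _ in range(steps):
        values.append(alpha + beta * values[-1])  # feed the last (possibly predicted) value back in
    return values[len(history):]

print(dynamic_forecast([4.0], alpha=1.0, beta=0.5, steps=3))  # → [3.0, 2.5, 2.25]
```

Because nothing in the data stops this recursion, an end point must be given when all explanatory series are lags of the dependent series.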

Report

Overview

The Regression analysis automatically generates a report, which includes a variety of statistical information.

Calculation range

The calculation range used for the analysis.

Observations

The number of observations used in the analysis. This includes all observations in the calculation range where there are values for all series.

Degrees of freedom

The number of observations minus the number of explanatory series and minus one for the constant parameter.

R2

Compares the variance of the estimation with the total variance. The better the result fits the data compared to a simple average, the closer this value is to 1.

In the ordinary case, when an intercept term is included, this value is calculated as the square of the correlation between the dependent series and the estimate. In this case R2 will always be between 0 and 1.

If the option "No intercept" has been selected, R2 is calculated in a different way since the dependent and estimated series can now have different mean values.

Please note that R2 for models with an intercept term cannot be compared with models without an intercept. Typically, R2 for models without an intercept will be higher than for the corresponding model with an intercept, but that does not mean the fit is better.
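
The relationship between R2 and correlation can be checked with a plain-Python sketch (illustrative data, not Macrobond syntax):

```python
# Plain-Python sketch with illustrative data (not Macrobond syntax): with an
# intercept in the model, R2 equals the squared correlation between the
# dependent series and the predicted series.
def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

y         = [3.0, 5.0, 6.0, 10.0]   # dependent series (made up)
predicted = [3.5, 4.5, 6.5, 9.5]    # fitted values (made up)
r2 = corr(y, predicted) ** 2
print(round(r2, 3))
```

A value close to 1 means the predicted series tracks the dependent series closely.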

Adjusted R2

To overcome the issue that R2 always increases when you add more variables to your model, you can look at the adjusted R2, which is calculated in the following way:

1-(1-R2)*(n-1)/(df-1)

Where n is the length of the series and df is the degrees of freedom.

F

The F-ratio is the ratio of the explained variability to the unexplained variability, each divided by the corresponding degrees of freedom. In general, a larger F indicates a more useful model.

P-value (F)

The p-value is the probability of obtaining a value of F that is at least as extreme as the one that was actually observed if the true values of all the coefficients are zero.

Sum of squared errors

The sum of the squared residuals.

Standard error of regression

The square root of the sum of squared errors divided by the degrees of freedom. This is an estimate of the standard deviation of residuals.

Standard error of forecasts

The square root of the sum of squared forecast residuals divided by the number of residuals.

Durbin-Watson

The Durbin-Watson statistic is used to test for the presence of autocorrelation in the residuals. The value is in the range 0-4. A value close to 2 means that there is little autocorrelation. Values below 2 point to positive autocorrelation and values above 2 to negative autocorrelation. The result of this test is not useful if lagged versions of the dependent series are included among the explanatory series, or if no intercept is included in the model.

For more information about this see Investopedia.
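
As a sketch of the arithmetic (plain Python, illustrative residuals): the statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals.

```python
# Plain-Python sketch (illustrative residuals): the Durbin-Watson statistic is
# the sum of squared successive differences of the residuals divided by the sum
# of squared residuals.
def durbin_watson(residuals):
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))  # alternating signs (negative autocorrelation) → 3.0
print(durbin_watson([1, 1, -1, -1]))  # persistent signs (positive autocorrelation) → 1.0
```

Residuals that flip sign every observation push the statistic towards 4, while residuals that drift slowly push it towards 0.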

Information criteria

The information criteria are measures of the expected information loss. A lower value means that more information is captured. This can be used to compare models when the same data is used in the models.

AIC

Akaike's information criterion.

HQ

Hannan and Quinn's information criterion.

Schwarz

The Schwarz criterion, also known as the Bayesian information criterion.

Coefficient

The estimated parameters.

Standard error

The standard error of the estimated parameters.

t

The estimated coefficient divided by the standard error.

P-value

The p-value is the probability of obtaining a value of t that is at least as extreme as the one that was actually observed if the true value of the coefficient is zero.

How to output coefficients?

The analysis does not output the model coefficients as series. In the main application they are only available in the 'Regression report'.

You can access them through the Excel add-in. To do this:

  1. Right-click on the Series list and select 'Copy your series as Excel data set'. Paste it into Excel.
  2. Right-click on the red object and go to Edit. In the new window, under 'Select output', choose 'Regression' and, in the next field, 'Category series'.

You can also calculate them separately in the main application - see the section Calculating regression with formulas below.

Errors

Sometimes the model cannot be calculated; instead, you will see an error message indicating what is preventing the calculation.

Calculation range end date is before start date

One (or more) of the series is too short. Check them in a Time table placed before the Regression analysis and exclude the short series from the calculation.

There is a linear dependency between the independent series

This means that the equation system cannot be solved: arbitrary values can be assigned to one or more of the coefficients, so the residuals cannot be calculated.

This usually occurs when series are converted from a lower frequency (e.g., Annual) to a higher one (e.g., Monthly), so the series have repeated values for entire periods. As a solution, go to the Conversion settings tab > 'To higher...' and select 'Cubic interpolation'. Some further adjustments might be needed (e.g., removing lags for some series), depending on the composition of your model.

Degrees of freedom is too low

You cannot fit the regression coefficients if there are no degrees of freedom. The degrees of freedom is the number of observations minus the number of parameters being estimated. The number of estimated parameters includes the intercept.
The number of observations must thus be larger than the number of independent (explanatory) series.

This can be caused, for example, by changing the document's frequency to a lower one (e.g., from Monthly to Annual), or by series that do not have enough overlapping observations.

Calculating regression with formulas 

To calculate α, β and R2 use:

Intercept(series1, series2)
Slope(series1, series2)
Pow(Correlation(series1, series2), 2)

where series1 is the dependent series and series2 is the explanatory series. If you get different values than from the analysis, check the 'Estimation sample range' - the formulas have to be calculated on an identical time range. To avoid adding a Cut() formula everywhere, you can set the data range on the Series list.

The formulas above calculate the regression between two series; if the Regression analysis contains more series, the models will not be comparable and you will get different results.

Examples

Regression model - multiple series

In the Regression analysis, we first defined the variables of the model by:

  • Marking Industrial Production as the dependent variable
  • Specifying the lags for the explanatory series (these numbers are based on the Correlation analysis)
  • Defining the output we want to have in the chart: dependent series & residuals

We also decided to calculate forecasts. This is possible because all explanatory variables have been lagged, meaning we can calculate forecasts for the shortest lag defined, here 2 months.

Regression S&P and VIX scatter chart

In the Regression analysis, we checked as output both the dependent and explanatory variables. Both series, as well as the predicted series, are needed in the Scatter chart to show the one-week change in both indices.

Regression - how forecast is calculated

An example showing how forecasts with lagged series work.

Phillips curve

Estimation of the Phillips curve based on the observations, with a fit line created through Regression.

Best fit line for last values of group of series with and WITHOUT Regression analysis

See how to prepare series to show best fit line for last values of group of series in Regression analysis and also how to do this without Regression at all - only through calculations.

Questions

Can I add non-linear regression?

Generally no, but in Examples you can find a Phillips curve built with Regression.

Where does the residual come from?

The residual is the difference between the predicted series and the dependent series.

How to add dummy variable?

You can create binary series (0/1 series) using conditions in the formula language. These formulas need to be applied before the Regression analysis. In the Regression analysis, add such series as explanatory variables.

For example: 

quarter()=1|quarter()=3

Returns 1 if the observation is Q1 or Q3, 0 otherwise.

quarter()=1 & year()=2020|quarter()=3 & year()=2020

Returns 1 if the observation is Q1 or Q3 of 2020, 0 otherwise. Each quarter condition must include '& year()=2020'; otherwise it will match those quarters in every year.

DayOfWeek()=5

Returns 1 if the observation is a Friday, 0 otherwise.

Counter()=EndOfYear()

Returns 1 if the observation is the last one in a year, 0 otherwise.

Counter()=Date(2020, 4, 1)

Returns 1 if the observation is 1 April 2020, 0 otherwise.

Cop(usgdp, yearlength())<0

Returns 1 if the US GDP y/y growth rate is negative, 0 otherwise.
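
The same kind of 0/1 series can be sketched in plain Python (illustrative; the quarter numbers are made up, not real data):

```python
# Plain-Python sketch of a dummy series (illustrative quarter numbers): a 0/1
# series built from a condition, like quarter()=1|quarter()=3 above.
quarters = [1, 2, 3, 4, 1, 2]                       # quarter of each observation
dummy = [1 if q in (1, 3) else 0 for q in quarters]
print(dummy)  # → [1, 0, 1, 0, 1, 0]
```

In the regression, such a dummy shifts the intercept for the observations where the condition holds.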

For more information see: Built-in formula functions

How to do a logarithmic regression?

You can calculate it with formulas by expressing the explanatory variable as Log(series).

For more information see: Built-in formula functions

How to do a regression against time (on one series)?

To perform a linear regression on one time series (where the independent variable is time), add Counter() on the Series list and use it as the explanatory series in the Regression analysis.

For more information see: Built-in formula functions

Why are the model's values different from those of the same model in Rolling regression?

The difference stems from different time ranges being used in the calculation, because of:

  1. Mon-Sun daily series vs Mon-Fri daily series
  2. what Macrobond takes into account when calculating a sample range

To ensure that you are looking at and comparing exactly the same time periods, use one of the methods below:

  • change the frequency in the document to Daily (not Daily (highest) or Daily (lowest)) and set Observations to Monday-Friday
  • change the window size in Rolling regression from, for example, 'X months' to the number of observations in Regression, e.g., '402'.

The first method will give closely similar results, but still not 100% the same, due to the way Macrobond sets the start of the range: '-18m' is not strictly 18 months, but 18 months counted from the previous observation, so effectively '18 months + 1'. In that case, you may want to set the window length to a number of observations instead.
The latter approach will output the same values in both Regression and Rolling regression.

All about extending series

Built-in formula

Extend()

Extend formulas - see comparison here.

Extend with last value to the end of a period

The following formulas, which you can type into the Series list, will extend the input series to the end of the corresponding period:

ExtendToEndOfMonth(series)

ExtendToEndOfQuarter(series)

ExtendToEndOfYear(series)

If you want to mark the extended values as forecasts, use:

ExtendLastAsForecast(series)

Note that the series will be extended to the end of the common calendar (established at the step before this formula is applied), which depends on the series included in the document.

Extend with another value to a specified point in time

Using the formula below, you can customize what value is used and until when the series is extended.

Extend(series, observation, number)

It can be read as Extend(which series, how far, which value), and to work effectively you will need helper formulas like Last() or EndValid(), for example:

Extend(uscpi, Last(EndOfYearAhead(0)), LastValid(uscpi))

will extend the uscpi series to the end of the current year with its last value, while:

Extend(usrate0001, Endvalid(usrate0001)+Monthslength(2), 0.75)

will extend the usrate0001 series and put the value 0.75 two months after its end. Observations between the end of the series and the new point will be filled automatically with the series' last value. See the file at the start of this chapter for a solution to this (it uses another formula).

Extend with YoY percentage from same series

The formula below extends the series using the YoY % change to calculate a forecast for the whole calendar range. The new values will automatically be marked as forecasts.

ExtendLastYoYForecast(series)

This variant has an additional parameter that specifies the end of the extension range.

ExtendLastYoYForecast(series, observation)

Note that the series will be extended to the end of the common calendar (established at the step before this formula is applied), which depends on the series included in the document. Here, however, you can use the same EndValid() method as in the previous example:

ExtendLastYoYForecast(series, endvalid(series)+Yearslength(6))

ExtendLinear() vs LinearExtended()

ExtendLinear vs LinearExtended - see comparison here.

Both formulas, ExtendLinear() and LinearExtended(), are used to extend a series. With similar names and purposes they are easily confused, but they extend the series in different ways.

ExtendLinear uses linear interpolation to extend the series between the latest value and a manually specified value in the future, thus extending the series with a trend.

LinearExtended uses the Least-Square method to create a fitted line, which in turn is used to extend the series to a specified point in the future.

Extend trend from part of series

Extend trend from part of series.

Use Cut() and LinearExtended() to calculate trend from sample of the series.

Extend with value x observations from end of series

Extend with value x observations from end of series.

Extend() will put the selected value only at the end of the extension. Here you can find out how to select one of the values from inside the series and repeat it after the end of the original series.

Extend with x last values and roll them

Extend with x last values and roll them.

Use Extend() to create a last valid point in the future, then extend the series with its last few values and roll them until that point.

Join()

Join formulas - see comparison here.

Joining two series

You can also extend one series with another. For that we have:

Join(series1, series2)

It can be read as Join(older series, newer series). After the last value of series1, Macrobond will add the values of series2 from the corresponding points in time. This can be modified with a third parameter that specifies exactly where to connect the series:

Join(series1, series2, Start(series2))

Join(series1, series2, Date(yyyy, mm, dd))

Joining two series, one with more history

There is also a special join formula for appending historical values before the start of series1:

JoinMoreHistory(series1, series2)

Unlike the standard Join formula, this one reads as Join(newer series, older series). It takes all observations of series1 and adds all values of series2 that are before the start of series1.
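
A plain-Python sketch of the JoinMoreHistory() idea (illustrative only; both series are assumed to be aligned on the same calendar, which glosses over Macrobond's calendar handling):

```python
# Plain-Python sketch of the JoinMoreHistory() idea (illustrative only; both
# series are assumed to be aligned on the same calendar).
def join_more_history(newer, older, newer_start):
    # keep all of the newer series; prepend only the older values that fall
    # before its first observation
    return older[:newer_start] + newer

older = [1, 2, 3, 4, 5]
newer = [30, 40, 50]      # starts at calendar position 2
print(join_more_history(newer, older, 2))  # → [1, 2, 30, 40, 50]
```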

Joining two series with different scales

For these formulas, a factor is applied to scale one of the series:

JoinScaled(series1, series2) 

JoinScaledAppend(series1, series2)

JoinMoreHistoryScaled(series1, series2)

JoinScaled() works like Join(). It will join series2 at the end of series1. At the date of the junction, series1 (the older series) is scaled to match the value of series2 (the newer series). The rest of the values of series1 are then adjusted using the factor from the date of the junction.

JoinScaledAppend() is similar to Join() and JoinScaled(). It will connect the series at the end of series1, but it will scale series2 (the newer series) to match series1 (the older series).

JoinMoreHistoryScaled() works like JoinMoreHistory(). It will add more history to series1 (newer series) using the values of series2 (older series) that are before the start of series1. At the date of the junction, series2 (older series) is scaled to be equal to the value of series1 (newer series). The rest of the values of series2 are then adjusted using the factor used at the date of the junction.
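
A simplified plain-Python sketch of the JoinScaled() idea (illustrative only; it assumes the two series overlap at exactly one observation and ignores Macrobond's calendar handling):

```python
# Simplified plain-Python sketch of the JoinScaled() idea (illustrative only;
# it assumes the two series overlap at exactly one observation and ignores
# Macrobond's calendar handling).
def join_scaled(older, newer):
    factor = newer[0] / older[-1]            # scale factor at the junction
    scaled = [v * factor for v in older]     # rescale the whole older series
    return scaled + newer[1:]

older = [50.0, 60.0, 80.0]      # old index on a different level
newer = [160.0, 170.0, 175.0]   # newer series; 160.0 is the junction value
print(join_scaled(older, newer))  # → [100.0, 120.0, 160.0, 170.0, 175.0]
```

The same factor idea runs the other way in JoinScaledAppend() (scaling the newer series) and JoinMoreHistoryScaled() (scaling the older history).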

Joining more than two series

A Join formula can connect only two series. If you need to connect more, nest one Join formula inside another:

Join(series1, Join(series2, series3))

Joining many series for different periods (but they start in same point in time)

Let's say you have series representing values for different periods (e.g., 2024, 2025, 2026), but they all start on 01/01/2023. The best approach is not to use Join(), but to shift them with a formula and combine them with Cross section.

Join many series with same start point

Join many series with same start point and original series

In-app features

Partial Periods functionality – filling in last non-complete period

If a series is, for example, Daily and you want to convert its frequency to Annual, you will not see a value for the current year because there are not yet 12 months of data - incomplete periods are removed.

In such situations, use the 'Partial periods to lower frequency method' in Conversion settings. It lets you choose how to handle incomplete periods and marks the outcome as a forecast value.

Forecast tab

You can add forecasts to a raw time series before any calculations are made, through the Forecast tab on the Series list, or after calculations, through the Forecast analysis.

Note that these forecasts will be added in the original frequency of the time series, even if the document uses another frequency.

For more information see Adding forecast on Series list and Forecast analysis.

Extend with points in time

You can add absolute values to a raw time series in the Series list, or after having applied some calculations to it. Values can be added manually or copied (vertically aligned) and pasted in.

When new actual values appear, the forecast will be overwritten unless you change 'Value preference' in the Forecast tab/analysis.

Extend relatively

Forecasts can be added as absolute values for each point in time, but they can also be set as relative. The relative method will always keep your forecast ahead of the existing series.

Growth/Increase by % - custom methods

Extend based on percentage values from another series

When you have one series and want to extend it based on another series (PoP or YoY), you can do it with Cut() and AggregateProduct():

Extend based on PoP-series (recommended)

Extend based on PoP-series - aggregate joinmorehistory

Extend based on PoP-series - multiple join()

Extend based on YoY-series - multiple join()

You can also extend a series when the future estimates are spread across more than one series; you just need a different approach:

Extend based on many %-series

Many extends based on different series

Growth

Continuous growth rate and Linear growth rate

The example below is for a continuous growth rate of 5%:

AggregateProduct(1+0.05/100)*100

and this one represents a linear growth rate of 2% over time:

AggregateProduct(Pow(1+0.02, 1/round(YearLength(), 0)))*100

Continuous growth vs Linear growth - see comparison here.
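
The cumulative effect of AggregateProduct() with a constant factor can be sketched in plain Python (illustrative; the 2% per-observation rate is an assumption, not one of the formulas above):

```python
# Plain-Python sketch of the cumulative product behind AggregateProduct()
# (illustrative; the 2% per-observation rate is an assumption, not one of the
# formulas above).
def aggregate_product(factor, n):
    values, running = [], 1.0
    for _ in range(n):
        running *= factor             # compound the constant growth factor
        values.append(running * 100)  # rebased to 100, as in the formulas above
    return values

print([round(v, 4) for v in aggregate_product(1.02, 3)])  # → [102.0, 104.04, 106.1208]
```

Each observation multiplies the running product by the same factor, which is what turns a constant rate into a growing line.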

Growth Till Date by Level

A custom formula that calculates stable growth at a given level (e.g., 1% is expressed as 0.01) until a point in the future. That point can also be set relative to, for example, the end of the original series.

.GrowthTillDate(series, endDate, level) = 
 let 
  .extended=cutstart( (extend(series, endDate, LastValid(series))) , end(series)) 
  .lastWithMultipliers = (if(counter(.extended)>end(series), 1+level, if(counter(.extended)=end(series), series, 1)))
 in
  join(series, flagforecast(AggregateProduct(.lastWithMultipliers)))
 end

Growth Till Date by Level

See more information under How formula with dot works? and User defined formulas.

Growth by Annual Rate

A custom formula that calculates growth at an annual rate from any point in time in a series to any point in the future.

.GrowthByAnnualRate(baseSeries, annualRate, startDate, endDate) =
let
  .rate = Pow(1+annualRate/100, 1/YearLength())
  .base = At(baseSeries, startdate)
  .period = CutStart(Extend(.base, endDate, .base), startDate)
in
  .period*Pow(.rate, Counter(.period)-startDate)
end

Growth by Annual Rate

See more information under How formula with dot works? and User defined formulas.

Growth with first point after certain value

Select a value - when the series hits this point, the formula calculates a growth factor. It then calculates future values based on this growth factor.

Growth with first point after certain value

Increase by X% each year, month or day

We recommend using Forecast - extend relatively. With this function, your forecast will always stay ahead of the series. It has relative weekly, monthly, quarterly and yearly settings.

Daily

The series begins at Date() with value 0 and then increases by 0.5/360 every calendar day:

if(Counter()>Date(2018, 04, 05), (Counter()-(Date(2018, 04, 05)))*(0.5/360), 0)

or

AggregateProduct(If (Counter()>Date(2018, 04, 05), 1+(0.5/360), 1))-1

After Date() increase by daily - see comparison here.

Trend lines

joinmorehistory(lag(AggregateProduct(1.5)*100, 1), 100) 50% daily growth
joinmorehistory(lag(AggregateProduct(1.2)*100, 1), 100) 20% daily growth
joinmorehistory(lag(AggregateProduct(1.1)*100, 1), 100) 10% daily growth
joinmorehistory(lag(AggregateProduct(Pow(2, 1/5))*100, 1), 100) double value every 5 days

Trend lines (x% daily growth) - see these formulas used in COVID chart.

Increase by X% each time period

This example uses a helper series created with AggregateProduct(), which creates new values by increasing 1 by X%. Values are created from a particular Date() and extended to the end of the original series. The outcome is then rebased relatively to the original series.

Increase by X% each time period

Extend backward/backfill history

Backfill history

This feature is available in Macrobond 1.30 and later.

Macrobond may need to create a new time series with a more recent historical start point when there is a change to the methodology of the time series, or any other change. You can easily locate related discontinued time series by right-clicking on a series and selecting 'Add superseded'.

You can combine series with formula, for more information see Join().

Create in-house and use join

Create an in-house series (Account in-house) and then use the Join() formula to connect the older values with the series. This is currently the best option we can recommend.

For more information see Creating Account in-house and Join().

Join shorter series with 0

If another series has more historical values and you want to match its time frame, use the formula:

join(0, series, Start(series))

Note that it will add as many 0 values as there are points in the common calendar before the series starts. It will not add anything if all series start at the same point in time.

Add 100 at the start of series (for recursive formulas)

Use below formula to add 100 at the start of series:

JoinMoreHistory(100*AggregateProduct(1+fx:s1/100), 100)

Calculations with formulas

How to:

Annualize a monthly P/P series with a formula?

Use the following expression:

pow((1+(series/100)), YearLength())-1

The expression '1 + (series / 100)' is raised to the power of YearLength(), after which 1 is subtracted from the result.
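
The arithmetic can be checked in plain Python (for a monthly series, YearLength() is 12):

```python
# Plain-Python check of the arithmetic: for a monthly series YearLength() is 12,
# so a 1% month-over-month rate annualizes to about 12.68%.
monthly_pct = 1.0
annualized = (1 + monthly_pct / 100) ** 12 - 1
print(round(annualized * 100, 2))   # → 12.68
```

Note that the compounded annual rate (12.68%) is higher than simply multiplying the monthly rate by 12 (12%).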

Change the color of a series when it’s below / above 0 (or any other value)?

  1. Create two series, one containing only the values above 0 and another containing the values below 0, and then graph each series in a different color. Do the following:
    a) In the series list, type the expressions:

    if(series >0, series, Null())
    
    if(series <0, series, Null())

    b) In the chart, click on a graph and open the Presentation properties tab. Select 'Custom' from the graph style drop-down menu in the appearance group. Select the color of your choice.

  2. Flag the values below 0 as forecast values, and change the color in which forecast values are graphed, by doing the following:
    a) In the series list, type the expression:

    if(series > 0, series, flagforecast(series))

    b) In the chart, click on the graph of the series and open the presentation properties tab. Select 'custom' from the graph style drop-down menu in the appearance group. Click 'forecast.' Select the color of your choice.

  3. Flag the values above 0 as forecast values, and change the color in which forecast values are graphed, by doing the following:
    a) In the series list, type the expression:

    FlagForecast(fx:s1, fx:s1>0)

b) In the chart, click on the graph of the series and open the presentation properties tab. Select 'custom' from the graph style drop-down menu in the appearance group. Click 'forecast.' Select the color of your choice.

Create an if condition/statement?

Formula language can be used in both the series list and the formula analysis. The if-statement is a formula requiring three parameters:

if(condition, value1, value2)

which can be expanded on as:

if(condition, if_TRUE_return_this , if_FALSE_return_this)

For example:

if(sek > 8, sek, 0)

This returns a series with values of 0 on days when the series is at or below 8, and values equal to the series when it is above 8.
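The element-wise behavior of such an if-statement can be sketched in Python, with None standing in for missing values (Null() in the formula language); the 'sek' values here are purely illustrative:

```python
def if_series(values, condition, if_true, if_false):
    """Apply if(condition, value1, value2) to each observation."""
    return [if_true(v) if condition(v) else if_false(v) for v in values]

sek = [7.5, 8.2, 9.1, 7.9]
result = if_series(sek, lambda v: v > 8, lambda v: v, lambda v: 0)
# result -> [0, 8.2, 9.1, 0]
```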

Create a continuous growth rate?

To create a constantly growing line, use the formula for compound interest:

AggregateProduct(1+VALUE/100)*100

for example, a 5% continuous growth rate would be:

AggregateProduct(1+5/100)*100

Combine an if condition/statement with the logical operators and/or?

To include 'and' in the function use the ' & ' sign as such:

if(condition1 & condition2, value1, value2)

which can be expanded on as:

if(condition1 and condition2, if_TRUE_return_this , if_FALSE_return_this)

For example:

if(sek>7 & sek<8, sek, 0)

which returns the values of the series on days when the series is between 7 and 8. For all other observations, the value of the series will be 0.

To include 'or' in the function use the ' | ' sign as such:

if(condition1|condition2, value1, value2)

which can be expanded on as:

if(condition1 or condition2, if_TRUE_return_this , if_FALSE_return_this)

For example:

if(sek>7 | sek<9, sek, 0)

which returns the values of the series on days when the series is above 7 or below 9. For all other observations, the value of the series will be 0.

Make an index out of P/P series?

To create an index from a return series, use formula for compound interest:

AggregateProduct(1+(series)/100)*100

The '*100' sets the starting base value of the index.

Note that for different series you might need to adapt this calculation.
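A sketch of the running-product logic behind AggregateProduct(1+(series)/100)*100, using an illustrative list of period-on-period percentage returns:

```python
def index_from_returns(pp_percent, base=100.0):
    """Build an index from period-on-period % changes, base value 100."""
    out, level = [], base
    for r in pp_percent:
        level *= 1 + r / 100   # compound each period's return
        out.append(level)
    return out

idx = index_from_returns([1.0, -0.5, 2.0])
# the index ends near 102.50 after the three periods
```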

Disaggregate a series?

Series which are aggregated annually can be 'disaggregated' by using an If() statement in the formula language.

Example:

If(Counter()=StartOfYear(), cngpfi0091, Momentum(cngpfi0091, 1))

The logic of this formula is: keep the first value of each year as given, but the subsequent values in that year are calculated by subtracting the previous value from the current value.
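The same logic can be sketched in Python with illustrative year-to-date data (the series name in the formula above is a real Macrobond identifier; the numbers below are not its data):

```python
def disaggregate(values, is_start_of_year):
    """Keep each year's first value; replace later values by the change
    from the previous observation (the Momentum(series, 1) branch)."""
    out = []
    for i, v in enumerate(values):
        if is_start_of_year[i]:
            out.append(v)                  # first value of the year: keep
        else:
            out.append(v - values[i - 1])  # later values: take the change
    return out

ytd = [10, 25, 45, 5, 12]                  # two years of cumulative data
flows = disaggregate(ytd, [True, False, False, True, False])
# flows -> [10, 15, 20, 5, 7]
```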

Replace null values for 0 in a time series?

You can use the function:

Null0(series)

Rolling principal components analysis

Overview

The Rolling principal components analysis (Rolling PCA) allows you to calculate a set of linearly uncorrelated series, or components, from a set of possibly correlated series. Rolling PCA performs a time-dependent calculation which uses a moving or expanding window.

For information about the non-rolling version of this analysis, see Principal components analysis.

Settings

General

Do not include series used in calculations in the output

When checked, any series included in the calculation will be excluded from the output. Uncheck this setting if you want both the original series and the calculation result in the output.

Include new series automatically

When checked, any new series added to the Series list will automatically be included in the calculation.

Select method for creating matrix

Use correlation (normalize input)

The eigenvectors will be calculated from the correlation matrix. This means that the input is centered and normalized before the components are calculated. PCA is sensitive to the scale of the input. Therefore, use this setting if variables are of different units, e.g., currencies and indices.

Use covariance

The eigenvectors will be calculated from the covariance matrix. This means that the input is only centered before the components are calculated. Remember that if you choose covariance, the input is not normalized, and the analysis will be sensitive to the scale of the input.

Select window type

Use expanding window

Observations will be added successively to the calculation, one at a time, from the start date to the last available observation.

The calculations will start when there are as many observations as there are components.

Use moving window

The analysis will be performed on a specified window of observations that moves forward one observation at a time. Check this setting if you want to set the length of the moving window.

The window size cannot be smaller than the number of components and the calculations will start when there are enough observations to fill one window.

Output

Output: Eigenvalues/Cumulative proportions

The output is either the eigenvalue of each principal component of each window as we 'roll' over the input series, or the cumulative proportions of the captured variance. The output will thus contain as many time series as there are input series.
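For intuition, here is a minimal sketch of an expanding-window eigenvalue calculation for two input series, using the closed-form eigenvalues of a 2x2 covariance matrix; this illustrates the idea only, not Macrobond's implementation:

```python
import math

def expanding_eigenvalues(xs, ys, min_obs=2):
    """At each step, compute the sample covariance matrix of all
    observations so far and its eigenvalues (2x2 closed form)."""
    results = []
    for n in range(min_obs, len(xs) + 1):
        x, y = xs[:n], ys[:n]
        mx, my = sum(x) / n, sum(y) / n
        a = sum((v - mx) ** 2 for v in x) / (n - 1)          # var(x)
        c = sum((v - my) ** 2 for v in y) / (n - 1)          # var(y)
        b = sum((x[i] - mx) * (y[i] - my) for i in range(n)) / (n - 1)
        d = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
        results.append(((a + c) / 2 + d, (a + c) / 2 - d))   # λ1 >= λ2
    return results

eigs = expanding_eigenvalues([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
# λ1 grows with the shared trend; λ2 stays small
```

As in the analysis itself, the calculation starts once there are as many observations as components (here two).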

Output series description

Specify the description of the output series or use the default description.

Include

Select what series to include in the calculation.

Example

Expanding window in Rolling PCA

In this example, we use an expanding window to determine how much systemic variance was explained by our Principal Components before and after the financial crisis. We also compare two principal components from the 'Static' PCA with the components from the Rolling PCA.

Ratios

Introduction

With the Ratios feature, you can have most series divided by common ratios, such as GDP or population, in your Series list with a simple click of a button. This feature works for regions where Macrobond has identified the required key series for the denominator. You can select it from within the Search tab or the Browse tab:

You can also add the expression in the Series list - it is represented with a series expression like:

#PerGdp(series)

If you want to see the underlying calculation behind the ratio, go to the Series information tab in the Series list, right-click on the series, and select 'View calculations'. This will open a new tab in Analytics with a snapshot of the calculated series document containing the calculation. You have the option to save this as a real calculated in-house series if you like.

Ratios behave like time series: they have a fixed frequency and attributes.

Benefits of using Ratios

Using the Ratio feature has several advantages:

  • It can happen that sources do not calculate such ratios themselves (example: Current Account as % of GDP). Therefore, such series are not available in Macrobond and need to be calculated by you. Ratios allow you to perform this calculation with one click.
  • Ratios can be performed on multiple series at once.
  • You save a lot of time deciding on the best possible combination of the underlying series.

Types of ratios

Using the Ratio feature, with one click you can create a ratio expressing the relation of a series of your choice to one of the following, predefined indicators:

#PerCapita(series)

It divides the selected series by the population of this series’ Region.

#PerGdp(series)

It divides the series by the GDP value, in current prices, of this series’ Region.

#PerGdpPercent(series)

It divides the series by the GDP value, in current prices, of this series’ Region and sets the result as a percentage.

#PerCpi(series)

It deflates the selected series by the headline CPI of this series’ Region.

#PerCpiCore(series)

It deflates the selected series by the core CPI of this series’ Region.

#PerCpiHarmonized(series)

It deflates the selected series by the harmonized CPI of this series’ Region.

#PerPpi(series)

It deflates the selected series by the PPI of this series’ Region.

How it works

PerCapita

  1. Macrobond uses the series of the variable that has the same frequency as the numerator, where the source is indicated. If there is no match, it uses a series with the closest higher frequency. If there is no match either, as a last resort, it uses a series with the closest lower frequency.
  2. If the series from point 1) has a shorter history than the numerator, Macrobond looks for all the series with a longer history. If any of them can be found, Macrobond uses the one closest in frequency (higher frequency is preferred) and extends the series from point 1) using the formula function JoinMoreHistoryScaled().
  3. If the series from point 1) covers more future observations than the numerator, Macrobond looks for all series that cover more future observations. If any of them can be found, Macrobond uses the one closest in frequency (higher frequency is preferred) and extends the series from point 2) using the formula function JoinScaledAppend().
  4. For the denominator, Macrobond sets the frequency conversion to Interpolate and sets the Partial period method to Rate of change.

PerGDP & PerGDPPercent

  1. First, Macrobond tries to find a series that has the same or higher frequency from the source. If this is not available, a lower frequency is used.
  2. Macrobond tries to match the seasonality of the numerator from the database. If the numerator is seasonally adjusted, but there is no adjusted series from the source, it seasonally adjusts the denominator. If the numerator is not seasonally adjusted, but the source only offers adjusted series, Macrobond seasonally adjusts the numerator.
  3. If the numerator is a flow series, Macrobond converts it from annual rate if the series are of different frequencies or if only one of them is expressed as an annual rate.
  4. If the numerator is not a flow series, Macrobond annualizes the denominator, if it is not already at an annual rate, by multiplying by the number of observations per year.
  5. When doing seasonal adjustments, the Seasonal adjustment Census X-13 analysis is used if the series is quarterly or monthly. If the series is annual, Macrobond does not add any seasonal adjustment. In all other cases the Moving Average version of the seasonal adjustment is used.
  6. If the GDP series ends earlier than the numerator, the formula function ExtendLastYoYForecast() is used to extend the series.
  7. Finally, the numerator is converted to the same currency as the denominator.
  8. The PerGDPPercent version follows the same methodology, but the result is then multiplied by 100 and the unit is set to “percent”.

PerCPI, PerCPICore, PerCPIHarmonized, PerPPI

  1. First, Macrobond tries to find a series that has the same or higher frequency from the source. If this is not available, a lower frequency is used.
  2. Macrobond tries to match the seasonality of the numerator from the database. If the numerator is seasonally adjusted, but there is no adjusted series from the source, it seasonally adjusts the denominator. If the numerator is not seasonally adjusted, but the source only offers adjusted series, Macrobond seasonally adjusts the numerator.
  3. When doing seasonal adjustments, the Seasonal adjustment Census X-13 analysis is used if the series is quarterly or monthly. If the series is annual, Macrobond does not add any seasonal adjustment. In all other cases the Moving Average version of the seasonal adjustment is used.
  4. When the price index series is chosen and seasonally adjusted (if applicable), the price index series is divided by ‘100’.
  5. There are no limitations on which series can be divided by the price denominators; even data published in real (volume) terms can be divided by these ratios.
  6. If the frequency of the numerator is lower, the price index series is averaged. If the frequency is higher, the price index series is linearly interpolated.
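The two frequency conversions in point 6 can be sketched as follows, with illustrative values and block sizes (e.g., monthly to quarterly uses blocks of 3, annual to quarterly uses 4 interpolation steps):

```python
def average_down(values, group_size):
    """Lower frequency: average each block of observations."""
    return [sum(values[i:i + group_size]) / group_size
            for i in range(0, len(values), group_size)]

def interpolate_up(values, steps):
    """Higher frequency: insert linearly interpolated points."""
    out = []
    for a, b in zip(values, values[1:]):
        out += [a + (b - a) * k / steps for k in range(steps)]
    return out + [values[-1]]

quarterly = average_down([100, 101, 102, 103, 104, 105], 3)
# quarterly -> [101.0, 104.0]
dense = interpolate_up([100, 104], 4)
# dense -> [100.0, 101.0, 102.0, 103.0, 104]
```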

Other macros

#Annual(series)

It converts the series to annual frequency, using the last value in each year, and extends the last, incomplete year. It doesn't change the frequency of the document.

#DeAnnualRate(series)

It divides the series by the observation count per year and sets the result as a Flow series.

#SA(series)

It applies seasonal adjustment from Census X-13. It truncates the data to fit the model, so the output is limited to 82 years for annual series and 64 years for monthly series.

You can combine macros:

#PerGdpPercent(#Annual(debopa0256))

Settings

Frequency conversion

Like any Macrobond document, all series used in the Ratio will be expressed in the same frequency. By default, the document will use the frequency of the series of your choice for which you would like to create a Ratio.

For example, if you want to calculate a ratio of GDP Per Capita with quarterly frequency, you can select a quarterly GDP flow series and divide it by an annual population stock series (using the PerCapita ratio).

By default, Flows series are converted to a higher frequency by aggregating their data points, and stock series by interpolation.

Missing Data

Sometimes, when the selected series have different frequencies, or one of them lags behind the other, the application extends the shorter series by using the ExtendLastYoYForecast() formula function.

Seasonal adjustment

This choice depends on your chosen series for which you want to create a ratio. If you choose a seasonally adjusted series the application matches it with an existing (official) seasonally adjusted series, where it is possible. If not, seasonal adjustments are calculated in the application.

Currency conversion

By design, the ratio follows the currency of the denominator.

Additional settings

  • Extend partial periods setting is always set to automatic for the denominator series.
  • The frequency of the calculation is set to that of the numerator.
  • Annual rate is removed in case series are of different frequency or if only one of them is expressed in this way.
  • The result should not be associated with any currency.

How to apply a ratio?

Find a series in the data tree, select the second red button next to it, and choose one of the ratios from the drop-down menu. Press the '÷ >' sign and the ratio will be added to your chart.

When you have more than one series, use the '÷ >' sign next to 'Add selected time series'. If some series do not have the option for a particular ratio, you will be informed by an exclamation mark on a yellow dot.

You can also type it in manually. For more information about types see Types of ratios.

Checking calculation behind a ratio

Behind each ratio there is a calculation with a predefined series as the denominator and additional formulas. To see the calculation, go to the Series information tab in the Series list, right-click on the series, and select 'View calculations'. A new file will open and you will see the full formula for the selected ratio.

If you want to change the calculation and use it elsewhere, type in your changes and save the calculation as a calculated in-house series.

Unit Root Test

Overview

The Unit Root Test provides you with a tool to test if a series is non-stationary. More specifically, it performs an Augmented Dickey-Fuller (ADF) test of the null hypothesis that a time series has a unit root, which would violate the underlying assumptions in many statistical models. The analysis produces a report in which you can read and interpret results from the statistical test.

Estimation model

If a unit root is present in the system, the variance may be infinite overall and the data can be difficult to use in further analysis. The ADF test regression equation has the form:

Δy_t = α y_{t-1} + Σ_{i=1}^{p} β_i Δy_{t-i} + γc + δt + e_t

α is the coefficient of the endogenous variable y in level form and is used for the actual test. Viewed as an autoregressive process, AR(p), the β-coefficients are for the p lags of y in difference form, Δy. The ADF test (Dickey & Fuller, 1979) introduced lagged differenced variables to whiten the residuals, e, in the estimation. The lag length, p, should be chosen carefully, observing the change in the information criterion, and should apply to the AR model regardless of whether it has a unit root. (Sometimes t-values are used as criteria instead.) If an AR(1) process is assumed, all Δy-terms can be omitted by setting both min and max lag to zero, given that e is still trusted to be approximately white noise. γ is the coefficient for the constant term and δ for the linear trend.

If the series is assumed to be integrated, it can be differenced by checking the diff box. Note that trend and constant terms are not affected by this.

The null hypothesis is that the series has a unit root, so that α is equal to zero, or:

H0: α = 0
H1: α < 0

Evaluation is done with t-values from the sample statistics, t_α = α̂/σ̂_α, after OLS estimation. Because the t-values of the ADF estimator are non-standard both in sample and asymptotically, special tables and response surface calculations from MacKinnon (1996) are used (with permission) to calculate the p-value. The settings for the deterministic variables are in a drop-down box. The MacKinnon calculations will adjust to this (and to sample size) automatically, but cannot account for other deterministic (or exogenous) variables, so Macrobond does not allow them in the ADF test.
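As a minimal sketch of the test statistic (assuming the simplest case, p = 0, with an intercept only), the regression and t_α can be computed by ordinary least squares; this is an illustration of the mechanics, not Macrobond's implementation, and the resulting t-value must be compared against Dickey-Fuller critical values rather than the usual Student-t tables:

```python
import math, random

def df_t_statistic(y):
    """Classical Dickey-Fuller: regress Δy_t on y_{t-1} with an
    intercept, and return t_α = α̂/σ̂_α."""
    x = y[:-1]                              # y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    alpha = sum((x[i] - mx) * (dy[i] - md) for i in range(n)) / sxx
    const = md - alpha * mx
    ssr = sum((dy[i] - const - alpha * x[i]) ** 2 for i in range(n))
    se = math.sqrt(ssr / (n - 2) / sxx)     # standard error of α̂
    return alpha / se

# A strongly mean-reverting AR(1) series gives a large negative t-value.
rng = random.Random(0)
y = [0.0]
for _ in range(300):
    y.append(0.2 * y[-1] + rng.gauss(0, 1))
t_stat = df_t_statistic(y)
```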

References

Dickey, D. A., and Fuller, W. A. (1979), “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, 74, 427-431

MacKinnon, James G, 1996. "Numerical Distribution Functions for Unit Root and Cointegration Tests," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 11(6), pages 601-618, Nov.-Dec.

(Programs and files: http://qed.econ.queensu.ca/pub/faculty/mackinnon/numdist/)

Settings

Start Lag/End Lag

To perform the analysis, you must specify the number of lags to include in the test. If you change 'Start lag' to 1 and 'End lag' to 3, lags 1, 2 and 3 will be included in the test. It is also possible to run the test in a classical Dickey-Fuller setup, without any lagged differences, by setting both min and max lag to zero.

Include

Select if you want to include this series in the analysis.

Diff

By selecting Diff, the first order differences of the series will be calculated. The result will then be converted back to levels. First order differences means that the series is transformed to the change in value from one observation to the next, while the result is expressed in levels.

Deterministic variables

Select if you want to include any deterministic variables in the equation, intercept or intercept and linear trend.

Report

The Unit Root Test automatically generates a report, which includes a variety of statistical information.

Examples

US GDP

In this example, we used the Unit Root Test on the U.S. GDP, expressed in logarithm. One version of the series has been expressed in differentials, to see how results would differ.

Questions

If I get p < 0.05, does that mean the series has a unit root or not?

A p-value close to 1 indicates that there is likely a unit root. If there is a unit root, the series is not stationary.

A p-value closer to 0 means that we can likely reject the hypothesis that there is a unit root, and conclude that the series is stationary.

Observations setting

What is it for?

The Observations setting allows you to adjust how inconsistencies between different calendars of observations should be treated. This is useful when time series have missing observations for certain calendar dates like weekdays or months, while other time series don’t. This harmonizing tool can be accessed via the Series list.

Settings

Generally, you can choose between one of three options:

Any series

This is the default setting. Using this setting will display points in time (i.e., dates) as long as at least one of the series in the document has values.

All series

This setting will only display common points in time (i.e., dates) where all series have values.

All points

This setting will display all points in time (i.e., dates) according to the frequency chosen.

Settings for Daily series

When Frequency is set to Daily, you will see more options with different day-of-week combinations (e.g., Mon-Fri, Sat-Wed). Note that this is not the same as 'Highest (Daily)', which has only the general options.

Setting the missing value method

Depending on the observations setting you choose, a series might include points in time that do not contain values. In this case you should select a conversion method to define how missing values are treated. The options for conversion can be found under the drop-down list circled in the image.

Change Region

Purpose

The Change region feature is used to replace a series in your document by an equivalent series for another country. Applying this change in the series list is a way to instantly change the country of your analyses and charts without having to open new documents.  This can make some tasks much faster, such as creating a report with the same analyses for many countries.

Change region / Change region & duplicate

There are two options that involve changing region: Change region, and Change region and duplicate.

  • Change region - this option will replace the time series you selected with a similar one for another country. This means the original series is no longer in your document.
  • Change region & duplicate - with this option you can keep the original series in your document and add the same indicator for another country.

How to apply change?

If you want to change one series, right-click on it. If you want to change more than one, mark them with the Shift/Ctrl keys.

To change the whole file, go to the upper menu: File > Change region.

Change region in...

Documents

You can apply a change of region in two different tabs in the Series list: the Series and expressions tab and the Series information tab. The two access points serve different purposes.

Series and expressions

Changing region and duplicating is the fastest way to add the same calculation for several countries. For example, say you have an expression that contains a formula, such as US current account as a percentage of its GDP. You can use change region and duplicate here to add the same calculation for Canada, without having to find the data for Canada or rewrite the formula.

Series information

In this tab, each of the series in your document is listed once, even if it’s used more than once in the series and expressions tab. Applying Change region to a series here means that it will be replaced every time it appears in your list of series and expressions. For instance, if one series is used multiple times across several expressions, you can replace it in all expressions at once by applying change region in this tab.

Documents tab

In Analytics’ Document tab you can right-click on a file and select Change region or Change region and duplicate. By using the latter, you will be able to choose the destination for a duplicated file.

Multiple documents

File > Open documents

In upper menu’s File > Open document you can select more than one document (with Ctrl or Shift key) and change region for all of them. Mark files, right-click and select Change region or Change region and duplicate.

You will then see a dialog where you can select a new region. The existing documents will be updated with series that match the new region.

If you select Change region and duplicate, you can create new documents instead of updating the existing ones. Select a location to save the new documents. It is also possible to select how the new documents should be named.

Presentation documents

If you go to the upper menu's File > Open document and open a Presentation document, you can use Change region and duplicate on it (and only this option). It will create a new Presentation with new underlying documents in the indicated place.

Note that it will change all series in the Presentation into the chosen country's equivalents (if they are available).

Scalar

Overview

The Scalar analysis is a tool for extracting particular values or metrics and comparing them across series. Use this analysis when you want to create a chart with categories, such as countries, along the x-axis, and columns of values on the y-axis.

The scalar analysis can perform a variety of calculations that result in one value per input series, such as the last value, the mean in a time range, or year to date performance. The output is always a category series, meaning that the time variable is replaced by a categorical variable. You can display this output in a Category chart, Bar chart, or Category scatter chart.

Settings

Input series

This is a list of the series in your document that you can include in the scalar analysis. That a series is ‘included’ means that the added calculations will be performed on it, and the resulting values will be included in the output series. You can also select whether new series that you add to your document should be automatically included in the calculations.

The order of the series in this list determines the order of the values in the output series. You can adjust the order by clicking and dragging series, or sort them alphabetically by clicking 'sort.' Sorting is done by region followed by maturity length, price type and, if all else equal or not available, alphabetically by title.

Automatic attributes for Value labels

By default, Value Labels are generated using the non-common elements of the series descriptions. This works well for series using harmonized descriptions.

In certain cases, you might end up with very long value labels. To avoid this, this setting allows you to pick from a list of attributes you want to display to automatically generate the value labels.

Calculations

Here, you can add one or more calculations that will be performed on all selected input series. The available calculations are:

Open, Close, High, Low

The first (Open), last (Close), highest (High), or lowest (Low) value of the specified range.

Mean, Median

The mean or median of the range.

Last

The last valid value of each series.

Last common

The value at the last point in time at which all the included series have values.

Last non-forecast

The last value that is not a forecast in the series.  

Value at

The value at a specific point in time. If a series is missing a value for that date, the first available value before that date will be used.

Nth last value

The nth last value of a series, where a value of 1 gives the last value, 2 gives the second-to-last value, 3 gives the third-to-last value, etc.

Year, Quarter, Month, Week to date

The performance from the start of the period to the specified date. The performance is measured as the change compared to the last value of the previous period.*

Performance since

The performance between two specified dates. The performance is measured as the change compared to the last value of the previous period.*

The Performance analysis works a bit differently from the performance calculation. In Cross sampling and Scalar, the program finds the first non-missing value and uses that as the base value, while the Performance analysis gives an error if the specified start date is missing. You can use the 'Strict' box to require the exact date.

Note that since version 1.29 the ‘Strict’ option is removed in new documents as it is always turned on for calculation.

Years, Quarters, Months, Weeks back

The change from a selected number and type of periods before the specified date.*

For years and quarters, this is the same as using the 'Rate of change since' method and specifying the start of the range as '-1y' or '-1q'.

Rate of change since

The rate of change between two points in time.*

The Rate of change analysis works a bit differently from the Rate of change since calculation. In Cross sampling and Scalar, the program finds the first non-missing value and uses that as the base value. You can use the 'Strict' box to require the exact date.

Note that since version 1.29 the ‘Strict’ option is removed in new documents as it is always turned on for calculation.

Percentage proportion

The percentage proportion of each series compared to the sum of all series at a specified point in time.

Standard deviation

The standard deviation of the range.

Percentile

The specified percentile of the selected range.

Lower, Upper tail mean

The mean of the values in the upper or lower percentile of the range.

Trimmed mean

The mean of the middle values as specified by the percentage.

Standardize

The mean divided by the standard deviation of the range.

Note that the formula Standardize() won't give the same outcome. The formula standardizes the series, calculating (value - mean)/stddev for each value, while Scalar calculates one standardized value, mean/stddev, for the whole series (or a specified interval).
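The distinction can be sketched as follows; the use of the population standard deviation here is an assumption for illustration, and Macrobond's exact convention may differ:

```python
import math

def standardize_series(values):
    """Formula Standardize(): (value - mean)/stddev for each value."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

def standardize_scalar(values):
    """Scalar calculation: one value, mean/stddev, for the whole range."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m / sd

series_out = standardize_series([1.0, 2.0, 3.0])  # one value per input
scalar_out = standardize_scalar([1.0, 2.0, 3.0])  # a single number
```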

Settings for calculation methods

*Relative dates

Most scalar calculations require either a point in time or a time interval to be specified. You can use specific dates, but you may want the dates to update when new data is added. In that case, leaving the date box blank or using relative dates, such as '-1y', can be useful. It’s important to understand what default dates are chosen when none is specified, and how relative dates work in each context.

Point in time

First, we’ll talk about calculations that require only one point in time, such as value at. If the point in time box is left blank, the last valid value for each series will be used.

If you specify a relative date here, that date will be relative to the last calendar date, not relative to the last date for each series. If you would like the last calendar date to be used, even though not all series may have values, you should use the relative date '+0'.

Time intervals

If you leave the range start blank, the first available value for each series is used. If you leave the range end blank, the last available value for each series is used.

When you use a relative date for the range start and leave the range end blank, the end point will be the last valid value for each series and the starting point for each series will be relative to its last point, not the last calendar date.

If you use relative dates for both the range start and the range end, they will both be relative to the last calendar date.

Value labels

These are the categories of the output series produced. To get an idea of what your chart will look like, you can look at the categories listed as value labels. They are also the labels that will appear on the x-axis of your category chart, or on the right side of your bar chart.

Output series

There are four possible ways of organizing your output. The one you should choose depends on:

  1. Whether you want to group your input series, and
  2. What categories you would like on the x-axis


These four options can be divided into two types based on whether or not you would like to group the input series.

One series per calculation & one series per input

Choose one of these two settings when you do not want to group your input series. The categories on the x-axis, then, are either the series names or scalar calculations.

  • One series per calculation: Use this setting when you want the input series names on the x-axis. It creates one category series per scalar calculation done, where the categories are the input series. Example:


  • One series per input: Use this setting when you would like the x-axis to contain the names of the calculations you’ve done in scalar. It produces one output series per input series, where the categories are the calculations done. Example:

New group after every n series & Partition into n series

Choose one of these settings when you do want to group the input series by some series descriptor, such as country. Switching between the two settings will switch which series descriptor is the category on the x-axis (value label), as illustrated by the example below.

  • New group after every

  • Partition into

The output of this setting also depends on the order of the series in the input series list. Pay attention to the group number that appears next to the series.

Series with the same group number make up the same output series. The application creates the output categories based on the descriptors that are not common within these groups.

Methods

Rates of change as value, percentage, or logarithmic are calculated in the following way:

  • value = y_t - y_{t-n}
  • percentage = 100 (y_t - y_{t-n}) / |y_{t-n}|
  • logarithmic = 100 ln(y_t / y_{t-n})
  • annualRateValue = (c/h) Σ_{i=1}^{h} z_{t+1-i}
  • annualRatePercent = 100 ((z_t / z_{t-h})^{c/h} - 1)

where c is the typical number of observations in one year.
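As an illustration, the value, percentage, and logarithmic variants above can be sketched in Python. This is only a sketch of the formulas, not the application's implementation, and the function name `rate_of_change` is our own:

```python
import math

def rate_of_change(y, n, method="percentage"):
    """Rate of change over n observations, mirroring the formulas above.

    Illustrative sketch (not Macrobond's code); observations without
    enough history are returned as None.
    """
    out = [None] * len(y)
    for t in range(n, len(y)):
        cur, prev = y[t], y[t - n]
        if method == "value":
            out[t] = cur - prev
        elif method == "percentage":
            out[t] = 100 * (cur - prev) / abs(prev)
        elif method == "logarithmic":
            out[t] = 100 * math.log(cur / prev)
    return out

print(rate_of_change([100, 102, 105, 103], 1, "value"))  # [None, 2, 3, -2]
```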

Examples

One series per input

In this example, we calculated the average GDP growth per decade by adding one 'Mean' calculation per decade. We also used the setting 'One series per input'. This means that one series will be created for each input series that we use. We have 6 countries, so 6 category series will be created, one per country.

One series per calculation

We used the scalar analysis to produce a single category series for the YTD performance. Here, 'One series per calculation' means that one category series will be created per calculation applied.

Partition into & New group after every

We have taken the last value for the number of females and males in the labor force. In the first example, the series are sorted by the total number of persons in each country, and the chart displays the female/male division.
The second example shows the total number of females and males in the labor force for all countries, divided by country.

Highlighting series

See how to use a formula to highlight chosen series, or series selected based on a condition.

Questions

How do I sort category series after Scalar?

You should use the Sorting analysis.

With multiple category series, make sure that after setting the direction for the main series, you also set the direction of the remaining series using 'by [series name]', so that they follow the direction of the main series.

How to show date(s) of observation?

When using the 'Last common' or 'Value at' calculation method, you can select the metadata {s .ObservationDate}.

Example:

What is the difference between the Rate of change analysis and selecting ‘Rate of change since’ when doing a Scalar analysis?

  • The Rate of change analysis calculates the change from the end of each time series, while
  • 'Rate of change since' in the Scalar analysis calculates it from the last calendar date. This means that if some series do not end on the same observation date, the calculation range will differ between them.

You can set the 'Range Start' and 'Range End' in the Scalar analysis to make sure the calculation is done on the same range across all input series.

Example:

Why can I see the 'Strict' option in one document but not in another?

Since version 1.29, the 'Strict' option is not available in new documents, as it is always turned on for the calculation. A file where you can see that option was created in an older version of Macrobond.

Vector autoregression

Overview

The Vector autoregression analysis (VAR) estimates the linear dependencies among several series. The analysis can produce fitted values and forecasts for those series. In addition to estimating a given system, you can also automatically test different models and let the analysis pick the best one based on information criteria. The VAR analysis also allows for modelling of cointegrated variables. By calculating a VECM, you can estimate the speed at which a dependent variable returns to equilibrium after a change in other variables. Finally, the VAR analysis has a feature for calculating the impulse response, that is, the response of one variable to an impulse in another.

Estimation model

The main difference from regression analysis is that in VAR you have several dependent variables instead of one. A VAR can be thought of as a system of linear regressions, but the emphasis is on using lagged values of the dependent variables to model a set of variables. There is an equation for each variable that explains its evolution based on its own lags and the lags of other variables in the model.

The analysis yields a report that contains the estimated parameters of the system as well as several statistics that can be used to test the system's validity and stability. The estimation is made using all common valid observations for the model series in the selected estimation sample range.

In the analysis, the dependent variables are called endogenous variables. There may also be exogenous variables. Such variables are only explanatory and are not modelled in the system. A model may be denoted as being of order p, called VAR(p), containing K endogenous variables. If there are two variables in a VAR(1) model, the system of equations can be written as:

y_t = v + A y_{t-1} + u_t

The expression can be written in expanded form as:

[y_{1,t}]   [v_1]   [a_11  a_12] [y_{1,t-1}]   [u_{1,t}]
[y_{2,t}] = [v_2] + [a_21  a_22] [y_{2,t-1}] + [u_{2,t}]

The equations can thus be explicitly written as:

y_{1,t} = v_1 + a_11 y_{1,t-1} + a_12 y_{2,t-1} + u_{1,t}
y_{2,t} = v_2 + a_21 y_{1,t-1} + a_22 y_{2,t-1} + u_{2,t}

The present value of y depends on the intercept v, the lagged values of itself and the other variable, and the error term u. Each error term is assumed to be uncorrelated with all lags of itself and with lags of the other error terms.

An arbitrary number of successive forecasts can be calculated, and you must specify an end date for the forecast calculation.
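As a sketch of how successive forecasts are generated, the deterministic part of a bivariate VAR(1), y_t = v + A y_{t-1} + u_t, can be iterated forward, feeding each forecast back into the right-hand side. The intercept and coefficient values below are made up for illustration:

```python
# Illustrative coefficients, not an estimated model
v = [0.1, 0.2]
A = [[0.5, 0.1],
     [0.2, 0.4]]

def forecast(y_last, steps):
    """Iterate the deterministic part of the VAR(1); each step feeds the
    previous forecast back into the system, as dynamic forecasting does."""
    path, y = [], y_last
    for _ in range(steps):
        y = [v[i] + sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
        path.append(y)
    return path

print(forecast([1.0, 1.0], 2))  # two successive forecast vectors
```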

When a system contains exogenous variables, assume that these are included in the vector x together with their lags, possibly including lag 0 (contemporaneous variables), so that x contains s elements. The system of equations for a model called VARX(p, s) can then be written as:

y_t = v + Σ_{i=1}^{p} A_i y_{t-i} + Σ_{j=1}^{s} B_j x_{j,t} + u_t

When there are exogenous variables, forecasts can only be calculated as long as there is data available for all the exogenous variables. You might want to add forecasts to these variables before they are passed on to the VAR analysis.

For a symmetric system, where each equation contains the same explanatory variables and lags, OLS (ordinary least squares) is used as the estimation method. For asymmetric systems, GLS (generalized least squares) is used, which requires an iterative procedure. This is more computationally intense, and the system might not converge fast enough to find a solution for large systems.
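For a symmetric system, the OLS estimation can be illustrated equation by equation. The sketch below fits a single equation, y_t = v + a y_{t-1} + u_t, using the closed-form OLS slope and intercept on simulated data; this is an illustration of the estimation idea, not the application's estimator, and the true parameter values are made up:

```python
import random

# Simulate y_t = v + a*y_{t-1} + u_t (illustrative parameters)
random.seed(0)
v_true, a_true = 0.5, 0.8
y = [0.0]
for _ in range(5000):
    y.append(v_true + a_true * y[-1] + random.gauss(0, 0.1))

# OLS of current value on its first lag: slope = cov(x, z) / var(x)
x, z = y[:-1], y[1:]
mx = sum(x) / len(x)
mz = sum(z) / len(z)
a_hat = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) \
        / sum((xi - mx) ** 2 for xi in x)
v_hat = mz - a_hat * mx
print(round(a_hat, 2), round(v_hat, 2))  # close to the true 0.8 and 0.5
```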

Impulse Response

In order to examine a VAR system, an Impulse Response (IR) can be calculated between two given variables. While an econometrician may assume a system with several variables describes some economic relationship, it can still be interesting to isolate two of them and explore their particular dynamics, in one particular direction.

IR calculates the response of one variable to an impulse in another for some period later in time. IRs have also been called dynamic multipliers, because the simplest way to compute them is to multiply the reduced form VAR matrix by itself i times for horizon i. The effects of past values for all coefficients in the system are used in the calculation, but we only look at the accumulated effect that one variable has on another.

A way of understanding this is that by substituting the error vector with a one in the position where we investigate the impulse and zeros everywhere else, an econometrician can trace this unit shock to the given variable at each time period.
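The unit-shock tracing just described can be sketched for a VAR(1): start from a vector with a one in the impulse position and zeros elsewhere, then multiply by the coefficient matrix once per horizon. The coefficients below are illustrative, not an estimated model:

```python
# Illustrative VAR(1) coefficient matrix
A = [[0.5, 0.1],
     [0.2, 0.4]]

def impulse_response(A, impulse, response, horizons):
    """Reduced-form response of variable `response` to a one-unit shock
    in variable `impulse`, traced over the given number of horizons."""
    state = [1.0 if i == impulse else 0.0 for i in range(len(A))]
    ir = []
    for _ in range(horizons):
        ir.append(state[response])
        state = [sum(A[i][j] * state[j] for j in range(len(A)))
                 for i in range(len(A))]  # propagate the shock one period
    return ir

# Response of variable 0 to a unit shock in variable 1
print(impulse_response(A, impulse=1, response=0, horizons=4))
```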

IR calculation only makes sense between endogenous variables. The Macrobond application presently allows only unit residual IR. You can investigate the residuals prior to interpreting the IR results. For simplicity, the IR is represented as a time series, with values starting at the forecast date. The response function has the same unit as the response variable, usually expressed in percentage points. The IR function output by the VAR analysis is in reduced (regular) form.

Note that there is no functionality for calculating error bars for the impulse response.

Cholesky's method for impulse response

A popular approach to IR has been Cholesky orthogonalization, where the model is first transformed by multiplying it with the Cholesky factor of the residual covariance matrix, so that responses to orthogonal impulses are attained. This respects the idea of isolated effects, but also requires the residuals to have finite variance (Hannsgen, 2010).

When using the Cholesky method the unit (y-axis) of the impulse response is the same as the unit of the response variable.

For the Cholesky method, the ordering of the series matters. The order is taken from the Series list, not from the VECM configuration (where you can drag series up and down).

VECM

The VAR analysis also allows for modelling of cointegrated variables. With this assumption, the variables on differenced form are explained by vectors on level form in addition to the usual VAR form. The rationale for this is that some variables co-move in the long run by the force of some linear process, while having other dynamics in the short run.

This is often illustrated by the 'drunk and his dog'. The drunk is walking home with great difficulty and often gets lost, but eventually makes it home. The dog is running around, but only so far away from its owner, and they eventually both make it home. So, in the long run the two are always moving together, despite the fact that the walk is quite random and unrelated in the short run. Economic interpretations are plentiful and allow for many relations at the same time, for example long and short run interest rates, consumption and the price level, or consumption and investment.

The whole system is specified in Granger representation as:

Δy_t = Π y_{t-1} + Γ_1 Δy_{t-1} + Γ_2 Δy_{t-2} + … + Γ_p Δy_{t-p} + u_t

where Π = αβᵀ is the low-rank matrix in which the cointegrating relations β are loaded onto each equation by α. The common dimension r of these matrices is the cointegrating rank of the system. The 'VAR part' contains the short-term variables, and its coefficients Γ_i capture the short-run dynamics. This allows the model to capture some non-linearity that the regular VAR would miss.

The vectors Πy_{t-1} are the error correcting vectors. They are not errors in the sense of residuals, but refer to the long run effects compensating for what is not captured in the short run. If the rank is zero, so that Π = 0, the model theoretically reduces to a VAR on change form, i.e., all variables differenced once. If Π has full rank, the system reduces to a VAR on levels form. (Note that neither of these cases tells us anything about the stability of the dynamic system it represents.) Since these are not proper VEC-models, the Macrobond application does not allow them and reports an error if you try to model one (this includes the cases of automatic rank identification). Hence getting the VEC model right can be cumbersome, and it is not necessarily superior to a regular VAR.
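As a small numerical illustration of the reduced-rank structure, a Π built as αβᵀ from one loading vector and one cointegrating vector has rank 1, which can be checked by verifying that all its 2x2 minors vanish. The numbers below are made up for illustration:

```python
# Illustrative VECM pieces: K = 3 variables, cointegrating rank r = 1
alpha = [-0.2, 0.1, 0.05]   # loadings, one per equation (made up)
beta = [1.0, -1.0, 0.3]     # cointegrating vector (made up)

# Pi = alpha * beta^T is a rank-1 (outer product) matrix
Pi = [[a * b for b in beta] for a in alpha]

# Every 2x2 minor of a rank-1 matrix is zero
minors = [Pi[i][k] * Pi[j][l] - Pi[i][l] * Pi[j][k]
          for i in range(3) for j in range(i + 1, 3)
          for k in range(3) for l in range(k + 1, 3)]
print(max(abs(m) for m in minors))  # ~0, confirming rank 1
```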

The rank statistics are determined by MHM (MacKinnon-Haug-Michelis, 1999) critical values. Each rank level of the matrix Π is tested. There are two options:

  1. The trace test at level r tests the null hypothesis H0: rank = r against H1: rank = k
  2. The maximum eigenvalue test at level r, on the other hand, tests the null hypothesis H0: rank = r against H1: rank = r + 1

With the Macrobond application, two approaches to VECM can be used: Johansen and Ahn-Reinsel-Saikkonen. Johansen's (1986) approach is estimated in a maximum likelihood scheme. It starts by identifying the cointegrating vectors. These are then subtracted from the original dependent variables, which reduces the system to a regular VAR that is solved using OLS. It is the best known and perhaps the most widely used VEC-model. Since the rank reduction of the matrix Π is done by means of eigendecomposition, this model may be viewed as employing noise reduction to clean up Π.

Ahn-Reinsel-Saikkonen is based on least squares regression. It starts by estimating the whole system in the Granger representation above so that Π has full rank. Afterwards, the proper rank reduction is made, decomposing Π to a lower rank without changing the short run coefficients Γ_i. It is suggested in Brüggemann & Lütkepohl (2004) that this approach is more robust in small samples than Johansen's.

Report

The VAR analysis automatically generates a report, which includes a variety of statistical information.

Settings

Estimation sample range

Specify the limits of the estimation sample range. The default range will be the largest range where there is data for all the series.

Output residuals

When this option is selected a time series containing the residuals will be calculated.

Output the endogenous series

Select this option to include the endogenous series in the output.

Output the exogenous series

Select this option to include the exogenous series in the output.

Calculate impulse response

Select this option to calculate the impulse response of the specified length. Select in which equation the impulse should be applied and for which variable the response should be calculated.

Method

Unit - unit residual impulse response

Cholesky - Cholesky’s method for impulse response

Confidence band

Confidence bands for forecasts of each equation are computed using the VAR estimator covariance matrix. Since the VAR is ideally a stable linear dynamic system, the forecasted values are dynamically generated, which means that they converge toward some mean (zero if normalized). Therefore, the error bands must also converge to constant upper and lower bounds. Because not much is known about the small sample properties of the Feasible Generalized Least Squares (FGLS) estimator used in the VAR, only asymptotic errors are computed. This makes the estimated error bands less reliable for estimations from short time series. It can be shown that the estimator variance of the FGLS is lower than or at least equal to that of standard OLS.

Autocorrelation test lags

Select this option in order to include a Portmanteau autocorrelation test in the report. Specify the number of lags to include. The number of lags should be larger than the highest number of lags of the endogenous or exogenous variables.

Max endogenous lags

Specify the maximum number of lags to include for the endogenous variables. You can further refine which lags to include in the model on the 'Lag settings for endogenous variables in the equations' tab.

Max exogenous lags

Specify the maximum number of lags to include for the exogenous variables. You can further refine which lags to include in the model on the 'Lag settings for exogenous variables in the equations' tab.

Find best model based on max endogenous lags for information criteria

Select this option to let the system automatically test which combination of symmetric lags is optimal based on the selected information criteria.

You can select the minimum and maximum number of lags of the endogenous variables to test and also the minimum and maximum of different lags (regressors) to include in each round of tests.

Select the setting 'Require stable process' in order to disqualify any model where the roots of the characteristic equation indicate that the model is not stable.

Type

Select whether a series should be included in the model as an endogenous variable or an exogenous variable.

Diff

By selecting Diff, the first order differences of the series will be calculated, and the result will then be converted back to levels. First order differencing means that the series is transformed to "Change in value" (one observation) while the result is expressed in levels.

Intercept

Select if the intercept should be included in the model for endogenous variables. This option is not available per variable for VECM.

Restrict to CE

In VECM, both the trend and the intercept can be restricted to the cointegrating relations. This means that they are treated as deterministic variables within Π, on level form. They occur either on level or change form, never both. The intercept and trend variables are added in the VECM configuration box.

Equation name

Optionally specify the name of the equation to be used in the report.

Variable name

Optionally specify the name of the variable to be used in the report.

VECM

Enables VECM.

Configuration

  • Select method, Johansen or Ahn-Reinsel-Saikkonen
  • Include intercept adds an intercept variable
  • Include linear trend adds a trend variable

Automatic cointegration test

Select whether to automatically find best cointegration rank or to enter it manually. The settings for automatic rank selection are described in the section Estimation model.

Examples

Vector autoregression

A model of five endogenous variables is defined in the Vector autoregression analysis. It is set to calculate a forecast one month ahead. The model uses three lags for each variable, which makes it a VAR(3) model.

VECM European rates

The endogenous series are swap rates with maturities of 10 and 5 years for Sweden, Denmark, and the Euro area. The short run dynamics of these data are known to be dominated by simultaneous, highly correlated shifts of all rates. The high correlation of short-term movements is explained by a stable relation between the levels of the rates: the slope of the yield curves.

IFR Example

Questions

How to add and use dummy variable in VAR model?

Create a binary series (0/1 series) using conditions in the formula language, or use an in-house series. The series should be added as an exogenous variable.

Examples of dummy variables: 

quarter()=1 & year()=2020|quarter()=3 & year()=2020

Returns 1 if the observation is in Q1 or Q3 of 2020, 0 otherwise. Each quarter must include the '& year()=2020' condition; otherwise the expression will match that quarter in every year.

Counter()>=Date(2020, 1, 1)&Counter()<=Date(2022, 1, 1)

Returns 1 if the observation is between those dates, 0 otherwise.

Cop(usgdp, yearlength())<0

Returns 1 if the US GDP y/y growth rate is negative, 0 otherwise.