Principal components analysis

Overview

The Principal components analysis (PCA) allows you to calculate a set of linearly uncorrelated series, or components, from a set of possibly correlated series. As a dimension-reduction technique, PCA helps you reduce a set of series to a smaller set of series containing most of the information of the large set.

We provide standard implementation of this analysis. The component series are calculated using an orthogonal transformation so that the first series captures the highest possible variance of the original set. Each successive series captures the highest possible remaining variance under the constraint that it is orthogonal to the preceding series. The analysis also outputs the eigenvectors and the eigenvalues.

Settings

General

Do not include series used in calculations in the output

When checked, any series included in the calculation will be excluded from the output. Uncheck this setting if you want both the original series and the calculation result in the output.

Include new series automatically

When checked, any new series added to the Series list will automatically be included in the calculation.

Use legacy format

Checking this option will enable legacy output meaning the analysis won't group outcome into lists. It will produce separate series for each of the components of each model. Please note that by enabling this option all following analyses will lose their settings.

Select method for creating matrix

Use correlation (normalize input)

The eigenvectors will be calculated from the correlation matrix. This means that the input is centered and normalized before the components are calculated. PCA is sensitive to the scale of the input. Use this setting if variables are of different units, e.g., currencies and indices.

Use covariance

The eigenvectors will be calculated from the covariance matrix. This means that the input is only centered before the components are calculated. Remember that if you choose covariance, the input is not normalized, and the analysis will be sensitive to the scale of the input.

Select series

Number of components

Here the number of component series is defined. These are the principal components that will be calculated and included in the output. This number of components cannot be greater than the number of series included in the analysis.

The components are sorted in order of how much variance of the original data set that they capture. If you select 'Greatest' you will get the most significant series and selecting 'Smallest' will yield the least significant series.

Output series description

Specify the description of the output series or use the default description.

Include

Select what series to include in the calculation.

Output PCA elements

Eigenvectors/Matrix

This is a Matrix renamed to 'Eigenvectors'.

The matrix contains the eigenvectors of the correlation or covariance matrix. These vectors are orthonormal.

Eigenvalues/Category chart & table

This is a Category chart and Category table renamed to 'Eigenvalues chart' and 'Eigenvalues table'.

The analysis yields two category series, one with the eigenvalues and one with the cumulative proportion of the eigenvalues. The latter can be interpreted as how much of the original variance that is captured by that principal component together with all preceding components.

Principal components/Time chart & table

The 'Number of components' setting specifies how many component series should be calculated. The series are either the most or least significant components. In the time series space, the components are projected as the eigenvectors scaled so that the variance is the same as the corresponding eigenvalue.

Projection is internal product of the PC vector with time series. By determining the eigenvectors of the covariance matrix corresponding to successive eigenvalues, we obtain the coefficients of the linear combinations that form the new principal components.

The eigenvectors only specify a direction, and not any magnitude. So to be able to decide a magnitude of the component series, the common approach is to scale the resulting series so that the variance is the same value as the corresponding eigenvalue.

Example

Principal component UK Swap rates

The three main principal components of changes in the UK swap rates are identified using PCA.

Questions

I've checked 'Use correlation (normalize input)' - is it possible to show this normalized series?

The normalization is done on the matrix level, which corresponds to normalizing the series. We never explicitly calculate the normalized series, so it’s not possible to plot it.

I have many different sets of data, but the matrix tab only shows the output for one.

The matrix is built only for the first model in the sequence. So, to see all matrices, you'd need to create five different PCAs.

What happens when series have different lengths?

The calculation is made in the interval where there is data in all series.

 

Analyzing data – overview

Once you've found and selected the data to work with, the next step is doing something with it in the Analytics tab. This is where you’ll be executing calculations, linking your workflow with external resources or tools, and applying customizable chart styles and tables.

Analytics

The workflow in Analytics is organized from the left to the right.

On the left side, you have the Series Browser. Use this to select and add time series to the document. In the middle, you have the Analysis Tree this is where you work with the data you have added. At the end on the right, you have the Presentation area where your work is visualized, either as a table or chart .

Series browser

Databases

Let's start with the Series browser, where you can select and view all available databases in Macrobond. At the top you have the Country & Region database, listed by countries and then by categories within. Country & Region database is primarily comprised of local and national sources of data, collected by Macrobond. After this there's Source & Release database which stores all data presented by the sources. Finally, we have Concept & Category where data is presented by concept.  As you scroll down further, you’ll see third-party databases with wide array of data aggregators.

Structure

As you may notice, the data is organized using a tree structure, made up of primary and sub- categories of data. When you select the sub-category, the application will generate a list of the available series for that category below.

Series browser is the same mechanism as in the Browse activity tab, for more about this see Browse – overview. You can also use Search field - here you can exclude and include relevant metadata and sort it in order of relevance. For more see Search – overview .

Series list

Opening the Series list allows you to review the series and, if needed, adjust different parameters. The available parameters are frequency, observations, currency, data range, conversions settings (including missing values) and forecasts. For more see Using the Series list.

Analysis tree

Add analysis

Once you've made the necessary adjustments in the Series list, you can go ahead and add analyses. For that you'll use the Add function which allows you to select from a range of different analyses and presentations. Every calculation or presentation that you select is then added as a new branch in the tree structure, which also means it'll define the output of the branches below it.

Insert analysis

While the add button always results in a new branch being added to the end of the tree, using Insert allows you to choose where in the tree you would like to insert a calculation.

Undo the changes

Easily undo changes with the standard Ctrl+Z key command or select Undo from the Edit menu.

Using the Analysis tree is a powerful tool to help you see and understand the effect of each new analysis on the sequence of calculations. For more see Analysis tree.

Analyses

We have multiple types of analysis that can assist you with your work and not only simplify but speed up the process of modifying data and creating charts. For more see Analyses.

Formula language

You can perform various calculations using the formula language and things that are not possible with built-in analyses. Use this if you want to have more flexibility to customize the calculations you apply. The formula language can either be applied to the start of the tree or to a particular section. For more see Formula.

Cross sampling

Overview

The Cross sampling analysis is a tool for extracting particular values or metrics and comparing them across series. Use this analysis when you want to create a chart with categories, such as countries, along the x-axis, and columns of values on the y-axis. The Cross sampling analysis is a successor to the Scalar analysis with a new workflow that is especially tailored for lists.

The lists are pre-prepared data sets on which you can easily operate in Cross sampling (and in other analyses). To change data, you just go back to Series list and replace or add a series without rebuilding whole analysis. You can also create and share lists through My list feature.

The Cross sampling analysis can perform a variety of calculations that result in one value per input series, such as the last value, the mean in a time range, or year to date performance. The output is always a category series, meaning that the time variable is replaced by a categorical variable. You can display this output in a Category chart, Bar chart, or Category scatter chart.

Working with Cross sampling analysis

Input - Lists

All input data in this analysis go in Lists - data sets defined under Series list's tab. For more information see: Lists of series. You can create and share lists with your colleagues through My list feature.

Settings

The settings consist of four parts. In the screenshot below, there are two lists and two series that are not part of any list to the left. On the right-hand side, the two lists have been added and can be seen as two columns.

  1. Select what calculations to apply to each series. Details about the calculations can be found below.
  2. Select how the output should be generated.
  3. This is the list of all input series and lists.
  4. Here you organize the series that should be used for the output.

Calculations

Here, you can add one or more calculations that will be performed on all selected input series. The available calculations are:

Open, Close, High, Low

The first, highest, lowest, or last value of the specified range.

Mean, Median

The mean or median of the range.

Last

The last valid value of each series.

Last common

The value at the last point in time at which all the included series have values.

Last non-forecast

The last value that is not a forecast in the series.

Value at

The value at a specific point in time. If a series is missing a value for that date, the first available value before that date will be used.

Nth last value

The nth last value of a series, where a value of 1 gives the last value, 2 gives the second value to last, 3 gives the third value to last etc.

Year, Quarter, Month, Week to date

The performance from the start of the period to the specified date. The performance is measured as the change compared to the last value of the previous period.*

Performance since

The performance between two specified dates. The performance is measured as the change compared to the last value of the previous period.*

Performance analysis works a bit different than performance calculation. In Cross sampling and Scalar program finds the first non-missing value and use that as the base value, while in Performance it gives an error if the specified start date is missing. You can use 'Strict' box to select the date.

Note that since version 1.29 the ‘Strict’ option is removed in new documents as it is always turned on for calculation.

Years, Quarters, Months, Weeks back

The change from a selected number and type of periods before the specified date.*

For years and quarters, this is the same as using the 'Rate of change since' method and specifying the start of the range as '-1y' or '-1q'.

Rate of change since

The rate of change between two points in time.*

Rates of change as value, percentage or logarithmic are calculated in the following way:

  • value = y t y t n
  • percentage = 100 y t y t n | y t n |
  • logarithmic = 100 ln y t y t n
  • annualRateValue = c h i = 1 h z t + 1 i
  • annualRatePercent = 100 z t z t h c h 1

where c is the typical number of observations in one year.

Rate of change analysis works a bit different than Rate of change since calculation. In Cross sampling and Scalar program finds the first non-missing value and use that as the base value. You can use 'Strict' box to select the date.

Note that since version 1.29 the ‘Strict’ option is removed in new documents as it is always turned on for calculation.

Percentage proportion

The percentage proportion of each series compared to the sum of all series at a specified point in time.

Standard deviation

The standard deviation of the range.

Percentile

The specified percentile of the selected range.

Lower, Upper tail mean

The mean of the values in the upper or lower percentile of the range.

Trimmed mean

The mean of the middle values as specified by the percentage.

Standardize

The mean divided by the standard deviation of the range.

Note that formula Standardize() won't give same outcome. In formula we standardize the series (value - mean)/stddev for each value. While in Cross sampling we calculate a standardized value for the whole series (or a specified interval) according to mean/stddev.

Settings for calculation methods
*Relative dates

Most Cross sampling calculations require either a point in time or a time interval to be specified. You can use specific dates, but you may want the dates to update when new data is added. In that case, leaving the date box blank or using relative dates, such as '-1y' can be useful. It’s important to understand what default dates are chosen when none is specified, and how relative dates work in each context.

Point in time

First, we’ll talk about calculations that require only one point in time, such as value at. If the point in time box is left blank, the last valid value for each series will be used.

If you specify a relative date here, that date will be relative to the last calendar date, not relative to the last date for each series. If you would like the last calendar date to be used, even though not all series may have values, you should use the relative date '+0'.

Time intervals

If you leave the range start blank, the first available value for each series is used. If you leave the range end blank, the last available value for each series is used.

When you use a relative date for the range start and leave the range end blank, the end point will be the last valid value for each series and the starting point for each series will be relative to its last point, not the last calendar date.

If you use relative dates for both the range start and the range end, they will both be relative to the last calendar date.

Output mode

You can select one of two ways to create the output series.

One series per calculation

For each calculation defined, there will be one series per column defined in the organization pane. In this mode you can select what metadata to use for generating the labels by selecting Label generation:

Based on the example in the screenshot above, this means that there will be one series with the GDP values and one series with the unemployment numbers.

One series per input

In this case there will be one output series with all the defined calculations for each input series. For example, if you have an input like this:

you will get a category chart like this:

Organizing output

The analysis is tailored for lists of series. You organize the output by selecting a list of series on the left-hand side. In most cases, you drag a list over to the right-hand side, which will add the list to the output. If you want to change/add series, you need to do this on List in Series list.

Series will be paired automatically by sub-region metadata.
Note you can only place lists side-by-side if the series belong to the same family or at least one of them is a list by region. The entries in the lists will be automatically aligned.

Groups

You can separate each data set in analysis by creating separate Group tabs. The effect will be same as if you added lists to one Group, but it might be easier to keep track with separate Groups.

Individual series

You can also add with drag and drop individual series (or group of them). Mark series and on the right navigate so you would see bolder horizontal line - then you can drop series and it will be added to the group.

Note, when using series from a list you cannot drag individual series like this. Instead please go to Lists tab and add series there.

Order

The order will be determined by one of the columns that is based on a list. You can select which column decides the order by clicking on the button in the column header.

Missing input

If any of the lists have missing series, a red background will appear. A series may be missing if no series has been entered in the list for that position in a plain list or if two lists by region do not have the same set of regions.

You have a few options how to handle such missing input and you can select the strategy in the setting called 'Missing inputs'.

Replacing individual series

You can replace individual series by dragging a series to a position in the table to the right. This can be used for doing an exception in a list or for filling in missing entry when you do not want to change the underlying list.

Any series in a list that has been replaced, will have a yellow background and a button for reverting the change.

How to create simple chart with Cross sampling and Lists?

  1. Copy/Cut series you want to use.
  2. Go to Series list > Lists tab, use 'Add new by region list' and paste series.
  3. Add list to Series list.
  4. Add Cross sampling analysis, select calculation and Output mode.
  5. Drag list to from 'Series' to 'Group'.
  6. Add chart or table.

How to have two different colors for one List?

If you want to add a second color to a chart you can wrap selected series with flagforecast() and use separate color setting to introduce another color. But if you have a List this cannot be done. See below steps with solution. Note that the partition of series into different colors doesn't have to be even.

  1. Add a random extra series or constant in the Series list that you do not need. In our example,
    we add a '0'. This series will be used only to construct groups of series and will be deleted at the end. In Cross Sampling, firstly add this extra series as a new column. Then add a calculation 'Last'
    so we have values generated.
  2. Drag and drop the series for the first part of series under the 0 as we do below. Make sure that
    you see a 'special line' below 0 as in the video so that the series stack up below 0.
  3. Add 0 as a new column again so we can create the second part. Drag and drop the second part series below the last red box.
  4. In Group switch 'Missing input' from 'Error' to 'Missing'.
  5. Go into series list and delete '0' (or extra series you have added). This removes the 0 from the Cross
    Sampling and we are left with only the series we want.
  6. Add a Category Chart. To get a column look, go into Graph layout (Ctrl+L), and then change Graph type to Stacked Column.

If you want to sort the series, add the entire List as a new column. Now let’s add a Sorting analysis. Expand the 'Last.' Then sort the earlier established 'List' and give it a direction. Then sort all the other series by that List. However, in Graph layout only use the first and second group of series, and not the list. The list is only needed to sort all the series at once both parts have missing series.

For a ready-file see Different colors for two or more groups inside one list.

Using Transpose analysis after Cross sampling

With Transpose analysis you can change data's place from x-axis to y-axis and vice versa without rearranging data or rebuilding Cross sampling. See Transpose for examples.

Examples

Single series column with average line

In this example, we calculated average of GDP values for several countries and plotted it together with those values.

Cross sampling analysis on three indicators

In this document we worked on lists containing city level series for three different indicators, which then were combined in one bubble chart to compare values.

Conditional formatting rules

We applied conditional formatting rules to the Bar chart table created with Cross sampling analysis with use of lists.

Highlighting series

See how to use formula to highlight chosen series or series based on a conditions.

Different colors for two or more groups inside one list

See how to divide series from one list into two (or more) groups with different colors for each.

Two by two stacked columns

Create two stacked columns with different calculations for each of two countries using Cross sampling and Transpose.

Questions

How to add a single series column (constant series)?

Sometimes you want to create a group with constant series. For example, here a sum of the GDP series has been created and this series has been added by selecting it ono the left and pressing the 'Add selected series as new single series column.' See the file under Examples: Single series column with average line.

If new series are added to the lists, the single series column will be extended to include more rows of the same series.

How to show date(s) of observation?

When using 'Last common' or 'Value at' calculation method you can select metadata {s .ObservationDate}

Example:

What is the difference between 'Add selected series as new column' and 'Add selected series as new single series column'?

If you are working with series not arranged in a list, you should use 'Add selected series as new column.'

If you want to have one series as a whole column select 'Add selected series as new single series column.'

What is the difference between the Rate of change analysis and selecting ‘Rate of change since’ when doing a Cross sampling analysis?

  • Rate of change analysis calculates the changes from the end of each time series while
  • 'Rate of change since' in Scalar/Cross sampling analysis calculates it from the end of the whole calendar. Meaning that if some series do not end at the same observation date, the calculation range will differ.

You can set the 'Range Start' and 'Range End' in the Scalar/Cross sampling analysis to make sure the calculation is done on the same range across all input series.

Example:

Why annualization in Rate of change analysis works differently than in Cross sampling analysis?

Rate of change is by default set to 'Mode: Fixed period'. There is also another mode there - 'Calendar date' - which is helpful when working with a Daily series or for when using Annual rate. Annualization is done differently when you select 'Calendar mode' since Macrobond then use the actual length of the period to do annualization. If you switch it to 'Calendar mode' you will get same value in both analyses.

Why I can see 'Strict' option in one document and can't in other?

Since version 1.29 the 'Strict' option is not available in new documents as it is always turned on for calculation. File where you can see that option was created in an older version of Macrobond.

Lists of series

Overview

This is a great way to organize large sets of series. You create lists in documents in the tab called Lists in the Series list. Lists are great to work with in Cross sampling or Formula analysis. For example, in the latter analysis you can use just one formula to calculate on each of the series from list.

You can also create and share lists with My lists tab.

Creating list

To create a list first please select a type (region/plain). For more information about them see Types of lists.

The easiest way to add series to a list is to copy one or more series from the Data browser and paste it into the list.

You can also create list in My lists tab.

Moving list to an account

If you have created list on Series list then you can move it to an account to use it elsewhere or share it  with your colleagues. Go to Lists tab, right-click on a list and select Move list. List will be removed from Lists tab, moved to a designated account, and added under Expression on Series list.

Working with lists

Single list

Here are a few things you can do with a single list:

  • You can use the name of a list in a formula expression and the calculation will be applied on all the series of the list. From the example in the screenshot, you can use the list called 'cpicomp' in a formula expression like this:
    cop(list:doc:cpicomp,1)

    Not only is this easier to write and gives you better overview, but it also works nicely if you add more series to your list.

  • You can change the order of the series in a list, and it will be retained and will not break any calculations.
  • In most analyses, like Rate of change, you can set parameters for the whole list.
  • In most analyses, you can make exceptions on series level:

Combining lists

Lists can also be combined. This works in formulas and in the new Cross sampling analysis.

Combining lists works if either the lists have a relationship (parent/child lists) or if they are organized by region using the type of list called 'list by region'.

Here is an example where several 'lists by region' have been created:

You can then use a formula expression like

list:doc:gdp/list:doc:pop

The calculation will be applied pairwise to the series in the lists and the result is a new list.

My lists tab

You can use My lists tab feature to create and share your lists. For more information see My lists tab.

Missing input

You can choose how Macrobond should treat missing series in lists:

  • Error - missing series in lists are treated as error.
  • Exclude - missing series in lists are excluded from output.

Types of lists

There are three types of lists, one is designed as a country-code list, while other two can be added from Lists on Series list. You decide the type when you create the list by pressing either the 'Add new by region list' or 'Add new plain list'.

List of regions

A list of unique regions (no series code is used here). It can be used to perform searches

or as a parent list for 'by region' lists. It cannot be used on its own in Analytics tab's Series list - it will automatically add child list.

Child list

Child list created under 'Of regions' will be automatically a 'list by region' type.

List by region

All series in a list by region are associated with a region. The region is determined by the selection in the Region column and will be populated automatically when a series is added provided that it has an associated region.

There can only be one series per region, and you cannot have two series where one is associated with a sub-region of the other. Thus, you cannot have a series for U.S. and one for California in the same list.

The benefit of this type of list is:

  • When combining lists, in a formula expression or in the Cross sampling analysis, the regions can be used to match the series in one list with series in another list. Lists by region can be combined even if they do not contain exactly the same set of regions.
  • The descriptions of the items in the list will be the name of the region, which is known to be unique, and you get short and distinct labels.
  • You get help avoiding mistakes of adding duplicates or the wrong series.

Child list

This type of list can be combined with all other lists of this type.

Concept for Region and child list

Concepts which connects series in Concept & Category data-tree view can be used to create Region list and its child list.

Region list

Use 'Add new by region list' and select Concept. After that start selecting countries/regions in Region column. Macrobond will display the right series based on the Concept group.

Child list

You can easily create lists with same regions, but other Concepts defined by Macrobond. This is done by creating a child list from a List by regions. If the parent list is changed, the child list will be automatically updated.

Plain list

A plain list is not restricted to have just one series per region. You can have any mix of series.

Child list

If you want to combine two plain lists - they must belong to the same family. You create a family of lists by adding child lists. A child list will have a corresponding set of series as the parent and in the same order.


The parent list and its children form a family and can be combined in formulas and in the Cross sampling analysis. The two series in the screenshot above can be used in an expression like

join(list:doc:index, list:doc:forecast)

Vintage data (Revision history)

This feature requires Macrobond Data+ license. 

Overview

This feature requires Macrobond Data+ license. 

If you have Data+ license you can download vintage versions of time series with Python/R API or through Macrobond main-app. Note that not all time series may have vintage data or they do not have data, from the period you want.

Series with and without vintage data

For various reasons (i.e., contract agreements, source not keeping historical information) not all series have revisions. Those who has will have next to them a three-violet-lines sign and when you hover over the sign you will see a flag with 'This series has revision history available' confirmation and a dates for first and last revision.

Viewing available revision history

In data-tree you can right-click on a series and select Show revision history. New window will pop up where you can see all historical releases we have in our database. Any change in values will be highlighted in beige color.

What does Start column mean?

Consider the Start event as the point in time when the time series was created, as it represents the initial appearance of the series. For newer series it might mean there was no earlier revision before but for older ones it simply is the last available to us recorded state of series.

Set Vintage request for all series in document

Select date in the Vintage field on Series list. This setting will be applied to all series on Series list - program will try to get the revision for all series at this point in time.

How to set time in Vintage field?

The calendar date and time is show in your local time according to your Windows time zone setting.

For example, if you selected the vintage 1st December 2001, and your Windows time zone is UTC+1 , the absolute time is 2001-12-01 09:30+1.

Set Vintage request for one series

Revision history works together with { } brackets. To see historical data for one series, use in Expression field code with date for which you want data:

segdp{2001-12-01 09:30+1}

The format must be {YYYY-MM-DD HH:MM+offset}.

You can also add vintage time series directly from data-tree.

Formula editor hints

This feature is available in Macrobond 1.29 and later.

Since version 1.29 when you end typing series code you will see general hints about the possibilities of vintage data (revision history). If you type a { after code you will see all the specific vintages for that series.

How to add vintage time series from data-tree?

Go to Analytics tab > Series browser panel. Right-click on a series which have vintage data and select one of the possibilities:

How to set relative vintage time?

This feature is available in Macrobond 1.29 and later.

If you don't want to use exact date but want to go back/forth from certain point in time instead you can now use relative formulas to get the right series.

Relative to 'now':

atbopa0003{now} The last vintage (headline version), regardless of the document vintage timestamp.
atbopa0003{now,-1} The previous vintage relative to head version, regardless of the document vintage.

Relative to vintage date set in document on Series list:

atbopa0003{doc,-1} The previous vintage relative to any document vintage.
atbopa0003{doc,-2} The vintage before the previous relative to any document vintage.
atbopa0003{doc,+1} The next vintage relative to any document vintage.

Relative to timestamp:

atbopa0003{2023-12-22 09:07+1,-1} The previous vintage relative to the timestamp.
atbopa0003{2023-12-22 09:07+1,+1} The next vintage relative to the timestamp.

How to set time in vintage formula?

Revision history's time is expressed in UTC, therefore you always need to specify a time zone offset. Offset is always needed in formula for vintage data.

Applying document-wide vintage date on Series list will not affect vintage formula.

Using offset

Update for series segdp was released on 27th August 2021 9:30 Swedish local time. Sweden is in time zone UTC+2 so formula should be:

segdp{2021-08-27 09:30+2}

Most probably you know the release time in your local time. For example, if you are in Hong Kong formula should be:

segdp{2021-08-27 15:30+8}

In effect you can call same vintage data in many ways:

segdp{1999-12-01 09:30+1}
segdp{1999-12-01 15:30+7}
segdp{1999-11-30 23:30-9}
segdp{1999-12-01 08:30z}
segdp{1999-12-01 08:30+0}

UTC+0 time zone

If you want to have UTC time you can write it in two ways:

segdp{1999-12-01 08:30z}
segdp{1999-12-01 08:30+0}

Omitting time

You can omit the time, program will then assume time 00:00. Examples below show same result:

segdp{1999-12-01+0} 
segdp{1999-12-01 00:00+0}

Vintage (revision history) in Excel

This feature requires Macrobond Data+ license. 

You can download vintage data directly in Excel. Under All series panel use ‘Vintage’ field to select the date. it will set vintage date for all series in the list.

If you want to set vintage only for one series right-click on a series and select revision.

After converting object to document data set vintage information will be written as a formula (i.e., clnaac0143{2022-05-18 12:31+0} ).

Downloading values by release order (initial release)

To get first value (initial release) for each point in time type in code:

segdp{0}

In this example (colors won't appear in program):

  • Segdp{0} will return the initial release, the values marked in green
  • Segdp{1} will return the first revision, the values marked in blue

Note that the number means the nth time that value was changed. If the values weren't changed with each release it might look like this (colors won't appear in program):

  • usfcst4775{0} will return the initial release, the values marked in green
  • usfcst4775{1} will return the first revision, the values marked in blue. The last two values marked in pink are filled due to 'Conversion settings' > 'Missing value method'. If you do not want to have them select there 'Do not fill in'.

Questions

Can I sort by First/Last available revision date?

Yes. Go to data browser (for example in Search tab), right-click on a main bar, choose Select column. In the new window check box next to First revision and Last revision.  

How to check if revision was applied to a series?

On a chart's legend you will see analysis text for each expression that uses revision history informing you what was used.

What happens after applying revision if series doesn't have any?

There will appear a yellow bar with general information 'One or more series does not have revision history for the specified vintage.' When you got Series information tab and hover over highlighted series you will see why, there is no revision for this series.

On Series information tab you can check also which series has what revision in Vintage column. Right-click on a main bar, press Select columns and add Vintage.
If there is no revision data for selected point in time, program will display the first available data, highlight series on Series Information tab and in Vintage column will not show any date.

Why current series is so much different than in vintage release?

Series are presented as they were shown by the source at that point in time. The source could decide to shorten series, rebase it, or add more historical values.

Transpose

Overview

This analysis changes position of data - data from x-axis will be shown on y-axis and vice versa. You do not need to re-arrange series or change analysis' settings.

Transpose can be used with Category chartCategory scatter chart and Category table.

After which analyses I can use Transpose?

You can add it only after those analyses which outcome is presented as category data:

  • Cross sampling
  • Scalar
  • Slice
  • Seasonal adjustment MA
  • Yield Curve
  • Correlation
  • PCA
  • Regression

How to use it?

Add Transpose analysis and check the box in the 'Include' column. The Output labels can be automatically generated with Title generation tool. Below in Analysis tree add the chart/table.

Examples

Cross sampling with Transpose

In this example we performed Cross sampling analysis. With Transpose instead of having each country separately we can have them grouped by region.

Slice with Transpose

Here we sliced 2020 year from unemployment rate series. Then with Transpose we changed place of data - country is on x-axis and color is for each month.

Distribution stack chart - New houses

See Transpose used to show distribution of years (with date) by number of starts of new house construction.

Two by two stacked columns

Create two stacked columns with different calculations for each of two countries using Cross sampling and Transpose.

User defined formulas (User defined functions)

Purpose

Not everything can be pre-programmed in Macrobond formula language but from these blocks you can build exactly what you need. Writing custom functions is also useful when you want to re-use custom formulas multiple times or share them with your colleagues.

How to create and save your own formula?

Create new formula

Select Edit from the main menu at the top of the application and click 'User defined functions.' The Function editor window will pop up. On the left you have a list with all User defined formulas and on the right formulas from Macrobond formula language. The main window allows you to define a function.

To start the process, select which account you want to save it in, and press 'Add' under left panel.

Note that User defined formula always starts from a dot (point) to differentiate it from standard formulas.

Simple example

A simple example could be:

.gdpru(x)=(x/rugdp)*100

This function calculates what % of the Russian GDP is covered by series x. In this case, '.gdpru' is the formula’s name, and (x) is the parameter it takes in. After saving this formula, you can use it in the Series list by substituting x for any series. For example:

.gdpru(rugpfi0026)

Which will yield the Russian debt to GDP ratio.

Advance example

You can use let/in/end to nest formulas:

.PerformanceYearsBack(series, years) =
let
.end = End(series)
.start = DateAtOrAfter(Year(.end)-years, Month(.end), Day(.end))
.short = CutStart(series, .start+1)
.prev = At(series, .start)
in
(.short - .prev) / .prev * 100
end

This function calculates the performance since a point a number of years back. After saving this formula, you can use it in the Series list by substituting  series and years. For example:

.PerformanceYearsBack(sek, 20)

Incorrect formula

If any part of the formula is written incorrectly that portion will be underlined in red, with a description of the problem being shown in the text field at the bottom of the editor.

Description

To finish off the process, add a description for your new formula, in the field above. Then choose a formula group. And press Save.

Finding and applying saved formula

It works as any other built-in formula. Begin writing the name in the formula expression field. Don’t forget that you need to include the point at the start. Otherwise select it from the formula list on the right.

How to copy user defined formula to other accounts?

It is not possible to move formula to another account. You need to copy & paste formula manually. The easiest way is to use CTrl+C/Ctrl+V as below:

Or you can use this method:

  1. Go to Edit > User defined formula...
  2. Copy formula.
  3. Click on the existing account (Personal/Department/Company) where you want to have this formula.
  4. Press 'Add'
  5. Paste formula in new formula window.
  6. Press 'Save'.

Examples

Growth by Annual rate

In this example we used user defined formula which creates a series with a growth rate specified by an annual rate from end of one series to the end of another.

Growth by rate

In this example we used user defined formula which calculates series based on a growth rate (which can also be a negative number).

Oil Brent futures with lag

In this example we used user defined formula to output last observation from many series.

Winsorizing (winsorization)

Formula which limits extreme values.

The Macrobond formula language

Overview

The purpose of the Macrobond formula language is to make calculations on series. The series are typically time series, and the result of a formula is a time series. By using formulas, you can combine arithmetic operations with built in functions to create powerful calculations. You can find a complete list of built-in functions here.

Important information

Formulas can be utilized in both the Series list and Formula analysis. The method of writing formulas is identical between the two except for how references to series are written. The instructions below are targeted at making calculations in the Series list. Thus, when referencing to a series, the name of the series will be used and not an alias. To learn more about aliases and the differences between using the Series list and the Formula analysis for calculations, click here.

How calculation works

Series names

You can use the name of a series from the database in formulas. A very simple example is:

sek

Which references to the Macrobond series called 'sek'.

Numerical constants

Examples of numerical constants are:

123

-123

123.456

123.456e-3

Arithmetic operations

An example of a simple formula is the following:

 sek/nok

which calculates the cross rate between the Swedish and the Norwegian krona.

The time series sek and nok are treated as vectors of values. In a vector, each value can either be a number or a null value.

Before any calculations are made, all series in the formulas are converted to the same frequency. The series also need to be the same length. This is achieved by adding null values to the beginning of the vectors until they are of equal length.

Example

The division operator / takes two series and produces a new series by dividing the corresponding values in the two series. For most calculations, the result is null when any of the values is null.

This can be illustrated using a fictional example:

Observation number sek nok sek/nok
1971-01-04 0 5.1643
1971-01-05 1 5.1628
1971-01-06 2 7.1367 5.1614 0.723219
1971-01-07 3 7.1386 5.1649 0.723517
1971-01-08 4 7.1382 5.1631 0.723306
1971-01-11 5 7.1390 5.1642 0.723379

The operators, such as +, - and * handles null in the same way.

Functions

In addition to the arithmetic operators there are also functions. A function can take zero or more parameters. Parameters are written within parentheses and separated with commas.

Example

An example of a formula using a function is the following:

 Sum(sek)

This will calculate the sum of all values in the series. With the same fictional values, the result would look like this:

Observation number sek Sum(sek)
1971-01-06 0 7.1367 42.8326
1971-01-07 1 7.1386 42.8326
1971-01-08 2 7.1382 42.8326
1971-01-11 3 7.1390 42.8326
1971-01-12 4 7.1411 42.8326
1971-01-13 5 7.1390 42.8326

Another example is the following formula:

 sek*100

The result is a new series where each value is multiplied by 100. There is no function multiplying a series with a constant. Instead, the formula interpreter will create a series where each value is 100 and then multiply these series:

Observation number sek constant sek*100
1971-01-06 0 7.1367 100 713.67
1971-01-07 1 7.1386 100 713.86
1971-01-08 2 7.1382 100 713.82
1971-01-11 3 7.1390 100 713.90
1971-01-12 4 7.1411 100 714.11
1971-01-13 5 7.1390 100 713.90

In most cases you do not need to know about the conversion of constants to series, but it can help you to understand what happens with a formula like this:

Sum(100)

If you have this formula together with the series 'sek', the result would look like this:

Observation number sek Sum(100)
1971-01-06 0 7.1367 600
1971-01-07 1 7.1386 600
1971-01-08 2 7.1382 600
1971-01-11 3 7.1390 600
1971-01-12 4 7.1411 600
1971-01-13 5 7.1390 600

You get the value 600 because the constant 100 is converted to a series with six values of 100 and then this series is summed.

Observation windows (rolling)

There are functions in Macrobond with the same name that take different number of parameters. Many of these functions take a parameter that specifies the length of a window. With them you can create rolling series.

Example

Sum(sek) This is the sum of all values in the series.
Sum(sek, 3) This is the sum of three observations including the current observation and the two previous ones.

If there are not enough data points to fill the window, the values will be null.  This is depicted in the table below.

Observation number sek Sum(sek,3)
1971-01-06 0 7.1367
1971-01-07 1 7.1386
1971-01-08 2 7.1382 21.4135
1971-01-11 3 7.1390 21.4158
1971-01-12 4 7.1411 21.4183
1971-01-13 5 7.1390 21.4191

Another useful example of a function that takes a window length is Mean(series, window). The result of this function is also known as the moving average.

Observation numbers

Each position in the series has an ordinal number called an observation number. You can get a series of the ordinal numbers with the function Counter().

Observation number sek Counter()
1971-01-06 0 7.1367 0
1971-01-07 1 7.1386 1
1971-01-08 2 7.1382 2
1971-01-11 3 7.1390 3
1971-01-12 4 7.1411 4
1971-01-13 5 7.1390 5

If it is a time series, each ordinal number is associated with a date.

Please note that you should never make any assumption about what the ordinal numbers are. Thus, you should not use an expression like At(sek, 0) to get the first value of a series. The ordinal number of the first observation can be any number. Instead, you should use FirstValid(sek), which returns the ordinal number of the first observation that is not null.

Example

There are a number of functions related to dates and ordinal numbers. A useful function is Date(year, month, day) that returns the ordinal number for a specific date.

Observation number sek Date(1971,1,8)
1971-01-06 0 7.1367 2
1971-01-07 1 7.1386 2
1971-01-08 2 7.1382 2
1971-01-11 3 7.1390 2
1971-01-12 4 7.1411 2
1971-01-13 5 7.1390 2

With the function At(series, observation) you can get the value at a specific observation number.

Observation number sek At(sek, Date(1971,1,8))
1971-01-06 0 7.1367 7.1382
1971-01-07 1 7.1386 7.1382
1971-01-08 2 7.1382 7.1382
1971-01-11 3 7.1390 7.1382
1971-01-12 4 7.1411 7.1382
1971-01-13 5 7.1390 7.1382

Logical values and functions

The logical values in a series are represented by 1 for True and 0 for False.

Example

In the following example the operator > (greater than) is used to compare a series with a lagged series. The result will be True (1) when the value has increased since the last observation.

Observation number sek Lag(sek, 1) sek > Lag(sek, 1)
1971-01-06 0 7.1367
1971-01-07 1 7.1386 7.1367 1
1971-01-08 2 7.1382 7.1386 0
1971-01-11 3 7.1390 7.1382 1
1971-01-12 4 7.1411 7.1390 1
1971-01-13 5 7.1390 7.1411 0
1971-01-14 6 7.1390

The function If is commonly used with logical expressions. The function takes three parameters. The first parameter is interpreted as a logical expression that determines if the result should be the value of the second or the third parameter.

The following example replaces any null values (missing values) with the highest value of the last ten observations

If(IsNull(sek),High(sek, 10), sek)

Comments

You can write comments in formulas by typing two slashes ' // '. Everything after the slashes to the end of the line, will be regarded as a comment and not part of the formula expression.

Operators

All operators work on constants and series. Here are the operators sorted in order of precedence. Parentheses can be used to further control the order of evaluation.

Operator Description Precedence
- Unary minus 7
* Multiplication 6
/ Division 6
+ Plus 5
- Minus 5
! Logical “not” 4
> Greater than 3
>= Greater than or equal 3
< Less than 3
<= Less than or equal 3
= Equal 3
<> Not equal 3
& Logical “and” 2
| Logical “or” 1

The formula editor

The Formula editor helps you write formulas. In the Series list, you can bring up the Formula editor by clicking on the 'fx' button to the right of an expression, or by pressing Ctrl+E on the keyboard.

Some of the useful features of the editor are:

  • You can double click on a function in the list to the right to insert a function.
  • You can filter the list of functions by typing a text in the box above the list of functions. Only functions that contain the text in the function name or description will be included in the list.
  • The tooltip of a function in the list or in an expression will show details about the function:
  • The tooltip of a series will show the title of the series: 
  • When the text cursor is just before a left parenthesis or directly after a right parenthesis , a blue shade on the background will show the area between the matching parentheses:

 

  • When the text cursor is within or next to an identifier, all instances of that identifier in the expression are shaded:

 

  • The editor will check the syntax as you write and show a red squiggle if there is an error. You can see the error text at the bottom of the editor or as a tooltip:
  • Numerical constants are dark red. Series, parameters and variables are blue. Comments are green.
  • The series used in the document will be displayed at the end of list of functionsThe tooltip will tell you the title of the series. You can double click to enter them into the formula.

  • Auto-suggestions for series name. When you start typing a series name in the Formula editor, suggestions will be shown if the text you type matches the start of a series in the document or any of the series you have marked as favorite. Use the mouse or press the down arrow to walk the list. The tooltip will give you the series title. Press 'Enter' to select.

User defined functions

User defined functions are a way to structure formula expressions and to reuse commonly used expressions. It allows you to both reduce the amount of formulas you write and save the functions for future use. For more information see: User defined formulas

Introduction to Formula

Overview

The Formula analysis is used to preform calculations on time series using a formula language. Formulas and functions can be combined yielding powerful calculations in a single layer. The flexibility of the feature allows you to preform calculations that cannot be carried out using other analysis tools in Macrobond.

Why use formulas?

  • Combine calculations - apply several calculations at once, to perform complex calculations, which might not be available through the predefined set of analyses.
  • Customize - adjust the calculations to your needs and transform series in various ways.
  • User preference - create formulas from scratch with the ‘formula language’ or choose from a library of predefined formulas.

How to work with Formula analysis?

Aliases

When Formula analysis is applied all series in the document are assigned aliases. The aliases are on the form fx:s1, fx:s2, fex:s3, … , fx:sN where N is the number of added series. You can change an alias, however, it needs to start with 'fx:'.

Expression

In the expression column, there are two ways of adding a calculation. Either write the expression in the text field or click the 'fx' button to access the formula editor. The formula editor gives you a good overview when writing longer expressions and has help mechanisms for finding functions.

Case sensitivity

You can type formulas and series codes using capital letters or lower case, the application will know what you mean.

Include input series as output

This setting is selected by default. If selected, all series, both input series and calculations, will be available in the chart or analysis following the formula layer.

Applying formulas: Series list vs. Formula analysis

You can utilize the formula language in both the Formula analysis feature as well as in Series list. However, there are some differences in how references are made.

When writing formulas in Series list, the calculations will be applied to the raw data.  When writing formulas in Formula analysis the calculations will be applied on top of any preceding analysis. This means Formula analysis lets you add formulas as an intermediate step in your overall analysis.

Because the Formula analysis is an intermediate step the way you reference to series differs. In Series list you refer to the unique name of the series. In Series list, the name of the series can be observed in the expression column. The name of a series is also displayed in time series information or by hovering over the series in the database. In formula analysis you refer to a series by its alias displayed in the alias column and not the name of the series.

How to create a formula in the Series list?

NOTE:

When in the Series list, remember to use the Macrobond series codes to identify the series.

  1. Click on the 'fx' button located next to the series expression field.
  2. Scroll through the list of available functions or use the filter field to narrow your search.
  3. Double click to select a formula from the list.

If you want to use directly a series code as part of the formula, highlight the code first, that way it will be integrated into the formula once you select it from the list.

How to create a formula in the analysis tree?

NOTE:

The application will automatically assign aliases to all time series to indicate that analyses may already have been applied to the raw data, thus changing the series from its original form. Aliases can be customized, but must start with the prefix 'fx:'

  1. Start by clicking on the point at which you want to apply the formula in the tree.
  2. Click on 'Add' or 'Insert' and select Formula from the list to open the Formula editor.
  3. Enter the relevant formula expression in the field.

By default, 'Include input series as output' is checked. This will allow you to select both the input series as well as the calculated series, in any of the analyses or presentations you apply after. When unchecked, the input series will no longer be available as output.

Frequency

Every formula produces a time series matching the frequency of the document.

How does the Formula language work?

Input variables

Formulas contain a predefined set of variables which need to be added to the expression. Different formulas require different types of variables, for instance a series, a number or an observation number. Below you will find a few important ones:

  • series – represents the name of series, whether it is a series code from the Macrobond database or a reference alias when using the Formula analysis
  • number – a digit expressing the value of a parameter e.g., LastValid()
  • observation number – each cell in a series time table has an ordinal number called an observation number, used to reference the corresponding date. You can get a series of ordinal numbers with the formula Counter()
  • window – a parameter that specifies the length of a window of observation. The input is a number. It is used for rolling types of formulas

It’s crucial to input the correct variables and to know what the name refers to, as many formulas will require several variables. For example:

Cop(series, length)

This formula is built around two variables that you need to input: the series code and the length of the period, for which the change in percentage will be calculated.

Dates

In formula language, dates can only be written in a specific format. The format is Date(YYYY, MM, DD), which contains three variables.

To show an example we can use the Swedish currency, SEK. if you would like this time series to start from a specific date, you would use the CutStart() formula. Like this:

CutStart(sek , date(1971 , 01 , 04))

This will produce a modified version of the 'sek' time series, which will now start on the 4th of May 1971. Below is an example of how this would look in a table:

1971-01-08 4 5.1631
1971-01-07 3 5.1649
1971-01-06 2 5.1614
1971-01-05 1 5.1628
1971-01-04 0 5.1643

Notice that the earliest value in the series is assigned with the observation number ‘0’.

Window parameter (length)

Window parameter is designed for rolling calculation where said window represents moving time range. Length can be expressed as:

  • Number (of observation) i.e., 4, 20, 252;
  • Series i.e., an Account in-house series with numbers. If the values in a series are not natural numbers, application will round them down to the nearest natural number.
  • Period with formulas i.e.,:
yearlength()
YearsLength(1)
quarterslength(4)
Monthslength(12)
Weekslength(52)

For example:

Median(usgdp, 12)
StdDev(sek, YearsLength(10))
mean(decpi, Monthslength(4))

What happens at the start of series?

The window is getting more narrow with each observation.

If you do not want these to appear we recommend using below custom formula:

.mean(series, window) = if(Counter(series)>=window, mean(series, window), NULL())

For more information how to add and use formulas with dot see User defined formulas.

Which formulas should you get familiar with first?

The Macrobond Formula library is vast, but some formulas are more common than others. Below is a list of the most popular formulas that may be useful to start with.

Date(year, month, day)

Returns the observation number of the specified date, where month is 1-12 and day is 1-31. If there is no observation at the specified date, the observation number of the closest previous observation is returned.

COP(series, length)

Returns the percentage change over the number of specified observations. This is the same calculation as 'Change over period %' in the Rate of change analysis.

Momentum(series, length)

Returns the difference between the series and a lagged series. This is the same calculation as 'Change over period - value' in the Rate of change analysis.

YearLength()

Returns the number of observations per year based on the frequency.

This is a very useful formula when you need to set a length of 1 year, as it won’t be affected by a change of frequency in the document.

Example: cop(usgdp, 4) --> with a document set to quarterly frequency, this returns a 4-quarter percentage change (i.e., 1 year % change). But changing the document’s frequency to monthly will result in a 4-month growth rate. Yearlength() always calculates the numbers of observations in a year based on the frequency of the document and will automatically update if you change the frequency.

Join(series1, series2)

Returns a series that is a combination of two series, joined at the end of series1. There are variants to this formula depending on how you want to transform the two series into one.

Counter()

In Macrobond, every date has a unique observation number.  This command returns a series of all observation numbers in the document.

Now()

Returns the observation number of today's date. You should use this function carefully since it depends on the time zone of your computer. In most cases, it is often better to use the End() or End(series) functions.

First(series) & Last(series)

Returns a series consisting of the first or the last available value of a series.

Start(series) & End(series)

Returns a series consisting of the first or the last observation number of a series. This is used to select or adjust a value at a specific point in time.

Month()

Returns a series of the month (1-12) for each observation. This helps as with a more complex formula, which requires you to identify specific months.

If(condition, value1, value2)

Returns the first value if the condition is True and the second value if it is False.

Extend(series, observation, number)

Extends a series to a specific observation in the future using the specified number.

Standardize(series)

This is also known as a Z-Score. It returns a normalized series which counts (in standard deviations) how far a value is from the mean value of the time series. The mean is represented by '0' and one standard deviation is '1'.

For more about formulas see: The Macrobond formula language and Built-in formula functions

Example

United States GDP 1 Year Sum

Here, using formula we calculated 1-year rolling sum for United States GDP.

E-learning

After reading this article you should be equipped with a basic understanding of how formulas work to be ready for next step, which is completing the exercises in our Test Your Skills section for Formula language. Doing these tests will ensure that you’re fluent in our formula language.

Rolling regression

Overview

The Rolling regression analysis implements a linear multivariate rolling window regression model. Just like ordinary regression, the analysis aims to model the relationship between a dependent series and one or more explanatory series. The difference is that in Rolling regression you define a window of a certain size that will be kept constant through the calculation. The analysis preforms a regression on the observations contained in the window, then the window is moved one observation forward in time and process is repeated. Thus, many regressions will be performed as the window moves forward.

Estimation model

For more in-depth information regarding the estimation model see Regression analysis.

Working with Rolling regression analysis

Settings

Regression models

You can define one or more regression models. Each model has separate settings. When a new model is created, the settings of the current model are duplicated. Models can be renamed and deleted.

Output dependent series

Select this option to include the dependent series in the output.

Output explanatory series

Select this option to include the explanatory series in the output.

Date range

Specify the limits of the date range and window length. The default range will be the largest range where there is data for all the series.

No intercept

When this option is selected, the constant α is omitted from the model and it will be defined as:

y t = β 1 x t1 + β 2 x t2 + β 3 x t3 + ϵ t

Residuals

When this option is selected a series containing the residuals will be included in the output.

Durbin-Watson

The Durbin-Watson is a test statistic used to detect the presence of autocorrelation in the residuals. The value is in the range 0-4. A value close to 2 means that there is little auto correlation. Values from 0 to less than 2 point to positive autocorrelation and values from 2 to 4 means negative autocorrelation. The result from this test is not useful if any dependent series is included with several lags or if no intercept is included in the model.

For more information about this see Investopedia.

Schwarz

The Schwarz information criterion takes overfitting into account and estimates the efficiency of the model in terms of predicting the data. The criterion yields a positive value, where a lower value is considered better when comparing different models based on the same data.

R2

The R2 value compares the variance of the estimation with the total variance. The better the result fits the data compared to a simple average, the closer this value is to 1.

Coefficient

The estimated parameters.

P-values

The p-value is the probability of obtaining a value of t that is at least as extreme as the one that was actually observed if the true value of the coefficient is zero.

T-values

The t-value measures the size of the difference relative to the variation in your sample data.

Series settings

Include

Select if you want to include this series in the model.

Is dependent

Select which series is the dependent series. This must be specified.

Diff

By selecting Diff, the first order differences of the series will be calculated. The result will then be converted back to levels. First order of differences means that the series is transformed to 'Change over value (one observation)' while expressing the result in levels. If you tick that option, the result will output the coefficients for intercept and diff(x1) rather than intercept and x1.

This setting does not affect the model itself. It only influences the step after the calculation of the model when the levels are calculated from the differences.

Lag to/from and Lag range

Here you specify the lags you would like to include for a specific series. When lagging a series, the values are delayed in time and the series stretches further into the future.

If you for example set “Lag from” to 0 and 'Lag to' to 2 three series will be included, one series with no lag, one with a lag of 1 and one series with 2 lags. This will automatically change the lag range to '0 to 2'. You may specify the desired lags using 'Lag to/from' or 'Lag range', the result will be the same. If you set Lag range to a single digit or set 'Lag to' and 'Lag from' to the same value, a single lagged series will be included.

When lags are specified for the dependent series, the lagged series will be used as explanatory series in the model. The dependent series will always be without lag.

How to create simple rolling regression model?

  1. Check box for 'Output the dependent series' or 'Output the explanatory series'.
  2. Select window length.
  3. Select Output indicators (they will appear on chart).
  4. Check 'Include' for at least two series and mark one as 'Is dependent'.
  5. Add Time chart.

Common errors

Degree of freedom is too low

You cannot fit the regression coefficients if there are no degrees of freedom. The degrees of freedom are the number of observations - number of parameters that we are estimating. The number of estimated parameters includes the intercept.

The number of observations must thus be larger than the number of independent (explanatory) series.

Forecast

It is not possible to calculate forecast in Rolling regression analysis. In some way such functionality at some point would base forecast on itself because it keeps rolling.

As a workaround, we recommend using simple Regression analysis. In 'Estimation sample range' type in parameter '-window_length' (i.e.; -5m and -2m; -50 and -10). Thanks to that you will set last non-forecasted value in the desired point in time. When you enable 'Calculate forecast' box this will calculate the forecast based on the regression of the narrowed earlier number of observations of these two series.

Report

The fact that a rolling window is utilized has implications for the output. When using Regression analysis, a report is generated. In Rolling regression, no such report will be available. This is because, as explained in the overview, a rolling regression constitutes of many regressions, all of which will yield individual statistics. The output of statistics, information criteria and parameters will thus all be time series. You have many options regarding what information to include in the result.

How to output indicators?

Simply mark the indicator in the panel and it will be available as output.

Calculating regression with formulas

To calculate α and β use:

Intercept(series1, series2, window)
Slope(series1, series2, window)

where series1 is the dependent series and series2 is the explanatory series. If you get different values than from analysis check 'Estimation sample range' - it has to be calculated on identical time range. To avoid adding Cut() formula everywhere you can set data range on Series list.

Formulas above calculate the regression between two series, but if in Regression analysis are more series these won't be comparable models - you will get different outcomes.

Examples

Rolling regression

In this example, we used the model presented for the Regression analysis, and created a new regression model which is generated on 5 years rolling window. For the output, we've included the residuals and the R2.

Mulitvariable rolling regression

Here we calculate explanatory variable share.

Questions

Why rolling regression's average residual is not zero?

If you do a standard regression, the mean of the residuals is zero. With Rolling regression, it does one separate regressions at each point in time and thus residuals are not zero on average.

Why model's values are different than the ones coming from a same model but rolling?

The differences stems from different time ranges being taken into calculation because of:

  1. Mon-Sun daily series vs Mon-Fri daily series
  2. what Macrobond takes into account when calculating a sample range

To ensure that you are looking and comparing the exact same time periods use one of the below methods:

  • change frequency in the document to Daily (not Daily (highest) or Daily (lowest)) and set Observations to Monday-Friday
  • change the window size in Rolling regression from, for example 'X months', to number of observations in Regression i.e., '402'.

The first method will give closely similar results but still not 100% same due to the way Macrobond sets Start range i.e., '-18m' is not strictly '18m', but '18m' counting from the previous observation so, effectively '18 months +1'. In which case, you may want to set the Window length to a number of observations instead.
The latter approach will output the same values in both Regression and Rolling regression.