- Working with Cross sampling analysis
The Cross sampling analysis is a tool for extracting particular values or metrics and comparing them across series. Use this analysis when you want to create a chart with categories, such as countries, along the x-axis, and columns of values on the y-axis. The Cross sampling analysis is a successor to the Scalar analysis with a new workflow that is especially tailored for lists.
Lists are pre-prepared data sets that you can easily operate on in Cross sampling (and in other analyses). To change the data, go back to the Series list and replace or add a series; there is no need to rebuild the whole analysis. Since version 1.25, you can also create and share lists through the My list feature.
The Cross sampling analysis can perform a variety of calculations that result in one value per input series, such as the last value, the mean in a time range, or year to date performance. The output is always a category series, meaning that the time variable is replaced by a categorical variable. You can display this output in a Category chart, Bar chart, or Category scatter chart.
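The idea of "one value per input series" can be sketched in a few lines of Python. This is only an illustration of the concept, not how the application is implemented; the series names, dates, and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical input: each series maps ISO dates to values (illustrative numbers only).
series = {
    "Country A": {"2023-01-01": 1.0, "2023-04-01": 2.0, "2023-07-01": 3.0},
    "Country B": {"2023-01-01": 10.0, "2023-04-01": 14.0},
}

def last_value(observations):
    """Value at the latest date in the series."""
    return observations[max(observations)]

def mean_in_range(observations, start, end):
    """Mean of the values whose dates fall within [start, end]."""
    return mean(v for d, v in observations.items() if start <= d <= end)

# One value per input series: the time axis is replaced by categories.
last_by_category = {name: last_value(obs) for name, obs in series.items()}
mean_by_category = {
    name: mean_in_range(obs, "2023-01-01", "2023-06-30")
    for name, obs in series.items()
}
```

Each resulting dictionary plays the role of a category series: the keys are the categories on the x-axis and the values are the column heights.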
All input data in this analysis goes into lists, which are data sets defined under the Lists tab of the Series list. For more information, see Lists of series. Since version 1.25, you can create and share lists with your colleagues through the My list feature.
The settings consist of four parts. In the screenshot below, there are two lists and two series that are not part of any list to the left. On the right-hand side, the two lists have been added and can be seen as two columns.
- Select what calculations to apply to each series. Details about the calculations can be found below.
- Select how the output should be generated.
- This is the list of all input series and lists.
- Here you organize the series that should be used for the output.
Here, you can add one or more calculations that will be performed on all selected input series. The available calculations are:
Open, High, Low, Close
The first, highest, lowest, or last value of the specified range.
Mean, Median, Standard deviation
The mean, median or standard deviation of the range.
The last valid value of each series.
The value at the last point in time at which all the included series have values.
The last value that is not a forecast in the series.
The value at a specific point in time. If a series is missing a value for that date, the first available value before that date will be used.
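The four "last value" and "value at" variants described above differ only in which observation they pick. A minimal sketch, assuming each observation carries a date, a value that may be missing, and a forecast flag; this data layout and all names are hypothetical.

```python
# Hypothetical observations: (ISO date, value or None if missing, is_forecast).
series = {
    "A": [("2023-01", 1.0, False), ("2023-02", 2.0, False), ("2023-03", 3.0, True)],
    "B": [("2023-01", 5.0, False), ("2023-02", 6.0, False), ("2023-03", None, False)],
}

def last_valid(obs):
    """Last non-missing value of the series."""
    return next(v for _, v, _ in reversed(obs) if v is not None)

def last_non_forecast(obs):
    """Last non-missing value that is not flagged as a forecast."""
    return next(v for _, v, f in reversed(obs) if v is not None and not f)

def value_at(obs, date):
    """Value at `date`; falls back to the closest earlier non-missing value."""
    candidates = [v for d, v, _ in obs if d <= date and v is not None]
    return candidates[-1] if candidates else None

def last_common_date(all_series):
    """Latest date at which every series has a value."""
    dated = [{d for d, v, _ in obs if v is not None} for obs in all_series.values()]
    common = set.intersection(*dated)
    return max(common) if common else None
```

With this data, series B has no value for 2023-03, so `value_at(series["B"], "2023-03")` falls back to the February value, and the last common date is 2023-02.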
Nth last value
The nth last value of a series, where 1 gives the last value, 2 gives the second-to-last value, 3 gives the third-to-last value, and so on.
The specified percentile of the selected range.
Lower, Upper tail mean
The mean of the values in the upper or lower percentile of the range.
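The range-based calculations can be sketched as follows. The percentile convention (linear interpolation between ranks) and the reading of the tail parameter as "the outermost p percent on each side" are assumptions made for this illustration; the application may define them differently.

```python
def nth_last(values, n):
    """n=1 gives the last value, n=2 the second-to-last, and so on."""
    return values[-n]

def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between ranks (assumed convention)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def lower_tail_mean(values, p):
    """Mean of the values at or below the p-th percentile (assumed definition)."""
    threshold = percentile(values, p)
    tail = [v for v in values if v <= threshold]
    return sum(tail) / len(tail)

def upper_tail_mean(values, p):
    """Mean of the values at or above the (100 - p)-th percentile (assumed definition)."""
    threshold = percentile(values, 100 - p)
    tail = [v for v in values if v >= threshold]
    return sum(tail) / len(tail)

# Illustrative values for one series within the selected range, in time order.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
open_high_low_close = (values[0], max(values), min(values), values[-1])
```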
Year, Quarter, Month, Week to date
The performance from the start of the period to the specified date. The performance is measured as the change compared to the last value of the previous period.*
The performance between two specified dates. The performance is measured as the change compared to the last value of the previous period.*
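As an illustration of "change compared to the last value of the previous period", here is a minimal year-to-date sketch. It expresses the change as a percentage, which is an assumption, and the dates and values are hypothetical.

```python
from datetime import date

# Hypothetical daily observations (illustrative numbers only).
obs = {
    date(2022, 12, 29): 100.0,
    date(2022, 12, 30): 102.0,
    date(2023, 3, 15): 107.1,
}

def year_to_date(observations, as_of):
    """Percentage change from the last value of the previous year to the value at `as_of`."""
    period_start = date(as_of.year, 1, 1)
    base_date = max(d for d in observations if d < period_start)  # last value of previous year
    end_date = max(d for d in observations if d <= as_of)
    return 100 * (observations[end_date] / observations[base_date] - 1)

ytd = year_to_date(obs, date(2023, 3, 15))  # base is the value from 2022-12-30
```

Note that the base value is the 30 December observation, not the first observation of 2023.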
The Performance analysis works slightly differently from the performance calculations here. In Cross sampling and Scalar, the program finds the first non-missing value and uses it as the base value, whereas the Performance analysis gives an error if the value at the specified start date is missing. Since version 1.25.46, you can use the 'Strict' box to select the date:
Years, Quarters, Months, Weeks back
The change from a selected number and type of periods before the specified date.*
For years and quarters, this is the same as using the 'Rate of change since' method and specifying the start of the range as '-1y' or '-1q'.
Rate of change since
The rate of change between two points in time.
Rates of change as value, percentage or logarithmic are calculated in the following way:
where c is the typical number of observations in one year.
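For orientation, the sketch below uses the conventional definitions of change in value, percentage change, and logarithmic change, plus an annualized percentage variant that scales by c/h, where h is the number of observations between the two points. These definitions are assumptions for illustration and may differ in detail from the formulas the application uses.

```python
import math

def change_value(current, base):
    """Plain difference between the two observations."""
    return current - base

def change_percent(current, base):
    """Percentage change from base to current."""
    return 100 * (current / base - 1)

def change_log(current, base):
    """Logarithmic change, expressed in percent."""
    return 100 * math.log(current / base)

def annualized_percent(current, base, h, c):
    """Percentage change over h observations, scaled to one year of c observations."""
    return 100 * ((current / base) ** (c / h) - 1)
```

For example, a monthly series (c = 12) that rises from 100 to 121 over 24 observations has an annualized percentage change of about 10 under this assumed definition.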
The Rate of change analysis works slightly differently from the 'Rate of change since' calculation. In Cross sampling and Scalar, the program finds the first non-missing value and uses it as the base value. Since version 1.25.46, you can use the 'Strict' box to select the date:
The percentage proportion of each series compared to the sum of all series at a specified point in time.
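The proportion calculation amounts to dividing each value by the total. A minimal sketch with hypothetical values at one point in time:

```python
# Hypothetical values of three series at the selected point in time.
values_at_date = {"A": 50.0, "B": 30.0, "C": 20.0}

total = sum(values_at_date.values())

# Each series' share of the total, in percent; the shares sum to 100.
shares = {name: 100 * value / total for name, value in values_at_date.items()}
```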
Most Cross sampling calculations require either a point in time or a time interval to be specified. You can use specific dates, but you may want the dates to update when new data is added. In that case, leaving the date box blank or using relative dates, such as '-1y', can be useful. It's important to understand which default dates are chosen when none is specified, and how relative dates work in each context.
Point in time
First, we’ll talk about calculations that require only one point in time, such as value at. If the point in time box is left blank, the last valid value for each series will be used.
If you specify a relative date here, that date will be relative to the last calendar date, not relative to the last date for each series. If you would like the last calendar date to be used, even though not all series may have values, you should use the relative date '+0'.
Next, consider calculations that require a time interval. If you leave the range start blank, the first available value for each series is used. If you leave the range end blank, the last available value for each series is used.
When you use a relative date for the range start and leave the range end blank, the end point will be the last valid value for each series and the starting point for each series will be relative to its last point, not the last calendar date.
If you use relative dates for both the range start and the range end, they will both be relative to the last calendar date.
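The anchoring rules above can be summarized in a short sketch. It assumes that the last calendar date is the latest date across all series and uses a simplified helper for '-1y'; the helper and the dates are hypothetical.

```python
from datetime import date

# Hypothetical last observation date of each series.
last_dates = {"A": date(2024, 3, 31), "B": date(2023, 12, 31)}

# Assumption: the last calendar date is the latest date across all series.
last_calendar_date = max(last_dates.values())

def years_back(d, years):
    """Stand-in for a relative date such as '-1y' (does not handle 29 February)."""
    return d.replace(year=d.year - years)

# Point in time '-1y': one shared date, relative to the last calendar date.
point_in_time = years_back(last_calendar_date, 1)

# Range start '-1y', range end blank: each range is anchored to the series' own last date.
ranges_blank_end = {
    name: (years_back(last, 1), last) for name, last in last_dates.items()
}

# Range start '-1y', range end '+0': both ends are relative to the last calendar date.
range_both_relative = (years_back(last_calendar_date, 1), last_calendar_date)
```

Here series B ends three months before series A, so with a blank range end its range covers 2022-12-31 to 2023-12-31, while with both dates relative every series uses 2023-03-31 to 2024-03-31.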
You can select one of two ways to create the output series.
One series per calculation
For each calculation defined, there will be one series per column defined in the organization pane. In this mode you can select what metadata to use for generating the labels by selecting Label generation:
Based on the example in the screenshot above, this means that there will be one series with the GDP values and one series with the unemployment numbers.
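The 'One series per calculation' mode can be pictured as follows: every calculation is applied to every column, and each combination yields one category series. The column names follow the example above; the countries, values, and data layout are hypothetical.

```python
from statistics import mean

# Hypothetical columns from the organization pane: category -> values of that series.
columns = {
    "GDP": {"Country A": [1.0, 2.0, 3.0], "Country B": [4.0, 5.0, 6.0]},
    "Unemployment": {"Country A": [7.0, 8.0], "Country B": [5.0, 4.0]},
}

# Two example calculations, each reducing a series to a single value.
calculations = {"Last": lambda values: values[-1], "Mean": mean}

# One output series per (calculation, column); its categories are the list entries.
output = {
    (calc_name, column_name): {cat: calc(vals) for cat, vals in entries.items()}
    for calc_name, calc in calculations.items()
    for column_name, entries in columns.items()
}
```

With two calculations and two columns, this produces four output series, each with one value per country.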
One series per input
The analysis is tailored for lists of series. You organize the output by selecting a list of series on the left-hand side. In most cases, you drag a list over to the right-hand side, which adds the list to the output. If you want to change or add series, you need to do this in the list itself, in the Series list.
Series will be paired automatically by sub-region metadata.
Note you can only place lists side-by-side if the series belong to the same family or at least one of them is a list by region. The entries in the lists will be automatically aligned.
You can also add individual series (or a group of them) with drag and drop. Select the series, then move the pointer over the right-hand side until a bolder horizontal line appears. You can then drop the series, and they will be added to the group.
The order will be determined by one of the columns that is based on a list. You can select which column decides the order by clicking on the button in the column header.
If any of the lists have missing series, a red background will appear. A series may be missing if no series has been entered in the list for that position in a plain list or if two lists by region do not have the same set of regions.
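The alignment of two lists by region, including the detection of missing entries, can be sketched like this. The region codes and series names are hypothetical, and taking the union of the regions is an assumption about how the table is built.

```python
# Hypothetical lists by region: region code -> series name.
gdp_by_region = {"SE": "gdp_se", "NO": "gdp_no", "DK": "gdp_dk"}
unemployment_by_region = {"SE": "unemp_se", "DK": "unemp_dk", "FI": "unemp_fi"}

# Row order is taken from the first column; regions found only in the second are appended.
regions = list(gdp_by_region)
regions += [r for r in unemployment_by_region if r not in gdp_by_region]

# One row per region, with None where a list has no series for that region.
table = [
    (region, gdp_by_region.get(region), unemployment_by_region.get(region))
    for region in regions
]

# Cells that would be highlighted as missing.
missing = [
    (region, column)
    for region, gdp, unemployment in table
    for column, entry in (("GDP", gdp), ("Unemployment", unemployment))
    if entry is None
]
```

In this example, the two lists do not cover the same regions, so the NO row lacks an unemployment series and the FI row lacks a GDP series.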
You can replace individual series by dragging a series to a position in the table to the right. This can be used to make an exception in a list or to fill in a missing entry when you do not want to change the underlying list.
- Copy or cut the series you want to use.
- Go to Series list > Lists tab, use 'Add new by region list', and paste the series.
- Add the list to the Series list.
- Add a Cross sampling analysis, then select the calculation and the Output mode.
- Drag the list from 'Series' to 'Group'.
- Add chart or table.
This feature is available in Macrobond 1.25 and later.
With the Transpose analysis, you can move data from the x-axis to the y-axis and vice versa without rearranging the data or rebuilding the Cross sampling analysis. See Transpose for examples.
In this example, we calculated the average of GDP values for several countries and plotted it together with those values.
In this document we worked on lists containing city level series for three different indicators, which then were combined in one bubble chart to compare values.
We applied conditional formatting rules to the Bar chart table created with the Cross sampling analysis using lists.
Sometimes you want to create a group with a constant series. For example, here a sum of the GDP series has been created, and this series has been added by selecting it on the left and pressing 'Add selected series as new single series column'. See the file under Examples: Single series column with average line.
If new series are added to the lists, the single series column will be extended to include more rows of the same series.
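Conceptually, a single series column repeats the same value on every row, so it grows automatically with the list. A minimal sketch with hypothetical names and numbers:

```python
def with_single_series_column(list_entries, single_value):
    """Pair every list entry with the same value from the single series."""
    return [(name, value, single_value) for name, value in list_entries]

# Hypothetical list entries (category, value) and a constant value from one series.
gdp = [("Country A", 2.0), ("Country B", 3.0)]
rows = with_single_series_column(gdp, 10.0)

# Adding an entry to the list extends the single series column by one row.
extended = with_single_series_column(gdp + [("Country C", 5.0)], 10.0)
```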
What is the difference between 'Add selected series as new column' and 'Add selected series as new single series column'?
If you are working with series not arranged in a list, you should use 'Add selected series as new column.'
If you want to have one series as a whole column, select 'Add selected series as new single series column.'