Aggregate

Overview

The Aggregate analysis transforms a time series so that the values are summed either periodically (each year for instance) or continuously, starting from a specified date. Aggregate is useful when you are working with 'flow' series.

Settings

In this analysis, you can define the following settings to determine how the calculation is done:

Period

The window of the calculation is set by the period you choose. If you select 'All' a continuous sum will be performed and you can set a date at which the sum should start.

Percentage

Selecting this option expresses the result as a percentage: the sum is calculated and then divided by 100.

Rolling

If you’ve chosen a period other than 'All', you can select 'rolling' to perform the sum on a rolling basis. The window of the rolling sum is the same as the period you have chosen.

Examples

Rolling aggregate

In this example, the aggregate analysis is used to calculate an annual rolling sum of the German current account. In other words, the sum is performed on a rolling window of 1 year.

Aggregate by fiscal year

How do you calculate a fiscal-year rolling aggregate for a series from a country whose fiscal year doesn't follow the calendar year?

Questions

How do I calculate a rolling sum?

There are two main possibilities to calculate a rolling sum:

  • The Aggregate analysis:


Set the 'Period' to the desired rolling length, and don't forget to tick the 'Rolling' setting.

  • Formula:

You can also use the formula

sum(series, window)

Example:

sum(usflof8344, YearsLength(2))

This will calculate a 2-year rolling sum on 'usflof8344'.

For more about formulas and how the formula language in Macrobond works, see Formula analysis.

How do I keep a series as-is and start aggregating it from a certain point in time?

Use this formula (on the Series list or in the Formula analysis):

AggregateSum(CutStart(series, Date(YYYY, MM, DD)))

inside

join(older_series, newer_series, Start(newer_series))

CutStart() creates a series from the fragment you wish to aggregate. AggregateSum() cumulates its values. You then connect the cumulated fragment to the regular series using join(), as in the example below:

join(sek, AggregateSum(CutStart(sek, Date(2024, 4, 2))), Start(AggregateSum(CutStart(sek, Date(2024, 4, 2)))))
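As a rough illustration of what these formula functions do, here is a minimal Python sketch. This is not Macrobond code; the function names simply mirror the formulas above, and a series is represented as a list of (date, value) pairs.

```python
from datetime import date

def cut_start(series, start):
    # Keep only observations on or after the start date (like CutStart).
    return [(d, v) for d, v in series if d >= start]

def aggregate_sum(series):
    # Running total of the values (like AggregateSum).
    out, total = [], 0.0
    for d, v in series:
        total += v
        out.append((d, total))
    return out

def join(older, newer, switch_date):
    # Use the older series before switch_date, the newer one from it onward (like join).
    return [(d, v) for d, v in older if d < switch_date] + \
           [(d, v) for d, v in newer if d >= switch_date]

# Daily series of 1.0, aggregated from 2024-04-02 onward:
series = [(date(2024, 4, d), 1.0) for d in range(1, 6)]
fragment = aggregate_sum(cut_start(series, date(2024, 4, 2)))
result = join(series, fragment, fragment[0][0])
```

The values before the cut date are passed through unchanged, while the values from the cut date onward become a cumulative sum.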

Sorting

Overview

The Sorting analysis ranks the values of a series in ascending or descending order. It’s most useful after calculating output series with another analysis such as Scalar or Cross section. You can then use sorting to rank categories and to arrange the order of output series.

Settings

Direction

Here, set the order in which values of the series are sorted. When you have one input series, you have the options of Ascending and Descending.

When working with multiple series, you can select a main series to rank and sort the other series by the order of the main series. You can do so by selecting 'by [series name]' under Direction.

Observation limit

This sets the limit for the number of 'items' (i.e., observations or categories) to which the sorting should be applied.

This is very useful when you only want to display a smaller number of observations / categories than you have in the document.

Example: In a document containing 50 series, you would set the observation limit to 10 if you only want to display the top ten observations.

Missing values

You can choose to exclude observations that are missing in the main series by selecting this option.

Examples

Descending order

Here, we created two category series using the scalar analysis, one with the last available values and one with the values at 2015. We sorted the last available values series in descending order and sorted the second series by the order set for the first series.

Sorting with limit

See solutions for getting top X categories while using Descending order in Sorting analysis.

Questions

Why do I see numbers instead of categories?

The Sorting analysis needs a direction for every data set; none can be left as 'None'. If one data set has Direction set to Ascending/Descending, the other data sets must follow it:

How to show the top 10 highest values in Descending order?

For a Category chart, the 'Limit' is always counted from the right, and for a Bar chart, from the bottom.

There are three solutions:

  1. Sorting after Sorting (recommended) - use the first analysis with Ascending order to filter out the top 10, and a second analysis to reverse the order to Descending.
  2. Flip sorting - change the presentation's order in the chart from Descending to Ascending.
  3. Display range (only Category chart) - sort without the limit and set display range on axis.

See these examples in a file Sorting with limit.

I set the Observation limit but it doesn't work?

You need to set the Observation limit for all data sets:

Use the down arrow next to the column's name to apply the value to all fields below.

 

Slice

Overview

The Slice analysis cuts a time series into several pieces to compare different periods of time on a graph. For instance, to compare GDP recovery after each recession period you would use this functionality.

Settings

Period

This is the first setting you should specify. You have 3 main options for slicing the series:

Set to year/quarter/month

This setting cuts the series into calendar periods. The time series will therefore be sliced on a yearly/quarterly/monthly basis, meaning that one series per period will be produced. You can then specify the calculation range, which is the time horizon over which you want to slice the series.

Custom ranges

This setting slices time series for specific ranges of observations that you set manually.

The ranges can be open-ended.

Custom points in time

Series will be cut into several periods that are defined by:

  1. a referenced 'point in time'
  2. the length, that is specified in terms of number of observations
  3. the relation between the length set and the referenced point in time: after / around / before.

Calendar mode

This setting determines how to align the different years (or quarters or months, depending on your selection of the period). It makes a difference for series that are daily or have skipped dates.

If the mode is set to 'date', the observations in each slice are aligned based on the month and day of the observations.
If the mode is set to 'ordinal', the observations are aligned purely based on their order within the period. This mode is recommended for daily series.

Include only periods available in all series

It excludes periods that are not available in all series. Recommended for removing gaps caused by leap years.

Rebase

Rebase with base value

When selected, this setting rebases all produced series at their starting point to the chosen base value, typically 100.

Additive rebase with base value

This feature is available in Macrobond 1.30 and later.

The first value of the segment, minus the value you input, is subtracted from each value of the segment. This is particularly useful when you want the result of each segment to start at 0.
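The calculation can be sketched in Python. This is an illustrative sketch, not the Macrobond implementation; the function name `additive_rebase` is hypothetical.

```python
def additive_rebase(segment, base=0.0):
    # Hypothetical helper: subtract (first value - base) from every value
    # so the rebased segment starts at the chosen base value.
    offset = segment[0] - base
    return [v - offset for v in segment]
```

For example, a segment [5, 7, 9] rebased with base 0 becomes [0, 2, 4], so each slice starts at 0 and the shapes are directly comparable.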

Use legacy format

Checking this option enables the legacy output format, meaning the analysis won't group slices into lists. Please note that enabling this option will cause all following analyses to lose their settings.

Examples

Periods set to year/quarter/month

In this example, we look at the average monthly performance of the S&P Index since 1928.

Custom range

In this example, we chose to have 3 sliced series starting respectively at 1929-09-01, 1987-09-01 and 2008-08-15, each with a length of 2500 observations after these dates. Here, we used the S&P 500 to compare its path after the 3 main financial crises. The sliced series are rebased to 100 to facilitate the comparison.

Custom points in time

Here, we listed the periods between US recessions. We ticked the rebase option to make the sliced series comparable: they will all start at 100. This way we can see how jobless claims evolved after each recession period.

Lag

Overview

When you are investigating the relationship between series, it can happen that their movements are not synchronized. This is where the Lag analysis becomes useful. It moves values of a series backward or forward in time by the number of observations that you specify. This functionality relates to another analysis, Correlation, which helps you identify the optimal lag setting to get the highest correlation between two series.

Settings

Method

You decide in which direction you want to move the values of the series:

  • Lag: moves a series forward in time by the specified length.
  • Lead: moves a series backward in time by the specified length.

Base

Here you specify by how many observations you want to move the series.

Example

Lagged by 3 observations

In this example, we used the output from the Correlation analysis to decide which settings to apply in the Lag analysis. As a result, we lagged the US ISM PMI by 3 observations. In the chart, we can now make assumptions about the future movements of Industrial Production based on the current values of the ISM PMI.

Questions

How do I lag or lead a series?

There are two main possibilities to lead or lag a time series.

  • In Lag analysis:

Set the direction as 'Method' and length as 'Base'.

NOTE: 'Lead' moves values backwards in time, while 'Lag' moves values forward.

  • With formula:

You can also use the formula:

Lag(series, length)

The function returns the series lagged by the number of observations specified with the variable 'length'. The 'length' variable is rounded to an integer.

Examples:

Lag(sek, -2)

This will move the series 2 observations backward (meaning 'Lead').

Lag(sek, YearLength())

This will move the series 1 year forward (meaning 'Lag').
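The shifting behavior can be sketched in Python. This is an illustrative sketch, not the Macrobond implementation; it follows the sign convention above, where a positive length lags (moves forward in time) and a negative length leads (moves backward).

```python
def lag(values, length):
    # Positive length moves values forward in time (Lag);
    # negative length moves them backward (Lead).
    # The length is rounded to an integer, as in the formula function.
    length = round(length)
    if length >= 0:
        return [None] * length + values[:len(values) - length]
    return values[-length:] + [None] * (-length)
```

Positions with no corresponding observation are filled with None here; in practice those observations are simply missing.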

For more about formulas and how the formula language works, see Formula analysis.

Smoothing

Overview

The Smoothing analysis is used for minimizing the fluctuations of a time series. You can choose from several smoothing methods which are described below.

Settings

Method

None

This method is used to include a series without any calculation applied.

Moving average

The value at each point is calculated as the average of the series over a specified window length. If there are any missing values within the window, the average will use fewer values.


y_t = \frac{\sum_{i=0}^{w-1} x_{t-i}}{\sum_{i=0}^{w-1} \mathrm{IsValid}(x_{t-i})}

where w is the window length and IsValid is a function that is 1 for valid values and 0 for missing values.
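The formula can be sketched in Python. This is illustrative only, not the Macrobond implementation; missing values are represented as None, and it assumes no output is produced until a full window of history exists.

```python
def moving_average(values, w):
    # Trailing w-observation average; missing (None) values inside the
    # window are skipped, so the mean is taken over fewer points.
    out = []
    for t in range(len(values)):
        if t < w - 1:
            out.append(None)  # not enough history yet (assumption)
            continue
        window = [x for x in values[t - w + 1:t + 1] if x is not None]
        out.append(sum(window) / len(window) if window else None)
    return out
```

Note how a window containing a missing value averages over the remaining valid points, exactly as the IsValid denominator in the formula describes.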

Moving average, centered

The centered moving average is calculated symmetrically around each point, except towards the ends, where it becomes increasingly asymmetric. The calculation is different depending on whether the window length is odd or even. When the window length, w, is an odd number, the calculation is similar to a lagged ordinary moving average:


h = \frac{w-1}{2}

 


y_t = \frac{\sum_{i=0}^{w-1} x_{t-i+h}}{\sum_{i=0}^{w-1} \mathrm{IsValid}(x_{t-i+h})}

When the window length, w, is an even number, the calculation is a second order moving average. It is calculated in the same way as when w is odd, but with a weight of ½ for the outer observations of the window and h = \frac{w}{2}.

Moving average, exponential

This calculates an exponential moving average. The exponential factor, α, is calculated from the specified number of observations, f, as follows:


\alpha = \frac{2}{f+1}

The smoothed series is calculated using a recursive formula:


y_0 = x_0

   


y_t = \alpha\, y_{t-1} + (1-\alpha)\, x_t
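A direct transcription of this recursion in Python (a sketch under the definitions above, not the Macrobond implementation):

```python
def exponential_ma(values, f):
    # Recursion as stated in the text: y_0 = x_0,
    # y_t = a * y_{t-1} + (1 - a) * x_t, with a = 2 / (f + 1),
    # where f is the specified number of observations.
    a = 2.0 / (f + 1.0)
    out = [values[0]]
    for x in values[1:]:
        out.append(a * out[-1] + (1.0 - a) * x)
    return out
```

A constant series is left unchanged, and each smoothed value blends the previous smoothed value with the new observation.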

HP filter

This method uses the Hodrick-Prescott symmetric filter as described by Hodrick and Prescott (1997). The factor, λ, can be specified either directly or by using the frequency adjusted power rule described by Ravn and Uhlig (2002). When the frequency adjusted rule is used, the λ is calculated from f as follows:


\lambda = 1600 \left(\frac{p}{4}\right)^f

where p is the number of observations per year for the frequency at hand.

A factor of 4 is recommended by Ravn and Uhlig. A factor of 2 will give you the original Hodrick and Prescott values.

We calculate the HP filter according to Hodrick–Prescott filter - Wikipedia, which is the same method as can be found in MATLAB and EViews.

The rule of thumb is to use λ=1600 for quarterly data; λ=14400 for monthly data; and λ=100 for yearly data.
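The power rule is a one-line calculation. In this sketch (the function name `hp_lambda` is just for illustration), a factor of 2 reproduces the rule-of-thumb values, while the Ravn-Uhlig recommended factor of 4 gives larger values for higher frequencies:

```python
def hp_lambda(p, f=4):
    # Frequency-adjusted power rule: lambda = 1600 * (p / 4) ** f,
    # where p is the number of observations per year and f is the power factor.
    return 1600 * (p / 4) ** f

# f=2 gives the classic rule-of-thumb values:
# quarterly (p=4) -> 1600, monthly (p=12) -> 14400, annual (p=1) -> 100
```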

HP filter, one-sided

The one-sided Hodrick-Prescott filter is calculated by using only the historical data available at each point in time. Thus, the last value is the same as for the full HP filter.

The rule of thumb is to use λ=1600 for quarterly data; λ=14400 for monthly data; and λ=100 for yearly data.

CF filter, stationary

This uses the Christiano-Fitzgerald full sample asymmetric band-pass filter as described by Christiano and Fitzgerald (1999). Use this version of the CF filter when the series is stationary and there is no drift. If there is a trend in the series, you may want to remove it using the Detrend analysis before applying this filter.

BK filter

This method uses the Baxter-King symmetric band-pass filter as described by Baxter and King (1995). The specified length determines the lead/lag length of the filter and is equal to the number of observations lost at each end of the filtered series. You should specify the minimum and maximum periods of oscillation.

Length

The window and period length can be expressed as a number of observations or as a number of units of the specified time unit (Year, Quarter, Month, etc.), which is then converted to a number of observations based on the frequency of the data.

Standard deviation

When selected, the application will calculate a pair of series that forms a confidence band around the mean so that the specified percentage of the values are within the band if the data is normally distributed. The corrected sample standard deviation, s, is calculated as the deviation from the smoothed line.

The empirical rule states that for normally distributed data, ~68.3% of the values are within a band of 1 standard deviation on each side, ~95.4% are within 2 standard deviations, and ~99.7% are within 3 standard deviations.

Formally, the band is μ ± f·s, where μ is the smoothed series, c is the confidence coefficient,

f = \Phi^{-1}\left(\frac{\frac{100-c}{2}+c}{100}\right)

and Φ is the cumulative normal distribution function.
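The factor f can be computed with the inverse normal CDF from the Python standard library. This is a sketch of the formula above, not Macrobond code:

```python
from statistics import NormalDist

def band_factor(c):
    # f = inverse normal CDF of ((100 - c)/2 + c)/100, so that c percent of
    # normally distributed values fall within the band mean +/- f*s.
    return NormalDist().inv_cdf(((100.0 - c) / 2.0 + c) / 100.0)
```

Consistent with the empirical rule, c ≈ 68.3 gives f ≈ 1 and c ≈ 95.4 gives f ≈ 2.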

Coefficient

The confidence coefficient used for the standard deviation confidence band.

Examples

Smoothing with HP filter

The Smoothing analysis is used here to remove the cyclical component of the BIS Residential Price Index. To do so, we calculated the HP Filter of the BIS index, and included the original series as output.

Moving average

Here, we calculated a 3-month moving average of the Retail Trade series to smooth it and make it easier to read.

Questions

How do I calculate a moving average / rolling mean?

There are two main possibilities to calculate rolling mean:

  • In Smoothing analysis:

Add 'Moving average' as a calculation and set the length.

  • With formula:

You can use the formula:

Mean(series, window)

Example:

Mean(spx, MonthsLength(3))

This will calculate a three-month rolling mean for the S&P 500.

For more information about the formula language, see Formula analysis.

Seasonal adjustment Census X-13

Overview

The Seasonal adjustment Census X-13 analysis removes seasonal patterns, such as weather fluctuations or holiday effects, from time series. It’s useful when you want to analyze any data affected by seasonality.

Method

This analysis uses the X-13-ARIMA-SEATS program from the US Census Bureau, which is the most widely used method around the world. Documentation for the full X-13-ARIMA-SEATS Seasonal Adjustment Program can be found here.

The Census Bureau originally developed a method and program called "X-11". They later added extra steps, such as ARIMA modeling and the Easter effect, which, when active, turn the method into X-12 and, after a few further additions, into X-13-ARIMA-SEATS. In short, the X-13 we use is the X-11 base with the additional settings you see in the analysis.

Limitations

We offer configuration of the most common settings from X-13-ARIMA-SEATS program.

Additionally, the input series must meet the following requirements:

  • The series cannot be daily, weekly, or annual.
  • There must be no missing values in the series (you can fill in missing values by using one of the methods in the conversion settings tab of the series list analysis).
  • There must be no skipped dates in the series (you can make sure that all dates are included in the series by selecting all points from the observations drop-down menu in the series list).
  • There are also limitations in the X-13-ARIMA-SEATS program itself, such as a maximum number of observations and a maximum number of years.

Settings

See here for the pre-1.28 version view.

 

Method

You can select from the following methods:

  • X-11 - method from the Census Bureau program
  • SEATS - program developed by the Bank of Spain
  • Auto (X-11/SEATS) - for some series it is hard to find working settings with the X-11 or SEATS method. This option provides automated selection based on the series' metadata. Additional columns which are controlled by this setting are disabled.

Type

Select whether your input series is a stock or flow series.

If Auto is selected, the class property of the time series is used to determine if it is a stock or a flow.

If you select Stock, the instruction type=stock will be added to the series element of the configuration passed to the X-13 ARIMA-SEATS program.

Holiday regressor

An option for including certain holidays as regression variables in the X-13 Seasonal adjustment analysis. This works only for monthly data.

Chinese New Year

The Chinese New Year Holiday regressor utilizes a standard approach to account for the moving holiday effects from the Lunar holiday observed in some high-frequency time series data. While no universally agreed method exists, the literature thus far suggests an approach similar to what is being used in the application. Three regressors (in the form of dummy variables) are assigned to each of three sub-period windows of 20 days before, 7 days during, and 20 days after the holiday, capturing the differential effects on data from the lunar holiday.

Brazilian Carnival

The Brazilian Carnival holiday regressor utilizes a standard approach to account for the moving holiday effects from the holiday observed in some high-frequency time series data. While no universally agreed method exists, the literature thus far suggests an approach similar to what is being used in the application. Three regressors (in the form of dummy variables) are assigned to each of three sub-period windows of 3 days before, 6 days during, and 1 day after the Carnival, capturing the differential effects on data from the holiday.
Historical dates for the Brazilian Carnival were calculated using the dates of Easter sourced from the U.S. Census Bureau with the following formula:

  • Beginning = Friday before Ash Wednesday, 51 days to Easter
  • End = Ash Wednesday, 46 days to Easter
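The date arithmetic above can be sketched with the Python standard library. The helper name `carnival_window` is hypothetical; it simply applies the 51-day and 46-day offsets from Easter stated above.

```python
from datetime import date, timedelta

def carnival_window(easter):
    # Hypothetical helper applying the offsets from the text:
    # beginning = Friday before Ash Wednesday (Easter - 51 days),
    # end = Ash Wednesday (Easter - 46 days).
    beginning = easter - timedelta(days=51)
    end = easter - timedelta(days=46)
    return beginning, end

# Easter 2020 fell on April 12:
start, end = carnival_window(date(2020, 4, 12))
```

For 2020 this yields Friday 21 February through Ash Wednesday 26 February, matching the offsets.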

Brazilian Carnival was suspended in 1912 (following the death of the Baron of Rio Branco) and in 2021 (due to the COVID-19 pandemic). In 2022, the Carnival was held on 20-30 April.

For further insights into the theoretical and practical considerations on seasonal adjustment and moving holiday adjustments, please click here.

ARIMA

Selecting this option instructs the program to use an automatic ARIMA model to calculate short term forecasts based on the model used by TRAMO. Using the ARIMA model often improves estimation of the different time series components.

You may get an error if you try using ARIMA on a series that doesn’t meet the necessary conditions, such as having at least three years of history and only positive values. In this case, the report will include a description of the problem.

Note that ARIMA is always needed for the SEATS method, so this option will be automatically selected.

Trading day

This instructs the program to do an AIC-based test to check for a trading day effect, using Monday-Friday weeks. If there is a significant effect, this factor will be included in the ARIMA model.

The instruction aictest=td will be added to the regression element of the configuration passed to the X-13-ARIMA-SEATS program.

Easter

This instructs the program to do an AIC-based test to see if there is an effect of the Easter holiday. If there is a significant effect, this factor will be included in the ARIMA model.

The instruction aictest=easter will be added to the regression element of the configuration passed to the X-13-ARIMA-SEATS program.

Outlier

Included in the X-13-ARIMA-SEATS program, this option automatically checks for single-point outliers and level shifts. The results can be found in the Report.

Constant

You can add a trend constant regression variable by checking the box in the Constant column.

Conditional

When this option is selected, the seasonal adjustment will only be applied if the series is not already seasonally adjusted by the source or by using another seasonal adjustment analysis.

Output

Drop-down menu to output additional components.

Trend

This produces a series of the final trend-cycle, which is the long-term and medium-to-long term movements of the series.

Seasonal component

This feature is available in Macrobond 1.28 and later.

This produces a series whose values quantify variations in the level of the observed series that recur with the same direction and a similar magnitude at time intervals of one year. (Length is measured in the calendar units of the observed series, usually quarters or months, sometimes semesters, weeks, or other units.)

Irregular component

This feature is available in Macrobond 1.28 and later.

The irregular component is equal to the seasonally adjusted series divided by the trend-cycle component after removal of outlier effects.

 

The information about outliers and transform function can be found in Report.

Report

A Report is available as standard output for this analysis; it includes relevant statistics and information. It is generated by the Census X-13-ARIMA-SEATS program and automatically added once you select the analysis. The report will contain any errors reported by the program.

To open the report, first click on the series you're interested in. The whole report will then be available in the same window:

Example

Seasonal adjustment

In this example we applied Seasonal adjustment Census X-13 to Russian retail trade and added the calculated trend.

Seasonal adjustment with Chinese New year

Compare series with and without our Holiday regressor, with the SEATS method applied.

Seasonal adjustment with Brazilian Carnival

Compare series with and without our Holiday regressor, with the SEATS method applied.

Rate of change

Overview

The Rate of change analysis is used to calculate the change in value or in percentage over periods of time (COP). These changes can also be annualized. The analysis helps you visualize and compare how series change over time and is commonly used to calculate differences before applying regressions or correlations.

Settings

Method

Change over period value

Calculates the difference in value over the period. This is also called “momentum”.


y_t = z_t - z_{t-h}

Change over period %

Calculates the percentage change over the period.


y_t = 100\,\frac{z_t - z_{t-h}}{z_{t-h}}


Annual rate value

Calculates the sum of the values over the period and then scales the result to a yearly level.


y_t = \frac{c}{h} \sum_{i=0}^{h-1} z_{t-i}



where c is the typical number of observations for one year.

Annual rate %

Calculates the percentage change over the period and then annualizes it.


y_t = 100 \left( \left( \frac{z_t}{z_{t-h}} \right)^{c/h} - 1 \right)



where c is the typical number of observations for one year.

Logarithmic change over period %

Calculates the logarithmic percentage change over the period.


y_t = 100 \ln \frac{z_t}{z_{t-h}}
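The methods above can be sketched in Python. This is illustrative only, not the Macrobond implementation; z is a list of values, t an index, h the period length in observations, and c the typical number of observations per year.

```python
from math import log

def cop_value(z, t, h):
    # Change over period, value: z_t - z_{t-h}
    return z[t] - z[t - h]

def cop_percent(z, t, h):
    # Change over period, %: 100 * (z_t - z_{t-h}) / z_{t-h}
    return 100.0 * (z[t] - z[t - h]) / z[t - h]

def annual_rate_percent(z, t, h, c):
    # Annual rate, %: the period change compounded to a yearly level.
    return 100.0 * ((z[t] / z[t - h]) ** (c / h) - 1.0)

def log_change_percent(z, t, h):
    # Logarithmic change over period, %: 100 * ln(z_t / z_{t-h})
    return 100.0 * log(z[t] / z[t - h])
```

For example, a 1% quarterly change annualizes to roughly 4.06% with c = 4 and h = 1.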


Length

Specify the length of the period, or horizon, over which the change will be calculated.

Mode

The mode determines how the period length is measured.

Fixed period

The period length will be a constant number of observations that is determined based on the frequency of the series. Select this option if it is important to use a constant period length and that each value of the past is used only once.

Calendar date

The period length is determined for each observation based on the calendar date so that an observation with a corresponding date in the past is selected. Select this option when it’s important to compare values of corresponding dates even if it means that some values might be used twice and some not at all. A common use case for this mode is when you have converted a series from a lower frequency. The calendar date mode uses the ISDA standard of Actual/Actual day count fraction.

In general, 'Calendar date' mode is designed for daily series. You should use it when you want to compare values corresponding to the same date at a previous point in time.

Force

The application recognizes if a series is already expressed as a rate of change and will return an error if you attempt to calculate a rate of change twice. You can bypass this check by selecting the force option.

Examples

Change over period value

In this example, we calculated by how many persons the jobless claims have increased or decreased compared to a year ago. We did this by selecting change over period value for a length of 1 year.

Annual rate (value or percentage)

Here, we compared two different growth rates of the German GDP. The line displays year on year growth rates and the columns represent annualized quarterly growth rates.

Questions

What is the difference between the Rate of change analysis and selecting ‘Rate of change since’ when doing a Cross sampling analysis?

  • The Rate of change analysis calculates the changes from the end of each time series, while
  • 'Rate of change since' in the Scalar analysis calculates them from the end of the whole calendar. This means that if some series do not end on the same observation date, the calculation range will differ.

You can set the 'Range Start' and 'Range End' in the Scalar analysis to make sure the calculation is done on the same range across all input series.

Example:

Why does annualization in the Rate of change analysis work differently than in the Cross sampling analysis?

Rate of change is by default set to 'Mode: Fixed period'. There is also another mode, 'Calendar date', which is helpful when working with daily series or when using Annual rate. Annualization is done differently when you select 'Calendar date' mode, since Macrobond then uses the actual length of the period for the annualization. If you switch to 'Calendar date' mode, you will get the same value in both analyses.

Performance

Overview

The Performance analysis allows you to see how a time series has changed in relation to some point in the past. It’s different from the rate of change analysis in that performance calculates the change from a specific point in time, whereas rate of change calculates changes over time.

Settings

Change since, date and period

Specify the period in which to calculate performance.

Note that the performance will be calculated relative to the first value before the specified starting point.

Year to date, Quarter to date, Month to date, Week to date

  • Calculate the performance during a year, quarter, month, or week
  • Enter a date to select a specific period
  • If no date is specified, the last period of the series will be used

Years back, Quarters back, Months back, Weeks back

  • Calculate the performance in the last years, quarters, months, or weeks
  • Specify the number of periods back at which to start the calculation

Specific date

  • Calculate the performance starting at a specific date until the end of the series

Method

Select whether to measure performance as a percentage change, value, or logarithmic change.

Value

Calculate the change in value over the period.


y_t = z_{t+h} - z_t

Percentage

Calculate the percentage change over the period.


y_t = 100\,\frac{z_{t+h} - z_t}{z_t}


Logarithm

Calculate the logarithmic percentage change over the period.


y_t = 100 \ln \frac{z_{t+h}}{z_t}
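The three methods can be sketched in Python. This is a sketch, not Macrobond's implementation; z is a list of values and z[t0] plays the role of the reference observation from which the performance is measured (which is why the first output value is 0).

```python
from math import log

def performance(z, t0, method="percent"):
    # Change of each observation relative to the reference value z[t0].
    base = z[t0]
    if method == "value":
        return [v - base for v in z[t0:]]
    if method == "percent":
        return [100.0 * (v - base) / base for v in z[t0:]]
    return [100.0 * log(v / base) for v in z[t0:]]  # logarithm
```

For example, the series 100, 110, 121 measured from its first value gives 0%, 10%, 21%.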


Examples

Calculating year to date change

We calculated the percentage change since the beginning of the year for a group of equity indices using the performance analysis. We selected 'Year to date' under change since and 'Percentage' under method.

Performance starting from 0 (with formulas)

We calculated the percentage change since the beginning of the year with formulas and cut series at the certain date.

Performance with added 0

Add '0' before the outcome from Performance analysis.

Questions

Why doesn't the output start with 0?

The Performance analysis calculates the change from the observation that is before the specified start date. In other words, the observation prior to the start date is the value '0'.

If you want a performance chart that starts with 0 as the first value, you can use formulas as in this example:

File: Perf YTD% with formulas

or push '0' before the beginning of each series as in: Performance with added 0.

Why isn't Year-to-date calculated up to the selected date?

When you select Year to date, Quarter to date, Month to date or Week to date, the analysis will calculate the performance during a year, quarter, month, or week. The period is selected by a date, but not all parts of the date will be used to identify the period. If you select YTD, only the year of the date is relevant.

Zero coupon rate

Overview

The Zero coupon rate analysis uses the Libor Market model to construct zero coupon rates at different maturities. Such rates typically cannot be observed directly on the market, except for short deposit rates. Longer rates are instead constructed from the shorter rates by using FRAs and swaps, since these are typically readily available. The basic idea is that you start with a short rate, then use a chain of FRAs to construct the medium rates, and finally a chain of swaps for the long rates.

Here's documentation on the methodology used: Swap based zero rates

Settings

Output series

Define the maturity of the rate you would like to calculate by providing a value in the maturity text box. To add another rate, simply click the add series button. For each series you may add a description.

Epochs

Here you define which series to use for calculating each segment of the swap curve. The series added in the series list will initially be located at the bottom of the window as unclassified series. To place the series in the desired category, simply highlight the series and select the type. You also need to set a start time for the epoch.

Prefer before previous

When selected, series from this category will be preferred when maturity lengths overlap.

Deposit rates

Interbank deposit rates should be placed here. You have the option of using automatic or custom maturity settings. The custom settings allow you to choose the maturity, define an offset and specify the rate type.

FRA-IMM rates

For the middle part, FRAs need to be added. By choosing the custom maturity setting, you can modify the position and the offset.

Swap rates

Finally, for the long end of the curve, Swap rates should be added. The custom settings allow you to choose maturity and define an offset.

Example

Zero coupon rates: Sweden

In our example, we created a zero coupon yield curve using series with different maturities for Sweden. Then we combined the results with the Yield Curve analysis.

Histogram

Overview

With the Histogram analysis, you can view the distribution of values of one or more series. It’s typically presented as a report containing the main distribution statistics and as a category chart displaying the distribution among bins. Use it if you want to know whether values are within normal limits or if they’re skewed in one direction.

Theory

Automatic bin size

The automatic bin size, h, is calculated as follows:

h = \frac{1.06\,\sigma}{\sqrt[5]{n}}

This can be shown to minimize the total estimation error in some situations. It’s often called the 'normal distribution approximation' or 'Silverman's rule of thumb'.
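Silverman's rule is easy to compute directly. In this sketch (not the Macrobond implementation; it assumes σ is the corrected sample standard deviation), `silverman_bin_size` is a hypothetical helper name:

```python
def silverman_bin_size(values):
    # h = 1.06 * sigma / n**(1/5), Silverman's rule of thumb.
    # Assumes sigma is the corrected sample standard deviation.
    n = len(values)
    mean = sum(values) / n
    sigma = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5
    return 1.06 * sigma / n ** 0.2
```

The bin width grows with the spread of the data and shrinks slowly (as the fifth root) with the number of observations.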

Density methods

The density function measures the proportion of items that fall into each bin. One series contains the estimated density and the other the midpoints of the bins. We offer two ways of measuring this.

Uniform kernel

If y is the midpoint of a bin, then the value of that bin is the number of values in the bin where

y - h/2 < x ≤ y + h/2

If relative output is selected, then the result is divided by n and by the bin size.
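A Python sketch of this counting rule follows. The bin anchoring used here (midpoints starting at the series minimum) is an assumption, not necessarily how Macrobond places its bins.

```python
def uniform_kernel_histogram(values, h, relative=False):
    """Uniform-kernel histogram: for each bin midpoint y, count the
    values x with y - h/2 < x <= y + h/2.

    With relative=True the count is divided by n and by the bin size,
    as described above. Bin anchoring at min(values) is an assumption.
    """
    lo, hi = min(values), max(values)
    n = len(values)
    midpoints, counts = [], []
    y = lo  # first bin is centered on the smallest value (assumption)
    while y - h / 2 < hi:
        c = sum(1 for x in values if y - h / 2 < x <= y + h / 2)
        if relative:
            c = c / (n * h)  # divide by n and by the bin size
        midpoints.append(y)
        counts.append(c)
        y += h
    return midpoints, counts
```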

Normal kernel

If y is the midpoint of a bin, then the value of that bin is

f̂(y, h) = 1/(h√(2π)) · Σᵢ e^(-((y - xᵢ)/h)² / 2)

where the sum runs over the n observations xᵢ.

If relative output is selected, then the result is divided by n.
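A Python sketch of the normal-kernel value at a single bin midpoint, following the formula above (with relative output dividing by n, as described):

```python
import math

def normal_kernel_density(values, y, h, relative=False):
    """Normal-kernel value at bin midpoint y:
    f(y, h) = 1/(h*sqrt(2*pi)) * sum_i exp(-((y - x_i)/h)**2 / 2)

    A sketch of the formula above; with relative=True the result
    is divided by n.
    """
    s = sum(math.exp(-(((y - x) / h) ** 2) / 2) for x in values)
    f = s / (h * math.sqrt(2 * math.pi))
    if relative:
        f = f / len(values)
    return f
```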

Normal density function

The normal density for the midpoint is calculated for each bin as follows:

f(x, µ, σ) = 1/√(2πσ²) · e^(-(x - µ)² / (2σ²))

where µ is the estimated or specified mean value of the series.
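For reference, a Python sketch of this density function. It has the same functional form as NormDist(value, mean, stddev) in the Macrobond formula language, but this is the standard normal pdf, not Macrobond's own code.

```python
import math

def normal_density(x, mu, sigma):
    """Standard normal pdf: f(x, mu, sigma) =
    exp(-(x - mu)**2 / (2*sigma**2)) / sqrt(2*pi*sigma**2)

    A sketch of the formula above, not Macrobond's implementation.
    """
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))
```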

Cumulative distribution methods

The estimated cumulative distribution function is calculated in one of two ways.

Uniform

If y is the midpoint of a bin, then the value of that bin is the number of values where

x ≤ y + h/2

If relative output is selected, the result is divided by n.

The other series contains the end points of the bins.
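A Python sketch of the uniform cumulative method: for each bin, count all values up to the bin's end point. The bin anchoring (midpoints starting at the series minimum) is again an assumption.

```python
def uniform_cumulative(values, h, relative=False):
    """Uniform cumulative distribution: for each bin midpoint y,
    count the values x with x <= y + h/2.

    Returns the bin end points and the counts; with relative=True
    each count is divided by n, as described above.
    """
    lo, hi = min(values), max(values)
    n = len(values)
    endpoints, counts = [], []
    y = lo  # first bin centered on the smallest value (assumption)
    while y - h / 2 < hi:
        c = sum(1 for x in values if x <= y + h / 2)
        if relative:
            c = c / n
        endpoints.append(y + h / 2)  # the series contains bin end points
        counts.append(c)
        y += h
    return endpoints, counts
```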

Empirical

The empirical distribution is calculated not by dividing the range into bins, but by using the actual points of observations. The first series simply contains the number of the observation, starting at 1. The second series contains the values in ascending order. If relative output is selected, each value is divided by n.
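The empirical method above amounts to sorting the observations, as in this Python sketch:

```python
def empirical_distribution(values, relative=False):
    """Empirical distribution as two series: observation numbers
    starting at 1, and the values sorted in ascending order.

    With relative=True each observation number is divided by n,
    as described above.
    """
    n = len(values)
    numbers = list(range(1, n + 1))
    sorted_values = sorted(values)
    if relative:
        numbers = [k / n for k in numbers]
    return numbers, sorted_values
```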

Settings

Density vs. Cumulative distribution

This setting determines if the density or the cumulative distribution should be estimated.

  • Density - measures the items that fall into each individual bin.
  • Cumulative distribution - measures the items that fall into each bin plus all previous bins.

Relative vs. Count

This setting determines the unit of the histogram.

  • Relative - the unit is the proportion of items per bin.
  • Count - the unit is the absolute number of items per bin.

Method

Methods available when density is selected

  • Uniform kernel - measures the number of items that fall exactly into each bin.
  • Normal kernel - uses a smooth windowing function based on the standard normal density function when calculating the value for each bin.

Methods available when cumulative distribution is selected

  • Uniform - measures the number of items until the end of each bin.
  • Empirical - uses the actual points of observation instead of dividing the range into bins.

Start & End

Here, you can specify a data sample range. If left blank, the whole series is used.

Auto bin

If checked, the application will automatically calculate a bin size as described in the theory section above.

The labels this analysis shows are for the midpoints of the bins.

Width 

Here, you can specify the bin size if the automatic bin size is turned off.

Normal

This contains options for outputting a normal distribution series.

  • None - does not calculate a normal distribution series.
  • Automatic - calculates and outputs a normal distribution series with the same mean and standard deviation as the empirical data.
  • Manual - calculates and outputs a normal distribution series with the specified mean and standard deviation.

The normal distribution is calculated by using the same function we use for NormDist(value, mean, stddev) in the Macrobond formula language.

If relative output is not selected, then the result is multiplied by n, the number of elements in the series.

Report

The Histogram analysis automatically generates a report. This report contains statistical measurements and information.

Example

Relative density

In this example, we looked at the distribution of the S&P 500 daily performances. We used the uniform kernel method and set the bin size to 0.5.

Questions

Can I plot two histograms on one chart?

It's not possible to plot two histograms properly on a Category chart. The Histogram analysis creates two series for each distribution: one with the values and one with the bins. You can plot these in a Category scatter chart, where each histogram will be drawn as a line.

How to display the relative density as percentage of the total number of observations?

You need to multiply the height (i.e., the density) by the bin width. You can achieve that by using the Arithmetic analysis.
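As a small numeric sketch of this conversion (the density and bin-width values are illustrative only): multiplying the relative density by the bin width gives the share of observations in that bin, which can then be scaled to a percentage.

```python
# Illustrative values only (assumptions, not real data):
density = 0.8    # relative density for one bin (per unit of value)
bin_width = 0.5  # the bin size used in the Histogram analysis

share = density * bin_width  # fraction of all observations in this bin
percentage = share * 100     # the same share expressed as a percentage
```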