Introduction

This document refers to Macrobond version 1.11 and later.


With the Histogram analysis you can view the distribution of the values of one or more series.

Essentially, the range of values, from the highest to the lowest, is split into a number of equally sized intervals called bins. We then calculate the density or distribution of the values in each bin.

The density function measures the proportion of the items that fall into each bin. In the chart below, we have also enabled the option to add a graph of the normal distribution with the same mean and std. dev. as the empirical data.


Histogram - 1

The cumulative distribution function measures the number of items that fall into each bin and all previous bins.

Histogram - 2

The analysis produces two series for each processed series: one with the bin values and one with the calculated measurement. These two series are typically plotted in a scatter chart to form a histogram.

Settings

Density/Cumulative distribution

This setting determines if the density or the cumulative distribution should be estimated.

Relative/Count

This setting determines whether the histogram is expressed in relative terms (proportions of the values) or as absolute counts of the values.

Method

None

Do not perform any calculations on the series.

Uniform kernel

Measure the number of items that fall exactly into each bin. This option is available when Density is selected.

Normal kernel

Use a smooth windowing function based on the standard normal density function when calculating the value for each bin. This option is available when Density is selected.

Uniform

Measure the number of items with values up to and including the end of each bin. This option is available when Cumulative distribution is selected.

Empirical

Calculate the distribution from the actual observations, sorted in ascending order, instead of dividing the range into bins. This option is available when Cumulative distribution is selected.

Start/End

Defines the data sample range. If not specified, the whole time series is used.

Auto bin

If checked, the application will automatically calculate an appropriate bin size as described in the Theory section below.

Width

The bin size, which can be set if the automatic bin size is turned off.

Normal

None

Do not calculate output series for a normal distribution.

Automatic

Calculate output series for a normal distribution with the same mean and std. dev. as the empirical data.

Manual

Calculate output series for a normal distribution with the specified mean and std. dev.

Report

The report contains a number of statistical measurements.

Mean, variance, skewness and excess kurtosis.

Percentiles 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%

Tail expectations. E.g. extreme percentiles 1% and 99% and the averages below and above these levels.
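As an illustration of the first group of measurements, here is a minimal Python sketch. It assumes population (divide by n) moment estimators; the estimators actually used in the report are not stated here and may differ. The function name `moments` is hypothetical.

```python
def moments(values):
    """Return (mean, variance, skewness, excess kurtosis).

    Assumption: population moments (divide by n), not sample-corrected ones.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis


mean, variance, skewness, excess_kurtosis = moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For a symmetric sample such as the one above, the skewness is zero.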

Theory

σ    The estimated standard deviation of a series
n    The number of valid elements of a series

Automatic bin size

The automatic bin size is calculated as:

$$h = 1.06\,\sigma\,n^{-1/5}$$

which can be shown to minimize the total estimation error (the mean integrated squared error) when the underlying data is normally distributed. This is often called the normal distribution approximation or Silverman's rule of thumb.
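The rule can be sketched in a few lines of Python. This is an illustration only: the function name `auto_bin_width` is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the exact estimator of σ is not specified here.

```python
import statistics


def auto_bin_width(values):
    """Rule-of-thumb bin width h = 1.06 * sigma * n**(-1/5)."""
    n = len(values)
    # Assumption: sigma is the sample standard deviation (n - 1 denominator).
    sigma = statistics.stdev(values)
    return 1.06 * sigma * n ** (-1 / 5)


sample = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 8.0]
h = auto_bin_width(sample)
```

The width shrinks slowly as the number of observations grows, so a longer series gives narrower bins.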

Estimation of the density

The density function measures the proportion of the items that fall into each bin.

One series contains the estimated density and the other the midpoints of the bins.

We offer two ways of measuring this.

Uniform kernel

If y is the midpoint of a bin then the value of that bin is the number of values in the bin where

$$y - \frac{h}{2} < x \le y + \frac{h}{2}$$

This is the count of the bin. If relative output is selected, then the result is divided by n and by the bin size h.
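A minimal Python sketch of the uniform kernel follows. The function name `uniform_kernel` is hypothetical, and the bin midpoints are passed in explicitly because the placement of the first bin is not specified here.

```python
def uniform_kernel(values, midpoints, h, relative=True):
    """Count the values x with y - h/2 < x <= y + h/2 for each midpoint y."""
    n = len(values)
    result = []
    for y in midpoints:
        count = sum(1 for x in values if y - h / 2 < x <= y + h / 2)
        # Relative output: divide by n and by the bin size h.
        result.append(count / (n * h) if relative else count)
    return result


data = [1.0, 2.0, 2.0, 3.0, 5.0]
mids = [1.0, 3.0, 5.0]
counts = uniform_kernel(data, mids, 2.0, relative=False)
density = uniform_kernel(data, mids, 2.0, relative=True)
```

With relative output, the bin values multiplied by the bin size sum to 1 when the bins cover all observations.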

Normal kernel

If y is the midpoint of a bin then the value of that bin is

$$\hat{f}(y, h) = \frac{1}{h\sqrt{2\pi}} \sum_{i=0}^{n} e^{-\frac{1}{2}\left(\frac{y - x_i}{h}\right)^{2}}$$

If relative output is selected, then the result is divided by n.
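The normal kernel can be sketched in Python as follows. The function name `normal_kernel` is hypothetical, and the sum is assumed to run once over each observation in the series.

```python
import math


def normal_kernel(values, midpoints, h, relative=True):
    """Smooth estimate at each midpoint y using a standard normal window."""
    n = len(values)
    scale = 1.0 / (h * math.sqrt(2.0 * math.pi))
    result = []
    for y in midpoints:
        total = scale * sum(math.exp(-0.5 * ((y - x) / h) ** 2) for x in values)
        # Relative output: divide by the number of observations n.
        result.append(total / n if relative else total)
    return result


single = normal_kernel([0.0], [0.0], 1.0)
smooth = normal_kernel([1.0, 2.0, 2.0, 3.0, 5.0], [1.0, 3.0, 5.0], 2.0)
```

Unlike the uniform kernel, every observation contributes to every bin, with a weight that decreases with the distance from the bin midpoint.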

Normal density function

The normal density for the midpoint is calculated for each bin like this:

$$f(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$

where μ is the estimated or specified mean value of the series and σ is the estimated or specified standard deviation.
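As a sketch, the normal density at a set of bin midpoints can be evaluated like this in Python. The function name `normal_density` and the example midpoints are hypothetical.

```python
import math


def normal_density(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


midpoints = [-1.0, 0.0, 1.0]
curve = [normal_density(y, 0.0, 1.0) for y in midpoints]
```

The curve is symmetric around the mean and peaks there.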

Estimation of the cumulative distribution

The estimated cumulative distribution function is calculated in one of two ways.

At bin ends

If y is the midpoint of a bin, then the value of that bin is the number of values where

$$x \le y + \frac{h}{2}$$

If relative output is selected, the result is divided by n.

The other series contains the end points of the bins.
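A minimal Python sketch of the calculation at bin ends follows. The function name `cumulative_at_bin_ends` is hypothetical, and the end points y + h/2 are passed in directly.

```python
def cumulative_at_bin_ends(values, bin_ends, relative=True):
    """For each bin end, count the values x with x <= end."""
    n = len(values)
    result = []
    for end in bin_ends:
        count = sum(1 for x in values if x <= end)
        # Relative output: divide by the number of observations n.
        result.append(count / n if relative else count)
    return result


data = [1.0, 2.0, 2.0, 3.0, 5.0]
ends = [2.0, 4.0, 6.0]
cum_counts = cumulative_at_bin_ends(data, ends, relative=False)
cum_relative = cumulative_at_bin_ends(data, ends, relative=True)
```

The result never decreases from one bin to the next, and the relative version reaches 1 at the last bin when that bin covers the largest observation.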

Empirical

The empirical distribution is calculated not by dividing the range into bins, but by using the actual points of observations.

The first series simply contains the number of the observation, starting at 1. If relative output is selected, each value is divided by n.

The second series contains the values in ascending order.
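The two output series of the empirical method can be sketched in Python as follows. The function name `empirical_distribution` is hypothetical.

```python
def empirical_distribution(values, relative=True):
    """Return (observation numbers, sorted values) for the empirical distribution."""
    n = len(values)
    ordered = sorted(values)  # second series: values in ascending order
    # First series: observation number starting at 1, divided by n if relative.
    numbers = [k / n if relative else k for k in range(1, n + 1)]
    return numbers, ordered


numbers, ordered = empirical_distribution([3.0, 1.0, 2.0], relative=False)
fractions, _ = empirical_distribution([3.0, 1.0, 2.0], relative=True)
```

Plotting the first series against the second gives a step-like curve with one step per observation.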

Normal distribution function

The normal distribution is calculated by using the same function we use for NormDist(value, mean, stddev) in the Macrobond formula language.

If relative output is not selected, then the result is multiplied by n.
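For illustration, the standard normal distribution function can be written in Python with the error function. This is a sketch of the textbook formula, not the implementation behind NormDist; the function names `normal_cdf` and `normal_cdf_output` are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_cdf_output(x, mu, sigma, n, relative=True):
    """Scale the result by n when relative output is not selected."""
    p = normal_cdf(x, mu, sigma)
    return p if relative else p * n


half = normal_cdf_output(0.0, 0.0, 1.0, n=10, relative=True)
expected_count = normal_cdf_output(0.0, 0.0, 1.0, n=10, relative=False)
```

Multiplying by n puts the normal curve on the same scale as the absolute counts of the estimated cumulative distribution.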

Sample documents