Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is the documentation for an old version of Boost. Click here to view this page for the latest version.
PrevUpHomeNext

User guide

Boost.Histogram is designed to make simple things simple, yet complex things possible. Correspondingly, this guides covers the basic usage first, and the advanced usage in later sections. For an alternative quick start guide, have a look at the Getting started section.

A histogram consists of a storage and a sequence of axis objects. The storage represents a grid of cells of counters. The axis objects maps input values to indices, which are used to look up the cell. You don't normally have to worry about the storage, since the library provides a very good default. There are many interesting axis types to choose from, but for now let us stick to the most common axis, the regular axis. It represents equidistant intervals on the real line.

Use the convenient factory function make_histogram to make the histograms. In the following example, a histogram with a single axis is created.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // create a 1d-histogram in default configuration which
  // covers the real line from -1 to 1 in 100 bins
  auto h = make_histogram(axis::regular<>(100, -1.0, 1.0));

  // rank is the number of axes
  assert(h.rank() == 1);
}

An axis object defines how input values are mapped to bins, which means that it defines the number of bins along that axis and a mapping function from input values to bins. If you provide one axis, the histogram is one-dimensional. If you provide two, it is two-dimensional, and so on.

In the example above, the compiler knows the number of axes and their type at compile-time, the information can be deduced from the arguments to make_histogram. This gives the best performance, but sometimes you only know the axis configuration at run-time, because it depends on information that's only available at run-time. So you can also create axes at run-time and pass them to an overload of the make_histogram function. Here is an example.

#include <boost/histogram.hpp>
#include <cassert>
#include <sstream>
#include <vector>

const char* config = "4 1.0 2.0\n"
                     "5 3.0 4.0\n";

int main() {
  using namespace boost::histogram;

  // read axis config from a config file (mocked here with std::istringstream)
  // and create vector of regular axes, the number of axis is not known at compile-time
  std::istringstream is(config);
  auto v1 = std::vector<axis::regular<>>();
  while (is.good()) {
    unsigned bins;
    double start, stop;
    is >> bins >> start >> stop;
    v1.emplace_back(bins, start, stop);
  }

  // create histogram from iterator range
  // (copying or moving the vector also works, move is shown below)
  auto h1 = make_histogram(v1.begin(), v1.end());
  assert(h1.rank() == v1.size());

  // with a vector of axis::variant (polymorphic axis type that can hold any one of the
  // template arguments at a time) the types and number of axis can vary at run-time
  auto v2 = std::vector<axis::variant<axis::regular<>, axis::integer<>>>();
  v2.emplace_back(axis::regular<>(100, -1.0, 1.0));
  v2.emplace_back(axis::integer<>(1, 7));

  // create dynamic histogram by moving the vector
  auto h2 = make_histogram(std::move(v2));
  assert(h2.rank() == 2);
}
[Note] Note

When the axis types are known at compile-time, the histogram internally holds them in a std::tuple, which is very efficient and avoids a memory allocation from the heap. If they are only known at run-time, it stores the axes in a std::vector. The interface hides this difference, but not perfectly. There are a corner cases where the difference becomes apparent. The rationale has more details on this point.

The factory function named make_histogram uses the default storage type, which provides safe counting, is fast, and memory efficient. If you want to create a histogram with another storage type, use make_histogram_with. To learn more about other storage types and how to create your own, have a look at the section Advanced Usage.

The library provides a number of useful axis types. Here is some advice when to use which.

boost::histogram::axis::regular

Axis over an interval on the real line with bins of equal width. Value-to-index conversion is O(1) and very fast. The axis does not allocate memory dynamically. The axis is very flexible thanks to transforms (see below). Due to finite precision of floating point calculations, bin edges may not be exactly at expected values. If you need bin edges at exactly defined floating point values, use the next axis.

boost::histogram::axis::variable

Axis over an interval on the real line with bins of variable width. Value-to-index conversion is O(log(N)). The axis allocates memory dynamically to store the bin edges. Use this if the regular axis with transforms cannot represent the binning you want. If you need bin edges at exactly defined floating point values, use this axis.

boost::histogram::axis::integer

Axis over an integer sequence [i, i+1, i+2, ...]. Can also handle real input values, then it represents bins with a fixed bin width of 1. Value-to-index conversion is O(1) and faster than for the regular axis. Does not allocate memory dynamically. Use this when your input consists of a sequence of integers.

boost::histogram::axis::category

Axis over a set of unique values of an arbitrary equal-comparable type. Value-to-index conversion is O(N), but faster than variable axis for N < 10, the typical use case. The axis allocates memory dynamically to store the values.

Check the class descriptions for more information about each axis type. You can write your own axis types, too, see Advanced usage.

Here is an example which shows the basic use case for each axis type.

#include <boost/histogram/axis.hpp>
#include <limits>

int main() {
  using namespace boost::histogram;

  // make a regular axis with 10 bins over interval from 1.5 to 2.5
  auto r = axis::regular<>{10, 1.5, 2.5};
  // `<>` is needed in C++14 because the axis is templated,
  // in C++17 you can do: auto r = axis::regular{10, 1.5, 2.5};
  assert(r.size() == 10);
  // alternatively, you can define the step size with the `step` marker
  auto r_step = axis::regular<>{axis::step(0.1), 1.5, 2.5};
  assert(r_step == r);

  // histogram uses the `index` method to convert values to indices
  // note: intervals of builtin axis types are always semi-open [a, b)
  assert(r.index(1.5) == 0);
  assert(r.index(1.6) == 1);
  assert(r.index(2.4) == 9);
  // index for a value below the start of the axis is always -1
  assert(r.index(1.0) == -1);
  assert(r.index(-std::numeric_limits<double>::infinity()) == -1);
  // index for a value below the above the end of the axis is always `size()`
  assert(r.index(3.0) == 10);
  assert(r.index(std::numeric_limits<double>::infinity()) == 10);
  // index for not-a-number is also `size()` by convention
  assert(r.index(std::numeric_limits<double>::quiet_NaN()) == 10);

  // make a variable axis with 3 bins [-1.5, 0.1), [0.1, 0.3), [0.3, 10)
  auto v = axis::variable<>{-1.5, 0.1, 0.3, 10.};
  assert(v.index(-2.0) == -1);
  assert(v.index(-1.5) == 0);
  assert(v.index(0.1) == 1);
  assert(v.index(0.3) == 2);
  assert(v.index(10) == 3);
  assert(v.index(20) == 3);

  // make an integer axis with 3 bins at -1, 0, 1
  auto i = axis::integer<>{-1, 2};
  assert(i.index(-2) == -1);
  assert(i.index(-1) == 0);
  assert(i.index(0) == 1);
  assert(i.index(1) == 2);
  assert(i.index(2) == 3);

  // make an integer axis called "foo"
  auto i_with_label = axis::integer<>{-1, 2, "foo"};
  // all builtin axis types allow you to pass some optional metadata as the last
  // argument in the constructor; a string by default, but can be any copyable type

  // two axis do not compare equal if they differ in their metadata
  assert(i != i_with_label);

  // integer axis also work well with unscoped enums
  enum { red, blue };
  auto i_for_enum = axis::integer<>{red, blue + 1};
  assert(i_for_enum.index(red) == 0);
  assert(i_for_enum.index(blue) == 1);

  // make a category axis from a scoped enum and/or if the identifiers are not consecutive
  enum class Bar { red = 12, blue = 6 };
  auto c = axis::category<Bar>{Bar::red, Bar::blue};
  assert(c.index(Bar::red) == 0);
  assert(c.index(Bar::blue) == 1);
  // c.index(12) is a compile-time error, since the argument must be of type `Bar`

  // category axis can be created for any copyable and equal-comparable type
  auto c_str = axis::category<std::string>{"red", "blue"};
  assert(c_str.index("red") == 0);
  assert(c_str.index("blue") == 1);
}
[Note] Note

All builtin axes over the real-line use semi-open bin intervals by convention. As a mnemonic, think of iterator ranges from begin to end, where end is also not included.

As mentioned in the previous example, you can assign an optional label to any axis to keep track of what the axis is about. Assume you have census data and you want to investigate how yearly income correlates with age, you could do:

#include <boost/histogram.hpp>

int main() {
  using namespace boost::histogram;

  // create a 2d-histogram with an "age" and an "income" axis
  auto h = make_histogram(axis::regular<>(20, 0.0, 100.0, "age in years"),
                          axis::regular<>(20, 0.0, 100.0, "yearly income in 1000 EUR"));

  // do something with h
}

Without the metadata it would be difficult to see which axis was covering which quantity. Metadata is the only axis property that can be modified after construction by the user. Axis objects with different metadata do not compare equal.

All builtin axis types have template arguments for customization. All arguments have reasonable defaults so you can use empty brackets. If your compiler supports C++17, you can drop the brackets altogether. Suitable arguments are then deduced from the constructor call. The template arguments are in order:

Value

The value type is the argument type of the index() method. An argument passed to the axis must be implicitly convertible to this type.

Transform (only regular axis)

A class that implements a monotonic transform between the data space and the space in which the bins are equi-distant. Users can define their own transforms and use them with the axis.

Metadata

The axis uses an instance this type to store metadata. It is a std::string by default, but it can by any copyable type. If you want to save a small amount of stack memory per axis, you pass the empty boost::histogram::axis::null_type here.

Options

Compile-time options for the axis. This is used to enable/disable under- and overflow bins, to make an axis circular, or to enable dynamic growth of the axis beyond the initial range.

Allocator (only variable and category axes)

Allocator that is used to request memory dynamically to store values. If you don't know what an allocator is you can safely ignore this argument.

You can use the boost::histogram::use_default tag type for any of these options, except for the Value and Allocator, to use the library default.

Transforms are a way to customize a regular axis. The default is the identity transform which forwards the value. Transforms allow you to chose the faster stack-allocated regular axis over the generic variable axis in some cases.

A common need is a regular binning in the logarithm of the input value. This can be achieved with a log transform. The follow example shows the builtin transforms.

#include <boost/histogram/axis/regular.hpp>
#include <limits>

int main() {
  using namespace boost::histogram;

  // make a regular axis with a log transform over [10, 100), [100, 1000), [1000, 10000)
  axis::regular<double, axis::transform::log> r_log{3, 10., 10000.};
  // log transform:
  // - useful when values vary dramatically in magnitude, like brightness of stars
  // - edges are not exactly at 10, 100, 1000, because of finite floating point precision
  // - value >= 0 below is mapped to -1
  // - value < 0 below is mapped to `size()`, because the result of std::log is NaN
  assert(r_log.index(10.1) == 0);
  assert(r_log.index(100.1) == 1);
  assert(r_log.index(1000.1) == 2);
  assert(r_log.index(1) == -1);
  assert(r_log.index(0) == -1);
  assert(r_log.index(-1) == 3);

  // make a regular axis with a sqrt transform over [4, 9), [9, 16), [16, 25)
  axis::regular<double, axis::transform::sqrt> r_sqrt{3, 4., 25.};
  // sqrt transform:
  // - bin widths are more mildly increasing compared to log transform
  // - starting axis at value == 0 is ok, sqrt(0) == 0 unlike log transform
  // - value < 0 is mapped to `size()`, because the result of std::sqrt is NaN
  assert(r_sqrt.index(0) == -1);
  assert(r_sqrt.index(4.1) == 0);
  assert(r_sqrt.index(9.1) == 1);
  assert(r_sqrt.index(16.1) == 2);
  assert(r_sqrt.index(25.1) == 3);
  assert(r_sqrt.index(-1) == 3);

  // make a regular axis with a power transform x^1/3 over [1, 8), [8, 27), [27, 64)
  using pow_trans = axis::transform::pow;
  axis::regular<double, pow_trans> r_pow(pow_trans{1. / 3.}, 3, 1., 64.);
  // pow transform:
  // - generalization of the sqrt transform, power index is first argument of ctor
  // - starting the axis at value == 0 is ok, 0^p == 0 for p != 0
  // - value < 0 is mapped to `size()`, because the result of std::pow is NaN
  assert(r_pow.index(0) == -1);
  assert(r_pow.index(1.1) == 0);
  assert(r_pow.index(8.1) == 1);
  assert(r_pow.index(27.1) == 2);
  assert(r_pow.index(64.1) == 3);
  assert(r_pow.index(-1) == 3);
}

As shown in the example, due to the finite precision of floating point calculations, the bin edges of a transformed regular axis may not be exactly at the expected values. If you need exact correspondence, use a variable axis.

Users may write their own transforms and use them with the builtin regular axis, by implementing a type that matches the Transform concept.

Options can be used to configure each axis type. The option flags are implemented as tag types with the suffix _t. Each tag type has a corresponding value without the suffix. The values behave set semantics. You can compute the union with operator| and the intersection with operator&. When you pass a single option flag to an axis as a template parameter, use the tag type. When you need to pass the union of several options to an axis as a template parameter, surround the union of option values with a decltype. Both ways of passing options are shown in the following examples.

Under- and overflow bins

By default, under- and overflow bins are added automatically for each axis (except if adding them makes no sense). If you create an axis with 20 bins, the histogram will actually have 22 bins along that axis. The extra bins are very useful, as explained in the rationale. If the input cannot exceed the axis range, you can disable the extra bins to save memory. Example:

#include <boost/histogram.hpp>
#include <string>

int main() {
  using namespace boost::histogram;

  // create a 1d-histogram over integer values from 1 to 6
  auto h1 = make_histogram(axis::integer<int>(1, 7));
  // axis has size 6...
  assert(h1.axis().size() == 6);
  // ... but histogram has size 8, because of overflow and underflow bins
  assert(h1.size() == 8);

  // create a 1d-histogram for throws of a six-sided die without extra bins,
  // since the values cannot be smaller than 1 or larger than 6
  auto h2 = make_histogram(axis::integer<int, use_default, axis::option::none_t>(1, 7));
  // now size of axis and histogram is equal
  assert(h2.axis().size() == 6);
  assert(h2.size() == 6);
}

The category axis comes only with an overflow bin, which counts all input values that are not part of the initial set.

Circular axes

Each builtin axis except the category axis can be made circular. This means that the axis is periodic at its ends, like a polar angle that wraps around after 360 degrees. This is particularly useful if you want make a histogram over a polar angle. Example:

#include <boost/histogram/axis.hpp>
#include <limits>

int main() {
  using namespace boost::histogram;

  // make a circular regular axis ... [0, 180), [180, 360), [0, 180) ....
  using opts = decltype(axis::option::overflow | axis::option::circular);
  auto r = axis::regular<double, use_default, use_default, opts>{2, 0., 360.};
  assert(r.index(-180) == 1);
  assert(r.index(0) == 0);
  assert(r.index(180) == 1);
  assert(r.index(360) == 0);
  assert(r.index(540) == 1);
  assert(r.index(720) == 0);
  // special values are mapped to the overflow bin index
  assert(r.index(std::numeric_limits<double>::infinity()) == 2);
  assert(r.index(-std::numeric_limits<double>::infinity()) == 2);
  assert(r.index(std::numeric_limits<double>::quiet_NaN()) == 2);

  // since the regular axis is the most common circular axis, there exists an alias
  auto c = axis::circular<>{2, 0., 360.};
  assert(r == c);

  // make a circular integer axis
  auto i = axis::integer<int, use_default, axis::option::circular_t>{1, 4};
  assert(i.index(0) == 2);
  assert(i.index(1) == 0);
  assert(i.index(2) == 1);
  assert(i.index(3) == 2);
  assert(i.index(4) == 0);
  assert(i.index(5) == 1);
}

A circular axis cannot have an underflow bin, passing both options together generates a compile-time error. Since the highest bin wraps around to the lowest bin, there is no possibility for overflow either. However, an overflow bin is still added by default if the value is a floating point type, to catch NaNs.

Growing axes

Each builtin axis has an option to make it grow beyond its initial range when a value outside of that range is discovered, instead of counting this value in the under- or overflow bins (or to discard it). This can be incredibly convenient. Example:

#include <boost/histogram.hpp>
#include <cassert>

#include <iostream>

int main() {
  using namespace boost::histogram;

  // make a growing regular axis
  // - it grows new bins with its constant bin width until the value is covered
  auto h1 = make_histogram(axis::regular<double,
                                         use_default,
                                         use_default,
                                         axis::option::growth_t>{2, 0., 1.});
  // nothing special happens here
  h1(0.1);
  h1(0.9);
  // state: [0, 0.5): 1, [0.5, 1.0): 1
  assert(h1.axis().size() == 2);
  assert(h1.axis().bin(0).lower() == 0.0);
  assert(h1.axis().bin(1).upper() == 1.0);

  // value below range: axis grows new bins until value is in range
  h1(-0.3);
  // state: [-0.5, 0.0): 1, [0, 0.5): 1, [0.5, 1.0): 1
  assert(h1.axis().size() == 3);
  assert(h1.axis().bin(0).lower() == -0.5);
  assert(h1.axis().bin(2).upper() == 1.0);

  h1(1.9);
  // state: [-0.5, 0.0): 1, [0, 0.5): 1, [0.5, 1.0): 1, [1.0, 1.5): 0 [1.5, 2.0): 1
  assert(h1.axis().size() == 5);
  assert(h1.axis().bin(0).lower() == -0.5);
  assert(h1.axis().bin(4).upper() == 2.0);

  // make a growing category axis (here strings)
  // - empty axis is allowed: very useful if categories are not known at the beginning
  auto h2 = make_histogram(axis::category<std::string,
                                          use_default,
                                          axis::option::growth_t>());
  assert(h2.size() == 0); // histogram is empty
  h2("foo"); // new bin foo, index 0
  assert(h2.size() == 1);
  h2("bar"); // new bin bar, index 1
  assert(h2.size() == 2);
  h2("foo");
  assert(h2.size() == 2);
  assert(h2[0] == 2);
  assert(h2[1] == 1);
}

So this feature is very convenient, but keep two caveats in mind.

  • Growing axes come with a run-time cost, since the histogram has to reallocate memory for all cells when any axis size changes. Whether this performance hit is noticeable depends on your application. This is a minor issue, the next is more severe.
  • If you have unexpected outliers in your data which are far away from the normal range, the axis could grow to a huge size and the corresponding huge memory request could bring your computer to its knees. This is the reason why growing axes are not the default.

A growing axis can have under- and overflow bins, but these only count the special floating point values +-infinity and NaN.

A histogram has been created and now you want to insert values. This is done with the flexible histogram::operator(), which you typically use in a loop. It accepts N arguments or a std::tuple with N elements, where N is equal to the number of axes of the histogram. It finds the corresponding bin for the input and increments the bin counter by one.

After the histogram has been filled, use histogram::at (in analogy to std::vector::at) to access the cell values. It accepts integer indices, one for each axis of the histogram.

#include <boost/histogram.hpp>
#include <cassert>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

int main() {
  using namespace boost::histogram;

  auto h = make_histogram(axis::integer<>(0, 3), axis::regular<>(2, 0.0, 1.0));

  // fill histogram, number of arguments must be equal to number of axes,
  // types must be convertible to axis value type (here integer and double)
  h(0, 0.2); // increase a cell value by one
  h(2, 0.5); // increase another cell value by one

  // fills from a tuple are also supported; passing a tuple of wrong size
  // causes an error at compile-time or an assertion at runtime in debug mode
  auto xy = std::make_tuple(1, 0.3);
  h(xy);

  // functional-style processing is also supported, generate some data...
  std::vector<std::tuple<int, double>> input_data;
  input_data.emplace_back(0, 0.8);
  input_data.emplace_back(2, 0.4);
  input_data.emplace_back(5, 0.7); // goes into overflow bin of first axis

  // std::for_each takes the functor by value, a reference wrapper avoid copies
  std::for_each(input_data.begin(), input_data.end(), std::ref(h));

  // once histogram is filled, access cells using operator[] or at(...)
  // - operator[] can only accept a single argument in the current version of C++,
  //   it is convenient when you have a 1D histogram
  // - at(...) can accept several values, so use this by default
  // - underflow bins are at index -1, overflow bins at index `size()`
  // - passing an invalid index triggers a std::out_of_range exception
  assert(h.at(0, 0) == 1);
  assert(h.at(0, 1) == 1);
  assert(h.at(1, 0) == 1);
  assert(h.at(1, 1) == 0);
  assert(h.at(2, 0) == 1);
  assert(h.at(2, 1) == 1);
  assert(h.at(-1, -1) == 0); // underflow for axis 0 and 1
  assert(h.at(-1, 0) == 0);  // underflow for axis 0, normal bin for axis 1
  assert(h.at(-1, 2) == 0);  // underflow for axis 0, overflow for axis 1
  assert(h.at(3, 1) == 1);   // overflow for axis 0, normal bin for axis 1

  // iteration over values works, but see next example for a better way
  // - iteration using begin() and end() includes under- and overflow bins
  // - iteration order is an implementation detail and may change in future versions
  const double sum = std::accumulate(h.begin(), h.end(), 0.0);
  assert(sum == 6);
}

For a histogram hist, the calls hist(weight(w), ...) and hist(..., weight(w)) increment the bin counter by the value w instead, where w may be an integer or floating point number. The helper function weight() marks this argument as a weight, so that it can be distinguished from the other inputs. It can be the first or last argument. You can freely mix calls with and without a weight. Calls without weight act like the weight is 1. Why weighted increments are sometimes useful is explained in the rationale.

[Note] Note

The default storage loses its no-overflow-guarantee when you pass floating point weights, but maintains it for integer weights.

When the weights come from a stochastic process, it is useful to keep track of the variance of the sum of weights per cell. A specialized histogram can be generated with the make_weighted_histogram factory function which does that.

#include <boost/histogram.hpp>

int main() {
  using namespace boost::histogram;

  // Create a histogram with weight counters that keep track of a variance estimate.
  auto h = make_weighted_histogram(axis::regular<>(3, 0.0, 1.0));

  h(0.0, weight(1)); // weight 1 goes to first bin
  h(0.1, weight(2)); // weight 2 goes to first bin
  h(0.4, weight(3)); // weight 3 goes to second bin
  h(0.5, weight(4)); // weight 4 goes to second bin

  // Weight counters have methods to access the value (sum of weights) and the variance
  // (sum of weights squared, why this gives the variance is explained in the rationale)
  assert(h[0].value() == 1 + 2);
  assert(h[0].variance() == 1 * 1 + 2 * 2);
  assert(h[1].value() == 3 + 4);
  assert(h[1].variance() == 3 * 3 + 4 * 4);
}

When iterating over all cells, using histogram::at can be inconvenient. The indexed range generator is provided for this case, which is very convenient and faster than naive for-loops.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <numeric> // for std::accumulate
#include <sstream>

using namespace boost::histogram;

int main() {
  // make histogram with 2 x 2 = 4 bins (not counting under-/overflow bins)
  auto h = make_histogram(axis::regular<>(2, -1.0, 1.0), axis::regular<>(2, 2.0, 4.0));

  h(weight(1), -0.5, 2.5); // bin index 0, 0
  h(weight(2), -0.5, 3.5); // bin index 0, 1
  h(weight(3), 0.5, 2.5);  // bin index 1, 0
  h(weight(4), 0.5, 3.5);  // bin index 1, 1

  // use the `indexed` range adaptor to iterate over all bins;
  // it is not only more convenient but also faster than a hand-crafted loop!
  std::ostringstream os;
  for (auto x : indexed(h)) {
    // x is a special accessor object
    const auto i = x.index(0); // current index along first axis
    const auto j = x.index(1); // current index along second axis
    const auto b0 = x.bin(0);  // current bin interval along first axis
    const auto b1 = x.bin(1);  // current bin interval along second axis
    const auto v = *x;         // "dereference" to get the bin value
    os << boost::format("%i %i [%2i, %i) [%2i, %i): %i\n") % i % j % b0.lower() %
              b0.upper() % b1.lower() % b1.upper() % v;
  }

  std::cout << os.str() << std::flush;

  assert(os.str() == "0 0 [-1, 0) [ 2, 3): 1\n"
                     "1 0 [ 0, 1) [ 2, 3): 3\n"
                     "0 1 [-1, 0) [ 3, 4): 2\n"
                     "1 1 [ 0, 1) [ 3, 4): 4\n");

  // `indexed` skips underflow and overflow bins by default, but can be called
  // with the second argument `coverage::all` to walk over all bins
  std::ostringstream os2;
  for (auto x : indexed(h, coverage::all)) {
    os2 << boost::format("%2i %2i: %i\n") % x.index(0) % x.index(1) % *x;
  }

  std::cout << os2.str() << std::flush;

  assert(os2.str() == "-1 -1: 0\n"
                      " 0 -1: 0\n"
                      " 1 -1: 0\n"
                      " 2 -1: 0\n"

                      "-1  0: 0\n"
                      " 0  0: 1\n"
                      " 1  0: 3\n"
                      " 2  0: 0\n"

                      "-1  1: 0\n"
                      " 0  1: 2\n"
                      " 1  1: 4\n"
                      " 2  1: 0\n"

                      "-1  2: 0\n"
                      " 0  2: 0\n"
                      " 1  2: 0\n"
                      " 2  2: 0\n");
}

Histograms from this library can do more than counting, they can hold arbitrary accumulators which accept samples. A histogram is called a profile, if it computes the means of the samples in each cell. Profiles can be generated with the factory function make_profile.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>
#include <utility>

int main() {
  using namespace boost::histogram;

  // make a profile, it computes the mean of the samples in each histogram cell
  auto h = make_profile(axis::regular<>(3, 0.0, 1.0));

  // mean is computed from the values marked with the sample() helper function
  h(0.10, sample(2.5)); // 2.5 goes to bin 0
  h(0.25, sample(3.5)); // 3.5 goes to bin 0
  h(0.45, sample(1.2)); // 1.2 goes to bin 1
  h(sample(3.4), 0.51); // 3.4 goes to bin 1, sample be at the front

  // fills from tuples are also supported, 1.3 and 1.9 go to bin 2
  auto xs1 = std::make_tuple(0.81, sample(1.3));
  auto xs2 = std::make_tuple(0.86, sample(1.9));
  h(xs1);
  h(xs2);

  // builtin accumulators have methods to access their state
  std::ostringstream os;
  for (auto x : indexed(h)) {
    // use `.` to access methods of accessor, like `index()`
    // use `->` to access methods of accumulator
    const auto i = x.index();
    const auto n = x->count();     // how many samples are in this bin
    const auto vl = x->value();    // mean value
    const auto vr = x->variance(); // estimated variance of the mean value
    os << boost::format("bin %i count %i value %.1f variance %.1f\n") % i % n % vl % vr;
  }

  std::cout << os.str() << std::flush;

  assert(os.str() == "bin 0 count 2 value 3.0 variance 0.5\n"
                     "bin 1 count 2 value 2.3 variance 2.4\n"
                     "bin 2 count 2 value 1.6 variance 0.2\n");
}

Just like weight(), sample() is a marker function. It must be the first or last argument.

Weights and samples may be combined, if the histogram accumulators can handle weights. When both weight() and sample() appear in histogram::operator(), they can be in any order with respect to other, but they must be the first and/or last arguments. To make a profile which can compute weighted means use the make_weighted_profile factory function.

The following operators are supported for pairs of histograms +, -, *, /, ==, !=. Histograms can also be multiplied and divided by a scalar. Only a subset of the arithmetic operators is available when the underlying cell value type only supports a subset.

The arithmetic operators can only be used when the histograms have the same axis configuration. This checked at compile-time if possible, or at run-time. An exception is thrown if the configurations do not match. Two histograms have the same axis configuration, if all axes compare equal, which includes a comparison of their metadata. Two histograms compare equal, when their axis configurations and all their cell values compare equal.

[Note] Note

If the metadata type has operator== defined, it is used in the axis configuration comparison. Metadata types without operator== are considered equal, if they are the same type.

#include <boost/histogram.hpp>
#include <cassert>
#include <vector>

int main() {
  using namespace boost::histogram;

  // make two histograms
  auto h1 = make_histogram(axis::regular<>(2, -1.0, 1.0));
  auto h2 = make_histogram(axis::regular<>(2, -1.0, 1.0));

  h1(-0.5); // counts are: 1 0
  h2(0.5);  // counts are: 0 1

  // add them
  auto h3 = h1;
  h3 += h2; // counts are: 1 1

  // adding multiple histograms at once is likely to be optimized by the compiler so that
  // superfluous temporaries avoided, but no guarantees are given; use this equivalent
  // code when you want to make sure: h4 = h1; h4 += h2; h4 += h3;
  auto h4 = h1 + h2 + h3; // counts are: 2 2

  assert(h4.at(0) == 2 && h4.at(1) == 2);

  // multiply by number; h4 *= 2 is not allowed, because the result of a multiplication is
  // not a histogram anymore, it has the type boost::histogram::grid<...>)
  auto g4 = h4 * 2; // counts are: 4 4

  // divide by number; g4 /= 4 also works
  auto g5 = g4 / 4; // counts are: 1 1

  assert(g5.at(0) == 1 && g5.at(1) == 1);
  assert(g4 != g5 && g4 == 4 * g5);

  // note the special effect of multiplication on weight_storage
  auto h = make_histogram_with(weight_storage(), axis::regular<>(2, -1.0, 1.0));
  h(-0.5);

  // counts are: 1 0
  assert(h.at(0).value() == 1 && h.at(1).value() == 0);

  auto h_sum = h + h;
  auto g_mul = 2 * h;

  // values are the same as expected...
  assert(h_sum.at(0).value() == g_mul.at(0).value());
  // ... but variances differ
  assert(h_sum.at(0).variance() == 2 && g_mul.at(0).variance() == 4);

  // equality operator checks variances, so histograms are not equal
  assert(h_sum != g_mul);
}
[Note] Note

A histogram with default storage converts its cell values to double when they are to be multiplied with or divided by a real number, or when a real number is added or subtracted. At this point the no-overflow-guarantee is lost.

[Note] Note

When the storage tracks weight variances, such as boost::histogram::weight_storage, adding two copies of a histogram produces a different result than scaling the histogram by a factor of two, as shown in the last example. The is a consequence of the mathematical properties of variances. They add like normal numbers, but scaling by s means that variances are scaled by s^2.

Simple ostream operators are shipped with the library, which are internally used by the unit tests. These give text representations of axis and histogram configurations, but do not show the histogram content. They may be useful for debugging and more useful text representations are planned for the future. Since user may want to use their own implementations, the headers with the builtin implementations are not included by the super header #include <boost/histogram.hpp>. The following example shows the effect of output streaming.

#include <boost/histogram.hpp>
#include <boost/histogram/ostream.hpp>
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  using namespace boost::histogram;
  namespace tr = axis::transform;

  auto h =
      make_histogram(axis::regular<>(2, -1.0, 1.0),
                     axis::regular<double, tr::log>(2, 1.0, 10.0, "axis 1"),
                     axis::regular<double, tr::pow, use_default, axis::option::growth_t>(
                         tr::pow{1.5}, 2, 1.0, 10.0, "axis 2"),
                     // axis without metadata
                     axis::circular<double, axis::null_type>(4, 0.0, 360.0),
                     // axis without under-/overflow bins
                     axis::variable<double, use_default, axis::option::none_t>(
                         {-1.0, 0.0, 1.0}, "axis 4"),
                     axis::category<>({2, 1, 3}, "axis 5"),
                     axis::category<std::string>({"red", "blue"}, "axis 6"),
                     axis::integer<>(-1, 1, "axis 7"));

  std::ostringstream os;
  os << h;

  std::cout << os.str() << std::endl;

  assert(os.str() ==
         "histogram(\n"
         "  regular(2, -1, 1, options=underflow | overflow),\n"
         "  regular_log(2, 1, 10, metadata=\"axis 1\", options=underflow | overflow),\n"
         "  regular_pow(2, 1, 10, metadata=\"axis 2\", options=growth, power=1.5),\n"
         "  regular(4, 0, 360, options=overflow | circular),\n"
         "  variable(-1, 0, 1, metadata=\"axis 4\", options=none),\n"
         "  category(2, 1, 3, metadata=\"axis 5\", options=overflow),\n"
         "  category(\"red\", \"blue\", metadata=\"axis 6\", options=overflow),\n"
         "  integer(-1, 1, metadata=\"axis 7\", options=underflow | overflow)\n"
         ")");
}

The library supports serialization via Boost.Serialization. The serialization code is not included by the super header #include <boost/histogram.hpp>, so that the library can be used without Boost.Serialization or with a different serialization library.

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/histogram.hpp>
#include <boost/histogram/serialization.hpp> // includes serialization code
#include <cassert>
#include <sstream>

int main() {
  using namespace boost::histogram;

  auto a = make_histogram(axis::regular<>(3, -1.0, 1.0, "axis 0"),
                          axis::integer<>(0, 2, "axis 1"));
  a(0.5, 1);

  std::string buf; // to hold persistent representation

  // store histogram
  {
    std::ostringstream os;
    boost::archive::text_oarchive oa(os);
    oa << a;
    buf = os.str();
  }

  auto b = decltype(a)(); // create a default-constructed second histogram

  assert(b != a); // b is empty, a is not

  // load histogram
  {
    std::istringstream is(buf);
    boost::archive::text_iarchive ia(is);
    ia >> b;
  }

  assert(b == a); // now b is equal to a
}

The library was designed to work with algorithms from the C++ standard library. In addition, a support library of algorithms is included with common operations on histograms.

The histogram class has standard random-access iterators which can be used with various algorithms from the standard library. Some things need to be kept in mind:

  • The histogram iterators also walk over the under- and overflow bins, if they are present. If this is inconvenient, the corresponding axis options may be disabled. A way to skip the extra bins is to use the iterators from the boost::histogram::indexed range generator instead of the raw histogram iterators, but the values then need to be accessed through an extra dereference operation.
  • The iteration order for histograms with several axes is an implementation-detail, but for histograms with one axis it is guaranteed to be the unsurprising order: bins are sequentially increasing, the upper edge of the current bin is the lower edge of the next bin.
#include <boost/histogram.hpp>
#include <cassert>

#include <algorithm> // fill, any_of, min_element, max_element
#include <cmath>     // sqrt
#include <numeric>   // partial_sum, inner_product

int main() {
  using namespace boost::histogram;

  // make histogram that represents a probability density function (PDF)
  auto h1 = make_histogram(axis::regular<>(4, 1.0, 3.0));

  // use std::fill to set all counters to 0.25, including *flow cells
  std::fill(h1.begin(), h1.end(), 0.25);
  // reset *flow cells to zero
  h1.at(-1) = h1.at(4) = 0;

  // compute the cumulative density function (CDF), overriding cell values
  std::partial_sum(h1.begin(), h1.end(), h1.begin());

  assert(h1.at(-1) == 0.0);
  assert(h1.at(0) == 0.25);
  assert(h1.at(1) == 0.50);
  assert(h1.at(2) == 0.75);
  assert(h1.at(3) == 1.00);
  assert(h1.at(4) == 1.00);

  // use any_of to check if any cell values are smaller than 0.1,
  // and use indexed() to skip underflow and overflow cells
  auto h1_ind = indexed(h1);
  const auto any_small =
      std::any_of(h1_ind.begin(), h1_ind.end(), [](const auto& x) { return *x < 0.1; });
  assert(any_small == false); // underflow and overflow are zero, but skipped

  // find maximum element
  const auto max_it = std::max_element(h1.begin(), h1.end());
  assert(max_it == h1.end() - 2);

  // find minimum element
  const auto min_it = std::min_element(h1.begin(), h1.end());
  assert(min_it == h1.begin());

  // make second PDF
  auto h2 = make_histogram(axis::regular<>(4, 1.0, 4.0));
  h2.at(0) = 0.1;
  h2.at(1) = 0.3;
  h2.at(2) = 0.2;
  h2.at(3) = 0.4;

  // computing cosine similiarity: cos(theta) = A dot B / sqrt((A dot A) * (B dot B))
  const auto aa = std::inner_product(h1.begin(), h1.end(), h1.begin(), 0.0);
  const auto bb = std::inner_product(h2.begin(), h2.end(), h2.begin(), 0.0);
  const auto ab = std::inner_product(h1.begin(), h1.end(), h2.begin(), 0.0);
  const auto cos_sim = ab / std::sqrt(aa * bb);

  assert(std::abs(cos_sim - 0.78) < 1e-2);
}

It is easy to iterate over all histogram cells to compute the sum of cell values by hand or to use an algorithm from the standard library to do so, but it is not safe. The result may not be accurate or overflow, if the sum is represented by an integer type. The library provides boost::histogram::algorithm::sum as a safer, albeit slower, alternative. It uses the Neumaier algorithm in double precision for integers and floating point types, and does the naive sum otherwise.

Sometimes you want to remove some axes from a high-dimensional histogram and obtain the equivalent lower-dimensional version, that one gets by summing up the bin contents of the removed axes. This is called a projection. This is useful if you found out that there is no interesting structure along an axis, so it is not worth keeping that axis around, or if you want to visualize 1d or 2d projections of a high-dimensional histogram.

The library provides the boost::histogram::algorithm::project function for this purpose, which returns the reduced histogram. It accepts the original histogram and the indices (one or more) of the axes that are kept. If your histogram is static, meaning the axis configuration is known at compile-time, you need to use compile-time numbers as indices, while the histogram with a dynamic axis configuration can also accepts run-time indices and even an iterators over indices.

#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;
  using namespace literals; // enables _c suffix

  // make a 2d histogram
  auto h = make_histogram(axis::regular<>(3, -1.0, 1.0), axis::integer<>(0, 2));

  h(-0.9, 0);
  h(0.9, 1);
  h(0.1, 0);

  auto hr0 = algorithm::project(h, 0_c); // keep only first axis
  auto hr1 = algorithm::project(h, 1_c); // keep only second axis

  // reduce does not remove counts; returned histograms are summed over
  // the removed axes, so h, hr0, and hr1 have same number of total counts;
  // we compute the sum of counts with the sum algorithm
  assert(algorithm::sum(h) == 3 && algorithm::sum(hr0) == 3 && algorithm::sum(hr1) == 3);

  std::ostringstream os1;
  for (auto x : indexed(h))
    os1 << "(" << x.index(0) << ", " << x.index(1) << "): " << *x << "\n";
  std::cout << os1.str() << std::flush;
  assert(os1.str() == "(0, 0): 1\n"
                      "(1, 0): 1\n"
                      "(2, 0): 0\n"
                      "(0, 1): 0\n"
                      "(1, 1): 0\n"
                      "(2, 1): 1\n");

  std::ostringstream os2;
  for (auto x : indexed(hr0)) os2 << "(" << x.index(0) << ", -): " << *x << "\n";
  std::cout << os2.str() << std::flush;
  assert(os2.str() == "(0, -): 1\n"
                      "(1, -): 1\n"
                      "(2, -): 1\n");

  std::ostringstream os3;
  for (auto x : indexed(hr1)) os3 << "(- ," << x.index(0) << "): " << *x << "\n";
  std::cout << os3.str() << std::flush;
  assert(os3.str() == "(- ,0): 2\n"
                      "(- ,1): 1\n");
}

A projection removes an axis completely. Another way to obtain a smaller histogram which consume less memory is the boost::histogram::algorithm::reduce function to shrink and rebin axes.

Shrinking means that the range of an axis is reduced and the number of bins along this axis. Rebinning means that adjacent bins are merged. For each set of adjacent bins, new bin is formed which covers the interval of the merged bins and has their added content. These two operations can be combined and applied to several axes at once (which is more efficient than doing it sequentially).

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // make a 2d histogram
  auto h = make_histogram(axis::regular<>(4, 0.0, 4.0), axis::regular<>(4, -1.0, 1.0));

  h(0, -0.9);
  h(1, 0.9);
  h(2, 0.1);
  h(3, 0.1);

  // shrink the first axis to remove the highest bin
  // rebin the second axis by merging pairs of adjacent bins
  auto h2 = algorithm::reduce(h, algorithm::shrink(0, 0.0, 3.0), algorithm::rebin(1, 2));

  // reduce does not remove counts if the histogram has underflow/overflow bins
  assert(algorithm::sum(h) == 4 && algorithm::sum(h2) == 4);

  assert(h2.axis(0).size() == 3);
  assert(h2.axis(0).bin(0).lower() == 0.0);
  assert(h2.axis(0).bin(2).upper() == 3.0);
  assert(h2.axis(1).size() == 2);
  assert(h2.axis(1).bin(0).lower() == -1.0);
  assert(h2.axis(1).bin(1).upper() == 1.0);
}

The library is customizable and extensible by users. Users can create new axis types and use them with the histogram, or implement a custom storage policy, or use a builtin storage policy with a custom counter type.

There are two ways to generate a single histogram using several threads.

1. Each thread has its own copy of the histogram. Each copy is independently filled. The copies are then added in the main thread. Use this when you can afford having N copies of the histogram in memory for N threads.

2. Each thread is filling the same histogram, concurrently. This requires a thread-safe storage that can handle concurrent writes and a histogram made entirely of non-growing axes (the second requirement may be relaxed in the future).

The library currently does not provide a builtin thread-safe storage yet (one will be added in the future), but a provisional one can be made with std::atomic counters, as shown in the next example.

#include <atomic>
#include <boost/histogram.hpp>
#include <boost/histogram/algorithm/sum.hpp>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// dummy fill function, to be executed in parallel by several threads
template <typename Histogram>
void fill(Histogram& h) {
  for (unsigned i = 0; i < 1000; ++i) { h(i % 10); }
}

/*
  std::atomic has deleted copy ctor, we need to wrap it in a type with a
  potentially unsafe copy ctor. It can be used in a thread-safe way if some
  rules are followed, see below.
*/
template <typename T>
class copyable_atomic : public std::atomic<T> {
public:
  using std::atomic<T>::atomic;

  // zero-initialize the atomic T
  copyable_atomic() noexcept : std::atomic<T>(T()) {}

  // this is potentially not thread-safe, see below
  copyable_atomic(const copyable_atomic& rhs) : std::atomic<T>() { this->operator=(rhs); }

  // this is potentially not thread-safe, see below
  copyable_atomic& operator=(const copyable_atomic& rhs) {
    if (this != &rhs) { std::atomic<T>::store(rhs.load()); }
    return *this;
  }
};

int main() {
  using namespace boost::histogram;

  /*
    Create histogram with container of atomic counters for parallel filling in
    several threads. You cannot use bare std::atomic here, because std::atomic
    types are not copyable. Using the copyable_atomic as a work-around is safe,
    if the storage does not change size while it is filled. This means that
    growing axis types are not allowed.
  */
  auto h = make_histogram_with(std::vector<copyable_atomic<std::size_t>>(),
                               axis::integer<>(0, 10));

  /*
    The histogram storage may not be resized in either thread.
    This is the case, if you do not use growing axis types.
  */
  auto fill_h = [&h]() { fill(h); };
  std::thread t1(fill_h);
  std::thread t2(fill_h);
  std::thread t3(fill_h);
  std::thread t4(fill_h);
  t1.join();
  t2.join();
  t3.join();
  t4.join();

  assert(algorithm::sum(h) == 4000);
}

It is easy to make custom axis classes that work with the library. The custom axis class must meet the requirements of the Axis concept.

Users can create a custom axis by derive from a builtin axis type and customize its behavior. The following examples demonstrates a modification of the integer axis.

#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;

  // custom axis, which adapts builtin integer axis
  struct custom_axis : public axis::integer<> {
    using value_type = const char*; // type that is fed to the axis

    using integer::integer; // inherit ctors of base

    // the customization point
    // - accept const char* and convert to int
    // - then call index method of base class
    axis::index_type index(value_type s) const { return integer::index(std::atoi(s)); }
  };

  auto h = make_histogram(custom_axis(3, 6));
  h("-10");
  h("3");
  h("4");
  h("9");

  std::ostringstream os;
  for (auto&& b : indexed(h)) {
    os << "bin " << b.index() << " [" << b.bin() << "] " << *b << "\n";
  }

  std::cout << os.str() << std::endl;

  assert(os.str() == "bin 0 [3] 1\n"
                     "bin 1 [4] 1\n"
                     "bin 2 [5] 0\n");
}

How to make an axis completely from scratch is shown in the next example.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // stateless axis which returns 1 if the input is even and 0 otherwise
  struct even_odd_axis {
    axis::index_type index(int x) const { return x % 2; }
    axis::index_type size() const { return 2; }
  };

  // threshold axis which returns 1 if the input is above threshold
  struct threshold_axis {
    threshold_axis(double x) : thr(x) {}
    axis::index_type index(double x) const { return x >= thr; }
    axis::index_type size() const { return 2; }
    double thr;
  };

  auto h = make_histogram(even_odd_axis(), threshold_axis(3.0));

  h(0, 2.0);
  h(1, 4.0);
  h(2, 4.0);

  assert(h.at(0, 0) == 1); // even, below threshold
  assert(h.at(0, 1) == 1); // even, above threshold
  assert(h.at(1, 0) == 0); // odd, below threshold
  assert(h.at(1, 1) == 1); // odd, above threshold
}

Such minimal axis types lack many features provided by the builtin axis types, for example, one cannot iterate over them, but this functionality can be added as needed.

Multi-dimensional histograms usually have an orthogonal system of axes. Orthogonal means that each axis takes care of only one value and computes its local index independently of all the other axes and values. A checker-board is such an orthogonal grid in 2D.

There are other interesting grids which are not orthogonal, notably the honeycomb grid in 2D. In such a grid, each cell is hexagonal and even though the cells form a perfectly regular pattern, it is not possible to sort values into these cells using two orthogonal axes.

The library supports non-orthogonal grids by allowing axis type to accepts two or more arguments instead of one, if they are packed into a std::tuple. The following example demonstrates this.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // axis which returns 1 if the input falls inside the unit circle and zero otherwise
  struct circle_axis {
    // accepts a 2D point in form of a std::tuple
    axis::index_type index(const std::tuple<double, double>& point) const {
      const auto x = std::get<0>(point);
      const auto y = std::get<1>(point);
      return x * x + y * y <= 1.0;
    }

    axis::index_type size() const { return 2; }
  };

  auto h1 = make_histogram(circle_axis());

  // fill looks normal for a histogram which has only one Nd-axis
  h1(0, 0);   // in
  h1(0, -1);  // in
  h1(0, 1);   // in
  h1(-1, 0);  // in
  h1(1, 0);   // in
  h1(1, 1);   // out
  h1(-1, -1); // out

  // 2D histogram, but only 1D index
  assert(h1.at(0) == 2); // out
  assert(h1.at(1) == 5); // in

  // other axes can be combined with a Nd-axis
  auto h2 = make_histogram(circle_axis(), axis::category<std::string>({"red", "blue"}));

  // now we need to pass arguments for Nd-axis explicitly as std::tuple
  h2(std::make_tuple(0, 0), "red");
  h2(std::make_tuple(1, 1), "blue");

  // 3D histogram, but only 2D index
  assert(h2.at(0, 0) == 0); // out, red
  assert(h2.at(0, 1) == 1); // out, blue
  assert(h2.at(1, 0) == 1); // in, red
  assert(h2.at(1, 1) == 0); // in, blue
}

Histograms which use a different storage class can easily created with the factory function make_histogram_with. For convenience, this factory function accepts many standard containers as storage backends: vectors, arrays, and maps. These are automatically wrapped with a boost::histogram::storage_adaptor to provide the storage interface needed by the library. Users may also place custom accumulators in the vector, as described in the next section.

[Warning] Warning

The no-overflow-guarantee is only valid if the default storage is used. If you change the storage policy, you need to know what you are doing.

A std::vector may provide higher performance than the default storage with a carefully chosen counter type. Usually, this would be an integral or floating point type. A std::vector-based storage may be faster than the default storage for low-dimensional histograms (or not, you need to measure).

Users who work exclusively with weighted histograms should chose a std::vector<double> over the default storage, it will be faster. If they also want to track the variance of the sum of weights, using the factor function make_weighted_histogram is a convenient, which provides a histogram with a vector-based storage of weighted_sum accumulators.

An interesting alternative to a std::vector is to use a std::array. The latter provides a storage with a fixed maximum capacity (the size of the array). std::array allocates the memory on the stack. In combination with a static axis configuration this allows one to create histograms completely on the stack without any dynamic memory allocation. Small stack-based histograms can be created and destroyed very fast.

Finally, a std::map or std::unordered_map is adapted into a sparse storage, where empty cells do not consume any memory. This sounds very attractive, but the memory consumption per cell in a map is much larger than for a vector or array. Furthermore, the cells are usually scattered in memory, which increases cache misses and degrades performance. Whether a sparse storage performs better than a dense storage depends strongly on the usage scenario. It is easy switch from dense to sparse storage and back, so one can try both options.

The following example shows how histograms are constructed which use an alternative storage classes.

#include <algorithm> // std::for_each
#include <array>
#include <boost/histogram.hpp>
#include <boost/histogram/algorithm/sum.hpp>
#include <functional> // std::ref
#include <unordered_map>
#include <vector>

int main() {
  using namespace boost::histogram;
  const auto axis = axis::regular<>(10, 0.0, 1.0);

  auto data = {0.1, 0.3, 0.2, 0.7};

  // Create static histogram with vector<int> as counter storage, you can use
  // other arithmetic types as counters, e.g. double.
  auto h1 = make_histogram_with(std::vector<int>(), axis);
  std::for_each(data.begin(), data.end(), std::ref(h1));
  assert(algorithm::sum(h1) == 4);

  // Create static histogram with array<int, N> as counter storage which is
  // allocated completely on the stack (this is very fast). N may be larger than
  // the actual number of bins used; an exception is raised if N is too small to
  // hold all bins.
  auto h2 = make_histogram_with(std::array<int, 12>(), axis);
  std::for_each(data.begin(), data.end(), std::ref(h2));
  assert(algorithm::sum(h2) == 4);

  // Create static histogram with unordered_map as counter storage; this
  // generates a sparse histogram where only memory is allocated for bins that
  // are non-zero. This sounds like a good idea for high-dimensional histograms,
  // but maps come with a memory and run-time overhead. The default_storage
  // usually performs better in high dimensions.
  auto h3 = make_histogram_with(std::unordered_map<std::size_t, int>(), axis);
  std::for_each(data.begin(), data.end(), std::ref(h3));
  assert(algorithm::sum(h3) == 4);
}

A storage can hold arbitrary accumulators which may accept an arbitrary number of arguments. The arguments are passed to the accumulator via the sample call, for example, sample(1, 2, 3) for an accumulator which accepts three arguments. Accumulators are often placed in a vector-based storage, so the library provides an alias, the boost::histogram::dense_storage, which is templated on the accumulator type.

The library provides several accumulators:

  • sum accepts no samples, but accepts a weight. It is an alternative to a plain arithmetic type as a counter. It provides an advantage when histograms are filled with weights that differ dramatically in magnitude. The sum of weights is computed incrementally with the Neumaier algorithm, which is more accurate than a normal sum of arithmetic types.
  • weighted_sum accepts no samples, but accepts a weight. It computes the sum of weights and the sum of weights squared, the variance estimate of the sum of weights. This type is used by the make_weighted_histogram.
  • mean accepts a sample and computes the mean of the samples. make_profile uses this accumulator.
  • weighted_mean accepts a sample and a weight. It computes the weighted mean of the samples. make_weighted_profile uses this accumulator.

Users can easily write their own accumulators and plug them into the histogram, if they adhere to the Accumulator concept.

#include <array>
#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <boost/histogram/accumulators/mean.hpp>
#include <iostream>

int main() {
  using namespace boost::histogram;
  const auto axis = axis::regular<>(3, 0.0, 1.0);

  // Create a 1D-profile, which computes the mean of samples in each bin. The
  // factory function `make_profile` is provided by the library as a shorthand.
  auto h1 = make_histogram_with(dense_storage<accumulators::mean<>>(), axis);

  // Argument of `sample` is passed to accumulator.
  h1(0.0, sample(2)); // sample 2 goes to first bin
  h1(0.1, sample(2)); // sample 2 goes to first bin
  h1(0.4, sample(3)); // sample 3 goes to second bin
  h1(0.5, sample(4)); // sample 4 goes to second bin

  std::ostringstream os1;
  for (auto x : indexed(h1)) {
    // Accumulators usually have methods to access their state. Use the arrow
    // operator to access them. Here, `count()` gives the number of samples,
    // `value()` the mean, and `variance()` the variance estimate of the mean.
    os1 << boost::format("%i count %i mean %.1f variance %.1f\n") % x.index() %
               x->count() % x->value() % x->variance();
  }
  std::cout << os1.str() << std::flush;
  assert(os1.str() == "0 count 2 mean 2.0 variance 0.0\n"
                      "1 count 2 mean 3.5 variance 0.5\n"
                      "2 count 0 mean 0.0 variance 0.0\n");

  // Let's make a custom accumulator, which tracks the maximum of the samples. It must
  // have a call operator that accepts the argument of the `sample` function. The return
  // value of the call operator is ignored.
  struct max {
    void operator()(double x) {
      if (x > value) value = x;
    }
    double value = 0;
  };

  // Create a histogram with the custom accumulator, initialize the accumulators
  // to 1.
  auto h2 = make_histogram_with(dense_storage<max>(), axis);
  h2(0.0, sample(2));   // sample 2 goes to first bin
  h2(0.1, sample(2.5)); // sample 2.5 goes to first bin
  h2(0.4, sample(3));   // sample 3 goes to second bin
  h2(0.5, sample(4));   // sample 4 goes to second bin

  std::ostringstream os2;
  for (auto x : indexed(h2)) {
    os2 << boost::format("%i value %.1f\n") % x.index() % x->value;
  }
  std::cout << os2.str() << std::flush;
  assert(os2.str() == "0 value 2.5\n"
                      "1 value 4.0\n"
                      "2 value 0.0\n");
}

PrevUpHomeNext