Arcsine Distribution

#include <boost/math/distributions/arcsine.hpp>

namespace boost{ namespace math{

 template <class RealType = double,
           class Policy   = policies::policy<> >
class arcsine_distribution;

typedef arcsine_distribution<double> arcsine; // double precision standard arcsine distribution [0,1].

template <class RealType, class Policy>
class arcsine_distribution
{
public:
   typedef RealType  value_type;
   typedef Policy    policy_type;

   // Constructor from two range parameters, x_min and x_max:
   arcsine_distribution(RealType x_min, RealType x_max);

   // Range Parameter accessors:
   RealType x_min() const;
   RealType x_max() const;
};
}} // namespaces

The class type arcsine_distribution represents an arcsine probability distribution function. The arcsine distribution is named because its CDF uses the inverse sin^-1 or arcsine.

This is implemented as a generalized version with support from x_min to x_max providing the 'standard arcsine distribution' as default with x_min = 0 and x_max = 1. (A few make other choices for 'standard').

The arcsine distribution is generalized to include any bounded support a <= x <= b by Wolfram and Wikipedia, but also using location and scale parameters by Virtual Laboratories in Probability and Statistics Arcsine distribution. The end-point version is simpler and more obvious, so we implement that. If desired, this outlines how the Beta Distribution can be used to add a shape factor.

The probability density function PDF for the arcsine distribution defined on the interval [x_min, x_max] is given by:

f(x; x_min, x_max) = 1 /(π⋅√((x - x_min)⋅(x_max - x_min))

For example, Wolfram Alpha arcsine distribution, from input of

N[PDF[arcsinedistribution[0, 1], 0.5], 50]

computes the PDF value

0.63661977236758134307553505349005744813783858296183

The Probability Density Functions (PDF) of generalized arcsine distributions are symmetric U-shaped curves, centered on (x_max - x_min)/2, highest (infinite) near the two extrema, and quite flat over the central region.

If random variate x is x_min or x_max, then the PDF is infinity. If random variate x is x_min then the CDF is zero. If random variate x is x_max then the CDF is unity.

The 'Standard' (0, 1) arcsine distribution is shown in blue and some generalized examples with other x ranges.

The Cumulative Distribution Function CDF is defined as

F(x) = 2⋅arcsin(√((x-x_min)/(x_max - x))) / π

Constructor

arcsine_distribution(RealType x_min, RealType x_max);

constructs an arcsine distribution with range parameters x_min and x_max.

Requires x_min < x_max, otherwise domain_error is called.

For example:

arcsine_distribution<> myarcsine(-2, 4);

constructs an arcsine distribution with x_min = -2 and x_max = 4.

Default values of x_min = 0 and x_max = 1 and a typedef arcsine_distribution<double> arcsine; mean that

arcsine as;

constructs a 'Standard 01' arcsine distribution.

Parameter Accessors

RealType x_min() const;
RealType x_max() const;

Return the parameter x_min or x_max from which this distribution was constructed.

So, for example:

using boost::math::arcsine_distribution;

arcsine_distribution<> as(2, 5); // Cconstructs a double arcsine distribution.
BOOST_ASSERT(as.x_min() == 2.);  // as.x_min() returns 2.
BOOST_ASSERT(as.x_max() == 5.);   // as.x_max()  returns 5.

Non-member Accessor Functions

All the usual non-member accessor functions that are generic to all distributions are supported: Cumulative Distribution Function, Probability Density Function, Quantile, Hazard Function, Cumulative Hazard Function, mean, median, mode, variance, standard deviation, skewness, kurtosis, kurtosis_excess, range and support.

The formulae for calculating these are shown in the table below, and at Wolfram Mathworld.

	Note
	There are always two values for the mode, at x_min and at x_max, default 0 and 1, so instead we raise the exception domain_error. At these extrema, the PDFs are infinite, and the CDFs zero or unity.

Applications

The arcsine distribution is useful to describe Random walks, (including drunken walks) Brownian motion, Weiner processes, Bernoulli trials, and their appplication to solve stock market and other ruinous gambling games.

The random variate x is constrained to x_min and x_max, (for our 'standard' distribution, 0 and 1), and is usually some fraction. For any other x_min and x_max a fraction can be obtained from x using

fraction = (x - x_min) / (x_max - x_min)

The simplest example is tossing heads and tails with a fair coin and modelling the risk of losing, or winning. Walkers (molecules, drunks...) moving left or right of a centre line are another common example.

The random variate x is the fraction of time spent on the 'winning' side. If half the time is spent on the 'winning' side (and so the other half on the 'losing' side) then x = 1/2.

For large numbers of tosses, this is modelled by the (standard [0,1]) arcsine distribution, and the PDF can be calculated thus:

std::cout << pdf(as, 1. / 2) << std::endl; // 0.637
// pdf has a minimum at x = 0.5

From the plot of PDF, it is clear that x = ½ is the minimum of the curve, so this is the least likely scenario. (This is highly counter-intuitive, considering that fair tosses must eventually become equal. It turns out that eventually is not just very long, but infinite!).

The most likely scenarios are towards the extrema where x = 0 or x = 1.

If fraction of time on the left is a ¼, it is only slightly more likely because the curve is quite flat bottomed.

std::cout << pdf(as, 1. / 4) << std::endl; // 0.735

If we consider fair coin-tossing games being played for 100 days (hypothetically continuously to be 'at-limit') the person winning after day 5 will not change in fraction 0.144 of the cases.

We can easily compute this setting x = 5./100 = 0.05

std::cout << cdf(as, 0.05) << std::endl; // 0.144

Similarly, we can compute from a fraction of 0.05 /2 = 0.025 (halved because we are considering both winners and losers) corresponding to 1 - 0.025 or 97.5% of the gamblers, (walkers, particles...) on the same side of the origin

std::cout << 2 * cdf(as, 1 - 0.975) << std::endl; // 0.202

(use of the complement gives a bit more clarity, and avoids potential loss of accuracy when x is close to unity, see why complements?).

std::cout << 2 * cdf(complement(as, 0.975)) << std::endl; // 0.202

or we can reverse the calculation by assuming a fraction of time on one side, say fraction 0.2,

std::cout << quantile(as, 1 - 0.2 / 2) << std::endl; //  0.976

std::cout << quantile(complement(as, 0.2 / 2)) << std::endl; // 0.976

Summary: Every time we toss, the odds are equal, so on average we have the same change of winning and losing.

But this is not true for an an individual game where one will be mostly in a bad or good patch.

This is quite counter-intuitive to most people, but the mathematics is clear, and gamblers continue to provide proof.

Moral: if you in a losing patch, leave the game. (Because the odds to recover to a good patch are poor).

Corollary: Quit while you are ahead?

A working example is at arcsine_example.cpp including sample output .

Related distributions

The arcsine distribution with x_min = 0 and x_max = 1 is special case of the Beta Distribution with α = 1/2 and β = 1/2.

Accuracy

This distribution is implemented using sqrt, sine, cos and arc sine and cos trigonometric functions which are normally accurate to a few machine epsilon. But all values suffer from loss of significance or cancellation error for values of x close to x_max. For example, for a standard [0, 1] arcsine distribution as, the pdf is symmetric about random variate x = 0.5 so that one would expect pdf(as, 0.01) == pdf(as, 0.99). But as x nears unity, there is increasing loss of significance. To counteract this, the complement versions of CDF and quantile are implemented with alternative expressions using cos^-1 instead of sin^-1. Users should see why complements? for guidance on when to avoid loss of accuracy by using complements.

Testing

The results were tested against a few accurate spot values computed by Wolfram Alpha, for example:

N[PDF[arcsinedistribution[0, 1], 0.5], 50]
0.63661977236758134307553505349005744813783858296183

Implementation

In the following table a and b are the parameters x_min and x_max, x is the random variable, p is the probability and its complement q = 1-p.

Function	Implementation Notes
support	x ∈ [a, b], default x ∈ [0, 1]
pdf	f(x; a, b) = 1/(π⋅√(x - a)⋅(b - x))
cdf	F(x) = 2/π⋅sin^-1(√(x - a) / (b - a) )
cdf of complement	2/(π⋅cos^-1(√(x - a) / (b - a)))
quantile	-a⋅sin²(½π⋅p) + a + b⋅sin²(½π⋅p)
quantile from the complement	-a⋅cos²(½π⋅p) + a + b⋅cos²(½π⋅q)
mean	½(a+b)
median	½(a+b)
mode	x ∈ [a, b], so raises domain_error (returning NaN).
variance	(b - a)² / 8
skewness	0
kurtosis excess	-3/2
kurtosis	kurtosis_excess + 3

The quantile was calculated using an expression obtained by using Wolfram Alpha to invert the formula for the CDF thus

solve [p - 2/pi sin^-1(sqrt((x-a)/(b-a))) = 0, x]

which was interpreted as

Solve[p - (2 ArcSin[Sqrt[(-a + x)/(-a + b)]])/Pi == 0, x, MaxExtraConditions -> Automatic]

and produced the resulting expression

x = -a sin^2((pi p)/2)+a+b sin^2((pi p)/2)

Thanks to Wolfram for providing this facility.

References

Sources

The probability of going through a bad patch Esteban Moro's Blog.
What soschumcks and the arc sine have in common Peter Haggstrom.
arcsine distribution.
Wolfram reference arcsine examples.
Shlomo Sternberg slides.