Histogramming

Several options for histograms are provided.

Full-sample histogram

nbi_stat.histogram() is similar to numpy.histogram in that it requires the full sample to be available.

However, nbi_stat.histogram() differ from numpy.histgram in some important aspects

  • We always divide the counts by the bin widths so that the area is proportional to the probability.

  • The function allows for both frequency and non-frequency weights - see Frequency and non-frequency weights - for all kinds of binning

  • The function always returns the uncertainty on each bin, including for the case of non-frequency weights

  • Rather than returning the bin boundaries, the function returns the bin centres and widths. This allow for easy drawing using matplotlib.pyplot.errorbar

Let’s show an example

>>> import nbi_stat
>>> np.random.seed(123456)
>>> x = np.random.normal(0,1,size=1000)
>>> h, x, w, e = nbi_stat.histogram(x,30)
>>> plt.errorbar(x,h,e,w/2,".",label='Data')
>>> plt.legend()

(Source code, png, hires.png, pdf)

_images/histograms-1.png

On-line histogramming

An “online” method for building histograms is likewise provided. This again allows for weights - both frequency and non-frequency weights. Three functions are defined for this purpose

Let’s do an example similar to the above

>>> import nbi_stat
>>> np.random.seed(123456)
>>> hist = nbi_stat.init_histogram(np.linspace(-3,3,31))
>>> for _ in range(1000):
>>>     hist = nbi_stat.fill_histogram(np.random.normal(),*hist)
>>> h, x, w, e = nbi_stat.fini_histogram(*hist)
>>> plt.errorbar(x,h,e,w/2,".",label='Data')
>>> plt.legend()

(Source code, png, hires.png, pdf)

_images/histograms-2.png

A histogram object

If one prefers a Object Oriented approach one can use objects of the class nbi_stat.Histogram. This class defines methods to initialize, fill, and finalize a histogram, as well methods to access the data.

>>> import nbi_stat
>>> np.random.seed(123456)
>>>
>>> hist = nbi_stat.Histogram(np.linspace(-3,3,31))
>>>
>>> for _ in range(1000):
>>>     hist.fill(np.random.normal())
>>>
>>> hist.finalize()
>>> s = plt.plot(hist.centers,hist.heights,
...              drawstyle='steps-mid',alpha=.5)
>>> plt.errorbar(hist.centers,
...              hist.heights,
...              hist.uncertainties,
...              hist.widths/2,
...              '.', color=s[0].get_color(),label='Histogram')
>>> plt.legend()

(Source code, png, hires.png, pdf)

_images/histograms-3.png

We can use weights when filling the histogram object. These weights can be either frequency or non-frequency weights, depending on the value of the weighted argument to the constructor. If this is False (default), then we assume frequency weights, while if True we assume non-frequency weights. Let’s do a frequency weighted histogram first

>>> import nbi_stat
>>> np.random.seed(123456)
>>>
>>> hist = nbi_stat.Histogram(np.linspace(-3,3,31))
>>>
>>> for x in hist.centers:
>>>     hist.fill(x,np.random.poisson(3))
>>>
>>> hist.finalize()
>>>
>>> s = plt.plot(hist.centers,hist.heights,
...              drawstyle='steps-mid',alpha=.5)
>>> plt.errorbar(hist.centers,
...              hist.heights,
...              hist.uncertainties,
...              hist.widths/2,
...              '.', color=s[0].get_color(),
...              label='Frequency weights')
>>> plt.legend()

(Source code, png, hires.png, pdf)

_images/histograms-4.png

And now one with non-frequency weights

>>> import nbi_stat
>>> np.random.seed(123456)
>>>
>>> hist = nbi_stat.Histogram(np.linspace(-3,3,31),True)
>>>
>>> for _ in range(1000):
>>>     hist.fill(np.random.normal(),np.random.random())
>>>
>>> hist.finalize()
>>>
>>> plt.fill_between(hist.centers,
...                  np.zeros_like(hist.heights),
...                  hist.heights,
...                  step='mid',alpha=.5)
>>> plt.errorbar(hist.centers,
...              hist.heights,
...              hist.uncertainties,
...              hist.widths/2,
...              '.', label='Non-frequency weighted')
>>> plt.plot(hist.centers,
...          hist.sums.sum()/np.sqrt(2*np.pi)*
...          np.exp(-.5*hist.centers**2),label='Normal')
>>> plt.legend()

(Source code, png, hires.png, pdf)

_images/histograms-5.png