Static Histograms

OnlineStats.Hist histograms are used extensively in IRMA analysis code. Because Hist uses dynamic arrays, it is not an isbits type in Julia (isbitstype returns false for it). This means that you must serialize and deserialize histograms if you want to pass them between MPI ranks. An SHist, or Static Histogram, uses an SVector from StaticArrays.jl instead, which makes it an isbits type. Therefore, you do not need serialization to pass an SHist between MPI ranks.
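
The difference between the two storage choices can be seen with plain Julia, without IRMA or OnlineStats. The sketch below uses two hypothetical toy types, DynCounts and StaticCounts, which are not the definitions of Hist or SHist. The fixed-length tuple stands in for an SVector, which stores its elements in a tuple internally.

```julia
# Toy types for illustration only (hypothetical, not IRMA's or OnlineStats' definitions).

# Counts held in a Vector: the struct only stores a reference to heap memory.
struct DynCounts
    counts::Vector{Int}
end

# Counts held in a fixed-length tuple: the data is stored inline in the struct.
struct StaticCounts{N}
    counts::NTuple{N,Int}
end

dyn_is_bits    = isbitstype(DynCounts)            # false
static_is_bits = isbitstype(StaticCounts{3})      # true

# isbits checks a value rather than a type.
value_is_bits  = isbits(StaticCounts((1, 2, 3)))  # true

println((dyn_is_bits, static_is_bits, value_is_bits))
```

Note that the length N is part of the static type, so histograms with different numbers of bins are different types.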

Construction and conversion

You can create an SHist from a Hist by passing it to the SHist constructor.

julia> using IRMA

julia> using OnlineStats

julia> h = fit!(Hist(-5:0.2:5), randn(1_000))
Hist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

julia> sh = SHist(h)
SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

Note that an SHist is immutable. To do further work with it, convert it back into a Hist.

julia> hh = Hist(sh)
Hist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

The conversions are very fast (~300 ns).

If you have a Series of histograms, you can convert in both directions by broadcasting the constructors over its stats field.

julia> s1 = Series(Hist(-5:0.2:5), Hist(-10:0.1:10)) ; fit!(s1, randn(1000))
Series
├─ Hist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0  …  1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
└─ Hist: n=1000 | value=(x = -10.0:0.1:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

julia> sh1 = Series(SHist.(s1.stats)...)
Series
├─ SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0  …  1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
└─ SHist: n=1000 | value=(x = -10.0:0.1:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

julia> ss1 = Series(Hist.(sh1.stats)...)
Series
├─ Hist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0  …  1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
└─ Hist: n=1000 | value=(x = -10.0:0.1:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

A Series with named members can be converted as well.

julia> s2 = Series(h1=Hist(-5:0.2:5), h2=Hist(-10:0.1:10)) ; fit!(s2, randn(1000))
Series
├─ Hist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  1, 0, 1, 1, 0, 0, 0, 0, 0, 0])
└─ Hist: n=1000 | value=(x = -10.0:0.1:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

julia> sh2 = Series((; zip(keys(s2.stats), SHist.(values(s2.stats)))...))
Series
├─ SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  1, 0, 1, 1, 0, 0, 0, 0, 0, 0])
└─ SHist: n=1000 | value=(x = -10.0:0.1:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
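
The named-tuple expression used above may look opaque, so here is a small sketch of the idiom on its own, using only Base Julia. The function double and the tuple nt are hypothetical stand-ins for SHist and s2.stats. The expression applies the function to the values, pairs the results with the original keys, and splats the pairs into a new named tuple.

```julia
# Hypothetical stand-ins: `double` plays the role of SHist, `nt` of s2.stats.
nt = (h1 = 1, h2 = 2)
double(x) = 2x

# Apply the function to the values, re-attach the keys, rebuild the named tuple.
rebuilt = (; zip(keys(nt), double.(values(nt)))...)

# For a plain NamedTuple, Base's map gives the same result and keeps the names.
mapped = map(double, nt)

println(rebuilt)
println(rebuilt == mapped)
```

Whether the map form can be passed to Series in place of the zip expression has not been checked here, so treat it as an observation about named tuples only.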

Series is the only collection type that is currently supported. Note that there is no way to make an FTSeries isbits (due to its function objects), so you will have to construct a different object from its parts.

Merging

Functions are provided to merge Static Histograms and collections of them. Because an SHist is immutable, there is no merge! method. Instead, merge converts both inputs to Hist, merges those, and converts the result back to an SHist.

julia> s1 = fit!(Hist(-5:0.2:5), randn(1000)) ; sh1 = SHist(s1)
SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 1, 1, 0, 0, 0, 0, 0, 0, 0])

julia> s2 = fit!(Hist(-5:0.2:5), randn(1000)) ; sh2 = SHist(s2)
SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0  …  0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

julia> shm = merge(sh1, sh2)
SHist: n=2000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0  …  0, 1, 1, 1, 0, 0, 0, 0, 0, 0])
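
The convert, merge, convert-back pattern described at the start of this section can be sketched with toy types in plain Julia. ToyHist, ToySHist, and merge_counts! are hypothetical names standing in for Hist, SHist, and merge!; this is not IRMA's implementation, and a real histogram merge would also need to check that the bin edges agree.

```julia
# Toy stand-ins (hypothetical): ToyHist plays the role of Hist,
# ToySHist the role of SHist.
mutable struct ToyHist
    counts::Vector{Int}
end

struct ToySHist{N}
    counts::NTuple{N,Int}
end

# Conversions in both directions.
ToySHist(h::ToyHist) = ToySHist(Tuple(h.counts))
ToyHist(s::ToySHist) = ToyHist(collect(s.counts))

# In-place merge for the mutable type: add the bin counts of b into a.
function merge_counts!(a::ToyHist, b::ToyHist)
    a.counts .+= b.counts
    return a
end

# Non-mutating merge for the static type: convert, merge, convert back.
# Both arguments must have the same number of bins N.
function Base.merge(a::ToySHist{N}, b::ToySHist{N}) where {N}
    return ToySHist(merge_counts!(ToyHist(a), ToyHist(b)))
end

a = ToySHist((1, 0, 2))
b = ToySHist((0, 3, 1))
m = merge(a, b)

println(m.counts)   # (1, 3, 3)
println(a.counts)   # (1, 0, 2), the inputs are left untouched
```

Because each conversion copies the counts, the inputs are never modified, at the cost of a temporary mutable histogram per merge.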

For a Series, use the dedicated function mergeStatsCollectionWithSHist.

julia> ser1   = Series(h1=Hist(-5:0.2:5), h2=Hist(-10:0.2:10)) ; fit!(ser1,  randn(1_000));

julia> sher1  = Series((; zip(keys(ser1.stats), SHist.(values(ser1.stats)))... ))
Series
├─ SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
└─ SHist: n=1000 | value=(x = -10.0:0.2:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

julia> ser2 = Series(h1=Hist(-5:0.2:5), h2=Hist(-10:0.2:10)) ; fit!(ser2, randn(1_000));

julia> sher2 = Series((; zip(keys(ser2.stats), SHist.(values(ser2.stats)))... ))
Series
├─ SHist: n=1000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1  …  0, 1, 1, 0, 1, 0, 0, 0, 0, 0])
└─ SHist: n=1000 | value=(x = -10.0:0.2:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

julia> sherm = mergeStatsCollectionWithSHist(sher1, sher2)
Series
├─ SHist: n=2000 | value=(x = -5.0:0.2:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 3  …  0, 1, 1, 0, 1, 0, 0, 0, 0, 0])
└─ SHist: n=2000 | value=(x = -10.0:0.2:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])