Counting Functions

The package provides functions to count the occurrences of distinct values.

Counting over an Integer Range

StatsBase.counts (Function)
counts(x, [wv::AbstractWeights])
counts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
counts(x, k::Integer, [wv::AbstractWeights])

Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.

If a vector of weights wv is provided, each entry is the sum of the weights of the matching elements of x rather than their raw count.

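The output is a vector of length length(levels), whose j-th entry corresponds to levels[j]. If levels is omitted, it defaults to span(x), i.e. minimum(x):maximum(x); if k is given, levels is 1:k.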
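
A minimal usage sketch, assuming StatsBase is installed; the data and weights are made-up illustration values, and weights(...) is StatsBase's helper that wraps a vector as AbstractWeights. It shows the default levels, an explicit range, the integer k form, and weighted counting:

```julia
using StatsBase  # assumes the StatsBase package is installed

x = [1, 2, 2, 3, 3, 3, 7]

c_all = counts(x)        # levels default to span(x) == 1:7
c_lev = counts(x, 1:3)   # 7 falls outside 1:3 and is silently ignored
c_k   = counts(x, 2)     # same as counts(x, 1:2)

# Weighted: each entry is the sum of the weights of the matching elements
w    = weights([0.5, 1.0, 1.0, 2.0, 2.0, 2.0, 4.0])
c_wt = counts(x, 1:3, w)

println(c_all)  # [1, 2, 3, 0, 0, 0, 1]
println(c_lev)  # [1, 2, 3]
println(c_k)    # [1, 2]
println(c_wt)   # [0.5, 2.0, 6.0]
```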

StatsBase.proportions (Function)
proportions(x, levels=span(x), [wv::AbstractWeights])

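Return, for each value in the range levels, the proportion of elements of x equal to that value. Equivalent to counts(x, levels) / length(x), so elements of x outside levels are not counted in any entry but still contribute to the denominator.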

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
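
The following sketch, under the same assumptions (StatsBase installed, illustrative data), shows proportions with an explicit range, with the default span(x), and with weights. The weights are chosen to sum to 8.0 so that all results are exact binary fractions:

```julia
using StatsBase  # assumes the StatsBase package is installed

x = [1, 2, 2, 3, 3, 3, 3, 5]      # length(x) == 8

p_lev = proportions(x, 1:3)       # counts(x, 1:3) / 8
p_all = proportions(x)            # levels default to span(x) == 1:5

w    = weights([2.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 2.0])  # sum(w) == 8.0
p_wt = proportions(x, 1:3, w)     # summed weight per level, as a share of sum(w)

println(p_lev)  # [0.125, 0.25, 0.5]
println(p_all)  # [0.125, 0.25, 0.5, 0.0, 0.125]
println(p_wt)   # [0.25, 0.25, 0.25]
```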

proportions(x, k::Integer, [wv::AbstractWeights])

Return, for each integer in 1:k, the proportion of elements of x equal to that integer. Equivalent to proportions(x, 1:k).

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
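
A short sketch, assuming StatsBase is installed and using illustrative data, checking that the k form matches the explicit 1:k range. It also illustrates that values outside 1:k still count toward length(x), so the result need not sum to 1:

```julia
using StatsBase  # assumes the StatsBase package is installed

x = [1, 2, 2, 3, 3, 3, 3, 5]

p_k = proportions(x, 3)              # same as proportions(x, 1:3)
println(p_k == proportions(x, 1:3))  # true

# The denominator is still length(x): the 5 lies outside 1:3,
# so the proportions do not sum to 1.
println(sum(p_k))  # 0.875
```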

StatsBase.addcounts! (Method)
addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])

Add the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
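
The sketch below, assuming StatsBase is installed, accumulates counts across two batches into a preallocated r. It also defines a hypothetical function addcounts_sketch! that restates the documented rule (if xi == levels[j], increment r[j]) in plain Julia; it is an illustration only, not the StatsBase source:

```julia
using StatsBase  # assumes the StatsBase package is installed

lev = 10:12
r = zeros(Int, length(lev))           # one slot per level: r[j] <-> lev[j]

addcounts!(r, [10, 11, 11, 15], lev)  # 15 is outside lev and ignored
addcounts!(r, [12, 12, 10], lev)      # counts accumulate across calls
println(r)  # [2, 2, 2]

# Hypothetical restatement of the documented rule, for illustration only
# (not the StatsBase implementation): if xi == levels[j], increment r[j].
function addcounts_sketch!(r, x, levels::UnitRange{<:Integer})
    for xi in x
        if xi in levels                       # values outside levels are skipped
            r[xi - first(levels) + 1] += 1    # j such that levels[j] == xi
        end
    end
    return r
end

r2 = zeros(Int, length(lev))
addcounts_sketch!(r2, [10, 11, 11, 15, 12, 12, 10], lev)
println(r2 == r)  # true
```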

Counting over Arbitrary Distinct Values

StatsBase.countmap (Function)
countmap(x; alg = :auto)
countmap(x::AbstractVector, wv::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its number of occurrences.

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
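
A minimal sketch, assuming StatsBase is installed, with made-up string data and weights. Because Dict iteration order is unspecified, the results are compared against Dict literals rather than printed directly:

```julia
using StatsBase  # assumes the StatsBase package is installed

x = ["a", "b", "a", "c", "a"]

cm = countmap(x)
println(cm == Dict("a" => 3, "b" => 1, "c" => 1))  # true

# Weighted: each value maps to the sum of the weights of its occurrences
w   = weights([1.0, 2.0, 0.5, 4.0, 0.5])
cmw = countmap(x, w)
println(cmw == Dict("a" => 2.0, "b" => 2.0, "c" => 4.0))  # true
```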

alg is only allowed for unweighted counting and can be one of:

  • :auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.

  • :radixsort: if StatsBase.radixsort_safe(eltype(x)) == true, sort the input vector with the radix sort algorithm, which generally leads to shorter running times for large x with many duplicates. However, radix sort creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.

  • :dict: use a Dict-based method, which uses less RAM and is safe for any data type. It is generally slower than :radixsort for large inputs with many duplicates, but faster for small arrays and when there are few duplicates.
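
The following sketch, assuming StatsBase is installed, illustrates that alg only changes how the counts are computed, not the result. It assumes StatsBase.radixsort_safe(Int) returns true, so that :radixsort applies to an Int vector; the input is an arbitrary deterministic vector with many duplicates:

```julia
using StatsBase  # assumes the StatsBase package is installed

x = [mod(7i, 10) for i in 1:10_000]  # each of 0:9 appears 1000 times

a = countmap(x; alg = :radixsort)    # assumes Int is radixsort-safe
b = countmap(x; alg = :dict)         # Dict-based, lower memory use
c = countmap(x)                      # :auto chooses based on radixsort_safe(Int)

println(a == b == c)                 # true: same counts, different strategy
println(sum(values(a)))              # 10000

# Non-numeric element types can always use the Dict-based method
println(countmap(["x", "y", "x"]; alg = :dict) == Dict("x" => 2, "y" => 1))  # true
```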

StatsBase.proportionmap (Function)
proportionmap(x)
proportionmap(x::AbstractVector, w::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its proportion in x.

If a vector of weights w is provided, the proportion of weights is computed rather than the proportion of raw counts.
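
A minimal sketch, assuming StatsBase is installed and using illustrative data. The weights sum to 4.0, so each weighted proportion is a value's summed weight divided by 4.0:

```julia
using StatsBase  # assumes the StatsBase package is installed

x = ["a", "b", "a", "a"]

pm = proportionmap(x)
println(pm == Dict("a" => 0.75, "b" => 0.25))  # true

w   = weights([1.0, 2.0, 0.5, 0.5])            # total weight 4.0
pmw = proportionmap(x, w)
println(pmw == Dict("a" => 0.5, "b" => 0.5))   # true
```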

StatsBase.addcounts! (Method)
addcounts!(dict, x; alg = :auto)
addcounts!(dict, x, wv)

Add the counts of the values in x to an existing count map dict (for example, one returned by countmap). Values of x not yet present in dict are added as new keys.

If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.
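
A sketch of incremental counting over several batches, assuming StatsBase is installed and using made-up data. For the weighted form, a Dict{String,Float64} is used here so that its values can hold the floating-point weight sums:

```julia
using StatsBase  # assumes the StatsBase package is installed

# Accumulate counts over several batches into one count map
cm = countmap(["a", "b", "a"])
addcounts!(cm, ["b", "c"])          # "c" is new and gets its own entry
println(cm == Dict("a" => 2, "b" => 2, "c" => 1))  # true

# Weighted accumulation into a map whose values are weight sums
cmw = Dict{String,Float64}()
addcounts!(cmw, ["a", "b"], weights([0.5, 1.5]))
addcounts!(cmw, ["a"], weights([2.0]))
println(cmw == Dict("a" => 2.5, "b" => 1.5))       # true
```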

alg is only allowed for unweighted counting and can be one of:

  • :auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.

  • :radixsort: if StatsBase.radixsort_safe(eltype(x)) == true, sort the input vector with the radix sort algorithm, which generally leads to shorter running times for large x with many duplicates. However, radix sort creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.

  • :dict: use a Dict-based method, which uses less RAM and is safe for any data type. It is generally slower than :radixsort for large inputs with many duplicates, but faster for small arrays and when there are few duplicates.
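
To make the time/memory trade-off above more concrete, here is a hypothetical pure-Julia sketch of the two strategies. It is not StatsBase's implementation: sortcount_sketch and dictcount_sketch are made-up names, and Base's comparison-based sort stands in for radix sort. The sort-based version allocates a sorted copy of the input and then counts runs of equal values; the Dict-based version makes a single pass and stores one entry per distinct value:

```julia
# Hypothetical sketches, NOT StatsBase internals. Base `sort` (a comparison
# sort) stands in for the radix sort that StatsBase uses.

# Sort-then-scan: copies x (extra memory), then counts runs of equal values.
function sortcount_sketch(x::AbstractVector{T}) where {T}
    cm = Dict{T,Int}()
    isempty(x) && return cm
    sx = sort(x)                     # allocates a sorted copy of the input
    current, runlen = sx[1], 0
    for v in sx
        if isequal(v, current)
            runlen += 1              # still inside the current run
        else
            cm[current] = runlen     # flush the finished run
            current, runlen = v, 1
        end
    end
    cm[current] = runlen             # flush the last run
    return cm
end

# Dict-based: one pass, memory proportional to the number of distinct values.
function dictcount_sketch(x::AbstractVector{T}) where {T}
    cm = Dict{T,Int}()
    for v in x
        cm[v] = get(cm, v, 0) + 1
    end
    return cm
end

x = [3, 1, 3, 2, 3, 1]
println(sortcount_sketch(x) == dictcount_sketch(x))  # true
```

Both return the same map; which one is faster in practice depends on the input size and the number of duplicates, as described in the alg options.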
