Normalized entropy, for measuring dispersion in draws from categorical distributions.

```
entropy(x)
# S3 method for default
entropy(x)
# S3 method for rvar
entropy(x)
```

- x
(multiple options) A vector to be interpreted as draws from a categorical distribution, such as:

  - A factor
  - An rvar, rvar_factor, or rvar_ordered

If `x` is a factor or numeric, returns a length-1 numeric vector with a value between 0 and 1 (inclusive) giving the normalized Shannon entropy of `x`.

If `x` is an rvar, returns an array of the same shape as `x`, where each cell is the normalized Shannon entropy of the draws in the corresponding cell of `x`.
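The per-cell behaviour for rvars can be pictured with a base-R stand-in. This is only a sketch, assuming the draws are held in a plain array whose first dimension indexes draws; `draws`, `n_categories`, and `cell_entropy()` are hypothetical names used for illustration, not part of the package.

```r
# Base-R stand-in for an rvar: an array whose first dimension indexes draws.
# Here: 1000 draws of a 2 x 3 array of category codes in 1..4.
set.seed(1)
n_categories <- 4
draws <- array(
  sample.int(n_categories, 1000 * 2 * 3, replace = TRUE),
  dim = c(1000, 2, 3)
)

# Normalized entropy of one cell's draws (integer codes in 1..n_categories).
cell_entropy <- function(codes) {
  p <- tabulate(codes, nbins = n_categories) / length(codes)
  p <- p[p > 0]  # treat 0 * log(0) as 0
  -sum(p * log(p)) / log(n_categories)
}

# Collapse the draws dimension, keeping the remaining dimensions.
result <- apply(draws, c(2, 3), cell_entropy)
dim(result)  # 2 3
```

The result has one entropy value per cell, so its shape matches the non-draw dimensions of the input.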

Calculates the normalized Shannon entropy of the draws in `x`. This value is the entropy of `x` divided by the maximum entropy of a distribution with `n` categories, where `n` is `length(unique(x))` for numeric vectors and `length(levels(x))` for factors:

$$-\frac{\sum_{i = 1}^{n} p_i \log(p_i)}{\log(n)}$$

This scales the output to be between 0 (all probability in one category) and 1 (uniform). This form of normalized entropy is referred to as \(H_\mathrm{REL}\) in Wilcox (1967).
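The formula can be sketched in base R as follows. This is a minimal illustration rather than the package's implementation: `normalized_entropy()` is a hypothetical helper, and returning 0 when there is only one category (where the formula would give 0 / 0) is an assumption made here.

```r
# Hypothetical helper, not the package's implementation.
normalized_entropy <- function(x) {
  if (is.factor(x)) {
    # factors: n counts every declared level, including unused ones
    n <- nlevels(x)
    counts <- tabulate(as.integer(x), nbins = n)
  } else {
    # numeric vectors: n counts only the values that actually occur
    counts <- as.vector(table(x))
    n <- length(counts)
  }
  # assumption: with a single category, report 0 rather than 0 / 0
  if (n <= 1) return(0)
  p <- counts / sum(counts)
  p <- p[p > 0]  # treat 0 * log(0) as 0
  -sum(p * log(p)) / log(n)
}

f <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))
normalized_entropy(f)              # log(2) / log(3), about 0.63
normalized_entropy(c(1, 1, 2, 2))  # 1: both observed values equally likely
```

The two calls show why the definition of `n` matters: the unused level `"c"` counts toward `n` for the factor, so an even split over two of three levels falls short of 1, while the numeric vector reaches 1 because only observed values are counted.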

Allen R. Wilcox (1967). *Indices of Qualitative Variation* (No. ORNL-TM-1919). Oak Ridge National Lab., Tenn.

```
library(posterior)
set.seed(1234)
levels <- c("a", "b", "c", "d", "e")
# a uniform distribution: high normalized entropy
x <- factor(
  sample(levels, 4000, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2)),
  levels = levels
)
entropy(x)
#> [1] 0.9999008
# a unimodal distribution: low normalized entropy
y <- factor(
  sample(levels, 4000, replace = TRUE, prob = c(0.95, 0.02, 0.015, 0.01, 0.005)),
  levels = levels
)
entropy(y)
#> [1] 0.1659647
# both together, as an rvar
xy <- c(rvar(x), rvar(y))
xy
#> rvar_factor<4000>[2] mode <entropy>:
#> [1] d <1.00> a <0.17>
#> 5 levels: a b c d e
entropy(xy)
#> [1] 0.9999008 0.1659647
```