Confusion Matrix

This function computes the (multiclass¹) confusion matrix (CM) of a classifier's output predictions. The CM measures the misclassification rates between all pairs of classes in the classification task.

Mathematical definition

Mathematically, if the classification task has \(N_c\) target classes, the CM is an \(N_c \times N_c\) matrix whose entry \(C_{i,j}\) is the number of samples known to be in class \(i\) and predicted to be in class \(j\). The diagonal entries therefore count correct classifications and the off-diagonal entries count misclassifications. The matrix can then be normalized in different ways, yielding rates instead of raw counts (see the Normalization section).
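As an illustration of the definition, the raw counts can be sketched in plain NumPy. This is not the puma implementation; the helper name `count_matrix` and its `n_classes` argument are hypothetical and used only for this example.

```python
import numpy as np


def count_matrix(targets, predictions, n_classes):
    """Count how often true class i is predicted as class j (illustrative helper)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats
    np.add.at(cm, (targets, predictions), 1)
    return cm


targets = np.array([2, 0, 2, 2, 0, 1])
predictions = np.array([0, 0, 2, 2, 0, 2])

cm = count_matrix(targets, predictions, n_classes=3)
print(cm)
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]
```

Row \(i\) collects the samples whose target is class \(i\): for instance, of the three samples with target 2, one is predicted as class 0 and two as class 2.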

Implementation

The function confusion_matrix in puma.utils.confusion_matrix computes the CM from two arrays of target and predicted labels. The basic usage is:

targets = np.array([2, 0, 2, 2, 0, 1])
predictions = np.array([0, 0, 2, 2, 0, 2])
confusion_matrix(targets, predictions)

Optionally, samples can be weighted by their relative importance by providing an array of weights \(w_i \in [0,1]\), one per sample:

targets = np.array([2, 0, 2, 2, 0, 1])
predictions = np.array([0, 0, 2, 2, 0, 2])
weights = np.array([1, 0.5, 0.5, 1, 0.2, 1])
confusion_matrix(targets, predictions, sample_weights=weights)
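To make the effect of the weights concrete, here is a minimal NumPy sketch, independent of puma, in which each sample contributes its weight \(w_i\) to its entry instead of a count of one. It assumes that weighting is applied before normalization, which the text does not state explicitly; the variable names are illustrative.

```python
import numpy as np

targets = np.array([2, 0, 2, 2, 0, 1])
predictions = np.array([0, 0, 2, 2, 0, 2])
weights = np.array([1, 0.5, 0.5, 1, 0.2, 1])

# Each sample adds its weight to the entry (target, prediction)
weighted = np.zeros((3, 3))
np.add.at(weighted, (targets, predictions), weights)

# Row normalization: divide every row by the total weight of its target class
# (all three rows are non-empty here, so no division by zero occurs)
row_normalized = weighted / weighted.sum(axis=1, keepdims=True)

print(weighted)
print(row_normalized)
```

Up to floating-point rounding, the weighted sums are 0.7 for entry (0, 0), 1 for (1, 2), 1 for (2, 0) and 1.5 for (2, 2), so after row normalization the last row becomes 0.4, 0, 0.6.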

Normalization

There are four possible normalization choices, selected through the normalize argument of the function: None uses raw counts; "rownorm" (the default) normalizes each row so that it sums to one, i.e. each entry is the fraction of the samples of a given target class that receive a given prediction; "colnorm" normalizes each column so that it sums to one, i.e. each entry is the fraction of the samples with a given prediction that belong to a given target class; "all" normalizes over all samples, so that all matrix entries sum to one.
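The four choices can be sketched on a raw count matrix as follows. The helper `normalize_cm` is hypothetical and written only for illustration; in particular, leaving the entries of an empty row or column at zero is an assumption of this sketch, not a statement about how puma handles that case.

```python
import numpy as np


def normalize_cm(cm, normalize=None):
    """Normalize a raw count matrix (illustrative helper, not the puma code)."""
    cm = np.asarray(cm, dtype=float)
    if normalize is None:
        return cm
    if normalize == "rownorm":
        totals = cm.sum(axis=1, keepdims=True)  # one total per target class
    elif normalize == "colnorm":
        totals = cm.sum(axis=0, keepdims=True)  # one total per predicted class
    elif normalize == "all":
        totals = cm.sum(keepdims=True)  # grand total, shape (1, 1)
    else:
        raise ValueError(f"Unknown normalization: {normalize!r}")
    totals = np.broadcast_to(totals, cm.shape)
    # Assumption: entries whose total is zero are left at zero
    return np.divide(cm, totals, out=np.zeros_like(cm), where=totals != 0)


# Rows index the target class, columns the predicted class
counts = np.array([[2, 0, 0], [0, 0, 1], [1, 0, 2]])

by_row = normalize_cm(counts, "rownorm")
by_col = normalize_cm(counts, "colnorm")
overall = normalize_cm(counts, "all")
print(by_row)
print(by_col)
print(overall)
```

In this example no sample is predicted as class 1, so the second column has nothing to normalize and stays at zero under "colnorm".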

Example

"""Produce the confusion matrix for a tagger output."""

from __future__ import annotations

import numpy as np

from puma.utils.confusion_matrix import confusion_matrix

# Sample size
N = 100

# Number of target classes
Nclass = 3

# Dummy target labels
targets = np.random.randint(0, Nclass, size=N)
# Making sure that there is at least one sample for each class
targets = np.append(targets, np.arange(Nclass))

# Dummy predicted labels
predictions = np.random.randint(0, Nclass, size=(N + Nclass))


# Confusion matrix examples:

# Unweighted confusion matrix, normalized on all entries
unweighted_cm = confusion_matrix(targets, predictions, normalize="all")
print("Unweighted, normalized on all entries, CM:")
print(unweighted_cm)
print(" ")

# Unweighted confusion matrix, normalized on true labels
unweighted_cm = confusion_matrix(targets, predictions, normalize="rownorm")
print("Unweighted, normalized on true labels (rownorm), CM:")
print(unweighted_cm)
print(" ")

# Unweighted confusion matrix, with raw counts (non-normalized)
unweighted_cm = confusion_matrix(targets, predictions, normalize=None)
print("Unweighted, non-normalized, CM:")
print(unweighted_cm)
print(" ")

# Weighted Confusion Matrix
# Dummy sample weights
sample_weights = np.random.rand(N + Nclass)

weighted_cm = confusion_matrix(targets, predictions, sample_weights=sample_weights)
print("Weighted CM:")
print(weighted_cm)

¹ In a multiclass task, each sample belongs to one and only one class (the true label, or target label).