An introduction to the confusion matrix in Python: what a confusion matrix is, how it evaluates classification models, and how to create one with scikit-learn metrics.
A confusion matrix is a table used to evaluate the performance of a classification model in machine learning, data analytics, and statistics.
It shows how many predictions the model got right and wrong by comparing actual values with predicted values.
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
| Term | Meaning |
|---|---|
| True Positive (TP) | Model correctly predicts Positive (e.g., predicts “spam” and it is spam) |
| True Negative (TN) | Model correctly predicts Negative (e.g., predicts “not spam” and it is not spam) |
| False Positive (FP) | Model predicts Positive, but actual is Negative (also called a Type I error) |
| False Negative (FN) | Model predicts Negative, but actual is Positive (also called a Type II error) |
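The four outcomes can also be counted by hand, which may help make the definitions concrete. The sketch below uses made-up labels (1 = spam/Positive, 0 = not spam/Negative); the lists `actual` and `predicted` are hypothetical and not taken from any dataset in this tutorial.

```python
# Hypothetical labels for illustration only: 1 = Positive (spam), 0 = Negative (not spam)
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for a, p in zip(actual, predicted):
    if a == 1 and p == 1:
        counts["TP"] += 1  # predicted Positive, actually Positive
    elif a == 0 and p == 0:
        counts["TN"] += 1  # predicted Negative, actually Negative
    elif a == 0 and p == 1:
        counts["FP"] += 1  # predicted Positive, actually Negative (Type I error)
    else:
        counts["FN"] += 1  # predicted Negative, actually Positive (Type II error)

print(counts)
```

Every prediction falls into exactly one of the four cells, so the counts always sum to the number of samples.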
It helps us calculate key evaluation metrics:
| Metric | Formula |
|---|---|
| Accuracy | \( \frac{TP + TN}{TP + TN + FP + FN} \) |
| Precision | \( \frac{TP}{TP + FP} \) |
| Recall (Sensitivity) | \( \frac{TP}{TP + FN} \) |
| F1-Score | Harmonic mean of Precision and Recall |
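As a rough illustration, these metrics can be computed directly from the four counts. The helper `classification_metrics` and the counts passed to it are hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than a fixed rule.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas
metrics = classification_metrics(tp=6, tn=9, fp=2, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy and precision are both 0.75, recall is about 0.667, and F1 is about 0.706.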
Suppose we have 100 students and predict whether they passed or failed:
This information forms the confusion matrix.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Actual class labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
# Predicted class labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
Output
[[4 1]
[1 4]]
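This means the model produced 4 true negatives, 1 false positive, 1 false negative, and 4 true positives. scikit-learn sorts the class labels, so class 0 (Negative) comes first and the layout is [[TN, FP], [FN, TP]], which differs from the Positive-first table above.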
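For binary labels, one common way to read the individual cells is to flatten the matrix with `ravel()`. The sketch below reuses the same `y_true` and `y_pred` lists and assumes the labels are 0 and 1, so that the flattened order is TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# With labels 0 and 1, scikit-learn returns [[TN, FP], [FN, TP]],
# so flattening row by row yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Unpacking by name avoids mixing up the cells, since the scikit-learn layout puts the negative class in the first row and column.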
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
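The plot can be made easier to read by naming the classes. The following is a minimal sketch that reuses the same labels; the class names "Negative" and "Positive", the title, and the output file name `confusion_matrix.png` are illustrative assumptions, and the figure is saved to a file so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file instead of a window

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)

# display_labels follows the sorted label order: 0 first, then 1
disp = ConfusionMatrixDisplay(
    confusion_matrix=cm,
    display_labels=["Negative", "Positive"],
)
disp.plot(cmap="Blues")
disp.ax_.set_title("Confusion matrix")

# Hypothetical output path; change as needed
disp.figure_.savefig("confusion_matrix.png")
```

In an interactive session, calling `plt.show()` after `disp.plot()` (and omitting the `matplotlib.use("Agg")` line) opens the figure in a window instead.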
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=100, random_state=42)
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train the classifier and predict on the held-out test set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
from sklearn.metrics import accuracy_score, precision_score, recall_score
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
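The F1-Score listed among the metrics can be obtained the same way with `f1_score`, and `classification_report` prints precision, recall, and F1 per class in one call. To keep the result easy to check by hand, this sketch uses the small hand-written label lists from the first example rather than `y_test`; the class names passed as `target_names` are illustrative.

```python
from sklearn.metrics import f1_score, classification_report

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# TP = 4, FP = 1, FN = 1, so precision = recall = 4/5 and F1 = 0.8
f1 = f1_score(y_true, y_pred)
print("F1-Score:", f1)

# Per-class precision, recall, F1, and support in a text table
print(classification_report(y_true, y_pred, target_names=["Negative", "Positive"]))
```

For the trained model, the same calls can be made with `y_test` and the model's predictions in place of these lists.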
A confusion matrix evaluates a classification model by counting its True Positives, True Negatives, False Positives, and False Negatives.