Learn with Yasir

Box Plot in Matplotlib – Python Visualization Guide with Examples


Learn how to create box plots in Matplotlib using Python. This tutorial covers box plot components, customization, outlier detection, and side-by-side comparisons with violin plots.

Table of Contents

  1. What is a Box Plot (Box-and-Whisker Plot)?
  2. How Outliers Are Determined in a Box Plot

📦 What is a Box Plot (Box-and-Whisker Plot)?

A box plot is a graphical summary of data distribution. It helps visualize:

  • Median (middle value)
  • Quartiles (25th and 75th percentiles)
  • Minimum and maximum of the non-outlier data (the whisker ends)
  • Outliers (data points more than 1.5×IQR below Q1 or above Q3)

It’s great for comparing distributions across multiple datasets. (Atlassian)


📊 Structure of a Box Plot

Figure: structure of a box plot (image source: www.atlassian.com)

  • Q1: 25th percentile
  • Q2 (Median): 50th percentile
  • Q3: 75th percentile
  • IQR: Q3 - Q1
  • Whiskers: Extend from the box to the most extreme data points within 1.5 × IQR of Q1 and Q3 (the actual min/max when there are no outliers)
  • Dots: Outliers
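
The components listed above can be computed by hand, which may help clarify what the plot actually draws. The following is a minimal sketch using NumPy; `box_plot_components` is a hypothetical helper name introduced only for illustration, and the sample values are made up.

```python
import numpy as np

def box_plot_components(values, whis=1.5):
    """Return the numbers a box plot draws (hypothetical helper, for illustration only)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points that lie inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }

# Made-up sample: nine evenly spaced values plus one extreme value
components = box_plot_components([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
for name, value in components.items():
    print(name, value)
```

In this sample the whiskers end at 1 and 9 (the extreme values inside the fences), while 30 lies beyond the upper fence and would be drawn as a dot.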

🐍 Simple Example in Matplotlib

import matplotlib.pyplot as plt
import numpy as np

# Example datasets
np.random.seed(0)
model_A = np.random.normal(70, 10, 100)
model_B = np.random.normal(75, 15, 100)
model_C = np.random.normal(65, 20, 100)

data = [model_A, model_B, model_C]

# Box plot with one label per box
# (tick_labels requires Matplotlib 3.9+; older versions use labels= instead)
plt.figure(figsize=(8, 5))
plt.boxplot(data, tick_labels=['Model A', 'Model B', 'Model C'], patch_artist=True)

plt.title('Box Plot Example: Model Score Distributions')
plt.ylabel('Scores')
plt.grid(True)
plt.show()

Figure: output of the code above (box plots for Model A, Model B, and Model C)


✅ Output Explanation:

  • Each box shows the spread of scores.
  • You can quickly compare:

    • Which model has the higher median
    • Which model is more consistent (smaller IQR)
    • Whether any model has outliers
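
The same comparison can also be made numerically. The sketch below regenerates the three synthetic datasets with the same seed and prints each model's median, IQR, and outlier count; the `models` and `summary` names are assumptions introduced here, not part of the original example.

```python
import numpy as np

# Same synthetic data as in the plotting example (same seed, same order)
np.random.seed(0)
models = {
    "Model A": np.random.normal(70, 10, 100),
    "Model B": np.random.normal(75, 15, 100),
    "Model C": np.random.normal(65, 20, 100),
}

summary = {}
for name, scores in models.items():
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    iqr = q3 - q1
    # Count points beyond the 1.5 x IQR fences
    is_outlier = (scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)
    summary[name] = {"median": median, "iqr": iqr, "outliers": int(is_outlier.sum())}
    print(f"{name}: median={median:.1f}, IQR={iqr:.1f}, outliers={summary[name]['outliers']}")
```

A higher median corresponds to a higher line inside the box, and a smaller IQR corresponds to a shorter box.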

A box plot is also a convenient tool for spotting outliers in your dataset.


📦 How Outliers Are Determined in a Box Plot

Box plots use the Interquartile Range (IQR) to find outliers.

✅ Definitions:

  • Q1: First quartile (25th percentile)
  • Q3: Third quartile (75th percentile)
  • IQR: Q3 − Q1

🚨 Outlier Rule:

A data point is considered an outlier if it is either:

  • below Q1 − 1.5 × IQR, or
  • above Q3 + 1.5 × IQR

These points will show up as individual dots outside the whiskers in a box plot.

Example: Let’s say you have a dataset where: Q1 = 20, Q3 = 80, and IQR = 80 - 20 = 60. Then:

Lower Fence (Q1 - 1.5 * IQR) = 20 - 1.5 * 60 = -70
Upper Fence (Q3 + 1.5 * IQR) = 80 + 1.5 * 60 = 170

Any data point less than -70 or greater than 170 would be flagged as an outlier.
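
The fence arithmetic above can be checked with a few lines of Python. This is a small sketch; `iqr_fences` is a hypothetical helper name, and the multiplier `k` defaults to the conventional 1.5.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles (illustrative helper)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_fences(20, 80)
print(lower, upper)  # -70.0 170.0
```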


🐍 Python Example Using Matplotlib and NumPy

import numpy as np
import matplotlib.pyplot as plt

# Sample data with outliers
data = np.array([12, 13, 14, 15, 16, 17, 18, 30])  # 30 is an outlier

# Box plot
plt.boxplot(data, vert=False, patch_artist=True)
plt.title("Box Plot with Outlier")
plt.xlabel("Value")
plt.grid(True)
plt.show()

# Calculate outlier thresholds
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = data[(data < lower_bound) | (data > upper_bound)]

print("Outliers:", outliers)

📤 Output:

  • Box plot showing a dot at 30
  • Console prints:

    Outliers: [30]
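
To cross-check the manual calculation against what Matplotlib draws, one option is to read the outlier and whisker positions from the dictionary that `boxplot` returns. The sketch below assumes a default (vertical) plot and uses the non-interactive `Agg` backend so it can run without a display; neither choice comes from the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

data = np.array([12, 13, 14, 15, 16, 17, 18, 30])

fig, ax = plt.subplots()
result = ax.boxplot(data)  # returns a dict of the Line2D artists it created

# "fliers" holds the outlier markers; for a vertical plot their values are the y data
plotted_outliers = [float(v) for v in result["fliers"][0].get_ydata()]

# Each whisker line runs from the box edge to the whisker end
whisker_ends = [float(line.get_ydata()[1]) for line in result["whiskers"]]

print("Plotted outliers:", plotted_outliers)
print("Whisker ends:", whisker_ends)
plt.close(fig)
```

The plotted outlier should match the value found with the manual IQR rule, and the whisker ends should be the smallest and largest non-outlier values (12 and 18 for this data).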
    

✅ Summary:

  • Box = middle 50% of data (Q1 to Q3)
  • Whiskers = reach to the most extreme points within 1.5×IQR of the box
  • Dots beyond whiskers = outliers

References and Bibliography