Pearson Correlation

September 06, 2025

1. The program code

import pandas as pd
from scipy.stats import pearsonr

# Load dataset
df = pd.read_csv("gapminder.csv")

# Convert variables to numeric
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")

# Drop missing values
df_clean = df.dropna(subset=["incomeperperson", "internetuserate"])

# Calculate Pearson correlation
r, p = pearsonr(df_clean["incomeperperson"], df_clean["internetuserate"])

print("Correlation Coefficient (r):", r)
print("p-value:", p)
print("R-squared:", r**2)
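To see what the cleaning steps in the program do before the correlation is computed, here is a minimal sketch on a tiny made-up table. The rows are hypothetical and are not taken from gapminder.csv; the sketch assumes missing entries show up as blank or whitespace-only strings, which may not match the real file.

```python
import pandas as pd

# Hypothetical toy rows (NOT real Gapminder values); blank strings mimic missing entries
toy = pd.DataFrame({
    "incomeperperson": ["1200.5", " ", "8500", "310.2"],
    "internetuserate": ["12.1", "40.3", "", "2.5"],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN
for col in ["incomeperperson", "internetuserate"]:
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

# Keep only rows where both variables are present
toy_clean = toy.dropna(subset=["incomeperperson", "internetuserate"])
n_dropped = len(toy) - len(toy_clean)

print(toy_clean)
print("Rows dropped:", n_dropped)
```

Reporting how many rows were dropped is a useful habit, since the correlation only describes the countries that have both values.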

2. Output

3. Interpretation

The Pearson correlation between income per person and internet use rate was r = 0.75, p < .0001. This indicates a strong positive linear relationship: as countries’ income per person increases, their internet usage rate also tends to increase.
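For readers who want to see what r measures, the sketch below computes the coefficient directly from its definition (the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations) and compares it with `pearsonr`. The arrays are made-up illustrative values, not the Gapminder data, so the resulting r will not be 0.75.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up illustrative values (NOT the Gapminder data)
income = np.array([500.0, 1500.0, 4000.0, 9000.0, 20000.0, 35000.0])
internet = np.array([2.0, 8.0, 25.0, 40.0, 70.0, 85.0])

# Deviations of each observation from its variable's mean
dx = income - income.mean()
dy = internet - internet.mean()

# Pearson r from its definition
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity from SciPy, for comparison
r_scipy, p_value = pearsonr(income, internet)

print("Manual r:", r_manual)
print("SciPy r: ", r_scipy)
```

The two values should agree up to floating-point rounding.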

Squaring r gives R² = 0.56, meaning that about 56% of the variability in internet use rates is accounted for by its linear relationship with income per person. This is strong evidence that economic status is closely associated with digital adoption across countries.
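The "variance explained" reading of R² can be checked with a simple linear regression: for one predictor, the squared Pearson coefficient equals one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below illustrates this with `scipy.stats.linregress` on made-up values (not the Gapminder data), so the numbers will differ from those reported in this post.

```python
import numpy as np
from scipy.stats import linregress, pearsonr

# Made-up illustrative values (NOT the Gapminder data)
income = np.array([500.0, 1500.0, 4000.0, 9000.0, 20000.0, 35000.0])
internet = np.array([2.0, 8.0, 25.0, 40.0, 70.0, 85.0])

# Fit a straight line: internet = intercept + slope * income
fit = linregress(income, internet)
predicted = fit.intercept + fit.slope * income

# Share of the variability in internet use that the line accounts for
ss_res = ((internet - predicted) ** 2).sum()        # unexplained variation
ss_tot = ((internet - internet.mean()) ** 2).sum()  # total variation
r2_regression = 1 - ss_res / ss_tot

r, _ = pearsonr(income, internet)

print("R-squared from regression:", r2_regression)
print("r squared from pearsonr:  ", r ** 2)
```

Both printed values should match up to rounding, which is why squaring r is enough to get R² in the single-predictor case.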


Sanjoy
