Pearson Correlation
1. The program code
import pandas as pd
from scipy.stats import pearsonr
# Load dataset
df = pd.read_csv("gapminder.csv")
# Convert variables to numeric
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")
# Drop missing values
df_clean = df.dropna(subset=["incomeperperson", "internetuserate"])
# Calculate Pearson correlation
r, p = pearsonr(df_clean["incomeperperson"], df_clean["internetuserate"])
print("Correlation Coefficient (r):", r)
print("p-value:", p)
print("R-squared:", r**2)
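To see what pearsonr computes, here is a minimal sketch that runs the same cleaning steps on a small hypothetical data frame (the values are invented for illustration and are not gapminder data). It then evaluates the Pearson formula directly, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), and checks the result against SciPy. The helper name pearson_r is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical toy data standing in for gapminder.csv. Empty strings mimic
# missing entries, which pd.to_numeric(..., errors="coerce") turns into NaN.
toy = pd.DataFrame({
    "incomeperperson": ["1000", "2500", "", "8000", "15000", "30000"],
    "internetuserate": ["2.5", "10.1", "40.0", "35.7", "", "80.2"],
})

# Same cleaning steps as the program above.
for col in ["incomeperperson", "internetuserate"]:
    toy[col] = pd.to_numeric(toy[col], errors="coerce")
clean = toy.dropna(subset=["incomeperperson", "internetuserate"])


def pearson_r(x, y):
    """Pearson correlation computed from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


r_manual = pearson_r(clean["incomeperperson"], clean["internetuserate"])
r_scipy, p_scipy = pearsonr(clean["incomeperperson"], clean["internetuserate"])

print("Rows kept after dropna:", len(clean))
print("r (by formula):", r_manual)
print("r (scipy):     ", r_scipy)
```

Only the rows where both columns are numeric survive dropna, so the two partially blank rows are excluded before the correlation is computed.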
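For the gapminder data, squaring the correlation coefficient gives R² = 0.56. This means that about 56% of the variability in internet use rates across countries can be accounted for by a linear relationship with income per person. Economic status is therefore closely associated with digital adoption worldwide, although a correlation on its own does not show that higher income causes higher internet use.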
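The "percentage of variability explained" reading comes from the fact that, for a straight line fitted by ordinary least squares, r² equals R² = 1 − SS_res / SS_tot. The sketch below checks this on synthetic data (an assumption: a noisy linear relationship generated with NumPy, not the gapminder values) and also shows that a reported R² of 0.56 only pins down the magnitude of r, not its sign.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic data (assumption): a noisy linear relationship, not gapminder values.
rng = np.random.default_rng(0)
x = rng.uniform(0, 40_000, size=200)
y = 0.002 * x + rng.normal(0, 15, size=200)

r, _ = pearsonr(x, y)

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in y
r_squared_fit = 1 - ss_res / ss_tot

print(f"r^2 from pearsonr:        {r ** 2:.6f}")
print(f"R^2 from the fitted line: {r_squared_fit:.6f}")

# Working backwards from R^2 = 0.56 recovers only |r|.
print(f"|r| implied by R^2 = 0.56: {np.sqrt(0.56):.3f}")
```

The last line prints 0.748, so an R² of 0.56 corresponds to a correlation of roughly 0.75 in magnitude.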