Simple Linear Regression
1. Program/Script
import pandas as pd
import statsmodels.api as sm
# Load dataset
df = pd.read_csv("gapminder.csv")
# Convert variables to numeric
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")
# Drop rows with missing values; .copy() avoids a SettingWithCopyWarning when a column is added below
df_clean = df.dropna(subset=["incomeperperson", "internetuserate"]).copy()
# Center the explanatory variable (income per person)
df_clean["income_centered"] = df_clean["incomeperperson"] - df_clean["incomeperperson"].mean()
# Check the mean after centering
print("Mean of centered income:", df_clean["income_centered"].mean())
# Add constant and run regression
X = sm.add_constant(df_clean["income_centered"])
y = df_clean["internetuserate"]
model = sm.OLS(y, X).fit()
print(model.summary())
Because the explanatory variable (income per person) is quantitative, it was centered by subtracting its mean. The centered variable has a mean of approximately 0 (−4.37×10⁻¹³, which is floating-point rounding error), confirming that the centering worked.
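To illustrate what centering does to the fitted line, here is a minimal sketch in plain Python. The five data points and the helper `ols_fit` are hypothetical and are not taken from gapminder.csv; the point is only that centering leaves the slope unchanged and turns the intercept into the mean of the response variable.

```python
# Hypothetical toy data (NOT from gapminder.csv), used only for illustration
income = [500.0, 1500.0, 4000.0, 9000.0, 25000.0]
internet = [2.0, 8.0, 20.0, 40.0, 75.0]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Center the explanatory variable by subtracting its mean
income_mean = sum(income) / len(income)
income_centered = [v - income_mean for v in income]

raw_intercept, raw_slope = ols_fit(income, internet)
cen_intercept, cen_slope = ols_fit(income_centered, internet)

print("Slope (raw):        ", raw_slope)
print("Slope (centered):   ", cen_slope)
print("Intercept (raw):    ", raw_intercept)
print("Intercept (centered):", cen_intercept)
print("Mean of response:   ", sum(internet) / len(internet))
```

With a centered predictor, the intercept can be read as the predicted internet use rate for a country with average income per person, which is usually easier to interpret than a prediction at an income of zero.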
4. Interpretation
A simple linear regression was conducted to examine the relationship between income per person (explanatory variable) and internet use rate (response variable).
The results show that income per person is a significant positive predictor of internet use rate (β = 0.0017, p < .001). The model explains approximately 56.4% of the variance in internet use rate (R² = 0.564).
This suggests that countries with higher income per person tend to have substantially higher internet use rates, indicating a strong positive association between economic prosperity and digital connectivity.
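The reported numbers can be turned into more tangible quantities with a little arithmetic. The sketch below assumes that income per person is measured in dollars and that internet use rate is expressed as users per 100 people; both are assumptions about the dataset's units, not something stated in this post.

```python
import math

slope = 0.0017      # reported regression coefficient for centered income
r_squared = 0.564   # reported R-squared

# Assumption: income is in dollars and internet use rate is users per 100 people.
# The slope per $1 is tiny, so rescale it to a $1,000 difference in income.
change_per_1000 = slope * 1000

# In simple linear regression, |r| equals sqrt(R^2) and r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"Predicted change per $1,000 of income: {change_per_1000:.2f} users per 100 people")
print(f"Implied correlation coefficient r: {r:.3f}")
```

Under those unit assumptions, a $1,000 difference in income per person corresponds to roughly 1.7 more internet users per 100 people, and the implied correlation between the two variables is about 0.75.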