Simple Linear Regression

1. Program/Script


import pandas as pd
import statsmodels.api as sm

# Load dataset
df = pd.read_csv("gapminder.csv")

# Convert variables to numeric; entries that cannot be parsed become NaN
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")

# Drop rows with missing values; take an explicit copy so the new column
# below is added to an independent DataFrame (avoids SettingWithCopyWarning)
df_clean = df.dropna(subset=["incomeperperson", "internetuserate"]).copy()

# Center the explanatory variable (income per person)
df_clean["income_centered"] = df_clean["incomeperperson"] - df_clean["incomeperperson"].mean()

# Check the mean after centering (should be approximately 0)
print("Mean of centered income:", df_clean["income_centered"].mean())

# Add constant (intercept term) and run regression
X = sm.add_constant(df_clean["income_centered"])
y = df_clean["internetuserate"]
model = sm.OLS(y, X).fit()
print(model.summary())


2. Output

Mean of centered income: -4.37e-13

OLS Regression Results
-------------------------------------------------------------
Dependent Variable:     internetuserate
R-squared:              0.564
Adj. R-squared:         0.562
F-statistic:            234.1
Prob (F-statistic):     1.89e-34
-------------------------------------------------------------
Variable          Coefficient (β)    Std. Error   t-value   p-value
-------------------------------------------------------------
Intercept               35.23           1.37       25.72     <0.001
income_centered          0.0017         0.0001     15.30     <0.001
-------------------------------------------------------------
No. of observations: 183
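
As a quick sanity check on the table above, the reported statistics can be cross-checked against each other. In a regression with a single predictor, the F-statistic equals the square of the slope's t-value, and it can also be recovered from R² and the number of observations. The sketch below only re-uses the rounded numbers printed above, so agreement is expected up to rounding; the variable names are illustrative.

```python
# Values copied from the output table (already rounded)
r_squared = 0.564
t_slope = 15.30
f_reported = 234.1
n_obs = 183

# With one predictor, F = t^2 for the slope
f_from_t = t_slope ** 2

# Equivalently, F = R^2 / (1 - R^2) * (n - 2)
f_from_r2 = r_squared / (1 - r_squared) * (n_obs - 2)

print(f"F from t^2:       {f_from_t:.2f}")
print(f"F from R^2 and n: {f_from_r2:.2f}")
print(f"F reported:       {f_reported}")
```

All three values should agree to within rounding, which suggests the table is internally consistent.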


3. Frequency Table / Mean Check

Since the explanatory variable (income per person) is quantitative, it was centered by subtracting its mean. The centered variable has a mean of approximately 0 (−4.37×10⁻¹³, a floating-point rounding residue), confirming successful centering.
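
To illustrate what centering does and does not change, here is a minimal sketch in plain Python. It uses a small set of hypothetical numbers (not taken from gapminder.csv) and a hand-written least-squares helper, `ols_fit`, which is an assumption for illustration rather than the statsmodels code used in the script. It shows that centering leaves the slope unchanged and moves the intercept to the mean of the response variable.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data, NOT values from gapminder.csv
income = [500.0, 1500.0, 4000.0, 9000.0, 20000.0]
internet = [3.0, 10.0, 25.0, 45.0, 80.0]

# Center the explanatory variable by subtracting its mean
income_centered = [v - mean(income) for v in income]

raw_intercept, raw_slope = ols_fit(income, internet)
cen_intercept, cen_slope = ols_fit(income_centered, internet)

print("Mean of centered income:", mean(income_centered))
print("Slope (raw, centered):", raw_slope, cen_slope)
print("Intercept (raw, centered):", raw_intercept, cen_intercept)
print("Mean of response:", mean(internet))
```

The centered intercept matches the mean of the response, which is why the intercept in the regression output can be read as the predicted internet use rate at average income.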


4. Interpretation

A simple linear regression was conducted to examine the relationship between income per person (explanatory variable) and internet use rate (response variable).

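The results show that income per person is a statistically significant positive predictor of internet use rate (β = 0.0017, p < .001): each additional 1,000 units of income per person is associated with an increase of about 1.7 in the predicted internet use rate. Because income was centered, the intercept (35.23) is the predicted internet use rate for a country at the mean income. The model explains approximately 56.4% of the variance in internet use rate (R² = 0.564).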
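
The fitted line can be written out directly from the reported coefficients. The sketch below hard-codes the rounded values from the output table; the function name `predict_internet_use` is hypothetical, and the input is income expressed as a deviation from the sample mean, since the model was fit on the centered variable.

```python
INTERCEPT = 35.23  # reported intercept: predicted rate at mean income
SLOPE = 0.0017     # reported slope: change in rate per unit of income

def predict_internet_use(income_centered):
    """Predicted internet use rate for a given deviation from mean income."""
    return INTERCEPT + SLOPE * income_centered

at_mean = predict_internet_use(0)
above_mean = predict_internet_use(1000)

print(f"At mean income:         {at_mean:.2f}")
print(f"1,000 units above mean: {above_mean:.2f}")
print(f"Difference:             {above_mean - at_mean:.2f}")
```

Because the coefficients are rounded, predictions from this sketch are approximate; `model.predict` on the fitted statsmodels object would use the unrounded estimates.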

This suggests that countries with higher income per person tend to have substantially higher internet use rates, indicating a strong positive association between economic prosperity and digital connectivity.
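
The strength of the association can be made concrete. In simple linear regression, R² is the square of the Pearson correlation between the two variables, and the correlation takes the sign of the slope. The sketch below recovers the correlation from the rounded values reported in the output, so the result is approximate.

```python
import math

r_squared = 0.564  # reported R-squared
slope = 0.0017     # reported slope; only its sign is used here

# r has the magnitude sqrt(R^2) and the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)

print(f"Pearson r = {r:.3f}")
```

A correlation of roughly 0.75 is usually described as strong by common rules of thumb, which is consistent with the interpretation above.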
