Python project 5

September 06, 2025

1. Python code

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Load dataset
df = pd.read_csv("gapminder.csv")

# Convert variables to numeric (non-numeric entries become NaN)
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")

# Create categorical income groups
# Note: values outside (0, 100000] fall in no bin, become NaN, and are dropped below
df["income_group"] = pd.cut(df["incomeperperson"],
                            bins=[0, 5000, 20000, 100000],
                            labels=["Low Income", "Middle Income", "High Income"])

# Drop missing values
df_clean = df.dropna(subset=["income_group", "internetuserate"])

# --- Run ANOVA ---
model = ols("internetuserate ~ C(income_group)", data=df_clean).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print("ANOVA Results:\n", anova_table)

# --- Post Hoc Test (Tukey HSD) ---
tukey = pairwise_tukeyhsd(endog=df_clean["internetuserate"],
                          groups=df_clean["income_group"],
                          alpha=0.05)
print("\nTukey HSD Post Hoc Results:\n", tukey)
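The income cutoffs above decide which countries land in which group, so it may help to see how pd.cut treats values at and beyond the bin edges. The sketch below uses a handful of made-up income values (they are not taken from gapminder.csv) with the same bins and labels as the code above. By default the bins are closed on the right, so 5000 counts as Low Income, and anything outside the outer edges gets no label at all.

```python
import pandas as pd

# Hypothetical income values chosen to sit on or near the bin edges;
# these are illustrative only and do not come from gapminder.csv.
incomes = pd.Series([0.0, 250.0, 5000.0, 5000.01, 20000.0, 99999.0, 150000.0])

# Same bins and labels as the project code. With the default right=True,
# the intervals are (0, 5000], (5000, 20000], (20000, 100000].
groups = pd.cut(
    incomes,
    bins=[0, 5000, 20000, 100000],
    labels=["Low Income", "Middle Income", "High Income"],
)

# Values that fall in no interval (here 0.0 and 150000.0) come back as NaN.
print(pd.DataFrame({"income": incomes, "group": groups}))
print("Unlabelled rows:", int(groups.isna().sum()))
```

If the real data contains incomes above the top edge, one option would be to replace 100000 with float("inf"), though that changes which countries enter the analysis.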

2. Output

3. Interpretation

Model Interpretation for ANOVA:
A one-way analysis of variance (ANOVA) showed that mean internet use rates differed significantly across the three income groups. High Income countries had the highest mean internet use rate (M = XX.X, SD = XX.X), followed by Middle Income countries (M = XX.X, SD = XX.X) and Low Income countries (M = XX.X, SD = XX.X). The overall test was statistically significant, F(2, 186) = XX.XX, p < .0001.

Model Interpretation for Post Hoc Results:
Post hoc Tukey HSD comparisons showed that all three pairwise differences between income groups were significant at the .05 level. High Income countries had significantly higher internet use rates than both Middle Income and Low Income countries, and Middle Income countries had significantly higher rates than Low Income countries. These results support the hypothesis that higher income levels are associated with greater internet penetration across countries.
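To check the "all pairs differ" claim programmatically rather than by reading the printed table, the Tukey result object exposes the mean differences and reject decisions as arrays. The sketch below is a minimal example on synthetic data (again invented, not the gapminder values). It assumes the pairwise results are ordered like the combinations of the sorted group labels, which is how statsmodels lays them out.

```python
from itertools import combinations

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic internet use rates for three made-up groups of 20 countries each.
rng = np.random.default_rng(1)
rates = np.concatenate([
    rng.normal(15, 10, 20),
    rng.normal(45, 10, 20),
    rng.normal(75, 10, 20),
])
labels = np.repeat(["Low Income", "Middle Income", "High Income"], 20)

tukey = pairwise_tukeyhsd(endog=rates, groups=labels, alpha=0.05)

# groupsunique holds the sorted labels; each pair (a, b) lines up with one
# entry of meandiffs (mean of b minus mean of a) and one entry of reject.
pairs = list(combinations(tukey.groupsunique, 2))
for (a, b), diff, reject in zip(pairs, tukey.meandiffs, tukey.reject):
    print(f"{a} vs {b}: mean diff = {diff:.1f}, significant = {bool(reject)}")

print("All pairs significant:", bool(tukey.reject.all()))
```

Printing the signed mean difference alongside each decision also makes the direction of every comparison explicit, which the interpretation paragraph relies on.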


Sanjoy
