Chi-square results

September 06, 2025

1. Program code

import pandas as pd
from scipy.stats import chi2_contingency

# Load dataset
df = pd.read_csv("gapminder.csv")

# Convert variables to numeric
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")

# Create categorical groups
df["income_group"] = pd.cut(
    df["incomeperperson"],
    bins=[0, 5000, 20000, 100000],
    labels=["Low Income", "Middle Income", "High Income"],
)
df["internet_group"] = pd.cut(
    df["internetuserate"],
    bins=[0, 30, 70, 100],
    labels=["Low Internet Use", "Medium Internet Use", "High Internet Use"],
)

# Drop missing values
df_clean = df.dropna(subset=["income_group", "internet_group"])

# Create contingency table
contingency_table = pd.crosstab(df_clean["income_group"], df_clean["internet_group"])
print("Contingency Table:\n", contingency_table)

# Run Chi-Square Test of Independence
chi2, p, dof, expected = chi2_contingency(contingency_table)
print("\nChi-Square Test Results")
print("Chi2:", chi2, " | df:", dof, " | p-value:", p)
print("\nExpected Frequencies:\n", expected)
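
One detail of the grouping step is worth checking. By default, pd.cut builds right-closed intervals, so with bins=[0, 30, 70, 100] a value of exactly 0 falls outside (0, 30] and becomes NaN, and the later dropna then removes that row. The sketch below uses made-up values (not taken from gapminder.csv) to show the effect. The include_lowest=True variant is only one possible adjustment and is an assumption here, not something the program above does.

```python
import pandas as pd

# Hypothetical internet-use rates chosen to sit on the bin edges.
rates = pd.Series([0.0, 30.0, 30.1, 100.0])
bins = [0, 30, 70, 100]
labels = ["Low Internet Use", "Medium Internet Use", "High Internet Use"]

# Default behaviour: intervals are (0, 30], (30, 70], (70, 100].
default_groups = pd.cut(rates, bins=bins, labels=labels)

# With include_lowest=True the first interval also contains 0.
inclusive_groups = pd.cut(rates, bins=bins, labels=labels, include_lowest=True)

print("NaN with default settings:", int(default_groups.isna().sum()))
print("NaN with include_lowest=True:", int(inclusive_groups.isna().sum()))
```

Comparing len(df) with len(df_clean) in the original program would show how many countries are lost to missing data or to values outside the bins.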

2. Output

3. Interpretation

The Chi-Square Test of Independence revealed a significant association between income group and internet use group, χ²(4, N=183) = 175.44, p < .0001.
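
A significant p-value says the association is unlikely to be due to chance, but not how strong it is. One common effect-size measure for a contingency table is Cramér's V, which can be computed from the statistic reported above. This is a minimal sketch: the helper name cramers_v is illustrative, and the inputs are the reported χ² = 175.44 and N = 183 for a 3 × 3 table.

```python
import math

def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))

# Values reported in the interpretation: chi2 = 175.44, N = 183, 3 x 3 table.
v = cramers_v(175.44, 183, 3, 3)
print(round(v, 2))  # 0.69
```

V ranges from 0 (no association) to 1 (perfect association), so a value near 0.69 is consistent with describing the relationship as strong.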

Observed vs Expected:
- Low Income countries had far more Low Internet Use cases than expected under independence (88 vs ~55), and virtually no High Internet Use cases (0 vs ~19 expected).
- High Income countries had far more High Internet Use cases than expected (24 vs ~5 expected).
- Middle Income countries leaned more toward Medium Internet Use than expected.

Taken together, these results indicate that income level and internet penetration are not independent; they are strongly associated. Low-income countries are disproportionately in the Low Internet Use group, while high-income countries are disproportionately in the High Internet Use group.
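
The observed-versus-expected comparison can be made systematic with Pearson residuals, (observed - expected) / sqrt(expected), computed per cell. Cells with large positive residuals are over-represented relative to independence, and cells with large negative residuals are under-represented. The sketch below uses a hypothetical 2 × 2 table, not the gapminder counts. The expected counts are derived from the row and column totals, which is the same quantity chi2_contingency returns as expected.

```python
import numpy as np

# Hypothetical observed counts (rows: two income groups, cols: two usage groups).
observed = np.array([[30, 10],
                     [10, 30]], dtype=float)

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_counts = np.outer(row_totals, col_totals) / observed.sum()

# Pearson residuals; their squares sum to the (uncorrected) chi-square statistic.
residuals = (observed - expected_counts) / np.sqrt(expected_counts)
chi2_stat = float((residuals ** 2).sum())

print(np.round(residuals, 2))
print("Chi2 from residuals:", round(chi2_stat, 2))
```

Applied to the real table, the same calculation would rank the cells by how much each contributes to the overall χ² value.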

Sanjoy
