Chi-Square Results

1. Program code

import pandas as pd
from scipy.stats import chi2_contingency

# Load dataset
df = pd.read_csv("gapminder.csv")

# Convert variables to numeric (entries that cannot be parsed, e.g. blanks, become NaN)
df["incomeperperson"] = pd.to_numeric(df["incomeperperson"], errors="coerce")
df["internetuserate"] = pd.to_numeric(df["internetuserate"], errors="coerce")

# Create categorical groups
# pd.cut uses right-closed intervals by default, e.g. (0, 5000], (5000, 20000], (20000, 100000];
# values at or below the lowest edge or above the highest edge are coded NaN
df["income_group"] = pd.cut(df["incomeperperson"],
                            bins=[0, 5000, 20000, 100000],
                            labels=["Low Income", "Middle Income", "High Income"])
df["internet_group"] = pd.cut(df["internetuserate"],
                              bins=[0, 30, 70, 100],
                              labels=["Low Internet Use", "Medium Internet Use", "High Internet Use"])

# Drop countries that are missing either grouping variable
df_clean = df.dropna(subset=["income_group", "internet_group"])

# Create contingency table (rows: income group, columns: internet use group)
contingency_table = pd.crosstab(df_clean["income_group"], df_clean["internet_group"])
print("Contingency Table:\n", contingency_table)

# Run Chi-Square Test of Independence
# (SciPy applies Yates' continuity correction only when dof == 1, so none is applied to this 3x3 table)
chi2, p, dof, expected = chi2_contingency(contingency_table)

print("\nChi-Square Test Results")
print("Chi2:", chi2, " | df:", dof, " | p-value:", p)
print("\nExpected Frequencies:\n", expected)
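Because the grouping step decides which countries enter the test, it helps to see how pd.cut treats values that sit exactly on a bin edge. The sketch below uses a handful of hypothetical rates (not taken from gapminder.csv) with the same internet-use edges as the script. It shows that the intervals are right-closed, that a value of exactly 0 is coded NaN (and would therefore be removed by the dropna step), and that pd.cut's `include_lowest=True` option keeps it in the first group. The same rule means any incomeperperson value above 100000 would also be coded NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical rates chosen to sit on and around the bin edges (not real data)
rates = pd.Series([0.0, 0.5, 30.0, 30.1, 70.0, 95.0, 100.0, np.nan])
labels = ["Low Internet Use", "Medium Internet Use", "High Internet Use"]

# Default behaviour: intervals (0, 30], (30, 70], (70, 100]
groups = pd.cut(rates, bins=[0, 30, 70, 100], labels=labels)

# include_lowest=True makes the first interval include its left edge, so 0.0 is kept
groups_incl = pd.cut(rates, bins=[0, 30, 70, 100], labels=labels, include_lowest=True)

print(pd.DataFrame({"rate": rates, "default": groups, "include_lowest": groups_incl}))
```

Whether to use `include_lowest=True` is a judgment call; with the default, a country reporting exactly 0 would silently drop out of the contingency table.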



2. Output





3. Interpretation

The Chi-Square Test of Independence revealed a significant association between income group and internet use group, χ²(4, N=183) = 175.44, p < .0001.
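For reference, the reported statistic follows the standard formulas below, where $O_{ij}$ and $E_{ij}$ are the observed and expected counts, $R_i$ and $C_j$ are the row and column totals, and $r = c = 3$. The degrees of freedom match the reported value of 4. Cramér's V was not part of the original analysis; it is added here only as a rough effect-size check on the word "strongly", computed from the reported χ² and N.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

\mathrm{df} = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4

V = \sqrt{\frac{\chi^2}{N\,(\min(r, c) - 1)}} = \sqrt{\frac{175.44}{183 \times 2}} \approx 0.69
```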

  • Observed vs. expected counts:

    • Low Income countries fell into the Low Internet Use group far more often than expected under independence (88 observed vs. ~55 expected), and none fell into the High Internet Use group (0 observed vs. ~19 expected).

    • High Income countries fell into the High Internet Use group far more often than expected (24 observed vs. ~5 expected).

    • Middle Income countries were over-represented in the Medium Internet Use group relative to expectation.

  • Conclusion: Income level and internet penetration are not independent; they are strongly associated. Low-income countries are disproportionately in the Low Internet Use group, while high-income countries are disproportionately in the High Internet Use group. Because this is a single omnibus test on a 3×3 table, these cell-by-cell comparisons describe where the departure from independence occurs; they are not separate significance tests.
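The observed-vs-expected comparison above can be made explicit. The sketch below uses a hypothetical 3×3 table (illustrative counts only, not the gapminder results) to show that each expected count is row total × column total / N, that this matches the `expected` array returned by `scipy.stats.chi2_contingency`, and that Pearson residuals, (O − E) / √E, flag which cells are over- or under-represented. Without a continuity correction, the squared residuals sum to the χ² statistic.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts for illustration only; NOT the results reported above
observed = pd.DataFrame(
    [[40, 10, 0],
     [15, 30, 10],
     [0, 5, 20]],
    index=["Low Income", "Middle Income", "High Income"],
    columns=["Low Internet Use", "Medium Internet Use", "High Internet Use"],
)

n = observed.to_numpy().sum()
# Expected count under independence: row total * column total / N
expected_manual = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

chi2, p, dof, expected = chi2_contingency(observed)

# Pearson residuals: positive = more countries than independence predicts
residuals = (observed - expected_manual) / np.sqrt(expected_manual)

print(residuals.round(2))
print("chi2 =", round(chi2, 2), "| dof =", dof)
```

To apply this to the actual data, `contingency_table` from the script above could be passed in place of the toy `observed` frame.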

