Python project
1. The Script
import pandas as pd
df = pd.read_csv("gapminder.csv")
# Select the variables of interest
vars_of_interest = ["incomeperperson", "internetuserate", "lifeexpectancy"]
# Convert columns to numeric (errors='coerce' turns bad data into NaN)
df[vars_of_interest] = df[vars_of_interest].apply(pd.to_numeric, errors="coerce")
# Run frequency distributions (value counts with bins)
freq_income = pd.cut(df["incomeperperson"], bins=5).value_counts().sort_index()
freq_internet = pd.cut(df["internetuserate"], bins=5).value_counts().sort_index()
freq_lifeexp = pd.cut(df["lifeexpectancy"], bins=5).value_counts().sort_index()
# Display results
print("Frequency Distribution: Income per Person")
print(freq_income)
print("\nFrequency Distribution: Internet Use Rate")
print(freq_internet)
print("\nFrequency Distribution: Life Expectancy")
print(freq_lifeexp)
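Frequency distributions are often reported as percentages as well as counts. The sketch below shows one way to get both, together with the number of missing values. The `frequency_table` helper and the sample numbers are assumptions made for illustration; they are not part of the script above or the gapminder data.

```python
import numpy as np
import pandas as pd


def frequency_table(series: pd.Series, bins: int = 5) -> pd.DataFrame:
    """Counts and percentages per equal-width bin (hypothetical helper)."""
    binned = pd.cut(series, bins=bins)
    # value_counts() skips NaN by default, so missing rows are not counted.
    counts = binned.value_counts().sort_index()
    percents = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": percents})


# Made-up values standing in for one column; NaN marks a missing entry.
sample = pd.Series([1.0, 2.0, 3.0, 40.0, 55.0, 90.0, 100.0, np.nan])

table = frequency_table(sample)
n_missing = int(sample.isna().sum())

print(table)
print("Missing values:", n_missing)
```

With the real data, the same helper could be called as `frequency_table(df["incomeperperson"])`, assuming `df` has been loaded and converted as in the script.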
2. Output
Frequency Distribution: Income per Person
- 0 – 21,112 → 162 countries
- 21,112 – 42,121 → 24 countries
- 42,121 – 63,129 → 2 countries
- 63,129 – 84,138 → 1 country
- 84,138 – 105,147 → 1 country
Frequency Distribution: Internet Use Rate
- 0 – 19% → 73 countries
- 19 – 38% → 35 countries
- 38 – 57% → 37 countries
- 57 – 76% → 24 countries
- 76 – 95% → 23 countries
Frequency Distribution: Life Expectancy
- 48 – 55 years → 24 countries
- 55 – 62 years → 17 countries
- 62 – 69 years → 31 countries
- 69 – 76 years → 69 countries
- 76 – 83 years → 50 countries
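The ranges above all have the same width within each variable because `pd.cut` with an integer `bins` splits the span between the minimum and maximum into equal parts. The sketch below illustrates this on made-up values (not gapminder data), using `retbins=True` to return the computed edges. Note that pandas moves the lowest edge slightly below the minimum so that the smallest value falls inside the first bin.

```python
import numpy as np
import pandas as pd

# Made-up values, not gapminder data.
values = pd.Series([0.0, 10.0, 20.0, 50.0, 100.0])

# retbins=True also returns the bin edges that pd.cut computed.
binned, edges = pd.cut(values, bins=5, retbins=True)

# Width of each bin; the first is marginally wider because its
# lower edge is shifted below the minimum.
widths = np.diff(edges)

# Empty bins still show up with a count of 0.
counts = binned.value_counts().sort_index()

print("Edges: ", edges)
print("Widths:", widths)
print(counts)
```

Equal-width bins are easy to read, but on a skewed variable such as income they put most observations into the first bin, which is what the income table shows.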
3. Summary of Frequency Distributions
- Income per Person: Most countries (162 of the 190 with income data) fall in the lowest income bracket (under about $21,100). Only 4 countries exceed about $42,100 per person, indicating strong global income inequality.
- Internet Use Rate: The distribution is more balanced, but still uneven. 73 countries have an internet use rate below about 19%, while only 23 countries exceed 76%. This highlights the persistence of the global digital divide.
- Life Expectancy: Most countries cluster between 69 and 83 years (119 countries). A smaller group (72 countries) has a life expectancy below 69 years, 41 of them below 62 years, often reflecting poorer health systems or lower socioeconomic conditions.
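Coverage is similar across the three variables: the bins account for 190, 192, and 191 countries respectively, so only a few countries have data for one variable but not another. Overall, the distributions confirm that income, internet use, and life expectancy vary widely across nations, providing a strong basis for studying associations between them.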
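The totals used in a summary like this can be re-derived by adding up the bin counts listed in the output. The sketch below does that arithmetic with the counts copied from the tables above; the variable names are chosen here for illustration.

```python
# Bin counts copied from the frequency tables, lowest bin first.
counts = {
    "incomeperperson": [162, 24, 2, 1, 1],
    "internetuserate": [73, 35, 37, 24, 23],
    "lifeexpectancy": [24, 17, 31, 69, 50],
}

# Number of countries with a usable value for each variable.
totals = {name: sum(values) for name, values in counts.items()}

# Life expectancy: the three lowest bins end at 69 years.
life = counts["lifeexpectancy"]
below_69 = sum(life[:3])
from_69_up = sum(life[3:])

# Share of countries in the lowest income bin.
income = counts["incomeperperson"]
lowest_income_share = round(income[0] / totals["incomeperperson"] * 100, 1)

print(totals)               # {'incomeperperson': 190, 'internetuserate': 192, 'lifeexpectancy': 191}
print(below_69, from_69_up) # 72 119
print(lowest_income_share)  # 85.3
```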