Python project 3
1. The Script
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load dataset
df = pd.read_csv("gapminder.csv")
# Convert variables to numeric
vars_of_interest = ["incomeperperson", "internetuserate", "lifeexpectancy"]
df[vars_of_interest] = df[vars_of_interest].apply(pd.to_numeric, errors="coerce")
# Data management: group variables
df["income_group"] = pd.cut(df["incomeperperson"],
                            bins=[0, 5000, 20000, 100000],
                            labels=["Low Income", "Middle Income", "High Income"])
df["internet_group"] = pd.cut(df["internetuserate"],
                              bins=[0, 30, 70, 100],
                              labels=["Low Internet Use", "Medium Internet Use", "High Internet Use"])
df["lifeexp_group"] = pd.cut(df["lifeexpectancy"],
                             bins=[0, 60, 75, 90],
                             labels=["Low Life Expectancy", "Medium Life Expectancy", "High Life Expectancy"])
# --- Univariate graphs ---
sns.countplot(x="income_group", data=df)
plt.title("Distribution of Income Groups")
plt.show()
sns.countplot(x="internet_group", data=df)
plt.title("Distribution of Internet Use Groups")
plt.show()
sns.countplot(x="lifeexp_group", data=df)
plt.title("Distribution of Life Expectancy Groups")
plt.show()
# --- Bivariate graph ---
sns.scatterplot(x="incomeperperson", y="internetuserate", data=df)
plt.title("Association between Income per Person and Internet Use Rate")
plt.xlabel("Income per Person (US$)")
plt.ylabel("Internet Use Rate (%)")
plt.show()
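The grouping step in the script relies on pd.cut, which by default builds right-closed intervals and leaves any value outside the outermost bin edges without a group. The sketch below reuses the income bins from the script on a handful of made-up values (they are not taken from gapminder.csv) to show how a frequency table with dropna=False can reveal rows that would silently drop out of a count plot. The names sample, freq, and unbinned are illustrative only.

```python
import pandas as pd

# Hypothetical income values chosen to probe the bin edges; not real gapminder data
sample = pd.DataFrame(
    {"incomeperperson": [0, 450, 5000, 5001, 20000, 99999, 120000, None]}
)

income_bins = [0, 5000, 20000, 100000]
income_labels = ["Low Income", "Middle Income", "High Income"]

# Default intervals are right-closed: (0, 5000], (5000, 20000], (20000, 100000]
sample["income_group"] = pd.cut(
    sample["incomeperperson"], bins=income_bins, labels=income_labels
)

# dropna=False keeps rows without a group visible in the frequency table
freq = sample["income_group"].value_counts(dropna=False, sort=False)
print(freq)

# Count the rows that did not land in any bin (out of range or missing)
unbinned = int(sample["income_group"].isna().sum())
print(f"Rows without a group: {unbinned}")
```

If the same check on the real data reports unbinned rows that are not simply missing values, widening the outer bin edges or passing include_lowest=True to pd.cut may be worth considering.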
3. Summary of Frequency Distributions
- Income Groups (Univariate): Most countries fall into the Low Income category, with fewer in Middle Income and only a small number in High Income, a pattern consistent with global inequality in economic development.
- Internet Use Groups (Univariate): The majority of countries are in the Low and Medium Internet Use groups, while relatively few reach High Internet Use, which points to a persistent digital divide.
- Life Expectancy Groups (Univariate): Most countries cluster in the Medium or High Life Expectancy groups, while some still fall into Low Life Expectancy, suggesting disparities in health outcomes.
- Income vs Internet Use (Bivariate): The scatter plot shows a clear positive relationship: countries with higher income per person tend to have higher internet use rates. Low-income countries cluster in the bottom-left and high-income countries in the top-right, which is consistent with the hypothesis that income is associated with internet access.
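The bivariate summary above rests on a visual reading of the scatter plot. One way to put a number on it is a Pearson correlation coefficient, sketched here with Series.corr on made-up values (the demo frame is an assumption for illustration, not gapminder data). With the real dataset, the same call would be applied to the df loaded in the script.

```python
import pandas as pd

# Made-up values for illustration only; swap in the df loaded from gapminder.csv
demo = pd.DataFrame({
    "incomeperperson": [300, 1200, 4000, 9000, 25000, 40000, None],
    "internetuserate": [2.0, 8.5, 20.0, 45.0, 78.0, 90.0, 50.0],
})

# Series.corr computes Pearson's r by default and skips pairs with a missing value
r = demo["incomeperperson"].corr(demo["internetuserate"])
print(f"Pearson r = {r:.2f}")
```

A value of r near 1 would back up the positive association, though Pearson's r only measures linear association, so a curved relationship could be understated.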



