Python project 3

1. The Script

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv("gapminder.csv")

# Convert variables to numeric (non-numeric entries become NaN)
vars_of_interest = ["incomeperperson", "internetuserate", "lifeexpectancy"]
df[vars_of_interest] = df[vars_of_interest].apply(pd.to_numeric, errors="coerce")

# Data management: group each quantitative variable into three categories
df["income_group"] = pd.cut(df["incomeperperson"],
                            bins=[0, 5000, 20000, 100000],
                            labels=["Low Income", "Middle Income", "High Income"])
df["internet_group"] = pd.cut(df["internetuserate"],
                              bins=[0, 30, 70, 100],
                              labels=["Low Internet Use", "Medium Internet Use", "High Internet Use"])
df["lifeexp_group"] = pd.cut(df["lifeexpectancy"],
                             bins=[0, 60, 75, 90],
                             labels=["Low Life Expectancy", "Medium Life Expectancy", "High Life Expectancy"])

# --- Univariate graphs ---
sns.countplot(x="income_group", data=df)
plt.title("Distribution of Income Groups")
plt.show()

sns.countplot(x="internet_group", data=df)
plt.title("Distribution of Internet Use Groups")
plt.show()

sns.countplot(x="lifeexp_group", data=df)
plt.title("Distribution of Life Expectancy Groups")
plt.show()

# --- Bivariate graph ---
sns.scatterplot(x="incomeperperson", y="internetuserate", data=df)
plt.title("Association between Income per Person and Internet Use Rate")
plt.xlabel("Income per Person (US$)")
plt.ylabel("Internet Use Rate (%)")
plt.show()
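A note on the grouping step: by default, pd.cut builds intervals that are open on the left and closed on the right, so a value of exactly 0, or any value above the top bin edge, lands in no bin and becomes NaN. The sketch below shows one way to check for that. It uses a small made-up series rather than gapminder.csv, and the variable names other than the bins and labels are hypothetical.

```python
import pandas as pd

# Made-up values for illustration only; not taken from gapminder.csv.
income = pd.Series([0, 1200, 5000, 18000, 45000, 120000], name="incomeperperson")

bins = [0, 5000, 20000, 100000]
labels = ["Low Income", "Middle Income", "High Income"]

# Default behaviour: intervals are (0, 5000], (5000, 20000], (20000, 100000]
default_groups = pd.cut(income, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 is kept
inclusive_groups = pd.cut(income, bins=bins, labels=labels, include_lowest=True)

# Values that are numeric but did not land in any bin
unbinned_default = income[default_groups.isna() & income.notna()]
unbinned_inclusive = income[inclusive_groups.isna() & income.notna()]

print("Unbinned with defaults:", unbinned_default.tolist())
print("Unbinned with include_lowest=True:", unbinned_inclusive.tolist())
```

Running the same check on the real columns would show whether any country is silently dropped from the count plots because its value falls outside the chosen bin edges.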



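2. The Output

[Figures not preserved: three count plots ("Distribution of Income Groups", "Distribution of Internet Use Groups", "Distribution of Life Expectancy Groups") and one scatter plot ("Association between Income per Person and Internet Use Rate").]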

3. Summary of Frequency Distributions

  • Income Groups (Univariate): Most countries fall into the Low Income category (up to US$5,000 per person), with fewer in Middle Income (US$5,000 to US$20,000) and only a small number in High Income (above US$20,000). This points to wide global inequality in economic development.

  • Internet Use Groups (Univariate): The majority of countries fall into Low (up to 30%) or Medium (30% to 70%) Internet Use, while relatively few reach High Internet Use (above 70%). This reflects a persistent digital divide.

  • Life Expectancy Groups (Univariate): Most countries cluster in Medium (60 to 75 years) or High (above 75 years) Life Expectancy, while some still fall into the Low Life Expectancy category (60 years or below), suggesting disparities in health outcomes.

  • Income vs Internet Use (Bivariate): The scatter plot shows a clear positive relationship: countries with higher income per person also tend to have higher internet use rates. Low-income countries cluster in the bottom-left of the plot and high-income countries in the top-right, which supports the hypothesis that income is associated with internet access.
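The observations above are read off the plots. A minimal sketch of how the same patterns could be put into numbers, using frequency tables for the grouped variable and a Pearson correlation for the scatter plot, is shown here. The data frame is made up for illustration and is not the Gapminder data, so the printed values say nothing about the real results.

```python
import pandas as pd

# Made-up values for illustration only; not taken from gapminder.csv.
df = pd.DataFrame({
    "incomeperperson": [500, 1500, 4000, 9000, 30000, 40000],
    "internetuserate": [2, 10, 25, 45, 80, 90],
})

# Same bins and labels as the script uses for income
df["income_group"] = pd.cut(
    df["incomeperperson"],
    bins=[0, 5000, 20000, 100000],
    labels=["Low Income", "Middle Income", "High Income"],
)

# Counts per category, ordered Low -> Middle -> High
counts = df["income_group"].value_counts().sort_index()

# Share of rows in each category, as percentages
percentages = (
    df["income_group"].value_counts(normalize=True).sort_index() * 100
)

# Pearson correlation between the two quantitative variables
r = df["incomeperperson"].corr(df["internetuserate"])

print(counts)
print(percentages.round(1))
print(f"Pearson r = {r:.2f}")
```

Applied to the real data frame from the script, the same calls would give the exact counts behind each bar and a single number for the strength of the income and internet use relationship.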

