Analyzes data from 179 countries (2000–2015) to identify which health and economic indicators are most closely associated with GDP per capita, using log-transformed regression models and mixed-effects analysis.
Problem
GDP per capita is one of the main ways we measure a country's economic well-being, but many different factors can influence it. Health, education, economic development, and disease burden all vary widely across countries, which makes it hard to tell which of these factors matter most. Without a clear, data-driven model, it is difficult for researchers and policymakers to see which health or economic conditions are most strongly connected to higher GDP per capita.
Overview
This project analyzes data from 179 countries (2000–2015) to identify which health and economic indicators are most closely associated with GDP per capita. Because GDP per capita is highly right-skewed (a small number of very wealthy countries sit far above the rest), the analysis models its logarithm, which compresses that long upper tail and makes the relationships easier to capture with linear regression. The goal is to understand, at a global scale, how factors like life expectancy, years of schooling, immunization rates, and disease incidence relate to economic prosperity.
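To illustrate why the log transform helps, the sketch below compares the skewness of a right-skewed sample before and after taking logs. The values are synthetic draws from a log-normal distribution (an assumption chosen only because it mimics a long upper tail); they are NOT the project's data, and the `skewness` helper is a hypothetical name for this example.

```python
import math
import random


def skewness(values):
    """Sample skewness using population moments: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical GDP-per-capita-like values: synthetic log-normal draws,
# used only to show the effect of the transform (not real country data).
rng = random.Random(0)
gdp = [rng.lognormvariate(9.0, 1.2) for _ in range(2000)]
log_gdp = [math.log(v) for v in gdp]

print(f"skew(raw)  = {skewness(gdp):.2f}")      # large and positive
print(f"skew(log)  = {skewness(log_gdp):.2f}")  # close to zero
```

Because the log of a log-normal variable is normally distributed, the transformed sample is roughly symmetric, which is the property that makes a linear model of log GDP per capita easier to fit and interpret.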
How It Works (Approach)
The project begins with exploratory data analysis of distributions, correlations, and country-level patterns. Several regression models are then fit to test which variables meaningfully predict log GDP per capita. Because highly correlated predictors make coefficient estimates unstable, variables with high multicollinearity are removed using variance inflation factor (VIF) analysis. Since each country is observed repeatedly over time, a mixed-effects model is also fit to account for these repeated within-country observations. Model performance is compared using AIC (Akaike information criterion) scores and diagnostic checks.
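A minimal sketch of VIF-based filtering, assuming numpy is available and using a cutoff of 10 (a common rule of thumb; the project's actual threshold is not stated). The predictor names `x1` to `x3`, the helper names `vif` and `drop_high_vif`, and the data are all hypothetical. It relies on the identity that each predictor's VIF, 1 / (1 − R²) from regressing it on the others, equals the matching diagonal entry of the inverse correlation matrix.

```python
import numpy as np


def vif(X):
    """VIF for each column of X (rows = observations, columns = predictors).

    VIF_j = [inv(R)]_jj, where R is the predictors' correlation matrix;
    this equals 1 / (1 - R_j^2) from regressing column j on the others.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly drop the predictor with the largest VIF until all are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Hypothetical predictors: x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

print(np.round(vif(X), 1))                     # x1, x2 very large; x3 near 1
print(drop_high_vif(X, ["x1", "x2", "x3"]))    # one of x1/x2 removed
```

Dropping one variable at a time (rather than every variable above the cutoff at once) matters because removing a single member of a correlated pair usually brings the other one's VIF back down. For the mixed-effects step, one common Python option is statsmodels' `smf.mixedlm(formula, data, groups=data["country"])`, which by default fits a random intercept per group; the column names there are hypothetical, and the README does not say which software the project used.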
Impact / Value
The analysis shows that life expectancy is one of the strongest and most consistent predictors of GDP per capita worldwide, even after adjusting for education and other health indicators. By highlighting which factors are most strongly associated with GDP per capita, the project helps policymakers and researchers focus on the interventions, such as improving healthcare or expanding education access, that may have the greatest impact on economic development.
Key Features
- Global dataset covering 179 countries across 16 years
- Log-transformed GDP per capita to reduce skew and clarify relationships
- Multicollinearity reduction using VIF
- Forward, backward, and stepwise model selection and comparison
- Mixed-effects model to account for country-level differences
- Clear comparison of health and economic predictors
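Forward selection can be sketched as: start from an intercept-only model and, at each step, add the predictor that lowers AIC the most, stopping when no addition helps. This sketch assumes numpy, ordinary least squares, and AIC as the selection criterion (the README compares models by AIC, but the exact stepwise rule is not stated). The data, the names `x0` to `x2`, and the helpers `ols_aic` and `forward_select` are hypothetical. Backward elimination follows the same pattern starting from the full model and dropping terms; stepwise selection allows both moves.

```python
import numpy as np


def ols_aic(y, X):
    """Gaussian OLS AIC up to an additive constant: n*ln(RSS/n) + 2k.

    k counts the intercept plus the slopes; the omitted constant is the
    same for every model fit to the same y, so comparisons are unaffected.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # X may have zero columns
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def cols(X, idx):
    """Select columns by a (possibly empty) list of indices."""
    return X[:, np.asarray(idx, dtype=int)]


def forward_select(y, X, names):
    remaining = list(range(X.shape[1]))
    chosen = []
    best_aic = ols_aic(y, cols(X, chosen))  # intercept-only baseline
    while remaining:
        # Score every one-variable extension of the current model.
        trials = [(ols_aic(y, cols(X, chosen + [j])), j) for j in remaining]
        aic, j = min(trials)
        if aic >= best_aic:  # no candidate lowers AIC: stop
            break
        best_aic = aic
        chosen.append(j)
        remaining.remove(j)
    return [names[j] for j in chosen], best_aic


# Hypothetical data: y depends strongly on x0, weakly on x1, not on x2.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

selected, aic = forward_select(y, X, ["x0", "x1", "x2"])
print(selected, round(aic, 1))
```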
