A DATA SCIENCE APPROACH TO APPLIED ECONOMICS: PREDICTING GDP GROWTH USING MACHINE LEARNING TECHNIQUES

Authors

  • I. Javaid
  • M. Iqbal

DOI:

https://doi.org/10.57041/ywr5sd52

Keywords:

GDP growth forecasting, machine learning, random forest, LSTM, linear regression, hybrid dataset, simulated shocks, panel data

Abstract

The convergence of data science and applied economics has ushered in a
transformative era for macroeconomic forecasting, particularly in predicting gross domestic product
(GDP) growth—a cornerstone metric for assessing national economic vitality, guiding fiscal and
monetary policy, and informing global investment strategies. This comprehensive research paper
presents a rigorous, data-driven framework for forecasting annual GDP growth rates using advanced
machine learning techniques applied to a hybrid panel dataset comprising six major economies: the
United States, China, India, Germany, Brazil, and Japan, over the period 2000–2023. The dataset
integrates realistic economic trends extracted from World Bank Development Indicators with
carefully simulated data to address common empirical challenges such as missing observations, short
time series, and the underrepresentation of extreme economic events. Realistic components are
calibrated to historical averages—for instance, the United States exhibits a mean GDP growth of 2.5%
with a standard deviation of 1.5%, while China averages 8.0% ± 2.5%. Simulated values are generated
via multivariate normal distributions with country-specific parameters and overlaid with structural
shocks mimicking the 2008 Global Financial Crisis (GDP drop of 4–6%, unemployment spike of 3–5%)
and the 2020 COVID-19 pandemic (GDP contraction of 5–8%, unemployment surge of 4–7%).
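
The simulated component described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it draws only two correlated variables (GDP growth and unemployment) from a bivariate normal distribution and overlays the two shocks in 2008 and 2020. The GDP means and standard deviations and the shock ranges come from the abstract; the unemployment parameters, the correlation value, the choice of shock years, the reading of the shock sizes as percentage points, and all function names are assumptions made for illustration.

```python
import math
import random

YEARS = list(range(2000, 2024))  # 2000-2023, as in the abstract

# GDP (mean, sd) values are from the abstract. The unemployment (mean, sd)
# values and the correlation "rho" are illustrative assumptions.
COUNTRY_PARAMS = {
    "United States": {"gdp": (2.5, 1.5), "unemp": (5.5, 1.0), "rho": -0.5},
    "China": {"gdp": (8.0, 2.5), "unemp": (4.5, 0.5), "rho": -0.5},
}

# year -> ((GDP drop range), (unemployment rise range)), read here as
# percentage points. Ranges are from the abstract; the years are assumed.
SHOCKS = {
    2008: ((4.0, 6.0), (3.0, 5.0)),
    2020: ((5.0, 8.0), (4.0, 7.0)),
}


def draw_bivariate_normal(rng, mean_x, sd_x, mean_y, sd_y, rho):
    """Draw one correlated (x, y) pair via a 2x2 Cholesky factorisation."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mean_x + sd_x * z1
    y = mean_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def simulate_country(name, seed=0):
    """Return one row per year with simulated values and the applied shocks."""
    rng = random.Random(seed)
    params = COUNTRY_PARAMS[name]
    rows = []
    for year in YEARS:
        gdp, unemp = draw_bivariate_normal(
            rng, *params["gdp"], *params["unemp"], params["rho"]
        )
        gdp_shock = 0.0
        unemp_shock = 0.0
        if year in SHOCKS:
            (g_lo, g_hi), (u_lo, u_hi) = SHOCKS[year]
            gdp_shock = -rng.uniform(g_lo, g_hi)   # growth falls
            unemp_shock = rng.uniform(u_lo, u_hi)  # unemployment rises
        rows.append({
            "year": year,
            "gdp_growth": gdp + gdp_shock,
            "unemployment": unemp + unemp_shock,
            "gdp_shock": gdp_shock,
            "unemp_shock": unemp_shock,
        })
    return rows
```

Calling `simulate_country("United States", seed=42)` yields 24 yearly rows. Keeping the shock terms in separate fields makes it easy to check that only the assumed shock years are perturbed. The blending with World Bank series that the abstract describes is not shown here.
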
Three machine learning models are rigorously evaluated:
1. Linear Regression – a classical econometric baseline grounded in ordinary least squares (OLS);
2. Random Forest Regression – an ensemble method leveraging bagging and feature randomness to capture nonlinear interactions;
3. Long Short-Term Memory (LSTM) Networks – a deep recurrent neural network designed to model temporal
dependencies in sequential economic data.
Predictive features include lagged GDP growth, inflation (CPI annual %), unemployment rate (% of labor force),
and exports as % of GDP, selected based on established macroeconomic theory (e.g., Okun’s Law, Phillips
Curve, export-led growth hypothesis).
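
To make the lagged-feature idea concrete, the sketch below builds (previous year, current year) growth pairs within each country and fits a one-predictor OLS line to them. It is a deliberately reduced version of the linear regression baseline: the paper uses four predictors, whereas this example uses lagged GDP growth only. The function names and the toy panel are hypothetical and are not data from the paper.

```python
def make_lagged_pairs(panel):
    """Pair each year's growth with the previous year's, country by country.

    Pairs are formed inside each country's series so that one country's
    last observation is never used as the lag for another country.
    """
    pairs = []
    for series in panel.values():
        for previous, current in zip(series, series[1:]):
            pairs.append((previous, current))
    return pairs


def fit_simple_ols(pairs):
    """Closed-form OLS for y = intercept + slope * x with one predictor."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, lagged_growth):
    """Forecast next-period growth from last period's growth."""
    return intercept + slope * lagged_growth


# Hypothetical toy panel: both series follow g_t = 1 + 0.5 * g_{t-1},
# so the fit should recover an intercept near 1 and a slope near 0.5.
toy_panel = {
    "A": [4.0, 3.0, 2.5, 2.25, 2.125],
    "B": [0.0, 1.0, 1.5, 1.75],
}

pairs = make_lagged_pairs(toy_panel)
intercept, slope = fit_simple_ols(pairs)
```

In a real evaluation the pairs would be split chronologically, fitting on earlier years and testing on later ones, so that the forecast never sees future observations.
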
Empirical results demonstrate the random forest model’s superiority, achieving a Mean Absolute Error (MAE) of
1.85 and Root Mean Squared Error (RMSE) of 2.45 on the test set. This is a 37% reduction in MAE relative to
linear regression (MAE: 2.95, RMSE: 3.82) and a 12% reduction relative to the LSTM (MAE: 2.10, RMSE: 2.68). Feature importance
analysis reveals lagged GDP growth as the dominant predictor (importance: 0.52), followed by unemployment (0.21),
inflation (0.15), and exports (0.12), reinforcing the autoregressive nature of economic momentum and the critical role of
labor market conditions.
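
The percentage comparisons above follow from the reported error figures. The short sketch below defines MAE and RMSE in the usual way and recomputes the relative MAE reductions from the numbers quoted in the abstract. The helper names are illustrative; only the six error values come from the source.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Test-set errors as reported in the abstract.
REPORTED = {
    "linear_regression": {"mae": 2.95, "rmse": 3.82},
    "random_forest": {"mae": 1.85, "rmse": 2.45},
    "lstm": {"mae": 2.10, "rmse": 2.68},
}

rf_mae = REPORTED["random_forest"]["mae"]
mae_gain_vs_lr = relative_reduction(REPORTED["linear_regression"]["mae"], rf_mae)
mae_gain_vs_lstm = relative_reduction(REPORTED["lstm"]["mae"], rf_mae)

print(f"MAE reduction vs linear regression: {mae_gain_vs_lr:.1%}")  # 37.3%
print(f"MAE reduction vs LSTM: {mae_gain_vs_lstm:.1%}")             # 11.9%
```

The same `relative_reduction` helper can be applied to the RMSE values, which gives somewhat different percentages; the figures quoted in the abstract match the MAE comparison.
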
The study’s contributions are threefold:
  • Methodological: Introduces a reproducible hybrid data construction pipeline for economic forecasting under
    data constraints.
  • Empirical: Provides cross-country comparative evidence of machine learning’s efficacy across developed and
    emerging markets.
  • Policy-Relevant: Offers actionable insights for real-time nowcasting and scenario-based policymaking.
Limitations include reliance on simulated shocks, exclusion of fiscal policy variables, and the annual frequency
of data. Future research should incorporate high-frequency indicators (e.g., PMI, satellite night lights), geopolitical risk
indices, and hybrid neuro-econometric models. This work advances the field of econoinformatics, demonstrating that
machine learning, when grounded in economic theory and robust data practices, can significantly enhance predictive
accuracy and support evidence-based economic governance in an era of uncertainty.

Author Biographies

  • I. Javaid

    Department of Economics, National College of Business Administration & Economics, Lahore, Pakistan

  • M. Iqbal

    Department of Mechatronics and Control Engineering, University of Engineering and Technology, Lahore, Pakistan

Published

2023-07-06

How to Cite

A DATA SCIENCE APPROACH TO APPLIED ECONOMICS: PREDICTING GDP GROWTH USING MACHINE LEARNING TECHNIQUES. (2023). Pakistan Journal of Scientific Research, 3(1), 234-250. https://doi.org/10.57041/ywr5sd52
