James Dargan

Aspiring Data Scientist

Seattle, WA
Python, SQL, R

Actively searching


Pandas, BeautifulSoup, Requests, Plotly, Dash, nltk, Sklearn, Keras
Regression, PCA, Time Series, Trees, RFs, SVMs, Boosting, Neural Networks

View My GitHub Profile

Walmart Sales (Repo)

Project Description: Using sales data provided by Walmart via Kaggle, I construct a SARIMAX model to forecast future store sales.

Baseline Models

I propose two computation-free baselines against which any forecast model can be compared. They perform with an average mean absolute percent error (MAPE) under 10%.

Baseline MAPE: Train 6.5%, Test 6.0%
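As a rough illustration of how a computation-free baseline can be scored with MAPE, here is a minimal sketch. The two baselines themselves are not spelled out on this page, so the lagged-value forecast below (last week's value with `lag=1`, or the same week last year with `lag=52`) is an assumption, and the names `mape` and `naive_forecast` and the toy sales figures are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percent error, in percent. Assumes no actual value is zero."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def naive_forecast(series, lag):
    """Predict each value with the value observed `lag` steps earlier.

    Returns (actual, predicted) as aligned lists. No model is fit,
    so the baseline is free of any computation beyond a shift.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return list(series[lag:]), list(series[:-lag])


# Toy weekly sales figures (made up for illustration only).
sales = [100.0, 110.0, 105.0, 120.0, 115.0, 125.0]
actual, predicted = naive_forecast(sales, lag=1)
print(f"Naive baseline MAPE: {mape(actual, predicted):.2f}%")
```

A baseline like this gives a floor: a fitted model is only worth its cost if its MAPE beats the shifted series on the same weeks.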

Seasonal Component: (1,1,1,52)

ARIMA Component: (2,0,1)
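Reading the two tuples in the usual (p,d,q) and (P,D,Q,s) order, the specification can be written in backshift notation as below. This is a sketch in standard textbook notation, not taken from the project itself: the symbols are my own choice, and any exogenous regressors or constant term are left out.

```latex
\begin{aligned}
\phi(B)\,\Phi(B^{52})\,(1 - B^{52})\,y_t &= \theta(B)\,\Theta(B^{52})\,\varepsilon_t \\
\phi(B) &= 1 - \phi_1 B - \phi_2 B^2, &\quad \theta(B) &= 1 + \theta_1 B \\
\Phi(B^{52}) &= 1 - \Phi_1 B^{52}, &\quad \Theta(B^{52}) &= 1 + \Theta_1 B^{52}
\end{aligned}
```

Here $B$ is the backshift operator ($B y_t = y_{t-1}$), $y_t$ is weekly sales, and $\varepsilon_t$ is white noise. The only differencing is the single seasonal difference $(1 - B^{52})$, since $d = 0$ and $D = 1$.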

Exogenous Features: I exclude provided features that are not available at the time of forecast. I explore temperature and changes in fuel prices as exogenous predictors but find that both worsen model performance.

Business Context Simulation

To properly evaluate my model, I add three constraints to my model-building and forecasting process.

  1. I delay the availability of sales data by one week to simulate data collection and cleaning delays.
  2. I extend forecasts to a minimum two-week advance to simulate real-world demands for advance notice in planning.
  3. I maintain a consistent training window to simulate limited data storage.

All prediction errors are measured at a 3-week advance (one week of data delay plus two weeks of advance notice) to accommodate these simulated business demands.
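The three constraints can be sketched as a rolling evaluation schedule. This is a minimal illustration under my own assumptions, not the project's actual code: the function name `rolling_origin_splits`, its parameters, and the toy sizes are hypothetical, and weeks are represented as plain integer indices.

```python
def rolling_origin_splits(n_weeks, window, delay=1, lead=2):
    """Yield (train_weeks, target_week) pairs for a rolling forecast.

    train_weeks: range of `window` consecutive week indices used for fitting
                 (a fixed-length window, per the storage constraint).
    target_week: index of the week being forecast, `delay + lead` weeks
                 after the last week in the training window.
    """
    if window < 1 or delay < 0 or lead < 1:
        raise ValueError("window and lead must be >= 1, delay must be >= 0")
    horizon = delay + lead  # 1 week of data delay + 2 weeks of notice = 3
    for last_train in range(window - 1, n_weeks - horizon):
        train = range(last_train - window + 1, last_train + 1)
        yield train, last_train + horizon


# Toy schedule: 8 weeks of data, 3-week training window.
for train, target in rolling_origin_splits(n_weeks=8, window=3):
    print(f"train on weeks {list(train)} -> forecast week {target}")
```

Each step slides the window forward by one week, so every error is scored at the same 3-week distance from the last observation the model was allowed to see.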

Final Model Performance