Pandas, BeautifulSoup, Requests, Plotly, Dash, NLTK, scikit-learn, Keras
Regression, PCA, Time Series, Trees, RFs, SVMs, Boosting, Neural Networks
Project Description:
Using data provided by Walmart via Kaggle (available here), I aggregate sales to the store level, perform exploratory analysis of weekly store sales, and fit SARIMA models to forecast future sales as a time series.
Before fitting any formal models, I check the completeness of data coverage and examine trends in sales over time, as detailed in my notebook hosted here.
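One way to check coverage is to lay the data out as a week-by-store grid and count the gaps. The sketch below uses a small hypothetical table; the column names and the Friday weekly frequency are assumptions.

```python
import pandas as pd

# Hypothetical store-level data: store 1 has all six weeks, store 2 is missing one.
weeks = pd.date_range("2010-02-05", periods=6, freq="W-FRI")
store_sales_long = pd.DataFrame({
    "Store": [1] * 6 + [2] * 5,
    "Date": list(weeks) + list(weeks.delete(3)),
    "Weekly_Sales": [100.0, 110.0, 105.0, 120.0, 115.0, 108.0,
                     90.0, 95.0, 92.0, 99.0, 97.0],
})

# Pivot to a Date x Store grid; store-weeks with no record become NaN.
grid = store_sales_long.pivot(index="Date", columns="Store", values="Weekly_Sales")

# Reindex against the full weekly calendar so weeks missing for every store also show up.
full_calendar = pd.date_range(grid.index.min(), grid.index.max(), freq="W-FRI")
grid = grid.reindex(full_calendar)

missing_weeks = grid.isna().sum()   # number of missing weeks per store
coverage = grid.notna().mean()      # share of weeks present per store
```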
Insight 1: Walmart store sales display an annual cycle, with a massive spike during the Christmas season and a decline thereafter.
Insight 2: Super Bowl Sunday and Labor Day do not stand out visually, while the Black Friday spike and post-Christmas crash are massive.
Insight 3: When autocorrelation values are averaged across all stores, only the first lag appears statistically significant, with some evidence of a monthly cycle. Additionally, the 52-week lag correlates more strongly with future weekly sales than any other time lag.
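The averaged autocorrelation can be sketched as follows, on synthetic store series with a 52-week cycle. The `sample_acf` helper and the ±1.96/√n band are illustrative choices; the band is the usual single-series approximation and only a rough guide when applied to an average across stores.

```python
import numpy as np
import pandas as pd

def sample_acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

rng = np.random.default_rng(0)
n_weeks, n_stores = 143, 5
t = np.arange(n_weeks)

# Synthetic weekly sales: an annual (52-week) cycle plus noise, one column per store.
store_sales = pd.DataFrame(
    {
        store: np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.3, n_weeks)
        for store in range(1, n_stores + 1)
    },
    index=pd.date_range("2010-02-05", periods=n_weeks, freq="W-FRI"),
)

# One ACF per store (rows are lags 0..52), then the average across stores.
acf_by_store = store_sales.apply(lambda col: pd.Series(sample_acf(col, nlags=52)))
mean_acf = acf_by_store.mean(axis=1)

# Approximate 95% band for a white-noise series of this length.
band = 1.96 / np.sqrt(n_weeks)
significant_lags = [lag for lag in range(1, 53) if abs(mean_acf.loc[lag]) > band]
```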
Insight 4: Out of 45 stores, 8 fail the Augmented Dickey-Fuller (ADF) test for stationarity. Of these, 5 display consistent, long-term trends, while 3 display evidence of temporary shocks within the provided timeframe.