ETL Data Pipeline for a Cookie Business
- Katie Wojciechowski
- Feb 2
- 3 min read
Small businesses often run on instinct—and for many, that works… until it doesn’t. As order volume grows, ingredients fluctuate in price, and product mixes expand, intuition alone stops being enough. That’s where data comes in.
This project explores how even a small, home-based bakery can benefit from a thoughtfully designed data pipeline. Using a fictional business called Hayes Baked Goods, I built an end-to-end analytics workflow that turns messy operational data into clear, actionable insights.
GitHub repository: https://github.com/k80hys/bakery-pipeline
Business Context
Hayes Baked Goods is a small cookie bakery run by a full-time owner/manager, with help from a part-time baker and a contract bookkeeper. The business offers freshly baked cookies for local pickup and limited delivery.
Customers order by the dozen, choosing from:
Chocolate Chip Cookies
Emmy’s Sugar Cookies
Nutella Swirl Cookies
All cookies are priced the same at $28 per dozen, with optional add-ons:
Nutella stuffing ($4)
Crunchy peanut butter stuffing ($4)
Sea salt (free)
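For illustration, the pricing above reduces to a simple order-total calculation. The sketch below is hypothetical (the function name and add-on keys are mine, not from the project code), and it assumes add-ons are charged once per order:

```python
# Illustrative pricing for Hayes Baked Goods orders.
# Prices come from the post; the function itself is a hypothetical sketch,
# assuming add-ons are charged per order rather than per dozen.
BASE_PRICE_PER_DOZEN = 28
ADD_ON_PRICES = {
    "nutella_stuffing": 4,
    "peanut_butter_stuffing": 4,
    "sea_salt": 0,  # free add-on
}

def order_total(dozens: int, add_ons: list[str]) -> int:
    """Total price: $28 per dozen plus any add-ons."""
    return dozens * BASE_PRICE_PER_DOZEN + sum(ADD_ON_PRICES[a] for a in add_ons)

# Two dozen with Nutella stuffing and sea salt:
print(order_total(2, ["nutella_stuffing", "sea_salt"]))  # 60
```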
The goal of this project was to help the business move away from intuition-based decisions and toward data-backed insights around pricing, product performance, profitability, and ingredient planning.
Project Overview
At a high level, this project simulates a lightweight but realistic analytics stack for a small business. It includes:
Generated source data (orders, products, recipes, ingredients)
A data audit and validation process
Clearly defined business requirements and KPIs
An analytical schema with documentation and ER diagrams
A full ETL pipeline
Post-load validation in a data warehouse
A Tableau dashboard for visualization
Insights, recommendations, and future enhancements
Source Data
To start, I created scripts to generate dummy data and export it as CSV files. These files act as the pipeline’s source data:
shopify_orders.csv
shopify_products.csv
recipes.csv
ingredients.csv
The Shopify files simulate order and product exports, while the recipe and ingredient files represent manually maintained business documentation.
In a real-world setup, this data would come from the Shopify Admin API, with an automation tool like Zapier handling updates to recipe and ingredient spreadsheets.
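A minimal version of the generation step might look like the following. The file name matches the post; the column layout, product mix, and date range are illustrative assumptions, not the project's actual schema:

```python
import csv
import random
from datetime import date, timedelta

PRODUCTS = ["Chocolate Chip Cookies", "Emmy's Sugar Cookies", "Nutella Swirl Cookies"]

def generate_orders(path: str, n: int = 100, seed: int = 42) -> None:
    """Write a dummy Shopify-style order export as CSV (columns are assumed)."""
    random.seed(seed)  # deterministic output for repeatable runs
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "order_date", "product", "dozens", "price"])
        for i in range(1, n + 1):
            day = date(2024, 1, 1) + timedelta(days=random.randint(0, 364))
            dozens = random.randint(1, 4)
            writer.writerow([i, day.isoformat(), random.choice(PRODUCTS),
                             dozens, dozens * 28])

generate_orders("shopify_orders.csv")
```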
Inspecting, Validating, and Modeling the Data
Before transforming anything, I ran a structural and content-level inspection across all source files using a Python script. This step checks things like missing values, inconsistent formats, and broken relationships.
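A sketch of those checks, assuming pandas and simplified column names (the real script's structure may differ), could look like this:

```python
import pandas as pd

def audit(orders: pd.DataFrame, products: pd.DataFrame) -> list[str]:
    """Flag structural problems: missing values and orphaned product references."""
    issues = []
    # Content-level check: columns with missing values.
    nulls = orders.isna().sum()
    for col, count in nulls[nulls > 0].items():
        issues.append(f"orders.{col}: {count} missing values")
    # Relationship check: orders referencing products absent from the catalog.
    orphans = set(orders["product"]) - set(products["product"])
    if orphans:
        issues.append(f"orders reference unknown products: {sorted(orphans)}")
    return issues

# Toy data with one missing value and one broken relationship:
orders = pd.DataFrame({"order_id": [1, 2],
                       "product": ["Chocolate Chip Cookies", "Mystery"],
                       "dozens": [1, None]})
products = pd.DataFrame({"product": ["Chocolate Chip Cookies", "Emmy's Sugar Cookies"]})
issues = audit(orders, products)
print(issues)
```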
Based on the findings, I designed the analytical model, complete with an ER diagram and star schema optimized for reporting. These artifacts are documented and included with the project.
The ETL Pipeline
With clean and validated data, I built an end-to-end ETL pipeline in Python. The pipeline handles:
Source data validation
Transformations into fact and dimension tables
Dimensional modeling for BI use cases
The transformed data is loaded into both:
A staging folder (for backup and debugging)
A MySQL data warehouse (cookie_bakery_dw) for structured storage
Writing to both targets shows the flexibility small teams have: flat staging files for quick inspection and recovery, and a relational warehouse for structured querying.
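The dimensional-modeling step can be sketched in a few lines. This is an assumed, simplified version of the transform (real fact and dimension tables would carry more columns and surrogate-key handling):

```python
import pandas as pd

def build_star_schema(orders: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split raw orders into a product dimension and an order fact table."""
    # Dimension: one row per distinct product, with a surrogate key.
    dim_product = orders[["product"]].drop_duplicates().reset_index(drop=True)
    dim_product["product_key"] = dim_product.index + 1
    # Fact: measures plus foreign keys into the dimensions.
    fact_orders = orders.merge(dim_product, on="product")[
        ["order_id", "order_date", "product_key", "dozens", "price"]
    ]
    return fact_orders, dim_product

orders = pd.DataFrame({
    "order_id": [1, 2],
    "order_date": ["2024-01-05", "2024-01-06"],
    "product": ["Chocolate Chip Cookies", "Nutella Swirl Cookies"],
    "dozens": [1, 2],
    "price": [28, 56],
})
fact, dim = build_star_schema(orders)
```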
Warehouse Validation
After loading the data into MySQL, I ran a series of SQL validation checks to ensure table integrity, referential accuracy, and business logic consistency. This step is critical for trust—especially when dashboards will inform pricing and purchasing decisions.
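A referential-accuracy check of the kind described above can be illustrated with a toy warehouse. The project uses MySQL; SQLite stands in here so the sketch is self-contained, and the table and column names are assumptions:

```python
import sqlite3

# Stand-in warehouse (the project uses MySQL; SQLite keeps this runnable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE fact_orders (order_id INTEGER, product_key INTEGER, price REAL);
INSERT INTO dim_product VALUES (1, 'Chocolate Chip Cookies');
INSERT INTO fact_orders VALUES (101, 1, 28.0), (102, 99, 56.0);
""")

# Referential check: fact rows pointing at products missing from the dimension.
orphans = conn.execute("""
    SELECT COUNT(*) FROM fact_orders f
    LEFT JOIN dim_product d ON f.product_key = d.product_key
    WHERE d.product_key IS NULL
""").fetchone()[0]
print(orphans)  # 1 orphaned row in this toy example
```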
Analytics & Visualization
Due to limited database connection options in my version of Tableau, I used Google Sheets as an intermediary layer. An automation script pushes curated data from the warehouse into Sheets using the Google Sheets and Google Drive APIs.
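The push itself needs Google API credentials, but the shape of the payload can be shown without a network call: the Sheets API expects a 2-D list of JSON-friendly values, header row first. The helper below is a hypothetical sketch of that formatting step:

```python
from datetime import date

def rows_for_sheets(columns: list[str], records: list[tuple]) -> list[list]:
    """Convert query output into the JSON-friendly 2-D values list
    that a Sheets API update expects (header row first)."""
    def cell(v):
        # Dates aren't JSON-serializable; send them as ISO strings.
        return v.isoformat() if isinstance(v, date) else v
    return [list(columns)] + [[cell(v) for v in r] for r in records]

payload = rows_for_sheets(
    ["product", "order_date", "revenue"],
    [("Chocolate Chip Cookies", date(2024, 1, 5), 1120)],
)
print(payload)
```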
From there, I built Tableau dashboards that highlight:
Product sales trends
Revenue by cookie type and add-ons
High-level performance patterns
What's Next?
This pipeline is intentionally simple, but it leaves plenty of room to grow. Future enhancements could include:
Live Shopify API integration
Incremental (change-only) ETL
Event-driven automation triggered by new orders
Instagram marketing data integration
Forecasting and demand planning
Unit tests and CI/CD for the ETL pipeline
Final Thoughts
This project shows that data engineering isn’t just for large companies with massive budgets. With the right structure and a clear business goal, even a small cookie bakery can use data to make smarter, more confident decisions—and maybe bake a few extra dozen cookies along the way.