
ETL Data Pipeline for a Cookie Business

  • Writer: Katie Wojciechowski
  • Feb 2
  • 3 min read

Small businesses often run on instinct—and for many, that works… until it doesn’t. As order volume grows, ingredients fluctuate in price, and product mixes expand, intuition alone stops being enough. That’s where data comes in.


This project explores how even a small, home-based bakery can benefit from a thoughtfully designed data pipeline. Using a fictional business called Hayes Baked Goods, I built an end-to-end analytics workflow that turns messy operational data into clear, actionable insights.



Business Context

Hayes Baked Goods is a small cookie bakery run by a full-time owner/manager, with help from a part-time baker and a contract bookkeeper. The business offers freshly baked cookies for local pickup and limited delivery.


Customers order by the dozen, choosing from:

  • Chocolate Chip Cookies

  • Emmy’s Sugar Cookies

  • Nutella Swirl Cookies


All cookies are priced the same at $28 per dozen, with optional add-ons:

  • Nutella stuffing ($4)

  • Crunchy peanut butter stuffing ($4)

  • Sea salt (free)
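The menu above can be priced with a small helper. This is purely illustrative: the function name and order shape are mine, and I'm assuming add-ons are charged per dozen, which the menu doesn't state explicitly.

```python
# Illustrative pricing helper based on the menu above.
# Assumption: stuffing add-ons ($4) are charged once per dozen.
BASE_PRICE_PER_DOZEN = 28.00
ADD_ON_PRICES = {
    "nutella_stuffing": 4.00,
    "peanut_butter_stuffing": 4.00,
    "sea_salt": 0.00,  # free add-on
}

def order_total(dozens: int, add_ons: list[str]) -> float:
    """Total price for an order of cookies with optional add-ons."""
    per_dozen = BASE_PRICE_PER_DOZEN + sum(ADD_ON_PRICES[a] for a in add_ons)
    return dozens * per_dozen

print(order_total(2, ["nutella_stuffing"]))  # 2 dozen, Nutella-stuffed
```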


The goal of this project was to help the business move away from intuition-based decisions and toward data-backed insights around pricing, product performance, profitability, and ingredient planning.


Project Overview

At a high level, this project simulates a lightweight but realistic analytics stack for a small business. It includes:

  • Generated source data (orders, products, recipes, ingredients)

  • A data audit and validation process

  • Clearly defined business requirements and KPIs

  • An analytical schema with documentation and ER diagrams

  • A full ETL pipeline

  • Post-load validation in a data warehouse

  • A Tableau dashboard for visualization

  • Insights, recommendations, and future enhancements


Source Data

To start, I created scripts to generate dummy data and export it as CSV files. These files act as the pipeline’s source data:


  • shopify_orders.csv

  • shopify_products.csv

  • recipes.csv

  • ingredients.csv


The Shopify files simulate order and product exports, while the recipe and ingredient files represent manually maintained business documentation.
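A generator for one of these files might look like the sketch below. The column names are my own stand-ins for what a real Shopify order export would contain, not the actual export schema.

```python
import csv
import random
from datetime import date, timedelta

# Sketch of a dummy-data generator for shopify_orders.csv.
# Columns are illustrative, not the real Shopify export schema.
PRODUCTS = [
    "Chocolate Chip Cookies",
    "Emmy's Sugar Cookies",
    "Nutella Swirl Cookies",
]

def generate_orders(path: str, n: int = 100, seed: int = 42) -> None:
    rng = random.Random(seed)  # seeded so runs are reproducible
    start = date(2024, 1, 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "order_date", "product", "dozens", "unit_price"])
        for order_id in range(1, n + 1):
            writer.writerow([
                order_id,
                start + timedelta(days=rng.randrange(365)),
                rng.choice(PRODUCTS),
                rng.randint(1, 4),
                28.00,  # flat $28/dozen pricing
            ])

generate_orders("shopify_orders.csv")
```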


In a real-world setup, this data would come from the Shopify Admin API, with an automation tool like Zapier handling updates to recipe and ingredient spreadsheets.


Inspecting, Validating, and Modeling the Data

Before transforming anything, I ran a structural and content-level inspection across all source files using a Python script. This step checks things like missing values, inconsistent formats, and broken relationships.
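The audit logic can be sketched with pandas roughly as follows. The file and column names here are assumptions based on the project description, not the exact script.

```python
import pandas as pd

# Sketch of the audit step: flag missing values and broken relationships.
# Column names are illustrative stand-ins for the real source files.
def audit(orders: pd.DataFrame, products: pd.DataFrame) -> list[str]:
    issues = []
    # Content check: missing values in any orders column.
    for col in orders.columns:
        n_missing = int(orders[col].isna().sum())
        if n_missing:
            issues.append(f"orders.{col}: {n_missing} missing values")
    # Relationship check: orders must reference known products.
    unknown = set(orders["product_id"]) - set(products["product_id"])
    if unknown:
        issues.append(f"orders reference unknown product_ids: {sorted(unknown)}")
    return issues

orders = pd.DataFrame({"order_id": [1, 2], "product_id": [10, 99], "dozens": [2, None]})
products = pd.DataFrame({"product_id": [10, 11], "name": ["Choc Chip", "Sugar"]})
print(audit(orders, products))
```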


Based on the findings, I designed the analytical model, complete with an ER diagram and star schema optimized for reporting. These artifacts are documented and included with the project.


The ETL Pipeline

With clean and validated data, I built an end-to-end ETL pipeline in Python. The pipeline handles:

  • Source data validation

  • Transformations into fact and dimension tables

  • Dimensional modeling for BI use cases


The transformed data is loaded into both:

  • A staging folder (for backup and debugging)

  • A MySQL data warehouse (cookie_bakery_dw) for structured storage


This dual approach gives small teams a lightweight file-based backup alongside a queryable warehouse.
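The transform-and-dual-load step can be sketched like this. Table and column names are illustrative, and sqlite3 (in memory) stands in here for the MySQL warehouse so the example is self-contained.

```python
import sqlite3
import pandas as pd

# Illustrative input: cleaned order data (columns are my own stand-ins).
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product": ["Chocolate Chip", "Nutella Swirl", "Chocolate Chip"],
    "dozens": [2, 1, 3],
    "unit_price": [28.0, 28.0, 28.0],
})

# Dimension table: one row per distinct product, with a surrogate key.
dim_product = orders[["product"]].drop_duplicates().reset_index(drop=True)
dim_product["product_key"] = dim_product.index + 1

# Fact table: order grain, keyed to the product dimension, with revenue.
fact_orders = orders.merge(dim_product, on="product")
fact_orders["revenue"] = fact_orders["dozens"] * fact_orders["unit_price"]
fact_orders = fact_orders[["order_id", "product_key", "dozens", "revenue"]]

# Load 1: staging CSVs for backup and debugging.
dim_product.to_csv("staging_dim_product.csv", index=False)
fact_orders.to_csv("staging_fact_orders.csv", index=False)

# Load 2: warehouse tables (sqlite3 in memory, standing in for MySQL).
conn = sqlite3.connect(":memory:")
dim_product.to_sql("dim_product", conn, index=False)
fact_orders.to_sql("fact_orders", conn, index=False)
```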


Warehouse Validation

After loading the data into MySQL, I ran a series of SQL validation checks to ensure table integrity, referential accuracy, and business logic consistency. This step is critical for trust—especially when dashboards will inform pricing and purchasing decisions.
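The checks are plain SQL; two representative ones are sketched below. The real project runs equivalent queries against MySQL (cookie_bakery_dw); sqlite3 in memory stands in here, and the table names are illustrative.

```python
import sqlite3

# Stand-in warehouse with a tiny, known-good dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_orders (order_id INTEGER, product_key INTEGER, revenue REAL);
    INSERT INTO dim_product VALUES (1, 'Chocolate Chip'), (2, 'Nutella Swirl');
    INSERT INTO fact_orders VALUES (1, 1, 56.0), (2, 2, 28.0);
""")

# Referential accuracy: every fact row must point at a real dimension row.
orphans = conn.execute("""
    SELECT COUNT(*) FROM fact_orders f
    LEFT JOIN dim_product d ON f.product_key = d.product_key
    WHERE d.product_key IS NULL
""").fetchone()[0]

# Business logic: revenue should never be negative.
negative = conn.execute(
    "SELECT COUNT(*) FROM fact_orders WHERE revenue < 0"
).fetchone()[0]

print("orphan fact rows:", orphans, "| negative revenue rows:", negative)
```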


Analytics & Visualization

Due to limited database connection options in my version of Tableau, I used Google Sheets as an intermediary layer. An automation script pushes curated data from the warehouse into Sheets using the Google Sheets and Google Drive APIs.


From there, I built Tableau dashboards that highlight:

  • Product sales trends

  • Revenue by cookie type and add-ons

  • High-level performance patterns


These visualizations are intentionally sparse because the source data was artificially generated; the goal is to demonstrate ways to begin thinking about the data the pipeline brings together.

What's Next?

This pipeline is intentionally simple, but it leaves plenty of room to grow. Future enhancements could include:

  • Live Shopify API integration

  • Incremental (change-only) ETL

  • Event-driven automation triggered by new orders

  • Instagram marketing data integration

  • Forecasting and demand planning

  • Unit tests and CI/CD for the ETL pipeline


Final Thoughts

This project shows that data engineering isn’t just for large companies with massive budgets. With the right structure and a clear business goal, even a small cookie bakery can use data to make smarter, more confident decisions—and maybe bake a few extra dozen cookies along the way.



