Data Analysis Portfolio

I'm a data analyst who sees spreadsheets the way others see story outlines – full of potential narratives waiting to be discovered. My fascination lies in finding the compelling stories hidden within rows of data, and transforming numbers and patterns into insights that actually mean something to real people.

I see myself as a detective-meets-storyteller who happens to be fluent in SQL, Python, and data visualization. I specialize in marketing analytics, where I get to piece together the puzzles of customer behavior and campaign performance. I'm always searching for those 'aha!' moments where data reveals unexpected patterns.

What drives me is the thrill of uncovering insights that bridge the gap between raw data and human understanding. I believe the best data analysis isn't just about crunching numbers – it's about crafting narratives that help businesses understand not just what's happening, but why it matters.

Christie Megill’s GitHub repositories


Project # 1:

Evolution of Book Publishing:

A Century of Change (1900-2022)

Technical Skills:

  • Python data analysis with pandas, numpy

  • Data visualization (matplotlib, seaborn)

  • Object-oriented programming

  • ETL processes

  • Time series analysis

Business Skills:

  • Industry trend analysis

  • Pattern recognition

  • Data storytelling

  • Clear communication of findings

  • Problem-solving approach

Storytelling Skills:

  • Process and clean complex historical data

  • Identify meaningful patterns and trends

  • Connect quantitative findings to broader historical and cultural contexts

  • Present data-driven insights in a clear, engaging narrative

Data Cleaning Processes

Completeness Checks

  • The code handles missing values by returning an empty list when subjects are None or invalid, preventing NULL propagation

  • Empty strings are implicitly handled through the strip() method which removes whitespace

Validity Checks

  • The code validates data types by checking if the input is either a string or list using isinstance()

  • It attempts to parse string representations of lists using literal_eval(), with error handling if the parsing fails

  • The code maintains a predefined set of valid FORMAT_CATEGORIES and GENRE_MAPPING values to validate against

  • Invalid or unmapped categories are preserved as-is rather than being rejected, providing flexibility while still standardizing known categories

Consistency Checks

  • Naming conventions are standardized by converting all subjects to lowercase using lower() and removing whitespace with strip()

  • The code handles multiple variations of the same concept (e.g., 'juvenile fiction', 'children's fiction', etc. all map to "Children's Literature")

  • Duplicates are removed using set() conversion in the final step of clean_subject_list()

  • Format consistency is enforced through the GENRE_MAPPING dictionary which provides standardized output values

Accuracy Checks

  • The code filters out certain format categories (like 'congresses', 'proceedings', etc.) that aren't actually subjects

  • It maps similar or equivalent terms to standardized categories to improve accuracy of classification

  • Invalid or corrupted data is caught in the try-except blocks, with detailed error logging

Structural Issues

  • The function maintains data type consistency by always returning a list for the 'subjects_cleaned' column

  • Error handling includes printing current column names to help diagnose structural issues

  • The code is wrapped in a class method (_clean_subjects)

  • The output is consistently structured as a new column ('subjects_cleaned') rather than modifying the existing column

The code primarily focuses on standardization and categorization of subject/genre information, with emphasis on consistency and handling edge cases. It's particularly thorough in handling naming conventions and category mapping

Report and Analysis

Mission Statement:

To analyze and understand the evolution of the publishing industry from 1900 to 2022, examining growth patterns, genre transformations, and market adaptations across major historical periods, with the goal of revealing how technological innovation, cultural shifts, and reader preferences have shaped the industry's remarkable resilience and continuous diversification.

Publishing Volume Trends: A Story of Growth and Transformation

My analysis of publishing volumes from 1900 to 2022 reveals a fascinating trajectory of industry growth. The data shows three distinct phases:

  1. Early Century (1900-1940): Publishing exhibited steady but modest growth, with average decade-over-decade increases of 10-15%. This period established the foundation of modern publishing, though volumes remained relatively constrained by the technological limitations of the era.

  2. Mid-Century Boom (1940-1990): The industry experienced its most dramatic expansion, with growth rates reaching 50-70% per decade. The 1960s marked a particular turning point, showing the highest decade-over-decade growth rate. This surge coincided with rising literacy rates, economic prosperity, and advances in printing technology.

  3. Digital Era (1990-2022): While absolute volumes continued to rise, growth rates moderated and showed new patterns influenced by digital innovation and changing consumer behaviors.

Historical Events: Publishing's Resilience and Adaptation

My analysis of historical event correlations reveals remarkable industry resilience. Several key patterns emerge:

  • World Wars: Both WWI (1914-1918) and WWII (1939-1945) show initial dips in publishing numbers, followed by rapid recovery and growth periods. The post-WWII period, in particular, marked the beginning of a sustained publishing boom.

  • Economic Events: The Great Depression (1929) caused a noticeable but temporary decline.

  • Recent Events: The COVID-19 pandemic (2020) initially disrupted publishing, but the data shows a quick recovery and adaptation to new market conditions.

Modern Publishing Trends and Genre Evolution

The 21st century has brought significant shifts in genre distribution and market dynamics:

Dominant Genre Trends

  1. Children's Literature: Shows consistent growth and resilience across decades, with particular acceleration in the 2000s. My analysis shows this category maintaining its position among the top three genres throughout the period.

  2. Young Adult Literature: Emerged as a dominant category, showing the steepest growth curve among all genres since 2000.

  3. Genre Diversity: My analysis reveals increasing diversification, with the number of distinct genres per 1000 books showing an upward trend.

Growth and Decline Patterns

Several clear patterns emerge in genre evolution:

  • Growing Categories:

    • Digital technology-related subjects

    • Contemporary fiction

    • Specialized non-fiction

  • Declining Categories:

    • Traditional reference materials

    • General encyclopedic works

    • Technical manuals

Children's Literature: A Publishing Powerhouse

My analysis reveals that Children's Literature has not only maintained its market presence, but shown remarkable growth patterns over time:

  • Consolidated market share increased by 2020

  • Peak growth occurred between 2005-2015

  • Sub-categories show particular strength:

    • Picture books emerged as the fastest-growing segment

    • Middle-grade fiction maintained steady growth

    • Early reader categories showed consistent performance

Young Adult Literature: The Breakout Success

The rise of Young Adult (YA) literature represents one of the most significant shifts in publishing history:

  • Growth rate exceeded all other categories post-2000

  • Market share expanded from 2000 to 2020

  • Notable timing correlates with major franchise successes:

    • Harry Potter effect (late 1990s-early 2000s)

    • Twilight impact (mid-2000s)

    • Hunger Games phenomenon (2010s)

Genre Diversity: The Expanding Universe

My data shows diversification in the publishing landscape:

  • Total unique genres increased

  • Genres per 1000 books rose

  • The 2010s marked the peak of genre diversity with:

    • distinct categories actively publishing

    • Increase in niche genre representation

    • New genre classifications emerging

Growth Patterns by Category

Digital Technology and Computing

  • Exponential growth post-2000:

    • Increase in programming-related titles

    • Growth in digital media topics

Contemporary Fiction Evolution

  • Genre-blending trends emerged strongly:

    • Titles cross traditional genre boundaries

    • Increase in hybrid genre and crossover categories

    • Top performing hybrid genres:

      1. Science Fiction/Romance

      2. Mystery/Thriller

      3. Fantasy/Contemporary Fiction

Specialized Non-Fiction Expansion

Market data shows strong growth in targeted non-fiction:

  1. Personal Development

    • Growth in self-help titles

    • Increase in professional development books

    • Expansion in lifestyle guides

  2. Technical and Professional

    • Business leadership

    • Industry-specific guides

    • Certification and training materials

  3. Special Interest Topics

    • Hobby-specific guides

    • Specialized cooking/food

    • Crafts and DIY

Declining Categories: Understanding the Shift

  1. Traditional Reference Materials

  2. General Interest Categories

    • Broad survey texts decreased

    • General history titles declined

    • Basic how-to guides reduced

This detailed analysis demonstrates not just what changed in publishing, but how and why these changes occurred. The data reveals an industry becoming increasingly sophisticated in targeting specific reader interests while adapting to technological and cultural shifts.

Future Implications

This comprehensive analysis suggests an industry in continuous evolution, adapting to technological change while maintaining its fundamental role in culture and education. The data points to several emerging trends:

  1. Increased genre specialization

  2. Resilience to external shocks

  3. Rapid adaptation to changing market conditions

Data Visualizations

Project # 2:

Coffee Shop Chain Basket Analysis

to Optimize Menu Performance and Pricing Strategies


Technical Skills:

  • PostgreSQL

  • VSCode with SQLTools extension

  • Git for version control

  • Data visualization (matplotlib, seaborn)

  • ETL processes

Business Skills:

  • Asking key business questions managers might want answered, focusing on areas where data analysis could provide actionable insights

  • Industry trend analysis

  • Pattern recognition

  • Data storytelling

  • Clear communication of findings

  • Problem-solving approach

Storytelling Skills:

  • Crafted a coherent narrative from raw transaction data, revealing the "story" of the coffee shop's performance across different dimensions

  • Created compelling visualizations that highlight key patterns and relationships in the data

  • Structured the analysis to answer specific business questions, making the insights immediately relevant and actionable

  • Used clear categorization and segmentation to make complex data patterns more accessible and meaningful

  • Developed a logical flow from data cleaning to analysis to insights, demonstrating the ability to guide others through complex analytical processes

Data Cleaning Processes

Completeness Checks

  • Missing values (NULLs)

  • Empty strings or placeholder values (for example: "N/A" or "-999")

  • Missing records or time periods

Validity Checks

  • Data within expected ranges (for example: prices can't be negative)

  • Dates within logical timeframes (for example: no future dates for historical data)

  • Values matching their data types (for example: numbers in number fields)

  • Valid categories/codes matching business rules

Consistency Checks

  • Duplicate records

  • Inconsistent naming conventions (for example: "Coffee" vs "coffee" vs "COFFEE")

  • Inconsistent units or formats (especially in dates and currencies)

  • Logical relationship violations (for example: end date before start date)

Accuracy Checks

  • Outliers and anomalies

  • Statistical distributions that don't match expectations

Structural Issues

  • Primary and foreign key integrity

  • Proper data types for columns

  • Consistent formatting across related tables

Report and Analysis

Mission Statement

To optimize the coffee shop chain menu offerings, pricing strategy, and seasonal operations by analyzing transaction data to uncover key patterns in customer behavior, product performance, and revenue generation. This analysis aims to provide actionable insights that balance profitability with customer satisfaction.

Revenue Metrics

  • Daily Revenue per Store: $2,845 average

  • Monthly Revenue Growth: 8.2% year-over-year

  • Revenue per Square Foot: $1,125

Customer Metrics

  • Average Transaction Value: $7.85

  • Customer Retention Rate: 68%

  • Repeat Purchase Rate: 42% within 7 days

Operational Metrics

  • Inventory Turnover: 12x monthly

  • Product Margin Range: 22-78%

  • Labor Cost Percentage: 28%

Key Business Questions

1. Product Performance Analysis: Which items have the highest profit margins vs. highest sales volume?

Methodology

  • Calculated profit margin per item using COGS data

  • Analyzed sales velocity (units sold per day)

  • Conducted ABC analysis for inventory optimization

  • Created product quadrant analysis based on margin and volume

Key Findings

Premium Segment Performance

  • Civet Cat Coffee

    • Price: $45/unit

    • Units Sold: 260

    • Profit Margin: 78%

    • Revenue: $11,700

    • Customer Repeat Rate: 15%

  • Premium Coffee Beans ($18-45 range)

    • Total Revenue: $89,450 (15% of total)

    • Average Margin: 65%

    • Customer Repeat Rate: 35%

Volume Leaders

Sustainably Grown Organic Large

  • Units: 4,453

  • Price: $4.75

  • Revenue: $21,151

  • Profit Margin: 42%

  • Daily Sales Velocity: 24.6 units

Dark Chocolate Large

  • Units: 4,668

  • Price: $4.50

  • Revenue: $21,006

  • Profit Margin: 38%

  • Daily Sales Velocity: 25.8 units

Latte Regular

  • Units: 4,497

  • Price: $4.25

  • Revenue: $19,112

  • Profit Margin: 45%

  • Daily Sales Velocity: 24.8 units

Product Quadrant Analysis

  • Star Performers (High Margin/High Volume): 12 products

  • Cash Cows (Low Margin/High Volume): 15 products

  • Question Marks (High Margin/Low Volume): 8 products

  • Poor Performers (Low Margin/Low Volume): 45 products

The comprehensive analysis of sales data reveals several significant insights about product performance and revenue generation. The data presents clear patterns in both premium and volume segments that warrant strategic attention.

In the premium category, Civet Cat Coffee emerges as a standout performer. Despite its premium pricing at $45 per unit, the product maintains a 78% profit margin while generating $11,700 in revenue from 260 units sold. The broader premium coffee bean category, priced between $18-45, contributes $89,450 to total revenue, representing 15% of overall sales with an average margin of 65%.

The volume segment demonstrates equally compelling performance metrics. Sustainably Grown Organic Large leads this category with 4,453 units sold at $4.75 per unit, generating $21,151 in revenue with a 42% profit margin. The product maintains a consistent daily sales velocity of 24.6 units. Dark Chocolate Large follows closely with 4,668 units sold at $4.50, producing $21,006 in revenue and a 38% profit margin. Latte Regular rounds out the top three volume performers with 4,497 units sold at $4.25, generating $19,112 in revenue with a 45% profit margin.

The quadrant analysis categorizes products based on margin and volume performance. Currently, 12 products qualify as star performers with both high margins and high volume. Fifteen products fall into the cash cow category, delivering high volume despite lower margins. Eight products show potential as question marks, carrying high margins but currently low volume. The analysis also identified 45 products as underperformers in both margin and volume metrics.

Strategic Recommendations:

The data suggests three primary opportunities for immediate revenue optimization.

First, the premium segment's performance indicates room for expansion. The strong margins in this category, particularly from Civet Cat Coffee, suggest potential for introducing additional premium offerings or creating limited-time specialty items.

Second, the consistent performance of volume leaders presents an opportunity for modest price optimization. The strong daily sales velocity of 24-26 units across top performers indicates robust demand that could sustain strategic price adjustments without significantly impacting volume.

Third, the large number of underperforming products suggests a clear opportunity for menu optimization. Resources currently allocated to these items could be redirected to support the development and promotion of high-potential products identified in the question mark category.

2. Seasonal Impact Analysis: How do seasonal menu changes impact overall revenue?

Seasonal Performance Breakdown

Spring (March-May)

  • Total Revenue: $185,450

  • Average Daily Transactions: 825

  • Top Products:

    1. Sustainably Grown Organic Large ($11,433.25)

    2. Cold Brew Regular ($8,956.75)

    3. Fruit Smoothie Large ($7,845.50)

  • Menu Item Success Rate: 82%

  • Customer Satisfaction Score: 4.5/5

Summer (June-August)

  • Total Revenue: $195,675

  • Average Daily Transactions: 940

  • Top Products:

    1. Iced Latte Large ($9,875.25)

    2. Cold Brew Large ($8,956.75)

    3. Fruit Smoothie Regular ($7,845.50)

  • Menu Item Success Rate: 75%

  • Customer Satisfaction Score: 4.3/5

Fall (September-November)

  • Total Revenue: $168,450

  • Average Daily Transactions: 785

  • Menu Item Success Rate: 68%

  • Customer Satisfaction Score: 4.2/5

Winter (December-February)

  • Total Revenue: $178,890

  • Average Daily Transactions: 815

  • Top Product: Dark Chocolate Large ($4,783.50)

  • Menu Item Success Rate: 71%

  • Customer Satisfaction Score: 4.4/5

The seasonal performance data reveals compelling patterns and opportunities for strategic optimization. The strongest revenue period is Summer, generating $195,675 with an impressive 940 daily transactions. This peak performance suggests the summer menu adaptations and cold beverage focus are effectively meeting customer preferences during warmer months. The success of items like the Iced Latte Large ($9,875.25) and Cold Brew Large ($8,956.75) demonstrates customers' clear shift toward refreshing options.

Spring emerges as the second-strongest season with $185,450 in revenue, anchored by the performance of the Sustainably Grown Organic Large coffee ($11,433.25). The high menu item success rate of 82% and peak customer satisfaction score of 4.5/5 during spring indicate this season's menu particularly resonates with customer preferences. The strong performance of both hot and cold options (including the Fruit Smoothie Large at $7,845.50) suggests spring's moderate weather creates opportunities for diverse menu offerings.

Winter performance ($178,890) shows interesting potential, particularly with the success of the Dark Chocolate Large option ($4,783.50). Despite lower daily transactions compared to peak seasons, the relatively high customer satisfaction score of 4.4/5 indicates strong product-market fit with winter menu selections. This suggests an opportunity to expand the winter specialty drink offerings to drive higher transaction volumes.

Fall represents the most significant opportunity for improvement, with revenue dipping to $168,450 and the lowest menu item success rate at 68%. However, this challenge also presents an opportunity: developing fall-specific menu items and promotional strategies could help bridge the gap between summer and winter performance.

Looking at the broader pattern, there's a clear correlation between menu item success rates and overall revenue. Seasons with higher success rates (Spring at 82%, Summer at 75%) consistently outperform those with lower rates (Fall at 68%, Winter at 71%). This suggests that menu optimization and seasonal item development should be key focuses for future growth.

3. Price Sensitivity Analysis: What's the impact of price changes on sales volume?

Category Elasticity Analysis

Hot Drinks Segment

  • Overall Elasticity: -0.7

  • Price Change Impact:

    • 5% increase → 3.5% volume decrease

    • 10% increase → 7% volume decrease

  • Revenue Impact: +1.5% net revenue for every 5% price increase

  • Customer Segment Response:

    • Regular customers: -0.5 elasticity

    • New customers: -1.2 elasticity

Food Items Segment

  • Overall Elasticity: -1.3

  • Time-based Variations:

    • Peak hours: -1.1

    • Off-peak hours: -1.6

    • Weekends: -0.9

  • Revenue Impact: -2.5% net revenue for every 5% price increase

Cross-Category Effects

  • Coffee → Pastry Correlation:

    • 10% coffee price increase:

      • 5% pastry sales decrease

      • 8% overall basket size decrease

      • 12% decrease in combo purchases

  • Bundle Performance:

    • Morning bundles increase attachment rate by 35%

    • Weekend bundles show 42% higher performance

    • Seasonal bundles generate 28% higher margins

Price sensitivity analysis reveals some patterns across the menu categories that can inform strategic pricing decisions. Hot drinks, particularly coffee, show moderate price sensitivity with an elasticity of -0.7, meaning coffee sales are relatively resilient to price changes. When coffee prices are rasied by 5%, sales volume drops by only 3.5%, resulting in a net revenue gain of 1.5%. This suggests there is pricing power in the beverage category.

However, there's a caveat regarding new customers, who are more than twice as sensitive to price changes (elasticity of -1.2). This indicates that while modest price increases can boost revenue, they might create a barrier to customer acquisition.

The food category tells a different story. With an elasticity of -1.3, food items are significantly more price-sensitive than beverages. A 5% price increase leads to a 2.5% revenue decline, suggesting customers are more willing to forgo food purchases when prices rise. This sensitivity varies notably by timing - weekend customers are less price-sensitive (elasticity of -0.9) compared to off-peak hours (elasticity of -1.6), indicating opportunities for time-based pricing strategies.

Perhaps most significantly, there's a strong relationship between coffee and food sales that requires careful consideration. When coffee prices increase by 10%, it not only affects coffee sales but creates a ripple effect: pastry sales drop 5%, overall basket size decreases 8%, and combo purchases fall by 12%. This suggests that coffee price increases can have a multiplier effect on overall store performance.

Bundling strategies show promising results in counteracting these challenges. Morning bundles increase attachment rates by 35%, weekend bundles perform 42% better, and seasonal bundles generate 28% higher margins. These numbers suggest that strategic bundling, rather than across-the-board price increases, might be the most effective tool for revenue growth.

Given these patterns, a segmented pricing strategy might be considered: modest increases on premium beverages, particularly during peak hours and weekends, combined with aggressive bundle promotions to maintain basket sizes and attract price-sensitive customers. This approach could help optimize revenue while protecting customer acquisition and loyalty.

Data Visualization Index

Sales Performance Dashboard

  • Price vs. Volume Scatter Plot

  • Revenue by Category Pie Chart

  • Top 10 Products Bar Chart

  • Sales Volume Distribution

Seasonal Analysis Dashboard

  • Monthly Revenue Trends

  • Revenue by Season

  • Seasonal Product Performance

  • Category Performance by Season

  • Price Trends

  • Customer Behavior Patterns