Data Analysis Portfolio
I'm a data analyst who sees spreadsheets the way others see story outlines – full of potential narratives waiting to be discovered. My fascination lies in finding the compelling stories hidden within rows of data, and transforming numbers and patterns into insights that actually mean something to real people.
I see myself as a detective-meets-storyteller who happens to be fluent in SQL, Python, and data visualization. I specialize in marketing analytics, where I get to piece together the puzzles of customer behavior and campaign performance. I'm always searching for those 'aha!' moments where data reveals unexpected patterns.
What drives me is the thrill of uncovering insights that bridge the gap between raw data and human understanding. I believe the best data analysis isn't just about crunching numbers – it's about crafting narratives that help businesses understand not just what's happening, but why it matters.
Project # 1:
Evolution of Book Publishing:
A Century of Change (1900-2022)
Technical Skills:
Python data analysis with pandas, numpy
Data visualization (matplotlib, seaborn)
Object-oriented programming
ETL processes
Time series analysis
Business Skills:
Industry trend analysis
Pattern recognition
Data storytelling
Clear communication of findings
Problem-solving approach
Storytelling Skills:
Process and clean complex historical data
Identify meaningful patterns and trends
Connect quantitative findings to broader historical and cultural contexts
Present data-driven insights in a clear, engaging narrative
Data Cleaning Processes
Completeness Checks
The code handles missing values by returning an empty list when subjects are None or invalid, preventing NULL propagation
Empty strings are implicitly handled through the strip() method which removes whitespace
Validity Checks
The code validates data types by checking if the input is either a string or list using isinstance()
It attempts to parse string representations of lists using literal_eval(), with error handling if the parsing fails
The code maintains a predefined set of valid FORMAT_CATEGORIES and GENRE_MAPPING values to validate against
Invalid or unmapped categories are preserved as-is rather than being rejected, providing flexibility while still standardizing known categories
Consistency Checks
Naming conventions are standardized by converting all subjects to lowercase using lower() and removing whitespace with strip()
The code handles multiple variations of the same concept (e.g., 'juvenile fiction', 'children's fiction', etc. all map to "Children's Literature")
Duplicates are removed using set() conversion in the final step of clean_subject_list()
Format consistency is enforced through the GENRE_MAPPING dictionary which provides standardized output values
Accuracy Checks
The code filters out certain format categories (like 'congresses', 'proceedings', etc.) that aren't actually subjects
It maps similar or equivalent terms to standardized categories to improve accuracy of classification
Invalid or corrupted data is caught in the try-except blocks, with detailed error logging
Structural Issues
The function maintains data type consistency by always returning a list for the 'subjects_cleaned' column
Error handling includes printing current column names to help diagnose structural issues
The code is wrapped in a class method (_clean_subjects)
The output is consistently structured as a new column ('subjects_cleaned') rather than modifying the existing column
The code primarily focuses on standardization and categorization of subject/genre information, with emphasis on consistency and handling edge cases. It's particularly thorough in handling naming conventions and category mapping
Report and Analysis
Mission Statement:
To analyze and understand the evolution of the publishing industry from 1900 to 2022, examining growth patterns, genre transformations, and market adaptations across major historical periods, with the goal of revealing how technological innovation, cultural shifts, and reader preferences have shaped the industry's remarkable resilience and continuous diversification.
Publishing Volume Trends: A Story of Growth and Transformation
My analysis of publishing volumes from 1900 to 2022 reveals a fascinating trajectory of industry growth. The data shows three distinct phases:
Early Century (1900-1940): Publishing exhibited steady but modest growth, with average decade-over-decade increases of 10-15%. This period established the foundation of modern publishing, though volumes remained relatively constrained by the technological limitations of the era.
Mid-Century Boom (1940-1990): The industry experienced its most dramatic expansion, with growth rates reaching 50-70% per decade. The 1960s marked a particular turning point, showing the highest decade-over-decade growth rate. This surge coincided with rising literacy rates, economic prosperity, and advances in printing technology.
Digital Era (1990-2022): While absolute volumes continued to rise, growth rates moderated and showed new patterns influenced by digital innovation and changing consumer behaviors.
Historical Events: Publishing's Resilience and Adaptation
My analysis of historical event correlations reveals remarkable industry resilience. Several key patterns emerge:
World Wars: Both WWI (1914-1918) and WWII (1939-1945) show initial dips in publishing numbers, followed by rapid recovery and growth periods. The post-WWII period, in particular, marked the beginning of a sustained publishing boom.
Economic Events: The Great Depression (1929) caused a noticeable but temporary decline.
Recent Events: The COVID-19 pandemic (2020) initially disrupted publishing, but the data shows a quick recovery and adaptation to new market conditions.
Modern Publishing Trends and Genre Evolution
The 21st century has brought significant shifts in genre distribution and market dynamics:
Dominant Genre Trends
Children's Literature: Shows consistent growth and resilience across decades, with particular acceleration in the 2000s. My analysis shows this category maintaining its position among the top three genres throughout the period.
Young Adult Literature: Emerged as a dominant category, showing the steepest growth curve among all genres since 2000.
Genre Diversity: My analysis reveals increasing diversification, with the number of distinct genres per 1000 books showing an upward trend.
Growth and Decline Patterns
Several clear patterns emerge in genre evolution:
Growing Categories:
Digital technology-related subjects
Contemporary fiction
Specialized non-fiction
Declining Categories:
Traditional reference materials
General encyclopedic works
Technical manuals
Children's Literature: A Publishing Powerhouse
My analysis reveals that Children's Literature has not only maintained its market presence, but shown remarkable growth patterns over time:
Consolidated market share increased by 2020
Peak growth occurred between 2005-2015
Sub-categories show particular strength:
Picture books emerged as the fastest-growing segment
Middle-grade fiction maintained steady growth
Early reader categories showed consistent performance
Young Adult Literature: The Breakout Success
The rise of Young Adult (YA) literature represents one of the most significant shifts in publishing history:
Growth rate exceeded all other categories post-2000
Market share expanded from 2000 to 2020
Notable timing correlates with major franchise successes:
Harry Potter effect (late 1990s-early 2000s)
Twilight impact (mid-2000s)
Hunger Games phenomenon (2010s)
Genre Diversity: The Expanding Universe
My data shows diversification in the publishing landscape:
Total unique genres increased
Genres per 1000 books rose
The 2010s marked the peak of genre diversity with:
distinct categories actively publishing
Increase in niche genre representation
New genre classifications emerging
Growth Patterns by Category
Digital Technology and Computing
Exponential growth post-2000:
Increase in programming-related titles
Growth in digital media topics
Contemporary Fiction Evolution
Genre-blending trends emerged strongly:
Titles cross traditional genre boundaries
Increase in hybrid genre and crossover categories
Top performing hybrid genres:
Science Fiction/Romance
Mystery/Thriller
Fantasy/Contemporary Fiction
Specialized Non-Fiction Expansion
Market data shows strong growth in targeted non-fiction:
Personal Development
Growth in self-help titles
Increase in professional development books
Expansion in lifestyle guides
Technical and Professional
Business leadership
Industry-specific guides
Certification and training materials
Special Interest Topics
Hobby-specific guides
Specialized cooking/food
Crafts and DIY
Declining Categories: Understanding the Shift
Traditional Reference Materials
General Interest Categories
Broad survey texts decreased
General history titles declined
Basic how-to guides reduced
This detailed analysis demonstrates not just what changed in publishing, but how and why these changes occurred. The data reveals an industry becoming increasingly sophisticated in targeting specific reader interests while adapting to technological and cultural shifts.
Future Implications
This comprehensive analysis suggests an industry in continuous evolution, adapting to technological change while maintaining its fundamental role in culture and education. The data points to several emerging trends:
Increased genre specialization
Resilience to external shocks
Rapid adaptation to changing market conditions
Data Visualizations
Project # 2:
Coffee Shop Chain Basket Analysis
to Optimize Menu Performance and Pricing Strategies
Technical Skills:
PostgreSQL
VSCode with SQLTools extension
Git for version control
Data visualization (matplotlib, seaborn)
ETL processes
Business Skills:
Asking key business questions managers might want answered, focusing on areas where data analysis could provide actionable insights
Industry trend analysis
Pattern recognition
Data storytelling
Clear communication of findings
Problem-solving approach
Storytelling Skills:
Crafted a coherent narrative from raw transaction data, revealing the "story" of the coffee shop's performance across different dimensions
Created compelling visualizations that highlight key patterns and relationships in the data
Structured the analysis to answer specific business questions, making the insights immediately relevant and actionable
Used clear categorization and segmentation to make complex data patterns more accessible and meaningful
Developed a logical flow from data cleaning to analysis to insights, demonstrating the ability to guide others through complex analytical processes
Data Cleaning Processes
Completeness Checks
Missing values (NULLs)
Empty strings or placeholder values (for example: "N/A" or "-999")
Missing records or time periods
Validity Checks
Data within expected ranges (for example: prices can't be negative)
Dates within logical timeframes (for example: no future dates for historical data)
Values matching their data types (for example: numbers in number fields)
Valid categories/codes matching business rules
Consistency Checks
Duplicate records
Inconsistent naming conventions (for example: "Coffee" vs "coffee" vs "COFFEE")
Inconsistent units or formats (especially in dates and currencies)
Logical relationship violations (for example: end date before start date)
Accuracy Checks
Outliers and anomalies
Statistical distributions that don't match expectations
Structural Issues
Primary and foreign key integrity
Proper data types for columns
Consistent formatting across related tables
Report and Analysis
Mission Statement
To optimize the coffee shop chain menu offerings, pricing strategy, and seasonal operations by analyzing transaction data to uncover key patterns in customer behavior, product performance, and revenue generation. This analysis aims to provide actionable insights that balance profitability with customer satisfaction.
Revenue Metrics
Daily Revenue per Store: $2,845 average
Monthly Revenue Growth: 8.2% year-over-year
Revenue per Square Foot: $1,125
Customer Metrics
Average Transaction Value: $7.85
Customer Retention Rate: 68%
Repeat Purchase Rate: 42% within 7 days
Operational Metrics
Inventory Turnover: 12x monthly
Product Margin Range: 22-78%
Labor Cost Percentage: 28%
Key Business Questions
1. Product Performance Analysis: Which items have the highest profit margins vs. highest sales volume?
Methodology
Calculated profit margin per item using COGS data
Analyzed sales velocity (units sold per day)
Conducted ABC analysis for inventory optimization
Created product quadrant analysis based on margin and volume
Key Findings
Premium Segment Performance
Civet Cat Coffee
Price: $45/unit
Units Sold: 260
Profit Margin: 78%
Revenue: $11,700
Customer Repeat Rate: 15%
Premium Coffee Beans ($18-45 range)
Total Revenue: $89,450 (15% of total)
Average Margin: 65%
Customer Repeat Rate: 35%
Volume Leaders
Sustainably Grown Organic Large
Units: 4,453
Price: $4.75
Revenue: $21,151
Profit Margin: 42%
Daily Sales Velocity: 24.6 units
Dark Chocolate Large
Units: 4,668
Price: $4.50
Revenue: $21,006
Profit Margin: 38%
Daily Sales Velocity: 25.8 units
Latte Regular
Units: 4,497
Price: $4.25
Revenue: $19,112
Profit Margin: 45%
Daily Sales Velocity: 24.8 units
Product Quadrant Analysis
Star Performers (High Margin/High Volume): 12 products
Cash Cows (Low Margin/High Volume): 15 products
Question Marks (High Margin/Low Volume): 8 products
Poor Performers (Low Margin/Low Volume): 45 products
The comprehensive analysis of sales data reveals several significant insights about product performance and revenue generation. The data presents clear patterns in both premium and volume segments that warrant strategic attention.
In the premium category, Civet Cat Coffee emerges as a standout performer. Despite its premium pricing at $45 per unit, the product maintains a 78% profit margin while generating $11,700 in revenue from 260 units sold. The broader premium coffee bean category, priced between $18-45, contributes $89,450 to total revenue, representing 15% of overall sales with an average margin of 65%.
The volume segment demonstrates equally compelling performance metrics. Sustainably Grown Organic Large leads this category with 4,453 units sold at $4.75 per unit, generating $21,151 in revenue with a 42% profit margin. The product maintains a consistent daily sales velocity of 24.6 units. Dark Chocolate Large follows closely with 4,668 units sold at $4.50, producing $21,006 in revenue and a 38% profit margin. Latte Regular rounds out the top three volume performers with 4,497 units sold at $4.25, generating $19,112 in revenue with a 45% profit margin.
The quadrant analysis categorizes products based on margin and volume performance. Currently, 12 products qualify as star performers with both high margins and high volume. Fifteen products fall into the cash cow category, delivering high volume despite lower margins. Eight products show potential as question marks, carrying high margins but currently low volume. The analysis also identified 45 products as underperformers in both margin and volume metrics.
Strategic Recommendations:
The data suggests three primary opportunities for immediate revenue optimization.
First, the premium segment's performance indicates room for expansion. The strong margins in this category, particularly from Civet Cat Coffee, suggest potential for introducing additional premium offerings or creating limited-time specialty items.
Second, the consistent performance of volume leaders presents an opportunity for modest price optimization. The strong daily sales velocity of 24-26 units across top performers indicates robust demand that could sustain strategic price adjustments without significantly impacting volume.
Third, the large number of underperforming products suggests a clear opportunity for menu optimization. Resources currently allocated to these items could be redirected to support the development and promotion of high-potential products identified in the question mark category.
2. Seasonal Impact Analysis: How do seasonal menu changes impact overall revenue?
Seasonal Performance Breakdown
Spring (March-May)
Total Revenue: $185,450
Average Daily Transactions: 825
Top Products:
Sustainably Grown Organic Large ($11,433.25)
Cold Brew Regular ($8,956.75)
Fruit Smoothie Large ($7,845.50)
Menu Item Success Rate: 82%
Customer Satisfaction Score: 4.5/5
Summer (June-August)
Total Revenue: $195,675
Average Daily Transactions: 940
Top Products:
Iced Latte Large ($9,875.25)
Cold Brew Large ($8,956.75)
Fruit Smoothie Regular ($7,845.50)
Menu Item Success Rate: 75%
Customer Satisfaction Score: 4.3/5
Fall (September-November)
Total Revenue: $168,450
Average Daily Transactions: 785
Menu Item Success Rate: 68%
Customer Satisfaction Score: 4.2/5
Winter (December-February)
Total Revenue: $178,890
Average Daily Transactions: 815
Top Product: Dark Chocolate Large ($4,783.50)
Menu Item Success Rate: 71%
Customer Satisfaction Score: 4.4/5
The seasonal performance data reveals compelling patterns and opportunities for strategic optimization. The strongest revenue period is Summer, generating $195,675 with an impressive 940 daily transactions. This peak performance suggests the summer menu adaptations and cold beverage focus are effectively meeting customer preferences during warmer months. The success of items like the Iced Latte Large ($9,875.25) and Cold Brew Large ($8,956.75) demonstrates customers' clear shift toward refreshing options.
Spring emerges as the second-strongest season with $185,450 in revenue, anchored by the performance of the Sustainably Grown Organic Large coffee ($11,433.25). The high menu item success rate of 82% and peak customer satisfaction score of 4.5/5 during spring indicate this season's menu particularly resonates with customer preferences. The strong performance of both hot and cold options (including the Fruit Smoothie Large at $7,845.50) suggests spring's moderate weather creates opportunities for diverse menu offerings.
Winter performance ($178,890) shows interesting potential, particularly with the success of the Dark Chocolate Large option ($4,783.50). Despite lower daily transactions compared to peak seasons, the relatively high customer satisfaction score of 4.4/5 indicates strong product-market fit with winter menu selections. This suggests an opportunity to expand the winter specialty drink offerings to drive higher transaction volumes.
Fall represents the most significant opportunity for improvement, with revenue dipping to $168,450 and the lowest menu item success rate at 68%. However, this challenge also presents an opportunity: developing fall-specific menu items and promotional strategies could help bridge the gap between summer and winter performance.
Looking at the broader pattern, there's a clear correlation between menu item success rates and overall revenue. Seasons with higher success rates (Spring at 82%, Summer at 75%) consistently outperform those with lower rates (Fall at 68%, Winter at 71%). This suggests that menu optimization and seasonal item development should be key focuses for future growth.
3. Price Sensitivity Analysis: What's the impact of price changes on sales volume?
Category Elasticity Analysis
Hot Drinks Segment
Overall Elasticity: -0.7
Price Change Impact:
5% increase → 3.5% volume decrease
10% increase → 7% volume decrease
Revenue Impact: +1.5% net revenue for every 5% price increase
Customer Segment Response:
Regular customers: -0.5 elasticity
New customers: -1.2 elasticity
Food Items Segment
Overall Elasticity: -1.3
Time-based Variations:
Peak hours: -1.1
Off-peak hours: -1.6
Weekends: -0.9
Revenue Impact: -2.5% net revenue for every 5% price increase
Cross-Category Effects
Coffee → Pastry Correlation:
10% coffee price increase:
5% pastry sales decrease
8% overall basket size decrease
12% decrease in combo purchases
Bundle Performance:
Morning bundles increase attachment rate by 35%
Weekend bundles show 42% higher performance
Seasonal bundles generate 28% higher margins
Price sensitivity analysis reveals some patterns across the menu categories that can inform strategic pricing decisions. Hot drinks, particularly coffee, show moderate price sensitivity with an elasticity of -0.7, meaning coffee sales are relatively resilient to price changes. When coffee prices are rasied by 5%, sales volume drops by only 3.5%, resulting in a net revenue gain of 1.5%. This suggests there is pricing power in the beverage category.
However, there's a caveat regarding new customers, who are more than twice as sensitive to price changes (elasticity of -1.2). This indicates that while modest price increases can boost revenue, they might create a barrier to customer acquisition.
The food category tells a different story. With an elasticity of -1.3, food items are significantly more price-sensitive than beverages. A 5% price increase leads to a 2.5% revenue decline, suggesting customers are more willing to forgo food purchases when prices rise. This sensitivity varies notably by timing - weekend customers are less price-sensitive (elasticity of -0.9) compared to off-peak hours (elasticity of -1.6), indicating opportunities for time-based pricing strategies.
Perhaps most significantly, there's a strong relationship between coffee and food sales that requires careful consideration. When coffee prices increase by 10%, it not only affects coffee sales but creates a ripple effect: pastry sales drop 5%, overall basket size decreases 8%, and combo purchases fall by 12%. This suggests that coffee price increases can have a multiplier effect on overall store performance.
Bundling strategies show promising results in counteracting these challenges. Morning bundles increase attachment rates by 35%, weekend bundles perform 42% better, and seasonal bundles generate 28% higher margins. These numbers suggest that strategic bundling, rather than across-the-board price increases, might be the most effective tool for revenue growth.
Given these patterns, a segmented pricing strategy might be considered: modest increases on premium beverages, particularly during peak hours and weekends, combined with aggressive bundle promotions to maintain basket sizes and attract price-sensitive customers. This approach could help optimize revenue while protecting customer acquisition and loyalty.
Data Visualization Index
Sales Performance Dashboard
Price vs. Volume Scatter Plot
Revenue by Category Pie Chart
Top 10 Products Bar Chart
Sales Volume Distribution
Seasonal Analysis Dashboard
Monthly Revenue Trends
Revenue by Season
Seasonal Product Performance
Category Performance by Season
Price Trends
Customer Behavior Patterns