
Project 03 — Consumer Analytics

Skincare & Beauty Trends Analytics Dashboard

Combining a deep personal passion for skincare with data engineering and visualization to uncover how beauty ingredient trends emerge, spread through social media, and eventually reach store shelves.

Oct–Nov 2025 · GitHub
SQL · Tableau · Python · Streamlit · Plotly · pandas · BeautifulSoup · PRAW
01

Personal Motivation

As someone who has followed the skincare industry for years (testing products, tracking ingredient innovations, and watching trends emerge on TikTok and Reddit before they hit store shelves), I wanted to apply my analytical skills to the space I know best. I remember noticing ceramides appearing everywhere in late 2022 and thinking, “this is going to be huge,” months before CeraVe and other brands launched their ceramide-focused lines. That instinct felt like something I could quantify.

This project is the intersection of two things I care about deeply: understanding consumer behavior through data and genuinely wanting to know why certain ingredients take off while others stay niche. Every chart here was built with the same curiosity that drives me to read ingredient lists at Sephora and scroll through r/SkincareAddiction at midnight. The difference is that now I have the SQL queries and the trend data to back up the intuition.

02

ETL Pipeline

Data is ingested from three sources on a weekly cadence: the Google Trends API for search interest, Reddit (via the PRAW client) for community discussion and sentiment, and web-scraped product launch data from Sephora and Ulta. A Python pipeline normalizes, deduplicates, and loads everything into a SQLite database for downstream analysis.
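
The normalize, deduplicate, and load steps could look roughly like the sketch below. It uses only the standard library, and the table name `trend_scores`, the record fields, and the `normalize_name` helper are hypothetical stand-ins rather than the project's actual code.

```python
import sqlite3


def normalize_name(raw: str) -> str:
    """Collapse case and whitespace so ' Ceramides ' matches 'ceramides'."""
    return " ".join(raw.lower().split())


def load_weekly_scores(conn: sqlite3.Connection, records: list) -> int:
    """Normalize, deduplicate, and load (ingredient, week, score) records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS trend_scores (
               ingredient TEXT NOT NULL,
               week       TEXT NOT NULL,   -- ISO date of the week start
               score      INTEGER NOT NULL,
               PRIMARY KEY (ingredient, week)
           )"""
    )
    rows = [
        (normalize_name(r["ingredient"]), r["week"], int(r["score"]))
        for r in records
    ]
    # The primary key keeps one row per ingredient and week, so re-running
    # the weekly job replaces rows instead of duplicating them.
    conn.executemany("INSERT OR REPLACE INTO trend_scores VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM trend_scores").fetchone()[0]


# Hypothetical sample batch, for illustration only.
conn = sqlite3.connect(":memory:")
sample = [
    {"ingredient": "Ceramides", "week": "2024-01-01", "score": 72},
    {"ingredient": " ceramides ", "week": "2024-01-01", "score": 72},  # duplicate
    {"ingredient": "SPF", "week": "2024-01-01", "score": 20},
]
row_count = load_weekly_scores(conn, sample)
```

Deduplicating through a composite primary key keeps the load step idempotent, which matters for a job that runs every week over overlapping date ranges.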

03

Database Schema

The normalized schema links ingredients to three data sources through junction tables, enabling flexible queries across Google Trends interest scores, Reddit post sentiment, and product launch timelines.
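
As a rough illustration of the junction-table idea, the sketch below models only the Reddit side of the schema. All table and column names are assumptions made for this example, since the actual schema is not listed here.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the project's real tables.
SCHEMA = """
CREATE TABLE ingredients (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    category TEXT NOT NULL
);
CREATE TABLE reddit_posts (
    id        INTEGER PRIMARY KEY,
    subreddit TEXT NOT NULL,
    posted_on TEXT NOT NULL,
    sentiment REAL NOT NULL
);
-- Junction table: one post can mention many ingredients and vice versa.
CREATE TABLE post_ingredients (
    post_id       INTEGER NOT NULL REFERENCES reddit_posts(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredients(id),
    PRIMARY KEY (post_id, ingredient_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO ingredients VALUES (?, ?, ?)",
    [(1, "Ceramides", "Hydrators"), (2, "Retinol", "Acids")],
)
conn.executemany(
    "INSERT INTO reddit_posts VALUES (?, ?, ?, ?)",
    [(10, "SkincareAddiction", "2024-01-05", 0.6),
     (11, "AsianBeauty", "2024-01-06", -0.2)],
)
conn.executemany(
    "INSERT INTO post_ingredients VALUES (?, ?)",
    [(10, 1), (11, 1), (11, 2)],
)

# Average post sentiment per ingredient, resolved through the junction table.
avg_by_ingredient = conn.execute(
    """
    SELECT i.name, ROUND(AVG(p.sentiment), 2)
    FROM ingredients AS i
    JOIN post_ingredients AS j ON j.ingredient_id = i.id
    JOIN reddit_posts AS p ON p.id = j.post_id
    GROUP BY i.name
    ORDER BY i.name
    """
).fetchall()
```

The same pattern would repeat for the other two sources, with one junction table per source keyed on `ingredient_id`.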

04

25 Ingredients Tracked

Curated from personal knowledge, industry reports, and Reddit frequency analysis. Organized by functional category to enable cross-category trend comparison.

Acids

7
Retinol, Glycolic Acid, Salicylic Acid, Lactic Acid, Azelaic Acid, AHA, BHA

Hydrators

3
Hyaluronic Acid, Ceramides, Squalane

Peptides & Proteins

3
Peptides, Collagen, Snail Mucin

Vitamins

2
Vitamin C, Niacinamide

Botanicals

4
Centella Asiatica, Bakuchiol, Tea Tree Oil, Rosehip Oil

Actives

4
Tranexamic Acid, Alpha Arbutin, Kojic Acid, Benzoyl Peroxide

Sun

1
SPF

Minerals

1
Zinc
05

Top 10 Ingredient Trends (2020–2024)

Google Trends interest scores (0–100) for the ten most-searched skincare ingredients. Note the dramatic ceramide and snail mucin trajectories versus the stability of established ingredients like retinol and vitamin C.

06

Year-over-Year Growth Rates

Year-over-year search growth for the fastest-moving ingredients. Snail mucin leads at +180%, though much of that growth is concentrated in a single viral window. Ceramides and peptides show more sustained, structurally driven growth.
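
For reference, a year-over-year growth figure is the percentage change between two comparable annual values. The sketch below is a minimal version of that arithmetic. The function name and the input numbers are illustrative assumptions, not values from the project's data.

```python
def yoy_growth(previous: float, current: float) -> float:
    """Return year-over-year growth as a percentage of the previous value."""
    if previous <= 0:
        raise ValueError("previous-year value must be positive")
    return (current - previous) / previous * 100


# Illustrative inputs only: going from 100 to 240 is a 140% increase.
growth = yoy_growth(100, 240)
```

Because Google Trends scores are relative (0–100), the inputs here would be average interest scores for each year rather than raw search counts.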

07

Seasonal Patterns

Relative monthly search intensity reveals strong seasonal cycles. SPF predictably peaks in summer, while hydrating and barrier-repair ingredients (hyaluronic acid, ceramides) and retinol peak in the colder months. Vitamin C and niacinamide stay comparatively steady year-round.

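| Ingredient | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SPF | 20 | 25 | 40 | 60 | 85 | 95 | 100 | 92 | 70 | 45 | 25 | 18 |
| Hyaluronic Acid | 85 | 80 | 65 | 50 | 40 | 35 | 30 | 32 | 45 | 60 | 78 | 90 |
| Retinol | 70 | 65 | 55 | 45 | 38 | 30 | 28 | 32 | 55 | 72 | 80 | 78 |
| Vitamin C | 55 | 58 | 60 | 58 | 55 | 52 | 50 | 52 | 55 | 58 | 56 | 54 |
| Ceramides | 72 | 68 | 55 | 42 | 35 | 30 | 28 | 30 | 40 | 55 | 68 | 75 |
| Niacinamide | 60 | 62 | 65 | 68 | 70 | 72 | 70 | 68 | 65 | 62 | 60 | 58 |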
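
One way to summarize the monthly values above is to pull out each ingredient's peak month, trough month, and the spread between them. This is a sketch only: the 15-point cutoff for calling an ingredient "steady" is an arbitrary assumption chosen for illustration.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Monthly relative search intensity, copied from the table above.
SEASONAL = {
    "SPF":             [20, 25, 40, 60, 85, 95, 100, 92, 70, 45, 25, 18],
    "Hyaluronic Acid": [85, 80, 65, 50, 40, 35, 30, 32, 45, 60, 78, 90],
    "Retinol":         [70, 65, 55, 45, 38, 30, 28, 32, 55, 72, 80, 78],
    "Vitamin C":       [55, 58, 60, 58, 55, 52, 50, 52, 55, 58, 56, 54],
    "Ceramides":       [72, 68, 55, 42, 35, 30, 28, 30, 40, 55, 68, 75],
    "Niacinamide":     [60, 62, 65, 68, 70, 72, 70, 68, 65, 62, 60, 58],
}


def seasonal_profile(scores: list) -> dict:
    """Return the peak month, trough month, and peak-minus-trough spread."""
    peak, trough = max(scores), min(scores)
    return {
        "peak_month": MONTHS[scores.index(peak)],
        "trough_month": MONTHS[scores.index(trough)],
        "spread": peak - trough,
    }


profiles = {name: seasonal_profile(scores) for name, scores in SEASONAL.items()}

# Assumed cutoff: a spread of 15 points or less counts as "steady".
steady = [name for name, p in profiles.items() if p["spread"] <= 15]
```

With that cutoff, only vitamin C and niacinamide qualify as steady, while SPF has by far the widest spread.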
08

Product Launch Timing Analysis

One of the most compelling findings: Google Trends demand signals consistently appear 2–3 months before major brand product launches. This suggests that consumer interest is leading brand strategy, not the other way around.

1

CeraVe Ceramide Serum — Q2 2024

Google Trends data for “ceramide serum” began surging in Q4 2023, roughly 6 months before launch and longer than the typical 2–3 month lead. Reddit threads in r/SkincareAddiction discussing ceramide layering techniques spiked 3x in the same window.

2

The Ordinary Multi-Peptide Serum Reformulation — Q3 2024

Peptide searches climbed steadily through 2024, with “copper peptide” and “matrixyl” showing 95% YoY growth. The Ordinary’s reformulation announcement aligned with peak search interest.

3

COSRX Snail Mucin Surge — Q4 2023

Viral TikTok content drove snail mucin searches up 180% in under 8 weeks. COSRX expanded distribution to Target and Ulta in Q1 2024, capitalizing on a demand wave that social listening data had signaled months earlier.
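
A lead time like the ones above can be measured by finding the month where interest first breaks away from its baseline and counting the months to launch. The sketch below shows one possible approach: the 1.5x threshold, the three-month baseline, the interest series, and the launch date are all hypothetical values chosen for illustration.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def surge_onset(series: list, baseline_months: int = 3, factor: float = 1.5):
    """Return the first month whose score exceeds factor x the baseline mean.

    The baseline is the mean of the first `baseline_months` scores.
    Returns None if no later month crosses the threshold.
    """
    baseline = sum(score for _, score in series[:baseline_months]) / baseline_months
    for month, score in series[baseline_months:]:
        if score > factor * baseline:
            return month
    return None


# Hypothetical monthly interest scores and launch month, for illustration only.
series = [
    (date(2023, 7, 1), 30), (date(2023, 8, 1), 32), (date(2023, 9, 1), 31),
    (date(2023, 10, 1), 38), (date(2023, 11, 1), 52), (date(2023, 12, 1), 60),
]
launch = date(2024, 5, 1)

onset = surge_onset(series)
lead_months = months_between(onset, launch)
```

The result is sensitive to the threshold and baseline window, so any reported lead time should state both.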

09

Reddit Sentiment Analysis

Sentiment classification across 10K+ posts from r/SkincareAddiction and r/AsianBeauty using lexicon-based scoring in the style of VADER. Hydrators lead in positive sentiment (72%), while acids show the highest negative sentiment (20%), likely reflecting irritation-related complaints and purging period discussions.

The most discussed categories (Acids, Vitamins) are not necessarily the most positively discussed. Hydrators and Peptides generate fewer total posts but significantly more favorable sentiment, suggesting higher satisfaction rates.
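
To make the per-category percentages concrete, the sketch below classifies posts with a toy word list and computes the share of positive posts per category. It is a deliberately simplified stand-in, not VADER and not the project's classifier, and the word lists and sample posts are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny hypothetical lexicon; a real one would hold thousands of scored terms.
POSITIVE = {"love", "glowing", "smooth", "hydrated"}
NEGATIVE = {"irritation", "purging", "burning", "breakout", "stinging"}


def classify(text: str) -> str:
    """Label a post by counting positive and negative lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def positive_share(posts: list) -> dict:
    """posts: (category, text) pairs. Returns {category: percent positive}."""
    counts = defaultdict(Counter)
    for category, text in posts:
        counts[category][classify(text)] += 1
    return {
        category: round(100 * labels["positive"] / sum(labels.values()))
        for category, labels in counts.items()
    }


# Invented sample posts, for illustration only.
posts = [
    ("Hydrators", "My skin feels so hydrated and smooth!"),
    ("Hydrators", "Love this ceramide cream."),
    ("Acids", "Week two of purging and some irritation."),
    ("Acids", "Glycolic acid left my skin glowing."),
]
shares = positive_share(posts)
```

A word-count scorer like this ignores negation and sarcasm ("no irritation at all" would count as negative), which is why its percentages should be read as rough signals.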

10

Streamlit Dashboard

The interactive Streamlit app provides three analysis views, each designed for a different use case. Built with Plotly for rich hover tooltips and responsive zooming.

1

Trend Explorer

Interactive time-series view of all 25 ingredients with adjustable date ranges, comparison mode, and overlay of product launch events. Users can filter by category, toggle between absolute interest and relative growth, and export data as CSV.

2

Reddit Sentiment

Sentiment tracker across r/SkincareAddiction and r/AsianBeauty, refreshed with each weekly pipeline run. Shows rolling 30-day sentiment scores, word clouds of co-mentioned ingredients, and drill-down into individual post threads with highlighted sentiment-bearing phrases.

3

Product Launches

Timeline of major brand launches mapped against Google Trends data. Highlights the demand-to-launch gap, surfaces which brands are trend-responsive vs. trend-setting, and tracks post-launch sentiment shifts.
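
The rolling 30-day sentiment score mentioned in the Reddit Sentiment view can be computed as a trailing average over dated scores. The sketch below is one standard-library way to do it. The function name and the sample data are assumptions, and the dashboard itself may well rely on pandas instead.

```python
from datetime import date, timedelta


def rolling_mean(daily: list, window_days: int = 30) -> list:
    """Trailing mean over the last `window_days` days.

    daily: (date, score) pairs sorted by date.
    Returns (date, mean) pairs, one per input row.
    """
    out = []
    start = 0      # index of the oldest row still inside the window
    total = 0.0    # running sum of scores inside the window
    for i, (day, score) in enumerate(daily):
        total += score
        # Drop rows that are window_days or more older than the current day.
        while daily[start][0] <= day - timedelta(days=window_days):
            total -= daily[start][1]
            start += 1
        out.append((day, total / (i - start + 1)))
    return out


# Invented sentiment scores, for illustration only.
daily = [
    (date(2024, 1, 1), 0.2),
    (date(2024, 1, 15), 0.4),
    (date(2024, 2, 10), 0.6),
]
smoothed = rolling_mean(daily)
```

Windowing by date rather than by row count keeps the average meaningful on days with few or no posts.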

11

Key Findings

1

Ceramide search interest surged 140% year-over-year, driven by barrier-repair education on TikTok and Reddit, making it the fastest-growing established ingredient category.

2

Google Trends demand signals precede major product launches by 2–3 months — CeraVe’s ceramide serum (Q2 2024) showed visible interest spikes from Q4 2023, suggesting brands are reactive rather than trend-setting.

3

Snail mucin experienced a 180% YoY spike in late 2023, almost entirely attributable to viral TikTok content and r/AsianBeauty crossover posts reaching mainstream audiences.

4

Reddit sentiment is a leading indicator of commercial success: ingredients with >65% positive sentiment see 2.1x faster retail adoption within the following 6 months.

5

SPF shows the strongest seasonal cyclicality of any ingredient (peak-to-trough ratio of 5.3x), yet the overall trend line is rising 34% YoY as year-round sun protection gains cultural traction.

12

Business Applications

How beauty brands, retailers, and investors could leverage this type of ingredient-level demand intelligence.

Trend Forecasting

Identify emerging ingredients 6–12 months before they hit mainstream retail. Brands monitoring Google Trends + Reddit cross-signals could have anticipated the ceramide boom in early 2023.

Product Development Timing

Align R&D and launch calendars with demand curves. The 2–3 month lag between search interest and product launches represents a window for faster-moving indie brands.

Ingredient Sourcing

Predict supply chain pressure before it hits. When snail mucin searches spiked 180%, sourcing teams with this data could have secured contracts before price increases.

Marketing Strategy

Map campaign timing to seasonal patterns. SPF messaging in March captures the upswing; retinol campaigns in September align with the fall skincare reset trend.

13

Limitations & What I'd Do Differently

01

Google Trends gives relative interest, not absolute search volume. A 140% YoY increase could mean going from 100 searches to 240, which isn’t necessarily meaningful.

02

The Reddit sentiment analysis uses keyword matching, not a proper NLP model. Sarcasm, context, and negation all get missed.

03

I couldn’t get real-time Sephora product data — the dashboard uses a mix of manual collection and synthetic fill. A proper version would need an API partnership or web scraping infrastructure.

04

The 2–3 month lead time finding is based on a small sample of launches. More data points would strengthen or weaken that claim.

Reflection

This project taught me that the most rewarding analysis comes from genuine curiosity. Because I actually care about which ingredients work and why people gravitate toward certain products, every SQL query felt purposeful and every chart told a story I wanted to understand, not just present.

It also reinforced something I believe about data analytics: domain expertise is not optional. I could interpret the ceramide surge because I had been watching barrier-repair science gain traction in the skincare community for years. The numbers confirmed the instinct, but the instinct made the numbers meaningful. That is the kind of analyst I want to be: one who brings both the data literacy and the domain knowledge to the table.