
Prithwish Nath

Originally published at Medium

I Built a Self-Hosted Google Trends Alternative with DuckDB

TL;DR: Track SERP rankings, title changes, and competitor data Google Trends won’t show. Built with Python, DuckDB, and a CLI-first approach.

Google Trends will tell you if people are searching for “react” or “nextjs”. But it won’t tell you that Stack Overflow just got bumped from position #2 to #7, or that Vercel changed their landing page title five times this month trying to improve click-through rate.

If you’re, say, an indie dev launching a product and you need every edge you can get, that’s the data that actually matters to you.

So I spent a weekend building a tool to track it. I could’ve just paid for Ahrefs/Semrush etc. But building this taught me:

  • How SERP APIs work under the hood
  • How to model time-series data in SQL (and its gotchas)
  • How to calculate derived metrics (interest score) from raw data
  • How DuckDB handles analytical queries

…and also because, frankly, I didn’t want to spend anywhere near that much. 😅

Ironically, going CLI-only made the tool more useful — I can pipe results to jq, schedule fetches with cron, and script complex workflows without fighting a web framework.

If I want a dashboard later, I can always add FastAPI in ~50 lines. But for now, the CLI is enough. Here’s how I built it.

If you’d like to tinker, the full code to this is on GitHub. Feel free to star, clone, fork, whatever: https://github.com/sixthextinction/duckdb-google-trends-basic/

Why Google Trends Isn’t Enough (And Why SEO Tools Cost $200/Month)

Google Trends answers one question really well: “How many people are searching for X?”

But if you’re building a product, writing technical content, or trying to rank for competitive keywords, you need to answer different questions:

  • Which competitors are winning in search results right now?
  • When did that tutorial site enter the top 10?
  • Is my rank drop because Google reshuffled the entire SERP, or just me?
  • What headlines are competitors A/B testing?

Tools like Ahrefs and SEMrush answer these questions. They cost $99–500/month, but I just wanted something I could self-host for the cost of API calls, and something doable as a weekend project.

Why I Use Google Results Volatility as a Proxy for Search Interest

This works for one simple reason: when search interest in a term rises, Google’s top 10 results become volatile.

“Volatility” here simply means that new domains enter, rankings shift around, sites update their titles and snippets to capture more clicks, etc. You get the picture — essentially, the search engine results page becomes chaotic.

Conversely, when interest in a term is stable or declining, Google’s top 10 ossifies. The same Wikipedia article, the same W3Schools tutorial, the same official docs sit in positions 1–3 for months.

So I don’t really need to track raw search volume (which I can’t access anyway — I’m not Google). I can just track these three things:

  • New domains entering top 10 because it’s a signal of rising interest or new content opportunities
  • Average rank improvement because it’s a signal of SERP instability
  • Domain overlap ratio because it measures how many domains persist between snapshots (complementing new domains)

Turns out, if I aggregate these three signals into a single 0–100 score (I’ll talk about the formula in just a bit), I get something that behaves remarkably like Google Trends — but tells me a lot more than just how many are searching.

The System Architecture

The entire system is ~1,000 lines of Python and runs locally with no server required.

Here’s how it works:

I started small with this one — using only daily snapshots, not live queries. Each run appends point-in-time data instead of overwriting. This way, over 7–30 days, I could build a local historical dataset that I could query freely.

I use DuckDB for this. For this workload (rank comparisons, volatility calculations, detecting new entrants), DuckDB’s SQL engine is ideal.

  • It’s columnar, so analytical queries over time-series data are fast. (If you want to know more, I covered columnar formats vs. JSON in this blog post)
  • It handles indexing, window functions (LAG(), PARTITION BY — which we will use extensively), and aggregations without needing a server or cloud warehouse.
  • It has an in-process design, meaning no separate database server — our project will need just a Python library and a file.
  • Plus, since it’s a single file (our “database” just lives in data/serp_data.duckdb), backups are trivial and there’s zero configuration overhead.
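To give a sense of how lightweight that is, here’s roughly what talking to it looks like. This is a minimal sketch; the serp_snapshots table it queries is defined in Module 2 below.

import duckdb

# Open (or create) the single-file database. No server process involved.
con = duckdb.connect("data/serp_data.duckdb")

# Plain SQL, executed in-process.
df = con.execute("""
    SELECT snapshot_date, COUNT(*) AS results_stored
    FROM serp_snapshots
    GROUP BY snapshot_date
    ORDER BY snapshot_date
""").df()
print(df)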


My Interest Score Formula

Every day, for each keyword, the system calculates a 0–100 “Search Interest Score” based on how much the SERP moved compared to the previous day.

I’m not gonna get into the math, but basically, I did some research on Google Trends scoring, adapted it for my needs, and split my scoring logic into 3 weighted parts:

1. New Domains Entering Top 10 (0–40 points)

new_domains = current_top10 - previous_top10

new_domains_score = min(len(new_domains) * 4, 40)

If 3 new sites enter the top 10, that’s 12 points. If 10 new sites appear (rare but possible during breaking news or major updates), that maxes out at 40 points.

2. Average Rank Improvement (0–30 points)

For each domain that appears in both snapshots:

rank_improvement = previous_rank - current_rank

A positive value here means it moved up.

Then average across all domains, normalized to a -10 to +10 range:

avg_improvement = mean(rank_improvements)

rank_improvement_score = min(max((avg_improvement + 10) / 20 * 30, 0), 30)

If the average site improved by 2 positions, that’s roughly 18 points. If rankings barely moved, this stays close to 15 (neutral).

3. Domain Overlap Ratio (0–30 points)

Finally, how many of today’s top 10 domains also appeared in yesterday’s top 10?

reshuffle_count = count(domains present in both current and previous top 10)

reshuffle_frequency = reshuffle_count / max(len(current_domains_set), 1)

reshuffle_score = reshuffle_frequency * 30

If 8 out of 10 domains carry over from yesterday, that’s 24 points. If only 3 carry over (meaning 7 are new — a massive reshuffle), that’s 9 points. This complements the new domains score by capturing continuity.

Total Score

So, taking all three parts together…

interest_score = new_domains_score + rank_improvement_score + reshuffle_score

High scores (60–100) = lots of movement = rising interest or major SERP disruption.

Low scores (0–40) = stable, ossified rankings = same old, same old. Established content is dominant.
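If it helps to see the whole formula in one place, here’s the same logic as a standalone Python function. This is just a sketch of the math above; in the real code it runs inside duckdb_manager.py against yesterday’s snapshot (Module 2).

def interest_score(current: dict, previous: dict) -> float:
    """Score 0-100 from two {domain: rank} mappings of the top 10."""
    new_domains = set(current) - set(previous)
    new_domains_score = min(len(new_domains) * 4, 40)

    shared = set(current) & set(previous)
    improvements = [previous[d] - current[d] for d in shared]
    avg_improvement = sum(improvements) / len(improvements) if improvements else 0.0
    rank_improvement_score = min(max((avg_improvement + 10) / 20 * 30, 0), 30)

    reshuffle_frequency = len(shared) / max(len(current), 1)
    reshuffle_score = reshuffle_frequency * 30

    return new_domains_score + rank_improvement_score + reshuffle_score

# One new domain, slight upward movement, decent overlap -> ~39.8
yesterday = {"nextjs.org": 1, "github.com": 2, "vercel.com": 3}
today = {"nextjs.org": 1, "vercel.com": 2, "dev.to": 3}
print(round(interest_score(today, yesterday), 1))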

What This Looks Like in Practice

I tested this by tracking “nextjs” for 7 days like this:

python main.py scores --query "nextjs" --days 7

Here’s what the output looked like:

=== Interest Scores for 'nextjs' (last 7 days) ===  

Found 7 scores:  

| snapshot_date | interest_score | new_domains | avg_rank_improvement | reshuffle_freq |  
|---------------|----------------|-------------|----------------------|----------------|  
| 2026-02-01    | 45.2           | 2           | 1.5                  | 0.6            |  
| 2026-02-02    | 52.3           | 3           | 2.1                  | 0.7            |  
| 2026-02-03    | 38.7           | 1           | 0.8                  | 0.5            |  
| 2026-02-04    | 61.4           | 4           | 3.2                  | 0.8            |  
| 2026-02-05    | 42.1           | 2           | 1.2                  | 0.6            |  
| 2026-02-06    | 55.8           | 3           | 2.5                  | 0.7            |  
| 2026-02-07    | 48.3           | 2           | 1.8                  | 0.6            |  

Chart saved to: nextjs_trend.png  

Summary: Min: 38.7  Max: 61.4  Avg: 49.1

Here’s that generated chart (I’m using basic matplotlib for these):

Matplotlib generated Search Interest Trend graph for the term “nextjs”. I tracked this over 7 days in the tool, running once per day.

That spike on Feb 4 (interest_score = 61.4) indicates a major SERP reshuffle — probably a Google algorithm update or a major new tutorial entering the rankings.

Actually Building It

Before diving into the code, here’s the bird’s-eye view of how the system fits together.

The entire project is driven by a small CLI (powered by argparse) in main.py. This file doesn’t contain any scraping or analytics logic — it’s just the orchestration layer that wires everything together.

Read the full code for main.py here: https://github.com/sixthextinction/duckdb-google-trends-basic/blob/main/main.py

You run specific commands (fetch to get SERP data for a keyword, volatility to analyze rank volatility for a keyword over a period of time, scores to view interest scores for a keyword over time, etc.) like so:

python main.py fetch --keywords "python"  
python main.py volatility --query "python" --days 30  
python main.py scores --query "python" --days 90

The CLI dispatch uses argparse subcommands to wire everything together:

def main():    
    parser = argparse.ArgumentParser(description="DuckDB Google Trends")    
    subparsers = parser.add_subparsers(dest='command', help='Commands')    

    # Each command gets its own parser with relevant arguments    
    fetch_parser = subparsers.add_parser('fetch', help='Fetch SERP snapshots')    
    fetch_parser.add_argument('--keywords', nargs='+', help='Keywords to track')    
    fetch_parser.add_argument('--num-results', type=int, default=10)    
    fetch_parser.add_argument('--delay', type=float, default=1.0)    

    scores_parser = subparsers.add_parser('scores', help='Show interest scores')    
    scores_parser.add_argument('--query', required=True)    
    scores_parser.add_argument('--days', type=int, default=90)    
    scores_parser.add_argument('--output', type=str)    

    # ... similar parsers for analyze, volatility, new-entrants, changes, calculate-scores    

    args = parser.parse_args()    
    commands = {    
        'fetch': cmd_fetch,    
        'analyze': cmd_analyze,    
        'volatility': cmd_volatility,    
        'new-entrants': cmd_new_entrants,    
        'changes': cmd_changes,    
        'calculate-scores': cmd_calculate_scores,    
        'scores': cmd_scores    
    }    
    commands[args.command](args)

Our main.py defines commands that map directly to the questions we want to ask:

  • fetch — collect today’s SERP results for a set of keywords
  • analyze — inspect the shape of the collected data
  • volatility — measure how rankings change over time
  • new-entrants — detect URLs appearing for the first time
  • changes — track title and snippet updates
  • calculate-scores — recalculate interest scores for existing snapshots
  • scores — view the calculated interest score trend

For example, here’s the key command handler for the scores command (the other handlers follow the same pattern):

# Usage: python main.py scores --query "python" --days 90  
def cmd_scores(args):    
    """Show interest scores for a query"""    
    with SERPAnalytics() as analytics:    
        result = analytics.interest_scores(args.query, days=args.days)    

        print(f"\n=== Interest Scores for '{result['query']}' (last {result['days']} days) ===")    
        if len(result['results']) == 0:    
            print("No interest scores found")  
            print("Note: Interest scores require at least 2 snapshots on different days.")      
            print("To calculate scores for existing data, run:")      
            print(f"  python main.py calculate-scores --keywords {args.query}")       
            return    

        print(f"\nFound {len(result['results'])} scores:\n")    
        print(df_to_markdown(result['results']))    

        # Generate PNG chart    
        output_path = args.output or f"{args.query.replace(' ', '_')}_trend.png"    
        _generate_png_chart(result['results'], args.query, args.days, output_path)    
        print(f"\nChart saved to: {output_path}")

At a high level, what can our project do?

  1. Fetch daily SERP snapshots for keywords (fetch command)
  2. Store those snapshots locally in DuckDB
  3. Run analytical queries over historical data using plain old SQL
  4. When you run python main.py scores --query "nextjs", the CLI fetches interest scores from DuckDB and, as an added bonus, generates a PNG chart using matplotlib (sketched below). Note that this shows SERP movement, not raw search volume.
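The chart itself is nothing fancy. The real helper is _generate_png_chart() in main.py; a stand-in I’m sketching here (not the repo’s actual code) would look something like this, assuming the results DataFrame has the snapshot_date and interest_score columns shown in the output above:

import matplotlib.pyplot as plt

def plot_interest_trend(df, query, output_path):
    """Hypothetical stand-in for main.py's _generate_png_chart()."""
    plt.figure(figsize=(10, 5))
    plt.plot(df["snapshot_date"], df["interest_score"], marker="o")
    plt.title(f"Search Interest Trend: {query}")
    plt.xlabel("Date")
    plt.ylabel("Interest score (0-100)")
    plt.ylim(0, 100)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path)
    plt.close()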

We don’t need servers, background workers, or dashboards here.

Now that we know how this tool works, let’s look at the major modules that make all this happen.

Module 1: Fetching SERP Data

All external data access is isolated in serp_client.py. I only have access to one SERP API — Bright Data — so I’ll only have to implement one class. Get your credentials here.


This approach makes it easy to extend with other SERP APIs: just write another client and add its credentials to your env file.
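For illustration only (this class isn’t in the repo), a second provider would just need to expose the same search() shape and return a dict with an organic list, so the rest of the pipeline keeps working unchanged:

class HypotheticalSerpClient:
    """Skeleton for a second provider. Not in the repo; just shows the shape."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    def search(self, query: str, num_results: int = 10) -> dict:
        # Call the provider here, then normalize its response into
        # {"organic": [{"url": ..., "title": ..., "snippet": ...}, ...]}
        # so scraper.py and duckdb_manager.py don't need to change.
        raise NotImplementedError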

Read the full code for serp_client.py here: https://github.com/sixthextinction/duckdb-google-trends-basic/blob/main/src/serp_client.py

"""
Bright Data SERP API client
"""
import os
import json
import requests
from typing import Dict, Any, Optional
from dotenv import load_dotenv

load_dotenv()


class BrightDataClient:
    """Client for Bright Data SERP API"""

    def __init__(
        self,
        api_key: Optional[str] = None,
        zone: Optional[str] = None,
        country: Optional[str] = None,
    ):
        env_api_key = os.getenv("BRIGHT_DATA_API_KEY")
        env_zone = os.getenv("BRIGHT_DATA_ZONE")
        env_country = os.getenv("BRIGHT_DATA_COUNTRY")

        self.api_key = api_key or env_api_key
        self.zone = zone or env_zone
        self.country = country or env_country
        self.api_endpoint = "https://api.brightdata.com/request"

        if not self.api_key:
            raise ValueError(
                "BRIGHT_DATA_API_KEY must be provided via constructor or environment variable"
            )

        if not self.zone:
            raise ValueError(
                "BRIGHT_DATA_ZONE must be provided via constructor or environment variable"
            )

        self.session = requests.Session()
        self.session.headers.update({
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.api_key}'
        })

    def search(
        self,
        query: str,
        num_results: int = 10,
        language: Optional[str] = None,
        country: Optional[str] = None,
    ) -> Dict[str, Any]:
        """Execute a Google search via Bright Data SERP API"""
        search_url = (
            f"https://www.google.com/search"
            f"?q={requests.utils.quote(query)}"
            f"&num={num_results}"
            f"&brd_json=1"
        )

        if language:
            search_url += f"&hl={language}&lr=lang_{language}"

        target_country = country or self.country

        payload = {
            'zone': self.zone,
            'url': search_url,
            'format': 'json'
        }

        if target_country:
            payload['country'] = target_country

        try:
            response = self.session.post(
                self.api_endpoint,
                json=payload,
                timeout=30
            )
            response.raise_for_status()
            result = response.json()

            # Parse body JSON string if present
            if isinstance(result, dict) and 'body' in result:
                if isinstance(result['body'], str):
                    result['body'] = json.loads(result['body'])
                # Return the parsed body content
                return result['body']

            return result

        except requests.exceptions.HTTPError as e:
            error_msg = f"Search request failed with HTTP {e.response.status_code}"
            if e.response.text:
                error_msg += f": {e.response.text[:200]}"
            raise RuntimeError(error_msg) from e
        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Search request failed: {e}") from e

This is just a thin wrapper around the Bright Data API. It takes a query, returns JSON with organic search results (title, snippet, URL, rank).
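If you want to poke at the client outside the CLI, usage looks roughly like this (assuming your Bright Data credentials are in your environment or .env, and with src/ on your path the same way scraper.py imports it):

from serp_client import BrightDataClient  # run from src/, or put src/ on PYTHONPATH

client = BrightDataClient()  # picks up BRIGHT_DATA_API_KEY / BRIGHT_DATA_ZONE
data = client.search("nextjs", num_results=10)

# The parsed body carries an "organic" list; scraper.py reads the same key.
for i, result in enumerate(data.get("organic", []), start=1):
    print(i, result.get("title"), result.get("link") or result.get("url"))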

This module is called when we run the fetch command.


python main.py fetch --keywords "python" "javascript" "react"

This connects to the Bright Data SERP API and, for each keyword, fetches Google search results (10 per keyword by default; adjust as necessary), then extracts and stores the organic results (title, snippet, URL, rank). Remember, interest scores require at least 2 snapshots on different days, so fetch snapshots daily to build historical trends (with a cron job, or just by running it manually).

Example output:


Fetching snapshots for 3 keywords…
[1/3] 'python': 10 results
[2/3] 'javascript': 10 results
[3/3] 'react': 10 results
Total snapshots in database: 30

Module 2: Storing Snapshots in DuckDB

Once SERP data is fetched by the previous module, it needs to be stored in DuckDB for analytical queries. That logic lives in duckdb_manager.py.

Read the full code for duckdb_manager.py here: https://github.com/sixthextinction/duckdb-google-trends-basic/blob/main/src/duckdb_manager.py

First of all, let’s introduce the schema we’ll be using:

CREATE TABLE IF NOT EXISTS serp_snapshots (
    snapshot_id BIGINT PRIMARY KEY,
    query TEXT NOT NULL,
    snapshot_date DATE NOT NULL,
    snapshot_timestamp TIMESTAMP NOT NULL,
    url TEXT NOT NULL,
    title TEXT,
    snippet TEXT,
    domain TEXT,
    rank INTEGER NOT NULL,
    UNIQUE(query, snapshot_date, url)
);

-- Interest scores table (calculated from SERP movement between snapshots)
CREATE TABLE IF NOT EXISTS interest_scores (
    query TEXT NOT NULL,
    snapshot_date DATE NOT NULL,
    interest_score DOUBLE NOT NULL,
    new_domains_count INTEGER,
    avg_rank_improvement DOUBLE,
    reshuffle_frequency DOUBLE,
    UNIQUE(query, snapshot_date)
);

-- Indexes for fast queries
CREATE INDEX IF NOT EXISTS idx_query_date ON serp_snapshots(query, snapshot_date);
CREATE INDEX IF NOT EXISTS idx_url_query ON serp_snapshots(url, query);
CREATE INDEX IF NOT EXISTS idx_interest_scores ON interest_scores(query, snapshot_date);

Each SERP result becomes a row, keyed by (query, date, URL). Interest scores are stored in a separate table, calculated automatically when a new snapshot is inserted. So, inserting a snapshot will look like this:

def insert_snapshot(self, results: List[Dict[str, Any]], query: str,
                    snapshot_date: Optional[datetime] = None):
    """Insert a daily snapshot of SERP results"""
    if snapshot_date is None:
        snapshot_date = datetime.now()

    snapshot_timestamp = snapshot_date
    snapshot_date_only = snapshot_date.date() if hasattr(snapshot_date, 'date') else snapshot_date

    if not results:
        return

    def extract_domain(url: str) -> str:
        """Extract domain from URL, stripping www prefix"""
        if not url:
            return ""
        try:
            from urllib.parse import urlparse
            parsed = urlparse(url)
            return parsed.netloc.replace("www.", "")
        except:
            return ""

    # Get max snapshot_id
    max_id_result = self.conn.execute(
        "SELECT COALESCE(MAX(snapshot_id), 0) FROM serp_snapshots"
    ).fetchone()
    next_id = (max_id_result[0] if max_id_result else 0) + 1

    rows = []
    for idx, result in enumerate(results):
        url = result.get('url', result.get('link', ''))
        domain = extract_domain(url)
        rows.append({
            'snapshot_id': next_id + idx,
            'query': query,
            'snapshot_date': snapshot_date_only,
            'snapshot_timestamp': snapshot_timestamp,
            'url': url,
            'title': result.get('title', ''),
            'snippet': result.get('snippet', result.get('description', '')),
            'domain': domain,
            'rank': idx + 1
        })

    import pandas as pd
    df = pd.DataFrame(rows)
    self.conn.execute("""
        INSERT OR IGNORE INTO serp_snapshots
        SELECT * FROM df
    """)
    # Calculate and store interest score
    self._calculate_interest_score(query, snapshot_date_only)

Instead of updating rows, every run adds new records. This builds a local time-series dataset.
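Because it’s all one table of plain rows, ad-hoc questions are a single query away. For example, here’s a quick sketch that pulls the day-by-day rank history of one domain for one keyword, straight from the same data/serp_data.duckdb file:

import duckdb

con = duckdb.connect("data/serp_data.duckdb", read_only=True)
history = con.execute("""
    SELECT snapshot_date, rank, title
    FROM serp_snapshots
    WHERE query = 'nextjs' AND domain = 'vercel.com'
    ORDER BY snapshot_date
""").df()
print(history)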

Here’s how we calculate the interest score using that 40–30–30 formula described earlier:


def _calculate_interest_score(self, query: str, snapshot_date):
    """Calculate Search Interest Score (0-100) based on SERP movement"""
    # Get previous day's snapshot for comparison
    prev_date_result = self.conn.execute("""
        SELECT MAX(snapshot_date)
        FROM serp_snapshots
        WHERE query = ?
          AND snapshot_date < ?
    """, [query, snapshot_date]).fetchone()

    if not prev_date_result or not prev_date_result[0]:
        # First snapshot, no comparison possible
        return

    prev_date = prev_date_result[0]

    # Get current top 10 domains
    current_domains = self.conn.execute("""
        SELECT DISTINCT domain
        FROM serp_snapshots
        WHERE query = ?
          AND snapshot_date = ?
          AND rank <= 10
    """, [query, snapshot_date]).fetchall()
    current_domains_set = {row[0] for row in current_domains}

    # Get previous top 10 domains
    prev_domains = self.conn.execute("""
        SELECT DISTINCT domain
        FROM serp_snapshots
        WHERE query = ?
          AND snapshot_date = ?
          AND rank <= 10
    """, [query, prev_date]).fetchall()
    prev_domains_set = {row[0] for row in prev_domains}

    # Count new domains entering top 10
    new_domains = current_domains_set - prev_domains_set
    new_domains_count = len(new_domains)

    # Calculate average rank improvement for existing domains
    rank_changes = self.conn.execute("""
        WITH current_ranks AS (
            SELECT domain, rank
            FROM serp_snapshots
            WHERE query = ? AND snapshot_date = ?
              AND rank <= 10
        ),
        prev_ranks AS (
            SELECT domain, rank
            FROM serp_snapshots
            WHERE query = ? AND snapshot_date = ?
              AND rank <= 10
        )
        SELECT
            c.domain,
            c.rank as current_rank,
            p.rank as prev_rank,
            (p.rank - c.rank) as rank_improvement
        FROM current_ranks c
        JOIN prev_ranks p ON c.domain = p.domain
    """, [query, snapshot_date, query, prev_date]).fetchall()

    if rank_changes:
        avg_rank_improvement = sum(row[3] for row in rank_changes) / len(rank_changes)
    else:
        avg_rank_improvement = 0.0

    # Calculate reshuffle frequency (how many of today's domains also appeared yesterday)
    reshuffle_count = len(rank_changes)
    reshuffle_frequency = reshuffle_count / max(len(current_domains_set), 1)

    # Normalize to 0-100 score
    # I'm calculating a final score from 3 weighted sub-scores:
    # - New domains: 0-10 domains = 0-40 points
    # - Rank improvement: -10 to +10 = 0-30 points (normalized)
    # - Reshuffle frequency: 0-1 = 0-30 points
    new_domains_score = min(new_domains_count * 4, 40)  # Max 40 points
    rank_improvement_score = min(max((avg_rank_improvement + 10) / 20 * 30, 0), 30)  # Max 30 points
    reshuffle_score = reshuffle_frequency * 30  # Max 30 points

    interest_score = new_domains_score + rank_improvement_score + reshuffle_score

    # Store interest score
    self.conn.execute("""
        INSERT OR REPLACE INTO interest_scores
        (query, snapshot_date, interest_score, new_domains_count, avg_rank_improvement, reshuffle_frequency)
        VALUES (?, ?, ?, ?, ?, ?)
    """, [query, snapshot_date, interest_score, new_domains_count, avg_rank_improvement, reshuffle_frequency])

This runs automatically every time a new snapshot is inserted. The score gets stored in a separate interest_scores table for easy querying.

Module 3: Analytical Queries

The nerdiest of our logic lives in analytics.py. This module opens DuckDB in read-only mode and exposes focused queries.

Read the full code for analytics.py here: https://github.com/sixthextinction/duckdb-google-trends-basic/blob/main/src/analytics.py

A good analytical query to demonstrate right now would be the one for rank volatility:

def rank_volatility(self, query: str, days: int = 30) -> Dict[str, Any]:    
    """Calculate rank volatility for URLs over time"""    
    cutoff_date = datetime.now().date() - timedelta(days=days)    

    result = self.conn.execute("""    
        WITH rank_changes AS (    
            SELECT     
                url,    
                domain,    
                rank,    
                snapshot_date,    
                LAG(rank) OVER (PARTITION BY url ORDER BY snapshot_date) as prev_rank    
            FROM serp_snapshots    
            WHERE query = ? AND snapshot_date >= ?    
            ORDER BY url, snapshot_date    
        ),    
        volatility AS (    
            SELECT     
                url,    
                domain,    
                COUNT(*) as snapshot_count,    
                AVG(rank) as avg_rank,    
                MIN(rank) as best_rank,    
                MAX(rank) as worst_rank,    
                STDDEV(rank) as rank_stddev,    
                COUNT(CASE WHEN prev_rank IS NOT NULL AND rank != prev_rank THEN 1 END) as rank_changes    
            FROM rank_changes    
            GROUP BY url, domain    
        )    
        SELECT     
            url,    
            domain,    
            snapshot_count,    
            ROUND(avg_rank, 2) as avg_rank,    
            best_rank,    
            worst_rank,    
            ROUND(rank_stddev, 2) as rank_stddev,    
            rank_changes,    
            ROUND(CAST(rank_changes AS DOUBLE) / NULLIF(snapshot_count - 1, 0) * 100, 1) as volatility_pct    
        FROM volatility    
        WHERE snapshot_count > 1    
        ORDER BY rank_stddev DESC, avg_rank ASC    
        LIMIT 50    
    """, [query, cutoff_date]).df()    

    return {'query': query, 'days': days, 'results': result}

This uses window functions (LAG) and aggregations (STDDEV) to surface URLs that move around the most. These queries would normally require a data warehouse — here they’re just SQL running locally.
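If you haven’t used LAG() before, here’s the idea in isolation: it pulls the previous row’s value (the previous day’s rank, in our case) onto the current row, so day-over-day movement becomes a simple comparison. A throwaway example against an in-memory DuckDB, unrelated to the project’s tables:

import duckdb

con = duckdb.connect()  # in-memory, no file needed
con.execute("CREATE TABLE ranks (snapshot_date DATE, rank INTEGER)")
con.execute("INSERT INTO ranks VALUES ('2026-02-01', 3), ('2026-02-02', 5), ('2026-02-03', 2)")

print(con.execute("""
    SELECT snapshot_date,
           rank,
           LAG(rank) OVER (ORDER BY snapshot_date) AS prev_rank
    FROM ranks
    ORDER BY snapshot_date
""").df())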

Run the volatility analysis with:

python main.py volatility --query "python" --days 30

This, for example, will analyze the last 30 days of snapshots for the query string “python”, calculating average rank, best/worst rank, standard deviation, and change frequency — and display the top 50 (as default) most volatile URLs.

Example output:

=== Rank Volatility for 'python' (last 30 days) ===  

Top 10 most volatile URLs:  

| url | domain | snapshot_count | avg_rank | best_rank | worst_rank | rank_stddev | rank_changes | volatility_pct |  
| --- | --- | --- | --- | --- | --- | --- | --- | --- |  
| https://www.python.org/ | python.org | 30 | 1.5 | 1 | 3 | 0.67 | 15 | 51.7 |  
| https://www.w3schools.com/python/ | w3schools.com | 30 | 2.3 | 1 | 5 | 1.12 | 18 | 62.1 |  
| https://en.wikipedia.org/wiki/Python_(programming_language) | wikipedia.org | 28 | 4.1 | 2 | 8 | 1.89 | 12 | 44.4 |  
| https://www.codecademy.com/catalog/language/python | codecademy.com | 25 | 5.2 | 3 | 10 | 2.15 | 10 | 41.7 |

Another query that is very useful is finding new entrants:

def new_entrants(self, query: str, days: int = 7):    
    """Find URLs that appeared for the first time recently"""    
    cutoff_date = datetime.now().date() - timedelta(days=days)    

    result = self.conn.execute("""    
        WITH first_appearance AS (    
            SELECT     
                url,    
                domain,    
                MIN(snapshot_date) as first_seen    
            FROM serp_snapshots    
            WHERE query = ?    
            GROUP BY url, domain    
        ),    
        recent_entrants AS (    
            SELECT     
                fa.url,    
                fa.domain,    
                fa.first_seen,    
                s.rank as first_rank,    
                s.title,    
                s.snippet    
            FROM first_appearance fa    
            JOIN serp_snapshots s     
                ON fa.url = s.url     
                AND fa.first_seen = s.snapshot_date    
                AND s.query = ?    
            WHERE fa.first_seen >= ?    
        )    
        SELECT     
            url,    
            domain,    
            first_seen,    
            first_rank,    
            title,    
            snippet    
        FROM recent_entrants    
        ORDER BY first_seen DESC, first_rank ASC    
        LIMIT 50    
    """, [query, query, cutoff_date]).df()    

    return {'query': query, 'days': days, 'results': result}

This finds URLs whose first appearance falls within the last N days — perfect for spotting new competitors or fresh content entering the rankings.

Run this with:

python main.py new-entrants --query "python" --days 7

Example output:

=== New Entrants for 'python' (last 7 days) ===  

Found 3 new URLs:  

| url | domain | first_seen | first_rank | title | snippet |  
| --- | --- | --- | --- | --- | --- |  
| https://realpython.com/ | realpython.com | 2026-02-04 | 7 | Real Python - Python Tutorials | Learn Python programming with Real Python's comprehensive tutorials and courses... |  
| https://www.pythonforbeginners.com/ | pythonforbeginners.com | 2026-02-05 | 9 | Python For Beginners | A comprehensive guide to learning Python programming from scratch... |  
| https://docs.python-guide.org/ | docs.python-guide.org | 2026-02-06 | 8 | The Hitchhiker's Guide to Python | Best practices and recommendations for Python development... |

I’m not going to go over every module, but it’s all in the code. Find all queries + their expected output in the project README.md.

Module 4: The Snapshot Fetcher

Finally, scraper.py (I’m so sorry — I really could have named this better 😅) connects ingestion and storage.

Read the full code for scraper.py here: https://github.com/sixthextinction/duckdb-google-trends-basic/blob/main/src/scraper.py

"""    
SERP snapshot fetcher    
"""    

import time    
from datetime import datetime    
from typing import List, Optional    

from serp_client import BrightDataClient    
from duckdb_manager import DuckDBManager    


def fetch_snapshots(keywords: List[str], num_results: int = 10, delay: float = 1.0):    
    """    
    Fetch SERP snapshots for keywords and store in DuckDB    

    Args:    
        keywords: List of search keywords    
        num_results: Number of results per keyword    
        delay: Delay between API calls (seconds)    
    """    
    client = BrightDataClient()    

    with DuckDBManager() as db:    
        print(f"Fetching snapshots for {len(keywords)} keywords...")    

        for idx, keyword in enumerate(keywords):    
            try:    
                # Fetch SERP results    
                serp_data = client.search(keyword, num_results=num_results)    

                # Extract organic results    
                organic_results = []    
                if isinstance(serp_data, dict) and 'organic' in serp_data:    
                    organic_results = serp_data['organic']    

                if organic_results:    
                    # Insert snapshot    
                    db.insert_snapshot(organic_results, keyword)    
                    print(f"[{idx+1}/{len(keywords)}] '{keyword}': {len(organic_results)} results")    
                else:    
                    print(f"[{idx+1}/{len(keywords)}] '{keyword}': No results found")    

                # Rate limiting    
                if idx < len(keywords) - 1:    
                    time.sleep(delay)    

            except Exception as e:    
                print(f"Error fetching '{keyword}': {e}")    
                continue    

        total = db.get_snapshot_count()    
        print(f"\nTotal snapshots in database: {total}")

This is just simple orchestration logic, again. It iterates over keywords, fetches results, and inserts snapshots. Rate limiting and error handling live at the edges. I’ve kept the core logic deliberately simple.

That’s everything! Remember, main.py brings all of these together.

Real World Use Cases

Now that you understand how it works, here are some cool things you can actually do with this tool.

1. Detect Google Algorithm Updates Before They’re Announced

When tracking multiple keywords in the same niche, sudden volatility spikes across all of them indicate an algorithm change.

python main.py volatility --query "react" --days 7  
python main.py volatility --query "vue" --days 7  
python main.py volatility --query "angular" --days 7

If all three show high rank_stddev and volatility_pct, Google likely pushed an update.

SEO folks pay $200/month for SEMrush Sensor just to get this signal. You’re building it for the cost of SERP API calls.

2. Spy on Competitor SEO Tactics

Track title and snippet changes to see what competitors are A/B testing:

python main.py changes --query "nextjs tutorial" --days 30

Example output:

| url                              | prev_title                    | new_title                                               |  
|----------------------------------|-------------------------------|---------------------------------------------------------|  
| https://nextjs.org/docs          | Next.js Documentation         | Next.js Docs | Next.js                                  |  
| https://nextjs.org/learn         | Learn Next.js                 | Learn Next.js | Next.js by Vercel - The React Framework |

Let’s say a site changed their title from a generic page description to something more specific. If their rank improved after the change, that’s a signal the new title performs better — steal that pattern!

3. Find Content Gaps in Real-Time

See which sites are entering top 10 and what format they’re using:

python main.py new-entrants --query "react hooks tutorial" --days 7

Example output:

| domain              | first_seen | first_rank | title                                    |  
|---------------------|------------|------------|------------------------------------------|  
| react-tutorial.dev  | 2026-02-05 | 7          | React Hooks Interactive Tutorial         |  
| codesandbox.io      | 2026-02-06 | 9          | Learn React Hooks - Live Coding Examples |

Let’s say there are two new entrants in the SERP for the query “react hooks tutorial”, and both new entrants have “Interactive” or “Live” in their titles. That means Google is currently rewarding interactive content for this query. Adjust your content strategy accordingly.

4. Validate Content Ideas Before Creation

This one’s super simple to understand. High volatility = easier for you to rank. Low volatility = established players dominate.

python main.py volatility --query "python tutorial" --days 30  
python main.py volatility --query "rust async tutorial" --days 30

If the query “python tutorial” shows rank_stddev: 0.3 (very stable) and “rust async tutorial” shows rank_stddev: 2.1 (chaotic), focus on the Rust content! The Python keyword is locked down by W3Schools and Real Python — you won’t break in easily.

5. Track Your Own Product’s SERP Performance

Monitor how your product ranks for target keywords:

python main.py fetch --keywords "whatever your product is or does"

Then check if you’re entering top 10:

python main.py new-entrants --query "whatever your product is or does" --days 7

If your product URL appears, congrats — you just entered the top 10. If competitors are dropping out (volatility shows their ranks declining), you’re winning.

What I Learned Building This

  • DuckDB is a total cheat code for embedded analytics. I expected to need PostgreSQL (or ClickHouse, ugh.) for time-series queries over SERP data. Without fiddling with any config, calculating rank volatility across 30 days of snapshots for 50 URLs ran in ~20ms for me. The database file was <5MB for weeks of data.
  • Bright Data’s SERP API is very reliable. I tried other SERP APIs before settling on Bright Data, primarily because of the consistent JSON output for Google and Bing, and support for DuckDuckGo, Yandex, etc. This experiment cost me pennies — but make sure you check their pricing so you don’t get burnt by costs you shouldn’t be incurring.
  • The Interest Score formula probably needs tuning. The 40/30/30 weighting (new domains / rank improvement / domain overlap ratio) was only my first guess. It works reasonably well, but it’s not perfect. At the very least, I should weight new domains more heavily for breaking-news queries, and reduce the impact of the domain overlap ratio for stable niches (because, for example, Wikipedia will always be #1 for “Python programming language”).

Try It Yourself

Again, the full code is on GitHub: https://github.com/sixthextinction/duckdb-google-trends-basic/

Quick start:

git clone https://github.com/sixthextinction/duckdb-google-trends-basic.git  
# or...  
gh repo clone sixthextinction/duckdb-google-trends-basic  
# then...  
cd duckdb-google-trends-basic    
pip install -r requirements.txt    

# Set environment variables    
export BRIGHT_DATA_API_KEY="your_key"    
export BRIGHT_DATA_ZONE="your_zone"    
export BRIGHT_DATA_COUNTRY="us"  # optional, for geo-targeted results    

# Or use a .env file instead (python-dotenv is included)    

# Test with sample data (no API key needed)    
python seed_data.py    
python main.py scores --query "nextjs" --days 7    

# Or fetch real data    
python main.py fetch --keywords "react" "vue" "svelte"

I’ve included a seed script that creates 7 days of synthetic data so you can test immediately without waiting. Otherwise, set up a daily cron job to fetch snapshots automatically, and within a week you’ll have real trend data.
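If you go the cron route, a single daily entry is enough. Something like this works (the paths and schedule are just examples; adjust them to wherever you cloned the repo and whichever Python you use):

# crontab -e
# Fetch fresh snapshots every day at 08:00
0 8 * * * cd /path/to/duckdb-google-trends-basic && python3 main.py fetch --keywords "react" "vue" "svelte" >> fetch.log 2>&1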

Thanks for reading!

Hi 👋 I’m constantly tinkering with dev tools, running weird-ass experiments, and otherwise building/deep-diving stuff that probably shouldn’t work but does — and writing about it. I put out a new post every Monday/Tuesday. If you’re into offbeat experiments and dev tools that actually don’t suck, give me a follow.

If you did something cool with this tool, I’d love to see it. Reach out on LinkedIn, or put it in the comments below.

Top comments (1)

Martin Miles

Two things.

  • You mentioned trying other SERP APIs before settling on Bright Data - what went wrong with the others?
  • Being honest, if your intention is to promote your work, I don't know if going CLI instead of a full UI would help it be a Google trend alternative. Just a thought!