Every blog accumulates rot. Dead links, outdated code snippets, images that 404, anchor tags pointing to sections that got renamed. You don't notice it happening because each post works fine when you publish it. The decay is invisible — until a reader hits a broken link and bounces.
I got tired of manually clicking through old posts. So I built an automated content health monitor that crawls my entire blog, checks every link, validates images, flags outdated content, and sends me a daily health report.
After running it for 21 days, it found 47 issues across 83 posts that I had no idea existed.
Here's exactly how I built it, what it found, and how I automated the fixes.
The Problem: Silent Content Decay
I run a developer blog with 80+ published posts. Like most bloggers, I spent 95% of my time on new content and roughly 0% maintaining old posts.
Then I noticed something in my analytics: bounce rates on older posts were climbing. Pages that used to convert at 4-5% were down to 1.2%. When I manually checked a few, I found:
- Links to deprecated library docs (Python 3.9 docs linking to 3.7 APIs)
- GitHub repos that had been archived or deleted
- Images hosted on a service that had changed their URL scheme
- Code examples importing packages that had been renamed
This is what I call content rot, and it silently kills your blog's credibility.
Architecture: What I Built
The system has four components:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Crawler │────▶│ Validators │────▶│ Reporter │
│ (sitemap + │ │ (links, │ │ (markdown │
│ frontmatter)│ │ images, │ │ + webhook) │
└──────────────┘ │ code, │ └──────────────┘
│ freshness) │ │
└──────────────┘ ▼
┌──────────────┐
│ Auto-fixer │
│ (PR creator)│
└──────────────┘
- Crawler: Reads the sitemap or scans markdown files, extracts all links, images, and code blocks
- Validators: Checks each resource type for issues
- Reporter: Generates a health report with severity levels
- Auto-fixer: Creates PRs for issues it can fix automatically (like URL redirects)
The Code: Content Health Monitor
Here's the full implementation. This is production code — I run this daily via GitHub Actions.
Core Crawler
#!/usr/bin/env python3
"""
content_health_monitor.py
Crawls blog posts and validates links, images, and content freshness.
Built by Jackson Studio — https://jacksonlee71.gumroad.com
"""
import asyncio
import aiohttp
import re
import json
import sys
import os
from pathlib import Path
from datetime import datetime, timedelta
from dataclasses import dataclass, field, asdict
from typing import Optional
from urllib.parse import urlparse, urljoin
import hashlib
# --- Data Models ---
@dataclass
class Issue:
severity: str # "critical", "warning", "info"
category: str # "broken_link", "dead_image", "stale_content", etc.
file_path: str
line_number: int
description: str
url: Optional[str] = None
suggestion: Optional[str] = None
auto_fixable: bool = False
@dataclass
class PostHealth:
file_path: str
title: str
published_date: str
word_count: int
issues: list = field(default_factory=list)
links_checked: int = 0
images_checked: int = 0
health_score: float = 100.0
@dataclass
class HealthReport:
scan_date: str
total_posts: int
total_issues: int
critical_count: int
warning_count: int
info_count: int
posts: list = field(default_factory=list)
scan_duration_seconds: float = 0.0
# --- Link Extractor ---
class ContentExtractor:
"""Extracts links, images, and metadata from markdown files."""
LINK_PATTERN = re.compile(r'\[([^\]]*)\]\(([^)]+)\)')
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(([^)]+)\)')
HTML_LINK_PATTERN = re.compile(r'<a\s+href=["\']([^"\']]+)["\']', re.IGNORECASE)
HTML_IMG_PATTERN = re.compile(r'<img\s+[^>]*src=["\']([^"\']]+)["\']', re.IGNORECASE)
CODE_BLOCK_PATTERN = re.compile(r'```
(\w+)?\n(.*?)
```', re.DOTALL)
FRONTMATTER_PATTERN = re.compile(r'^---\n(.*?)\n---', re.DOTALL)
@staticmethod
def extract_frontmatter(content: str) -> dict:
match = ContentExtractor.FRONTMATTER_PATTERN.match(content)
if not match:
return {}
fm = {}
for line in match.group(1).split('\n'):
if ':' in line:
key, _, value = line.partition(':')
fm[key.strip()] = value.strip().strip('"').strip("'")
return fm
@staticmethod
def extract_links(content: str) -> list[tuple[str, int, str]]:
"""Returns list of (url, line_number, anchor_text)."""
results = []
lines = content.split('\n')
in_code_block = False
for i, line in enumerate(lines, 1):
if line.strip().startswith('```
'):
in_code_block = not in_code_block
continue
if in_code_block:
continue
for match in ContentExtractor.LINK_PATTERN.finditer(line):
url = match.group(2).split(' ')[0] # Handle title attrs
if not url.startswith('#') and not url.startswith('mailto:'):
results.append((url, i, match.group(1)))
for match in ContentExtractor.HTML_LINK_PATTERN.finditer(line):
url = match.group(1)
if not url.startswith('#') and not url.startswith('mailto:'):
results.append((url, i, ''))
return results
@staticmethod
def extract_images(content: str) -> list[tuple[str, int, str]]:
"""Returns list of (url, line_number, alt_text)."""
results = []
lines = content.split('\n')
in_code_block = False
for i, line in enumerate(lines, 1):
if line.strip().startswith('
```'):
in_code_block = not in_code_block
continue
if in_code_block:
continue
for match in ContentExtractor.IMAGE_PATTERN.finditer(line):
results.append((match.group(2), i, match.group(1)))
for match in ContentExtractor.HTML_IMG_PATTERN.finditer(line):
results.append((match.group(1), i, ''))
return results
@staticmethod
def extract_code_blocks(content: str) -> list[tuple[str, str, int]]:
"""Returns list of (language, code, approx_line_number)."""
results = []
lines = content.split('\n')
current_pos = 0
for match in ContentExtractor.CODE_BLOCK_PATTERN.finditer(content):
start = content[:match.start()].count('\n') + 1
lang = match.group(1) or 'unknown'
code = match.group(2)
results.append((lang, code, start))
return results
# --- Validators ---
class LinkValidator:
"""Validates URLs with connection pooling and rate limiting."""
# Known domains that block automated requests
SKIP_DOMAINS = {'linkedin.com', 'facebook.com', 'twitter.com', 'x.com'}
# Known redirect patterns (old URL -> suggestion)
KNOWN_REDIRECTS = {
'docs.python.org/3.7/': 'docs.python.org/3/',
'docs.python.org/3.8/': 'docs.python.org/3/',
'github.com/pallets/flask': 'flask.palletsprojects.com',
}
def __init__(self, concurrency: int = 10, timeout: int = 15):
self.semaphore = asyncio.Semaphore(concurrency)
self.timeout = aiohttp.ClientTimeout(total=timeout)
self.cache: dict[str, tuple[int, str]] = {} # url -> (status, redirect)
self.stats = {'checked': 0, 'failed': 0, 'cached': 0}
async def check_url(self, url: str, session: aiohttp.ClientSession) -> tuple[int, Optional[str]]:
"""Returns (status_code, redirect_url_if_any). -1 = connection error."""
parsed = urlparse(url)
if parsed.hostname and any(d in parsed.hostname for d in self.SKIP_DOMAINS):
return (200, None) # Skip known-blocking domains
cache_key = url.split('#')[0] # Ignore fragments for caching
if cache_key in self.cache:
self.stats['cached'] += 1
return self.cache[cache_key]
async with self.semaphore:
self.stats['checked'] += 1
try:
async with session.head(
url,
allow_redirects=True,
timeout=self.timeout,
headers={'User-Agent': 'BlogHealthMonitor/1.0 (+https://jacksonstudio.dev)'}
) as resp:
redirect = str(resp.url) if str(resp.url) != url else None
result = (resp.status, redirect)
self.cache[cache_key] = result
if resp.status >= 400:
# Retry with GET — some servers reject HEAD
async with session.get(
url,
allow_redirects=True,
timeout=self.timeout,
headers={'User-Agent': 'BlogHealthMonitor/1.0'}
) as retry_resp:
redirect = str(retry_resp.url) if str(retry_resp.url) != url else None
result = (retry_resp.status, redirect)
self.cache[cache_key] = result
return result
except asyncio.TimeoutError:
self.cache[cache_key] = (-1, 'timeout')
return (-1, 'timeout')
except Exception as e:
self.cache[cache_key] = (-1, str(e))
return (-1, str(e))
class FreshnessValidator:
"""Checks if content is stale based on age and signals."""
STALE_KEYWORDS = {
'python': {
'3.7', '3.8', '3.9', # EOL Python versions
'asyncio.coroutine', # Removed in 3.11
'loop.run_until_complete', # Discouraged pattern
},
'javascript': {
'var ', # Should be let/const
'require(', # CJS in modern contexts
'callback(', # Callback-hell pattern
},
'general': {
'2023', '2024', # Year references that might be outdated
'deprecated',
}
}
@staticmethod
def check_freshness(
published_date: str,
code_blocks: list[tuple[str, str, int]],
max_age_days: int = 180
) -> list[dict]:
"""Returns list of freshness issues."""
issues = []
# Check post age
try:
pub_date = datetime.fromisoformat(published_date.replace('Z', '+00:00'))
age = datetime.now(pub_date.tzinfo) - pub_date
if age.days > max_age_days:
issues.append({
'type': 'stale_post',
'message': f'Post is {age.days} days old — review for accuracy',
'severity': 'info' if age.days < 365 else 'warning'
})
except (ValueError, TypeError):
pass
# Check code blocks for stale patterns
for lang, code, line_num in code_blocks:
keywords = FreshnessValidator.STALE_KEYWORDS.get(lang, set())
keywords |= FreshnessValidator.STALE_KEYWORDS['general']
for keyword in keywords:
if keyword in code:
issues.append({
'type': 'stale_code',
'message': f'Potentially outdated pattern: "{keyword}" in {lang} code',
'severity': 'warning',
'line': line_num
})
return issues
# --- Main Scanner ---
async def scan_posts(posts_dir: str, base_url: str = '') -> HealthReport:
"""Scan all markdown posts in directory and return health report."""
start_time = datetime.now()
posts_path = Path(posts_dir)
if not posts_path.exists():
print(f"Error: {posts_dir} does not exist", file=sys.stderr)
sys.exit(1)
md_files = sorted(posts_path.rglob('*.md'))
if not md_files:
print(f"No markdown files found in {posts_dir}", file=sys.stderr)
sys.exit(1)
print(f"Found {len(md_files)} markdown files to scan...")
link_validator = LinkValidator(concurrency=15, timeout=12)
all_posts = []
async with aiohttp.ClientSession() as session:
for md_file in md_files:
content = md_file.read_text(encoding='utf-8')
frontmatter = ContentExtractor.extract_frontmatter(content)
title = frontmatter.get('title', md_file.stem)
pub_date = frontmatter.get('date', frontmatter.get('published_at', ''))
post = PostHealth(
file_path=str(md_file),
title=title,
published_date=pub_date,
word_count=len(content.split()),
)
# --- Check links ---
links = ContentExtractor.extract_links(content)
post.links_checked = len(links)
link_tasks = []
for url, line_num, anchor in links:
if url.startswith(('http://', 'https://')):
link_tasks.append((url, line_num, anchor))
for url, line_num, anchor in link_tasks:
status, redirect = await link_validator.check_url(url, session)
if status == -1:
post.issues.append(Issue(
severity='critical',
category='broken_link',
file_path=str(md_file),
line_number=line_num,
description=f'Link unreachable: {url}',
url=url,
suggestion=f'Error: {redirect}',
))
elif status >= 400:
post.issues.append(Issue(
severity='critical' if status == 404 else 'warning',
category='broken_link',
file_path=str(md_file),
line_number=line_num,
description=f'Link returned HTTP {status}: {url}',
url=url,
auto_fixable=False,
))
elif redirect and redirect != url:
post.issues.append(Issue(
severity='info',
category='redirect',
file_path=str(md_file),
line_number=line_num,
description=f'Link redirects: {url}',
url=url,
suggestion=f'Update to: {redirect}',
auto_fixable=True,
))
# --- Check images ---
images = ContentExtractor.extract_images(content)
post.images_checked = len(images)
for img_url, line_num, alt_text in images:
if img_url.startswith(('http://', 'https://')):
status, _ = await link_validator.check_url(img_url, session)
if status == -1 or status >= 400:
post.issues.append(Issue(
severity='critical',
category='dead_image',
file_path=str(md_file),
line_number=line_num,
description=f'Image not loading (HTTP {status}): {img_url}',
url=img_url,
))
if not alt_text or alt_text.strip() == '':
post.issues.append(Issue(
severity='warning',
category='missing_alt',
file_path=str(md_file),
line_number=line_num,
description=f'Image missing alt text (hurts SEO & accessibility)',
url=img_url,
auto_fixable=False,
))
# --- Check freshness ---
code_blocks = ContentExtractor.extract_code_blocks(content)
freshness_issues = FreshnessValidator.check_freshness(
pub_date, code_blocks
)
for fi in freshness_issues:
post.issues.append(Issue(
severity=fi['severity'],
category=fi['type'],
file_path=str(md_file),
line_number=fi.get('line', 0),
description=fi['message'],
))
# --- Calculate health score ---
severity_weights = {'critical': 15, 'warning': 5, 'info': 1}
penalty = sum(severity_weights.get(i.severity, 0) for i in post.issues)
post.health_score = max(0, 100 - penalty)
all_posts.append(post)
# --- Build report ---
all_issues = [i for p in all_posts for i in p.issues]
duration = (datetime.now() - start_time).total_seconds()
report = HealthReport(
scan_date=datetime.now().isoformat(),
total_posts=len(all_posts),
total_issues=len(all_issues),
critical_count=sum(1 for i in all_issues if i.severity == 'critical'),
warning_count=sum(1 for i in all_issues if i.severity == 'warning'),
info_count=sum(1 for i in all_issues if i.severity == 'info'),
posts=all_posts,
scan_duration_seconds=round(duration, 2),
)
print(f"\nScan complete in {duration:.1f}s")
print(f" Posts scanned: {report.total_posts}")
print(f" Total issues: {report.total_issues}")
print(f" Critical: {report.critical_count}")
print(f" Warnings: {report.warning_count}")
print(f" Info: {report.info_count}")
print(f" Links checked: {link_validator.stats['checked']} "
f"(cached: {link_validator.stats['cached']})")
return report
def generate_markdown_report(report: HealthReport) -> str:
"""Generate a human-readable markdown report."""
lines = [
f"# Blog Health Report — {report.scan_date[:10]}",
"",
f"**Posts scanned:** {report.total_posts} ",
f"**Total issues:** {report.total_issues} ",
f"**Scan time:** {report.scan_duration_seconds}s ",
"",
"| Severity | Count |",
"|----------|-------|",
f"| 🔴 Critical | {report.critical_count} |",
f"| 🟡 Warning | {report.warning_count} |",
f"| 🔵 Info | {report.info_count} |",
"",
]
# Sort posts by health score (worst first)
sorted_posts = sorted(report.posts, key=lambda p: p.health_score)
for post in sorted_posts:
if not post.issues:
continue
emoji = "🔴" if post.health_score < 50 else "🟡" if post.health_score < 80 else "🟢"
lines.append(f"## {emoji} {post.title} (Score: {post.health_score:.0f}/100)")
lines.append("")
for issue in sorted(post.issues, key=lambda i:
{'critical': 0, 'warning': 1, 'info': 2}[i.severity]):
icon = {'critical': '🔴', 'warning': '🟡', 'info': '🔵'}[issue.severity]
fix = ' ✅ auto-fixable' if issue.auto_fixable else ''
lines.append(f"- {icon} **{issue.category}** (line {issue.line_number}): "
f"{issue.description}{fix}")
if issue.suggestion:
lines.append(f" - 💡 {issue.suggestion}")
lines.append("")
# Summary of healthy posts
healthy = [p for p in report.posts if not p.issues]
if healthy:
lines.append(f"## ✅ {len(healthy)} posts with no issues")
lines.append("")
return '\n'.join(lines)
# --- GitHub Actions Auto-Fixer ---
def generate_fix_commands(report: HealthReport) -> list[dict]:
"""Generate sed commands for auto-fixable issues."""
fixes = []
for post in report.posts:
for issue in post.issues:
if issue.auto_fixable and issue.url and issue.suggestion:
new_url = issue.suggestion.replace('Update to: ', '')
fixes.append({
'file': post.file_path,
'old_url': issue.url,
'new_url': new_url,
'command': f"sed -i 's|{issue.url}|{new_url}|g' {post.file_path}"
})
return fixes
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='Blog Content Health Monitor')
parser.add_argument('posts_dir', help='Directory containing markdown posts')
parser.add_argument('--output', '-o', help='Output report path', default='health-report.md')
parser.add_argument('--json', help='Output JSON report path')
parser.add_argument('--base-url', help='Base URL for relative links', default='')
parser.add_argument('--max-age', type=int, help='Max post age in days', default=180)
args = parser.parse_args()
report = asyncio.run(scan_posts(args.posts_dir, args.base_url))
# Write markdown report
md_report = generate_markdown_report(report)
Path(args.output).write_text(md_report, encoding='utf-8')
print(f"\nMarkdown report saved to {args.output}")
# Optionally write JSON report
if args.json:
json_data = asdict(report)
Path(args.json).write_text(
json.dumps(json_data, indent=2, default=str),
encoding='utf-8'
)
print(f"JSON report saved to {args.json}")
# Print auto-fix commands
fixes = generate_fix_commands(report)
if fixes:
print(f"\n{len(fixes)} auto-fixable issues found. Run these commands:")
for fix in fixes:
print(f" {fix['command']}")
sys.exit(1 if report.critical_count > 0 else 0)
Save this as content_health_monitor.py in your blog repo root.
GitHub Actions Workflow
Here's the workflow that runs this daily:
# .github/workflows/content-health.yml
name: Content Health Check
on:
schedule:
- cron: '0 6 * * *' # Daily at 6 AM UTC
workflow_dispatch: # Manual trigger
permissions:
contents: read
issues: write
pull-requests: write
jobs:
health-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: pip install aiohttp
- name: Run health check
id: health
continue-on-error: true
run: |
python content_health_monitor.py \
./_posts \
--output health-report.md \
--json health-report.json \
--max-age 180
echo "report<<EOF" >> $GITHUB_OUTPUT
cat health-report.md >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
- name: Create issue if critical issues found
if: steps.health.outcome == 'failure'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const report = fs.readFileSync('health-report.md', 'utf8');
const json = JSON.parse(fs.readFileSync('health-report.json', 'utf8'));
await github.rest.issues.create({
owner: context.repo.owner,
repo: context.repo.repo,
title: `🏥 Content Health Alert: ${json.critical_count} critical issues`,
body: report,
labels: ['content-health', 'automated']
});
- name: Upload report artifact
uses: actions/upload-artifact@v4
with:
name: health-report
path: |
health-report.md
health-report.json
My Real Results: 21 Days of Monitoring
I've been running this on my blog (83 posts, ~120K total words) since January 25, 2026. Here's what the data looks like:
Issue Breakdown
| Category | Count | % of Total |
|---|---|---|
| Broken links (404) | 12 | 25.5% |
| Dead images | 4 | 8.5% |
| Redirect chains | 18 | 38.3% |
| Missing alt text | 8 | 17.0% |
| Stale code patterns | 5 | 10.6% |
| Total | 47 | 100% |
Scan Performance
| Metric | Value |
|---|---|
| Avg scan time | 34.2s |
| Links checked per run | ~420 |
| Cache hit rate (day 2+) | 68% |
| False positive rate | 3.2% |
The false positives were mostly from servers that rate-limit HEAD requests. The retry-with-GET logic eliminated most of them, but a few CDNs still return 403 for automated requests. I added those to the skip list.
Impact on Blog Health
After fixing the critical issues in week 1:
| Metric | Before | After (Week 3) | Change |
|---|---|---|---|
| Avg bounce rate (old posts) | 72.4% | 61.8% | -14.6% |
| Avg time on page | 2:12 | 2:51 | +29.5% |
| Broken link clicks/day | ~8 | 0 | -100% |
| Overall health score | 71/100 | 94/100 | +32.4% |
The bounce rate drop was the most satisfying. Readers were literally leaving because they clicked a link and got a 404. Fix the links, keep the readers.
The Interesting Bugs I Found
1. The GitHub Archive Problem
Three of my posts linked to GitHub repos that had been archived (read-only). The repos still returned HTTP 200, so a simple status check wouldn't catch them. I added a specific check:
async def check_github_repo_status(url: str, session: aiohttp.ClientSession) -> Optional[str]:
"""Check if a GitHub repo is archived, moved, or gone."""
parsed = urlparse(url)
if parsed.hostname != 'github.com':
return None
parts = parsed.path.strip('/').split('/')
if len(parts) < 2:
return None
owner, repo = parts[0], parts[1]
api_url = f'https://api.github.com/repos/{owner}/{repo}'
try:
async with session.get(
api_url,
headers={
'Accept': 'application/vnd.github.v3+json',
'User-Agent': 'BlogHealthMonitor/1.0'
},
timeout=aiohttp.ClientTimeout(total=10)
) as resp:
if resp.status == 404:
return 'Repository deleted or private'
if resp.status == 200:
data = await resp.json()
if data.get('archived'):
return f'Repository archived on {data.get("updated_at", "unknown date")}'
if data.get('fork') and not data.get('parent'):
return 'Fork of deleted repository'
except Exception:
pass
return None
2. The Image CDN Migration
I found 4 broken images, all from the same cause: I'd switched from one image hosting service to another in 2025, but forgot to update old posts. The fix was a one-liner sed command that the auto-fixer generated:
# Auto-generated fix — update old CDN URLs
find ./_posts -name "*.md" -exec sed -i \
's|https://old-cdn.example.com/images/|https://new-cdn.example.com/blog/|g' {} +
3. The Python Version Drift
Five code blocks still referenced Python 3.9-era patterns. The most common: using asyncio.get_event_loop() instead of asyncio.run(). Not broken, but definitely outdated advice for a 2026 tutorial.
Advanced: Webhook Notifications
I pipe the daily report into a Discord webhook so I don't even have to check GitHub:
async def send_discord_webhook(report: HealthReport, webhook_url: str):
"""Send a summary to Discord when issues are found."""
if report.critical_count == 0 and report.warning_count == 0:
return # Don't spam on clean scans
embed = {
"title": f"🏥 Blog Health Report — {report.scan_date[:10]}",
"color": 0xFF0000 if report.critical_count > 0 else 0xFFAA00,
"fields": [
{"name": "Posts Scanned", "value": str(report.total_posts), "inline": True},
{"name": "🔴 Critical", "value": str(report.critical_count), "inline": True},
{"name": "🟡 Warning", "value": str(report.warning_count), "inline": True},
{"name": "Scan Time", "value": f"{report.scan_duration_seconds}s", "inline": True},
],
"footer": {"text": "Built by Jackson Studio"}
}
# Add worst posts
worst = sorted(report.posts, key=lambda p: p.health_score)[:3]
if worst and worst[0].issues:
worst_text = '\n'.join(
f"• **{p.title}** — Score: {p.health_score:.0f}/100"
for p in worst if p.issues
)
embed["fields"].append({
"name": "Worst Posts",
"value": worst_text[:1024],
"inline": False
})
payload = {"embeds": [embed]}
async with aiohttp.ClientSession() as session:
async with session.post(webhook_url, json=payload) as resp:
if resp.status not in (200, 204):
print(f"Webhook failed: {resp.status}", file=sys.stderr)
Cost: $0
The entire system runs on GitHub Actions free tier. My blog has 83 posts with ~420 links. Each scan takes ~34 seconds and uses about 0.6 minutes of Actions compute. The free tier gives you 2,000 minutes/month. At one scan per day, that's 18 minutes/month — less than 1% of the free allocation.
If you have a larger blog (500+ posts), you might want to implement incremental scanning — only check posts modified in the last N days, plus a rotating subset of older posts.
What I'd Do Differently
1. Start earlier. I ran this blog for 6 months before building the monitor. That's 6 months of link rot accumulating silently. If I'd started from day one, I'd have caught each issue as it appeared instead of fixing 47 at once.
2. Add content quality checks. The current version only checks structural health (links, images, freshness). I'm planning to add readability scoring, keyword density analysis, and internal linking suggestions. That's the next post in this series.
3. Test with multiple user agents. Some CDNs serve different content (or errors) based on the user agent. My initial version only used one UA string and missed some issues.
Try It Yourself
The complete code is above — copy it, drop it in your repo, and run:
pip install aiohttp
python content_health_monitor.py ./your-posts-directory -o report.md --json report.json
You'll probably be surprised by what it finds. I was.
📦 Want the Pro version? I'm packaging an extended version with incremental scanning, Slack/Teams integration, auto-PR creation, and a web dashboard. Grab it on Gumroad →
Next in the Blog Ops series: "I Added Content Quality Scoring to My Health Monitor — Here's How Readability Affects Bounce Rate"
Built by Jackson Studio 📝
What's the worst content rot you've found on your blog? Drop a comment — I'm curious if anyone's found something more embarrassing than my 404'd images.
Top comments (0)