The easiest scraper bug to miss is not a crash.
It is a clean result that is wrong.
The request returns 200 OK.
The selector works.
The dataset has rows.
The dashboard looks alive.
And still, the output can lie.
That happens often on marketplaces because a marketplace is not just a list of product cards. It is a state machine: between two scrapes, listings get repriced, sell, expire, or are removed.
Here is the checklist I now use before trusting marketplace scraping data.
1. Separate live records from sold records
Active listings and sold listings answer different questions.
Active listings show supply.
Sold listings show demand.
If both go into one dataset without an explicit recordType, supply and demand get mixed, and every later count or average becomes unreliable.
{
"recordType": "listing",
"isSold": false
}
{
"recordType": "sold_item",
"isSold": true
}
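A minimal sketch of how that separation could be enforced downstream, in Python. The function name and the "unknown" bucket are assumptions for illustration; only recordType, isSold, and price come from the examples in this post.

```python
from collections import defaultdict


def split_by_record_type(records):
    """Group records by recordType so supply and demand stay apart."""
    groups = defaultdict(list)
    for record in records:
        # Records with no recordType land in an "unknown" bucket for inspection
        # (the bucket name is an assumption, not part of the schema above).
        groups[record.get("recordType", "unknown")].append(record)
    return dict(groups)


records = [
    {"recordType": "listing", "isSold": False, "price": 1200},
    {"recordType": "sold_item", "isSold": True, "price": 1000},
    {"isSold": False, "price": 900},  # missing recordType on purpose
]
groups = split_by_record_type(records)
print(sorted(groups))
```

Anything that ends up in the "unknown" bucket is worth looking at before it reaches an analysis.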
2. Treat disappearance as an event
If a listing disappears, do not silently delete it from your mental model.
Emit a tracking record.
{
"recordType": "tracking_event",
"event": "previously_seen_listing_missing",
"listingId": "123"
}
You may not know whether it sold, expired, or was removed.
But you do know something changed.
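One way to sketch this is a set difference between the listing IDs seen in the previous run and the IDs seen in the current run. The function name and the sample IDs other than "123" are hypothetical.

```python
def missing_listing_events(previous_ids, current_ids):
    """Return one tracking event per listing seen before but absent now."""
    missing = sorted(set(previous_ids) - set(current_ids))
    return [
        {
            "recordType": "tracking_event",
            "event": "previously_seen_listing_missing",
            "listingId": listing_id,
        }
        for listing_id in missing
    ]


# "123" was present in the previous run and is gone in the current one.
events = missing_listing_events(["123", "456", "789"], ["456", "789", "999"])
print(events)
```

The event deliberately says nothing about why the listing is gone. It only records that it was seen before and is not seen now.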
3. Store page country and seller country separately
This one sounds minor until it breaks your analysis.
country can mean the market page you searched.
sellerCountry can mean the seller location exposed by the listing.
Those are not the same thing.
{
"country": "DE",
"sellerCountry": "FR"
}
If you merge them, your cross-market analysis becomes noise.
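As a rough illustration, keeping both fields lets you count (page country, seller country) pairs directly. The helper name is an assumption, and leaving a missing sellerCountry as None instead of falling back to country is a design choice of this sketch.

```python
from collections import Counter


def count_market_seller_pairs(records):
    """Count (page country, seller country) pairs without merging the fields."""
    pairs = Counter()
    for record in records:
        # A missing sellerCountry stays None; it is not replaced by the page country.
        pairs[(record["country"], record.get("sellerCountry"))] += 1
    return pairs


records = [
    {"country": "DE", "sellerCountry": "FR"},
    {"country": "DE", "sellerCountry": "DE"},
    {"country": "DE"},  # seller location not exposed by the listing
]
pairs = count_market_seller_pairs(records)
print(pairs)
```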
4. Add precision filters after broad search
Marketplace search is optimized for discovery, not clean datasets.
A broad query often returns adjacent products that only partly match the search terms.
So I like a second layer:
{
"searchTerms": ["classic flap"],
"requiredKeywords": ["classic", "flap"]
}
This does not make search perfect.
It makes the dataset easier to inspect.
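A possible shape for that second layer, assuming each record carries a title field (the field name and both function names are hypothetical). It keeps a record only when every required keyword appears as a whole word in the title.

```python
import re


def matches_required_keywords(title, required_keywords):
    """True if every required keyword appears as a whole word in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return all(keyword.lower() in words for keyword in required_keywords)


def precision_filter(records, required_keywords):
    """Keep only records whose title contains all required keywords."""
    return [
        record
        for record in records
        if matches_required_keywords(record.get("title", ""), required_keywords)
    ]


records = [
    {"title": "Classic Flap Bag, medium"},
    {"title": "Classic tote"},
    {"title": "Flap wallet, classic quilted"},
]
kept = precision_filter(records, ["classic", "flap"])
print([record["title"] for record in kept])
```

The wallet passes the filter even though it is not a bag, which is the point made above: the filter narrows the dataset, it does not make it correct.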
5. Make condition a first-class field
Price without condition is incomplete.
For resale marketplaces, condition changes the meaning of every price.
{
"price": 1000,
"condition": "Very good condition",
"conditionSource": "marketplace"
}
If condition is hidden in a description string, you will forget to use it.
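Once condition is its own field, grouping prices by it takes a few lines. This is only a sketch: the function name is an assumption, and "Fair condition" is a made-up label used to have a second group.

```python
from collections import defaultdict
from statistics import median


def median_price_by_condition(records):
    """Compute the median price per condition label."""
    prices = defaultdict(list)
    for record in records:
        # Records without a condition are grouped under "unknown" (an assumption).
        prices[record.get("condition", "unknown")].append(record["price"])
    return {condition: median(values) for condition, values in prices.items()}


records = [
    {"price": 1000, "condition": "Very good condition"},
    {"price": 1200, "condition": "Very good condition"},
    {"price": 600, "condition": "Fair condition"},
]
medians = median_price_by_condition(records)
print(medians)
```

A single median over all three records would be 1000, which describes neither group well.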
6. Track price history, not only price
One scrape tells you what existed.
Repeated scrapes tell you what changed.
{
"priceHistory": [
{ "price": 1400, "observedAt": "2026-06-01T12:00:00.000Z" },
{ "price": 1200, "observedAt": "2026-06-08T12:00:00.000Z" }
]
}
For monitoring, the delta is usually more useful than the snapshot.
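A small sketch of how the history and the delta could be maintained between runs. Both function names are hypothetical, and appending only when the price changes is a choice made here to keep the history short, not something the format above requires.

```python
def record_price(price_history, price, observed_at):
    """Append an observation only when the price differs from the last one."""
    if not price_history or price_history[-1]["price"] != price:
        price_history.append({"price": price, "observedAt": observed_at})
    return price_history


def latest_delta(price_history):
    """Difference between the two most recent observations, or None."""
    if len(price_history) < 2:
        return None
    return price_history[-1]["price"] - price_history[-2]["price"]


history = []
record_price(history, 1400, "2026-06-01T12:00:00.000Z")
record_price(history, 1200, "2026-06-08T12:00:00.000Z")
record_price(history, 1200, "2026-06-15T12:00:00.000Z")  # unchanged, not appended
print(latest_delta(history))  # -200
```

If you also need to know when a listing was last seen at an unchanged price, store that timestamp separately instead of dropping the observation.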
7. Use risk signals as review queues
Do not make accusations from scraper output.
If listings look suspiciously similar, surface them as review candidates.
{
"recordType": "risk_signal",
"signalType": "similar_listing_cluster",
"confidence": "review_required"
}
That gives a human a useful queue without pretending the scraper has certainty.
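To make that concrete, here is a deliberately simple sketch that compares listing titles pairwise with Python's difflib and emits a risk_signal record for each near-duplicate pair. The title and listingIds fields, the function name, and the 0.9 threshold are all assumptions, and pairwise comparison stands in for real clustering.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_listing_signals(listings, threshold=0.9):
    """Emit a review-queue record for each pair of near-identical titles."""
    signals = []
    for a, b in combinations(listings, 2):
        ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
        if ratio >= threshold:
            signals.append(
                {
                    "recordType": "risk_signal",
                    "signalType": "similar_listing_cluster",
                    # Never a verdict: the pair only goes to a human reviewer.
                    "confidence": "review_required",
                    "listingIds": [a["listingId"], b["listingId"]],
                }
            )
    return signals


listings = [
    {"listingId": "1", "title": "Classic flap bag black"},
    {"listingId": "2", "title": "Classic Flap Bag Black"},
    {"listingId": "3", "title": "Leather belt"},
]
signals = similar_listing_signals(listings)
print(signals)
```

Comparing every pair is quadratic in the number of listings, so this only works on small batches. It is meant to show the shape of the output, not a production detector.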
The real test
I do not trust a marketplace scraper because it extracts rows.
I trust it when it preserves enough context to answer:
- what exists now?
- what sold?
- what disappeared?
- where was it found?
- where is the seller?
- did the price change?
- what needs manual review?
That is the difference between scraping a page and modeling a marketplace.
What check would you add before trusting marketplace data in production?