If you're sourcing products from China, you know the pain: thousands of suppliers, inconsistent quality, and no easy way to filter the good from the bad. Manual screening takes hours. What if you could automate it?
In this guide, I'll show you how to build an automated supplier screening pipeline using data scraped from Made-in-China.com. We'll extract supplier data, score them based on key metrics, and generate a shortlist — all with code.
Why Automate Supplier Screening?
B2B sourcing platforms like Made-in-China.com list hundreds of thousands of suppliers. Manually checking each one is impractical when you need to:
- Compare 50+ suppliers for a single product category
- Verify business credentials (audited, verified, years in business)
- Filter by minimum order quantities and pricing tiers
- Track supplier changes over time
Automation turns a week of research into minutes.
Step 1: Collect Supplier Data
First, we need structured data. The Made-in-China Scraper on Apify extracts product and supplier information from search results.
Here's how to run it programmatically:
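```javascript
// Save as an ES module (e.g. collect.mjs) so top-level await works
import { ApifyClient } from 'apify-client';
import { writeFile } from 'node:fs/promises';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const input = {
  keywords: ["industrial bluetooth speaker"],
  maxItems: 100,
};

// Start the Actor and wait for the run to finish
const run = await client
  .actor('jungle_intertwining/made-in-china-scraper')
  .call(input);

// Fetch the results from the run's default dataset
const { items } = await client
  .dataset(run.defaultDatasetId)
  .listItems();

console.log(`Collected ${items.length} products from suppliers`);

// Save the results for the Python scoring script in Steps 3-4
await writeFile('scraped_data.json', JSON.stringify(items, null, 2));
```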
Each result includes supplier name, location, business type, price, MOQ, and product details.
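Before scoring, it can help to check which of the fields the scoring engine reads are actually populated, because a missing field silently costs a supplier points. A minimal sketch, assuming the records were saved as a JSON list and use the field names the Step 3 script reads (`supplierName`, `supplierLocation`, `businessType`, `price`, `moq`, `auditReportUrl`); `field_coverage` is a hypothetical helper, not part of the scraper:

```python
# Fields read by the scoring engine in Step 3 (names taken from that
# script, assumed rather than checked against the scraper's docs)
SCORED_FIELDS = [
    "supplierName", "supplierLocation", "businessType",
    "price", "moq", "auditReportUrl",
]


def field_coverage(items, fields=SCORED_FIELDS):
    """Return the share of records with a non-empty value for each field."""
    total = len(items)
    coverage = {}
    for field in fields:
        filled = sum(1 for item in items if item.get(field))
        coverage[field] = filled / total if total else 0.0
    return coverage


if __name__ == "__main__":
    sample = [
        {"supplierName": "Example Co.", "price": "US$16.00", "moq": "2 Pieces"},
        {"supplierName": "Other Co.", "price": "",
         "auditReportUrl": "https://example.com/report"},
    ]
    for field, share in field_coverage(sample).items():
        print(f"{field:18} {share:.0%}")
```

On real data, load `scraped_data.json` with `json.load` and pass the list in; a field at 0% may mean the key name differs from what the scorer expects.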
Step 2: Define Your Screening Criteria
Not all suppliers are equal. Define scoring criteria based on what matters for your business:
```python
SCREENING_CRITERIA = {
    "has_audit_report": {
        "weight": 30,
        "description": "Supplier has third-party audit report (SGS, TUV, etc.)"
    },
    "business_type": {
        "weight": 20,
        "preferred": ["Manufacturer", "Manufacturer/Factory"],
        "description": "Direct manufacturers score higher than trading companies"
    },
    "location_match": {
        "weight": 15,
        "preferred_provinces": [
            "Guangdong", "Zhejiang", "Jiangsu", "Fujian"
        ],
        "description": "Major manufacturing hubs with better logistics"
    },
    "price_competitiveness": {
        "weight": 20,
        "description": "Price relative to category average"
    },
    "moq_flexibility": {
        "weight": 15,
        "max_acceptable": 500,
        "description": "Lower MOQ = more flexibility for testing"
    }
}
```
Step 3: Build the Scoring Engine
Here's a Python script that takes raw scraped data and outputs scored, ranked suppliers:
```python
import json
import re
from collections import defaultdict


def parse_price(price_str):
    """Extract a numeric price from strings like 'US$16.00'.

    For ranges such as 'US$12.00-15.00', the first (lower) value is used.
    """
    if not price_str:
        return None
    match = re.search(r'\d+(?:\.\d+)?', str(price_str).replace(',', ''))
    return float(match.group()) if match else None


def parse_moq(moq_str):
    """Extract a numeric MOQ from strings like '2 Pieces' or '1,000 Pieces'."""
    if not moq_str:
        return None
    match = re.search(r'\d+', str(moq_str).replace(',', ''))
    return int(match.group()) if match else None


def get_grade(score):
    """Map a 0-100 score to a letter grade."""
    if score >= 85:
        return 'A'
    if score >= 70:
        return 'B'
    if score >= 50:
        return 'C'
    return 'D'


def score_supplier(product, category_avg_price, criteria):
    """Score one scraped product/supplier record against the criteria."""
    score = 0
    details = {}

    # 1. Audit report check
    if product.get('auditReportUrl'):
        score += criteria['has_audit_report']['weight']
        details['audit'] = 'PASS - Has audit report'
    else:
        details['audit'] = 'FAIL - No audit report'

    # 2. Business type scoring (manufacturers get full points)
    biz_type = product.get('businessType') or ''
    if any(pref.lower() in biz_type.lower()
           for pref in criteria['business_type']['preferred']):
        score += criteria['business_type']['weight']
        details['business_type'] = f'PASS - {biz_type}'
    else:
        details['business_type'] = f'PARTIAL - {biz_type}'

    # 3. Location scoring (half points outside the preferred provinces)
    location = product.get('supplierLocation') or ''
    if any(prov in location
           for prov in criteria['location_match']['preferred_provinces']):
        score += criteria['location_match']['weight']
        details['location'] = f'PASS - {location}'
    else:
        score += criteria['location_match']['weight'] * 0.5
        details['location'] = f'PARTIAL - {location}'

    # 4. Price competitiveness relative to the category average
    price = parse_price(product.get('price'))
    if price and category_avg_price:
        ratio = price / category_avg_price
        if ratio <= 0.9:
            score += criteria['price_competitiveness']['weight']
            label = 'EXCELLENT'
        elif ratio <= 1.1:
            score += criteria['price_competitiveness']['weight'] * 0.7
            label = 'GOOD'
        else:
            label = 'HIGH'
        details['price'] = (
            f'{label} - ${price:.2f} (avg: ${category_avg_price:.2f})'
        )
    else:
        details['price'] = 'N/A - Price not available'

    # 5. MOQ flexibility: full points up to max_acceptable, then partial
    #    credit that falls linearly to zero at twice the limit
    moq = parse_moq(product.get('moq'))
    max_moq = criteria['moq_flexibility']['max_acceptable']
    if moq is not None and moq <= max_moq:
        score += criteria['moq_flexibility']['weight']
        details['moq'] = f'PASS - {moq} units'
    elif moq is not None:
        ratio = max(0, 1 - (moq - max_moq) / max_moq)
        score += criteria['moq_flexibility']['weight'] * ratio
        details['moq'] = (
            f'PARTIAL - {moq} units (max preferred: {max_moq})'
        )
    else:
        details['moq'] = 'N/A - MOQ not specified'

    return {
        'supplier': product.get('supplierName', 'Unknown'),
        'product': product.get('title', ''),
        'score': round(score, 1),
        'max_score': 100,
        'grade': get_grade(score),
        'details': details,
        'url': product.get('productUrl', '')
    }


def screen_suppliers(products, criteria):
    """Score and rank all suppliers, best first."""
    prices = [parse_price(p.get('price')) for p in products]
    prices = [p for p in prices if p is not None]
    avg_price = sum(prices) / len(prices) if prices else None

    results = [score_supplier(p, avg_price, criteria) for p in products]
    results.sort(key=lambda x: x['score'], reverse=True)
    return results
```
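The two sliding-scale rules are the easiest part of the scorer to get wrong, so it can help to check them in isolation. The sketch below re-implements them as hypothetical standalone helpers, `price_points` and `moq_points`, using the Step 2 weights (20 and 15) and the 500-unit MOQ limit. The MOQ partial-credit formula is assumed to be `max(0, 1 - (moq - max_moq) / max_moq)`, i.e. credit falls linearly to zero at twice the limit:

```python
# Price tiers from Step 3: full weight at <= 90% of the category average,
# 70% of the weight up to 110% of it, nothing above that.
def price_points(price, avg_price, weight=20):
    ratio = price / avg_price
    if ratio <= 0.9:
        return weight
    if ratio <= 1.1:
        return weight * 0.7
    return 0


# MOQ rule (assumed partial-credit formula): full weight up to max_moq,
# then a linear decline that reaches zero at twice max_moq.
def moq_points(moq, max_moq=500, weight=15):
    if moq <= max_moq:
        return weight
    return weight * max(0, 1 - (moq - max_moq) / max_moq)


print(price_points(12.50, 18.73))  # 20: the ratio is about 0.67
print(price_points(19.00, 18.73))  # 70% of 20: the ratio is about 1.01
print(moq_points(750))             # 7.5: 250 units over the limit
print(moq_points(1200))            # 0: more than twice the limit
```

An MOQ of 750 units therefore earns half of the 15 MOQ points, and anything at or beyond 1,000 units earns none.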
Step 4: Generate the Shortlist Report
```python
def generate_report(results, top_n=10):
    print(f"\n{'=' * 60}")
    print("  SUPPLIER SCREENING REPORT")
    print(f"  Total suppliers analyzed: {len(results)}")
    print(f"  Showing top {top_n}")
    print(f"{'=' * 60}\n")

    for i, r in enumerate(results[:top_n], 1):
        print(f"#{i} [{r['grade']}] {r['supplier']}")
        print(f"    Score: {r['score']}/{r['max_score']}")
        print(f"    Product: {r['product'][:60]}...")
        for key, val in r['details'].items():
            print(f"      {key}: {val}")
        print()

    # Summary stats: count suppliers per grade
    grades = defaultdict(int)
    for r in results:
        grades[r['grade']] += 1

    print("\nGrade Distribution:")
    for grade in ['A', 'B', 'C', 'D']:
        count = grades.get(grade, 0)
        bar = '#' * count
        print(f"  {grade}: {bar} ({count})")


# Run the pipeline
with open('scraped_data.json') as f:
    products = json.load(f)

results = screen_suppliers(products, SCREENING_CRITERIA)
generate_report(results, top_n=10)

# Export the shortlist (grades A and B) to JSON
shortlist = [r for r in results if r['grade'] in ('A', 'B')]
with open('supplier_shortlist.json', 'w') as f:
    json.dump(shortlist, f, indent=2)

print(f"\nExported {len(shortlist)} qualified suppliers to shortlist")
```
Sample output:
```
============================================================
  SUPPLIER SCREENING REPORT
  Total suppliers analyzed: 87
  Showing top 10
============================================================

#1 [A] Shenzhen Topway Technology Co., Ltd.
    Score: 92.5/100
    Product: Professional Wireless Bluetooth Speaker 40W...
      audit: PASS - Has audit report
      business_type: PASS - Manufacturer/Factory
      location: PASS - Guangdong, China
      price: EXCELLENT - $12.50 (avg: $18.73)
      moq: PASS - 100 units

#2 [A] Dongguan Shengyuan Electronics Co., Ltd.
    Score: 87.0/100
    Product: Portable Bluetooth 5.3 Speaker Waterproof...
      audit: PASS - Has audit report
      business_type: PASS - Manufacturer
      location: PASS - Guangdong, China
      price: GOOD - $16.80 (avg: $18.73)
      moq: PASS - 200 units
```
Step 5: Schedule Regular Screening Runs
Supplier landscapes change. Set up a recurring pipeline:
```javascript
// Schedule weekly screening with Apify (reuses `client` from Step 1)
const schedule = await client.schedules().create({
  name: 'weekly-supplier-screening',
  isEnabled: true,
  cronExpression: '0 9 * * 1', // Every Monday at 9:00 AM
  timezone: 'UTC',             // Timezone the cron expression runs in
  actions: [{
    type: 'RUN_ACTOR',
    actorId: 'jungle_intertwining/made-in-china-scraper',
    runInput: {
      body: JSON.stringify({
        keywords: [
          "bluetooth speaker",
          "LED light strip",
          "phone case"
        ],
        maxItems: 200
      }),
      contentType: 'application/json'
    }
  }]
});
```
Then pipe the output through your scoring engine to get a fresh shortlist every week.
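To make the weekly runs actionable, one option is to compare each new shortlist with the previous one. A minimal sketch, assuming both shortlists are lists of result dicts as produced by `screen_suppliers`, and treating the supplier name as the matching key (a simplification: the same supplier can appear under slightly different names). `diff_shortlists` is a hypothetical helper:

```python
def diff_shortlists(previous, current):
    """Compare two shortlists keyed by supplier name.

    Returns suppliers that are new this run, suppliers that dropped out,
    and suppliers whose grade changed as {name: (old_grade, new_grade)}.
    If a supplier appears more than once in a list, the last entry wins.
    """
    old = {r["supplier"]: r for r in previous}
    new = {r["supplier"]: r for r in current}

    added = sorted(new.keys() - old.keys())
    dropped = sorted(old.keys() - new.keys())
    regraded = {
        name: (old[name]["grade"], new[name]["grade"])
        for name in old.keys() & new.keys()
        if old[name]["grade"] != new[name]["grade"]
    }
    return added, dropped, regraded


if __name__ == "__main__":
    last_week = [
        {"supplier": "Supplier A", "grade": "A"},
        {"supplier": "Supplier B", "grade": "B"},
    ]
    this_week = [
        {"supplier": "Supplier A", "grade": "B"},
        {"supplier": "Supplier C", "grade": "A"},
    ]
    added, dropped, regraded = diff_shortlists(last_week, this_week)
    print("New:", added)          # ['Supplier C']
    print("Dropped:", dropped)    # ['Supplier B']
    print("Regraded:", regraded)  # {'Supplier A': ('A', 'B')}
```

In practice, `previous` would be last week's `supplier_shortlist.json` loaded before overwriting it.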
Practical Tips
Weight criteria for your business — if you're a small buyer, MOQ flexibility matters more than audit reports. Adjust weights accordingly.
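One way to keep several weightings manageable is to derive each profile from the base criteria instead of editing the dict in place. A sketch, assuming the `SCREENING_CRITERIA` structure from Step 2 (a trimmed copy is inlined so the snippet runs on its own); `with_weights` is a hypothetical helper and the small-buyer numbers are illustrative only:

```python
import copy

# Trimmed copy of the Step 2 criteria so this runs standalone
BASE_CRITERIA = {
    "has_audit_report": {"weight": 30},
    "business_type": {"weight": 20},
    "location_match": {"weight": 15},
    "price_competitiveness": {"weight": 20},
    "moq_flexibility": {"weight": 15, "max_acceptable": 500},
}


def with_weights(criteria, **overrides):
    """Return a deep copy of `criteria` with some weights replaced.

    Raises ValueError if the weights no longer sum to 100, because the
    scoring engine reports scores out of 100 and grades on fixed cutoffs.
    """
    custom = copy.deepcopy(criteria)
    for name, weight in overrides.items():
        if name not in custom:
            raise KeyError(f"Unknown criterion: {name}")
        custom[name]["weight"] = weight
    total = sum(c["weight"] for c in custom.values())
    if total != 100:
        raise ValueError(f"Weights sum to {total}, expected 100")
    return custom


# Illustrative small-buyer profile: MOQ matters more, audits less
SMALL_BUYER = with_weights(
    BASE_CRITERIA, has_audit_report=15, moq_flexibility=30
)
print({name: c["weight"] for name, c in SMALL_BUYER.items()})
```

Because the copy is deep, the base profile keeps its original weights and can still be used for other buyers.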
Track scores over time — a supplier whose score drops from A to C might be having quality issues. Store historical scores in a database.
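For storing historical scores, Python's built-in `sqlite3` module is enough to start. A minimal sketch, assuming the result dicts from `screen_suppliers`; the table layout and the helpers `init_db`, `save_run`, and `grade_history` are hypothetical:

```python
import sqlite3
from datetime import date


def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS supplier_scores (
               run_date TEXT,
               supplier TEXT,
               product  TEXT,
               score    REAL,
               grade    TEXT
           )"""
    )


def save_run(conn, results, run_date=None):
    """Append one screening run's results, stamped with the run date."""
    run_date = run_date or date.today().isoformat()
    conn.executemany(
        "INSERT INTO supplier_scores VALUES (?, ?, ?, ?, ?)",
        [(run_date, r["supplier"], r["product"], r["score"], r["grade"])
         for r in results],
    )
    conn.commit()


def grade_history(conn, supplier):
    """Return (run_date, best grade that run) pairs for one supplier.

    MIN() works as "best" because 'A' sorts before 'B', 'C' and 'D'.
    """
    rows = conn.execute(
        """SELECT run_date, MIN(grade) FROM supplier_scores
           WHERE supplier = ? GROUP BY run_date ORDER BY run_date""",
        (supplier,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep history
    init_db(conn)
    save_run(conn, [{"supplier": "Supplier A", "product": "Speaker",
                     "score": 88.0, "grade": "A"}], run_date="2024-01-01")
    save_run(conn, [{"supplier": "Supplier A", "product": "Speaker",
                     "score": 62.5, "grade": "C"}], run_date="2024-01-08")
    print(grade_history(conn, "Supplier A"))
    # [('2024-01-01', 'A'), ('2024-01-08', 'C')]
```

A drop like the A-to-C change in this example is the kind of signal worth flagging for a closer look.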
Combine platforms — use the Yiwugo Scraper and DHgate Scraper to cross-reference suppliers across platforms. A supplier present on multiple platforms with consistent pricing is generally more reliable.
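Cross-referencing needs some way to decide that two listings come from the same supplier. A rough sketch, assuming each platform's results have already been reduced to dicts with a supplier name and a positive numeric price for the same or a comparable product (the keys `supplier` and `price` are assumptions, not the scrapers' actual output schemas). Names are matched after lowercasing and stripping common company suffixes, which is a crude heuristic; `cross_reference` is a hypothetical helper:

```python
import re
from statistics import mean

# Common company suffixes dropped before matching (crude heuristic)
SUFFIXES = r"\b(co|ltd|limited|inc|company|corp|factory|trading)\b"


def normalize_name(name):
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    name = re.sub(SUFFIXES, " ", name)
    return " ".join(name.split())


def cross_reference(platforms, max_spread=0.25):
    """Find suppliers listed on 2+ platforms and check price consistency.

    `platforms` maps platform name -> list of {"supplier", "price"} dicts.
    A supplier counts as consistent when (max - min) / mean of its prices
    is at most `max_spread`.
    """
    seen = {}
    for platform, listings in platforms.items():
        for item in listings:
            key = normalize_name(item["supplier"])
            seen.setdefault(key, []).append((platform, item["price"]))

    matches = {}
    for key, entries in seen.items():
        found_on = sorted({platform for platform, _ in entries})
        if len(found_on) < 2:
            continue
        prices = [price for _, price in entries]
        spread = (max(prices) - min(prices)) / mean(prices)
        matches[key] = {
            "platforms": found_on,
            "price_spread": round(spread, 3),
            "consistent": spread <= max_spread,
        }
    return matches


if __name__ == "__main__":
    data = {
        "made-in-china": [{"supplier": "Shenzhen Example Co., Ltd.", "price": 12.5}],
        "dhgate": [{"supplier": "Shenzhen Example Co Ltd", "price": 14.0}],
        "yiwugo": [{"supplier": "Another Trading Co.", "price": 3.0}],
    }
    print(cross_reference(data))
```

The 25% spread threshold is arbitrary; retail-oriented and wholesale platforms may price the same item quite differently, so tune it per category.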
Don't ignore "D" suppliers entirely — some may be new businesses with competitive pricing but no audit history yet. Flag them for manual review.
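A simple way to act on this is to route D-grade results with strong pricing into a manual-review list instead of discarding them. A sketch, assuming the result dicts from Step 3, where the price detail string starts with `EXCELLENT` or `GOOD` (checking a string prefix is a shortcut; storing the numeric price ratio in each result would be more robust). `review_queue` is a hypothetical helper:

```python
def review_queue(results):
    """Return D-grade results whose price detail is EXCELLENT or GOOD,
    along with the criteria they failed, for manual review."""
    queue = []
    for r in results:
        if r["grade"] != "D":
            continue
        price_note = r["details"].get("price", "")
        if price_note.startswith(("EXCELLENT", "GOOD")):
            failed = [k for k, v in r["details"].items()
                      if v.startswith("FAIL")]
            queue.append({"supplier": r["supplier"], "score": r["score"],
                          "price": price_note, "failed": failed})
    return queue


if __name__ == "__main__":
    results = [
        {"supplier": "New Factory Ltd.", "score": 45.0, "grade": "D",
         "details": {"audit": "FAIL - No audit report",
                     "price": "EXCELLENT - $9.80 (avg: $18.73)"}},
        {"supplier": "Pricey Co.", "score": 30.0, "grade": "D",
         "details": {"audit": "FAIL - No audit report",
                     "price": "HIGH - $25.00 (avg: $18.73)"}},
    ]
    for item in review_queue(results):
        print(item)
```

Only the first example supplier lands in the queue, with `audit` listed as the criterion it failed.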
What's Next?
This pipeline gives you a data-driven starting point for supplier evaluation. From here you could:
- Add sentiment analysis of supplier reviews
- Build a web dashboard for your procurement team
- Set up alerts when new high-scoring suppliers appear
- Cross-reference with trade compliance databases
The key insight: structured data + automated scoring = faster, better sourcing decisions.
Tools used in this guide:
- Made-in-China Scraper — Extract B2B product and supplier data
- Yiwugo Scraper — Wholesale market data from Yiwu
- DHgate Scraper — Cross-border wholesale product data
China Wholesale Scraper Toolkit — All tools in one place