wfgsss

Posted on Feb 15

How to Automate B2B Supplier Screening with Made-in-China.com Data

#webscraping #ecommerce #python #b2b

If you're sourcing products from China, you know the pain: thousands of suppliers, inconsistent quality, and no easy way to filter the good from the bad. Manual screening takes hours. What if you could automate it?

In this guide, I'll show you how to build an automated supplier screening pipeline using data scraped from Made-in-China.com. We'll extract supplier data, score them based on key metrics, and generate a shortlist — all with code.

Why Automate Supplier Screening?

B2B sourcing platforms like Made-in-China.com list hundreds of thousands of suppliers. Manually checking each one is impractical when you need to:

Compare 50+ suppliers for a single product category
Verify business credentials (audited, verified, years in business)
Filter by minimum order quantities and pricing tiers
Track supplier changes over time

Automation turns a week of research into minutes.

Step 1: Collect Supplier Data

First, we need structured data. The Made-in-China Scraper on Apify extracts product and supplier information from search results.

Here's how to run it programmatically:

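const fs = require('fs');
const { ApifyClient } = require('apify-client');

// Replace the placeholder with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const input = {
    keywords: ["industrial bluetooth speaker"],
    maxItems: 100,
};

// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
    // Start the actor and wait for the run to finish.
    const run = await client
        .actor('jungle_intertwining/made-in-china-scraper')
        .call(input);

    // Read the scraped items from the run's default dataset.
    const { items } = await client
        .dataset(run.defaultDatasetId)
        .listItems();

    console.log(`Collected ${items.length} products from suppliers`);

    // Save the items so the Python scoring script can load scraped_data.json.
    fs.writeFileSync('scraped_data.json', JSON.stringify(items, null, 2));
}

main().catch(console.error);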

Each result includes supplier name, location, business type, price, MOQ, and product details.
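
Before scoring, it can help to check how many items actually carry each field. The sketch below assumes the key names that the scoring functions later in this guide read from each item; field_coverage is a hypothetical helper, and the sample records are made up for illustration rather than real scraper output.

```python
from collections import Counter

# Keys read by the scoring functions later in this guide. Whether every
# item really carries all of them is an assumption to verify on your data.
EXPECTED_FIELDS = (
    "supplierName", "supplierLocation", "businessType",
    "price", "moq", "title", "productUrl", "auditReportUrl",
)


def field_coverage(items):
    """Return the share of items (0.0 to 1.0) with a non-empty value per field."""
    if not items:
        return {field: 0.0 for field in EXPECTED_FIELDS}
    filled = Counter()
    for item in items:
        for field in EXPECTED_FIELDS:
            if item.get(field):
                filled[field] += 1
    return {field: filled[field] / len(items) for field in EXPECTED_FIELDS}


# Illustrative records only, not real scraper output.
sample_items = [
    {"supplierName": "Example Audio Co., Ltd.", "price": "US$16.00",
     "moq": "2 Pieces", "businessType": "Manufacturer/Factory"},
    {"supplierName": "Example Trading Co.", "price": "", "moq": "500 Pieces"},
]

coverage = field_coverage(sample_items)
for field, share in coverage.items():
    print(f"{field}: {share:.0%}")
```

A field with low coverage contributes little to the ranking, so it may deserve a lower weight in the next step.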

Step 2: Define Your Screening Criteria

Not all suppliers are equal. Define scoring criteria based on what matters for your business:

SCREENING_CRITERIA = {
    "has_audit_report": {
        "weight": 30,
        "description": "Supplier has third-party audit report (SGS, TUV, etc.)"
    },
    "business_type": {
        "weight": 20,
        "preferred": ["Manufacturer", "Manufacturer/Factory"],
        "description": "Direct manufacturers score higher than trading companies"
    },
    "location_match": {
        "weight": 15,
        "preferred_provinces": [
            "Guangdong", "Zhejiang", "Jiangsu", "Fujian"
        ],
        "description": "Major manufacturing hubs with better logistics"
    },
    "price_competitiveness": {
        "weight": 20,
        "description": "Price relative to category average"
    },
    "moq_flexibility": {
        "weight": 15,
        "max_acceptable": 500,
        "description": "Lower MOQ = more flexibility for testing"
    }
}
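
The scoring function reports every score against a fixed maximum of 100, so the weights should still sum to 100 after you adjust them. Here is a minimal sketch of one way to keep that true. The helper rescale_weights is hypothetical (it is not part of the pipeline above), and the trimmed-down criteria dict exists only for the demo.

```python
def rescale_weights(criteria, total=100):
    """Return a copy of `criteria` with weights scaled so they sum to `total`."""
    current = sum(rule["weight"] for rule in criteria.values())
    if current <= 0:
        raise ValueError("Criteria weights must sum to a positive number")
    factor = total / current
    return {
        name: {**rule, "weight": round(rule["weight"] * factor, 2)}
        for name, rule in criteria.items()
    }


# Trimmed-down criteria for illustration: a small buyer doubles the MOQ
# weight, which pushes the raw total to 115.
small_buyer = {
    "has_audit_report": {"weight": 30},
    "business_type": {"weight": 20},
    "location_match": {"weight": 15},
    "price_competitiveness": {"weight": 20},
    "moq_flexibility": {"weight": 30, "max_acceptable": 500},
}

rescaled = rescale_weights(small_buyer)
for name, rule in rescaled.items():
    print(f"{name}: {rule['weight']}")
```

The helper returns a new dict and leaves the input untouched, so you can keep several weight profiles side by side and compare the resulting rankings.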

Step 3: Build the Scoring Engine

Here's a Python script that takes raw scraped data and outputs scored, ranked suppliers:

import json
import re
from collections import defaultdict


def parse_price(price_str):
    """Extract numeric price from strings like 'US$16.00'"""
    if not price_str:
        return None
    # Match a full number; a bare '.' (as in 'Approx.') would crash float().
    match = re.search(r'\d+(?:\.\d+)?', price_str.replace(',', ''))
    return float(match.group()) if match else None


def parse_moq(moq_str):
    """Extract numeric MOQ from strings like '2 Pieces'"""
    if not moq_str:
        return None
    match = re.search(r'\d+', moq_str.replace(',', ''))
    return int(match.group()) if match else None


def get_grade(score):
    if score >= 85:
        return 'A'
    if score >= 70:
        return 'B'
    if score >= 50:
        return 'C'
    return 'D'


def score_supplier(product, category_avg_price, criteria):
    score = 0
    details = {}

    # 1. Audit report check
    if product.get('auditReportUrl'):
        score += criteria['has_audit_report']['weight']
        details['audit'] = 'PASS - Has audit report'
    else:
        details['audit'] = 'FAIL - No audit report'

    # 2. Business type scoring
    biz_type = product.get('businessType', '')
    if any(pref.lower() in biz_type.lower()
         for pref in criteria['business_type']['preferred']):
        score += criteria['business_type']['weight']
        details['business_type'] = f'PASS - {biz_type}'
    else:
        details['business_type'] = f'PARTIAL - {biz_type}'

    # 3. Location scoring
    location = product.get('supplierLocation', '')
    if any(prov in location
           for prov in criteria['location_match']['preferred_provinces']):
        score += criteria['location_match']['weight']
        details['location'] = f'PASS - {location}'
    else:
        score += criteria['location_match']['weight'] * 0.5
        details['location'] = f'PARTIAL - {location}'

    # 4. Price competitiveness
    price = parse_price(product.get('price'))
    if price and category_avg_price:
        ratio = price / category_avg_price
        if ratio <= 0.9:
            score += criteria['price_competitiveness']['weight']
            details['price'] = (
                f'EXCELLENT - ${price:.2f} '
                f'(avg: ${category_avg_price:.2f})'
            )
        elif ratio <= 1.1:
            score += criteria['price_competitiveness']['weight'] * 0.7
            details['price'] = (
                f'GOOD - ${price:.2f} '
                f'(avg: ${category_avg_price:.2f})'
            )
        else:
            details['price'] = (
                f'HIGH - ${price:.2f} '
                f'(avg: ${category_avg_price:.2f})'
            )
    else:
        details['price'] = 'N/A - Price not available'

    # 5. MOQ flexibility
    moq = parse_moq(product.get('moq'))
    max_moq = criteria['moq_flexibility']['max_acceptable']
    if moq and moq <= max_moq:
        score += criteria['moq_flexibility']['weight']
        details['moq'] = f'PASS - {moq} units'
    elif moq:
        ratio = max(0, 1 - (moq - max_moq) / max_moq)
        score += criteria['moq_flexibility']['weight'] * ratio
        details['moq'] = (
            f'PARTIAL - {moq} units (max preferred: {max_moq})'
        )
    else:
        details['moq'] = 'N/A - MOQ not specified'

    return {
        'supplier': product.get('supplierName', 'Unknown'),
        'product': product.get('title', ''),
        'score': round(score, 1),
        'max_score': 100,
        'grade': get_grade(score),
        'details': details,
        'url': product.get('productUrl', '')
    }


def screen_suppliers(products, criteria):
    """Score and rank all suppliers."""
    prices = [parse_price(p.get('price')) for p in products]
    prices = [p for p in prices if p is not None]
    avg_price = sum(prices) / len(prices) if prices else None

    results = []
    for product in products:
        result = score_supplier(product, avg_price, criteria)
        results.append(result)

    results.sort(key=lambda x: x['score'], reverse=True)
    return results
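
The scraper returns products, so a supplier with several listings shows up several times in the ranking. If you want one row per supplier, a sketch like the following keeps each supplier's best-scoring product. It assumes the supplier name is a stable key, which may not hold if the same company lists under slightly different names; best_per_supplier is a hypothetical helper, and the demo rows are invented.

```python
def best_per_supplier(results):
    """Collapse product-level results to one entry per supplier (highest score wins)."""
    best = {}
    for result in results:
        name = result["supplier"]
        if name not in best or result["score"] > best[name]["score"]:
            best[name] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Illustrative product-level results in the shape returned by score_supplier.
demo_results = [
    {"supplier": "Example Audio Co., Ltd.", "product": "Speaker A", "score": 72.5},
    {"supplier": "Example Audio Co., Ltd.", "product": "Speaker B", "score": 88.0},
    {"supplier": "Example Trading Co.", "product": "Speaker C", "score": 64.0},
]

for entry in best_per_supplier(demo_results):
    print(entry["supplier"], entry["product"], entry["score"])
```

Taking the best score is one choice among several; averaging a supplier's scores would instead reward consistency across listings.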

Step 4: Generate the Shortlist Report

def generate_report(results, top_n=10):
    print(f"\n{'='*60}")
    print(f"  SUPPLIER SCREENING REPORT")
    print(f"  Total suppliers analyzed: {len(results)}")
    print(f"  Showing top {top_n}")
    print(f"{'='*60}\n")

    for i, r in enumerate(results[:top_n], 1):
        print(f"#{i} [{r['grade']}] {r['supplier']}")
        print(f"   Score: {r['score']}/{r['max_score']}")
        print(f"   Product: {r['product'][:60]}...")
        for key, val in r['details'].items():
            print(f"   {key}: {val}")
        print()

    # Summary stats
    grades = defaultdict(int)
    for r in results:
        grades[r['grade']] += 1

    print(f"\nGrade Distribution:")
    for grade in ['A', 'B', 'C', 'D']:
        count = grades.get(grade, 0)
        bar = '#' * count
        print(f"  {grade}: {bar} ({count})")


# Run the pipeline
with open('scraped_data.json') as f:
    products = json.load(f)

results = screen_suppliers(products, SCREENING_CRITERIA)
generate_report(results, top_n=10)

# Export shortlist to JSON
shortlist = [r for r in results if r['grade'] in ('A', 'B')]
with open('supplier_shortlist.json', 'w') as f:
    json.dump(shortlist, f, indent=2)

print(f"\nExported {len(shortlist)} qualified suppliers to shortlist")

Sample output:

============================================================
  SUPPLIER SCREENING REPORT
  Total suppliers analyzed: 87
  Showing top 10
============================================================

#1 [A] Shenzhen Topway Technology Co., Ltd.
   Score: 100/100
   Product: Professional Wireless Bluetooth Speaker 40W...
   audit: PASS - Has audit report
   business_type: PASS - Manufacturer/Factory
   location: PASS - Guangdong, China
   price: EXCELLENT - $12.50 (avg: $18.73)
   moq: PASS - 100 units

#2 [A] Dongguan Shengyuan Electronics Co., Ltd.
   Score: 94.0/100
   Product: Portable Bluetooth 5.3 Speaker Waterproof...
   audit: PASS - Has audit report
   business_type: PASS - Manufacturer
   location: PASS - Guangdong, China
   price: GOOD - $16.80 (avg: $18.73)
   moq: PASS - 200 units
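
If your procurement team works in spreadsheets, the same shortlist can be flattened to CSV. This is a minimal sketch, assuming each entry has the shape returned by score_supplier; shortlist_to_csv is a hypothetical helper, and the demo entry and its URL are placeholders.

```python
import csv
import io


def shortlist_to_csv(shortlist, out):
    """Write shortlist rows to a file-like object, one column per detail key."""
    detail_keys = sorted({key for row in shortlist for key in row.get("details", {})})
    base_keys = ["supplier", "product", "score", "grade", "url"]
    writer = csv.DictWriter(out, fieldnames=base_keys + detail_keys)
    writer.writeheader()
    for row in shortlist:
        flat = {key: row.get(key, "") for key in base_keys}
        flat.update(row.get("details", {}))
        writer.writerow(flat)


# Illustrative entry in the shape produced by score_supplier.
demo_shortlist = [{
    "supplier": "Example Audio Co., Ltd.",
    "product": "Portable Speaker",
    "score": 94.0,
    "max_score": 100,
    "grade": "A",
    "details": {"audit": "PASS - Has audit report", "moq": "PASS - 200 units"},
    "url": "https://example.com/product",
}]

buffer = io.StringIO()
shortlist_to_csv(demo_shortlist, buffer)
print(buffer.getvalue())
```

To write a real file, pass open('supplier_shortlist.csv', 'w', newline='') instead of the in-memory buffer.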

Step 5: Schedule Regular Screening Runs

Supplier landscapes change. Set up a recurring pipeline:

// Schedule weekly screening with Apify.
// Reuses the client from Step 1; run this inside an async function.
const schedule = await client.schedules().create({
    name: 'weekly-supplier-screening',
    cronExpression: '0 9 * * 1', // Every Monday 9 AM
    actions: [{
        type: 'RUN_ACTOR',
        actorId: 'jungle_intertwining/made-in-china-scraper',
        runInput: {
            body: JSON.stringify({
                keywords: [
                    "bluetooth speaker",
                    "LED light strip",
                    "phone case"
                ],
                maxItems: 200
            }),
            contentType: 'application/json'
        }
    }]
});

Then pipe the output through your scoring engine to get a fresh shortlist every week.
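
One caveat when a single run covers several keywords: screen_suppliers computes one average price over everything it receives, so speakers and phone cases would be priced against each other. A sketch of one workaround is to split the items by keyword first and score each group separately. The key name "keyword" is an assumption here, not a confirmed scraper field; if your dataset has no such key, running one keyword per actor run achieves the same separation.

```python
from collections import defaultdict


def split_by_keyword(products, keyword_field="keyword"):
    """Group products by the search keyword that produced them.

    `keyword_field` is a hypothetical key name; check your dataset for the
    real one, or tag items yourself by running one keyword per actor run.
    """
    groups = defaultdict(list)
    for product in products:
        groups[product.get(keyword_field, "unknown")].append(product)
    return dict(groups)


# Illustrative items only; the "keyword" key is an assumption.
demo_products = [
    {"keyword": "bluetooth speaker", "title": "Speaker A", "price": "US$16.00"},
    {"keyword": "phone case", "title": "Case A", "price": "US$0.80"},
    {"keyword": "bluetooth speaker", "title": "Speaker B", "price": "US$21.50"},
]

groups = split_by_keyword(demo_products)
for keyword, items in groups.items():
    print(f"{keyword}: {len(items)} products")
```

Each group can then go through screen_suppliers on its own, so the price criterion compares like with like.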

Practical Tips

Weight criteria for your business — if you're a small buyer, MOQ flexibility matters more than audit reports. Adjust weights accordingly.
Track scores over time — a supplier whose score drops from A to C might be having quality issues. Store historical scores in a database.
Combine platforms — use the Yiwugo Scraper and DHgate Scraper to cross-reference suppliers across platforms. A supplier present on multiple platforms with consistent pricing is generally more reliable.
Don't ignore "D" suppliers entirely — some may be new businesses with competitive pricing but no audit history yet. Flag them for manual review.
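
For the score-tracking tip, SQLite from the Python standard library is enough to get started. The sketch below is one possible layout, not a prescribed schema: the table name, the helpers record_scores and grade_drops, and the demo data are all made up for illustration. It assumes one row per supplier per run date, so collapse product-level results to supplier level first.

```python
import sqlite3

# Lower number means better grade.
GRADE_ORDER = {"A": 0, "B": 1, "C": 2, "D": 3}


def record_scores(conn, run_date, results):
    """Append one row per supplier result for the given run date (ISO string)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(run_date TEXT, supplier TEXT, score REAL, grade TEXT)"
    )
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?, ?, ?)",
        [(run_date, r["supplier"], r["score"], r["grade"]) for r in results],
    )
    conn.commit()


def grade_drops(conn, previous_date, current_date):
    """Return (supplier, old_grade, new_grade) for suppliers whose grade got worse."""
    rows = conn.execute(
        "SELECT cur.supplier, prev.grade, cur.grade "
        "FROM scores AS cur JOIN scores AS prev ON prev.supplier = cur.supplier "
        "WHERE prev.run_date = ? AND cur.run_date = ?",
        (previous_date, current_date),
    ).fetchall()
    return [row for row in rows if GRADE_ORDER[row[2]] > GRADE_ORDER[row[1]]]


# In-memory database and made-up results for illustration.
conn = sqlite3.connect(":memory:")
record_scores(conn, "2025-01-06", [
    {"supplier": "Example Audio Co., Ltd.", "score": 88.0, "grade": "A"},
    {"supplier": "Example Trading Co.", "score": 72.0, "grade": "B"},
])
record_scores(conn, "2025-01-13", [
    {"supplier": "Example Audio Co., Ltd.", "score": 55.0, "grade": "C"},
    {"supplier": "Example Trading Co.", "score": 74.0, "grade": "B"},
])

drops = grade_drops(conn, "2025-01-06", "2025-01-13")
print(drops)
```

Swap ":memory:" for a file path to keep the history between runs.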

What's Next?

This pipeline gives you a data-driven starting point for supplier evaluation. From here you could:

Add sentiment analysis of supplier reviews
Build a web dashboard for your procurement team
Set up alerts when new high-scoring suppliers appear
Cross-reference with trade compliance databases
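
As a starting point for the alerting idea, you can compare two consecutive shortlists and surface suppliers that are new at the top grade. This is a minimal sketch with invented data; new_top_suppliers is a hypothetical helper, and matching on supplier name alone is an assumption that may miss renamed companies.

```python
def new_top_suppliers(previous, current, grades=("A",)):
    """Return current entries with a target grade whose supplier was absent last time."""
    seen_before = {entry["supplier"] for entry in previous}
    return [
        entry for entry in current
        if entry["grade"] in grades and entry["supplier"] not in seen_before
    ]


# Made-up shortlists from two consecutive runs.
last_week = [{"supplier": "Example Audio Co., Ltd.", "grade": "A", "score": 88.0}]
this_week = [
    {"supplier": "Example Audio Co., Ltd.", "grade": "A", "score": 90.0},
    {"supplier": "Example Acoustics Ltd.", "grade": "A", "score": 86.5},
    {"supplier": "Example Trading Co.", "grade": "B", "score": 71.0},
]

for entry in new_top_suppliers(last_week, this_week):
    print(f"New grade-{entry['grade']} supplier: {entry['supplier']} ({entry['score']})")
```

The print call stands in for whatever notification channel you use, such as email or a chat webhook.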

The key insight: structured data + automated scoring = faster, better sourcing decisions.

Tools used in this guide:

Made-in-China Scraper — Extract B2B product and supplier data
Yiwugo Scraper — Wholesale market data from Yiwu
DHgate Scraper — Cross-border wholesale product data
China Wholesale Scraper Toolkit — All tools in one place
The Complete Guide to China Wholesale Data Scraping
