DEV Community

wfgsss

The Complete Guide to China Wholesale Data Scraping: Tools, Platforms, and Best Practices

If you're sourcing products from China — whether for dropshipping, wholesale import, or market research — you've probably spent hours manually browsing supplier platforms, copying prices into spreadsheets, and comparing MOQs across tabs.

There's a better way. In this guide, I'll walk you through everything I've learned building scrapers for China's three biggest wholesale platforms: Yiwugo.com, DHgate.com, and Made-in-China.com. We'll cover platform differences, technical challenges, working code examples, and the tools I've published on Apify Store that you can use right now.

Why Scrape China Wholesale Platforms?

Manual sourcing doesn't scale. Here's what automated data extraction unlocks:

  • Price monitoring — Track price changes across thousands of products daily
  • Supplier discovery — Find new suppliers matching your criteria automatically
  • Competitive analysis — Compare pricing, MOQs, and product ranges across platforms
  • Trend detection — Spot rising product categories before they go mainstream
  • Due diligence — Screen supplier ratings, transaction history, and verification status at scale

A single scraping run can collect data that would take a human researcher weeks to compile manually.

The Three Platforms: A Quick Comparison

Before diving into the technical details, let's understand what makes each platform unique:

| Feature | Yiwugo.com | DHgate.com | Made-in-China.com |
|---|---|---|---|
| Focus | Yiwu small commodities | Global wholesale/dropship | B2B manufacturing |
| Typical MOQ | 1-100 units | 1-50 units | 100-10,000 units |
| Buyer type | Small retailers | Dropshippers, small buyers | Import companies, OEMs |
| Product range | 2M+ SKUs | 30M+ SKUs | 15M+ products |
| Language | Chinese (some English) | English | English |
| Anti-scraping | Moderate | Moderate | Low (search pages) |
| Best for | Commodity pricing data | Dropshipping research | Manufacturer sourcing |

Each platform serves a different segment of the supply chain. Scraping all three gives you the most complete picture of China's wholesale market.

Platform Deep Dive: Yiwugo.com

What is Yiwugo?

Yiwugo is the official online platform of the Yiwu International Trade Market — the world's largest small commodities wholesale market. It's where factory owners and distributors in Yiwu list their products, primarily targeting domestic and international small-to-medium buyers.

Data You Can Extract

{
  "title": "Stainless Steel Water Bottle 500ml",
  "price": "¥8.50 - ¥12.00",
  "minOrder": "100 pieces",
  "shopName": "Yiwu Hengda Cup Factory",
  "shopUrl": "https://www.yiwugo.com/shop/...",
  "location": "Yiwu, Zhejiang",
  "productUrl": "https://www.yiwugo.com/product/...",
  "imageUrl": "https://img.yiwugo.com/...",
  "category": "Cups & Bottles"
}

Technical Approach

Yiwugo uses server-side rendering for search results, which makes it relatively straightforward to scrape:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks }) {
        // Extract product cards from search results
        $('.pro_list_product_img2').each((i, el) => {
            const title = $(el).find('.productloc a').text().trim();
            const price = $(el).find('.product_price span').text().trim();
            const shop = $(el).find('.shop_name a').text().trim();

            console.log({ title, price, shop });
        });

        // Follow pagination
        await enqueueLinks({
            selector: '.page_next a',
            label: 'LIST',
        });
    },
});

await crawler.run(['https://www.yiwugo.com/product/search?keyword=water+bottle']);

Key Challenges

  1. Mixed language content — Product titles and descriptions are primarily in Chinese
  2. Price ranges — Many products show price tiers based on quantity
  3. Rate limiting — Aggressive scraping triggers IP blocks
  4. Session management — Some pages require valid session cookies
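
Price ranges like the one in the sample record above are easier to work with once they are split into numbers. A minimal sketch, assuming prices arrive as strings such as "¥8.50 - ¥12.00"; `parsePriceRange` is a hypothetical helper, not part of the published scraper:

```javascript
// Parse a price string such as "¥8.50 - ¥12.00" or "US $0.5-50 / Piece".
// Returns null when the text contains no number at all.
function parsePriceRange(text) {
    // Drop thousands separators so "1,200.50" is read as one number.
    const cleaned = String(text).replace(/,/g, '');
    const numbers = (cleaned.match(/\d+(?:\.\d+)?/g) || []).map(Number);
    if (numbers.length === 0) return null;

    // Assumption: only ¥ and $ appear, as in the sample records in this guide.
    let currency = null;
    if (cleaned.includes('¥')) currency = 'CNY';
    else if (cleaned.includes('$')) currency = 'USD';

    return { currency, min: Math.min(...numbers), max: Math.max(...numbers) };
}

console.log(parsePriceRange('¥8.50 - ¥12.00'));
```

Keeping both ends of the range lets you decide later whether to compare the low-volume price or the high-volume price.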

Ready-to-Use Tool

I've published a production-ready Yiwugo scraper on Apify Store that handles all of these challenges:

👉 Yiwugo Scraper on Apify Store

It supports keyword search, category browsing, pagination, and proxy rotation out of the box.

Platform Deep Dive: DHgate.com

What is DHgate?

DHgate is one of China's largest cross-border e-commerce platforms, connecting Chinese manufacturers directly with international buyers. It's particularly popular with dropshippers because of its low MOQs (often just 1 piece) and built-in buyer protection.

Data You Can Extract

{
  "title": "Wireless Bluetooth Earbuds TWS 5.3",
  "price": "$3.82 - $5.47",
  "originalPrice": "$7.64",
  "discount": "50% OFF",
  "minOrder": "1 piece",
  "sold": "2,847 sold",
  "sellerName": "Shenzhen Digital Store",
  "sellerRating": "97.8%",
  "freeShipping": true,
  "productUrl": "https://www.dhgate.com/product/...",
  "imageUrl": "https://image.dhgate.com/..."
}

Technical Approach

DHgate renders product listings with a mix of SSR and client-side hydration. For search results, a Cheerio-based approach works well:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        const products = [];

        $('.gallery-item').each((i, el) => {
            products.push({
                title: $(el).find('.product-title').text().trim(),
                price: $(el).find('.price-current').text().trim(),
                sold: $(el).find('.sold-count').text().trim(),
                seller: $(el).find('.seller-name').text().trim(),
                url: $(el).find('a.product-link').attr('href'),
            });
        });

        log.info(`Found ${products.length} products on ${request.url}`);
    },
});

await crawler.run(['https://www.dhgate.com/wholesale/search.do?searchkey=bluetooth+earbuds']);

Key Challenges

  1. Dynamic pricing — Prices change based on quantity tiers and promotions
  2. Anti-bot measures — Cloudflare protection on some pages
  3. Pagination limits — Search results cap at ~40 pages
  4. Image CDN — Product images use a separate CDN with transformation parameters
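
To make the quantity-tier point concrete, here is a small sketch of picking the unit price for a given order size. The tier shape (`minQty`, `unitPrice`) and the numbers are assumptions for illustration; the real listing data may be structured differently:

```javascript
// Hypothetical tier shape: [{ minQty: 1, unitPrice: 5.47 }, ...]
// Returns the unit price of the highest tier the quantity qualifies for,
// or null when the quantity is below every tier's minimum.
function unitPriceForQuantity(tiers, quantity) {
    const sorted = [...tiers].sort((a, b) => a.minQty - b.minQty);
    let match = null;
    for (const tier of sorted) {
        if (quantity >= tier.minQty) match = tier;
    }
    return match ? match.unitPrice : null;
}

// Illustrative tiers only, not real DHgate data.
const exampleTiers = [
    { minQty: 1, unitPrice: 5.47 },
    { minQty: 10, unitPrice: 4.5 },
    { minQty: 50, unitPrice: 3.82 },
];
console.log(unitPriceForQuantity(exampleTiers, 12));
```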

Ready-to-Use Tool

👉 DHgate Scraper on Apify Store

Handles search, category browsing, seller filtering, and automatic proxy rotation.

Platform Deep Dive: Made-in-China.com

What is Made-in-China.com?

Made-in-China.com (MIC) is a B2B platform focused on connecting international buyers with Chinese manufacturers. Unlike DHgate (which targets individual buyers), MIC is designed for bulk purchasing and OEM/ODM sourcing. It's where you find factories, not resellers.

Data You Can Extract

{
  "title": "CNC Machining Aluminum Parts Custom Manufacturing",
  "price": "US $0.5-50 / Piece",
  "minOrder": "100 Pieces",
  "supplier": "Shenzhen Precision Machinery Co., Ltd.",
  "supplierType": "Gold Member",
  "verified": true,
  "yearsOnPlatform": 8,
  "location": "Guangdong, China",
  "productUrl": "https://www.made-in-china.com/...",
  "imageUrl": "https://image.made-in-china.com/..."
}

Technical Approach

MIC's search results are server-side rendered, making them the easiest of the three platforms to scrape:

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        const products = [];

        // Extract product cards from search results
        $('.prod-list .prod-item').each((i, el) => {
            products.push({
                title: $(el).find('.prod-name a').text().trim(),
                price: $(el).find('.prod-price').text().trim(),
                moq: $(el).find('.prod-moq').text().trim(),
                supplier: $(el).find('.company-name a').text().trim(),
                location: $(el).find('.company-location').text().trim(),
            });
        });

        log.info(`Extracted ${products.length} products`);
    },
});

await crawler.run(['https://www.made-in-china.com/products-search/hot-china-products/CNC_Parts.html']);
```

Key Challenges

  1. Detail page protection — Product detail pages are behind FCaptcha verification
  2. Supplier verification data — Some verification badges require authenticated access
  3. Contact information — Supplier contact details are partially hidden
  4. Large result sets — Popular categories can have 100,000+ listings

Ready-to-Use Tool

👉 Made-in-China Scraper on Apify Store (https://apify.com/jungle_intertwining/made-in-china-scraper)

Extracts search results with full product and supplier metadata, supports keyword search and pagination.

Anti-Detection Best Practices

China wholesale platforms have varying levels of anti-scraping protection. Here's what works across all three:

1. Rotate Proxies

Never scrape from a single IP. Use residential proxies for best results:

```javascript
import { ProxyConfiguration } from 'crawlee';

// Placeholder proxy URLs; replace with your own provider's endpoints
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
    ],
});
```

Apify's built-in proxy pool handles this automatically when you run actors on the platform.

2. Respect Rate Limits

Cap concurrency and request rate to avoid triggering rate limiters:

const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 3,
    maxRequestsPerMinute: 30,
    // ...
});
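
The crawler options above cap throughput, but they do not react when a platform starts refusing requests. A minimal sketch of one common complement, exponential backoff with jitter, assuming you retry blocked requests yourself. `backoffDelayMs` and its default values are hypothetical and not part of Crawlee:

```javascript
// Compute how long to wait before retry number `attempt` (0-based).
// The delay doubles with each attempt, is capped at maxMs, and is
// randomized within the upper half so parallel workers do not retry in lockstep.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000, random = Math.random) {
    const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
    return Math.round(exponential / 2 + random() * (exponential / 2));
}

// Example: print the delay ranges for the first four retries.
for (let attempt = 0; attempt < 4; attempt++) {
    const low = backoffDelayMs(attempt, 1000, 30000, () => 0);
    const high = backoffDelayMs(attempt, 1000, 30000, () => 1);
    console.log(`retry ${attempt}: wait between ${low} and ${high} ms`);
}
```

The `random` parameter exists only so the function can be checked deterministically.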

3. Rotate User Agents

Vary your User-Agent header to look like different browsers:

const userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36...',
];
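
One simple way to cycle through such a list is a round-robin picker. This is a sketch only: `createUserAgentRotator` is a hypothetical helper, the strings below are placeholders rather than real User-Agent values, and how the header is attached depends on the crawler you use:

```javascript
// Returns a function that hands out user agents in order and wraps around.
function createUserAgentRotator(userAgents) {
    if (userAgents.length === 0) {
        throw new Error('At least one user agent is required');
    }
    let index = 0;
    return () => {
        const userAgent = userAgents[index];
        index = (index + 1) % userAgents.length;
        return userAgent;
    };
}

// Placeholder values for illustration; use complete, current browser strings in practice.
const nextUserAgent = createUserAgentRotator(['placeholder-ua-1', 'placeholder-ua-2']);
console.log(nextUserAgent(), nextUserAgent(), nextUserAgent());
```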

4. Handle Cloudflare

When you hit Cloudflare protection (common on DHgate), switch to a browser-based approach:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    headless: false, // Headed mode passes more challenges
    browserPoolOptions: {
        useFingerprints: true,
    },
    // ...
});

5. Cache and Deduplicate

Don't re-scrape data you already have. Use request queues with deduplication:

import { RequestQueue } from 'crawlee';

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({
    url: productUrl,
    uniqueKey: productId, // Prevents duplicate scraping
});
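
If a product ID is not available, a stable key can often be derived from the product URL. A minimal sketch, assuming that query strings and fragments are only tracking noise on the pages you scrape; that is an assumption to verify per platform, and `productKey` is a hypothetical helper:

```javascript
// Build a deduplication key from a product URL by keeping only host and path.
// Assumption: query parameters and fragments do not identify the product.
function productKey(productUrl) {
    const url = new URL(productUrl);
    // URL lowercases the hostname; trailing slashes are trimmed for consistency.
    return (url.hostname + url.pathname).replace(/\/+$/, '');
}

// Keep only the first URL seen for each key.
function dedupeUrls(urls) {
    const seen = new Set();
    return urls.filter((u) => {
        const key = productKey(u);
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
    });
}

// Illustrative URLs on a placeholder domain.
console.log(dedupeUrls([
    'https://www.example.com/product/123?ref=search',
    'https://www.example.com/product/123/#reviews',
    'https://www.example.com/product/456',
]));
```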

Cross-Platform Price Comparison: A Real Example

Here's a practical workflow that combines data from all three platforms:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

async function comparePrices(keyword) {
    // Run all three scrapers in parallel
    const [yiwugo, dhgate, mic] = await Promise.all([
        client.actor('jungle_intertwining/yiwugo-scraper').call({
            keyword,
            maxProducts: 20,
        }),
        client.actor('jungle_intertwining/dhgate-scraper').call({
            keyword,
            maxProducts: 20,
        }),
        client.actor('jungle_intertwining/made-in-china-scraper').call({
            keyword,
            maxProducts: 20,
        }),
    ]);

    // Collect results
    const results = {
        yiwugo: await client.dataset(yiwugo.defaultDatasetId)
            .listItems().then(r => r.items),
        dhgate: await client.dataset(dhgate.defaultDatasetId)
            .listItems().then(r => r.items),
        mic: await client.dataset(mic.defaultDatasetId)
            .listItems().then(r => r.items),
    };

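    // Compare average prices. Uses the first number in each price string
    // (the low end of a range). Note that Yiwugo prices are in ¥, so convert
    // currencies before comparing the averages directly.
    for (const [platform, items] of Object.entries(results)) {
        const avgPrice = items.reduce((sum, item) => {
            const match = item.price?.match(/\d+(?:\.\d+)?/);
            return sum + (match ? parseFloat(match[0]) : 0);
        }, 0) / (items.length || 1); // avoid dividing by zero on empty results

        console.log(`${platform}: ${items.length} products, avg price: $${avgPrice.toFixed(2)}`);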
    }
}

comparePrices('bluetooth earbuds');

For a complete cross-platform toolkit with supplier ranking and price comparison scripts, check out the GitHub repository:

👉 China Wholesale Scraper Toolkit

Common Use Cases

1. Dropshipping Product Research

Use DHgate data to find products with high sales volume and good margins. Filter by:

  • Price under $10 (for 3-5x markup potential)
  • Seller rating above 95%
  • Free shipping available
  • 100+ units sold
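
The four filters above can be sketched as a single predicate over scraped records. This assumes the field names and string formats from the DHgate sample record earlier in this guide (`price`, `sellerRating`, `sold`, `freeShipping`) and treats the low end of a price range as the price; the helper names are hypothetical:

```javascript
// Pull the first number out of strings like "$3.82 - $5.47", "97.8%" or "2,847 sold".
function firstNumber(text) {
    const match = String(text ?? '').replace(/,/g, '').match(/\d+(?:\.\d+)?/);
    return match ? Number(match[0]) : NaN;
}

// Apply the four criteria; records with missing values fail the check
// because any comparison against NaN is false.
function isDropshippingCandidate(item) {
    return firstNumber(item.price) < 10
        && firstNumber(item.sellerRating) > 95
        && item.freeShipping === true
        && firstNumber(item.sold) >= 100;
}

const sample = {
    title: 'Wireless Bluetooth Earbuds TWS 5.3',
    price: '$3.82 - $5.47',
    sold: '2,847 sold',
    sellerRating: '97.8%',
    freeShipping: true,
};
console.log(isDropshippingCandidate(sample));
```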

2. Wholesale Price Benchmarking

Compare the same product category across all three platforms to find the best wholesale price. Yiwugo typically has the lowest prices for small commodities, while Made-in-China.com offers better rates for bulk manufacturing orders.
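
Because Yiwugo lists prices in ¥ while the other two platforms use US dollars, a benchmark needs a currency step first. A minimal sketch, assuming prices have already been parsed into numbers; the `offers` shape and the exchange rate are illustrative placeholders, not real data:

```javascript
// Convert an amount to USD. The CNY rate must be supplied by the caller.
function toUsd(amount, currency, cnyToUsd) {
    if (currency === 'USD') return amount;
    if (currency === 'CNY') return amount * cnyToUsd;
    throw new Error(`Unsupported currency: ${currency}`);
}

function median(values) {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group offers by platform and return the median USD price for each.
function medianUsdByPlatform(offers, cnyToUsd) {
    const byPlatform = {};
    for (const { platform, amount, currency } of offers) {
        if (!byPlatform[platform]) byPlatform[platform] = [];
        byPlatform[platform].push(toUsd(amount, currency, cnyToUsd));
    }
    return Object.fromEntries(
        Object.entries(byPlatform).map(([platform, prices]) => [platform, median(prices)]),
    );
}

// Placeholder offers and a placeholder rate; look up the current rate before relying on the output.
const exampleOffers = [
    { platform: 'yiwugo', amount: 8, currency: 'CNY' },
    { platform: 'yiwugo', amount: 16, currency: 'CNY' },
    { platform: 'dhgate', amount: 3, currency: 'USD' },
];
console.log(medianUsdByPlatform(exampleOffers, 0.125));
```

The median is used here instead of the mean so that a few outlier listings do not dominate the comparison.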

3. Supplier Verification Pipeline

Build an automated pipeline that:

  1. Scrapes supplier lists from Made-in-China.com
  2. Filters by verification status, years on platform, and location
  3. Cross-references with Yiwugo for alternative suppliers
  4. Outputs a ranked shortlist for manual review
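
Steps 2 and 4 of this pipeline can be sketched as a filter followed by a sort. This assumes records shaped like the Made-in-China.com sample earlier in this guide (`verified`, `yearsOnPlatform`, `location`); the thresholds and the ranking rule (longest tenure first) are illustrative choices, not a recommendation:

```javascript
// Filter suppliers by verification, tenure and optional province, then rank by tenure.
function rankSuppliers(suppliers, { minYears = 3, province = null } = {}) {
    return suppliers
        .filter((s) => s.verified === true)
        .filter((s) => s.yearsOnPlatform >= minYears)
        .filter((s) => province === null || String(s.location ?? '').startsWith(province))
        .sort((a, b) => b.yearsOnPlatform - a.yearsOnPlatform);
}

// Made-up supplier records for illustration.
const exampleSuppliers = [
    { supplier: 'Supplier A', verified: true, yearsOnPlatform: 8, location: 'Guangdong, China' },
    { supplier: 'Supplier B', verified: false, yearsOnPlatform: 10, location: 'Guangdong, China' },
    { supplier: 'Supplier C', verified: true, yearsOnPlatform: 2, location: 'Zhejiang, China' },
    { supplier: 'Supplier D', verified: true, yearsOnPlatform: 5, location: 'Zhejiang, China' },
];
console.log(rankSuppliers(exampleSuppliers).map((s) => s.supplier));
```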

4. Market Trend Analysis

Run weekly scrapes across all platforms for your product categories. Track:

  • New product listings (emerging trends)
  • Price movements (supply/demand shifts)
  • New supplier entries (market competition)
  • Category growth rates
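
The first two signals can be derived by comparing two scrape snapshots. A minimal sketch, assuming each record carries a stable `productUrl` and a `price` string as in the sample records in this guide; `diffSnapshots` is a hypothetical helper:

```javascript
// Compare last week's records with this week's and report new listings and price changes.
function diffSnapshots(previous, current) {
    const before = new Map(previous.map((item) => [item.productUrl, item]));
    const newListings = [];
    const priceChanges = [];

    for (const item of current) {
        const old = before.get(item.productUrl);
        if (!old) {
            newListings.push(item);
        } else if (old.price !== item.price) {
            priceChanges.push({ productUrl: item.productUrl, from: old.price, to: item.price });
        }
    }
    return { newListings, priceChanges };
}

// Made-up snapshots on a placeholder domain.
const lastWeek = [
    { productUrl: 'https://www.example.com/p/1', price: '$3.82' },
    { productUrl: 'https://www.example.com/p/2', price: '$9.10' },
];
const thisWeek = [
    { productUrl: 'https://www.example.com/p/1', price: '$3.50' },
    { productUrl: 'https://www.example.com/p/2', price: '$9.10' },
    { productUrl: 'https://www.example.com/p/3', price: '$1.20' },
];
console.log(diffSnapshots(lastWeek, thisWeek));
```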

5. Competitive Intelligence

Monitor your competitors' supplier platforms. If they're sourcing from Yiwugo, you can find the same suppliers and negotiate directly — or find better alternatives on Made-in-China.com.

Legal and Ethical Considerations

Web scraping operates in a legal gray area. Here are guidelines to stay on the right side:

  1. Respect robots.txt — Check each platform's robots.txt before scraping
  2. Don't overload servers — Use reasonable rate limits and concurrency
  3. Public data only — Only scrape publicly accessible information
  4. No login circumvention — Don't bypass authentication walls
  5. Commercial use — Review each platform's Terms of Service regarding data usage
  6. Data storage — Handle collected data responsibly, especially supplier contact information
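
For the first guideline, a quick check can be scripted once you have downloaded a platform's robots.txt. The sketch below is deliberately simplified: it only reads `Disallow` rules for the `*` user agent and ignores `Allow` rules and wildcards, so treat it as a first pass rather than a full parser. The robots.txt content shown is made up:

```javascript
// Collect Disallow path prefixes that apply to the given user agent.
function disallowedPrefixes(robotsTxt, agent = '*') {
    const prefixes = [];
    let applies = false;
    let lastWasAgent = false;

    for (const rawLine of robotsTxt.split(/\r?\n/)) {
        const line = rawLine.split('#')[0].trim(); // strip comments
        const sep = line.indexOf(':');
        if (sep === -1) continue;
        const field = line.slice(0, sep).trim().toLowerCase();
        const value = line.slice(sep + 1).trim();

        if (field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            applies = lastWasAgent ? applies || value === agent : value === agent;
            lastWasAgent = true;
        } else {
            lastWasAgent = false;
            if (field === 'disallow' && applies && value !== '') prefixes.push(value);
        }
    }
    return prefixes;
}

function isPathAllowed(robotsTxt, path) {
    return !disallowedPrefixes(robotsTxt).some((prefix) => path.startsWith(prefix));
}

// Made-up robots.txt content for illustration.
const exampleRobots = [
    'User-agent: *',
    'Disallow: /cart/',
    'Disallow: /account/',
    '',
    'User-agent: ExampleBot',
    'Disallow: /',
].join('\n');
console.log(isPathAllowed(exampleRobots, '/product/search'));
```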

Getting Started

The fastest way to start collecting China wholesale data:

  1. Create a free Apify account at apify.com
  2. Pick a scraper based on your use case: Yiwugo, DHgate, or Made-in-China.com
  3. Enter your search keywords and run
  4. Export results as JSON, CSV, or Excel
  5. Schedule recurring runs for ongoing monitoring

Each tool comes with detailed documentation, example outputs, and an FAQ section on its Apify Store page.

What's Next

I'm actively developing more tools for the China wholesale ecosystem.

Building tools to make China wholesale data accessible to everyone. Questions or feature requests? Open an issue on GitHub or leave a comment below.
