Muhammad Masad Ashraf

Posted on • Originally published at kolachitech.com

Shopify Bulk Operations API at Scale: A Practical Developer Guide

Every Shopify developer hits this wall eventually.

You need to export 200,000 orders. Or sync a 500K-product catalog. Or run a price update across every variant you carry. Standard GraphQL queries collapse under that pressure. Rate limits fire. Timeouts pile up. Your integration breaks.

The Shopify Bulk Operations API exists to fix this. It processes millions of records asynchronously, returns output as a downloadable JSONL file, and sidesteps the throttling that kills traditional query approaches.

Here is how it works, and how to scale it properly.


What Is the Shopify Bulk Operations API?

It is a subset of the Shopify GraphQL Admin API built for large-scale data retrieval and mutation. Instead of paginating through thousands of API calls, you submit one GraphQL operation and Shopify runs it server-side in the background.

When the job completes, Shopify generates a JSONL file available at a signed URL. Each line in the file is one resource object.

Two operation types exist:

| Operation Type | What It Does | Common Use Case |
| --- | --- | --- |
| bulkOperationRunQuery | Exports data asynchronously | Orders export, catalog dump, customer list |
| bulkOperationRunMutation | Applies mutations to a large dataset | Price updates, tag writes, metafield updates |

Both follow the same lifecycle: create, poll, download.


The Operation Lifecycle

Step 1: Submit

Send a bulkOperationRunQuery mutation. Shopify queues the job and returns an operation ID with status CREATED.

mutation {
  bulkOperationRunQuery(
    query: """
    {
      products {
        edges {
          node {
            id
            title
            variants {
              edges {
                node {
                  id
                  price
                  sku
                }
              }
            }
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}
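
As a rough sketch of how this mutation might be sent from Python, the helper below only builds the HTTP request and does not send it. The function name `build_bulk_query_request`, the shop domain, the API version, and the token are all placeholders chosen for illustration. The endpoint path and the `X-Shopify-Access-Token` header follow the usual Admin GraphQL convention.

```python
import json
import urllib.request

# Trimmed version of the mutation above, kept short for the example.
BULK_QUERY = '''
mutation {
  bulkOperationRunQuery(
    query: """
    { products { edges { node { id title } } } }
    """
  ) {
    bulkOperation { id status }
    userErrors { field message }
  }
}
'''


def build_bulk_query_request(shop: str, api_version: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits the bulk query."""
    url = f"https://{shop}/admin/api/{api_version}/graphql.json"
    body = json.dumps({"query": BULK_QUERY}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Shopify-Access-Token": token,
        },
    )


# Placeholder values: substitute your own shop, API version, and token.
req = build_bulk_query_request("example.myshopify.com", "2024-10", "shpat_placeholder")
```

Passing `req` to `urllib.request.urlopen` would submit the job. Check `userErrors` in the response before you start polling.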

Step 2: Poll for Status

Query currentBulkOperation at intervals until status reaches a terminal state: COMPLETED, FAILED, or CANCELED.

query {
  currentBulkOperation {
    id
    status
    errorCode
    createdAt
    completedAt
    objectCount
    fileSize
    url
    partialDataUrl
  }
}

Poll every 3 to 10 seconds for small jobs. For large datasets, back off to 30 to 60 seconds.
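
One way to get both intervals from a single loop is to start short and double the wait up to a cap. This is a minimal sketch, not a prescribed client. `fetch_status` is a hypothetical callable that runs the currentBulkOperation query and returns the result as a dict. The `sleep` parameter exists only so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

# EXPIRED is also part of Shopify's status enum, so it is treated as terminal here.
TERMINAL = {"COMPLETED", "FAILED", "CANCELED", "EXPIRED"}


def poll_until_done(
    fetch_status: Callable[[], Dict],
    initial_delay: float = 3.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status until the operation reaches a terminal status."""
    delay = initial_delay
    while True:
        op = fetch_status()
        if op["status"] in TERMINAL:
            return op
        sleep(delay)
        # Double the wait each round, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

Small jobs finish within the first few short waits. Large jobs settle at the 60-second cap without extra configuration.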

Step 3: Download and Parse

When COMPLETED, the url field contains a signed link valid for 7 days. Parse the JSONL line by line. Parent-child relationships are encoded via __parentId on child nodes.
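
To make the __parentId encoding concrete, here is a small sketch that attaches variant rows to their product. It assumes each parent line appears before its children and that every parent row carries an `id` field, as in the query above. The `children` key and the function name are illustrative choices. Because it keeps every parent in a dict, it suits small files only.

```python
import json
from typing import Dict, Iterable


def group_by_parent(lines: Iterable[str]) -> Dict[str, dict]:
    """Attach each child row to its parent using __parentId."""
    parents: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        parent_id = row.pop("__parentId", None)
        if parent_id is None:
            # Top-level object: register it and give it a child list.
            row["children"] = []
            parents[row["id"]] = row
        else:
            # Child object: assumes its parent was seen earlier in the file.
            parents[parent_id]["children"].append(row)
    return parents


sample = [
    '{"id": "gid://shopify/Product/1", "title": "Shirt"}',
    '{"id": "gid://shopify/ProductVariant/10", "price": "19.99", "sku": "S-1", '
    '"__parentId": "gid://shopify/Product/1"}',
]
products = group_by_parent(sample)
```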

Status reference:

| Status | Meaning | Action |
| --- | --- | --- |
| CREATED | Queued, not yet running | Keep polling |
| RUNNING | Actively processing | Keep polling |
| COMPLETED | Done | Download file |
| FAILED | Error encountered | Check errorCode, retry |
| CANCELED | Manually or auto-canceled | Resubmit if needed |
| CANCELING | Cancel in progress | Wait, then resubmit |

Why This Beats Standard Pagination at Scale

Standard GraphQL pagination uses cursor-based after arguments. Fine for 10,000 records. For 1 million, you make thousands of API calls and burn through your query cost budget fast.

Bulk Operations bypass per-request rate limits almost entirely. Only the handful of calls that submit and poll the job count against your budget. Shopify does the work server-side. You wait, then download one file.

| Factor | Standard Pagination | Bulk Operations API |
| --- | --- | --- |
| Rate Limit Exposure | High | Very low |
| Max Records | Limited by throttle | Millions |
| Processing Model | Synchronous | Asynchronous |
| Error Recovery | Per-request | Job-level |
| Data Format | JSON in response | JSONL file download |
| Best For | Up to ~50K records | 50K to millions |

Key Constraints You Must Know

One operation per store at a time

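Shopify caps concurrent bulk operations per store. Historically the limit has been scoped per app and per operation type (one bulk query and one bulk mutation at a time), so check the limits for your API version. Submitting another operation of the same type while one is running fails immediately. In a multi-tenant app, build per-store operation locks into your job queue.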
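
A per-store lock can be as simple as a guarded dictionary. The sketch below is an in-memory stand-in, assuming a single process. In production the same check-and-set would live in your database or Redis. The class name, `job_id`, and keying by shop domain are illustrative choices. Widen the key (for example shop plus operation type) if your limits allow it.

```python
import threading
from typing import Dict


class StoreOperationLocks:
    """In-memory registry of the job currently holding each store's slot."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}
        self._guard = threading.Lock()

    def try_acquire(self, shop: str, job_id: str) -> bool:
        """Claim the store's slot; return False if another job holds it."""
        with self._guard:
            if shop in self._active:
                return False
            self._active[shop] = job_id
            return True

    def release(self, shop: str, job_id: str) -> None:
        """Free the slot, but only if this job is the one holding it."""
        with self._guard:
            if self._active.get(shop) == job_id:
                del self._active[shop]
```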

No nested mutations

Bulk mutations require a flat JSONL input file. Each line maps to one mutation call. Nested mutations inside a single bulk operation are not supported.

Stream the output file

JSONL output can reach several gigabytes. Never load it fully into memory. Use line-by-line buffered reads.
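
A minimal sketch of a line-by-line read: the generator below parses one row at a time from any binary stream, so memory use stays roughly constant regardless of file size. The name `iter_jsonl` is illustrative. An `io.BytesIO` stands in for the download here.

```python
import io
import json
from typing import BinaryIO, Iterator


def iter_jsonl(stream: BinaryIO) -> Iterator[dict]:
    """Yield one parsed object per line without reading the whole stream."""
    for raw in stream:
        # Skip blank lines; json.loads accepts bytes directly.
        if raw.strip():
            yield json.loads(raw)


fake_download = io.BytesIO(b'{"id": 1}\n\n{"id": 2}\n')
ids = [row["id"] for row in iter_jsonl(fake_download)]
```

The response object returned by `urllib.request.urlopen` is iterable by line as well, so it can be passed to `iter_jsonl` in place of the `BytesIO`.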

Check partialDataUrl on failure

When a job fails or gets canceled, Shopify may still produce a partialDataUrl containing whatever completed before failure. Always check this field. Process the partial data, then retry only the remaining records.
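
In code, that check can be a small helper that decides which file, if any, is worth downloading. This is a sketch. `pick_result_url` is a hypothetical name, and `op` is assumed to be the operation object from the status query, as a dict.

```python
from typing import Optional


def pick_result_url(op: dict) -> Optional[str]:
    """Return the URL worth downloading for this operation, if any."""
    if op.get("status") == "COMPLETED":
        return op.get("url")
    # FAILED or CANCELED jobs may still expose partial output.
    return op.get("partialDataUrl")
```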


Bulk Mutations: The Real Power Move

Bulk mutations handle operations like price updates, tag management, and metafield writes across millions of records.

The flow:

1. Stage an upload via stagedUploadsCreate (resource BULK_MUTATION_VARIABLES) to get an upload URL and its signed form parameters.

2. Upload your JSONL input file. Each line is a JSON object with variables for one mutation call.

{"input": {"id": "gid://shopify/ProductVariant/123456789", "price": "29.99"}}

3. Submit the bulk mutation via bulkOperationRunMutation, passing the staged upload's key parameter as stagedUploadPath.

Output results include a __lineNumber field so you can map every success and failure back to your input file precisely.
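
Here is a hedged sketch of both ends of that flow: building the variables file and pairing each output row with its input line. It assumes __lineNumber is zero-based, and that the input shape matches the price-update line shown above. Both function names are illustrative.

```python
import json
from typing import Iterable, List, Tuple


def build_variables_jsonl(price_updates: Iterable[Tuple[str, str]]) -> str:
    """Serialize (variant GID, price) pairs, one variables object per line."""
    lines = [
        json.dumps({"input": {"id": gid, "price": price}})
        for gid, price in price_updates
    ]
    return "\n".join(lines) + "\n"


def match_results(input_jsonl: str, output_lines: Iterable[str]) -> List[Tuple[dict, dict]]:
    """Pair each output row with the input line named by its __lineNumber."""
    inputs = [json.loads(line) for line in input_jsonl.splitlines() if line]
    pairs = []
    for line in output_lines:
        result = json.loads(line)
        # Assumption: __lineNumber counts input lines starting from 0.
        pairs.append((inputs[result["__lineNumber"]], result))
    return pairs
```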


Scaling Patterns for Production Systems

Pattern 1: Webhook-Triggered Completion

Subscribe to BULK_OPERATIONS_FINISH instead of polling. Shopify pushes the operation ID and final status to your endpoint. Query that operation by ID to fetch the download URL. Your system stays idle until Shopify calls you.

Make your webhook handler idempotent. Shopify can fire the same completion event more than once.
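
A minimal sketch of idempotent handling: remember which operation IDs have been processed and ignore repeats. The in-memory set is a stand-in for a unique constraint in your database. The class name is illustrative, and the `admin_graphql_api_id` payload field is assumed here, so verify it against the webhook payload you actually receive.

```python
from typing import Callable, Set


class FinishHandler:
    """Process each finished bulk operation once, even if delivered twice."""

    def __init__(self, process: Callable[[dict], None]) -> None:
        self._seen: Set[str] = set()
        self._process = process

    def handle(self, payload: dict) -> bool:
        """Return True if the event was processed, False if it was a repeat."""
        op_id = payload["admin_graphql_api_id"]  # assumed payload field
        if op_id in self._seen:
            return False
        self._process(payload)
        # Mark as seen only after processing succeeds, so a crash allows a retry.
        self._seen.add(op_id)
        return True
```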

Pattern 2: Job Orchestration Layer

For apps managing bulk operations across hundreds or thousands of merchant stores, build a layer that:

  • Queues bulk operation submissions per store
  • Tracks the active operation ID per store in a database
  • Handles completions via webhooks
  • Retries failed jobs with exponential backoff
  • Logs partialDataUrl before discarding failed results

Pattern 3: Chunked Mutation Strategy

Even though bulk mutations support millions of records, split very large jobs into chunks of 50K to 100K records per operation. Smaller jobs complete faster, fail cheaper, and produce lighter output files.

Track your last successfully completed chunk in a persistent state store. On failure, resume from the checkpoint rather than reprocessing everything.
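
A sketch of chunking with a checkpoint, assuming a local JSON file as the state store and a hypothetical `run_chunk` callable that submits one chunk and raises on failure. The function names and the `next_chunk` key are illustrative.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List, Sequence


def chunked(records: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Split records into consecutive lists of at most `size` items."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def run_chunks(
    records: Sequence[dict],
    size: int,
    run_chunk: Callable[[List[dict]], None],
    checkpoint: Path,
) -> int:
    """Run chunks in order, skipping those the checkpoint marks as done.

    Returns the number of chunks executed in this call.
    """
    done = json.loads(checkpoint.read_text())["next_chunk"] if checkpoint.exists() else 0
    executed = 0
    for index, chunk in enumerate(chunked(records, size)):
        if index < done:
            continue
        run_chunk(chunk)  # raises on failure, leaving the checkpoint untouched
        checkpoint.write_text(json.dumps({"next_chunk": index + 1}))
        executed += 1
    return executed
```

Resuming only works if the record order and chunk size are identical between runs, so derive both from a stable source.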

Pattern 4: JSONL Processing Pipeline

Build a streaming parser that:

  • Reads the output file line by line
  • Reconstructs parent-child relationships via __parentId
  • Writes records to your database or downstream system
  • Tracks failed lines separately for targeted retries

Use worker processes separate from your web layer for this step. Monitor memory consumption carefully on multi-gigabyte files.
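
As a rough sketch of the failure-tracking part of such a pipeline, the function below streams rows to a writer and records which lines failed, so they can be retried on their own. `write` is a hypothetical callable for your database or downstream system. Parent-child reconstruction is omitted for brevity.

```python
import json
from typing import Callable, Iterable, List, Tuple


def run_pipeline(
    lines: Iterable[str],
    write: Callable[[dict], None],
) -> List[Tuple[int, str]]:
    """Stream rows to write(); return (line number, error) for rows that failed."""
    failures: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            write(json.loads(line))
        except Exception as exc:  # broad on purpose: one bad row must not stop the run
            failures.append((number, str(exc)))
    return failures
```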


Error Codes and Retry Logic

| Error Code | Cause | Action |
| --- | --- | --- |
| ACCESS_DENIED | Missing API scope | Update OAuth scopes |
| INTERNAL_SERVER_ERROR | Shopify-side failure | Retry with exponential backoff |
| TIMEOUT | Query too complex or dataset too large | Simplify query, chunk inputs |
| TOO_MANY_FILE_STORAGE_REQUESTS | Too many staged uploads in flight | Throttle upload submissions |

For timeouts: simplify the query first. Remove fields your downstream system does not use. A leaner query fixes most timeouts without reducing dataset scope.
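
The retry rule from the table can be sketched as one function: back off exponentially for Shopify-side failures, and refuse to retry blindly when the fix is a scope change or a leaner query. The base delay and cap are illustrative defaults, not Shopify recommendations.

```python
from typing import Optional


def retry_delay(
    error_code: str,
    attempt: int,
    base: float = 30.0,
    cap: float = 900.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry will not help.

    attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if error_code != "INTERNAL_SERVER_ERROR":
        # ACCESS_DENIED needs new scopes; TIMEOUT needs a simpler query or smaller input.
        return None
    return min(cap, base * 2 ** attempt)
```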


Performance Benchmarks

| Dataset Size | Typical Completion Time | Approximate File Size |
| --- | --- | --- |
| 10,000 products | 30 to 90 seconds | 5 to 15 MB |
| 100,000 orders | 3 to 8 minutes | 100 to 300 MB |
| 500,000 customers | 15 to 40 minutes | 500 MB to 2 GB |
| 1M+ line items | 30 to 90 minutes | 2 to 10 GB |

These are estimates, not guarantees. Always build for the upper end of each range. Never assume a fixed completion time in production logic.


App Architecture Implications

Bulk operations shift your bottleneck. You stop hammering the API with thousands of requests. Instead, you make a few API calls, then process a large local file. The bottleneck moves from network throughput to local compute and storage I/O.

Plan accordingly:

  • Use worker processes separate from your web layer
  • Write processed results to a fast intermediate store (Redis, PostgreSQL) before your final destination
  • Use streaming HTTP clients that do not buffer the full response body
  • Alert on jobs stuck in RUNNING state beyond expected thresholds
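
The last item can be sketched as a simple check against the operation's createdAt timestamp. This assumes the timestamp is ISO 8601 in UTC with a trailing `Z`. The threshold is yours to choose, and `is_stuck` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone


def is_stuck(op: dict, now: datetime, threshold: timedelta) -> bool:
    """Return True if the operation has been RUNNING longer than threshold."""
    if op["status"] != "RUNNING":
        return False
    # Replace the trailing "Z" so fromisoformat works on older Python versions.
    created = datetime.fromisoformat(op["createdAt"].replace("Z", "+00:00"))
    return now - created > threshold


now = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)
running = {"status": "RUNNING", "createdAt": "2024-01-01T12:00:00Z"}
```

Run the check on a schedule and pick the threshold from the upper end of your own observed completion times.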

Separate real-time event pipelines from batch bulk operation pipelines entirely. Use event-driven patterns for real-time updates. Reserve the bulk pipeline for scheduled or triggered large-batch jobs.


Wrapping Up

The Shopify Bulk Operations API is the right tool any time your data volume pushes past what paginated GraphQL can handle cleanly. The lifecycle is simple. The constraints are manageable. The architecture patterns are proven.

Build it with webhook-triggered completion, idempotent processing, chunked mutations, and streaming JSONL parsing, and you get a system that handles enterprise-scale data without breaking under load.

Originally published on KolachiTech: https://kolachitech.com/bulk-operations-api-at-scale/


KolachiTech is a Shopify-focused development agency specializing in API architecture, ERP integrations, and scalable app development. If you need a production-grade bulk operations system built for your store or app, get in touch.
