Ever wondered which companies use Shopify? Or how many businesses have adopted Stripe? What about finding every site running on Next.js?
I've spent the last few months crawling the web to answer these questions, and today I'm open-sourcing the entire dataset.
What I Built
tech-stack-datasets is a free, open-source collection of company data grouped by the technologies they use. The full database contains 51.3 million companies across 403 technologies, with 500 sample records per technology available as open data.
Think of it as a solid sample of the technographic landscape, with no registration and no paywall.
The data covers 403 different technologies across categories like:
- E-commerce platforms (Shopify, WooCommerce, Magento)
- Payment processors (Stripe, PayPal, Adyen)
- CMS platforms (WordPress, Webflow, Contentful)
- Analytics tools (Google Analytics, Mixpanel, Amplitude)
- Cloud providers (AWS, Google Cloud, Azure)
- And hundreds more...
Why This Exists
As a developer who's worked on several SaaS products, I kept running into the same problem: finding potential customers is hard.
If you're building a Shopify app, wouldn't it be helpful to know which companies use Shopify? If you're selling a WordPress plugin, where do you even start looking for leads?
Traditional sales tools charge $99+/month for this data. I thought developers deserved better.
Real-World Use Cases
For Sales & Marketing
# Each dataset contains 500 sample companies (skip the CSV header row before counting)
$ tail -n +2 companies-using-stripe.csv | wc -l
500
That's 500 verified examples to start your research. The full dataset (68,072 companies) is available through the pro version.
For Competitive Analysis
Want to know who your competitors' customers are? Cross-reference tech stacks:
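# Companies using both Shopify AND Klaviyo: compare the domain column (field 2),
# since full rows differ in their technology column and would never match.
# Note: cut assumes no quoted commas in company names.
comm -12 <(tail -n +2 companies-using-shopify.csv | cut -d, -f2 | sort) \
<(tail -n +2 companies-using-klaviyo.csv | cut -d, -f2 | sort)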
For Market Research
Track adoption trends over time. The data updates daily, so comparing dated snapshots lets you monitor:
- How fast is Next.js growing vs. traditional frameworks?
- Which e-commerce platform is gaining market share?
- What's the average tech stack for a successful SaaS company?
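One way to start on questions like these is to keep dated copies of a dataset and diff the domains between two of them. The following is a minimal sketch, assuming each snapshot is a CSV with the `domain` column shown in the sample record format; the function names and the inline rows are hypothetical and only illustrate the idea.

```python
import csv
import io


def domains(csv_text: str) -> set[str]:
    """Return the set of domains in a CSV snapshot that has a `domain` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["domain"].strip().lower() for row in reader}


def diff_snapshots(old_csv: str, new_csv: str) -> dict[str, set[str]]:
    """Compare two snapshots of the same technology's dataset."""
    old, new = domains(old_csv), domains(new_csv)
    return {"added": new - old, "dropped": old - new, "kept": old & new}


# Hypothetical snapshots taken on two different days (made-up rows).
monday = "company_name,domain\nAcme Corp,acme.com\nTechStart,techstart.io\n"
friday = "company_name,domain\nAcme Corp,acme.com\nNewCo,newco.dev\n"

changes = diff_snapshots(monday, friday)
print(sorted(changes["added"]))    # domains present only in the newer snapshot
print(sorted(changes["dropped"]))  # domains present only in the older snapshot
```

With a 500-record sample, a changed row may reflect a different sample rather than a real change in adoption, so this is best treated as a way to test the workflow.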
For Data Scientists
Train ML models on real tech adoption patterns:
import pandas as pd
# Load multiple datasets
shopify = pd.read_csv('companies-using-shopify.csv')
stripe = pd.read_csv('companies-using-stripe.csv')
# Measure overlap: share of Shopify sample domains that also appear in the Stripe sample
merged = pd.merge(shopify, stripe, on='domain', how='inner')
print(f"Overlap: {len(merged) / len(shopify) * 100:.1f}%")
What's In The Data?
Each dataset provides 500 sample records from the complete database. Each company record includes:
- Company name and domain
- Geographic location (country/state)
- Technology stack detected
- Service type (B2B, B2C, etc.)
- Quality scores for data verification
- Last verified date
Here's a sample:
company_name,domain,country,technology,last_verified
Acme Corp,acme.com,United States,Shopify,2026-02-08
TechStart,techstart.io,United Kingdom,Next.js,2026-02-08
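If you want typed records rather than raw strings, the sample rows above can be parsed with the standard library. This is a sketch that assumes only the five columns shown in the sample; the `CompanyRecord` class and `parse_records` function are hypothetical names, and any extra columns in the real files would need to be added.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CompanyRecord:
    company_name: str
    domain: str
    country: str
    technology: str
    last_verified: date


def parse_records(csv_text: str) -> list[CompanyRecord]:
    """Parse CSV text that has the five columns shown in the sample."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append(
            CompanyRecord(
                company_name=row["company_name"],
                domain=row["domain"],
                country=row["country"],
                technology=row["technology"],
                # Dates in the sample are ISO formatted (YYYY-MM-DD).
                last_verified=date.fromisoformat(row["last_verified"]),
            )
        )
    return records


sample = """company_name,domain,country,technology,last_verified
Acme Corp,acme.com,United States,Shopify,2026-02-08
TechStart,techstart.io,United Kingdom,Next.js,2026-02-08
"""

records = parse_records(sample)
print(records[0].domain, records[0].last_verified.year)
```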
Why 500 records? It's enough to:
- Test your analysis workflows
- Understand market patterns
- Build proof-of-concepts
- Validate your targeting strategy
Need the full dataset? That's what Leadita Pro is for.
How I Built This
The crawler runs on a distributed system that:
- Fetches website HTML and JavaScript bundles
- Identifies technology fingerprints (CDN URLs, meta tags, script signatures)
- Validates findings with multiple detection methods
- Stores results in normalized CSV/JSON formats
- Re-crawls daily to keep data fresh
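To make the fingerprinting step concrete, here is a minimal sketch of pattern-based detection over fetched HTML. The patterns below are illustrative assumptions chosen for the example, not the crawler's actual rules, and `detect` and `min_hits` are hypothetical names. Requiring more than one matching pattern is one simple way to approximate "multiple detection methods".

```python
import re

# Illustrative fingerprints only: these patterns are assumptions for the sketch,
# not the signatures the actual crawler uses.
FINGERPRINTS = {
    "Shopify": [re.compile(r"cdn\.shopify\.com", re.I)],
    "WordPress": [
        re.compile(r'<meta[^>]+name=["\']generator["\'][^>]+WordPress', re.I),
        re.compile(r"/wp-content/", re.I),
    ],
    "Next.js": [
        re.compile(r"/_next/static/", re.I),
        re.compile(r'id=["\']__NEXT_DATA__["\']', re.I),
    ],
}


def detect(html: str, min_hits: int = 1) -> dict[str, int]:
    """Return each matching technology with the number of patterns that matched."""
    hits = {}
    for tech, patterns in FINGERPRINTS.items():
        count = sum(1 for pattern in patterns if pattern.search(html))
        if count >= min_hits:
            hits[tech] = count
    return hits


page = """
<html><head>
<meta name="generator" content="WordPress 6.4">
<link rel="stylesheet" href="/wp-content/themes/demo/style.css">
</head><body></body></html>
"""
print(detect(page))              # technologies with at least one matching pattern
print(detect(page, min_hits=2))  # stricter: require two independent signals
```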
Detection accuracy: ~96% based on manual spot-checks against 1,000 random samples.
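To put the ~96% figure in context, the sampling error of a spot-check can be estimated with a Wilson score interval. This sketch assumes the 1,000 samples were drawn independently at random and that roughly 960 were judged correct; the exact count is not stated, so 960 is an assumption.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95% confidence)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Assumed count: about 960 correct detections out of 1,000 spot-checked records.
low, high = wilson_interval(960, 1000)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval is roughly 95% to 97%, so a spot-check of this size pins the accuracy down to within about a percentage point either way.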
Browse by Technology
Each technology has 500 sample companies available. Here are some popular ones:
JavaScript Frameworks:
- Companies using React (500 samples from 12,411 total)
- Companies using Next.js (500 samples from 340,205 total)
- Companies using Vue.js (500 samples)
Backend Stacks:
- Companies using Laravel (500 samples from 5,482 total)
- Companies using Django (500 samples from 597 total)
- Companies using Ruby on Rails (500 samples from 8,222 total)
SaaS Tools:
- Companies using HubSpot (500 samples from 86,132 total)
- Companies using Intercom (500 samples from 21,293 total)
- Companies using Segment (500 samples from 8,573 total)
Getting Started
Download Individual Datasets
# Clone the repo
git clone https://github.com/leadita/tech-stack-datasets.git
# Navigate to a specific technology
cd tech-stack-datasets/leads/companies-using-shopify
# The data is available in both CSV and JSON
ls
# shopify-companies.csv
# shopify-companies.json
Quick Analysis with jq
# Count companies by country
jq -r '.[].country' companies-using-stripe.json | \
sort | uniq -c | sort -rn | head -10
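If jq is not installed, the same count can be done with Python's standard library. This sketch assumes the JSON file is a top-level array of objects with a `country` key, which is what the jq filter above implies; `top_countries` is a hypothetical helper and the inline records are made up.

```python
import json
from collections import Counter


def top_countries(json_text: str, n: int = 10) -> list[tuple[str, int]]:
    """Count records per country in a JSON array of company objects."""
    records = json.loads(json_text)
    # Records with a missing or empty country are grouped under "Unknown".
    counts = Counter(record.get("country") or "Unknown" for record in records)
    return counts.most_common(n)


# Made-up records in the assumed shape: a top-level array of objects.
sample = json.dumps([
    {"domain": "acme.com", "country": "United States"},
    {"domain": "techstart.io", "country": "United Kingdom"},
    {"domain": "example.org", "country": "United States"},
])
print(top_countries(sample))
```

For a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to `top_countries`.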
Load into PostgreSQL
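CREATE TABLE companies (
  name VARCHAR(255),
  domain VARCHAR(255),
  country VARCHAR(100),
  technology VARCHAR(100),
  verified_date DATE,
  -- A domain can appear in several technology files, so key on both columns
  PRIMARY KEY (domain, technology)
);
-- COPY reads the file on the database server; from psql, \copy reads it client-side.
-- Columns are matched by position, following the sample CSV layout.
COPY companies FROM '/path/to/companies-using-shopify.csv'
WITH (FORMAT csv, HEADER true);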
-- Query insights
SELECT country, COUNT(*) as total
FROM companies
WHERE technology = 'Shopify'
GROUP BY country
ORDER BY total DESC
LIMIT 10;
What's Free vs. What's Pro
Free (Open Source):
- 500 sample records per technology
- 403 technologies covered
- Data updated daily
- CSV & JSON formats
- No registration required
Pro Version:
- Full datasets (millions of records)
- Verified email addresses & phone numbers
- API access
- Historical data & trends
- Custom filtering & exports
The sample data is genuinely useful for:
- Exploratory analysis: test your hypotheses before committing
- Proofs of concept: validate your product-market fit
- Learning & research: study tech adoption patterns
- Portfolio projects: build data apps with real data
For sales teams needing thousands of leads with contact info, check out Leadita Pro.
Current Stats
In the Open Repo:
- 200,000+ sample records (500 per tech × 403 technologies)
- 403 technologies tracked
- Daily updates for data freshness
- 100% open source (MIT license)
Total Database:
- 51.3M+ companies indexed
- Billions of data points
- Daily crawls across the web
Roadmap
Some features I'm working on:
- [ ] Historical data (track tech adoption over time)
- [ ] API access for programmatic queries
- [ ] More granular tech detection (framework versions, specific libraries)
- [ ] Company employee count estimates
- [ ] Funding/revenue data integration
Contributing
Found a bug? Have suggestions? Want to add new technologies to track?
- Issues: GitHub Issues
- Discussions: Share your use cases and ideas
- PRs: Always welcome!
Try It Yourself
Pick a technology you're interested in and explore who's using it:
- Browse the full list of technologies
- Download the CSV/JSON for any tech
- Run your own analysis
- Share what you find!
Links:
- GitHub Repo: leadita/tech-stack-datasets
- Website: leadita.com
- Questions? Drop a comment below.
If you find this useful, please star the repo on GitHub. It helps others discover the project!
Building tools that should be free. Follow along for more open data projects.