I've been managing MySQL infrastructure for over 15 years. One problem that never had a clean solution: getting production data to developers safely.
With large databases, the usual approach — dump, load, then anonymize — takes forever. I needed something that anonymizes on-the-fly, during the dump itself.
Every night, I transfer around 200GB of production databases. By morning, developers have fresh, realistic data to work with — fully anonymized, GDPR-safe. One command, no intermediate files.
The workflow
mysqldump mydb \
| tee >(myanon -f myanon.conf | gzip > mydb_anon.sql.gz) \
| gpg -e -r me@domain.com > mydb.sql.gpg
Real backup goes to GPG, anonymized version goes to gzip. Two outputs, one pass, no temp files.
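The fan-out relies on bash process substitution: `tee` writes one copy of the stream into the `>(...)` pipeline and passes an identical copy down the main pipe. A minimal sketch of the mechanism, with plain `gzip` and `wc` standing in for the anonymizer and encryption steps:

```shell
#!/usr/bin/env bash
# tee duplicates stdin: one copy goes to the >(...) subshell,
# the other continues down the main pipeline unchanged.
printf 'line1\nline2\nline3\n' \
  | tee >(gzip > /tmp/copy.sql.gz) \
  | wc -l   # the main pipeline still sees all three lines
```

Note that process substitution is a bash feature (not POSIX `sh`), and the `>(...)` branch runs asynchronously, so in a script you may need to wait before reading its output file.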
How it works
- Streaming: anonymizes on-the-fly
- Fast, low memory: tables not in config pass straight through — no unnecessary parsing
- Single binary: written in C, easy to deploy — also available as a Docker image
Configuration
secret = 'your_hmac_secret'
tables = {
`users` = {
`name` = texthash 8
`email` = emailhash 'example.com' 20
`phone` = fixednull
}
}
Deterministic hashing: same input → same output. Foreign keys just work.
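That determinism is what keeps joins intact: hashing the same value in two different tables yields the same token, so a foreign key still matches its parent row. A back-of-the-envelope illustration using `openssl` as an HMAC stand-in (not myanon's actual implementation):

```shell
#!/usr/bin/env bash
secret='your_hmac_secret'
# HMAC is a pure function of (secret, input): hashing the same value
# twice produces identical digests, so anonymized keys still line up.
h1=$(printf '%s' 'alice@example.com' | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
h2=$(printf '%s' 'alice@example.com' | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
[ "$h1" = "$h2" ] && echo "same input, same token"
```

Changing the secret changes every token, which is why the `secret` line in the config should be treated like a credential.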
Built-in features:
- Hash names, emails, integers
- Set fields to NULL or fixed values
- Anonymize inside JSON columns
- Truncate entire tables
- And more
Extensible: need realistic names? Plug in Python functions (e.g., Faker).
Try it
- Website: https://myanon.io
- GitHub: https://github.com/ppomes/myanon
- Docker: docker pull ppomes/myanon
- Ubuntu PPA: https://launchpad.net/~pierrepomes/+archive/ubuntu/myanon
Feedback welcome — open an issue if you hit an edge case.
How do you get production data to developers safely?