DEV Community

Pierre POMES


myanon — how I anonymize 200GB of MySQL nightly for GDPR-safe dev

I've been managing MySQL infrastructure for over 15 years. One problem that never had a clean solution: getting production data to developers safely.

With large databases, the usual approach — dump, load, then anonymize — takes forever. I needed something that anonymizes on-the-fly, during the dump itself.

Every night, I transfer around 200GB of production databases. By morning, developers have fresh, realistic data to work with — fully anonymized, GDPR-safe. One command, no intermediate files.

The workflow

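# Requires bash (process substitution). tee duplicates the dump stream:
# one copy is anonymized and gzipped, the other is encrypted as the real backup.
mysqldump mydb \
  | tee >(myanon -f myanon.conf | gzip > mydb_anon.sql.gz) \
  | gpg -e -r me@domain.com > mydb.sql.gpg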

The real backup is encrypted with GPG, while the anonymized copy is compressed with gzip. Two outputs, one pass, no temp files.

How it works

  • Streaming: anonymizes on-the-fly
  • Fast, low memory: tables not in config pass straight through — no unnecessary parsing
  • Single binary: written in C, easy to deploy — also available as a Docker image
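
To illustrate the pass-through idea, here is a minimal Python sketch. It is not myanon's actual code (myanon is written in C), and the names `stream_filter` and `rewriters` are hypothetical. It assumes one `INSERT INTO` statement per line, which is a simplification of what a real dump parser has to handle.

```python
import re
from typing import Callable, Dict, Iterable, Iterator

# Matches the table name at the start of an INSERT statement.
INSERT_RE = re.compile(r"^INSERT INTO `([^`]+)`")


def stream_filter(
    lines: Iterable[str],
    rewriters: Dict[str, Callable[[str], str]],
) -> Iterator[str]:
    """Yield dump lines one at a time, rewriting only configured tables."""
    for line in lines:
        match = INSERT_RE.match(line)
        if match and match.group(1) in rewriters:
            # Configured table: hand the statement to its rewriter.
            yield rewriters[match.group(1)](line)
        else:
            # Anything else (DDL, comments, other tables) is copied unchanged.
            yield line
```

Because the function is a generator, it holds only one line in memory at a time, which is the property that makes a streaming filter usable inside a pipe.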

Configuration

secret = 'your_hmac_secret'

tables = {
  `users` = {
    `name`  = texthash 8
    `email` = emailhash 'example.com' 20
    `phone` = fixednull
  }
}

Deterministic hashing: with the same secret, the same input always produces the same output. Foreign keys keep matching, as long as both columns use the same rule.
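
Why this keeps joins intact can be shown with a small Python sketch of keyed, deterministic hashing. This is an illustration only: the choice of HMAC-SHA256, the lowercase alphabet, and the function name `text_hash` are assumptions, not a description of myanon's internals.

```python
import hashlib
import hmac
import string

ALPHABET = string.ascii_lowercase


def text_hash(secret: bytes, value: str, length: int) -> str:
    """Map a value to a fixed-length lowercase string, keyed by a secret."""
    if not 0 < length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32")
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    # Turn the first `length` digest bytes into letters.
    return "".join(ALPHABET[byte % len(ALPHABET)] for byte in digest[:length])


secret = b"your_hmac_secret"
# The same value appearing in two different tables hashes identically,
# so a join on the anonymized columns still matches.
users_email = text_hash(secret, "alice@corp.example", 8)
orders_email = text_hash(secret, "alice@corp.example", 8)
assert users_email == orders_email
```

Changing the secret changes every output, so the mapping cannot be recomputed without it.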

Built-in features:

  • Hash names, emails, integers
  • Set fields to NULL or fixed values
  • Anonymize inside JSON columns
  • Truncate entire tables
  • And more

Extensible: need realistic names? Plug in Python functions (e.g., Faker).
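
As a rough idea of what such a function could look like, here is a Python sketch. How myanon loads and calls Python functions is not shown in this post, so the signature (one string in, one string out) and the name `fake_first_name` are assumptions. To stay self-contained, it picks from a small hard-coded list instead of using Faker.

```python
import hashlib
import hmac

# Hypothetical values for illustration only.
SECRET = b"your_hmac_secret"
FIRST_NAMES = ["Alice", "Bruno", "Chloe", "David", "Emma", "Felix"]


def fake_first_name(value: str) -> str:
    """Return a realistic-looking name chosen deterministically from the input."""
    digest = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).digest()
    # Use the first 4 digest bytes as an index into the name list.
    index = int.from_bytes(digest[:4], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]
```

Deriving the index from a keyed hash rather than a random generator keeps the result stable from one nightly run to the next.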

Try it

Feedback welcome — open an issue if you hit an edge case.


How do you get production data to developers safely?
