Twice this week I was asked what seems to be a simple question.
We just refreshed our full Salesforce sandbox. How can we do a bulk anonymization of the sandbox’s data so it is safe for anyone in my company (or even contractors) to use?
Why would a company want to anonymize a full Salesforce sandbox?
- Anonymized sandboxes can be used to populate developer sandboxes, even those used by consultants, without the risk of exposing real customer data. Handing consultants ready-made test data also saves the time they often spend creating their own.
- Developers can safely populate their own private sandboxes and scratch orgs with little risk of accidentally using real customer data. This technique can be used to replace two bad practices:
  - Doing development on a sandbox with virtually no test data.
  - Doing development on the same sandbox as a dozen other developers.
- Salesforce developer productivity (and quality) will soar once developers are working with a rich set of test data.
In this post I look at two approaches to creating anonymized full Salesforce sandboxes.
Approach 1: Write Some Code
How hard could it be to write custom code to anonymize a sandbox? Since I am a programmer, this is the first hammer I will use to solve the problem.
Logically, I want a program that looks something like the following:
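A minimal sketch of that logic in Python is shown here. The `FakeSalesforce` class is a hypothetical in-memory stand-in for a real connection (it is not part of any Salesforce library), and `anonymize_field` is a placeholder rule that simply masks text.

```python
class FakeSalesforce:
    """Hypothetical in-memory stand-in for a Salesforce connection."""

    def __init__(self, tables):
        # tables: {table_name: {record_id: {field_name: value}}}
        self.tables = tables

    def read(self, table, fields):
        """Yield (record_id, {field: value}) for the requested fields."""
        for record_id, record in self.tables[table].items():
            yield record_id, {f: record[f] for f in fields}

    def write(self, table, record_id, values):
        """Overwrite the given fields on one record."""
        self.tables[table][record_id].update(values)


def anonymize_field(table, field, value):
    """Placeholder rule: mask every character. Swap in your own logic."""
    return "X" * len(value) if isinstance(value, str) else value


def anonymize(conn, plan):
    """plan maps each table name to the list of fields to anonymize."""
    for table, fields in plan.items():
        # Materialize the read first so we never update while iterating.
        for record_id, values in list(conn.read(table, fields)):
            masked = {f: anonymize_field(table, f, v) for f, v in values.items()}
            conn.write(table, record_id, masked)


org = FakeSalesforce({"Account": {"001A": {"Name": "Acme Corp", "Phone": "555-0100"}}})
anonymize(org, {"Account": ["Name"]})
print(org.tables["Account"]["001A"])
```

Only the fields named in the plan are touched; everything else on the record is left as it was.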
Of course, for performance, operations like writing back to Salesforce would need to be batched, and communication errors would need to be handled, but you should get the general idea.
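To illustrate the batching and error-handling point, here is a small sketch. The `send_batch` callable is hypothetical: it stands for whatever function actually pushes a list of updates to Salesforce, and it is assumed to raise `ConnectionError` on a transient failure. The batch size of 200 is an illustrative default, not a figure taken from this post.

```python
import time


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_in_batches(updates, send_batch, batch_size=200, retries=3, delay=1.0):
    """Send updates in batches, retrying each batch on ConnectionError."""
    sent = 0
    for batch in chunks(updates, batch_size):
        for attempt in range(1, retries + 1):
            try:
                send_batch(batch)
                sent += len(batch)
                break
            except ConnectionError:
                if attempt == retries:
                    raise  # give up after the last attempt
                time.sleep(delay * attempt)  # simple linear backoff
    return sent


# Demo with a hypothetical sender that fails once, then succeeds.
calls = {"count": 0}


def flaky_sender(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated network failure")


total = write_in_batches(list(range(450)), flaky_sender, delay=0.0)
print(total, calls["count"])
```

Retrying a whole batch is only safe when the writes are idempotent, which is the case for "set this field to this value" updates like the ones discussed here.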
In fact, an actual implementation is not that difficult using the SQLForce Python module. In the following bit of working code, all that needs to be customized is the anonymizeField() function.
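As an illustration of what a customized anonymizeField() might look like, here are two possible rules in plain Python. Only the function name comes from this post; its signature here, the `_pseudonym` helper, and the per-field choices are assumptions, and the SQLForce read and write calls are deliberately left out.

```python
import hashlib


def _pseudonym(value, prefix):
    """Deterministic replacement: the same input always maps to the same token."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:10]
    return f"{prefix}-{digest}"


def anonymizeField(table, field, value):
    """Return an anonymized value for one field (assumed signature)."""
    if value is None or value == "":
        return value  # nothing to hide
    if field.lower() == "name":
        return value[::-1]  # simple reversal, for demonstration only
    if field.lower() == "email":
        return _pseudonym(value, "user") + "@example.invalid"
    return _pseudonym(value, field.lower())


print(anonymizeField("Account", "Name", "Acme Corp"))
print(anonymizeField("Contact", "Email", "jane@acme.com"))
```

Reversal is trivial to undo, so it only suits demonstrations. Unsalted hashes of short or guessable values can also be recovered by brute force, so a secret salt would be worth adding before relying on the hash-based rule.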
Be careful, since this code will really reverse all Account names in a Salesforce instance! If you want to make code like this work for you:
- Download the free SQLForce Python module from Capstorm.
- Learn the basics of SQLForce.
- Run the program and BE SURE to use it on a test environment. People will not be happy if you run this on production.
From the sample code, it should be obvious how to enhance the program to work with a lot of tables and fields. The job of writing the code will be a bit tedious, but in the end you would have a program that can anonymize your Salesforce sandboxes.
I did not write a complete Python program for this task. Why? It was a lot easier for me to do this work with a commercial program.
Approach 2: Use a Salesforce Backup/Recovery Tool
If you have a Salesforce backup/recovery product, this may be the fastest and easiest way to anonymize a Salesforce sandbox. For example, here is how I anonymized all of the sensitive data in one of our sandboxes and automated the process for future sandbox refreshes.
- Step 1: Create a backup of the Sandbox to a database.
- Step 2: Use the backup to restore and anonymize fields in the Salesforce Sandbox backed up in the first step.
Step 1: Back Up the Sandbox to a Database
The first step was to create a backup of my Sandbox. You might not need this step, but the Salesforce recovery tool I use, CopyStorm/Restore, needs a database backup to do a recovery.
How hard was this step?
- Launch the backup program, CopyStorm.
- Enter Salesforce credentials for the Sandbox.
- Choose an H2 database in which the backup will be stored. All I had to do was provide a filename (the system will create the database on the fly).
- Click on the Start Copy button.

After the copy finished, I had a complete H2 database copy of my Salesforce sandbox in a file that I can use for restores. The next step uses the H2 database to anonymize data in the original sandbox.
Step 2: Restore Anonymized Data to the Sandbox
In this step I did an “update only” restore which replaced data in fields that I specified with anonymized data.
The first two steps were:
- Enter Salesforce and backup database credentials
- Tell the application to restore records even if the timestamps in the backup and in Salesforce match.

Setting the “Ignore Timestamps” parameter is required. If it is not set, CopyStorm/Restore will assume that no records need to be written to Salesforce, since the timestamp in the database and the timestamp in Salesforce will always match.
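The reasoning behind that setting can be sketched in a few lines. This is only an illustration of the idea; the function name and arguments are hypothetical and do not reflect CopyStorm/Restore internals.

```python
from datetime import datetime


def should_write(backup_modified, salesforce_modified, ignore_timestamps=False):
    """Decide whether a backed-up record is written back to Salesforce."""
    if ignore_timestamps:
        return True  # always write, so anonymized values get pushed
    # Default behaviour in this sketch: skip records that look unchanged.
    return backup_modified != salesforce_modified


stamp = datetime(2024, 1, 15, 9, 30)  # same in the fresh backup and the sandbox
print(should_write(stamp, stamp))
print(should_write(stamp, stamp, ignore_timestamps=True))
```

Because the backup was taken from the same sandbox moments earlier, every record looks unchanged, so without the override nothing would be written.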

The final step before starting the anonymization process is to tell CopyStorm/Restore which fields to anonymize. This is a four-step process.
- Pick a table to restore.
- Set an option to only allow updates.
- Deselect all fields that SHOULD NOT be anonymized.
- Choose the anonymization approach for the remaining fields.

Once the fields to be anonymized for a table have been selected, specify the anonymization rules.

Repeat this same procedure for each table containing fields to be anonymized.
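Conceptually, the result of this procedure is a set of per-table, per-field rules. The sketch below models that idea in Python; the rule names and the dictionary layout are invented for illustration and are not the CopyStorm/Restore file format.

```python
import random
import string


def random_text(value, rng):
    """Replace each letter or digit with a random one, keeping length and punctuation."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        elif ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)
    return "".join(out)


def fixed_value(replacement):
    """Build a rule that always returns the same replacement."""
    return lambda value, rng: replacement


# Hypothetical rule set: table -> field -> rule. Unlisted fields are left alone.
RULES = {
    "Account": {"Name": random_text, "Phone": random_text},
    "Contact": {"Email": fixed_value("nobody@example.invalid")},
}


def apply_rules(table, record, rules=RULES, seed=None):
    """Return a copy of record with the table's rules applied."""
    rng = random.Random(seed)
    result = dict(record)
    for field, rule in rules.get(table, {}).items():
        if result.get(field):
            result[field] = rule(result[field], rng)
    return result


print(apply_rules("Account", {"Id": "001A", "Name": "Acme Corp", "Phone": "555-0100"}))
```

Keeping the rules as data rather than code mirrors the idea of saving them once and reusing them after every refresh.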
The two final steps are the easiest.
- Save your restore rules using the File/Save menu item. (You may want to use these rules again the next time your sandbox is refreshed).
- Start the restore.

Here is what the data looked like before anonymization.

Here is what the records looked like after CopyStorm/Restore updated them with anonymization.

The data appears unrelated to the original data — it’s anonymized!
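A quick way to back up that visual check is to compare records before and after, and flag any field that was meant to be anonymized but still holds its original value. The sketch below assumes the records are available as plain dictionaries keyed by record Id; how you export them is up to you.

```python
def unchanged_fields(before, after, fields):
    """Return (record_id, field) pairs whose value survived anonymization."""
    leaks = []
    for record_id, old in before.items():
        new = after.get(record_id, {})
        for field in fields:
            if old.get(field) and old.get(field) == new.get(field):
                leaks.append((record_id, field))
    return leaks


before = {"001A": {"Name": "Acme Corp", "Phone": "555-0100"}}
after = {"001A": {"Name": "Qzrv Blep", "Phone": "555-0100"}}
print(unchanged_fields(before, after, ["Name", "Phone"]))  # [('001A', 'Phone')]
```

An empty list means every checked field changed; it does not prove the new values are impossible to reverse.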
Here is the good news! Now that I have rules for anonymizing my full sandbox I can save the rules to a file and reuse them later.
- I can reuse them in the CopyStorm/Restore GUI.
- I can run them as part of a script:
  - CopyStormRestore -run AnonymizeSandbox.copyStormRestore
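For example, the command above could be wrapped in a small Python script and run after each sandbox refresh. This sketch assumes `CopyStormRestore` is on the PATH and signals failure with a non-zero exit code; neither detail is confirmed in this post.

```python
import subprocess


def build_command(rules_file, executable="CopyStormRestore"):
    """Build the argument list for the command shown in this post."""
    return [executable, "-run", rules_file]


def run_restore(rules_file):
    """Run the saved restore rules; raises CalledProcessError on a non-zero exit."""
    command = build_command(rules_file)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    print(" ".join(build_command("AnonymizeSandbox.copyStormRestore")))
    # Uncomment on a machine where CopyStormRestore is installed:
    # run_restore("AnonymizeSandbox.copyStormRestore")
```

Passing the arguments as a list avoids shell quoting problems if the rules file path contains spaces.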
Wrap Up
Once an anonymized Salesforce full sandbox is created, it can be the base for a variety of powerful development practices.
- The sandbox can be used to populate developer sandboxes, even those used by consultants, without the risk of exposing real customer data. Handing consultants ready-made test data also saves the time they often spend creating their own.
- Developers can safely populate their own private sandboxes and scratch orgs with little risk of accidentally using real customer data. This technique can be used to replace two bad practices:
  - Doing development on a sandbox with virtually no test data.
  - Doing development on the same sandbox as a dozen other developers.
If a company uses Capstorm tools to do the anonymization, the next steps are:
- Do a full backup of your newly anonymized sandbox.
- Let your developers have free use of the anonymized sandbox to create their own private development and scratch orgs.
- Sit back and see how much your development team’s productivity soars when they can work with rich test data.