A few years ago we faced an interesting problem. A client working on a Salesforce migration project had a 5GB Salesforce sandbox which they wanted to refresh multiple times per day. Testing the migration that often made sense, but Salesforce will only refresh a 5GB sandbox once every seven days (and the refresh itself can take a day or more to complete!). This is the story of how we met the requirement and refreshed the sandbox multiple times per day.
For ages, Capstorm has given away a Python module (SQLForce) which makes Python a pretty darn good platform for modifying Salesforce using plain ANSI SQL while hiding a lot of Salesforce’s governor limits. This, Python with SQLForce, is where we started.
The Basics
Let’s start with the basics of how to delete records from Salesforce using Python. For this exercise we wrote a script which deletes all Task records.
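The script itself is only a few lines long. Here is a sketch of its shape; the `Session`/`runCommands` method names are assumptions based on SQLForce examples, and a fake session class stands in for `SQLForce.Session` so the snippet runs without a live Salesforce login:

```python
class FakeSession:
    """Stand-in for SQLForce.Session so this sketch runs without Salesforce.

    The real script starts with:
        import SQLForce
        session = SQLForce.Session("WipeASandbox")
    (the runCommands method name is an assumption based on SQLForce examples).
    """
    def __init__(self, name):
        self.name = name
        self.statements = []

    def runCommands(self, sql):
        # The real session would send this statement to Salesforce.
        self.statements.append(sql)

session = FakeSession("WipeASandbox")

# One ANSI SQL statement deletes every Task; SQLForce handles the batching.
session.runCommands("DELETE FROM Task WHERE id<>null")
print(session.statements)
```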
Make the Delete Script Scale to Millions of Records
Yes, deleting a bunch of Tasks is this simple. Under the hood, the SQLForce code does the following:
- The WHERE clause of the DELETE statement is used to create a SOQL SELECT statement. In this case:
- SELECT Id FROM Task WHERE id<>null
- SQLForce runs the SELECT statement to grab Salesforce record Ids and immediately batches the Ids and sends them to Salesforce to delete records.
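The DELETE-to-SELECT rewrite described above is plain string manipulation. A rough illustration (this is not SQLForce's actual implementation, just the idea):

```python
import re

def delete_to_select(delete_stmt):
    """Show how a DELETE's WHERE clause becomes the Id-gathering SELECT."""
    m = re.match(r"DELETE\s+FROM\s+(\w+)\s+WHERE\s+(.+)",
                 delete_stmt, re.IGNORECASE)
    if m is None:
        raise ValueError("expected: DELETE FROM <table> WHERE <condition>")
    table, condition = m.groups()
    return "SELECT Id FROM %s WHERE %s" % (table, condition)

print(delete_to_select("DELETE FROM Task WHERE id<>null"))
# -> SELECT Id FROM Task WHERE id<>null
```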
This approach works well for fewer than 1,000,000 records, but larger data sets require a slightly more involved technique.
- The first step toward working with millions of records is to take control of SOQL batching using a SOQL LIMIT clause.
The next evolution in the “wipe a sandbox” script adds a method which will run any SOQL statement until it modifies no records OR encounters an error. This method, runSOQLUntilFinished(), is a workhorse the script will use for a variety of tasks.
The following script does the same thing as the first script (Deletes all Tasks) but will scale up to Salesforce instances with many millions of tasks.
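A condensed sketch of that evolution is below. The `getenv("ROW_COUNT")` accessor is an assumption about how SQLForce reports affected rows, and a fake session simulates a Task table so the loop can be run offline:

```python
import re

def runSOQLUntilFinished(session, sql):
    """Run sql repeatedly until it modifies no records; return total modified.

    Assumes the session reports how many rows the last statement touched
    (SQLForce exposes this; the getenv("ROW_COUNT") accessor is an assumption).
    """
    total = 0
    while True:
        session.runCommands(sql)
        nRows = session.getenv("ROW_COUNT")
        if nRows <= 0:
            return total
        total += nRows

def truncateTable(session, table, batchSize=100000):
    # The LIMIT caps each pass, so even a table with millions of rows is
    # deleted in bounded chunks instead of one enormous SELECT.
    return runSOQLUntilFinished(
        session, "DELETE FROM %s WHERE id<>null LIMIT %d" % (table, batchSize))

class FakeSession:
    """Offline stand-in that pretends its table holds nRows records."""
    def __init__(self, nRows):
        self.nRows, self.lastCount = nRows, 0
    def runCommands(self, sql):
        limit = int(re.search(r"LIMIT\s+(\d+)", sql).group(1))
        self.lastCount = min(limit, self.nRows)
        self.nRows -= self.lastCount
    def getenv(self, name):
        return self.lastCount

print(truncateTable(FakeSession(250000), "Task"))  # -> 250000
```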
Take Advantage of Salesforce Cascading Deletes
Truncating a lot of tables becomes much easier if we take advantage of Salesforce cascading deletes. For example:
- When a Salesforce Case is deleted all related CaseContactRole, CaseTeamMember, Tasks, etc. are automatically deleted.
The next evolution of the “Wipe a Sandbox” script adds a few more tables to the wipe list, choosing some which benefit from Salesforce’s cascading delete.
For simplicity, only the code that was modified is included in the example below (the methods runSOQLUntilFinished() and truncateTable() are still in the script…just hidden).
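The interesting part of this evolution is the wipe order: parent tables go first so the cascading delete removes their children for free. A sketch of that idea, with an offline stub standing in for the real truncateTable() helper (the table choice here is illustrative; the full list lives in the final script):

```python
# Wipe order matters: parents go first so Salesforce's cascading delete
# removes their children automatically. (Illustrative table choice only.)
TABLES_TO_TRUNCATE = [
    "Case",     # cascades to CaseContactRole, CaseTeamMember, related Tasks, ...
    "Account",  # illustrative parent table with many cascading children
    "Task",     # mops up any Tasks a cascade did not remove
]

wiped = []

def truncateTable(session, table):
    """Stub standing in for the real truncateTable() so this runs offline."""
    wiped.append(table)

for table in TABLES_TO_TRUNCATE:
    truncateTable(None, table)

print(wiped)  # -> ['Case', 'Account', 'Task']
```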
Deal with Salesforce Table Restrictions
At this point you may be thinking “This is easy. The script must almost be finished” and you are almost correct. There is one more Salesforce restriction we need to code around:
- Some Salesforce tables make deletion hard when their records are in particular states. Examples:
- Opportunities in a Closed/Won stage are hard to delete.
- Contracts in an Activated state are hard to delete.
In the next evolution of the “Wipe a Sandbox” script we code around a few Salesforce restrictions by modifying certain records before attempting to delete them.
Like in the previous code evolution example, only the modified code is shown.
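A sketch of the pre-delete fixups follows. The picklist values below (`StageName='Prospecting'`, `Status='Draft'`) are assumptions for illustration; adjust them to whatever “deletable” states are valid in your org:

```python
# Some records refuse to die while in certain states, so update them into a
# deletable state first. The field values here are assumptions; adjust them
# to picklist values that are valid in your org.
PRE_DELETE_FIXUPS = {
    # Re-open Closed/Won Opportunities before deleting them.
    "Opportunity": "UPDATE Opportunity SET StageName='Prospecting' WHERE IsClosed=true",
    # Take Contracts out of the Activated state before deleting them.
    "Contract": "UPDATE Contract SET Status='Draft' WHERE Status='Activated'",
}

def statementsToWipe(table):
    """Return the SQL statements, in order, needed to wipe one table."""
    stmts = []
    if table in PRE_DELETE_FIXUPS:
        stmts.append(PRE_DELETE_FIXUPS[table])
    stmts.append("DELETE FROM %s WHERE id<>null" % table)
    return stmts

for stmt in statementsToWipe("Contract"):
    print(stmt)
```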
Supercharge Record Deleting and Record Update Speeds!
At this point the core script for wiping a Sandbox is in place minus the details of adding additional tables to truncate. However, there is one last problem to solve:
- How can we make the record deletions run very fast?
Our first attempt used the Salesforce Bulk API (as Salesforce suggests), but the performance we saw was not nearly fast enough for a 5GB Sandbox. Yes, we were able to match the million or so records per hour Salesforce publishes in their white paper, but we needed many tens of millions per hour to meet the client’s requirements. We succeeded by uncovering a bit of little-known Salesforce trivia:
- Salesforce DELETE requests often complete so quickly that a program can run far more of them concurrently than the published governor limit of 25. How much above the governor limit? We consistently ran 100 or more concurrent deletion requests with no issues. In fact, the first iteration of the “Wipe the Sandbox” script was developed at my farm, where my internet connection was only 40 Mbps down and 20 Mbps up. We basically ran out of bandwidth and still managed to delete over 24 million records per hour! If you are interested in the details, look at the whitepaper.
- Salesforce UPDATE requests often happen so fast that the governor limit of 25 concurrent operations can also be safely exceeded. The pre-delete table operations performed by the “Wipe the Sandbox” script tend to be fast enough to fit this case (though your environment may have workflows/triggers which make this not true).
This evolution of the “Wipe the Sandbox” script tells SQLForce to increase parallelism to 20 threads for both DELETE and UPDATE operations. In practice this means that 4,000 records (200 per batch × 20 threads) will be deleted from Salesforce concurrently. You can likely bump this parameter much higher.
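SQLForce does the fan-out internally once its parallelism is raised; the pattern it implements looks roughly like this (the delete call is faked here so the demo runs offline, and the `deleteIdsConcurrently` helper is ours, not part of SQLForce):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 200  # records per Salesforce delete call
THREADS = 20      # concurrent calls; we ran 100+ in practice without issues

def deleteIdsConcurrently(ids, deleteBatch, threads=THREADS):
    """Fan record Ids out over many concurrent delete calls; return the count.

    deleteBatch is whatever issues one delete request and returns how many
    records it removed. This only illustrates the fan-out pattern; SQLForce
    performs the equivalent work internally.
    """
    batches = [ids[i:i + BATCH_SIZE] for i in range(0, len(ids), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(deleteBatch, batches))

# Offline demo: "delete" 4,000 fake Ids (20 threads x 200 records per batch).
fakeIds = ["00T%015d" % n for n in range(4000)]
print(deleteIdsConcurrently(fakeIds, lambda batch: len(batch)))  # -> 4000
```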
The Finished Script
The finished script can be found at: http://download.capstorm.com/DeveloperBlog/WipeASandbox/WipeASandboxFinal.py.
To make it work for your environment:
- Install a copy of SQLForce in your Python.
- Learn how SQLForce connects to Salesforce (the session = SQLForce.Session("WipeASandbox") statement).
- Adjust the list of tables in the final script to fit your Salesforce Sandbox.
Please let us know if you “break any speed records” for wiping a sandbox!
Development Process Details
This script was developed using a variety of tools.
- Salesforce was used to create an empty developer Sandbox.
- CopyStorm/Restore was used to repeatedly populate the Sandbox with a rich set of data.
- The PyDev module for Eclipse was used to write and test the Python code.
- All work was done on Ubuntu (the script, however, should run on any O/S with Python and SQLForce installed).
How long did it take to write the “Wipe a Sandbox” script? The script and this article took around 3 hours to write.
The basic workflow was: