The data load playbook I wish someone handed me

A pragmatic approach to imports, automation impact, validation, and rollback—so your load doesn’t become next week’s fire.

Ravi Patel · 9/29/2025 · 3 min read · Data, Governance, Admin

Data loads are supposed to be the “easy” work. You export a CSV, you import a CSV, everyone goes home.

And then a flow fires, an integration wakes up, a validation rule you forgot exists blocks half the rows, and suddenly you’re in detective mode with 40 tabs open.

This is the checklist-y, slightly paranoid way I run loads now—because it’s faster than cleaning up the mess later.

The three questions you answer first

1) Where is truth?

If the “source of truth” is a spreadsheet, pause. Decide what system actually owns correctness (ERP, billing, marketing platform, etc.). Otherwise you’ll “fix Salesforce” today and re-break it on the next sync.

2) What’s the blast radius?

List what can fire on the objects you’ll touch:

  • flows / Apex triggers
  • assignment rules / auto-response rules
  • rollups / scheduled jobs
  • downstream integrations

If you can’t list them, your first run is a sandbox run.

3) What’s the rollback?

Rollback is not “we’ll fix it manually.” Rollback is a plan that works when you’re tired and the business is watching.

Minimum viable rollback:

  • export the records you’ll touch with Id + every field you will change
  • keep the export somewhere safe
  • be able to update those values back
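That pre-export can be a tiny script. A minimal sketch, assuming you've already queried the records into dicts (via whatever API or export tool you use); `snapshot_records`, `Status__c`, and the file path are illustrative names, not a real library:

```python
import csv

def snapshot_records(records, fields_to_change, path):
    """Write a rollback snapshot: Id plus every field the load will touch."""
    cols = ["Id"] + list(fields_to_change)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=cols, extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            # Keep only Id and the impacted fields; everything else is noise.
            writer.writerow({c: rec.get(c, "") for c in cols})
    return cols
```

Run it before the load and store the file outside the org, somewhere the whole team can find it at 9pm.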

Add one field that pays for itself: Import Run Id

Add a text field on high-load objects:

Import_Run_Id__c

Populate it in every CSV with a unique run id:

  • 2026-01-30_lead_cleanup_v2
  • billing_backfill_2026w05

This gives you:

  • quick validation (“show me everything we changed”)
  • easy reporting (“how many records did we touch?”)
  • targeted rollback (“undo this run”)
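Stamping the run id onto every row is two small helpers. A sketch, assuming your CSV rows are dicts; `make_run_id` and `tag_rows` are hypothetical names, and the field matches the `Import_Run_Id__c` convention above:

```python
from datetime import date

def make_run_id(purpose, version=1):
    """Build a run id in the YYYY-MM-DD_purpose_vN style, e.g. 2026-01-30_lead_cleanup_v2."""
    return f"{date.today().isoformat()}_{purpose}_v{version}"

def tag_rows(rows, run_id, field="Import_Run_Id__c"):
    """Stamp every row with the run id so the load is queryable afterwards."""
    return [{**row, field: run_id} for row in rows]
```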

Matching rules that don’t create duplicates

Prefer Ids when you have them.

If you must match by External ID:

  • keep the list small
  • make it unique where possible
  • document which system owns it

External IDs fail when multiple systems “help.”
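Before trusting an External ID match, it's worth checking that the key is actually unique (and non-blank) in your file. A sketch with an assumed column name `External_Id__c`:

```python
from collections import Counter

def duplicate_keys(rows, key="External_Id__c"):
    """Return external-id values that are blank or appear more than once."""
    counts = Counter(row.get(key, "").strip() for row in rows)
    return {k: n for k, n in counts.items() if n > 1 or k == ""}
```

If this returns anything, fix the file (or the owning system) before the load, not after.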

Automation: default ON, bypass only with control

Default stance:

If production automation can’t survive normal data change, that automation needs work.

When bypass is justified, make bypass permission-based, not record-based.

Avoid:

  • Bypass_All_Automation__c checkbox on records
  • hardcoding integration usernames into every flow

Prefer:

  • Custom Permission: Data_Load_Bypass
  • Flow entry condition: NOT($Permission.Data_Load_Bypass)

Grant bypass temporarily. Remove it afterwards.

Stage the load (don’t be clever)

If you’re touching multiple dependent fields:

  1. Pass 1: run id + safe fields
  2. Pass 2: fields that trigger automation
  3. Pass 3: relationship fields (lookups) after you’ve validated keys

It’s slower. It’s stable.
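The three passes can be generated from one source file instead of maintained by hand. A sketch; the field sets are assumptions you'd replace with your own org's inventory from the blast-radius step:

```python
# Assumption: you've classified your fields during the blast-radius review.
SAFE_FIELDS = {"Import_Run_Id__c", "Description"}   # nothing listens to these
TRIGGERING_FIELDS = {"Status__c", "StageName"}      # flows/triggers fire on these
LOOKUP_FIELDS = {"AccountId", "OwnerId"}            # relationships go last

def split_into_passes(row):
    """Split one update row into the three staged passes, keeping Id in each."""
    def pick(fields):
        out = {"Id": row["Id"]}
        out.update({f: row[f] for f in fields if f in row})
        return out
    return pick(SAFE_FIELDS), pick(TRIGGERING_FIELDS), pick(LOOKUP_FIELDS)
```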

Validation: counts + spot checks

Count checks:

  • expected rows: X
  • succeeded: X (the same X)
  • failed: 0
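The count check is one function over your loader's results file. A sketch, assuming the results rows carry a `Success` column the way most bulk loaders emit them; `reconcile` is a hypothetical helper:

```python
def reconcile(expected, results):
    """Compare expected row count against a loader's results rows."""
    results = list(results)
    succeeded = sum(1 for r in results
                    if str(r.get("Success", "")).lower() == "true")
    failed = len(results) - succeeded
    # "ok" means every expected row came back, and none failed.
    ok = (len(results) == expected and failed == 0)
    return {"expected": expected, "succeeded": succeeded,
            "failed": failed, "ok": ok}
```

Anything other than `ok: True` means you stop and look before declaring victory.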

Spot checks (pick ~20 records across segments):

  • lookups correct?
  • picklists valid?
  • automation didn’t overwrite your values?

Rollback: be explicit

With Import_Run_Id__c:

  1. filter by run id
  2. update fields back from your pre-export

If you didn’t pre-export old values, you don’t have rollback. You have optimism.
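If you did pre-export, the rollback file is just your snapshot re-emitted as an update payload. A sketch; `build_rollback` is a hypothetical helper, and the run-id filtering happens in the org, not here:

```python
import csv

def build_rollback(snapshot_path, out_path):
    """Re-emit the pre-export snapshot as an update file: Id + old values."""
    with open(snapshot_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("empty snapshot - no rollback possible")
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```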

Checklist

Before:

  • sandbox dry-run completed
  • run id column present
  • pre-export saved (Id + impacted fields)
  • automation approach agreed (on vs bypass)
  • expected row counts agreed

After:

  • reconcile counts
  • spot-check records
  • monitor integrations/logs for 1–2 hours
  • remove bypass permissions (if used)
  • write one paragraph in your change log (what/why)

Want help implementing this?

If you have a backlog and want steady delivery without surprise projects, we can handle admin-sized work under a monthly subscription.