The data load playbook I wish someone handed me

A pragmatic approach to imports, automation impact, validation, and rollback—so your load doesn’t become next week’s fire.

Ravi Patel · 9/29/2025 · 3 min read · Data, Governance, Admin

Data loads are supposed to be the “easy” work. You export a CSV, you import a CSV, everyone goes home.

And then a flow fires, an integration wakes up, a validation rule you forgot exists blocks half the rows, and suddenly you’re in detective mode with 40 tabs open.

This is the checklist-y, slightly paranoid way I run loads now—because it’s faster than cleaning up the mess later.

The three questions you answer first

1) Where is truth?

If the “source of truth” is a spreadsheet, pause. Decide what system actually owns correctness (ERP, billing, marketing platform, etc.). Otherwise you’ll “fix Salesforce” today and re-break it on the next sync.

2) What’s the blast radius?

List what can fire on the objects you’ll touch:

  • flows / Apex triggers
  • assignment rules / auto-response rules
  • rollups / scheduled jobs
  • downstream integrations

If you can’t list them, your first run is a sandbox run.

3) What’s the rollback?

Rollback is not “we’ll fix it manually.” Rollback is a plan that works when you’re tired and the business is watching.

Minimum viable rollback:

  • export the records you’ll touch with Id + every field you will change
  • keep the export somewhere safe
  • be able to update those values back
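That pre-export can be a tiny script. A minimal sketch, assuming you've already queried the records into dicts (via whatever API or export tool you use); `snapshot_records`, `Status__c`, and the file path are illustrative names, not a real library:

```python
import csv

def snapshot_records(records, fields_to_change, path):
    """Write a rollback snapshot: Id plus every field the load will touch."""
    cols = ["Id"] + list(fields_to_change)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=cols, extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            # Keep only Id and the impacted fields; everything else is noise.
            writer.writerow({c: rec.get(c, "") for c in cols})
    return cols
```

Run it before the load and store the file outside the org, somewhere the whole team can find it at 9pm.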

Add one field that pays for itself: Import Run Id

Add a text field on high-load objects:

Import_Run_Id__c

Populate it in every CSV with a unique run id:

  • 2026-01-30_lead_cleanup_v2
  • billing_backfill_2026w05

This gives you:

  • quick validation (“show me everything we changed”)
  • easy reporting (“how many records did we touch?”)
  • targeted rollback (“undo this run”)
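Stamping the run id onto every row is two small helpers. A sketch, assuming your CSV rows are dicts; `make_run_id` and `tag_rows` are hypothetical names, and the field matches the `Import_Run_Id__c` convention above:

```python
from datetime import date

def make_run_id(purpose, version=1):
    """Build a run id in the YYYY-MM-DD_purpose_vN style, e.g. 2026-01-30_lead_cleanup_v2."""
    return f"{date.today().isoformat()}_{purpose}_v{version}"

def tag_rows(rows, run_id, field="Import_Run_Id__c"):
    """Stamp every row with the run id so the load is queryable afterwards."""
    return [{**row, field: run_id} for row in rows]
```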

Matching rules that don’t create duplicates

Prefer Ids when you have them.

If you must match by External ID:

  • keep the list small
  • make it unique where possible
  • document which system owns it

External IDs fail when multiple systems “help.”
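Before trusting an External ID match, it's worth checking that the key is actually unique (and non-blank) in your file. A sketch with an assumed column name `External_Id__c`:

```python
from collections import Counter

def duplicate_keys(rows, key="External_Id__c"):
    """Return external-id values that are blank or appear more than once."""
    counts = Counter(row.get(key, "").strip() for row in rows)
    return {k: n for k, n in counts.items() if n > 1 or k == ""}
```

If this returns anything, fix the file (or the owning system) before the load, not after.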

Automation: default ON, bypass only with control

Default stance:

If production automation can’t survive normal data change, that automation needs work.

When bypass is justified, make bypass permission-based, not record-based.

Avoid:

  • Bypass_All_Automation__c checkbox on records
  • hardcoding integration usernames into every flow

Prefer:

  • Custom Permission: Data_Load_Bypass
  • Flow entry condition: NOT($Permission.Data_Load_Bypass)

Grant bypass temporarily. Remove it afterwards.

Stage the load (don’t be clever)

If you’re touching multiple dependent fields:

  1. Pass 1: run id + safe fields
  2. Pass 2: fields that trigger automation
  3. Pass 3: relationship fields (lookups) after you’ve validated keys

It’s slower. It’s stable.
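The three passes can be generated from one source file instead of maintained by hand. A sketch; the field sets are assumptions you'd replace with your own org's inventory from the blast-radius step:

```python
# Assumption: you've classified your fields during the blast-radius review.
SAFE_FIELDS = {"Import_Run_Id__c", "Description"}   # nothing listens to these
TRIGGERING_FIELDS = {"Status__c", "StageName"}      # flows/triggers fire on these
LOOKUP_FIELDS = {"AccountId", "OwnerId"}            # relationships go last

def split_into_passes(row):
    """Split one update row into the three staged passes, keeping Id in each."""
    def pick(fields):
        out = {"Id": row["Id"]}
        out.update({f: row[f] for f in fields if f in row})
        return out
    return pick(SAFE_FIELDS), pick(TRIGGERING_FIELDS), pick(LOOKUP_FIELDS)
```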

Validation: counts + spot checks

Count checks:

  • expected rows: X
  • succeeded: X (the same X)
  • failed: 0
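The count check is one function over your loader's results file. A sketch, assuming the results rows carry a `Success` column the way most bulk loaders emit them; `reconcile` is a hypothetical helper:

```python
def reconcile(expected, results):
    """Compare expected row count against a loader's results rows."""
    results = list(results)
    succeeded = sum(1 for r in results
                    if str(r.get("Success", "")).lower() == "true")
    failed = len(results) - succeeded
    # "ok" means every expected row came back, and none failed.
    ok = (len(results) == expected and failed == 0)
    return {"expected": expected, "succeeded": succeeded,
            "failed": failed, "ok": ok}
```

Anything other than `ok: True` means you stop and look before declaring victory.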

Spot checks (pick ~20 records across segments):

  • lookups correct?
  • picklists valid?
  • automation didn’t overwrite your values?

Rollback: be explicit

With Import_Run_Id__c:

  1. filter by run id
  2. update fields back from your pre-export

If you didn’t pre-export old values, you don’t have rollback. You have optimism.
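If you did pre-export, the rollback file is just your snapshot re-emitted as an update payload. A sketch; `build_rollback` is a hypothetical helper, and the run-id filtering happens in the org, not here:

```python
import csv

def build_rollback(snapshot_path, out_path):
    """Re-emit the pre-export snapshot as an update file: Id + old values."""
    with open(snapshot_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("empty snapshot - no rollback possible")
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```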

Checklist

Before:

  • sandbox dry-run completed
  • run id column present
  • pre-export saved (Id + impacted fields)
  • automation approach agreed (on vs bypass)
  • expected row counts agreed

After:

  • reconcile counts
  • spot-check records
  • monitor integrations/logs for 1–2 hours
  • remove bypass permissions (if used)
  • write one paragraph in your change log (what/why)

Want help implementing this?

If you have a backlog and want steady delivery without surprise projects, we can handle admin-sized work under a monthly subscription.