6 Steps for Cleaning Duplicates in Your Database

Here are 6 steps for getting a handle on your org’s duplicate situation.

Since I work in data, you can imagine I have strong feelings about this subject! So here it is.

Duplicates are the bed bugs of nonprofit databases everywhere.


They’re tough to spot when the problem is small. But as things get worse, look closely and you’ll start to see them all over…sucking the life out of everything good in your system.

But let’s be even more explicit. Dupes are damaging to a constituent strategy. They make it difficult to gather the right data, and can cause staff to lose confidence in your system.

And when staff don’t have faith in your database, you either get more bad data going into it, or none at all.

We can’t have that! Here are 6 steps for wrangling those duplicates and getting your org back on track.

6 Steps for Getting A Handle on Those Duplicates


1. Find the source.

First, figure out why duplicates exist. Are they all echoes from a distant past, or are they still entering your system today?

Obviously, the type of system plays a role. But generally speaking, your org has duplicates for one of the following reasons:

  • Users are entering them manually
  • Users/admins are mass importing them accidentally
  • You have an automation that’s creating them (automatically)

Tip: To diagnose this, you need to understand 2 things: how data enters your system AND if/how that system de-dupes automatically.

2. Then, cut off your sources.

Next, start heading off new duplicates immediately.

If it’s an automation that’s to blame, pause it and get to fixing. If it’s your staff, politely remind them that they are driving you crazy (er, that they must check the system before creating records).

Before you start any cleanup, make sure your interventions worked and are actually preventing new duplicates. Enter step 3.

3. Start monitoring for new duplicates.

Use whatever functionality your system offers to track new duplicates. You don’t want to start cleaning up if there’s still a sourcing issue.

What does that look like? Most likely, it will be reports of new records that you monitor on a regular basis. But if your org uses Salesforce, you’ve got a few different options.
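If your system can't produce that report on its own, here is a rough sketch of the idea in Python. It assumes you can export records with an email and a created date; the field names (`id`, `email`, `created`) and the sample data are made up for illustration, and matching on email alone is a simplification, not a full de-dupe rule.

```python
from datetime import date


def new_records_matching_existing(records, since):
    """Return records created on or after `since` whose email matches an older record."""
    # Emails already in the system before the monitoring window started.
    older_emails = {
        r["email"].strip().lower()
        for r in records
        if r["created"] < since and r["email"].strip()
    }
    # New records that reuse one of those emails are likely duplicates.
    return [
        r
        for r in records
        if r["created"] >= since and r["email"].strip().lower() in older_emails
    ]


# Hypothetical export: record 3 is a new copy of record 1.
records = [
    {"id": 1, "email": "ana@example.org", "created": date(2023, 1, 10)},
    {"id": 2, "email": "ben@example.org", "created": date(2023, 2, 3)},
    {"id": 3, "email": "Ana@Example.org ", "created": date(2024, 5, 2)},
    {"id": 4, "email": "cam@example.org", "created": date(2024, 5, 3)},
]

flagged = new_records_matching_existing(records, since=date(2024, 5, 1))
print([r["id"] for r in flagged])  # [3]
```

Note that this only compares new records against older ones. Two brand-new records that duplicate each other would slip through, so treat it as a first alarm bell rather than a complete check.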

Tip: If you use a system that doesn’t track duplicates, but charges for storage, consider that a flag when it comes time to renew.

4. Figure out the total number of duplicates in your system.

First, remember that you’ve done steps 1-3. So technically, your duplicate problem can’t get worse.

Now take a deep breath. Let’s see how bad this really is.

Figure out how many duplicates are in your system. Having that number – even if it’s just a rough estimate – is important for the next step, where we plan for the actual cleanup.

Tip: You can do this using a duplicate report, if your system allows for it. But if not, you can also export and calculate the dupes in Excel or Google Sheets.
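If spreadsheets aren't your thing, the same count can be sketched in a few lines of Python instead. This assumes a CSV export with `first_name`, `last_name`, and `email` columns (hypothetical names; swap in whatever your system exports) and treats records as duplicates only when all three match after lowercasing and trimming spaces. Real-world dupes are often messier, so expect this to undercount.

```python
import csv
import io
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't hide a match."""
    return " ".join((value or "").lower().split())


def find_duplicate_groups(rows, key_fields=("first_name", "last_name", "email")):
    """Group rows by their normalized key fields; keep groups with 2+ records."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(normalize(row.get(field)) for field in key_fields)
        if not any(key):
            continue  # skip rows where every key field is blank
        groups[key].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


def count_extra_records(duplicate_groups):
    """Records you'd need to merge away: everything beyond one keeper per group."""
    return sum(len(members) - 1 for members in duplicate_groups.values())


# Made-up sample standing in for a real export file.
sample_export = """id,first_name,last_name,email
1,Ana,Lopez,ana@example.org
2,ana,lopez ,ANA@example.org
3,Ben,Ortiz,ben@example.org
4,Ana,Lopez,ana@example.org
"""

rows = list(csv.DictReader(io.StringIO(sample_export)))
dupes = find_duplicate_groups(rows)
print(len(dupes), "duplicate group(s),", count_extra_records(dupes), "extra record(s)")
# 1 duplicate group(s), 2 extra record(s)
```

The "extra records" number is the one to carry into your cleanup plan, since each group keeps one surviving record.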

5. Come up with a cleanup plan.

We’re data admins for peep’s sake. We don’t have time to sit and clean all these duplicate records, amirite?!

So I guess we’ll just have to make the time.

Using that rough estimate from step 4, figure out how many duplicates you’d need to clean on a daily/weekly/monthly basis, in order to finish the cleanup in a reasonable timeframe.
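The math here is simple division, but it helps to see it both ways: how many merges per week to hit a deadline, or how many weeks it takes at a pace you can actually sustain. A quick sketch, where the 1,200 estimate and the weekly numbers are placeholders and not figures from any real system:

```python
import math


def merges_per_period(total_duplicates, periods):
    """Merges needed each period to finish within `periods`, rounded up."""
    if periods <= 0:
        raise ValueError("periods must be a positive number")
    return math.ceil(total_duplicates / periods)


def periods_needed(total_duplicates, merges_each_period):
    """Periods required to finish at a fixed pace, rounded up."""
    if merges_each_period <= 0:
        raise ValueError("merges_each_period must be a positive number")
    return math.ceil(total_duplicates / merges_each_period)


estimate = 1200  # placeholder for your rough count from step 4

print(merges_per_period(estimate, 12))  # 100 merges a week to finish in 12 weeks
print(periods_needed(estimate, 50))     # 24 weeks at 50 merges a week
```

Rounding up matters: the leftover handful of records still needs a slot on someone's calendar.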

P.S. Consider delegating to a direct report (or even an intern). It may not be the flashiest task, but that doesn’t make it any less essential. It’s also a great way to give someone more database experience.

6. Communicate the reality to your org.

Even if duplicates have caused staff to lose confidence in your database, it’s never too late to do some damage control and education.

After all, preventing duplicates isn’t just on admins. It’s the job of the entire organization: everyone and anyone responsible for entering data into your system.

People deserve to know where your system stands and how you’re all heading in the right direction.
