«

»

Extract without duplicates with formula

It is very easy to extract without duplicates data from a table through the menu Data>Remove duplicates. It works great for one or more columns but the thing about this tool is that you can not control data that will be delete.

I recently worked on a customer address file by keeping only the most recent data (address, phone , …). To make this work I used a formula, used the tool sort and filter to find and eliminate duplicates. Here are these steps.

Remove duplicates tool

By using the tool Data> Remove duplicates the result is irrelevant because there are differences (because of the phone number) for each line.

Remove_Duplicate_Formula_1
Remove_Duplicate_Formula_2
However, we know we have to clean our list of duplicates. The first step is to create a formula.


COUNTIF function

The trick is to find a unique identifier for each individual (in this case, the email) and count the number of times we find this ID in the list. We will us the COUNTIF function to count the number of times we find the same email in the list BUT we have to used a mixed reference.

So in a new column, we will write the following formula and copy it to all cells of this new column.

=COUNTIF($H$2:H2,H2)

The result is the following and you can see in the image below for Crystal CROFT, we find 3 times its mail in our list with the values 1, 2 and 3.

Remove_Duplicate_Formula_3

The explanation of this formula is quite simple. As we have in our formula, the first reference fixed ($H$2) and the second relative (H2), when you copy this formula with the fill-handle down, only the relative references are changing. Thus, the formula I3 only consider the mail between H2 and H3 (and therefore there, the email of Crystal Croft is present only once) and in the cell I10, this time the range is from H2 to H10. This time, the mail of Crystal Croft is present 2 times and so on.


Look at the formulas in the following image to see how changing the references when it is copied.

Remove_Duplicate_Formula_4

Sort

Yet, our formula works perfectly but do not forget to sort our table. It is compulsory to sort in descending order the column ‘Last update’  in order to have the most recent updates first. To simplify the result, we can sort the mail as first key and the last date as second key.

Go in the menu Data>Sort and apply the following keys of sorting.

Remove_Duplicate_Formula_5
After validation, all the duplicate are grouped.
Remove_Duplicate_Formula_6

Filter the data

The work is almost done. Now we just need to use the filter Excel (Data> Filter) on the column we previously added and uncheck the value 1.

Remove_Duplicate_Formula_7

This way, we only display the items that we want to remove.

Remove_Duplicate_Formula_8

Select all visible rows and remove them with a right-click> Delete Row or the shortcut Ctrl + – (minus)

 

1 comment

  1. Sujoy Chakraborty

    cool

Leave a Reply

%d bloggers like this: