Anonymise your data

If you work on a workbook with confidential data, you need to anonymise your data if you collaborate with other people. The technique is not really complex but you have to respect the following steps.

The initial document

Imagine you are a journalist and you receive the following file in your mailbox (it's all fake data). These data will be soon world famous in a few months as the Nicaragua Papers 😉

Before to share the workbook, let's see how we can anonymise the name.

Anonymous_Data_1

Step 1: Duplicate the anonymise column

Duplicate the column containing the names to the column H for instance. You can do this with a traditional copy and paste or using the shortcut ctrl key with the mouse (video).

Anonymous_Data_2

Step 2: Extract without duplicates

Then, you remove duplicates (Data>Remove Duplicates) to keep only one name.

You can watch the following video to get the full manipulation (duplication of the column and removing duplicates).

The result then becomes the following:

 

 

 

 

Remove duplicates doesn't protect you of mistake. As you can noticed, we have always Jacky Chin 2 times because the first name is spelled two different ways. It would be tempting to correct the error in the source file but in data analysis, the main rule is to keep the raw data like they are.

 

 

Step 3: Add the anonymous code

In the column I, we'll add the code that will replace the original names. We'll just write in the I2 cell NAME001 value.

Then, with the fill-handle, we just have to grad and drop to automatically increases the text and the series of numbers.

The result then becomes.Anonymous_Data_4

Step 4: Replace the original names

Now, we need to replace the original name in column A by the anonymous code. To do that we'll use the VLOOKUP function.

Let's insert a column close to the original name and write the following formula

=VLOOKUP(A2,$I$2:$J$11;2;0)

$ Signs mean that references will not move. More explanation in the absolute and relative reference page.

The result is now the following

Anonymous_Data_5

Step 5: Replace formulas by values

The column B contains now the result of a formula. We must convert this result into value. This is done very easily with a Copy/ Paste special value

  • Anonymous_Data_6Select your cells in column B and make a simple copy (Ctrl + C)

  • Do not change your selection
     and select Paste Special menu by clicking the small arrow below the Paste icon.
  • Anonymous_Data_7In the options, select this icon to make a collage in value

All the formulas of the column B have been replaced by their values in the cells.

Step 6: Remove the original name

Now we just have to delete the value in the column A

  • Select column A and right-click> Delete
  • Select column A and apply the keyboard shortcut Ctrl + -

Don't forget to keep the corresponding table name to go the other way round.

 


Back to the original data

To restore your data it's not difficult. You just have to

  1. Switch the column of the name (the code name must be the first column)
  2. Create one more time the VLOOKUP to return the original name
  3. Convert the formula in value
  4. Remove the anonymous name

 

Related posts


Permanent link to this article: https://www.excel-exercise.com/anonymise-your-data/


Leave a Reply

Your email address will not be published.