Anonymise your data

Anonymise your data
Last Updated on 22/02/2021
Reading time: 3 minutes

If you are working on a workbook containing confidential data, you need to anonymise your data if you are collaborating with other people. The technique is not really complex but you have to respect the following steps.

The initial document

Imagine you are a journalist and you receive the following file in your mailbox (it's all fake data). In a few months these data will be famously known as the Nicaragua Papers 😉

Before I share the workbook, let's see how we can anonymise the names.

Anonymous_Data_1

Step 1: Duplicate the anonymise column

Duplicate the column containing the names to the column H for instance.

Anonymous_Data_2

You can do this with a standard copy and paste or using the shortcut ctrl key with the mouse.

Duplicate with Ctrl

Step 2: Extract without duplicates

Then, you remove duplicates (Data>Remove Duplicates) to keep only one name.

The result then becomes the following:

Removing duplicates doesn't remove spelling mistakes. As you will have noticed, we still have Jacky Chin twice because the first name is spelled two different ways. It would be tempting to correct the error in the source file but in data analysis, the main rule is to leave the raw data as they are.

Step 3: Add the anonymous code

In the column I, we'll add the code that will replace the original names. We'll just write in the I2 cell NAME001 value.

Then, with the fill-handle, we just have to drag down to automatically increase the series of numbers 😉. The result then becomes.

Anonymous_Data_4

Step 4: Replace the original names

Now we need to replace the original name in column A by the anonymous code. To do that we'll use the VLOOKUP function.

Let's insert a column close to the original name and write the following formula

=VLOOKUP(A2,$I$2:$J$11;2;0)

$ Signs mean that references will not move. The result is now the following

Anonymous_Data_5

Step 5: Replace formulas by values

Column B now contains the results of the formula. We must convert these results into values. This is done very easily with a Copy / Paste Special Value

  • Select your cells in column B and make a simple copy (Ctrl + C)
  • Do not change your selection and select Paste Special menu by clicking the small arrow below the Paste icon.
  • In the options, select this icon to make a collage in value

All the formulas of column B have been replaced by their values in the cells.

Step 6: Remove the original name

Now we just have to delete the value in the column A

  • Select column A and right-click> Delete
  • Select column A and apply the keyboard shortcut Ctrl + -

Don't forget to keep the corresponding table name to reverse the anonymisation. 😉

Step 7: Back to the original data

To restore your data is not difficult. You just have to:

  1. Switch the column of the name (the code name must be the first column)
  2. Create one more time the VLOOKUP to return the original name
  3. Convert the formula in value
  4. Remove the anonymous name

5 Comments

  1. Chris
    25/01/2023 @ 06:45

    Excel on Mac 16.67 gives an error when trying to paste this in a cell:
    =VLOOKUP(A2,$I$2:$J$11;2;0)

    There's a problem with this formula

    Not trying to type a formula?
    When the first character is an equals sign...
    To get around this, type an apostrophe(') first...

    It's like Excel is trying to 'do me a favor' and thinks I'm not trying to enter a formula. This formula does not seem to work and excel does not accept it.

    Reply

  2. Roy
    24/06/2021 @ 02:25

    this works if you have a small set of data, but how can I do it , if I have thousands of records? And the data I need to anonymyze is a set of generated account numbers?

    Reply

    • Frédéric LE GUEN
      24/06/2021 @ 12:12

      You must create a rule to crypt your sensitive data. This isn't easy to do

      Reply

    • Yves
      10/12/2021 @ 22:42

      If you have your data in CSV, you can easily do this with Grafton. https://pypi.org/project/grafton/

      Reply

      • Frédéric LE GUEN
        11/12/2021 @ 16:26

        Python isn't easy to use. But that's a good idea

Leave a Reply

Your email address will not be published. Required fields are marked *

Anonymise your data

Reading time: 3 minutes
Last Updated on 22/02/2021

If you are working on a workbook containing confidential data, you need to anonymise your data if you are collaborating with other people. The technique is not really complex but you have to respect the following steps.

The initial document

Imagine you are a journalist and you receive the following file in your mailbox (it's all fake data). In a few months these data will be famously known as the Nicaragua Papers 😉

Before I share the workbook, let's see how we can anonymise the names.

Anonymous_Data_1

Step 1: Duplicate the anonymise column

Duplicate the column containing the names to the column H for instance.

Anonymous_Data_2

You can do this with a standard copy and paste or using the shortcut ctrl key with the mouse.

Duplicate with Ctrl

Step 2: Extract without duplicates

Then, you remove duplicates (Data>Remove Duplicates) to keep only one name.

The result then becomes the following:

Removing duplicates doesn't remove spelling mistakes. As you will have noticed, we still have Jacky Chin twice because the first name is spelled two different ways. It would be tempting to correct the error in the source file but in data analysis, the main rule is to leave the raw data as they are.

Step 3: Add the anonymous code

In the column I, we'll add the code that will replace the original names. We'll just write in the I2 cell NAME001 value.

Then, with the fill-handle, we just have to drag down to automatically increase the series of numbers 😉. The result then becomes.

Anonymous_Data_4

Step 4: Replace the original names

Now we need to replace the original name in column A by the anonymous code. To do that we'll use the VLOOKUP function.

Let's insert a column close to the original name and write the following formula

=VLOOKUP(A2,$I$2:$J$11;2;0)

$ Signs mean that references will not move. The result is now the following

Anonymous_Data_5

Step 5: Replace formulas by values

Column B now contains the results of the formula. We must convert these results into values. This is done very easily with a Copy / Paste Special Value

  • Select your cells in column B and make a simple copy (Ctrl + C)
  • Do not change your selection and select Paste Special menu by clicking the small arrow below the Paste icon.
  • In the options, select this icon to make a collage in value

All the formulas of column B have been replaced by their values in the cells.

Step 6: Remove the original name

Now we just have to delete the value in the column A

  • Select column A and right-click> Delete
  • Select column A and apply the keyboard shortcut Ctrl + -

Don't forget to keep the corresponding table name to reverse the anonymisation. 😉

Step 7: Back to the original data

To restore your data is not difficult. You just have to:

  1. Switch the column of the name (the code name must be the first column)
  2. Create one more time the VLOOKUP to return the original name
  3. Convert the formula in value
  4. Remove the anonymous name

5 Comments

  1. Chris
    25/01/2023 @ 06:45

    Excel on Mac 16.67 gives an error when trying to paste this in a cell:
    =VLOOKUP(A2,$I$2:$J$11;2;0)

    There's a problem with this formula

    Not trying to type a formula?
    When the first character is an equals sign...
    To get around this, type an apostrophe(') first...

    It's like Excel is trying to 'do me a favor' and thinks I'm not trying to enter a formula. This formula does not seem to work and excel does not accept it.

    Reply

  2. Roy
    24/06/2021 @ 02:25

    this works if you have a small set of data, but how can I do it , if I have thousands of records? And the data I need to anonymyze is a set of generated account numbers?

    Reply

    • Frédéric LE GUEN
      24/06/2021 @ 12:12

      You must create a rule to crypt your sensitive data. This isn't easy to do

      Reply

    • Yves
      10/12/2021 @ 22:42

      If you have your data in CSV, you can easily do this with Grafton. https://pypi.org/project/grafton/

      Reply

      • Frédéric LE GUEN
        11/12/2021 @ 16:26

        Python isn't easy to use. But that's a good idea

Leave a Reply

Your email address will not be published. Required fields are marked *