
Data Masking: What is the difference between Pseudonymization and Anonymization?

Nilesh Parashar

Both pseudonymization and anonymization are recognized by the GDPR and can help organizations comply with its requirements. These procedures should therefore be standardized and applied repeatably. Anyone who processes personal data should apply one or more of these techniques to mitigate risk, and automating them can help reduce compliance costs.

 

Pseudonymization Techniques

 

The following are the most commonly used pseudonymization techniques:

 

● Encryption with a secret key: the holder of the key can easily re-identify each Data Subject by decrypting the dataset, because the Personal Data is still present in the dataset, albeit in encrypted form. If a state-of-the-art encryption scheme is used, decryption is possible only with knowledge of the key.

 

● Hash function: a function that returns a fixed-size output from an input of any size (the input can be a single attribute or a set of attributes) and cannot be reversed, which eliminates the reversal risk associated with encryption. However, if the range of input values is known, those values can be replayed through the hash function to find the correct value for a given record. Hash functions are usually designed to be fast to compute, which makes them susceptible to brute-force attacks. Pre-computed tables can also be built to enable the bulk reversal of a large set of hash values.

 

● Keyed-hash function with stored key: a hash function that accepts a secret key as an additional input (this differs from a salted hash function, as the salt is commonly not secret). While a Data Controller can replay the function on the attribute using the secret key, it is far more difficult for an attacker to do so without knowledge of the key, because the number of possible combinations to test is prohibitively large.

 

● Deterministic encryption or a keyed-hash function with deletion of the key: this approach is equivalent to assigning a random number as a pseudonym to each attribute in the database and then deleting the correspondence table. It reduces the risk of linking the personal data in one dataset to data about the same individual in another dataset that uses a different pseudonym. If a state-of-the-art algorithm is used, decrypting or replaying the function will be computationally hard for an attacker, as it would require testing every possible key, given that the key is no longer available.

 

● Tokenization: a technique frequently used in (but not limited to) the banking industry to replace card ID numbers with values that are of less use to an attacker. It differs from the preceding techniques and is often based on one-way encryption or on the assignment, via an index function, of a sequence number or a randomly generated number that is not mathematically derived from the original data.
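
The differences between a plain hash, a keyed hash, and a token can be illustrated with a minimal Python sketch. It uses only the standard library; the function names, the `TokenVault` class, the 4-digit PIN, and the card number are all made up for illustration and are not taken from any particular product or from the GDPR itself.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def plain_hash(value: str) -> str:
    """Unkeyed SHA-256 digest: anyone can recompute it for a guessed input."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256: recomputing the digest requires the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def replay_known_inputs(target: str, candidates: Iterable[str]) -> Optional[str]:
    """Replay every candidate input through the plain hash until one matches."""
    for candidate in candidates:
        if plain_hash(candidate) == target:
            return candidate
    return None


class TokenVault:
    """Hypothetical token store: tokens are random, not derived from the data."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._token_by_value:
            token = secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        # Only whoever holds the vault can map a token back to the value.
        return self._value_by_token[token]


# A made-up 4-digit PIN: the input range is small and known, so the
# plain hash can be reversed by trying all 10,000 possibilities.
pin = "4821"
all_pins = (f"{n:04d}" for n in range(10_000))
recovered = replay_known_inputs(plain_hash(pin), all_pins)

# With a secret key (assumed to be held by the Data Controller), the same
# replay is not possible for someone who does not know the key.
secret_key = secrets.token_bytes(32)
pseudonym = keyed_hash(pin, secret_key)

# Tokenization of a made-up test card number.
vault = TokenVault()
token = vault.tokenize("4111111111111111")
```

Deleting `secret_key` after the pseudonyms have been produced would correspond to the "key deletion" variant described above: the pseudonyms stay consistent within the dataset, but nobody can recompute them afterwards.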

 

 

Anonymization

 

Anonymization is a de-identification approach that entails the complete and irreversible removal from a dataset of any information that could be used to identify a person, either directly or in combination with additional data held by the institution or a third party. Anonymization permanently obscures data; the procedure cannot be reversed to re-identify individuals.

 

To anonymize a dataset, enough elements must be removed from it that neither the Data Controller nor a third party can use it to identify a Data Subject using "all reasonably possible means." Because fully anonymized data is not Personal Data, it is not subject to privacy and data protection laws and regulations.

 

Methods of Anonymization

 

In general, there are two distinct approaches to anonymization:

 

1. Randomization is a family of techniques that alter the veracity of data in order to weaken the link between the data and the individual. If the data are sufficiently uncertain, they can no longer be linked to a particular individual. Randomization alone does not reduce the singularity of each record, because each record is still derived from a single data subject, but it can protect against inference attacks and risks. It can be combined with generalization techniques to give stronger privacy guarantees. Additional measures may be necessary to ensure that no individual can be identified from a record. Randomization techniques include the following:

● Addition of noise

● Permutation

● Differential privacy
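
As a rough illustration of the three randomization techniques listed above, the following Python sketch uses only the standard library. The function names, the record layout, and the parameter values are assumptions made for the example; a production system would rely on a vetted privacy library rather than hand-written noise sampling.

```python
import random


def add_noise(values, scale, rng):
    """Noise addition: perturb each numeric value with Gaussian noise."""
    return [v + rng.gauss(0, scale) for v in values]


def permute_column(records, column, rng):
    """Permutation: shuffle one column across records.

    The set of values in the column is preserved, but the link between
    each value and the record it came from is broken.
    """
    shuffled = [r[column] for r in records]
    rng.shuffle(shuffled)
    return [{**r, column: v} for r, v in zip(records, shuffled)]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(records, predicate, epsilon, rng):
    """Differentially private count using the Laplace mechanism.

    A counting query changes by at most 1 when one person is added or
    removed, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


# Made-up records, for illustration only.
rng = random.Random(0)
records = [
    {"name": "A", "salary": 41_000},
    {"name": "B", "salary": 52_000},
    {"name": "C", "salary": 67_000},
    {"name": "D", "salary": 75_000},
]

noisy_salaries = add_noise([r["salary"] for r in records], scale=1_000, rng=rng)
permuted = permute_column(records, "salary", rng)
noisy_count = dp_count(records, lambda r: r["salary"] > 50_000, epsilon=0.5, rng=rng)
```

A smaller `epsilon` adds more noise to the count and gives a stronger privacy guarantee at the cost of accuracy.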

 

2. Generalization is the process of generalizing, or diluting, the attributes of data subjects by changing their scale or order of magnitude (e.g., a region rather than a city, a month rather than a week). While generalization can help prevent singling out, it does not always result in effective anonymization; in particular, it requires specific and sophisticated quantitative approaches to prevent linkability and inference. Generalization techniques include the following:

● Aggregation

● K-anonymity

● L-diversity

● T-closeness
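
The effect of generalization on k-anonymity and l-diversity can be checked with a short Python sketch. The records, the city-to-region mapping, and the function names below are invented for illustration; the example only measures k and l for a given table and is not a full anonymization algorithm.

```python
from collections import defaultdict

# Hypothetical mapping from fictional cities to coarser regions.
CITY_TO_REGION = {
    "Springfield": "North",
    "Shelbyville": "North",
    "Ogdenville": "South",
    "Capital City": "South",
}


def generalize(record):
    """Replace exact age with a 10-year band and city with its region."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "region": CITY_TO_REGION[record["city"]],
        "diagnosis": record["diagnosis"],
    }


def group_by(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group: every record hides among at least k."""
    return min(len(g) for g in group_by(records, quasi_identifiers).values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any single group."""
    groups = group_by(records, quasi_identifiers).values()
    return min(len({r[sensitive] for r in g}) for g in groups)


# Made-up records, for illustration only.
raw = [
    {"age": 34, "city": "Springfield", "diagnosis": "flu"},
    {"age": 37, "city": "Shelbyville", "diagnosis": "asthma"},
    {"age": 52, "city": "Ogdenville", "diagnosis": "flu"},
    {"age": 58, "city": "Capital City", "diagnosis": "flu"},
]
generalized = [generalize(r) for r in raw]

k_raw = k_anonymity(raw, ["age", "city"])
k_generalized = k_anonymity(generalized, ["age_band", "region"])
l_generalized = l_diversity(generalized, ["age_band", "region"], "diagnosis")
```

In this made-up table, generalization raises k from 1 to 2, but l stays at 1 because both records in the "50-59 / South" group share the same diagnosis. Anyone known to be in that group can therefore be inferred to have that diagnosis, which is one reason generalization alone does not always result in effective anonymization.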

 

In most cases, none of the techniques above is sufficient on its own to fully anonymize a dataset; each needs to be combined with others.


Legally, the distinction between anonymized and pseudonymized data lies in whether the data is classified as personal data. Pseudonymized data still permits some re-identification (even if indirect and remote), whereas anonymized data does not. Pseudonymization is therefore distinct from anonymization. Anonymization removes any information that may be used to identify a data subject. Pseudonymization does not remove all identifying information from the data; rather, it reduces the linkability of the dataset to an individual's original identity and
