Anonymization of personal data with applications in R

Lecturers

Matthias Templ, Institute of Data Analysis and Process Design, Zurich University of Applied Sciences

Certificate

Confirmation of participation

Target audience

Novice and advanced R users from all professional groups.

Costs
  • CHF 400.- for members of UZH/ETH and associated institutes
  • CHF 600.- for alumni of UZH/ETH, members of other universities, the public sector and non-profit organizations
  • CHF 800.- for companies
Persons without current employment can register for the UZH/ETH fee upon request.
Course language

English

Course description

New technologies and research in machine learning and deep learning, together with new ways of accessing, integrating and analyzing sensitive personal data, increase the demand for solutions that comply with laws on data privacy and confidentiality. Fields of application include official statistics and social sciences, financial transactions, social network activities, location trajectories, CRM, insurance data and medical records.
New data protection regulations, which notably include high penalties for privacy violations, put the topic of statistical disclosure control in focus.

Pseudonymization (with salting and hashing) should not be confused with anonymization, because it does not prevent the re-identification of persons, which is typically achieved by combining attributes. Statistical disclosure control comprises measuring the re-identification risk of persons in a data set, anonymizing the data, and measuring the information loss after anonymization. After anonymization the data contain no link to persons, and thus the rules on privacy no longer apply.
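
To illustrate why replacing direct identifiers is not enough, the following is a minimal base R sketch on invented toy data. The data set, the variable names and the simple running-number pseudonym (standing in for a salted hash) are assumptions made for this example only. It counts how many records share each combination of attributes and shows that a record that is unique on those attributes can still be singled out after pseudonymization.

```r
# Toy data, invented for illustration only
persons <- data.frame(
  name   = c("Anna", "Ben", "Carla", "David", "Emil", "Frank"),
  age    = c(34, 34, 51, 51, 51, 78),
  sex    = c("f", "m", "f", "m", "m", "m"),
  region = c("ZH", "ZH", "BE", "BE", "BE", "ZH"),
  stringsAsFactors = FALSE
)

# Pseudonymization: replace the direct identifier by an arbitrary code
# (a running number here; a salted hash would play the same role)
pseudo <- persons
pseudo$name <- sprintf("id%03d", seq_len(nrow(pseudo)))

# Attributes an intruder might know from other sources (assumed here)
key_vars <- c("age", "sex", "region")

# Number of records sharing each record's combination of key attributes
key <- do.call(paste, c(pseudo[key_vars], sep = "|"))
fk  <- as.vector(table(key)[key])

# Records that are unique on the key attributes
n_unique <- sum(fk == 1)   # 4 of the 6 records

# An intruder who knows a 78-year-old man from region ZH finds exactly one row
hit <- pseudo[pseudo$age == 78 & pseudo$sex == "m" & pseudo$region == "ZH", ]
print(hit)
```

The pseudonym hides the name, but the combination of age, sex and region still points to a single record, which is the situation the re-identification risk measures are meant to detect.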

Participants will gain

  • knowledge of the basic methods for estimating the re-identification risk of persons (k-anonymity, SUDA, individual and global risk), for data anonymization (recoding, local suppression, PRAM, noise addition, aggregation) and for estimating data utility after anonymization;
  • a basic understanding of data- and use-case-driven anonymization;
  • a deep understanding of the classes and methods of the sdcMicro package.
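
As a rough illustration of how some of these steps fit together, here is a small base R sketch on invented data. It is not the sdcMicro workflow taught in the course; the helper `freq_count()`, the data and the age bands are assumptions for this example. It checks k-anonymity, applies global recoding and a naive local suppression, and reports a very simple information loss figure. A suppressed value is treated as compatible with every category, which is a common convention.

```r
# Toy microdata, invented for illustration only
dat <- data.frame(
  age    = c(23, 27, 25, 41, 44, 47, 62, 68, 33),
  sex    = c("f", "f", "f", "m", "m", "m", "f", "f", "m"),
  income = c(4200, 5100, 4800, 7300, 6900, 8100, 3900, 4100, 6000)
)
key_vars <- c("age", "sex")   # assumed quasi-identifiers
k <- 2                        # required minimum group size

# Hypothetical helper: for each record, count the records that agree on all
# key variables; NA (a suppressed value) is treated as matching any value
freq_count <- function(d, keys) {
  vapply(seq_len(nrow(d)), function(i) {
    same <- rep(TRUE, nrow(d))
    for (v in keys) {
      xi <- d[[v]][i]
      same <- same & (is.na(xi) | is.na(d[[v]]) | d[[v]] == xi)
    }
    sum(same)
  }, integer(1))
}

# Risk before anonymization: every record is unique on age and sex
violations_before <- sum(freq_count(dat, key_vars) < k)

# Global recoding: replace exact age by 20-year bands
anon <- dat
anon$age <- cut(dat$age, breaks = c(20, 40, 60, 80), right = FALSE)
violations_recoded <- sum(freq_count(anon, key_vars) < k)

# Naive local suppression: blank out age for records still violating k-anonymity
anon$age[freq_count(anon, key_vars) < k] <- NA
violations_after <- sum(freq_count(anon, key_vars) < k)

# Very simple information loss measure: share of suppressed key values
suppressed_share <- mean(is.na(anon[key_vars]))

print(c(before = violations_before, recoded = violations_recoded,
        after = violations_after))
print(suppressed_share)
```

This sketch always suppresses the same variable, which is a simplification; dedicated implementations typically choose which values to suppress so that as little information as possible is lost.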

Participants are expected to have sound knowledge of data manipulation in base R as well as basic statistical knowledge, e.g., mean, median, variance, correlation, regression.

Dates

April 27, 2020
[canceled]

After registering you will receive a short automatic confirmation by email. Once you have received this email, your registration is complete and binding. For administrative reasons, the written invoice will not be sent out until about two weeks before the course.