Anonymization of personal data with applications in R

Instructor

Matthias Templ, Institute of Data Analysis and Process Design, Zurich University of Applied Sciences

Certificate

Confirmation of attendance

Target audience

Novice and advanced R users from all professional groups.

Fees
  • CHF 400.- for members of UZH/ETH and affiliated institutes
  • CHF 600.- for UZH/ETH alumni, members of other universities, public institutions and non-profit organizations
  • CHF 800.- for companies
Persons without employment may register at the UZH/ETH rate upon request.

Course language

English
Description

New technologies, research on machine learning and deep learning methods, and new ways of accessing, integrating and analyzing sensitive personal data increase the demand for solutions that comply with laws on data privacy and confidentiality. Fields of application include official statistics and the social sciences, financial transactions, social network activity, location trajectories, CRM, insurance data and medical records.
New data protection regulations, especially those imposing high penalties for privacy violations, have put statistical disclosure control into focus.

Pseudonymization (e.g., with salting and hashing) should not be confused with anonymization: it does not prevent the re-identification of persons, which is typically achieved by combining several attributes. Statistical disclosure control comprises measuring the re-identification risk of persons in a data set, anonymizing the data, and measuring the information loss caused by the anonymization. After anonymization, the data no longer contain a link to individual persons, so the rules on privacy no longer apply.
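The difference between pseudonymization and anonymization can be illustrated with a minimal base-R sketch on invented toy data. The names, attributes and the random pseudonym codes are hypothetical; the random code stands in for a salted hash so that no extra package is needed. An attacker who holds an external file with names and the same attributes (age, ZIP code, sex) can still link it to the pseudonymized records.

```r
# Hypothetical toy data: a direct identifier, three quasi-identifiers
# and one sensitive variable
persons <- data.frame(
  name      = c("Anna", "Beat", "Carla", "Daniel", "Eva", "Fritz"),
  age       = c(34, 34, 51, 27, 34, 51),
  zip       = c("8001", "8001", "8005", "8037", "8001", "8005"),
  sex       = c("f", "m", "f", "m", "f", "m"),
  diagnosis = c("A", "B", "A", "C", "B", "A")
)

# Pseudonymization: drop the name and attach a random code
# (a stand-in for a salted hash; the mapping table is discarded)
set.seed(1)
pseudo <- persons
pseudo$name <- NULL
pseudo$id <- sprintf("P%04d", sample(9999, nrow(pseudo)))

# External data held by an attacker (hypothetical)
external <- data.frame(name = "Daniel", age = 27, zip = "8037", sex = "m")

# Linking on the quasi-identifiers re-identifies the person,
# because the combination (27, 8037, m) is unique in the data
linked <- merge(external, pseudo, by = c("age", "zip", "sex"))
print(linked[, c("name", "id", "diagnosis")])  # Daniel's diagnosis "C" is revealed
```

The pseudonym hides the name but leaves the attribute combination intact, which is exactly what disclosure risk measures such as k-anonymity target.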

Participants will gain

  • knowledge of the basic methods for estimating the re-identification risk of persons (k-anonymity, SUDA, individual and global risk), for anonymizing data (recoding, local suppression, PRAM, noise addition, aggregation) and for estimating data utility after anonymization;
  • a basic understanding of data- and use-case-driven anonymization;
  • a deep understanding of the classes and methods of the sdcMicro package.
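The frequency idea behind k-anonymity and the effect of recoding can be sketched in a few lines of base R. This is a hedged illustration on invented data, not the sdcMicro implementation; in sdcMicro, functions such as createSdcObj(), globalRecode() and localSuppression() handle these steps. The helper key_freq() is hypothetical. A record whose combination of key variables occurs fewer than k times in the sample violates k-anonymity.

```r
# Hypothetical toy data with three key variables (quasi-identifiers)
dat <- data.frame(
  age = c(34, 36, 51, 27, 33, 58, 29, 52),
  zip = c("8001", "8001", "8005", "8037", "8001", "8005", "8037", "8005"),
  sex = c("f", "f", "m", "m", "f", "m", "m", "m")
)
keys <- c("age", "zip", "sex")

# Hypothetical helper: sample frequency f_k of each record's key combination
key_freq <- function(d, keys) {
  combo <- do.call(paste, c(d[keys], sep = "|"))
  as.vector(table(combo)[combo])
}

k <- 2

# Risk before anonymization: exact ages make every record unique
violations_before <- sum(key_freq(dat, keys) < k)

# Global recoding: replace exact age by 10-year age classes
dat$age <- cut(dat$age, breaks = seq(20, 60, by = 10), right = FALSE)
violations_after <- sum(key_freq(dat, keys) < k)

cat("records violating k-anonymity (k = 2): before", violations_before,
    "after", violations_after, "\n")
```

With exact ages all 8 records are unique; after recoding, every key combination occurs at least twice. The price is coarser age information, which is why data utility should be measured after anonymization.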

Participants are expected to have sound knowledge of data manipulation in base R as well as basic statistical knowledge (e.g., mean, median, variance, correlation, regression).

Dates

27 April 2020
[canceled]

  After registering, you will first receive a short automatic confirmation by email. Once you have received this email, your registration for the course is successful and binding. For administrative reasons, the written invoice is sent only about two weeks before the course starts.