Anonymization of personal data with applications in R

Lecturers	Matthias Templ, Institute of Data Analysis and Process Design, Zurich University of Applied Sciences
Certificate	Confirmation of participation
Target audience	Novice and advanced R users from all professional groups.
Costs	30% Corona discount for 2020
	CHF ~~400.-~~ 280.- for members of UZH/ETH and associated institutes
	CHF ~~600.-~~ 420.- for alumni of UZH/ETH, members of other universities, the public sector and non-profit organizations
	CHF ~~800.-~~ 560.- for companies
	Persons without current employment can register for the UZH/ETH fee upon request.
Course language	English
Course description	New technologies and research in machine learning and deep learning, together with new ways of accessing, integrating and analyzing sensitive personal data, increase the demand for solutions that comply with laws on data privacy and confidentiality. Fields of application include official statistics and social sciences, financial transactions, social network activities, location trajectories, CRM, insurance data and medical records. New data protection regulations, which notably include high penalties for violating privacy, put the topic of statistical disclosure control in focus.
	Pseudo-anonymization (with salting and hashing) should not be confused with anonymization, because it does not prevent the re-identification of persons, which is typically achieved through a combination of attributes. Statistical disclosure control comprises measuring the re-identification risk of persons in a data set, anonymizing the data and measuring the information loss after anonymization. After anonymization the data include no link to persons, and thus the rules on privacy no longer apply.
	The lessons learnt will include knowledge of the basic methods for estimating the re-identification risk of persons (k-anonymity, SUDA, individual and global risk), for data anonymization (recoding, local suppression, PRAM, noise, aggregation) and for estimating data utility after anonymization; a basic understanding of data- and use-case-driven anonymization; and a deep understanding of the classes and methods of the sdcMicro package.
	The audience is expected to have sound knowledge of data manipulation in base R as well as basic statistical knowledge, e.g., mean, median, variance, correlation, regression.
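As a minimal illustration of the k-anonymity idea mentioned in the course description, the following base R sketch counts how many records share each combination of quasi-identifiers. The toy data, the variable names (`dat`, `key_vars`, `fk`, `violates_k`) and the threshold k = 2 are assumptions made for illustration and are not course material; the sdcMicro package provides its own functions for this task, which are not shown here.

```r
# Hypothetical toy data: six records described by three quasi-identifiers.
dat <- data.frame(
  age_group = c("20-29", "20-29", "30-39", "30-39", "30-39", "40-49"),
  sex       = c("f", "f", "m", "m", "m", "f"),
  region    = c("ZH", "ZH", "BE", "BE", "BE", "ZH"),
  stringsAsFactors = FALSE
)
key_vars <- c("age_group", "sex", "region")

# Build one key string per record from its quasi-identifier values.
key <- do.call(paste, c(dat[key_vars], sep = "|"))

# For each record, count how many records share the same key.
fk <- as.integer(ave(seq_along(key), key, FUN = length))

# A record violates k-anonymity if fewer than k records share its key.
k <- 2
violates_k <- fk < k

print(cbind(dat, fk = fk, violates_k = violates_k))
```

In this toy data the last record is unique on its key variables, so it would be a candidate for recoding or local suppression. Replacing a direct identifier such as a name with a salted hash would leave these counts unchanged, which is why pseudo-anonymization alone does not remove this kind of re-identification risk.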
Further readings, resources and documents:
Publications (selection):
Publication: Journal of Statistical Software: sdcMicro
Publication about sdc app and an online test version
Book on SDC in Springer
Resources:
sdcMicro development on GitHub
sdcMicro stable CRAN version
See also:
International Household Survey Network
World Bank Group
Dates	Friday, November 6 [canceled]; new date presumably in summer 2021
	After registering you will receive a short automatic confirmation by email. Once you have received this email, your registration for the course is successful and binding. For administrative reasons the written invoice will not be sent out until about two weeks before the course.
