The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.

Kohlmayer, Florian; Prasser, Fabian; Kuhn, Klaus A

doi:10.1016/j.jbi.2015.09.007

2015

Title: The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.
Document type: Journal Article
Author(s): Kohlmayer, Florian; Prasser, Fabian; Kuhn, Klaus A
Abstract: With the ARX data anonymization tool, structured biomedical data can be de-identified using syntactic privacy models, such as k-anonymity. Data is transformed with two methods: (a) generalization of attribute values, followed by (b) suppression of data records. The former method results in data that is well suited for analyses by epidemiologists, while the latter method significantly reduces loss of information. Our tool uses an optimal anonymization algorithm that maximizes output utility according to a given measure. To achieve scalability, existing optimal anonymization algorithms exclude parts of the search space by predicting the outcome of data transformations regarding privacy and utility without explicitly applying them to the input dataset. These optimizations cannot be used if data is transformed with generalization and suppression. As optimal data utility and scalability are important for anonymizing biomedical data, we had to develop a novel method. In this article, we first confirm experimentally that combining generalization with suppression significantly increases data utility. Next, we prove that, within this coding model, the outcome of data transformations regarding privacy and utility cannot be predicted. As a consequence, existing algorithms fail to deliver optimal data utility. We confirm this finding experimentally. The limitation of previous work can be overcome at the cost of increased computational complexity. However, scalability is important for anonymizing data with user feedback. Consequently, we identify properties of datasets that may be predicted in our context and propose a novel and efficient algorithm. Finally, we evaluate our solution with multiple datasets and privacy models. This work presents the first thorough investigation of which properties of datasets can be predicted when data is anonymized with generalization and suppression. Our novel approach adapts existing optimization strategies to our context and combines different search methods. The experiments show that our method is able to efficiently solve a broad spectrum of anonymization problems. Our work shows that implementing syntactic privacy models is challenging and that existing algorithms are not well suited for anonymizing data with transformation models which are more complex than generalization alone. As such models have been recommended for use in the biomedical domain, our results are of general relevance for de-identifying structured biomedical data.
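
The two-step transformation named in the abstract (generalization of attribute values, followed by suppression of records) can be illustrated with a minimal sketch for k-anonymity. This is not the ARX implementation or the algorithm proposed in the article: the hierarchies, the sample records, and the names `generalize_age`, `generalize_zip`, and `anonymize` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def generalize_age(age, level):
    """Hypothetical age hierarchy: 0 = exact value, 1 = decade, 2 = fully masked."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Hypothetical ZIP hierarchy: mask the last `level` characters."""
    level = min(level, len(zip_code))
    return zip_code[:len(zip_code) - level] + "*" * level


def anonymize(records, levels, k):
    """Generalize every record, then suppress records in classes smaller than k.

    Returns the transformed records and the number of suppressed records.
    """
    age_level, zip_level = levels
    # Step (a): generalization of attribute values.
    generalized = [
        (generalize_age(age, age_level), generalize_zip(zip_code, zip_level))
        for age, zip_code in records
    ]
    # Step (b): suppression of records whose equivalence class violates k-anonymity.
    class_sizes = Counter(generalized)
    output = [row if class_sizes[row] >= k else ("*", "*") for row in generalized]
    suppressed = sum(1 for row in generalized if class_sizes[row] < k)
    return output, suppressed


# Made-up sample data: (age, ZIP code).
records = [
    (34, "81667"), (36, "81675"), (38, "81669"),
    (52, "80331"), (57, "80333"), (71, "85748"),
]

for levels in [(0, 0), (1, 2), (2, 5)]:
    _, suppressed = anonymize(records, levels, k=2)
    print(levels, "suppressed records:", suppressed)
```

In this toy setting, moderate generalization combined with the suppression of a single outlier keeps more detail than generalizing every value until no record needs to be suppressed, which is the kind of trade-off the abstract refers to when it says suppression reduces loss of information.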
Journal title: J Biomed Inform
Year: 2015
Volume: 58
Pages: 37-48
Language: eng
Full text / DOI: doi:10.1016/j.jbi.2015.09.007
PubMed: http://view.ncbi.nlm.nih.gov/pubmed/26385376
Print ISSN: 1532-0464
TUM institution: Institut für Medizinische Statistik und Epidemiologie