Medical records can be re-identified
Dr Vanessa Teague: Report reveals Australian medical records caught up in open data bungle.
The confidential medical records of one in ten Australians were exposed by the Department of Health in a bungled release last year, a new report has revealed.
The privacy breach was discovered by a team of researchers at the University of Melbourne - Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague - who informed the government of the vulnerability in December last year. They have now warned that the government should not release any anonymised data that refers to individuals as it could easily be re-identified using the same method.
The researchers found that the anonymised records could be linked back to individuals using readily available information, such as date of birth and gender. While this method puts more prominent individuals at risk, it also exposes everyday Australians to privacy breaches, especially by banks or insurance agencies, the report said.
In August 2016 the Department of Health released de-identified historical health data from the Australian Medicare Benefits Scheme and the Pharmaceutical Benefits Scheme on data.gov.au to contribute to “research, community information, policy development and policy evaluation”.
The data included de-identified longitudinal medical billing records of 2.9 million Australians - 10 per cent of the population - including all publicly reimbursed medical and pharmaceutical bills between 1984 and 2014.
Soon after the data was published by the government, the same researchers discovered that although the suppliers’ IDs were encrypted, they could be easily decrypted. The datasets were soon taken offline, but it has now been revealed that the highly sensitive patient data was also exposed due to the ease with which it can be re-identified.
The new research report shows that this confidential data can be re-identified without any decryption, by using known information about a person to find their record. The research team were able to find the patient records of seven prominent Australians using the method, including three former or current MPs and an AFL footballer.
This was done by taking publicly available information about injuries or surgeries and matching it to the de-identified data posted online. The report found that this process was “straightforward for anyone with technical skills about the level of an undergraduate computing degree”.
“We found that patients can be re-identified, without decryption, through a process of linking the encrypted parts of the record with known information about the individuals such as medical procedures and year of birth. This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy,” Dr Culnane said.
A Department of Health spokesperson said it is taking the revelations “very seriously” and has referred it to the Privacy Commissioner.
“The project was halted and remains halted, and the dataset was removed immediately. This matter dates back to 2016 and since then the Australian government has taken further steps to protect and manage data. The Department has not been made aware of anyone being identified,” the spokesperson said.
The health department last year confirmed that the dataset had been downloaded 1500 times before it was taken down, with 500 of these downloads coming from academic or government domains. The rest of the downloads came from private sector firms like health insurance companies and consultancies.
The Australian Information and Privacy Commissioner confirmed that it had opened an investigation into the potential breach in September last year.
“Realising the value of public data to innovations that benefit the community at large is dependent on the public’s confidence that privacy is protected. The OAIC continues to work with Australian government agencies to enhance privacy protections in published datasets,” privacy commissioner Timothy Pilgrim said.
The federal opposition slammed the government over the latest privacy breach, with Shadow Health Minister Catherine King questioning why the government never informed the public that their private health records had potentially been exposed.
“It is absolutely disgraceful that the Turnbull government has been sitting on this information for 12 months - and hasn’t even had the decency to inform Australians about it. The Turnbull government should now have the decency to inform Australians if their health data has been compromised - as they would have to do under the government’s own mandatory data breach legislation starting next year,” Ms King said.
The Greens have also taken aim at the government, with Senator Jordon Steele-John saying that while the release of data is important, it must be done in a way that is secure.
“It is critical that this kind of information is only ever given out in a secure research environment with greater control and visibility for patients over their data. Legislating against misuse of this kind of data will not stop it occurring, especially when it is this easy to re-identify individual records. What are the implications for other publicly released datasets that are supposedly ‘de-identified’ and secure?” Senator Steele-John said.
“This technologically inept government continues to show it has no respect for the privacy of everyday Australians and is incapable of leading this country into the digital future.”
The researchers found that the data could be re-identified using basic information including an individual’s age and gender. While this alone would still leave several possible matches, the risk of identification rises as more information is matched.
“When the set of possible matches is small enough to inspect manually, the person’s privacy is seriously at risk, if that person is in the dataset,” the report said.
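The narrowing process described here can be sketched in a few lines of Python. This is an illustration only, not the researchers' code: the records, the field names (`patient_id`, `year_of_birth`, `gender`, `events`) and the event labels are all invented, and the layout of the real dataset may differ.

```python
# Toy records; every value here is made up for illustration.
records = [
    {"patient_id": "a1", "year_of_birth": 1970, "gender": "F",
     "events": {("childbirth", 2008)}},
    {"patient_id": "b2", "year_of_birth": 1970, "gender": "F",
     "events": {("knee surgery", 2011)}},
    {"patient_id": "c3", "year_of_birth": 1970, "gender": "M",
     "events": {("knee surgery", 2011)}},
    {"patient_id": "d4", "year_of_birth": 1985, "gender": "F",
     "events": {("childbirth", 2008)}},
    {"patient_id": "e5", "year_of_birth": 1970, "gender": "F",
     "events": {("childbirth", 2008), ("childbirth", 2010)}},
]


def candidates(records, year_of_birth, gender, known_events=()):
    """Return the IDs of records consistent with everything known about a person."""
    known = set(known_events)
    return [
        r["patient_id"]
        for r in records
        if r["year_of_birth"] == year_of_birth
        and r["gender"] == gender
        and known <= r["events"]  # every known event must appear in the record
    ]


# Each extra piece of public knowledge shrinks the candidate set.
step1 = candidates(records, 1970, "F")
step2 = candidates(records, 1970, "F", [("childbirth", 2008)])
step3 = candidates(records, 1970, "F", [("childbirth", 2008), ("childbirth", 2010)])
print(len(step1), len(step2), len(step3))
```

In this toy example, age and gender alone leave three candidates, one known event leaves two, and a second known event leaves a single record. A single remaining candidate is still only a match if the person is actually in the sample.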
The report used childbirth as an example of how publicly available information could be used to re-identify the data, especially when the birth is unusual due to the woman’s age or the number of children she has.
The researchers found three prominent Australians in the dataset just from reported information on their childbirths. Other information like sports injuries and surgeries was also used to find individual patient records in the de-identified dataset.
While this puts prominent individuals at risk, everyday Australians have also been exposed to risk from banks and insurance agencies.
“A private health insurer could efficiently track the medical records of past customers through the decades of data, or derive extra information they didn’t know about from current customers. This would be a clear breach of privacy that would possibly never be reported, even though the data could lead to detrimental decisions for the individual in the future,” the report said.
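The insurer scenario the report describes could work roughly as follows. This is a hedged sketch under strong assumptions: the billing rows, service names and the `link_customer` helper are hypothetical, and it assumes an insurer's own claim dates and services line up exactly with the published rows, which may not hold for the real dataset.

```python
from collections import defaultdict

# Published "de-identified" billing rows: (patient_id, service, date).
# All values are invented for illustration.
published = [
    ("p01", "GP visit", "2009-03-02"),
    ("p01", "X-ray", "2009-03-05"),
    ("p01", "specialist consult", "2012-07-19"),
    ("p02", "GP visit", "2009-03-02"),
    ("p02", "blood test", "2010-01-11"),
]

# What a hypothetical insurer already holds about one of its customers.
known_claims = {("GP visit", "2009-03-02"), ("X-ray", "2009-03-05")}


def link_customer(published, known_claims):
    """Find the single published history containing all known claims, if any.

    Returns (patient_id, extra_rows), or (None, set()) when the match is
    missing or ambiguous.
    """
    histories = defaultdict(set)
    for pid, service, date in published:
        histories[pid].add((service, date))

    matches = [pid for pid, rows in histories.items() if known_claims <= rows]
    if len(matches) != 1:
        return None, set()

    pid = matches[0]
    # Everything in the matched history beyond the known claims is new
    # information the insurer did not previously have.
    return pid, histories[pid] - known_claims


pid, extra = link_customer(published, known_claims)
print(pid, extra)
```

Here two claims the insurer already knows about are enough to single out one history, which then reveals a later consultation the insurer had no record of.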
Following the initial reports last year that the health data was vulnerable to re-identification, the government moved to amend the Privacy Act to criminalise the re-identification of published government data. The amendments would introduce penalties of up to two years in jail and large fines for re-identification, but they have yet to pass the Senate after being criticised by the Opposition and the Greens.
Concerns surround the legislation’s impact on researchers in the sector, and whether it would actually prevent the re-identification of sensitive data at all.
The University of Melbourne researchers said the proposed reforms would have no positive impact.
“The proposed amendments to the Privacy Act to criminalise re-identification will not solve these problems. It will make them harder to detect, understand and avoid. It inhibits open public analysis and discussion, and hence makes personal data less secure,” the report said.
Dr Teague said the latest revelations have wider implications for other de-identified data released by the government.
“Open publication of de-identified records like health, tax or Centrelink data is bound to fail as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records. We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data,” Dr Teague said.
The researchers concluded that the government should no longer release anonymised data at an individual level.
“One thing is certain: open publication of de-identified data is not a secure solution for sensitive unit-record level data. We support the program of making more data more easily available to facilitate research, innovation and sound public policy. However, there is an important technical and procedural problem to solve: there is no good solution for publishing sensitive unit-record level data that protects privacy without substantially degrading the usefulness of the data,” the report said.
“Policy should be made with a clear understanding of the technical ease and serious consequences of re-identification.”