Use of de-identified data questioned

James Riley
Editorial Director

More significant data breaches are “inevitable” until Australian governments engage in an open discussion about the use of de-identified personal data, according to the University of Melbourne’s Dr Chris Culnane.

Dr Culnane, along with colleagues Professor Vanessa Teague and Professor Ben Rubinstein, recently revealed that data released by the Victorian government from 15 million Myki public transport cards could easily be re-identified, potentially allowing an individual’s movements over a three-year period to be tracked.

Using the dataset, the researchers quickly found they could trace their own public transport movements. They were also able to find people they had travelled with in the dataset, and to identify a state politician simply by matching his tweets with the touch-on and touch-off data.

“Ordinary travellers are very easily and confidently identifiable from the published Myki data,” the report said.

“It takes about three points with dates to identify a complete stranger, but only one with an exact time to identify someone who traveled with you. This then allows the retrieval of all their travel records for months or years.”
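To show why a handful of known points can single out one traveller, here is a minimal sketch of the kind of matching the researchers describe. It assumes a simplified, hypothetical table of (pseudonymous card ID, date, stop) records. The card IDs, stops and functions are illustrative only. This is not the researchers’ code or the real Myki schema. Each extra known point filters out every card whose history does not contain it.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, simplified touch-on records: (pseudonymous_card_id, travel_date, stop).
# The real Myki release used a different, richer schema; this is an assumption for illustration.
records = [
    ("A91", date(2017, 3, 1), "Flinders Street"),
    ("A91", date(2017, 3, 2), "Richmond"),
    ("A91", date(2017, 3, 5), "Footscray"),
    ("B07", date(2017, 3, 1), "Flinders Street"),
    ("B07", date(2017, 3, 2), "Richmond"),
    ("C33", date(2017, 3, 1), "Flinders Street"),
]


def candidate_cards(records, known_points):
    """Return the card IDs whose history contains every known (date, stop) point."""
    history = defaultdict(set)
    for card_id, day, stop in records:
        history[card_id].add((day, stop))
    wanted = set(known_points)
    return {card for card, points in history.items() if wanted <= points}


def all_trips(records, card_id):
    """Once a card is singled out, every other trip on it can be read off."""
    return [r for r in records if r[0] == card_id]


one = [(date(2017, 3, 1), "Flinders Street")]
three = one + [(date(2017, 3, 2), "Richmond"), (date(2017, 3, 5), "Footscray")]
print(candidate_cards(records, one))    # three matching cards
print(candidate_cards(records, three))  # {'A91'}
```

With one point, three cards match; with three points, only one does, and `all_trips` then returns that card’s full history. In a dataset of millions of cards the same narrowing applies, because few people share the same combination of places and days.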

The Office of the Victorian Information Commissioner (OVIC) also conducted an inquiry into the data dump by Public Transport Victoria (PTV) and found that the department breached privacy laws, with a compliance notice issued.

“Your public transport history can contain a wealth of information about your private life. It reveals your patterns of movement or behaviour, where you go and who you associate with. This is information that I believe Victorians expect to be well-protected,” Victorian Information Commissioner Sven Bluemmel said.

The state government has denied any wrongdoing, saying the released data is not personal information and that its release therefore doesn’t constitute a data breach.

The latest research demonstrates the dangers of sharing and releasing de-identified data on the assumption that it is safe and secure, Dr Culnane said.

He says this should lead to a wider debate about this practice, especially with the imminent launch of the Consumer Data Right (CDR) and ongoing issues surrounding My Health Record.

“Longitudinal detailed data really can’t be de-identified. There’s far too much information on each individual. The problem is there’s a lot of information being exchanged on the basis that it’s de-identified, but that’s a classification of convenience to transfer the data,” Dr Culnane said.

“There’s a lot of pressure to maintain this fallacy that it works. We really need to learn the lesson that this as a process doesn’t work and we can’t rely on it to protect us.”

Instead of arguing over semantics and denying wrongdoing, governments need to have an open discussion with the general public over the sharing of their data, Dr Culnane said.

“This isn’t the first time the data has been re-identified. It’s a problem that keeps on happening, and part of that problem is there isn’t an open and transparent discussion when it goes wrong,” he said.

“The fact that it happened is concerning and the fact that we’re now in a situation with denials about whether this is personal information clouds the lessons that can be learned from this.”

The public release of de-identified data, such as the PTV data dump, is also just the “most visible part of a much larger problem”, with similar data being shared behind closed doors by governments outside of the jurisdiction of privacy acts.

“A more serious problem is re-identifiable datasets that have been shared without the affected people ever finding out. This makes it impossible for them to protect themselves or to make fully-informed decisions about further sharing,” the researchers said.

The federal government is also using the de-identification of data to justify its release through schemes such as the CDR and My Health Record, Dr Culnane said.

“We need to start having a conversation about de-identification. It still pervades a lot of the regulation that we have in Australia, and even makes an appearance in the CDR legislation,” he said.

“We need to start having this conversation openly about this not working and figure out where we go from here to enable the data economy in a way that protects the individual.”

Under the recently passed CDR legislation, consumers will be given the option to have their data deleted or de-identified once it is no longer needed.

“That never should have been introduced – it should have been left off. You can still do data sharing but you should have to get consent of the person – we need to move into a world in which we are asking those consent questions and evaluating whether consent is being obtained,” Dr Culnane said.

“You can still share the data but ultimately control rests with the individual when and where they share their data,” he said.

Last year PTV made public the touch-on and touch-off information from 15 million Myki cards from July 2015 to June 2018 for a data hackathon event.

The state government said the longitudinal dataset was de-identified because each Myki card number had been replaced with a new randomised number. However, that number was still linked to every other trip made with the same card.
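Why a randomised number does not break the link between trips can be shown with a short sketch. It assumes a toy list of (card number, timestamp, stop) tuples and swaps each card number for a random token that is reused for every trip on that card. The card numbers, stops and function names are hypothetical, and this is not PTV’s actual process.

```python
import secrets
from collections import defaultdict

# Hypothetical trips: (card_number, timestamp, stop). Illustrative data only.
trips = [
    ("card-1", "2016-05-03 08:12", "Southern Cross"),
    ("card-1", "2016-05-03 17:40", "Box Hill"),
    ("card-2", "2016-05-03 08:15", "Southern Cross"),
]


def pseudonymise(trips):
    """Replace each card number with a random token, reusing the same token
    for every trip on that card (consistent pseudonymisation)."""
    token_for = {}
    out = []
    for card_no, timestamp, stop in trips:
        if card_no not in token_for:
            token_for[card_no] = secrets.token_hex(8)
        out.append((token_for[card_no], timestamp, stop))
    return out


def histories(trips):
    """Group trips by whatever key is in the first column (card number or token)."""
    grouped = defaultdict(list)
    for key, timestamp, stop in trips:
        grouped[key].append((timestamp, stop))
    return grouped


original = histories(trips)
released = histories(pseudonymise(trips))
# Same per-card travel histories, just under different labels.
print(sorted(original.values()) == sorted(released.values()))  # True
```

Grouping by token rebuilds exactly the same per-card histories as grouping by the original card number, so anyone who can place a single known trip on a token can read off every other trip linked to it.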

The data was publicly available via a URL and access was not restricted to participants in the hackathon.

The same researchers last year discovered that the confidential medical records of one in 10 Australians were exposed by the Department of Health in a bungled release of de-identified data in 2016.

The federal government’s refusal to admit any wrongdoing over that release made a similar breach, such as the Victorian public transport dataset, inevitable, the researchers said.

“The Australian government’s response to such a serious error in judgement was not to retreat from increased sharing. Instead even greater sharing is being facilitated, without clear privacy protections,” they said.

“The Victorian government’s re-identifiable data release is a predictable consequence of the Australian government’s refusal to deal with the [health] identifiable data release. The mistakes will keep being repeated until the failings are acknowledged and addressed in a transparent manner.”

The latest breach involving the Myki public transport dataset is especially troubling because it could, for example, allow someone to track a former partner, Dr Culnane said.

“There’s a lot of concern particularly around vulnerable groups. If they travelled with someone they’re no longer with, that person could use their card to find that person,” he said.

OVIC’s report found that the department had breached the Privacy and Data Protection Act 2014, with the Information Commissioner issuing PTV with a compliance notice requiring it to strengthen its policies and procedures, improve data governance and introduce training and reporting.

“This matter demonstrates the challenges in identifying privacy risks in large, complex datasets and the need for the Victorian public sector, which possesses many large and sensitive data holdings, to have a high level of data literacy,” Mr Bluemmel said.

“PTV’s decision-making processes were not clear or well-documented and appeared to lack both the support of an effective enterprise risk management framework and suitable rigour in the application of a risk-management process.”

While agreeing to comply with the conditions set out in the compliance notice, PTV said it “does not accept the findings” that the information disclosed was “personal information” or that it constituted a breach of individuals’ privacy.

Similar datasets should not be made public in the future, OVIC said in its report.

“Where a dataset contains unit-level data about individuals, especially where it contains longitudinal unit-level data about behaviour, more recent research indicates such material may not be suitable for open release, even where extensive attempts have been made to de-identify it,” the report said.

There’s also a need for an uplift in the data capability and knowledge within the Victorian public sector to avoid similar privacy breaches in the future, Mr Bluemmel said.