From deep learning models that aid disease diagnosis and risk stratification to algorithms that have expanded the frontiers of personalized medicine, AI is creating extensive opportunities for better and more effective patient care. However, since AI depends on vast stores of patients’ medical and behavioral data to catalyze these transformations, one major concern prevails - are these data protected?
Today, most people have already had contact with AI platforms that use machine learning models to collect and act on vital personal data. AI-enabled services such as Google Home, driverless cars, and social media are part of our everyday lives, collecting information about what we like, where we go, our personal beliefs, sexual orientation, and even our behavioral patterns. What we do not know, however, is what these data end up being used for and who can access them.
In 2017, Facebook rolled out a “suicide detection algorithm” to raise awareness and help prevent suicide attempts, which had become a visible problem in social media circles at the time. The algorithm draws on large datasets of Facebook posts, likes, and even typing speeds to infer your mental state and predict your risk of suicide. While this may rightly be considered a positive contribution of AI, it is also true that all of these data - which are personal - are used without users’ consent.
Similarly, in healthcare, genetic testing companies analyze patients’ genetic data and sell them to Big Pharma to drive advances in therapeutics and genomic medicine. Again, this is done - in almost all cases - without the patient’s consent.
Even the companies that do disclose how they will use patients’ data tend to obfuscate essential details, including how long the data will remain in their possession and who has access to them.
Earlier this year, GlaxoSmithKline (GSK) signed a four-year deal with genetic testing company 23andMe giving GSK access to its large datasets of patients’ genetic information. The partnership, although framed as a way “to gather insights and discover novel drug targets and develop therapies for serious unmet healthcare needs,” did not involve obtaining informed consent from the thousands of patients whose genetic data are in 23andMe’s possession.
Also, 23andMe has received regulatory approval to analyze patients’ genetic information for their risk of 10 diseases, including Parkinson’s disease and celiac disease. While the findings from these tests may help at-risk patients take steps to lower their risk, there are concerns that insurance companies could use these data to discriminate - charging higher premiums or biasing selection protocols.
Securing Personal Data: Where We Have Been
In the past, experts relied on anonymization and pseudonymization to dissociate personal identifiers (names, ID numbers) from datasets, but these approaches have proven largely inadequate.
For instance, pseudonymization schemes, which replace personal identifiers with pseudonyms, have been breached before. In the 1990s, MIT graduate student Latanya Sweeney “deanonymized” the medical records of the then governor of Massachusetts, William Weld, by cross-referencing them with Cambridge’s public voter rolls using his date of birth, sex, and ZIP code.
Furthermore, while anonymization (or de-identification) models may be more effective than pseudonymization - they coarsen or modify datasets to remove identifying information - both are grossly deficient at protecting personal data in today’s world, where almost every personal detail (age, home address, occupation, location, and so on) is available on smart devices.
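To see why quasi-identifiers defeat naive de-identification, consider a toy linkage attack in the spirit of Sweeney’s. The records and column names below are made up for illustration, but a simple join is all it takes once date of birth, sex, and ZIP code survive in both datasets:

```python
# Toy illustration of a linkage (re-identification) attack: joining a
# "de-identified" medical dataset to a public voter roll on quasi-identifiers.
# All data and column names here are hypothetical.
import pandas as pd

# "Anonymized" hospital records: names removed, quasi-identifiers kept
medical = pd.DataFrame({
    "date_of_birth": ["1945-07-31", "1962-03-14"],
    "zip_code":      ["02138",      "02139"],
    "sex":           ["M",          "F"],
    "diagnosis":     ["hypertension", "diabetes"],
})

# Public voter roll: names present alongside the same quasi-identifiers
voters = pd.DataFrame({
    "name":          ["W. Weld",    "J. Doe"],
    "date_of_birth": ["1945-07-31", "1962-03-14"],
    "zip_code":      ["02138",      "02139"],
    "sex":           ["M",          "F"],
})

# Joining on date of birth, ZIP code, and sex re-attaches names to diagnoses
reidentified = medical.merge(voters, on=["date_of_birth", "zip_code", "sex"])
print(reidentified[["name", "diagnosis"]])
```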
Until HIPAA, there were no general standards for data security in healthcare. And while the HIPAA security rules safeguard large datasets of patient information in AI-driven platforms and devices, they still do not regulate the use of patient data by non-healthcare providers, including genetic testing companies and tech companies that handle health data - the non-HIPAA personal health record (PHR) vendors.
The European Union’s General Data Protection Regulation (GDPR) has, however, come closer to helping patients secure their data on digital platforms. The regulation takes a more elaborate approach, mandating that all organizations - health providers and otherwise - that use personal health information obtain informed consent to collect and use patient data. Consent, here, means expressly stating how the data will be used, who will have access to them, and for how long they will be retained, with violations attracting heavy fines.
Even so, violations have already occurred. A case in point is the controversial agreement between the Royal Free NHS Foundation Trust (RFT) and Google’s AI subsidiary DeepMind, in which the UK’s Information Commissioner's Office (ICO) found that the RFT had contravened UK data protection law.
The RFT provided the personal data of over 1.6 million patients for a trial of an AI algorithm designed to detect and diagnose acute kidney injury. The ICO found that patients were not adequately informed that their data would be used in the test.
The Way Forward
AI and machine learning have inarguably driven great advances in medicine, including cheaper and more effective disease diagnostics, more efficient drug development, and faster clinical trials. We therefore need to find a balance between leveraging these advances and protecting personal data.
To this end, several privacy-enhancing technologies (PETs) are being developed to allow conscientious use of personal health data and to earn patients’ trust.
Many such systems use a PHR vendor as an intermediary: the vendor verifies an individual’s identity and provides the required data to third-party organizations via an electronic token. Because access is mediated by the token, the vendor never has to disclose the patient’s identifiers or other personal information. The token also lets the patient see who the third party is and how they plan to use the health data - and the patient can decline consent altogether.
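As a rough sketch of how such a token-mediated exchange might work - the class and field names here are hypothetical and not drawn from any particular PET - a PHR vendor could issue a scoped token that records the requester, the declared purpose, and the expiry, and release data only after the patient approves:

```python
# Hypothetical sketch of a PHR vendor issuing a scoped access token so that a
# third party never sees the patient's identifiers. All names are illustrative.
import secrets
from dataclasses import dataclass

@dataclass
class AccessToken:
    token_id: str        # random handle; carries no personal identifiers
    requester: str       # third-party organization requesting the data
    purpose: str         # declared use, shown to the patient
    expires_days: int    # how long the grant lasts
    approved: bool = False  # set only after the patient consents

class PHRVendor:
    def __init__(self):
        self._grants = {}  # token_id -> (patient_id, token)

    def request_access(self, patient_id: str, requester: str,
                       purpose: str, expires_days: int) -> AccessToken:
        token = AccessToken(secrets.token_hex(16), requester, purpose, expires_days)
        self._grants[token.token_id] = (patient_id, token)
        return token  # patient reviews requester and purpose before approving

    def approve(self, token_id: str) -> None:
        self._grants[token_id][1].approved = True

    def fetch_data(self, token_id: str) -> dict:
        patient_id, token = self._grants[token_id]
        if not token.approved:
            raise PermissionError("patient has not consented")
        # Return only the requested health data, keyed by the token, never by identity
        return {"token_id": token_id, "blood_pressure": "120/80"}
```

In this flow, the third party only ever handles the opaque token and the released measurements; revoking consent is as simple as never calling `approve`.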
For instance, using Data Track, a tool developed as part of the European Union’s Privacy and Identity Management for Europe (PRIME) project, patients can look up a history of their online activities: what information they provided, to whom, and what those organizations intended to use it for.
Furthermore, some systems give patients control over what information they disclose. These systems rely on a class of software called attribute-based credentials (ABCs), examples of which are Microsoft’s U-Prove and IBM’s Identity Mixer. ABCs give users autonomy over their data, allowing them to validate information anonymously and limiting how easily their disclosures can be linked across different services.
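As a toy, non-cryptographic illustration of the idea - real ABC systems such as U-Prove and Identity Mixer achieve this with cryptographic proofs rather than simple filtering - selective disclosure simply means the holder decides which attributes leave their device:

```python
# Toy illustration of selective disclosure: the credential holder chooses
# which attributes to reveal to a verifier. Attribute names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    attributes: dict  # e.g. {"name": ..., "date_of_birth": ..., "blood_type": ...}

    def present(self, reveal: list) -> dict:
        """Disclose only the attributes the user opts to reveal."""
        return {k: v for k, v in self.attributes.items() if k in reveal}

cred = Credential({"name": "Jane Doe", "date_of_birth": "1980-01-01",
                   "blood_type": "O+", "over_18": True})

# Share blood type and an age predicate with a research app - nothing else
print(cred.present(reveal=["blood_type", "over_18"]))
# {'blood_type': 'O+', 'over_18': True}
```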
Experts also recommend that AI developers secure personal data by decentralizing the training itself, an approach called federated learning. Here, the raw health data never leave the devices or institutions that hold them; each site trains the model locally, and only the learning outcomes - model updates, not identifiers or personal information - are sent to a central server to be aggregated.
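A minimal federated-averaging sketch - assuming a simple linear model and synthetic data, not any particular framework - makes the data flow concrete: each site trains locally and ships back only its weights, which the server averages into a global model.

```python
# Minimal federated-averaging sketch: each site trains on its own data and
# shares only model weights; raw patient records never leave the sites.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """One site's local training pass (simple linear model, squared loss)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w  # only these weights are sent back, never X or y

def federated_average(site_weights: list) -> np.ndarray:
    """Server aggregates the local models into one global model."""
    return np.mean(site_weights, axis=0)

rng = np.random.default_rng(0)
global_w = np.zeros(3)
# Each hospital holds its own (features, labels); the data stay local
sites = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]

for _ in range(10):
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = federated_average(updates)
print(global_w)  # the only artifact that ever leaves the sites is model weights
```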
For HIPAA, the main US law protecting health data privacy, efforts are also underway to close the loopholes in its coverage. For instance, lawmakers have proposed expanding HIPAA’s covered entities to include non-health providers that collect and use personal health data, such as Google and Facebook.
Going forward, the US may also need to pattern its privacy laws after the GDPR, which demands that every company collecting and using personal data state in clear terms how the data will be used, with heavy financial penalties for defaulters.
Data privacy remains a lingering concern as AI continues to permeate healthcare. Continued discussion and concrete solutions are needed to harness the gains of AI in healthcare while ensuring that patients’ dignity and privacy are respected.