How are you? Bit of a cold? Perhaps something worse? Silicon Valley wants to know – big tech is coming for our health data.
The past few weeks have seen headlines about Google buying up 50 million Americans' health records, Amazon being handed NHS data for free, and NHS England execs pricing up our data to sell it off. And we're not happy about it: a recent poll from YouGov suggests 80 per cent of people are against tech giants hoovering up such data, not least with the NHS under threat of further privatisation. That's been true for two decades, says Phil Booth, coordinator at activist group MedConfidential. "In all the polling and research I've seen for 20 years, what's consistent is the discomfort people have with commercial exploitation of their health data."
But while we're generally happy with academic or publicly funded research making use of our medical data, much of the R&D around AI in medicine is happening at private companies, notably large American tech firms. Careful use of health data could save lives, cut costs of delivering health care and even become a nice little earner for the NHS – indeed, an EY analysis that's frequently touted by the government suggests opening up the vaults could earn the underfunded public health organisation as much as £9.6 billion annually. But the tradeoffs could be our privacy, letting big tech further monetise medicine, and locking hospitals and clinics into expensive tech systems that will cost us more in the long run.
The risks depend on the aims of the company receiving our data. So what are they up to? Google last month hit the headlines for gobbling up the health data of 50m Americans via its Project Nightingale deal with US health-care provider Ascension, a deal that has drawn the attention of the US Department of Health and Human Services. Although Google has received a letter from that organisation, it is not under formal investigation. (Google stressed that with the Ascension project, no patient data leaves Ascension, nor are machine learning systems being trained to be used elsewhere.)
It also recently acquired Fitbit. Here in the UK, Google-owned DeepMind was slapped on the wrist by data regulators for failing to get patient permission for a project with the Royal Free London NHS Trust, but has since started similar projects – albeit with a closer watch on patient permission – alongside its investments via Verily, Alphabet's health division. To an extent, this isn't new. Google has been trying – and failing – to build an e-health platform for years. But the end goal is no longer simply gathering up our data into a Google-run platform in Google's cloud, though there's plenty of money to be had there, but collecting data sets to build AI and machine-learning systems.
Google isn't alone, as even Facebook has slowly been slipping into healthcare. Its AI department worked with radiologists at New York University to teach its machine learning to read MRI scans, and last year asked American hospitals to share patient data to help spot vulnerable people by linking those records with Facebook profiles; that project was halted before data traded hands. In October, the social network unveiled a tool called "Preventative Health" to send reminders to Americans to get their cholesterol tested or get a flu shot. Facebook has said such data isn't for marketing or advertising, but only for promotion of health and research.
And then there's Apple, which, rather than buying your health data from hospitals, is asking patients to fetch it themselves. That happens via the Apple Health system on its iPhone and Watch, which tracks your footsteps, sleep and more and, as of 2018, pulls in your own health records via HealthKit's API, in the US at least. "Apple is an interesting outsider," says Eerke Boiten, interim head of the School of Computer Science and Informatics at De Montfort University. "Rather than Apple building up their own data repository, they actively encourage their users to record health – or more accurately: health-related – data on their own devices." Boiten praises the privacy, accountability and transparency, but says there are concerns with how the information could be shared with government agencies. "Chinese iPhone users should probably not opt in," he warns.
Amazon's use of NHS data is a little different. The company is adding information used on the NHS website to its Echo voice-assistant system, so people can ask "Alexa" for information about a condition or symptom. That data is all openly available now online, so why the fuss? Some of it is sensitivity to big tech gobbling up NHS data for free, says Booth. "People just went a bit nuts," he says.
There are real concerns that the association with the NHS will not only encourage more people to opt into Amazon's particular brand of surveillance capitalism – helped by health secretary Matt Hancock's enthusiasm for the idea – but also train us humans to share our medical data with Amazon. Of course, we're already doing much the same every time we use Google to search for a symptom. But in the future, warns Booth, the data gleaned from asking Alexa about health concerns could be used to develop an AI symptom checker without our consent. "This is pushing people towards training Amazon's AI using EU citizens' data," he says.
Last week, while Amazon dominated stories about NHS data, another announcement slipped out: data gathered from GP surgeries had been sold via the Clinical Practice Research Datalink for £10m last year to American pharmaceutical companies to develop drugs. Booth said it's no surprise drug makers want such data – it helps create safe drugs more cheaply, which is good for us all – but said such efforts need to be consensual, safe and transparent. We should find out this is happening from our GP, not the Observer newspaper.
In short, US companies want your health data, but they don't all want it for the same reasons. But why us, here in the UK? There are other data sets, including the ones Google is already buying up in the US, but the NHS is a target because of its size and quality. "The NHS has the biggest somewhat homogeneous and reasonably quality longitudinal health database in the world," says Boiten. "They are always talking about combining all the data in their systems into a single one." The first attempt to create such a "data lake" was with the care.data system, which was so heavily criticised, particularly on consent, that it was dropped after a government review into the sharing of patient identifiable information known as the Caldicott Report. And now they're trying again, according to a recent leak seen by tech website The Register, which reports that it’s seen slides and files revealing discussions are already underway to sell access to a massive database of UK patient data.
The main defence of health data sharing is that the information is anonymised – but whether that's truly possible has long been widely doubted. Consider Facebook's own proposal to find and get help for vulnerable people. Under those plans, which were halted when the Cambridge Analytica scandal broke, Facebook wanted anonymised health data including demographics from hospitals, which it would compare to its own users – if it found someone who matched, they'd be profiled to see if they needed extra health support or were vulnerable. In other words, the entire point was to de-anonymise the data.
The more data that's handed over, the easier it is to de-anonymise, which is one reason to limit what's doled out. "If appropriately aggregated and anonymised, there should in principle be a low risk of re-identification for data sets which have narrow scope and focus (such as medical images and diagnostic labels)," says Christopher Yau, a fellow at the Turing Institute. Wider data sets used to mine for patterns and relationships are much higher-risk, he explains: "For example, if an organisation has identifiable geolocation data (such as from mobile phones), it may be possible to cross-link this with hospital records to narrow down the identity of the patients." If you're on Android, for example, Google can know that you're at the same location as a hospital on a given day. And if the data is de-anonymised, it can be linked to other profiles, be it Google, Facebook or a data broker, and misused for online marketing, insurance premiums and so on.
There are simple steps the NHS or other health-care providers could take to avoid this linking, such as by limiting the information being handed over, tightly regulating and auditing future uses, and even letting them see the data but not take it away with them. "By restricting access conditions, so don’t share the data but let the researchers come to a 'fume cupboard' site and let them take away only results that have acceptably low privacy implications," Boiten says. "Census data works like that already."
There are examples of health data sharing done well. Yau points to the UK Biobank, while Booth looks to the 100,000 Genomes Project. "They're going about it in a much more disciplined, rigorous way – it's consensual, safe and transparent," Booth says. But the NHS has different challenges, including a fragmented structure that makes a single policy difficult, as well as limited resources. When a big tech company knocks on the NHS's door, it comes with a team of lawyers to write those data-sharing contracts.
And that's why transparency is so important. Let us see the contracts, says Booth – and that's on both sides of any deal. He points to the Amazon arrangement, saying the NHS should publish the data protection impact assessment it must have done before agreeing such a contract. "We want you to be able to have shown us that you have looked at this entire thing in the round," he says. "Show us that you understand a commercial proposition when it is put to you."
And the commercial side is a problem for the NHS in the long run. These machine learning algorithms and other healthcare systems are being trained on our data and trialled in our hospitals. But so far we don't own the end result. "Analysis of so much data, with statistical or machine learning methods, may bring up all kinds of correlations that may lead to useful treatments and diagnosis," says Boiten. "The identified patterns would be owned by whoever discovers them, not normally by whoever provided the data originally."
That could be set to change, thanks to a glimmer of hope in the Register documents, which reveal three potential business models: giving away the data for free in exchange for help curating it or developing systems; charging an upfront fee for access to the data; or shared IP, equity or profit. The last could pay off as a longer-term gamble. If real innovation is owned by Silicon Valley, we'll be paying through the nose to access the technologies we helped build.
Digital Society is a digital magazine exploring how technology is changing society. It's produced as a publishing partnership with Vontobel, but all content is editorially independent.
Updated December 20, 2019 20:17BST: Google's relationship with healthcare provider Ascension has been clarified
This article was originally published by WIRED UK