When Morgan Marquis-Boire heard about the Internet Census 2012, he was excited.
Marquis-Boire, a Google engineer by day, spends his spare time looking for state-sponsored spyware, and here was something new that he could use. The Internet Census was the result of a massive and unprecedented internet scan that compiled data on about 1.3 billion Internet Protocol addresses.
As quickly as possible, he downloaded the Census's 9 terabytes of data and discovered something that nobody had seen before. FinFisher, a spyware program that had been used to spy on dissidents from Bahrain and Ethiopia, was being used in many more countries than people had previously realized.
Marquis-Boire had done his own Internet mapping in the past, and found FinFisher running in 25 countries. But mapping the internet is a bit like trying to map out a city from the tops of very tall buildings: you may eventually get a pretty good overall picture of things, but it's easy to miss out on some of the details -- a laneway here, a little square there.
Because the Internet Census had so many different vantage points -- 420,000 in total -- it offered a unique look at the computers on many different networks. And it showed that FinFisher servers were running in 11 new countries including Austria, Pakistan and South Africa.
But there was a problem. The Internet Census was illegal. Somebody -- nobody knows exactly who -- had built a network of hacked computers called the Carna botnet to generate the data. According to a remarkable academic paper the hacker published with his census, he had tried to minimize his botnet's harm. He installed it mostly on routers and set-top boxes and says he took steps to make sure that it didn't hog system resources.
In his paper describing how the Carna botnet worked, the anonymous researcher said that one of his guiding principles was: "Be Nice".
This Internet Census data was great because it probed corners of the internet that other network mapping projects hadn't seen. But it was also tainted because Carna was installed on hundreds of thousands of machines without the consent of the people who actually owned these machines.
And today, not everyone is sure that the data it compiled should be used, at least in the academic community of researchers who map out the internet. "It seems like there's a lot of conflict within the community about whether it's right to use this data because it was gathered in a way that was unethical," says Phillipa Gill, a postdoctoral fellow at The Citizen Lab, a University of Toronto-sponsored effort to track state-sponsored uses of malicious software.
This October, internet-mapping academics will convene for their annual Internet Measurement Conference in Barcelona. And right now, it's not clear whether they'll accept papers that are based on the Internet Census data. Papers do have to conform to ethical standards, said Krishna Gummadi, one of the conference's program chairs, but it's up to the conference's "program committee to decide whether specific papers meet the expected ethical standards," he said via e-mail.
Nobody on the conference's program committee would say whether using data compiled by a network of hacked computers would violate these standards. That will likely be discussed when the conference takes place, says KC Claffy, principal investigator for the distributed Cooperative Association for Internet Data Analysis (CAIDA) at the University of California's San Diego Supercomputer Center.
As more and more of our personal data moves on to the Internet, these ethical questions are becoming more important, Claffy says. "Here's the big question," she says. "Is an IP address private information? Can it identify a person? And if it can and you're profiling the behavior of many IP addresses, you may be doing human subject research. You may need informed consent for that."
Internet researchers are still figuring this stuff out, Gill says. "As a community, network measurement hasn't been very good about articulating our norms in terms of ethics."
The medical community has already thought deeply about these questions. In the U.S., this came in the wake of the Stanford Prison Experiment and the Tuskegee syphilis experiment, in which U.S. Public Health Service researchers didn't tell subjects that they were infected with syphilis.
After the details of Tuskegee came to light, the federal government created a commission that laid down a set of basic ethical principles that were to govern medical research. These ideas, which are still used as guidelines in medical experiments today, were outlined in a 1979 document called the Belmont Report.
Subjects needed to know what they were signing up for, and give their informed consent to experiments; they couldn't be coerced into experiments; and they should be selected in a fair way.
Today, human subject experiments must be reviewed by ethics committees called Institutional Review Boards before they qualify for federal funding. "If you're going to do medical research and you're getting money from the NIH [National Institutes of Health], it's absolutely required," says Claffy. "Maybe in 10 years or 20 years, there will be similar requirements for computer science research."
In fact, the U.S. Department of Homeland Security has already bankrolled some study of these ethical questions. In 2012, Claffy and a group of academics created their own version of the Belmont Report. Called the Menlo Report, it's a first step toward spelling out the ethical principles that should govern this type of Internet research. The DHS did not respond to requests for comment on this story.
"Today there is not a hard and fast rule that would make the Carna botnet data unacceptable," says John Heidemann, an academic researcher who has been building maps of the internet since 2006.
But Heidemann has another issue with Carna. Because the data was compiled anonymously on a hacked network of computers, it's hard to figure out if there are flaws in the data. Until this week, for example, nobody had even verified whether the Carna botnet existed (it did).
"I would like to know how accurate it is," says Heidemann.
As for the spyware hunter, Marquis-Boire, he sees no reason to mistrust the data. And he isn't troubled by any ethical taint that may accompany the Carna botnet. The data was available; why wouldn't he use it? "I consider this more like rogue academia rather than criminal activity," he says.