For the last year or so, a computer program called HateBot has been scouring Twitter, finding and logging "hate speech". The data it has collected from the UK is a potent reminder of the casual racism that still exists in our society.
Wired.co.uk first covered the Hatebase project in April 2013. The project, a database of hate speech aimed at helping NGOs spot areas of growing conflict, now includes HateBot, a bot that scrapes data from Twitter, and HateBrain, a program that processes the information it collects.
We asked Hatebase to pull out information from its database about hate speech in the UK. What the data suggests is that the most common hate speech in the UK derives from insults used against the Traveller or Gypsy community.
A brief disclaimer: the rest of this post contains offensive words. While we have avoided using them gratuitously, for the purposes of clarity we have not censored any of them with asterisks. "Gypo" and "pikey" were the first and second most common terms collected by Hatebase in the UK, with the two accounting for 15 percent of all hate speech. The numbers don't reflect whether the terms were tweeted specifically at members of the Traveller or Gypsy community, or used generally as an insult.
Third was "nigger", followed by "yardie" and "cunt". The top ten terms, which include their variants, are below in full.
1 - Gypo: 11%
2 - Pikey: 4%
3 - Nigger: 4%
4 - Yardie: 4%
5 - Cunt: 3%
6 - Charver: 2%
7 - Teuchter: 2%
8 - Curry muncher: 2%
9 - Plastic paddy: 2%
10 - Fag: 2%
Some of the terms are more obviously hate speech -- "nigger" being a clear example -- whereas "cunt", though plainly a term of gendered hate speech, is arguably more commonly used in the UK as a general insult.
Others will only be recognisable to people from certain parts of the UK. "Charver" is primarily used in the north-east of England and was the focus of a 2005 BBC Inside Out episode, whereas "teuchter" is a Scottish insult that the Urban Dictionary describes as "basically the Scottish equivalent of an American hick".
No doubt some of these terms will be the subject of controversy about how insulting they actually are, especially when compared with each other, but the aim of HateBrain isn't to make subjective calls about the offensive nature of insults. "We capture any term which broadly categorises a specific group of people based on malignant, qualitative and/or subjective attributes -- particularly if those attributes pertain to ethnicity, nationality, religion, sexuality, disability or class," says Timothy Quinn, Director of Technology at the Sentinel Project, which runs the Hatebase project.
The most common category of hate speech in the UK was that relating to ethnicity, followed by nationality and then class, which is broadly in line with Hatebase's global data.
The explicit purpose of the Hatebase project is to detect the use of hate speech in specific areas as a potential precursor to genocide (hate speech as a dehumanising tool in the run-up to mass slaughter is a well-evidenced phenomenon).
For that reason, HateBrain only picks up geolocated tweets, which account for a tiny minority of the total number of tweets sent on Twitter. This inevitably narrows down the pool of data to people who either aren't aware that their tweets are geolocated or are comfortable with making their location public (that decision could be informed or uninformed). Either way, it's a small subset of Twitter's users, who are in turn a small subset of the total population.
Furthermore, HateBrain is designed to search only for terms that have already been flagged or submitted on Hatebase.org. As a result, it may miss terms that haven't yet been entered into the database.
Finally, it isn't a database of insults -- terms that don't denigrate specific groups of people because of their "ethnicity, nationality, religion, sexuality, disability or class" don't qualify.
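The filtering pipeline described above -- keep only geolocated tweets, then match their text against the list of already-flagged terms -- can be sketched roughly as follows. This is a minimal illustration under assumed inputs, not Hatebase's actual code: the term list, tweet structure and function name are all placeholders.

```python
import re

# Placeholder term list -- in the real system these would be entries
# already flagged on Hatebase.org.
FLAGGED_TERMS = ["exampleterm", "another term"]

# One case-insensitive pattern per term, with word boundaries so a term
# doesn't match inside an unrelated longer word.
PATTERNS = {
    term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for term in FLAGGED_TERMS
}

def match_geolocated(tweet: dict) -> list[dict]:
    """Return one sighting record per flagged term found in a geolocated tweet.

    Tweets without coordinates are skipped entirely, mirroring the fact
    that HateBot only considers the small geolocated subset of Twitter.
    """
    if tweet.get("coordinates") is None:
        return []
    text = tweet.get("text", "")
    return [
        {"term": term, "coordinates": tweet["coordinates"]}
        for term, pattern in PATTERNS.items()
        if pattern.search(text)
    ]

# Assumed tweet structure: a dict with "text" and optional "coordinates".
tweets = [
    {"text": "contains exampleterm here", "coordinates": (51.5, -0.1)},
    {"text": "contains exampleterm but no location", "coordinates": None},
    {"text": "nothing flagged at all", "coordinates": (55.9, -3.2)},
]
sightings = [s for t in tweets for s in match_geolocated(t)]
```

Of the three sample tweets, only the first produces a sighting: the second matches a term but carries no location, and the third is geolocated but contains no flagged term -- which is exactly why the resulting dataset is such a narrow slice of Twitter.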
So the data is limited and on its own doesn't give a comprehensive picture of hate speech in the UK -- its creators don't claim that it does. "Although we're capturing a whole lot of data, the value is for NGOs to do more targeted searches with other data points," says Quinn.
The database is intended to be just one of many data sources NGOs can use to make decisions about where to allocate their resources, he adds: "[At the Sentinel Project] we've got to figure out where we can afford to put a team. We're making those decisions on data from the media, on Hatebase data and other data points."
The UK's data, he says, doesn't show an uptick in hate speech targeting a particular community. Though not downplaying the seriousness of any hate speech, he says it's broadly similar to what you see in Canada, where the project is based, and the US. The tool is aimed at preventing genocide and the UK is thankfully not at risk of that.
But the presence of "gypo" and "pikey", terms that are used to suggest stealing or poverty and that denigrate Gypsy and Traveller communities, at the very top of the hate speech charts for the UK is at least a reminder of what has been called our last "socially acceptable form of racism".
This article was originally published by WIRED UK