A total of 2.5 million articles from 498 different English-language online news outlets have been analysed with AI algorithms in a study of online newspapers.
The study, led by a team from the University of Bristol's Intelligent Systems Laboratory and the School of Journalism at Cardiff University, used data mining, machine learning and natural language processing techniques to analyse the news content of mainstream media for a period of ten months. The 2.5 million articles were automatically annotated into 15 topic areas (such as "Sports" and "Politics"), and scored on their readability, their linguistic subjectivity and for gender imbalances.
An article's readability was assessed using the Flesch Reading Ease Test, which gave the text a score between 0 and 100; the higher the score, the "easier" it was to read. The topics of Sports, Arts and Fashion were found to be the most readable, while Politics, Environment and Science proved the hardest to read.
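For readers curious how such a score is computed, the standard Flesch Reading Ease formula is 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word). The Python below is a minimal sketch of that formula, not the study's own code: the function names are illustrative, and the syllable counter is a crude vowel-group heuristic assumed here for brevity. Note that the raw formula can stray outside the 0 to 100 range for very simple or very dense text.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption of this sketch): one syllable per vowel group."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Intergovernmental environmental negotiations necessitate "
    "considerable institutional coordination."
)
print(easy > hard)
```

Short sentences built from short words push the score up, which is consistent with the study finding tabloid-style writing more readable than dense political or scientific coverage.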
The readability scores of the articles also indicated that online tabloid newspapers were more readable than broadsheets, and that they were more likely to employ sentimental language than their highbrow counterparts. In a case study of 15 leading newspapers from the US and the UK, The Sun was found to be the easiest to read (with a readability score similar to that of CBBC's Newsround), while The Guardian was scored as the least readable.
The percentage of adjectives expressing a judgment was also measured for each article, giving a quantity the study viewed as "linguistic subjectivity". Perhaps unsurprisingly, Fashion and Arts were the most subjective topics, given their reliance on expressive adjectives; it would be quite a challenge to write a piece on a new art exhibition or fashion collection without using mildly subjective language. Business, Politics and Elections were the least overtly subjective topics.
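As a rough illustration of that measure, the sketch below computes the share of an article's adjectives that appear in a list of judgment-expressing words. Everything specific here is an assumption made for the example: the tiny `JUDGMENT_ADJECTIVES` lexicon is hypothetical, the input is assumed to be already tagged with parts of speech, and using all adjectives as the denominator is one plausible reading of the description rather than the study's confirmed method.

```python
# Hypothetical lexicon of adjectives that express a judgment (illustrative only).
JUDGMENT_ADJECTIVES = {"stunning", "dull", "brilliant", "awful"}


def linguistic_subjectivity(tagged_tokens):
    """Percentage of adjectives that are judgment-expressing.

    tagged_tokens: list of (word, tag) pairs, where tag "ADJ" marks adjectives.
    Returns 0.0 when the text contains no adjectives.
    """
    adjectives = [word.lower() for word, tag in tagged_tokens if tag == "ADJ"]
    if not adjectives:
        return 0.0
    judged = sum(1 for word in adjectives if word in JUDGMENT_ADJECTIVES)
    return 100.0 * judged / len(adjectives)


sample = [
    ("A", "DET"),
    ("stunning", "ADJ"),
    ("new", "ADJ"),
    ("collection", "NOUN"),
]
print(linguistic_subjectivity(sample))
```

In this toy sentence one of the two adjectives ("stunning") is in the lexicon, so half of the adjectives count as subjective.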
Those articles that were more readable also tended to be more subjective -- a point that the study sought to draw attention to, without going so far as drawing any conclusions of its own: "...for example, the failure of many citizens to engage with political news (Lewis, 2005) or with environmental problems like climate change may, in part, reflect the fact that these news topics tend to be written in less readable language than most others. Indeed, the failure of many citizens (notably those in countries like the US and the UK) to understand what climate change is or the scale of the scientific consensus and alarm about it (Lewis and Boyce, 2009) may not simply be a product of efforts to make it appear controversial."
The research also indicated that, over the ten months of the study, men dominated the media content, with Sport and Finance articles the most male-biased -- men were mentioned in sport news eight times more often than women. Fashion and Arts were the least biased, with Fashion being one of the few topics featuring equal proportions of men and women.
The research paper emphasised that its algorithm-generated findings remained "suggestive" rather than conclusive. It proposed that the technology it was exhibiting might complement the skills of human scholars, allowing future investigations into media trends and bias to be both more ambitious and more comprehensive in scale.
Coauthor of the paper Nello Cristianini, Professor of Artificial Intelligence at the University of Bristol, said: "The automation of many tasks in news content analysis will not replace the human judgement needed for fine-grained, qualitative forms of analysis, but it allows researchers to focus their attention on a scale far beyond the sample sizes of traditional forms of content analysis."
Justin Lewis, Head of the School of Journalism, Media and Cultural Studies at Cardiff, added: "Even some of the more predictable findings give us pause for thought. The extent to which news is male dominated shows how far we are from gender equity across most areas of public life. The fact that articles about politics are the least readable might also explain widespread public disengagement."
This article was originally published by WIRED UK