NLP as a Means to Detect Fake News

Artificial intelligence to combat fake news. Collaboration with UPM, UOV and baobab soluciones, with the support of RIS3, for research and development of a public tool for fake news detection.

What is fake news? Historically, news and information have aimed to communicate a fact, a piece of knowledge or an idea to the public. One characteristic assumed to be intrinsic to them is veracity, i.e., that they conform to the truth or reality of what they set out to disseminate. However, this is not always the case, whether because of ignorance on the part of the sender, errors in the information, the use of satire or a deliberate attempt to spread a wrong message in pursuit of one’s own interests. The result is what is now known as fake news: the spread of false news that together forms a dangerous disinformation network.

Fake news has existed for thousands of years as a feature of human communication and as a form of disinformation and advertising. For example, in Roman times, when Mark Antony met Cleopatra, Octavian ran a propaganda campaign against him. Octavian later became Augustus, the first Roman emperor, thanks in part to such disinformation tactics. The defamations spread against Mark Antony consisted of short phrases and words that had to travel through a channel accessible, to a greater or lesser extent, to everyone. One of these channels was coinage, specifically the words engraved on coins next to the image of Mark Antony. It may seem an ingenious and effective method, if an old-fashioned one, but what if we imagine a text of up to 140 characters instead of a word on a coin, a fake montage instead of the image of a Roman politician and military man, a bluebird and an “RT” button?

Today, the internet and social media have led to the proliferation of fake news, allowing users to be both producers and consumers simultaneously, facilitating the dissemination of false content. As a result, false information can be spread to thousands of users in a matter of seconds at the click of a button, creating a circle from which it is difficult to break out and identify the information as false.

Social media encourage the consumption of content in line with a particular user’s ideas or thoughts, a phenomenon largely accentuated (and sometimes overlooked) by algorithms that prioritise the content deemed most relevant to each subscriber, so the information each user consumes is conditioned by these algorithms. This tendency is known as confirmation bias: favouring the search for and interpretation of information that confirms one’s own beliefs.

In addition, this happens in a context called post-truth, where it is established that objective data are less important than the opinions and emotions they generate in the public, and thus distort reality. A current example is the employment of this type of manoeuvre for publicity purposes in presidential campaigns, bombarding users who are more sensitive to specific messages with manipulated information.

Such information is present in the broadest range of fields, from politics to medicine to economics. The danger posed by fake news is clear, and it has been aggravated by crises as recent as the COVID-19 pandemic, during which the population faced a barrage of contradictory and confusing news. For this reason, more and more organisations are seeking to halt and combat its spread. As might be expected in this era, the greatest ally will be technology, particularly Artificial Intelligence.

Artificial Intelligence to Fight Fake News

Fake news is hard to combat with nothing more than the information already available and, as mentioned above, it creates a vicious circle in which verifying a news item becomes a difficult task. This is where Artificial Intelligence comes in. Artificial Intelligence is a combination of algorithms whose purpose is to replicate and develop the intelligent capacities of humans and the processes behind them (task performance, rational and logical thinking, behaviour, etc.). These algorithms are already commonly applied to automate activities such as decision making, problem-solving and learning.

The many algorithms that make up this body of knowledge have numerous areas of application. One of them, natural language processing (NLP, or PLN by its Spanish initials), provides solutions to the problem at hand. This field investigates how machines can communicate with people using the natural languages we know (for example, English or Spanish).

Some of the components of this field are the following:

  • Morphological or lexical analysis: internal analysis of the words that make up a sentence to extract information such as lexical meaning or syntactic category.
  • Syntactic analysis: analysis of the sentence structure.
  • Semantic analysis: once the morphosyntactic evaluation has finished, the meaning of the composition is analysed.
  • Pragmatic analysis: analysis of the context for a complete interpretation (e.g. there may be a metaphoric context that provides another reading).

Depending on the application, i.e. the problem to be solved, we can use all or some of the analyses described above to address the threat of fake news. Several algorithms and models are applied to this problem so that they can discern, through text analysis, whether a news item is false and even assign it a probability of being fake.
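As a rough illustration of how a model can assign a news item a probability of being fake from its text alone, the sketch below trains a tiny Naive Bayes classifier in Python. The class name, the four training sentences and their labels are invented for illustration; they are not the models or data used in the project, and a real detector would need a large labelled corpus and far richer language analysis.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude lexical analysis)."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFakeNews:
    """Multinomial Naive Bayes with add-one smoothing over the labels 'fake' and 'real'."""

    def __init__(self):
        self.word_counts = {"fake": Counter(), "real": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # Log prior of the label plus the log likelihood of each known token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence
            smoothed = self.word_counts[label][tok] + 1
            score += math.log(smoothed / (total_words + len(self.vocab)))
        return score

    def fake_probability(self, text):
        """Return a value between 0 and 1: the estimated probability that the text is fake."""
        tokens = tokenize(text)
        log_fake = self._log_score(tokens, "fake")
        log_real = self._log_score(tokens, "real")
        # Subtract the maximum before exponentiating to avoid numerical underflow.
        top = max(log_fake, log_real)
        p_fake = math.exp(log_fake - top)
        p_real = math.exp(log_real - top)
        return p_fake / (p_fake + p_real)

# Toy training data, invented purely for illustration.
TRAIN_TEXTS = [
    "miracle cure doctors hate this secret remedy",
    "shocking secret cure they hide from you",
    "clinical trial reports modest improvement in survival",
    "study published in peer reviewed journal finds treatment effective",
]
TRAIN_LABELS = ["fake", "fake", "real", "real"]

model = NaiveBayesFakeNews().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(round(model.fake_probability("secret miracle cure"), 2))
print(round(model.fake_probability("peer reviewed clinical trial"), 2))
```

With this toy data, the first headline scores well above 0.5 and the second well below it. The figures only reflect word frequencies in four sentences, so they say nothing about how a production model would behave.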

On the other hand, and as a curiosity, although Artificial Intelligence is capable of checking whether something is true or not, there are also models capable of composing an informative text or even a story as a person would (as explained in the article “boom NLP”). In this case, it is better not to trust the machines (or to check with the models we do trust whether these texts are true or not!).

Collaboration on the development of a fake news detection tool

As mentioned above, fake news is present in many areas of the world around us. One of them, and a significant one, is medicine and health. It is well known that the rise of COVID-19 in 2020 brought a wave of disinformation that confused everyone, whether because of ignorance or because certain media had an interest in not alerting the population.

The existence of false information in areas as sensitive as health is dangerous, because people’s lives are at stake and at the mercy of incorrect information. That is why it makes sense to opt for tools based on technological advances that are impartial in evaluating and validating the content that is published.

For this reason, the Polytechnic University of Madrid, the University of Oviedo and baobab soluciones will collaborate on a project to develop a tool for detecting fake news related to cancer. Within medical information, cancer is a recurrent subject of searches because of its importance to the public. Consuming erroneous data about it can lead to serious health problems, so the most reliable information possible is required.

This project is also financially supported by the Region of Madrid through the subsidy line RIS3. The project aims to make more effective use of the region’s existing knowledge resources to put them at the service of the productive fabric and increase the number of innovative companies.

In this way, by combining the forces of all the organisations mentioned above, we will be able to develop a tool which, once made available to the general public, allows checking the veracity of news and information related to cancer, as well as assessing the quality of the different sources of medical and health information available on the web.

More specifically, this tool will be able to do the following:

  • Manage unstructured information from web pages, blogs, PDF documents, doc(x), ppt(x), etc., in an integrated way.
  • Use Named Entity Recognition (NER) models to facilitate the correct semantic classification of sentences (in this case, the recognition of medical entities becomes essential).
  • Provide a lightweight document ingestion environment and searches for words, lemmas or semantic elements of sentences that allow complex relationships between entities to be established.
  • Build a knowledge base that makes it possible to identify the relevant scientific publications, establishing a frame of reference.
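To make the ingestion, entity recognition and search capabilities listed above more concrete, here is a minimal Python sketch. It is not the project’s implementation: the `MEDICAL_TERMS` dictionary, the `DocumentIndex` class and the two sample documents are hypothetical, and a simple dictionary lookup stands in for the trained NER models the project describes.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer standing in for a trained NER model.
MEDICAL_TERMS = {
    "chemotherapy": "TREATMENT",
    "radiotherapy": "TREATMENT",
    "melanoma": "DISEASE",
    "leukaemia": "DISEASE",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def tag_entities(text):
    """Return (token, label) pairs for every token found in the gazetteer."""
    return [(tok, MEDICAL_TERMS[tok]) for tok in tokenize(text) if tok in MEDICAL_TERMS]

class DocumentIndex:
    """A small inverted index that also stores the entities found in each document."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of documents containing it
        self.entities = {}                # document id -> list of tagged entities

    def ingest(self, doc_id, text):
        """Add plain text (already extracted from a web page, PDF, etc.) to the index."""
        for tok in tokenize(text):
            self.postings[tok].add(doc_id)
        self.entities[doc_id] = tag_entities(text)

    def search(self, *terms):
        """Return the ids of documents that contain every one of the given terms."""
        sets = [self.postings.get(term.lower(), set()) for term in terms]
        return set.intersection(*sets) if sets else set()

# Sample documents, invented purely for illustration.
index = DocumentIndex()
index.ingest("blog-1", "Melanoma can be cured without chemotherapy, claims blog")
index.ingest("paper-1", "Chemotherapy outcomes in leukaemia patients")

print(sorted(index.search("chemotherapy")))
print(index.search("melanoma", "chemotherapy"))
print(index.entities["blog-1"])
```

Searching for both "melanoma" and "chemotherapy" returns only the blog post, and its stored entities show which disease and treatment it mentions. Pairings like this are the kind of relationship between entities that could then be checked against a knowledge base of scientific publications.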

In short, the application, scheduled for completion in 2023, aims to be a reference tool that helps organisations and individuals find the most reliable information on the web and avoid false information that could harm something as precious as people’s health.

baobab soluciones develops advanced analytics applications for companies, but its mission also includes using these techniques to bring improvements to society, as with this application or others related to the health sector (improving the use of operating theatres, etc.).