Can machine learning reinstate truth?

“Post-Truth” is supposedly the era in which we live. According to the Oxford dictionary, it means:

relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief. ‘In this era of post-truth politics, it’s easy to cherry-pick data and come to whatever conclusion you desire.’

How did we end up in this sad situation, 400 years after the Enlightenment and the Scientific Revolution? What does the Internet have to do with it, and could software in general, and Machine Learning in particular, remedy the problems it carries? This article suggests that technology could indeed cure its own failings, by encouraging structured, deep, data-driven discussions that follow and learn from the human participants while implementing basic scientific, evidence-based methods.

But first we have to diagnose the roots of Post-Truth. The Internet is without doubt the main technology that has shaped how humans communicate over the last two decades. It created an unprecedented ability for humans to be in touch with each other instantly and continuously. However, the Internet also created a huge virtual room that allows billions of people to express themselves. The result is an overwhelming cacophony that makes it impossible to judge which voices are reliable and which are fake. This immensely important democratization of free expression caused major problems for the old media outlets and a major devaluation of experts. As everybody knows, one can now find any view whatsoever expressed by someone on the Internet. Enter any opinion you’d like into Google and you are bound to find someone to support it, whether it concerns a link between vaccination and autism, UFOs, or any political view, extreme as it may be.

Current communication tools do not make the situation any better. From chat rooms and forums to WhatsApp, Facebook and Slack, there is no tool to evaluate the value of the opinions expressed, other than simple voting. Facebook does not have any such pretension: it is designed to let people spend a valuable pastime with friends. Most communication tools are synchronous and conversational in nature, which is not the best way to conduct a deep, thoughtful discussion. Hence, even someone who would like to have a serious rational discussion with friends or experts will find it very limited if run over the Internet: it will end either in TL;DR or in shallow Twitter-like statements. The only available alternative is scientific online journals, but those are directed at a very limited audience and are heavily and slowly edited and reviewed. (The one outlier is Wikipedia, but as its name suggests, Wikipedia is an encyclopedia, i.e. designed mainly for mature, finalized knowledge rather than for debating new knowledge and complex problems.)

To understand how a good rational discussion should be conducted, we have to look back at the Scientific Revolution. Science is all about deriving reliable knowledge out of data points collected in laboratories. It requires collaboration and critical thinking within a group. In that sense, it should remind us of the opportunities now emerging from “Big Data” and the Internet of Things. Our world is becoming crowded with sensors that automatically and quickly upload information to the cloud, where it is available to the general public or to business enterprises. In one episode of “Billions”, the hedge fund manager Axelrod tells his people: “everybody has the data, we just have better insights.” The biggest Big Data challenge is to reach the right business insights based on the available raw data. This is not very different from the challenge astronomers faced in past centuries: they confronted an enormous body of observations and had to derive the best laws of nature out of it. How did they do it? What made them so successful?

Science, basically, rests on a rather simple method: scientists start with hypotheses or questions they ask about the world (e.g. How do the planets move? What makes water boil at a certain point?), then they collect data about the phenomena. When they have collected enough data they can either support or refute the hypothesis, and then they publish their results and open them to critical peer review. This scientific methodology is the essence of the Enlightenment. It is what allowed reason to triumph over the old reliance on the Church and canonical texts.

 

Galileo showed the Doge of Venice how to use the telescope

Thus, what we are still missing from a technological standpoint, in order to recover Truth, is software that will structure and streamline a data-driven discussion around questions, hypotheses and critical peer review. The Big Data era is a huge opportunity in terms of the amount of accessible data points at our disposal.

But it will not suffice as long as we are destined to shout in the overcluttered virtual room called the Internet. We have to build software that structures each discussion: it must start with a hypothesis (a possible answer to the question), then present data points that support or falsify the hypothesis, and finally allow a structured debate to take place. Machine Learning algorithms can play a major role in this streamlined scientific discussion: instead of applying Machine Learning to raw data alone (to find interesting correlations or anomalies in the data), we can build an algorithm that studies the contributions of the different participants. Over time, by implementing Bayesian techniques and Crowd Mining elements, it will allow us to score each participant for their contributions. Hence, a dynamic level of confidence will be assigned to each hypothesis on each topic, based on smart peer review combined with data analytics. Truth will consist of those hypotheses that are assigned the highest level of confidence, be they scientific, business or economic views. As with any other scientific Truth, it will always remain open to new paradigms and new incoming data that require critical thinking about our old truths and suggest new theories about business and economy.
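As a rough illustration of the Bayesian scoring idea above, consider the following minimal sketch. All names here (Participant, Hypothesis, and so on) are hypothetical, not an existing system: each participant’s reliability is modeled as a Beta distribution updated from their track record, and each hypothesis accumulates reliability-weighted support and opposition.

```python
class Participant:
    """Tracks a contributor's reliability as a Beta(alpha, beta) belief."""

    def __init__(self):
        self.alpha = 1.0  # pseudo-count of past contributions judged correct
        self.beta = 1.0   # pseudo-count of past contributions judged incorrect

    def reliability(self):
        # Posterior mean of the Beta distribution; 0.5 with no track record.
        return self.alpha / (self.alpha + self.beta)

    def record(self, was_correct):
        # Update the belief after a contribution is vetted by peers.
        if was_correct:
            self.alpha += 1
        else:
            self.beta += 1


class Hypothesis:
    """Accumulates evidence for and against, weighted by contributor reliability."""

    def __init__(self, statement):
        self.statement = statement
        self.support = 1.0  # smoothed weighted support (uniform prior)
        self.against = 1.0  # smoothed weighted opposition

    def add_evidence(self, participant, supports):
        weight = participant.reliability()
        if supports:
            self.support += weight
        else:
            self.against += weight

    def confidence(self):
        # Dynamic confidence in the hypothesis, between 0 and 1.
        return self.support / (self.support + self.against)


# Usage: an experienced contributor and a newcomer weigh in on one hypothesis.
expert = Participant()
for _ in range(8):        # track record: 8 vetted contributions, 1 rejected
    expert.record(True)
expert.record(False)

novice = Participant()    # no track record yet, so reliability is 0.5

h = Hypothesis("Dataset X supports conclusion Y")
h.add_evidence(expert, supports=True)
h.add_evidence(novice, supports=False)
print(h.confidence())     # above 0.5: the expert's vote counts for more
```

A real system would of course need richer evidence types and a way to close the feedback loop (the `record` step), but the design choice is the key point: participants earn influence through vetted contributions rather than through volume.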

This kind of software would allow much better discussions, and would constitute a layer that could quiet the Internet cacophony. Conjectures could be measured for their Truth, and progress could be made systematically. Combining the opportunities of Big Data with such software may accelerate Humanity’s progress dramatically. It may amount to a software-driven second scientific revolution.
