For many years, crowdsourcing has been explored in academia and industry as an interesting vehicle for getting to the truth. Apparently, when many people are asked for their opinion on a factual question, they may find it hard to reach the right answer individually, but as a collective they do reach it. The typical example is guessing the weight of an ox, known from an observation made in 1906 by the statistician Francis Galton. Galton noticed that although individuals find it hard to guess the weight of an ox just by looking at it, the average of all the crowd’s guesses is surprisingly close to the right answer. The major book presenting crowdsourcing in a more popular way was “The Wisdom of Crowds” by James Surowiecki. The book brought to the fore the possibility of using the masses together, as a kind of collaborative mind, to find solutions to problems individuals could not solve.
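Galton’s observation is easy to sketch numerically. Assuming, purely for illustration, that each guess is the true weight plus independent noise (the true weight and noise level below are hypothetical, not Galton’s actual figures), the crowd’s average lands much closer to the truth than a typical individual does:

```python
import random

random.seed(7)
TRUE_WEIGHT = 1200  # hypothetical true weight of the ox, in pounds

# Illustrative model: each guess = true weight + independent noise.
guesses = [TRUE_WEIGHT + random.gauss(0, 150) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)
typical_individual_error = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)

print(f"crowd estimate: {crowd_estimate:.0f} lb (off by {crowd_error:.0f} lb)")
print(f"typical individual is off by {typical_individual_error:.0f} lb")
```

The individual errors largely cancel out in the average, which is the statistical heart of the “wisdom of crowds” effect.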
A more elaborate approach to how humans predict events in a community was presented in “Superforecasting” by Philip Tetlock and Dan Gardner. They present an experiment conducted with the US intelligence community: 3,000 participants, not all of whom were experts, were asked for their predictions on world events. Each participant had to assign a probability to an event (e.g. “Will Trump be reelected in 2020?”), and could change it over time according to new facts or new ideas emerging within the group she belonged to (some participants worked individually and some were allocated to groups). After the fact, Tetlock and Gardner looked into the best predictors, the superforecasters, and tried to understand what made them better than others. Thus, Tetlock and Gardner represent a more advanced take on crowdsourcing, based not on democracy but on a meritocracy of participants.
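Identifying the best predictors requires scoring probability forecasts against what actually happened. A standard scoring rule for this in the forecasting literature is the Brier score: the mean squared error between the assigned probability and the 0/1 outcome. The forecasters and numbers below are hypothetical, a minimal sketch of the idea rather than the study’s actual method:

```python
def brier_score(forecasts):
    """Mean squared error between predicted probability and the 0/1 outcome.
    Lower is better; always hedging at 0.5 scores exactly 0.25."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical forecasters: pairs of (assigned probability, actual outcome)
alice = [(0.9, 1), (0.2, 0), (0.7, 1), (0.1, 0)]  # calibrated and decisive
bob   = [(0.5, 1), (0.5, 0), (0.5, 1), (0.5, 0)]  # hedges everything at 50%

print(brier_score(alice))  # 0.0375
print(brier_score(bob))    # 0.25
```

A forecaster who keeps beating the hedger’s 0.25 over many questions is, in this sense, a better predictor; the meritocratic step is then to study, and weight, those forecasters more heavily.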
A financial application of the same idea is known as “prediction markets”. This methodology allows the crowd to bet on future events; by studying the patterns of these bets as a whole, a good prediction of the future emerges.
Crowdsourcing has demonstrated in many ways that communities can collaborate to produce answers that are very close to the truth. But what are the epistemological foundations that underlie this capacity? Epistemology is the philosophical discipline that looks into the ways in which we humans get to know the world. Its origins are in ancient Greece with the writings of Plato, but the Western world learned a great deal about ways of knowing the world during the Enlightenment and the Scientific Revolution. I would like to share some basic ideas and notions from epistemology to try to reach a deeper understanding of crowdsourcing, and to point to possible paths for advancing our understanding and use of it with modern AI techniques.
Epistemology has traditionally offered a taxonomy of four ways in which we humans can come to know things about the world: Perception, Memory, Reasoning and Testimony. Perception has to do with sensible things, that is, with things we perceive with our senses, e.g. that the sun is shining or that I hear a dog nearby. Reasoning is our ability to deduce one thing from another, e.g. that if the sun is shining this morning, I do not need my umbrella today. Memory is essential for building on our knowledge; otherwise all we have are “atoms” of present knowledge that cannot be carried into the future, e.g. if I had no memory I would not know what can and cannot be done when the sun is shining in the morning. Testimony is the way in which we exchange knowledge between humans, and it far exceeds the mundane meaning of testimony in the courts. Naturally, much of our knowledge we did not perceive ourselves; others attest to it for us, whether they are our parents and teachers in childhood or our professors and mentors in our adult life. Humanity would most probably not progress if each of us had to experience everything ourselves to gain the knowledge others had already acquired.
Traditionally, philosophers treated Perception and Reasoning as the primary sources of knowledge and Testimony as a secondary one. Both Memory and Testimony were thought to be “non-generative” sources of knowledge, i.e. ways to preserve and transfer knowledge that was acquired through the senses and reasoning. The Scientific Revolution was based on the principles of the Enlightenment, according to which the individual should get to know the world through her own mind and senses, discarding the old authority of the Bible and the priests (thus effectively deleting Divine Revelation as a possible source of knowledge). The individual hero, from Galileo to Einstein, was thought of as someone with a genius capacity to perceive and think about the world in unique and innovative ways. Science, as laypeople would portray it, is first and foremost about collecting data about the world in a lab through our senses, and then turning this data into a theory about the world by using our minds to generalise what we perceived. Again, peer review (which is akin to testimony) was always considered necessary, but only as a secondary means of ascertaining that there was no mistake in the way experiments were conducted or in the way proofs and laws were deduced.
But in the last few decades, Testimony has attracted a new wave of philosophical interest. More philosophers and sociologists realise that Testimony, or communities of knowledge, plays a crucial role in how we get to know things. The quantity of statements I believe I know but did not check myself is huge. I am part of many communities (in high tech, in philosophy, in my Jewish life, in left-wing Israeli politics) with which I share knowledge and even commitments to a certain knowledge base. Major contributions to this new study of testimony are Coady’s “Testimony”, Kusch’s “Knowledge by Agreement”, and Lehrer’s “Rational Consensus in Science and Society”.
How do these different concepts of knowledge acquisition play out in the field of Artificial Intelligence and Big Data? Certainly big data and data science follow, as the names suggest, in the footsteps of scientific glory. The “big data era” is nothing more than the statement that present-day sensor and storage technologies (e.g. the sheer volume of data and its accessibility via the cloud and the internet) allow us to study data like never before. An oceanographer working 40 years ago could not really study the ocean, so she had to have a lab that simulated the ocean. Every once in a while she could make a trip to the ocean to calibrate her results, but she could never study the ocean in real time. Today, we have extremely cheap and small sensors we can simply throw into the ocean and let them transmit their readings (e.g. temperature, wind, pressure) via satellite directly to the cloud, where they are available in real time to the whole community studying the oceans. In digital healthcare we can put sensors into an Apple Watch and measure each person’s daily steps, heartbeat, temperature and many more variables. It can all be stored online and studied by medical doctors and statisticians to better our understanding of illnesses. AI and machine learning techniques are applied to the raw data to find correlations and anomalies, so we can explore the immense quantities of available data in faster and better ways. Thus, big data is a modern application of the scientific method, giving priority to raw data and its mathematical patterns, hence to Perception and Reasoning.
What about Testimony? In the world of AI and big data it is almost nonexistent. The fanatics of AI even claim the human mind is no longer necessary. The mind is a confused machine with many sensory limitations (e.g. we humans do not see infrared) and many problems in its reasoning (e.g. the psychological biases we suffer from), not to mention the limitations of human memory: computers have far exceeded our memory for many years already. With unsupervised machine learning, humans are not even needed for the menial task of tagging examples to help the machines learn.
Crowdsourcing seems to be lagging behind both technologically and mathematically. Most applications use simple averaging methods, or even manual methods, to study the wisdom of crowds. Companies like “Wikistrat” employ hundreds of analysts together to successfully predict future strategic events, but do so via a human editor and mediator who has to go through all the comments and devise a document summarising the results. “Superforecasting” is an innovative approach, but the study of why the superforecasters succeeded in their predictions was conducted manually by the authors.
Can we advance the basic idea behind testimony and communal knowledge by using machine learning techniques? I want to suggest we can. That is what we do at Epistema. We follow human discussions in a way that allows us to assign different features to each of the participants. We follow their arguments on different topics, and study the smart patterns of agreement and disagreement among their peers in the relevant community. By applying AI algorithms to the communal arguments, rather than to the raw data, we manage to assign a probability to the different answers to a specific question. This application of AI to human collectives serves to accelerate human thinking in communities. We have found a way to apply the most modern mathematical techniques to Testimony, in addition to the way they are applied to Perception and Reasoning by most data scientists. Surprisingly enough, it seems data could be secondary to testimony: by studying the smart patterns of agreement and disagreement over time among communities of experts, we can say quite a lot about the world without actually looking into the details of the data sets. Even data sets can be examined through the community’s use of and reaction to each specific set. Clearly, a combination of “orthodox” data science on data sets with Epistema’s way of following expert discussions will result in even better predictions. In any case, our unique ability to show progress by studying expert sourcing in a modern way proves that the old philosophical view of Testimony as secondary to Perception was blatantly wrong. Furthermore, it proves it is too early to say farewell to the human mind. Good results can be achieved by humans collaborating and sharing knowledge, side by side with the new statistical results we can achieve from the benefits of “big data”.
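Epistema’s actual algorithms are not described here, but the general move from democracy to meritocracy in aggregation can be sketched in a few lines. In this hypothetical example, each participant’s answer is weighted by an estimated reliability (which could itself be learned from past track records and agreement patterns), and the weights are normalised into a probability for each candidate answer:

```python
from collections import defaultdict

def weighted_answer_probabilities(votes, reliability):
    """Meritocratic aggregation sketch (hypothetical, not Epistema's algorithm).
    votes: participant -> chosen answer; reliability: participant -> weight > 0.
    Returns a probability for each answer, proportional to total reliability."""
    scores = defaultdict(float)
    for participant, answer in votes.items():
        scores[answer] += reliability[participant]
    total = sum(scores.values())
    return {answer: weight / total for answer, weight in scores.items()}

# Hypothetical community answering a yes/no question
votes = {"ann": "yes", "ben": "yes", "cat": "no"}
reliability = {"ann": 0.9, "ben": 0.3, "cat": 0.6}  # e.g. from past track records

probs = weighted_answer_probabilities(votes, reliability)
print(probs)  # "yes" gets 1.2/1.8, "no" gets 0.6/1.8
```

Under a simple head-count the vote is 2 to 1 for “yes”; the weighted version still favours “yes” but reflects that the lone dissenter is more reliable than one of the majority voters, which is the kind of signal a democracy-of-votes throws away.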