Wikipedia is one of the first websites people visit when looking for information on a person, a product, or almost any other topic. One of its great strengths is its breadth: it covers nearly every subject and is often the first result on Google Search. That same openness, however, is also the platform's biggest challenge.
Created by Jimmy Wales and Larry Sanger, Wikipedia is a free, multilingual online encyclopedia written and maintained by a community of volunteers through open collaboration and a wiki-based editing system. This means any page can be edited by many different people.
This process has, over time, led to a number of inaccuracies in Wikipedia entries. Now an unexpected company wants to help the free online encyclopedia solve this problem: social media giant Meta (formerly Facebook) wants to use artificial intelligence (AI) to make Wikipedia entries more accurate. Here is how it aims to achieve that.
AI to solve a resource problem
Wikipedia’s problem is, at its heart, a resource problem. As a crowdsourced online encyclopedia, it demands constant research and corroboration. Volunteers are able to double-check footnotes on Wikipedia’s pages, but with the site adding more than 17,000 new articles every month, they cannot keep pace with the inaccuracies.
In a blog post, Meta says automated tools are already capable of identifying gibberish or statements lacking citations. The tech company wants to help human editors determine whether a source “actually backs up a claim.”
Meta wants to apply the depth of understanding and analysis of an AI system to this challenge. The social media giant claims to have developed the first model capable of automatically scanning hundreds of thousands of citations at once. The model, built on top of Meta AI’s research and advancements, can check whether a claim is backed up by its source.
An AI-based Wikipedia citation recommender
Facebook parent Meta calls its AI-based Wikipedia citation recommender Side, and has even open-sourced it. To build the model, Meta says it created a new dataset of 134 million public webpages, which the company describes as “an order of magnitude larger and significantly more intricate than ever used for this sort of research.”
Meta has not fully automated the process of checking citations on Wikipedia pages. Today, human editors look at every citation, often spending their valuable time sifting through thousands of properly cited statements.
Side, Meta says, will bring questionable citations to the attention of human editors. The model can also suggest a “more applicable source” for a citation deemed irrelevant, and can even point to the specific passage that supports a claim.
“This is a powerful example of machine learning tools that can help scale the work of volunteers by efficiently recommending citations and accurate sources,” says Shani Evenstein Sigalov, a researcher at Tel Aviv University and long-time Wikimedian.
Sigalov adds, “I look forward to continued improvements in this area, especially as machine learning tools are able to provide more customised citations and multilingual options to serve our Wikimedia communities across more than 300 languages.”
Learning from Wikipedia
Meta says the biggest challenge for its AI model was the need to check millions of citations against potential evidence documents. An even more challenging aspect was that editing citations requires “near-human language comprehension and acumen.”
The success Meta is claiming now rests on building blocks it developed earlier for next-generation citation tools. Last year, the company released an AI model that integrates information retrieval and verification. “We are training neural networks to learn more nuanced representations of language so they can pinpoint relevant source material in an internet-size pool of data,” Meta says in a blog post.
Despite the training and access to millions of public pages, Meta still has to account for the reasoning and common sense a human editor would use to evaluate a citation. To approximate that judgment, Meta says it relied on natural language understanding (NLU) to estimate the likelihood that a claim can be inferred from a source.
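Meta has not published the internals of this NLU component, but the core idea of scoring how likely a source supports a claim can be illustrated with a deliberately simple sketch. The toy scorer below just measures what fraction of a claim's content words appear in a candidate passage; a real system like Side would use a neural entailment model, and the function name, word list, and example sentences here are all invented for illustration.

```python
import re

def support_score(claim: str, passage: str) -> float:
    """Fraction of the claim's content words found in the passage.

    A crude stand-in for the entailment probability a neural NLU
    model would produce: 1.0 means every claim word is covered,
    0.0 means none are.
    """
    def tokenize(s: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", s.lower())

    # Tiny stop-word list so function words don't inflate the score.
    stop = {"the", "a", "an", "is", "was", "of", "in", "on", "and", "to"}
    claim_words = [w for w in tokenize(claim) if w not in stop]
    if not claim_words:
        return 0.0
    passage_words = set(tokenize(passage))
    hits = sum(1 for w in claim_words if w in passage_words)
    return hits / len(claim_words)

claim = "Blackbeard died in battle in 1718"
good = "The pirate Blackbeard was killed in battle on 22 November 1718."
bad = "Blackbeard operated around the West Indies."
print(support_score(claim, good) > support_score(claim, bad))  # True
```

In practice, lexical overlap fails exactly where the article says human judgment is needed (paraphrase, synonymy, implied facts), which is why Meta turned to learned language representations instead.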
Meta says this idea of teaching machines to understand the relationship between complex text passages will help the AI research community. It sees this effort to be a step towards advancing smarter systems capable of reasoning about real-world knowledge with “more complexity and nuance.”
How the AI solves the citation problem
The main component of Meta’s AI system is Sphere, an open-source, web-scale retrieval library built on the new dataset of 134 million webpages, and Meta has designed a way to use AI to index this vast amount of information.
As part of training its model, Meta says it fed 4 million claims from Wikipedia to its algorithms, teaching them to zero in on a single source from a vast pool of webpages to validate each statement.
The model translates human sentences into complex mathematical representations and is trained to assess content in chunks. When deciding whether to recommend a URL, Meta says the model considers only the most relevant passage.
The AI model Meta has built to improve citations is not ready to be deployed yet, but once it is, Meta says it will “bolster the quality of knowledge on Wikipedia.” Because the model was trained on realistic data at such an unprecedented scale, its impact could extend far beyond fixing citations on Wikipedia pages.