How do you keep Wikipedia up to date? What should you add? And when? Researchers at the University of Amsterdam (UvA) have developed a method that offers suggestions for new Wikipedia pages. These suggestions are based on popularity on social media.
David Graus, PhD researcher at the UvA’s Intelligent Systems Lab Amsterdam (ISLA), has collaborated with colleagues to develop a method for automatically recognising new, upcoming or unfamiliar concepts, before they are included in Wikipedia. The algorithm works by analysing social media (Twitter); it learns to recognise unfamiliar concepts by studying how people talk about concepts that are already familiar.
To do this, the researchers used ‘semantic linking’, which attaches meaning to words and helps when interpreting large amounts of content. Semantic linking connects words to the concepts described in knowledge bases such as Wikipedia or Freebase. It makes smart use of the sheer size of online knowledge bases, which collectively describe millions of concepts.
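As a rough illustration of what semantic linking means in practice, the following minimal sketch looks up phrases from a text in a tiny, hand-made table of concepts. It is not the researchers' method: the table `TOY_KB`, the function `link_text` and the concept identifiers are hypothetical, and a real system would work against the full knowledge base and resolve ambiguous phrases.

```python
# Toy knowledge base: phrase (lowercased) -> concept identifier.
# The entries and identifiers are illustrative assumptions only.
TOY_KB = {
    "amsterdam": "wiki/Amsterdam",
    "university of amsterdam": "wiki/University_of_Amsterdam",
    "wikipedia": "wiki/Wikipedia",
}


def link_text(text, kb=TOY_KB, max_len=4):
    """Return (phrase, concept) pairs, preferring the longest matching phrase."""
    tokens = text.lower().split()
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in kb:
                links.append((phrase, kb[phrase]))
                i += n  # skip past the linked phrase
                break
        else:
            i += 1  # no known concept starts at this token
    return links


links = link_text("Researchers at the University of Amsterdam study Wikipedia")
for phrase, concept in links:
    print(phrase, "->", concept)
```

In this sketch, words that match nothing in the table are simply skipped. Those unmatched words are exactly where a concept that is not yet in the knowledge base would go unnoticed.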
But how do you link concepts that are not yet described in Wikipedia or Freebase? This question matters in various domains. In digital forensics, for example, investigators want to recognise and link ‘unknown’ persons in email in order to build a profile of the key persons within a network. In news, too, identifying as yet unfamiliar concepts plays an important role.
‘The algorithm we developed is self-learning. It uses the “prior knowledge” held in Wikipedia to learn how to recognise new concepts. This is the first step towards automatically supplementing Wikipedia with new content based on what is being discussed on social media,’ explains Graus, who presented his work at the European Conference on Information Retrieval, which took place in Amsterdam, the Netherlands, from 13 to 16 April 2014.
Under the supervision of Professor M. de Rijke, Graus is carrying out his research as part of the Netherlands Organisation for Scientific Research (NWO) project ‘Semantic Search in E-Discovery’, which in turn is part of the Forensic Science programme.
Graus, D., Tsagkias, E., Buitinck, L., de Rijke, M. ‘Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams’, published in Advances in Information Retrieval, the proceedings of the 36th European Conference on Information Retrieval (ECIR 2014), 2014.