A search engine that learns from your behaviour

9 June 2022

How do you build a search engine that learns to adapt to the needs of the user? Researchers from the University of Amsterdam who investigate the personal aspects of users are working closely with big tech, and along the way they have been amazed by the wisdom of some primary schoolchildren.

In recent decades, search engines such as Google, Bing and DuckDuckGo have become indispensable instruments for finding information in the continually expanding digital universe. As a result, it has become increasingly important that the search results tie in with what the user is actually searching for.

Researchers who develop search algorithms dream of a self-learning, personalised search engine: one that learns to adapt to what an individual user wants. Between 2016 and 2021, Professor Maarten de Rijke from the University of Amsterdam led an NWO research project that aimed to contribute to the development of such a search engine: project CLeaR, an acronym for Contextual Learning to Rank.

How does a self-learning, personalised search engine differ from search engines as they were before the project started? ‘What is truly useful for a user depends, amongst other things, on the user’s knowledge and experience and on what they have previously searched for’, De Rijke says. ‘It cannot be captured by merely examining the contents of documents on the web.’

The golden find of Google was a search algorithm that also took into account how often a page was referred to from other pages.

For a long time, examining the contents of web pages was the basis of each search algorithm, and that was continuously improved over the years, of course. In 1998, Google was the first to use a search algorithm that not only examined how often a certain word occurred on a webpage, which all other search engines already did, but also took into account how often the page was referred to from other pages. This second search principle proved to be a golden find, which gave Google a major advantage.
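The two signals described above, how often a word occurs on a page and how often other pages refer to that page, can be illustrated with a toy scorer. This is a minimal sketch and not Google's actual algorithm: the four example pages, their links and the way the two signals are combined (word count multiplied by a damped link count) are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy web: page name -> (page text, pages it links to).
PAGES = {
    "a": ("search engines rank pages", ["b", "c"]),
    "b": ("search search search", []),
    "c": ("search engines learn from clicks", ["a"]),
    "d": ("cooking recipes", ["c"]),
}


def term_count(text, term):
    """First signal: how often the term occurs in the page text."""
    return text.lower().split().count(term.lower())


def inbound_links(pages):
    """Second signal: how many other pages refer to each page."""
    counts = {name: 0 for name in pages}
    for name, (_, targets) in pages.items():
        for target in targets:
            if target in counts and target != name:
                counts[target] += 1
    return counts


def rank(pages, term):
    """Order pages by an assumed combination of both signals."""
    inbound = inbound_links(pages)
    scores = {}
    for name, (text, _) in pages.items():
        # Assumption: word count scaled by a logarithmically damped link count.
        scores[name] = term_count(text, term) * (1.0 + math.log1p(inbound[name]))
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))


ranking = rank(PAGES, "search")
```

In this toy web, pages "a" and "c" each contain the word "search" once, so word counts alone cannot separate them. Page "c" is referred to by two other pages and page "a" by only one, which is what places "c" above "a" in the ranking.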

However, De Rijke and his colleagues are also trying to implicitly include the personal aspect of the user. De Rijke: ‘Where does somebody click? What do they download? How long do they spend reading something? Based on the interaction between the user and the search engine, we are trying to understand how the search results can best tie in with the context, knowledge and need of the user.’

A click on a link placed where users rarely look receives an extra push to correct for that disadvantage. In this example, the click would be weighted twice as heavily.

Learning from users

Harrie Oosterhuis was one of the three PhD candidates who worked on the project. What does he consider to be the most important outcome of his research? ‘We developed a statistical method that learns from the clicking behaviour of users’, Oosterhuis says. ‘If you show a list of results to users, most will only pay attention to the top two or three results. The position of a result can therefore lead to a significant advantage or disadvantage that has nothing to do with its relevance. You need to take this into account when comparing clicking behaviour on different results. For example, if you see that ten percent of the users click on the first result and five percent on the tenth result, then that tenth result is probably far more attractive, because far fewer users will have looked at it in the first place. Our method maps such unfair advantages and disadvantages and corrects for them.’
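The arithmetic behind this example can be sketched with inverse propensity weighting, a common way of correcting click data for position bias. The article does not spell out the researchers' exact estimator, so this is only an illustration: the function names and the examination probabilities (how likely a user is to look at a given position) are assumed values, not figures from the project.

```python
def click_weight(examination_prob):
    """Weight given to one click at a position that users examine
    with the given probability (inverse propensity weighting)."""
    if not 0.0 < examination_prob <= 1.0:
        raise ValueError("examination_prob must be in (0, 1]")
    return 1.0 / examination_prob


def corrected_click_rate(click_rate, examination_prob):
    """Estimated share of users who would click if everyone looked."""
    return click_rate * click_weight(examination_prob)


# Assumed examination probabilities per position (hypothetical values).
EXAMINATION = {1: 1.0, 10: 0.1}

# Observed click rates from the example in the text.
OBSERVED = {1: 0.10, 10: 0.05}

corrected = {
    position: corrected_click_rate(OBSERVED[position], EXAMINATION[position])
    for position in OBSERVED
}
```

Under these assumed probabilities, the first result keeps its click rate of 0.10, while the tenth result's corrected rate becomes about 0.50, so the tenth result comes out as the more attractive one. A position that only half of the users look at would give `click_weight(0.5)`, which is 2.0: each click there counts twice.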

You could say that search results which wrongly end up lower on the list, but which people nevertheless click on relatively often, should get an extra boost upwards. Oosterhuis obtained his PhD cum laude and won a Best Paper Award for his research at the renowned WSDM conference on web search and data mining in 2021.

Open research community

Oosterhuis saw that his research results were quickly picked up by fellow researchers from both academia and big tech. He does not know for certain whether the results are already used in commercial search engines such as Google, but the chances are high. ‘Another significant advantage of our method,’ Oosterhuis says, ‘is that it not only works when searching for webpages but also when searching for products, images, videos, emails or internal documents.’

‘Historical data of a user say something about long-term interests, but those do not have to concur with their short-term interests.’ (Maarten de Rijke)

The other two PhD candidates in the project, Chang Li and Rolf Jagerman, investigated other aspects of self-learning, personalised search engines.

De Rijke: ‘Li examined how well self-learning search engines perform when they must decide immediately, while an online search query is being handled, at which position in the ranking a search result should appear. In such a case, can we give theoretical guarantees for how good or bad the result is?’

Conversely, the third PhD candidate, Rolf Jagerman, studied how a search engine can learn offline from historical data. De Rijke: ‘What people search for can change over time, and that influences the ranking of search results. Amongst other things, Rolf examined how a search engine should respond to such changes. A user’s historical data say something about long-term interests, but those do not have to concur with their short-term interests. How can you weigh these two factors against each other?’
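One simple way to picture the trade-off De Rijke describes is a linear mix of two interest profiles. This is a hypothetical illustration, not the method studied in the project: the function name, the example topics and scores, and the mixing parameter `alpha` are all assumptions.

```python
def blended_interest(long_term, short_term, alpha=0.5):
    """Mix two interest profiles (topic -> score).

    alpha = 1.0 trusts only short-term interests,
    alpha = 0.0 trusts only long-term interests.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    topics = set(long_term) | set(short_term)
    return {
        topic: alpha * short_term.get(topic, 0.0)
        + (1.0 - alpha) * long_term.get(topic, 0.0)
        for topic in topics
    }


# Hypothetical profiles: years of history versus the current session.
long_term = {"football": 0.8, "cooking": 0.2}
short_term = {"cooking": 0.9, "travel": 0.1}

profile = blended_interest(long_term, short_term, alpha=0.75)
top_topic = max(profile, key=profile.get)
```

With `alpha=0.75` the current session dominates and "cooking" becomes the top topic, even though "football" dominates the historical profile. Choosing `alpha` well, or letting the system learn it, is exactly the kind of question the quote raises.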

After their PhDs, Jagerman went to work for Google and Li for Apple. Oosterhuis will shortly start at Twitter for one day a week, alongside his job as an assistant professor at Radboud University, where he started working after his PhD. ‘Our research community is very open’, De Rijke says. ‘University and commercial research are closely intertwined and strongly influence each other. We attend the same conferences, jointly give tutorials and do internships at each other’s locations.’

Counteracting bias

An important challenge for future research is dealing with implicit biases. Imagine you search for the term “professor” and you are given a list of results topped by a large number of men, with only a few women much lower on the list. Traditional search engines do nothing about this. The method developed by Oosterhuis can correct for it. Oosterhuis: ‘In this example, our search algorithm performs well if the clicking behaviour agrees with what we consider to be socially desirable, namely that users click just as often on female as on male professors. However, if people behave in a sexist manner and click on men far more than on women, then the algorithm will learn that same behaviour. In that case, you could configure the search engine to display equal numbers of men and women, but then there is a significant risk of introducing another type of bias. It is often the case that if one aspect is made more fair, then another aspect will become less fair. It is a sort of waterbed effect. Dealing with this wisely is a major challenge.’

Primary school children

Besides the publication of three PhD theses and numerous scientific articles, De Rijke and Oosterhuis are also pleased with their contributions to the general public. For example, they jointly gave a lecture at NEMO Science Museum in Amsterdam for primary schoolchildren aged eight to ten. Oosterhuis: ‘We used YouTube as an example to explain recommendation systems. At the end of the lecture, we asked the children what they had learned. One of the children gave a perfect summary. We stood there flabbergasted, highly impressed by the answer. Sometimes, I think we can better explain our work to primary schoolchildren than to my own parents. They are very at home with systems like YouTube, and they know that different children are recommended different videos.’
