Query suggestions are an integral part of modern search engines. To study this feature scientifically, the query suggestion problem in web search can be formalized as recommending a list of related queries for an initial user query.

Search, however, is often performed in the context of some larger underlying task, and traditional approaches for suggesting queries do not consider the task behind the initial query. There is a growing stream of research aimed at making search engines more task-aware (i.e., recognizing what task the user is trying to accomplish) and customizing the search experience appropriately. In this post, I explain our approach to query suggestion generation for supporting task-based search.

We envisage a user interface where task-based query suggestions are presented once the user has issued an initial query. As illustrated in the following figure, these suggestions come in two flavors: query completions and query refinements. The difference is that the former are prefixed by the initial query, while the latter are not.
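To make the distinction concrete, here is a minimal sketch that splits a list of suggestions into completions and refinements using the prefix rule described above. The function name, the case-insensitive comparison, and the example queries are my own illustrative assumptions, not part of the original system.

```python
def split_suggestions(initial_query, suggestions):
    """Split suggestions into (completions, refinements).

    A completion starts with the initial query; anything else is treated
    as a refinement. Lowercasing and whitespace stripping are assumptions
    made for this sketch.
    """
    prefix = initial_query.strip().lower()
    completions, refinements = [], []
    for suggestion in suggestions:
        normalized = suggestion.strip().lower()
        if normalized == prefix:
            continue  # the initial query itself is not a useful suggestion
        if normalized.startswith(prefix):
            completions.append(suggestion)
        else:
            refinements.append(suggestion)
    return completions, refinements


# Hypothetical example data
completions, refinements = split_suggestions(
    "choosing a baby name",
    ["choosing a baby name boy", "baby name generator", "Choosing a baby name tips"],
)
```

In this example, the first and third suggestions end up as completions and "baby name generator" as a refinement.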

The task-aware query suggestions we propose are intended for exploring various aspects (subtasks) of the given task after inspecting the initial search results. Selecting one allows the user to narrow down the scope of the search. We follow the definition of the task understanding problem: given an initial query, the system should return a ranked list of suggestions that represent the set of all tasks a user who submitted the query may be looking for. The goal is therefore to provide complete coverage of the subtasks for an initial query while avoiding redundancy.

Our aim is to generate suggestions in a setting where past usage data and query logs are not available or cannot be utilized. This is typical, for example, of systems with a smaller user base, such as those in the enterprise domain. We address task-based query suggestion in an end-to-end fashion, considering several possible information sources:

  • One possibility is to use query suggestion APIs, which are offered by all major web search engines.
  • Additionally, we can use the initial query to search for relevant documents, and extract keyphrases from search snippets and from full text documents.
  • Also, we could give special treatment to WikiHow, which is an extensive database of how-to guides.

We then propose a probabilistic generative framework consisting of four components that incorporate the possible contributions listed above: source importance, document importance, keyphrase relevance, and query suggestion generation.

High-level overview of our approach to query suggestion generation.
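One way to picture how the four components could fit together is to score each candidate suggestion by summing, over all sources, documents, and keyphrases, the product of the four component estimates. The sketch below assumes exactly that simple product form; the data layout, the function names, and the numbers are hypothetical, and the actual model and its estimators are the ones defined in the thesis.

```python
from collections import defaultdict


def identity_generation(keyphrase, query):
    """Assumed generation step: use the raw keyphrase as the suggestion."""
    return {keyphrase: 1.0}


def score_suggestions(query, sources, generate=identity_generation):
    """Rank suggestions by an assumed sum-of-products over the four components.

    Each source is a dict with an "importance" weight and a list of
    "documents"; each document has an "importance" weight and a dict
    mapping keyphrases to relevance scores (all hypothetical structures).
    """
    scores = defaultdict(float)
    for source in sources:
        for doc in source["documents"]:
            for keyphrase, relevance in doc["keyphrases"].items():
                for suggestion, p_gen in generate(keyphrase, query).items():
                    scores[suggestion] += (
                        source["importance"]   # source importance
                        * doc["importance"]    # document importance
                        * relevance            # keyphrase relevance
                        * p_gen                # suggestion generation
                    )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy input: one suggestion-API source and one snippet source
toy_sources = [
    {"importance": 0.6, "documents": [
        {"importance": 1.0, "keyphrases": {"baby name meanings": 0.5,
                                           "baby name generator": 0.5}},
    ]},
    {"importance": 0.4, "documents": [
        {"importance": 0.5, "keyphrases": {"baby name generator": 1.0}},
        {"importance": 0.5, "keyphrases": {"unique baby names": 1.0}},
    ]},
]
ranked = score_suggestions("choosing a baby name", toy_sources)
```

A suggestion supported by several sources, such as "baby name generator" here, accumulates evidence from each of them and rises in the ranking.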

We define different estimators for these components, and experimentally compare them using the datasets from a dedicated benchmark campaign associated with the task understanding problem. In summary, we find that:

  • query suggestions provided by major web search engines are unequivocally the most useful information source;
  • using the raw keyphrases extracted from documents almost always performs better than expanding them by taking the original query into account during generation;
  • for web query suggestions it is beneficial to consider the rank order of suggestions, while for web search snippets and documents a uniform contribution performs better.
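The last finding contrasts two ways of weighting the items returned by a source. As a rough illustration, the sketch below compares a uniform weighting with a rank-based one. The reciprocal-rank decay is my own assumption for the example; the estimators actually compared are defined in the thesis.

```python
def uniform_importance(n):
    """Give each of the n items the same weight."""
    if n <= 0:
        return []
    return [1.0 / n] * n


def rank_based_importance(n):
    """Weight items by reciprocal rank (an assumed decay), normalized to sum to 1."""
    if n <= 0:
        return []
    raw = [1.0 / rank for rank in range(1, n + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]


# Weights for a list of four items under each scheme
uniform_weights = uniform_importance(4)
rank_weights = rank_based_importance(4)
```

With the rank-based scheme the top item contributes the most and each lower-ranked item contributes less, whereas the uniform scheme treats all items alike.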

If you would like to know more, please read Sections 6.1 to 6.4 of my thesis, Task-Based Support in Search Engines, which cover the technical details of the problem, the terminology, the methodology, and the experimental results and analysis.