Many information needs behind people’s searches revolve around specific entities. I have previously discussed our work on answering the research question: What do entity-oriented queries ask for, and how can they be fulfilled? Some search intents correspond to information that can be retrieved from the entity's entry in a reference knowledge base, for example, the number of inhabitants of a city. Other entity-bearing queries are more transaction-oriented. Whether booking a flight or looking for concert tickets, to mention just two popular examples, users often fulfill their needs by interacting with a third-party service or application. There has been increasing interest in, and effort towards, transforming search engines into action-guided task completion assistants. For a user trying to book a taxi, for instance, the envisaged user interface would recognize the service-oriented search intent and then let the user perform the required transaction, including eliciting the travel details and placing the taxi order.

Here, I summarize our approach to building IntentsKB, a knowledge base (KB) of entity-oriented search intents that represents, in a structured fashion, the main search intents commonly associated with a given entity type.

A knowledge base of entity-oriented search intents

Each search intent is uniquely identified by an intentID. The KB consists of a set of (subject, predicate, object, confidence) quadruples. The subject is always an intentID, to which we assign a particular kind of object according to each of the three possible predicates:

  • searchedForType, used for associating an intent with an entity type;
  • ofCategory, used for associating an intent with an intent category;
  • expressedBy, used for associating an intent with a lexicalization.

An intent profile is then a set of quadruples that describe an intent.
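To make the representation concrete, here is a minimal sketch in Python. The `Quadruple` class, the `intent_profile` helper, and all sample values (intent IDs, type and category labels, confidence scores) are hypothetical illustrations, not the actual IntentsKB format or contents.

```python
from dataclasses import dataclass

# The three predicates named in the text.
PREDICATES = {"searchedForType", "ofCategory", "expressedBy"}


@dataclass(frozen=True)
class Quadruple:
    subject: str       # always an intentID
    predicate: str     # one of PREDICATES
    obj: str           # entity type, intent category, or lexicalization
    confidence: float  # score attached to the (subject, predicate, object) triple

    def __post_init__(self):
        if self.predicate not in PREDICATES:
            raise ValueError(f"unknown predicate: {self.predicate}")


def intent_profile(kb, intent_id):
    """Return the set of quadruples that describe one intent."""
    return {q for q in kb if q.subject == intent_id}


# Hypothetical sample data, for illustration only.
kb = [
    Quadruple("intent-001", "searchedForType", "City", 0.9),
    Quadruple("intent-001", "ofCategory", "Service", 0.8),
    Quadruple("intent-001", "expressedBy", "taxi", 0.7),
    Quadruple("intent-002", "searchedForType", "City", 0.6),
]
profile = intent_profile(kb, "intent-001")
```

Here `profile` holds the three quadruples whose subject is `intent-001`, which is the sense in which a profile is "a set of quadruples that describe an intent".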

We propose a pipeline framework to build the knowledge base, consisting of refiner acquisition, refiner categorization, intent discovery, and knowledge base construction stages.

Overview of our proposed framework to construct IntentsKB.

In the first stage, refiner acquisition, we obtain popular type-level query patterns.

The next stage assigns intent categories to refiners. We approach this task, referred to as refiner categorization, as a single-label classification problem using supervised learning. Each type-refiner pair constitutes an instance, and the four intent categories defined in the thesis are the possible classes.

Once each type-level refiner is mapped to a category, we proceed to discover the intents underlying those refiners by clustering the refiners that express the same user intent. The clustering respects the intent categories assigned in the previous step: two refiners in the same cluster must have the same intent category.
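The category constraint can be sketched as follows. This is not the clustering method used in the thesis: the greedy strategy, the token-overlap similarity, the threshold, and the sample refiners and category labels are all assumptions chosen to keep the example small. The only part taken from the text is that refiners are compared only within the same intent category.

```python
from collections import defaultdict


def token_jaccard(a, b):
    """Stand-in similarity (an assumption): Jaccard overlap of whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster_refiners(refiner_categories, similarity=token_jaccard, threshold=0.5):
    """Greedily cluster the refiners of one entity type, never mixing categories."""
    # Partition by category first, so cross-category pairs are never compared.
    by_category = defaultdict(list)
    for refiner, category in refiner_categories.items():
        by_category[category].append(refiner)

    clusters = []
    for category, refiners in by_category.items():
        category_clusters = []
        for refiner in refiners:
            # Join the first cluster containing a sufficiently similar refiner.
            for cluster in category_clusters:
                if any(similarity(refiner, other) >= threshold for other in cluster):
                    cluster.append(refiner)
                    break
            else:
                category_clusters.append([refiner])
        clusters.extend((category, c) for c in category_clusters)
    return clusters


# Hypothetical refiners for a single entity type, with placeholder category labels.
refiners = {
    "cheap flights": "Service",
    "cheap flights to": "Service",
    "flights": "Website",
    "population": "Property",
}
clusters = cluster_refiners(refiners)
```

In this toy input, "flights" and "cheap flights" overlap enough to meet the threshold, but they carry different categories and therefore end up in different clusters.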

In the last step, we construct the full knowledge base representation of intents, i.e., create intent profiles. An intent profile consists of the set of refiners that were clustered together. The profile is assigned a unique intentID.

We define a formula to compute the confidence of a (subject, predicate, object) triple, according to the particular predicate; we also assign a confidence score to the intent profile itself.

All in all, using a small seed of labeled data, we generate a knowledge base comprising over thirty thousand intent profiles and a hundred and fifty thousand quadruples.

We evaluate the components of our pipeline and estimate the expected overall quality of the whole knowledge base. We find that, indeed, the higher the associated confidence score, the more likely it is that the triple is correct.

Results: knowledge base correctness. Proportion of triples in the annotated sample (y-axis), and number of intent profiles in the KB (on top of each bar), per confidence bucket.
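A per-bucket analysis of this kind can be sketched as below. The equal-width buckets, the assumption that confidence scores lie in [0, 1], the function name, and the sample annotations are all illustrative; they are not the bucketing or the data used in the thesis.

```python
from collections import defaultdict


def correctness_by_bucket(annotated, n_buckets=5):
    """Map bucket index -> fraction of annotated triples judged correct."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, is_correct in annotated:
        # Equal-width buckets over [0, 1]; a confidence of 1.0 goes in the last bucket.
        bucket = min(int(confidence * n_buckets), n_buckets - 1)
        totals[bucket] += 1
        correct[bucket] += int(is_correct)
    return {b: correct[b] / totals[b] for b in sorted(totals)}


# Hypothetical annotated sample: (confidence score, judged correct?).
sample = [
    (0.10, False), (0.15, True),
    (0.50, True), (0.55, False),
    (0.95, True), (1.00, True),
]
result = correctness_by_bucket(sample)
```

Buckets that contain no annotated triples are simply absent from the result, so a plot built on top of it would need to handle those gaps explicitly.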

If you are interested in more of this, please read Sections 5.4 to 5.6 of my thesis Task-Based Support in Search Engines, for technical details of the problem, the terminology, the methodology, and the experimental results and analysis.