Entity-oriented queries are queries that revolve around entities: queries whose expected result is an entity or a list of entities, or queries that contain entity mentions. Within entity-oriented queries, we are interested in entity-bearing queries, i.e., those that typically consist of (one of the names of) an entity and possibly additional words that refine the intent behind the query. These complementary context terms are referred to as the refiner of the query. Examples of these queries are “the rock movies” and “london book a hotel.”
An envisaged search engine would identify the intent behind such an entity-oriented query and present a result object according to the nature of the intent. For example, a user might ask for the population of the city of Stavanger by querying “stavanger population.” The engine would recognize the population-related intent as a property that can be retrieved from the entry for Stavanger in a reference knowledge base (KB). The result might then boil down to a simple entity card for the city that displays, in particular, its population as taken from the KB. This is similar to the population field in the entity card for Oslo that Google shows for the query “oslo,” which we observed in a previous post.
In another scenario, the user issues “taxi to stavanger airport.” Here, we would instead aim to recognize that the intent behind the refiner “taxi to,” applied to the entity Stavanger Airport, is about performing a transaction, namely booking a taxi. A possible result format, illustrated in the following figure, allows the user to provide details of her travel, such as her position and desired time, and displays a button to order a taxi.
In our work, we aim to understand entity-related search intents by studying these query refiners. What do entity-oriented queries ask for, and how can they be fulfilled?
We collect refiners for a set of prominent entities, and aggregate them across entity types to obtain type-level query patterns. For example, letting [airport] stand for any entity of the type airport, we are interested in the intent behind “taxi to” in the type-level query pattern “taxi to [airport]”. We work with a representative sample of 50 entity types from the Freebase type system. By analyzing an annotated collection of entity-bearing queries, aggregated by entity types, we obtain a clearer understanding of what entity-oriented queries ask for.
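The aggregation into type-level patterns can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the toy dictionary `ENTITY_TYPES`, the helper `to_type_pattern`, and the simple token-matching strategy are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy lookup from entity name to entity type.
ENTITY_TYPES = {
    "stavanger airport": "airport",
    "oslo airport": "airport",
    "stavanger": "city",
}

def to_type_pattern(query, entity_types):
    """Replace the longest matching entity mention with a [type] placeholder.

    Returns None when the query contains no known entity mention.
    """
    tokens = query.lower().split()
    # Try longer names first so "stavanger airport" wins over "stavanger".
    for name in sorted(entity_types, key=len, reverse=True):
        name_tokens = name.split()
        n = len(name_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == name_tokens:
                placeholder = f"[{entity_types[name]}]"
                return " ".join(tokens[:i] + [placeholder] + tokens[i + n:])
    return None

# Hypothetical query sample; the first two appear in the text above.
QUERIES = [
    "taxi to stavanger airport",
    "stavanger population",
    "taxi to oslo airport",
    "weather tomorrow",
]

patterns = [to_type_pattern(q, ENTITY_TYPES) for q in QUERIES]
# Count how often each type-level pattern occurs, skipping non-matches.
pattern_counts = Counter(p for p in patterns if p is not None)
```

Under these assumptions, both airport queries collapse into the single pattern "taxi to [airport]", which is the unit that later gets annotated with an intent.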
In particular, we make observations regarding which searches can be fulfilled by looking up direct answers from a knowledge base (e.g., when seeking the population of a city), and which would require interacting with external services (such as booking a taxi). We then need a suitable scheme to classify the entity intents. After a close inspection of the type-level refiners, we define the following scheme of intent categories. These categories focus on how (and from which type of source) the information need can be fulfilled.
Intent categorization scheme
- Property: The refiner is about getting a specific entity property or attribute that can be looked up in a knowledge base. For example, "children" in the query "angelina jolie children" or "opening times" in "at&t stadium opening times." The criterion does not require the refiner to exist as a property in an actual knowledge base, only that it could reasonably exist as one.
- Website: The refiner looks to reach a specific website or application. For example, "twitter" in the query "karpathy twitter."
- Service: The refiner expresses the need to interact with a service, possibly by redirecting to an external site or app. For example, "menu" in the query "keens steakhouse menu" would indicate the need to access an external site to read the restaurant's menu. As another example, "new album" in "eric clapton new album" looks for a service to read about, listen to, or buy the new album. The interaction would possibly involve further parameters, like 'from' and 'to' values for "ticket price" in the query "jpass bullet train ticket price."
- Other: None of the previous ones is applicable. For example, "india" in the query "microsoft india" merely serves to disambiguate the company office from other locations.
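The four categories can be written down as a small data structure. The sketch below is only one possible encoding: the names `Intent`, `EXAMPLES`, and `refiners_by_intent` are hypothetical, while the example queries and their labels are the ones given in the list above.

```python
from enum import Enum

class Intent(Enum):
    PROPERTY = "property"  # answerable by a knowledge base lookup
    WEBSITE = "website"    # aims to reach a specific site or app
    SERVICE = "service"    # requires interacting with a service
    OTHER = "other"        # none of the above

# (query, refiner, intent) triples taken from the scheme above.
EXAMPLES = [
    ("angelina jolie children", "children", Intent.PROPERTY),
    ("at&t stadium opening times", "opening times", Intent.PROPERTY),
    ("karpathy twitter", "twitter", Intent.WEBSITE),
    ("keens steakhouse menu", "menu", Intent.SERVICE),
    ("eric clapton new album", "new album", Intent.SERVICE),
    ("jpass bullet train ticket price", "ticket price", Intent.SERVICE),
    ("microsoft india", "india", Intent.OTHER),
]

def refiners_by_intent(examples):
    """Group refiners under their annotated intent category."""
    grouped = {intent: [] for intent in Intent}
    for _query, refiner, intent in examples:
        grouped[intent].append(refiner)
    return grouped

grouped = refiners_by_intent(EXAMPLES)
```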
Understanding entity-oriented search intents
We annotate each of the type-level query patterns in the collection with one of these four intent categories, and obtain a distribution of entity intent categories per type. From the average proportions in these distributions, we observe that 54.06% of unique entity-oriented queries are to be fulfilled by interacting with some external service or app, while 28.6% look for direct answers from a knowledge base. Further, 5.34% of the type-level refiners represent an attempt to reach a website, while 12.08% of them do not fit into any of the previous three categories.
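To make the computation concrete, here is a minimal sketch of how per-type distributions and their average could be derived from annotated refiners. It assumes an unweighted mean across entity types, which may differ from the exact procedure in the thesis. The annotations are toy data: only "taxi to" and "population" come from the text, and the other refiners and all labels are invented for illustration.

```python
from collections import Counter, defaultdict

CATEGORIES = ("property", "website", "service", "other")

# Hypothetical annotations: (entity type, type-level refiner, category).
ANNOTATIONS = [
    ("airport", "taxi to", "service"),
    ("airport", "parking", "service"),
    ("airport", "code", "property"),
    ("airport", "wikipedia", "website"),
    ("city", "population", "property"),
    ("city", "hotels", "service"),
]

def per_type_distribution(annotations):
    """Return, for each entity type, the proportion of each category."""
    counts = defaultdict(Counter)
    for entity_type, _refiner, category in annotations:
        counts[entity_type][category] += 1
    distribution = {}
    for entity_type, counter in counts.items():
        total = sum(counter.values())
        distribution[entity_type] = {
            category: counter[category] / total for category in CATEGORIES
        }
    return distribution

def average_proportions(distribution):
    """Average each category's proportion over all entity types (unweighted)."""
    n_types = len(distribution)
    return {
        category: sum(d[category] for d in distribution.values()) / n_types
        for category in CATEGORIES
    }

distribution = per_type_distribution(ANNOTATIONS)
averages = average_proportions(distribution)
```

With this toy data, the service category averages 0.5 across the two types. Averaging per type rather than pooling all refiners keeps types with many refiners from dominating the overall figure.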
The entity types with the largest proportion of service intents are
netflix genre (with refiners like “videos,” “live”),
election (“map,” “polls”),
football match (“video,” “highlights”), and
The property intent category covers refiners that are of a more static nature, e.g.,
chemical compound (with refiners like “structural formula,” “molecular weight”),
political party (“slogan,” “president”),
star (“type of star,” “temperature”), or
tower (“hours,” “height”); only the first one is a very prominent type.
Most of the entity types exhibit a non-empty proportion of website intents. For some types, this category exceeds its average proportion, e.g., for
blogger. The most frequent website refiners in the whole corpus are “wikipedia,” “twitter,” “facebook,” and “youtube.”
A marginal proportion of refiners are classified as having the other intent. A few exceptional cases with large proportions of other intents are, e.g.,
business operation and
house (where the refiner is usually a location), or
basketball player (for which many refiners refer mostly to an NBA franchise, e.g., “lakers”).
If you are interested in more of this, please read Sections 5.1 to 5.3 of my thesis Task-Based Support in Search Engines, for technical details of the problem, the terminology, the methodology, and the experimental results and analysis.