Darío Garigliotti | publications

BibTeX entries are available on my DBLP profile.

Ph.D. thesis

Task-based Support in Search Engines Garigliotti, Darío 2020 [Abstract] [PDF] [Slides]
Web search has become a key technology on which people rely on a daily basis. The evolution of the search experience has also shaped the expectations of people about it. Many users seem to expect today’s search engines to behave like a kind of “wise interpreter,” capable of understanding the meaning behind a search query, realizing its current context, and responding to it directly and appropriately. Semantic search encompasses a large portion of information retrieval (IR) research devoted to studying more meaningful representations of the user information need. Entity cards, direct displays, and verticals are examples of how major commercial search engines have indeed capitalized on query understanding. Search is usually performed with a specific goal underlying the query. In many cases, this goal consists of a nontrivial task to be completed. Current search engines support a small set of basic tasks, and most of the knowledge-intensive workload for supporting more complex tasks is left to the user. Task-based search can be viewed as an information access paradigm that aims to enhance search engines with functionalities for recognizing the underlying tasks in searches and providing support for task completion. The research presented in this thesis focuses on utilizing and extending methods and techniques from semantic search in the next stage of the evolution: to support users in achieving their tasks. Our work can be grouped into three grand themes: (1) Entity type information for entity retrieval: we conduct a systematic evaluation and analysis of methods for type-aware entity retrieval, in terms of three main dimensions. We revisit the problem of hierarchical target type identification, present a state-of-the-art supervised learning method, and analyze the usage of automatically identified target entity types for type-aware entity retrieval; (2) Entity-oriented search intents: we propose a categorization scheme for entity-oriented search intents, and study the distributions of entity intent categories per entity type. We develop a method for constructing a knowledge base of entity-oriented search intents; and (3) Task-based search: we design a probabilistic generative framework for task-based query suggestion, and estimate each of its components in a principled way. We introduce the problems of query-based task recommendation and mission-based task recommendation, and establish suitable baselines for each.

2019

Semi-supervised Learning for Word Sense Disambiguation Garigliotti, Darío arXiv e-prints 2019 [Abstract] [PDF]
This work is a study of the impact of multiple aspects of a classic unsupervised word sense disambiguation algorithm. We identify relevant factors in a decision rule algorithm, including the initial labeling of examples, the formalization of the rule confidence, and the criteria for accepting a decision rule. Some of these factors are only implicitly considered in the original literature. We then propose a lightly supervised version of the algorithm, and employ a pseudo-word-based strategy to evaluate the impact of these factors. The resulting performance is comparable with that of highly optimized formulations of the word sense disambiguation method.
NeuType: A Simple and Effective Neural Network Approach for Predicting Missing Entity Type Information in Knowledge Bases Hovda, Jon Arne Bø, Garigliotti, Darío, and Balog, Krisztian arXiv e-prints 2019 [Abstract] [PDF]
Knowledge bases store information about the semantic types of entities, which can be utilized in a range of information access tasks. This information, however, is often incomplete, due to new entities emerging on a daily basis. We address the task of automatically assigning types to entities in a knowledge base from a type taxonomy. Specifically, we present two neural network architectures, which take short entity descriptions and, optionally, information about related entities as input. Using the DBpedia knowledge base for experimental evaluation, we demonstrate that these simple architectures yield significant improvements over the current state of the art.
ICTIR
Unsupervised Context Retrieval for Long-Tail Entities Garigliotti, Darío, Albakour, Dyaa, Martinez, Miguel, and Balog, Krisztian In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval 2019 [Abstract] [PDF] [Poster] [Slides] [Video]
Monitoring entities in media streams often relies on rich entity representations, like structured information available in a knowledge base (KB). For long-tail entities, such monitoring is highly challenging, due to their limited, if not entirely missing, representation in the reference KB. In this paper, we address the problem of retrieving textual contexts for monitoring long-tail entities. We propose an unsupervised method to overcome the limited representation of long-tail entities by leveraging established entities and their contexts as support information. Evaluation on a purpose-built test collection shows the suitability of our approach and its robustness for out-of-KB entities.
IRJ
Identifying and exploiting target entity type information for ad hoc entity retrieval Garigliotti, Darío, Hasibi, Faegheh, and Balog, Krisztian Information Retrieval Journal 2019 [Abstract] [PDF] [Repository]
Today, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to some hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in two settings: firstly, in an idealized “oracle” setting, assuming that we know the distribution of target types of the relevant entities for a given query; and secondly, in a realistic scenario, where target entity types are identified automatically based on the keyword query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we show that type information can significantly and substantially improve retrieval performance, yielding up to 67% relative improvement in terms of NDCG@10 over a strong text-only baseline in an oracle setting. We further show that using automatic target type detection, we can outperform the text-only baseline by 44% in terms of NDCG@10. This is as good as, and sometimes even better than, what is attainable by using explicit target type information provided by humans. These results indicate that identifying target entity types of queries is challenging even for humans, and attest to the effectiveness of our proposed automatic approach.

2018

CIKM
IntentsKB: A Knowledge Base of Entity-Oriented Search Intents Garigliotti, Darío, and Balog, Krisztian In Proceedings of the 27th ACM International Conference on Information and Knowledge Management 2018 [Abstract] [PDF] [Poster] [Repository] [Slides]
We address the problem of constructing a knowledge base of entity-oriented search intents. Search intents are defined on the level of entity types, each consisting of a high-level intent category (property, website, service, or other), along with a cluster of query terms used to express that intent. These machine-readable statements can be leveraged in various applications, e.g., for generating entity cards or query recommendations. By structuring service-oriented search intents, we take one step towards making entities actionable. The main contribution of this paper is a pipeline of components we develop to construct a knowledge base of entity intents. We evaluate performance both component-wise and end-to-end, and demonstrate that our approach is able to generate high-quality data.
SIGIR
A Semantic Search Approach to Task-Completion Engines Garigliotti, Darío In The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval 2018 [Abstract] [PDF] [Slides]
Web search has become a key technology in society. The increased engagement of users has raised their expectations, leading to an evolution of search engines towards attempting an understanding, or semantics, of information needs. The next paradigm shift is to support task completion, that is, to help the user complete her underlying goal when issuing a search query. In this research, I propose to study semantic search components suitable for task-based search. Our contributions address three main challenges, which are as follows. We conduct a systematic formalization and evaluation of aspects in utilizing entity type information for entity retrieval. We also approach the understanding of entity-oriented queries by categorizing search intents. Last, we develop methods for generating high-quality task-based query suggestions. We envisage the capability of the three identified components to complement each other for supporting task completion.
ECIR
Towards an Understanding of Entity-Oriented Search Intents Garigliotti, Darío, and Balog, Krisztian In Advances in Information Retrieval - Proceedings of the 40th European Conference on IR Research 2018 [Abstract] [PDF] [Poster] [Repository]
Entity-oriented search deals with a wide variety of information needs, from displaying direct answers to interacting with services. In this work, we aim to understand which entity-oriented search intents are prominent and how they can be fulfilled. We develop a scheme of entity intent categories, and use them to annotate a sample of queries. Specifically, we annotate unique query refiners on the level of entity types. We observe that, on average, over half of those refiners seek to interact with a service, while over a quarter of the refiners search for information that may be looked up in a knowledge base.
ECIR
Generating High-Quality Query Suggestion Candidates for Task-Based Search Ding, Heng, Zhang, Shuo, Garigliotti, Darío, and Balog, Krisztian In Advances in Information Retrieval - Proceedings of the 40th European Conference on IR Research 2018 [Abstract] [PDF] [Poster] [Repository]
We address the task of generating query suggestions for task-based search. The current state of the art relies heavily on suggestions provided by a major search engine. In this paper, we solve the task without reliance on search engines. Specifically, we focus on the first step of a two-stage pipeline approach, which is dedicated to the generation of query suggestion candidates. We present three methods for generating candidate suggestions and apply them on multiple information sources. Using a purpose-built test collection, we find that these methods are able to generate high-quality suggestion candidates.

2017

ICTIR
On Type-Aware Entity Retrieval Garigliotti, Darío, and Balog, Krisztian In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval 2017 [Abstract] [PDF] [Slides]
Today, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to some hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in an idealized "oracle" setting, assuming that we know the distribution of target types of the relevant entities for a given query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we find that type information proves most useful when using large type taxonomies that provide very specific types. We provide further insights on the extensional coverage of entities and on the utility of target types.
ICTIR
Learning to Rank Target Types for Entity-Bearing Queries Garigliotti, Darío, and Balog, Krisztian In Proceedings of the 1st International Workshop on LEARning Next gEneration Rankers (LEARNER 2017), co-located with the 3rd ACM International Conference on the Theory of Information Retrieval (ICTIR 2017) 2017 [Abstract] [PDF] [Slides]
This paper revisits the learning-to-rank approach we proposed for automatically identifying the target entity types of queries. After presenting our contributions and results, we draw on the learned lessons and encountered challenges to identify directions for future enhancements.
SIGIR
Target Type Identification for Entity-Bearing Queries Garigliotti, Darío, Hasibi, Faegheh, and Balog, Krisztian In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017 [Abstract] [PDF] [Poster] [Repository]
Identifying the target types of entity-bearing queries can help improve retrieval performance as well as the overall search experience. In this work, we address the problem of automatically detecting the target types of a query with respect to a type taxonomy. We propose a supervised learning approach with a rich variety of features. Using a purpose-built test collection, we show that our approach outperforms existing methods by a remarkable margin.
SIGIR
Generating Query Suggestions to Support Task-Based Search Garigliotti, Darío, and Balog, Krisztian In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017 [Abstract] [PDF] [Poster] [Repository]
We address the problem of generating query suggestions to support users in completing their underlying tasks (which motivated them to search in the first place). Given an initial query, these query suggestions should provide a coverage of possible subtasks the user might be looking for. We propose a probabilistic modeling framework that obtains keyphrases from multiple sources and generates query suggestions from these keyphrases. Using the test suites of the TREC Tasks track, we evaluate and analyze each component of our model.
SIGIR
Nordlys: A Toolkit for Entity-Oriented and Semantic Search Hasibi, Faegheh, Balog, Krisztian, Garigliotti, Darío, and Zhang, Shuo In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017 [Abstract] [PDF] [Poster] [Repository]
We introduce Nordlys, a toolkit for entity-oriented and semantic search. It provides functionality for entity cataloging, entity retrieval, entity linking, and target type identification. Nordlys may be used as a Python library or as a RESTful API, and also comes with a web-based user interface. The toolkit is open source and is available at http://nordlys.cc.
WSDM
Supervised Ranking of Triples for Type-Like Relations—The Cress Triple Scorer at the WSDM Cup 2017 Hasibi, Faegheh, Garigliotti, Darío, Zhang, Shuo, and Balog, Krisztian In WSDM Cup 2017 Notebook Papers, February 10, Cambridge, UK 2017 [Abstract] [PDF] [Poster]
This paper describes our participation in the Triple Scoring task of WSDM Cup 2017, which aims at ranking triples from a knowledge base for two type-like relations: profession and nationality. We introduce a supervised ranking method along with the features we designed for this task. Our system was ranked first with respect to average score difference and second in terms of Kendall’s tau.
TREC
The University of Stavanger at the TREC 2016 Tasks Track Garigliotti, Darío, and Balog, Krisztian In Proceedings of the Twenty-Fifth Text REtrieval Conference 2017 [Abstract] [PDF]
This paper describes our participation in the Task understanding task of the Tasks track at TREC 2016. We introduce a general probabilistic framework in which we combine query suggestions from web search engines with keyphrases generated from top ranked documents. We achieved top performance among all submitted systems, on both official evaluation metrics, which attests to the effectiveness of our approach.

2016

The University of Stavanger at the TREC 2016 Tasks Track Garigliotti, Darío, and Balog, Krisztian In TREC 2016 Working Notes 2016 [Abstract] [PDF]
This paper describes our participation in the Task understanding task of the Tasks track at TREC 2016. We introduce a general probabilistic framework in which we combine query suggestions from web search engines with keyphrases generated from top ranked documents.

2015

ESWC
Open Knowledge Extraction Challenge Nuzzolese, Andrea Giovanni, Gentile, Anna Lisa, Presutti, Valentina, Gangemi, Aldo, Garigliotti, Darío, and Navigli, Roberto In Semantic Web Evaluation Challenges - Second SemWebEval Challenge at the 12th Extended Semantic Web Conference 2015 [Abstract] [PDF]
The Open Knowledge Extraction (OKE) challenge is aimed at promoting research in the automatic extraction of structured content from textual data and its representation and publication as Linked Data. We designed two extraction tasks: (1) Entity Recognition, Linking and Typing and (2) Class Induction and entity typing. Four systems took part in the challenge: CETUS-FOX and FRED participated in both tasks, Adel participated in Task 1, and OAK@Sheffield participated in Task 2. In this paper we describe the OKE challenge, the tasks, the datasets used for training and evaluating the systems, the evaluation method, and the results obtained.

M.Sc. thesis

An Interactive System for the Interpretation of Specifications (Original title in Spanish: Un Sistema Interactivo para la Interpretación de Especificaciones) Garigliotti, Darío 2014 [Abstract] [PDF]
In this work we study the problem of processing a software specification expressed in natural language. We observe and classify linguistic phenomena over a corpus of example specifications. We also explore some systems presented in the related literature, identifying their best features. Building on these, we design and implement a system that interprets a specification, expressed in a very simple and informative format, and obtains a labeled transition system. The resolution strategy combines an interactive approach with ad hoc decision heuristics. The representation is enriched with the treatment of semantic phenomena. Several guidelines are offered for its extension, in particular towards a model for annotating examples in an educational context.

2013

JAIIO
Semi-supervised Learning for Word Sense Disambiguation (Original title in Spanish: Desambiguación de Palabras Polisémicas mediante Aprendizaje Semi-supervisado) Garigliotti, Darío In Annals of 42nd JAIIO - Argentine Journals of Informatics 2013 [Abstract] [PDF] [Poster] [Slides]
This work is a systematic exploration of the impact of different aspects of classic algorithms for unsupervised word sense disambiguation. After identifying the factors relevant to their operation, many of which were only implicit in the description of these algorithms, we implement a simplified and lightly supervised version of a classic decision rule algorithm for unsupervised disambiguation. We evaluate the impact of each of these factors on its performance, among them: the light initial labeling of examples, the confidence equation of a rule, and the acceptance criteria for such decision rules. The results obtained through an inexpensive and powerful pseudo-word evaluation show acceptable performance compared with highly optimized versions, and lead us to propose promising future improvements.