Ah, the Northern Lights (nordlys, in Norwegian)… a crazy colourful, cozy Nordic sky that lights up the landscape of semantic search!

Entities, such as people, organizations, or products, are natural units for organizing information and a key enabling component of semantic search. Semantic search can be seen as an umbrella term for a rich toolbox of techniques that recognize the essential knowledge units in an information need, identify them uniquely in the underlying knowledge repositories, and exploit them to answer queries in a meaningful way. Its building blocks include, among others, entity retrieval, entity linking, query understanding, and result presentation.
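
To make the "identify them uniquely" step a little more concrete, here is a minimal sketch of entity linking in queries: a greedy longest-match lookup of query n-grams in a surface-form dictionary, resolving each mention to the entity it most commonly refers to (its "commonness"). The dictionary, the DBpedia-style identifiers, the scores, and the function name link_query are all hypothetical and made up for illustration; this is not Nordlys's implementation or API.

```python
from typing import Dict, List, Tuple

# Hypothetical surface-form dictionary: mention -> {entity_id: commonness}.
# Commonness = fraction of times a mention refers to a given entity in some
# annotated corpus; the numbers below are invented for illustration only.
SURFACE_FORMS: Dict[str, Dict[str, float]] = {
    "total recall": {"<dbpedia:Total_Recall_(1990_film)>": 0.8,
                     "<dbpedia:Total_Recall_(2012_film)>": 0.2},
    "arnold schwarzenegger": {"<dbpedia:Arnold_Schwarzenegger>": 1.0},
    "arnold": {"<dbpedia:Arnold_Schwarzenegger>": 0.3,
               "<dbpedia:Benedict_Arnold>": 0.7},
}


def link_query(query: str, max_len: int = 4) -> List[Tuple[str, str, float]]:
    """Greedy longest-match entity linking with a commonness score.

    Returns (mention, entity_id, score) triples for non-overlapping mentions.
    """
    tokens = query.lower().split()
    links: List[Tuple[str, str, float]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            candidates = SURFACE_FORMS.get(mention)
            if candidates:
                # Resolve the mention to the entity it most commonly refers to.
                entity, score = max(candidates.items(), key=lambda kv: kv[1])
                links.append((mention, entity, score))
                i += n  # skip past the linked mention (no overlaps)
                matched = True
                break
        if not matched:
            i += 1  # no mention starts here; move to the next token
    return links


print(link_query("total recall arnold schwarzenegger"))
```

Picking the most common sense is a simple baseline; a query such as "arnold movies" would be linked to Benedict Arnold here, which is exactly the kind of ambiguity that more context-aware linking methods try to resolve.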

Despite recent advances in semantic search, there was a lack of open, publicly available implementations of its standard methods and techniques. With this work, we aimed to fill that gap. We developed Nordlys, an open source toolkit for semantic search and a major step towards reproducible and extensible research in this area.

Nordlys implements a number of traditional and state-of-the-art methods for a range of tasks: entity retrieval, entity linking in queries, target type identification in queries, and entity cataloging. Another important characteristic is that it accommodates usage needs at three different levels:

  • It is available as an open source Python library that can be integrated into larger applications or used as a command line tool for research and experimentation. The code is organized in a three-tier architecture that cleanly separates the layers of functionality. It uses Elasticsearch for retrieval, MongoDB for storage, and the scikit-learn library in its machine learning component.
  • It provides a RESTful API, through which Nordlys can be used as a service, much like a black box.
  • Its functionality is also available through a graphical web user interface, which can be used, for example, to perform user studies on result presentation.
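
For a flavour of the first task listed above, entity retrieval, the sketch below ranks a handful of toy entity descriptions with the classic query-likelihood model using Dirichlet smoothing: score(e; q) = Σ_t log((tf(t, e) + μ·P(t|C)) / (|e| + μ)), where P(t|C) is the term's relative frequency in the whole collection. It is a plain-Python illustration under stated assumptions: the entity texts are invented, μ = 2000 is merely a commonly used default, and the function name rank_entities is hypothetical. It does not use Nordlys's API or its Elasticsearch index.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Toy "entity documents" (invented): each entity is represented by a bag of
# words, e.g. built from its name and a short description.
ENTITIES: Dict[str, str] = {
    "<dbpedia:Oslo>": "oslo capital city of norway",
    "<dbpedia:Norway>": "norway country in northern europe nordic",
    "<dbpedia:Aurora>": "aurora northern lights natural light display in the sky",
}


def rank_entities(query: str, entities: Dict[str, str],
                  mu: float = 2000.0) -> List[Tuple[str, float]]:
    """Rank entities by query likelihood with Dirichlet smoothing."""
    docs = {eid: Counter(text.split()) for eid, text in entities.items()}
    # Collection language model: term counts pooled over all entity documents.
    collection: Counter = Counter()
    for tf in docs.values():
        collection.update(tf)
    coll_len = sum(collection.values())

    scores = []
    for eid, tf in docs.items():
        doc_len = sum(tf.values())
        score = 0.0
        for term in query.lower().split():
            p_coll = collection[term] / coll_len
            if p_coll == 0:
                # Unseen everywhere: log(0) for every entity, so skip the term.
                continue
            score += math.log((tf[term] + mu * p_coll) / (doc_len + mu))
        scores.append((eid, score))
    # Higher log-likelihood = better match.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for eid, score in rank_entities("northern lights", ENTITIES):
    print(eid, round(score, 3))
```

The smoothing parameter μ controls how much an entity's own term counts are pulled towards the collection statistics; with tiny toy documents like these, a large μ makes the scores very close, although the ranking is the same here.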


If you are interested in learning more, please read our article for the technical details of the project, and see Nordlys for the source code, detailed documentation, and the web interface.