Semantic Search Tools Evaluation Campaign 2011

Organizers
Stuart N. Wrigley, University of Sheffield, UK (s.wrigley AT dcs DOT shef DOT ac DOT uk)

Updates to this evaluation campaign will be announced here and via RSS. Follow the project on Twitter or RSS.

Overview of the Evaluation Campaign

Introduction
The goals of the semantic search tool evaluation initiative are to support developers in improving their tools, to allow them to compare their tools against those of their competitors, and more generally to improve the interoperability of semantic technologies. The short-term goal is to create a set of reference benchmark tests for assessing the strengths and weaknesses of the available tools and for comparing them with each other. As such, these tests will focus on the performance of fundamental aspects of each tool in a strictly controlled environment or scenario rather than on its ability to solve open-ended, real-life problems.

Criteria
For this evaluation campaign, semantic search tools will be evaluated according to a number of different criteria, including query expressiveness, usability (effectiveness, efficiency, satisfaction) and scalability. Scalability will address a number of factors, including the tool's ability to query a large repository in a reasonable time, its ability to cope with differing ontology sizes, and its ability to cope with a large number of query results. Query expressiveness will investigate the means by which queries are formulated within the tool and the degree to which this facilitates (or even impedes) the user's question-answering goal. However, given the interactive nature of semantic search tools, a core interest in this evaluation is the usability of a particular tool.

Two phase approach
The core functionality of a semantic search tool is to allow a user to discover one or more facts or documents by inputting some form of a query.
The manner in which this input occurs (e.g. natural language, keywords, visual representation) is not of concern; however, the user experience of using the interface is of interest. Therefore, it is essential that the evaluation procedures emphasize the users' experience with each tool. In order to achieve this goal, the evaluation of each tool is split into two complementary phases:
- automated phase: the tool is exercised in a fully automated environment provided by SEALS to assess the various aspects described in the evaluation scenarios (below).
- user-in-the-loop phase: this involves a series of experiments involving human subjects, who are given a number of tasks (questions) to complete using a particular tool operating on a particular ontology.
The two-phase approach has two core implications: the user-in-the-loop experiments will be run by each tool provider participating in the evaluation, and all materials required for the user-in-the-loop experiments will be provided by the SEALS consortium.

Evaluation scenarios
The evaluation scenarios are listed below (SST = Semantic Search Tool). Tools may participate in only one scenario, although participation in both is preferred.
- SST Automated 2011. This scenario addresses:
  - search performance: the tool's core search quality in terms of precision, recall, etc.
  - performance and scalability: the tool's ability to load, handle and perform queries on large data sets.
- SST User 2011. This scenario addresses:
  - usability: how do the end-users react to the tool's query language? Do they like the tool? Are they able to express their questions effectively and quickly? Is the language easy to understand and learn? How expressive is the tool's query language?

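As an illustration of the search performance measures named in the SST Automated 2011 scenario, the following minimal Python sketch computes precision, recall and F-measure for a single query. The function name and the example result identifiers are hypothetical and are not part of the SEALS Platform; the measures actually used in the campaign are defined in the SEALS deliverables.

```python
def precision_recall_f1(returned, relevant):
    """Compare a tool's returned results with the gold-standard answers.

    returned: iterable of result identifiers produced by the tool
    relevant: iterable of identifiers that are correct for the query
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    true_positives = len(returned_set & relevant_set)

    # Guard against empty sets so that no division by zero occurs.
    precision = true_positives / len(returned_set) if returned_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the tool returns four results, two of which are
# among the three correct answers.
precision, recall, f1 = precision_recall_f1(
    returned=["a", "b", "c", "d"],
    relevant=["a", "b", "e"],
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example two of the four returned results are correct (precision 0.50) and two of the three correct answers are found (recall of about 0.67).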
Evaluation datasets
Only OWL ontologies will be used as test data: in order to simplify the development of the benchmarks for the first evaluation campaign, it has been decided that search tools operating on purely OWL ontologies will be evaluated. For this campaign, we have selected three core datasets (Mooney will be used for the User scenario and EvoOnt and MusicBrainz will be used for the Automated scenario):
- EvoOnt is a set of software ontologies and will be used in the automated phase. Five different data set sizes (1k, 10k, 100k, 1M, 10M triples) and associated questions have been produced in order to measure the scalability and performance of a tool.
- The Mooney Natural Language Learning Data Set will be used in the user-in-the-loop phase. Specifically, we will use the geography subset. This data assumes no specialist knowledge of the field and is thus well suited to a usability study in which a diverse set of subjects will be used.
- A version of the MusicBrainz corpus, as used by the organisers of the Question Answering over Linked Data (QALD) workshops, will be used in the automated phase.
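
To illustrate how the five EvoOnt data set sizes could be used to measure scalability, the sketch below times a set of queries at each size and reports the mean time per query. It is only an assumption about how such a measurement might be organised: `time_queries` and the `run_query` callable are hypothetical names, and the sketch does not reflect the actual SEALS Platform implementation.

```python
import time

# The five EvoOnt data set sizes (in triples) named above.
DATASET_SIZES = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]


def time_queries(run_query, queries, sizes=DATASET_SIZES):
    """Return the mean wall-clock time per query for each data set size.

    run_query: hypothetical callable taking (size, query) that executes
               the query against the data set of that size
    queries:   list of queries to execute at every size
    """
    timings = {}
    for size in sizes:
        durations = []
        for query in queries:
            start = time.perf_counter()
            run_query(size, query)
            durations.append(time.perf_counter() - start)
        timings[size] = sum(durations) / len(durations) if durations else 0.0
    return timings


# Usage with a stand-in for a real tool: the dummy does nothing, so the
# reported times only show the shape of the output.
def dummy_run_query(size, query):
    return []


for size, mean_seconds in time_queries(dummy_run_query, ["q1", "q2"]).items():
    print(f"{size:>10} triples: {mean_seconds:.6f} s per query")
```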

Evaluation materials
Connecting a semantic search tool with the SEALS Platform is easy and is achieved by creating a wrapper for your tool which implements a Java programming interface. This allows full communication between your tool and the SEALS Platform. Details regarding how to implement this connection (including a full tutorial) can be found on this page.

A detailed description of the evaluation scenarios and test data can be found in SEALS deliverable D13.4 (and, for more detailed background information, D13.1). All materials required for the user-in-the-loop experiments (software experiment controller, instructions, etc.) will be provided by the SEALS consortium.

How do I get involved?
Participation is open to developers interested in evaluating their tool and to anyone who wants to evaluate a certain tool. Participants are only expected to collaborate in connecting their tool with the SEALS Platform, which will be the infrastructure that runs all the evaluations automatically, and in the execution of the user study. Besides checking their results and comparing them with those of others, once the tool is connected to the SEALS Platform, participants will also be able to run the evaluations on their own with these and future test data.

The first step is to join the SEALS Community. Once you have your community login, you will be able to register your tools for the evaluation campaign. Users with tools registered to the evaluation campaign will receive notifications as further details and services of the SEALS Platform become available.

Timeline and events
The evaluation campaign will take place during Autumn 2011 and Spring 2012. The timeline of the whole evaluation campaign is as follows (follow our RSS and Twitter feeds for news on when these dates become more precise):
- October 2011 Evaluation materials and documentation are provided to participants
- November 2011 onwards Participants upload their tools
- November 2011 - March 2012 Evaluation scenarios are executed
- March - May 2012 Evaluation results are analysed
- June 2012 Evaluation results are discussed at ESWC2012 workshop (workshop to be confirmed)