Semantic Search Tools Evaluation Campaign 2010
This campaign is now running - get involved!
Organizers
Stuart N. Wrigley, University of Sheffield, UK (s.wrigley AT dcs DOT shef DOT ac DOT uk)
Dorothee Reinhard, University of Zurich, Switzerland (dreinhard AT ifi DOT uzh DOT ch)
Updates to this evaluation campaign will be announced here and via RSS.
- Full campaign announcement sent (Posted 19 July 2010)
- Video of the SemSearch2010 workshop presentation is now available from videolectures.net. (Posted 19 May 2010)
- Search API for interacting with the SEALS evaluation Platform released. (Posted 7 May 2010)
- The slides used during the SemSearch2010 workshop presentation are available from slideshare and the full paper is available here. (Posted 4 May 2010)
- SEALS will be presenting their evaluation campaign design at the SemSearch2010 workshop (located at WWW2010 in Raleigh, NC, USA) on 26 April 2010 - come and see us to find out more.
(Posted 23 April 2010)
Follow us on Twitter or RSS.
Overview of the Evaluation Campaign
Introduction
The goals of the semantic search tool evaluation initiative are to help developers improve their tools, to let them compare their tools against those of their competitors, and to improve the interoperability of semantic technologies in general. The short-term goal is to create a set of reference benchmark tests for assessing the strengths and weaknesses of the available tools and for comparing them with each other. As such, these tests will focus on the performance of fundamental aspects of each tool in a strictly controlled environment / scenario rather than on its ability to solve open-ended, real-life problems.
Criteria
For the first evaluation campaign, semantic search tools will be evaluated according to a number of different criteria including query expressiveness, usability (effectiveness, efficiency, satisfaction) and scalability. Scalability will address a number of factors including the tool's ability to query a large repository in a reasonable time; the tool's ability to cope with differing ontology sizes; and the tool's ability to cope with a large number of query results. Query expressiveness will investigate the means by which queries are formulated within the tool and the degree to which this facilitates (or even impedes) the user's question-answering goal. However, given the interactive nature of semantic search tools, a core interest in this evaluation is the usability of a particular tool.
Two phase approach
The core functionality of a semantic search tool is to allow a user to discover one or more facts or documents by inputting some form of query. The manner in which this input occurs (e.g., natural language, keywords, visual representation) is not of concern; however, the user experience of using the interface is of interest. Therefore, it is essential that the evaluation procedures emphasize the users' experience with each tool.
In order to achieve this goal, the evaluation of each tool is split into two complementary phases:
- the automated phase: the tool is run in a fully automated environment provided by SEALS and exercised to assess the various aspects described in the evaluation scenarios (below).
- the user-in-the-loop phase: a series of experiments in which human subjects are given a number of tasks (questions) to complete using a particular tool operating on a particular ontology.
The two core implications of this approach are that the user-in-the-loop experiments will be run by each tool provider participating in the evaluation and that all materials required for the user-in-the-loop experiments will be provided by the SEALS consortium.
Evaluation scenarios
The evaluation scenarios are listed below (SST = Semantic Search Tool). Tools may participate in only one scenario, although participation in both is preferred.
- SST Automated 2010. This scenario addresses:
  - search performance: The tool's core search quality in terms of precision, recall, etc.
  - performance and scalability: The tool's ability to load, handle and perform queries on large data sets.
- SST User 2010. This scenario addresses:
  - usability: How do the end-users react to the tool's query language? Do they like the tool? Are they able to express their questions effectively and quickly? Is the language easy to understand and learn? How expressive is the tool's query language?
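As an illustration of the search performance measures named above, the following is a minimal sketch of how precision, recall and the F1 measure could be computed for a single query. It is not the SEALS Platform's scoring code: the function name, the example identifiers and the choice to treat results as unordered sets are assumptions made for this example only.

```python
def precision_recall_f1(returned, relevant):
    """Compute precision, recall and F1 for one query.

    returned: results produced by the tool (hypothetical identifiers)
    relevant: the expected (gold standard) results for the same query
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    true_positives = len(returned_set & relevant_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(returned_set) if returned_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical example: the tool returns four results, two of which are correct,
# and misses one of the three expected results.
returned = ["ex:ClassA", "ex:ClassB", "ex:ClassC", "ex:ClassD"]
relevant = ["ex:ClassA", "ex:ClassB", "ex:ClassE"]
precision, recall, f1 = precision_recall_f1(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example two of the four returned results are relevant (precision 0.5) and two of the three expected results are found (recall 2/3).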
Evaluation datasets
Only OWL ontologies will be used as test data: to simplify the development of the benchmarks for the first evaluation campaign, only search tools operating purely on OWL ontologies will be evaluated. The evaluation of tools operating over a wider set of resources, e.g., OWL ontologies and document repositories, will be considered for the second evaluation campaign in late 2011 / early 2012. For the first campaign, we have selected two datasets (one per evaluation phase).
- EvoOnt is a set of software ontologies and will be used in the automated phase. Five different data set sizes (1k, 10k, 100k, 1M, 10M triples) and associated questions have been produced in order to measure the scalability and performance of a tool.
- The Mooney Natural Language Learning Data Set will be used in the user-in-the-loop phase. Specifically, we will use the geography subset. This data set assumes no specialist knowledge of the field and is thus well suited to a usability study with a diverse set of subjects.
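To make the idea of measuring scalability across the five data set sizes concrete, here is a minimal timing sketch. It is only an illustration under stated assumptions: `time_queries`, `run_query` and `dummy_run_query` are hypothetical names, the stand-in query function performs no real search, and the actual measurements in the campaign are made by the SEALS Platform rather than by code like this.

```python
import time

# Triple counts of the five data set sizes named in the campaign description.
DATASET_SIZES = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]


def time_queries(run_query, queries, sizes=DATASET_SIZES):
    """Return {size: mean seconds per query}.

    run_query is a hypothetical callable run_query(size, query) that executes
    one query against the data set with the given number of triples.
    """
    if not queries:
        raise ValueError("at least one query is required")
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        for query in queries:
            run_query(size, query)
        elapsed = time.perf_counter() - start
        timings[size] = elapsed / len(queries)
    return timings


def dummy_run_query(size, query):
    # Stand-in for a real tool call (assumption): returns no results.
    return []


timings = time_queries(dummy_run_query, ["question 1", "question 2"])
for size, seconds in timings.items():
    print(f"{size:>10} triples: {seconds:.6f} s per query")
```

Plotting the mean time per query against the number of triples would show how a tool's response time grows with the size of the data set.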
Evaluation materials
Connecting a semantic search tool with the SEALS Platform is easy and is achieved by creating a wrapper for your tool which implements a Java programming interface. This allows full communication between your tool and the SEALS Platform. Details regarding how to implement this connection can be found on this page.
A detailed description of the evaluation scenarios and test data can be found in SEALS deliverable D13.1. All materials required for the user-in-the-loop experiments (software experiment controller, instructions, etc.) will be provided by the SEALS consortium.
How do I get involved?
Participation is open to developers interested in evaluating their tool and to anyone who wants to evaluate a certain tool. Participants are only expected to help connect their tool to the SEALS Platform, which is the infrastructure that will run all the evaluations automatically, and to carry out the user study. Besides checking their results and comparing them with those of others, participants will also be able to run the evaluations on their own with these and future test data once their tool is connected to the SEALS Platform.
The first step is to join the SEALS Community. Once you have your community login, you will be able to register your tools for the evaluation campaign. Users with tools registered for the evaluation campaign will receive notifications as further details and services of the SEALS Platform become available.
Timeline and events
The evaluation campaign will take place during 2010. The timeline of the whole evaluation campaign is as follows (follow our RSS and Twitter feeds for news on when these dates become more precise):
- May 2010: Registration opens
- July-August 2010: Evaluation materials and documentation are provided to participants
- July-August 2010: Participants upload their tools
- August-September 2010: Evaluation scenarios are executed
- September-October 2010: Evaluation results are analysed
- November 2010: Evaluation results are discussed at ISWC2010 workshop (workshop confirmed)