REST API - quhfus/DoSeR-Disambiguation GitHub Wiki
REST API
DoSeR-Disambiguation starts an Apache Tomcat server that can be addressed via a REST interface. In the following, we provide an example of how to query the service.
JSON Example
Our service currently accepts JSON only. The query format for DoSeR-Disambiguation is kept as simple as possible. The input has the following two attributes:
- documentUri (optional): This can be null, the URL of a web document, or any other string. It is not used in the following.
- surfaceFormsToDisambiguate: An array of one or more surface forms to be disambiguated. Each surface form has a set of sub-attributes that describe it in more detail.
A surface form has the following attributes:
- selectedText: The surface form label
- context: The text in which the surface form appears. Ideally, the surface form is located near the middle of the context. Passing the entire document brings no benefit, since we only use the 1-2 sentences before and after the surface form.
- startPosition: The start position of the surface form within the context, measured in characters. If the context starts with the surface form, startPosition is 0.
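The three attributes above can also be assembled programmatically. The following is a minimal Python sketch and not part of DoSeR itself: the helper name `make_surface_form` is hypothetical, and it assumes that the first occurrence of the label in the context is the mention to disambiguate. Python counts Unicode code points; whether the service counts characters the same way for text outside plain ASCII is not stated here.

```python
def make_surface_form(selected_text, context):
    """Build one surface form entry (hypothetical helper, not part of DoSeR).

    Assumes the first occurrence of selected_text in context is the mention.
    """
    start = context.find(selected_text)  # 0-based character offset, -1 if absent
    if start < 0:
        raise ValueError("selectedText does not occur in context")
    return {
        "selectedText": selected_text,
        "context": context,
        "startPosition": start,
    }


context = "Washington, D.C. is the capital of the United States."
form = make_surface_form("United States", context)
print(form["startPosition"])
```

If the same label occurs several times in the context, the offset of the intended occurrence has to be supplied by the caller instead of being searched for.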
In the following, we provide an example of two surface forms within a document:
{
  "documentUri":"https://en.wikipedia.org/wiki/Washington,_D.C.",
  "surfaceFormsToDisambiguate": [
    {
      "selectedText":"Washington, D.C.",
      "context":"Washington, D.C., formally the District of Columbia and commonly referred to as Washington, the District, or simply D.C., is the capital of the United States. The signing of the Residence Act on July 16, 1790, approved the creation of a capital district located along the Potomac River on the country's East Coast. The U.S. Constitution provided for a federal district under the exclusive jurisdiction of the Congress and the District is therefore not a part of any U.S. state.",
      "startPosition": 0
    },
    {
      "selectedText":"United States",
      "context":"Washington, D.C., formally the District of Columbia and commonly referred to as Washington, the District, or simply D.C., is the capital of the United States. The signing of the Residence Act on July 16, 1790, approved the creation of a capital district located along the Potomac River on the country's East Coast. The U.S. Constitution provided for a federal district under the exclusive jurisdiction of the Congress and the District is therefore not a part of any U.S. state.",
      "startPosition": 144
    }
  ]
}
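Before sending a query like the one above, it can help to check that every startPosition really points at its selectedText. The following Python sketch does this on the client side; the function name `check_request` is hypothetical, and the check reflects only the attribute descriptions on this page, not any validation the service itself performs.

```python
def check_request(request):
    """Return a list of problems found in a query dict (empty list if none)."""
    problems = []
    forms = request.get("surfaceFormsToDisambiguate") or []
    if not forms:
        problems.append("surfaceFormsToDisambiguate is missing or empty")
    for i, form in enumerate(forms):
        text = form.get("selectedText", "")
        context = form.get("context", "")
        start = form.get("startPosition", -1)
        if not text:
            problems.append(f"form {i}: selectedText is empty")
        elif start < 0 or context[start:start + len(text)] != text:
            # The slice of the context at startPosition must equal the label.
            problems.append(
                f"form {i}: startPosition {start} does not point at {text!r}"
            )
    return problems


request = {
    "documentUri": None,
    "surfaceFormsToDisambiguate": [
        {
            "selectedText": "Washington, D.C.",
            "context": "Washington, D.C. is the capital of the United States.",
            "startPosition": 0,
        }
    ],
}
print(check_request(request))
```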
We can send the JSON query to the REST interface with the following curl command:
curl -v -H "Accept: application/json" -H "Content-type: application/json" myserver/doser-dis-disambiguationserver/disambiguation/disambiguationWithoutCategories-collective -d @/home/foo/request.json
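The same request can be issued from a program. The sketch below uses only the Python standard library and mirrors the curl call above: a POST with a JSON body and the same two headers. The base URL `http://myserver` is a placeholder taken from the curl example (the scheme and any port are assumptions), and the function names `build_request` and `disambiguate` are hypothetical.

```python
import json
import urllib.request

# Path copied from the curl example above.
ENDPOINT_PATH = (
    "/doser-dis-disambiguationserver/disambiguation/"
    "disambiguationWithoutCategories-collective"
)


def build_request(base_url, payload):
    """Create the POST request for a query dict without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def disambiguate(base_url, payload, timeout=30):
    """Send the query and return the decoded JSON response."""
    request = build_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `disambiguate("http://myserver", query)`, where `query` is a dict with the attributes described above.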
Our service returns the disambiguation result in JSON format:
{
  "tasks":[
    {
      "disEntities":[
        {
          "entityUri":"http://dbpedia.org/resource/Washington,_D.C."
        }],
      "selectedText":"Washington, D.C."
    },
    {
      "disEntities":[
        {
          "entityUri":"http://dbpedia.org/resource/United_States"
        }],
      "selectedText":"United States"
    }],
  "documentUri":"https://en.wikipedia.org/wiki/Washington,_D.C."
}
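The output contains the following attributes:
- tasks: An array of results. In the example above, it holds one entry per surface form of the query, each with the selectedText of the surface form and its disEntities.
- disEntities: An array of the disambiguated entities
- documentUri: The documentUri that was specified in the JSON query
Each disambiguated entity contains the attribute "entityUri", which holds the URI of the respective entity.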
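To work with the result in a program, the response can be reduced to a mapping from surface form to entity URIs. The following Python sketch assumes the response has exactly the shape shown in the example above; the function name `entity_uris` is hypothetical.

```python
import json


def entity_uris(response):
    """Map each task's selectedText to the list of its entityUri values."""
    result = {}
    for task in response.get("tasks", []):
        uris = [entity["entityUri"] for entity in task.get("disEntities", [])]
        # Collect URIs per label; repeated labels share one list.
        result.setdefault(task["selectedText"], []).extend(uris)
    return result


raw = """
{
  "tasks": [
    {
      "disEntities": [
        {"entityUri": "http://dbpedia.org/resource/United_States"}
      ],
      "selectedText": "United States"
    }
  ],
  "documentUri": null
}
"""
print(entity_uris(json.loads(raw)))
```

Because the mapping is keyed by selectedText, two surface forms with the same label end up in one list. If they need to be told apart, iterate over `tasks` directly.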
We will extend the REST API description in the near future.