HTTP Interface - VincTheSecond/rextractor GitHub Wiki
The RExtractor HTTP interface allows users to submit new documents, query document states, and download exported data. The HTTP interface is designed primarily for applications running on other servers. If the application runs on the same server, consider using the BASH interface instead. The available commands and their formats are listed below:
This command is available only in BASH interface.
This command is available only in BASH interface.
./?command=server-state
- no GET and POST parameters required
- returns an HTML document with component statuses. If everything is OK, the server returns "[OK]" on the first line and the states of the components on the following lines. Lines are separated by a newline character ("\n"). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=server-state
- sample output:
[OK]
Conversion server is ON.
NLP server is ON.
Entity server is ON.
Relation server is ON.
Export server is ON.
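Most commands share the same reply envelope: a status marker on the first line, then payload lines. The following is a minimal sketch of a helper that splits such a reply, assuming the body has already been fetched and decoded to text. The name `parse_response` is hypothetical and not part of RExtractor.

```python
def parse_response(body: str) -> list[str]:
    """Split a reply into payload lines; raise if the server reported an error."""
    lines = body.strip("\n").split("\n")
    status = lines[0].strip()
    if status == "[OK]":
        return lines[1:]
    if status == "[ERROR]":
        # The error message is documented to follow on the second line.
        message = lines[1] if len(lines) > 1 else "unknown error"
        raise RuntimeError(message)
    raise ValueError(f"unexpected first line: {status!r}")


sample = "[OK]\nConversion server is ON.\nNLP server is ON.\n"
components = parse_response(sample)
```

Raising an exception on "[ERROR]" keeps callers from mistaking an error message for payload data.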
./?command=document-state
- GET parameters:
- doc_id -- document identifier
- returns an HTML document with the document status and submission time. If everything is OK, the server returns "[OK]" on the first line, the state of the document on the second line, and the submission time on the third line. Lines are separated by a newline character ("\n"). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=document-state&doc_id=test01
- sample output:
[OK]
720 Document exported successfully.
2014-10-16 15:35:57
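A client could turn the three-line reply into a structured value along these lines. This is a sketch under two assumptions drawn only from the sample output: the state line starts with a numeric code followed by a space, and the timestamp uses the format `YYYY-MM-DD HH:MM:SS`. The names `DocumentState` and `parse_document_state` are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class DocumentState(NamedTuple):
    code: int            # numeric state code, e.g. 720 (assumed from the sample)
    message: str         # human-readable state text
    submitted: datetime  # submission time


def parse_document_state(body: str) -> DocumentState:
    lines = body.strip().split("\n")
    if lines[0].strip() != "[OK]":
        raise RuntimeError(lines[1] if len(lines) > 1 else "unknown error")
    # Split "720 Document exported successfully." at the first space.
    code, _, message = lines[1].partition(" ")
    submitted = datetime.strptime(lines[2].strip(), "%Y-%m-%d %H:%M:%S")
    return DocumentState(int(code), message, submitted)


state = parse_document_state(
    "[OK]\n720 Document exported successfully.\n2014-10-16 15:35:57\n"
)
```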
./?command=document-submit
- POST parameters:
- doc_id -- document identifier
- doc_content -- document content
- doc_strategy -- extraction strategy to be applied on the document
- returns an HTML document with the result of the submission. If everything is OK, the server returns "[OK]" on the first line. Lines are separated by a newline character ("\n"). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=document-submit
doc_id: test01
doc_content: <p>Test sentence</p>
doc_strategy: strategy_01
- sample output:
[OK]
Submitted correctly.
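The submission could be prepared from Python roughly as follows. This is a sketch, not a confirmed client: `BASE_URL` is a placeholder for wherever the interface is installed, and it assumes the server accepts the POST parameters as a standard URL-encoded form body, which this page does not state explicitly.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost/rextractor/"  # hypothetical installation URL


def build_submit_request(doc_id, content, strategy, base_url=BASE_URL):
    """Build (but do not send) a document-submit POST request."""
    fields = {
        "doc_id": doc_id,
        "doc_content": content,
        "doc_strategy": strategy,
    }
    # Assumption: parameters are sent as application/x-www-form-urlencoded.
    data = urlencode(fields).encode("utf-8")
    return Request(base_url + "?command=document-submit", data=data, method="POST")


request = build_submit_request("test01", "<p>Test sentence</p>", "strategy_01")
# To actually send it:
#   from urllib.request import urlopen
#   with urlopen(request) as reply:
#       body = reply.read().decode("utf-8")
```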
./?command=document-delete
- GET parameters:
- doc_id -- document identifier
- returns an HTML document with the result of the operation. If everything is OK, the server returns "[OK]" on the first line. Lines are separated by a newline character ("\n"). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=document-delete&doc_id=test01
- sample output:
[OK]
Deleted.
./?command=content-html
- GET parameters:
- doc_id -- document identifier
- returns an HTML document containing the original document with marked entities. Entities are marked with `<span>` HTML tags. A unique identifier for each text chunk is given in the `id` attribute. All annotations also have the CSS class `chunk`. If everything is OK, the server returns `[OK]` on the first line and the input HTML document on the following lines. Lines are separated by a newline character ("\n"). In case of an error, the server returns `[ERROR]` on the first line and an error message on the second line.
- sample request:
./?command=content-html&doc_id=test01
- sample output:
[OK]
<p>Test <span id="1" class="chunk">sentence</span></p>
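To collect the marked text chunks from such a reply, a client might walk the HTML with the standard library parser. The sketch below assumes the "[OK]" line has already been removed; the class name `ChunkCollector` is hypothetical.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Collect the text of every <span class="chunk">, keyed by its id."""

    def __init__(self):
        super().__init__()
        self.chunks = {}
        # Stack of chunk ids for open <span> tags (None for non-chunk spans).
        self._open = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            attributes = dict(attrs)
            classes = (attributes.get("class") or "").split()
            chunk_id = attributes.get("id") if "chunk" in classes else None
            if chunk_id is not None:
                self.chunks[chunk_id] = ""
            self._open.append(chunk_id)

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text inside nested chunks is added to every enclosing chunk.
        for chunk_id in self._open:
            if chunk_id is not None:
                self.chunks[chunk_id] += data


collector = ChunkCollector()
collector.feed('<p>Test <span id="1" class="chunk">sentence</span></p>')
collector.close()
chunk_texts = collector.chunks
```

The collected ids can then be passed as `chunk_id` to the `content-chunks` command.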
./?command=content-chunks
- GET parameters:
- doc_id -- document identifier
- chunk_id -- text chunk identifier
- returns details about the specified text chunk. For each entity that uses the given text chunk, the server returns these data fields separated by `\t`:
- identifier of the entity
- original form of the entity from DBE
- type of the entity from DBE
- If everything is OK, the server returns `[OK]` on the first line and entity details on the following lines. Lines are separated by a newline character ("\n"). In case of an error, the server returns `[ERROR]` on the first line and an error message on the second line.
- sample request:
./?command=content-chunks&doc_id=test01&chunk_id=1
- sample output:
[OK]
123 sentence piece of text
./?command=content-relations
- GET parameters:
- doc_id -- document identifier
- returns details about the relations detected in the processed document. Relations are grouped by DBR query. The request returns an HTML document. For each DBR query, the server returns the following lines:
<h4>Query ID</h4>
<i>Query description</i>
- List of relations. Data fields are separated by `\t`:
- DBR Query ID
- Entity ID for subject
- Ontological concept for subject
- Text (textual form of specified entity) for subject
- Entity ID for predicate
- Ontological concept for predicate
- Text (textual form of specified entity) for predicate
- Entity ID for object
- Ontological concept for object
- Text (textual form of specified entity) for object
- If everything is OK, the server returns `[OK]` on the first line and relation details on the following lines. Lines are separated by a newline character ("\n"). In case of an error, the server returns `[ERROR]` on the first line and an error message on the second line.
- sample request:
./?command=content-relations&doc_id=test01
- sample output:
[OK]
<h4>Relation #05</h4>
<i>Definitions</i>
05 81 Defined entity expense 561 hasDefinition is 1297 Definition an outflow of money
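The ten tab-separated fields of each relation line can be grouped into subject, predicate, and object triples. The sketch below assumes the payload lines after "[OK]" are available as a list, and that any line starting with `<` is one of the `<h4>` or `<i>` header lines. The tab positions in the example line are a reconstruction of the sample above, since the tabs are not visible on this page. `Relation` and the two functions are hypothetical names.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    query_id: str
    subject: tuple    # (entity ID, ontological concept, text)
    predicate: tuple  # (entity ID, ontological concept, text)
    obj: tuple        # (entity ID, ontological concept, text)


def parse_relation_line(line: str) -> Relation:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return Relation(
        fields[0], tuple(fields[1:4]), tuple(fields[4:7]), tuple(fields[7:10])
    )


def parse_relations(payload_lines):
    """Parse relation lines, skipping blank lines and HTML header lines."""
    return [
        parse_relation_line(line)
        for line in payload_lines
        if line.strip() and not line.lstrip().startswith("<")
    ]


# Assumed tab layout of the sample line shown above.
line = "05\t81\tDefined entity\texpense\t561\thasDefinition\tis\t1297\tDefinition\tan outflow of money"
relations = parse_relations(["<h4>Relation #05</h4>", "<i>Definitions</i>", line])
```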
./?command=export-document
- GET parameters:
- doc_id -- document identifier
- returns an HTML document with annotated entities. If everything is OK, the server returns the original HTML document with the new text chunks annotated with `<annotation>` tags. Lines are separated by a newline character ("\n"). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=export-document&doc_id=test01
- sample output:
<p>Test <annotation id="1">sentence</annotation></p>
./?command=export-description
- GET parameters:
- doc_id -- document identifier
- returns an XML document with a description of the annotated entities and relations. If everything is OK, the server returns the XML document. Lines are separated by a newline character ("\n"). In case of an error, the server returns an HTML document with the text "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=export-description&doc_id=test01
- sample output:
<?xml version="1.0" encoding="utf-8"?>
<document>
<metadata/>
<entities/>
<relations/>
</document>
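Because this command returns XML on success but an "[ERROR]" text reply on failure, a client has to check which of the two it received before parsing. A minimal sketch, assuming the raw reply bytes are at hand; `parse_description` is a hypothetical name, and only the element names visible in the sample output are used.

```python
import xml.etree.ElementTree as ET


def parse_description(body: bytes) -> ET.Element:
    """Return the root element, or raise if the reply is an error message."""
    text = body.lstrip()
    if text.startswith(b"[ERROR]"):
        lines = text.decode("utf-8", errors="replace").split("\n")
        raise RuntimeError(lines[1] if len(lines) > 1 else "unknown error")
    return ET.fromstring(text)


sample = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<document>\n<metadata/>\n<entities/>\n<relations/>\n</document>\n"
)
root = parse_description(sample)
section_names = [child.tag for child in root]
```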
./?command=list-(submit|convert|nlp|entity|relation|export)
- GET parameters (optional):
- fromDate -- date in format YYYY-MM-DD
- fromDateTime -- date and time in format YYYY-MM-DDTHH:MM:SS
- toDate -- date in format YYYY-MM-DD
- toDateTime -- date and time in format YYYY-MM-DDTHH:MM:SS
- returns an HTML document with the list of documents that have the status specified in the command. If everything is OK, the server returns an HTML document with document IDs. Each document ID is printed on a separate line. Lines are separated by a newline character ("\n"). In case of an error, the server returns an HTML document with the text "[ERROR]" on the first line and an error message on the second line.
The date and time parameters can be used to filter for documents that were exported in a specified time interval. The available combinations of date and time parameters are:
- fromDate -- returns documents exported from fromDate, 00:00:00 to the present
- fromDate, toDate -- returns documents exported from fromDate, 00:00:00 to toDate, 23:59:59
- fromDateTime -- returns documents exported from fromDateTime to the present
- fromDateTime, toDateTime -- returns documents exported from fromDateTime to toDateTime
- sample request:
./?command=list-export&fromDate=2014-09-01
- sample output:
test01
test02
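Since only certain parameter combinations are listed, a client may want to validate them before sending a request. The sketch below builds the request URL and rejects combinations not listed above. `BASE_URL` and `build_list_url` are hypothetical, and treating a request with no date parameters as valid is an assumption based on the parameters being optional.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost/rextractor/"  # hypothetical installation URL

STATES = {"submit", "convert", "nlp", "entity", "relation", "export"}

# Parameter combinations listed in the documentation.
ALLOWED = [
    {"fromDate"},
    {"fromDate", "toDate"},
    {"fromDateTime"},
    {"fromDateTime", "toDateTime"},
]


def build_list_url(state, base_url=BASE_URL, **params):
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    # Assumption: calling the command without any date filter is allowed.
    if params and set(params) not in ALLOWED:
        raise ValueError(f"unsupported parameter combination: {sorted(params)}")
    query = {"command": f"list-{state}", **params}
    return base_url + "?" + urlencode(query)


url = build_list_url("export", fromDate="2014-09-01")
```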
./?command=list-all
- GET parameters (optional):
- start -- specifies the offset of the first document to return
- limit -- specifies the maximum number of documents to return
- order by -- specifies the key attribute for the sorting. Available values:
- ctime -- timestamp of the document submission
- status -- state of the processing of the document
- id -- document identifier
- order dir -- specifies the direction of the ordering. Possible directions
- returns an HTML document with the list of documents ordered by the given attribute and direction. Each document is described by these data fields:
- document ID
- submission time
- document status
- If everything is OK, the server returns an HTML document with the text "[OK]" on the first line. The second line contains information about the whole collection. Its data fields are separated by a tab character (`\t`) and express:
- total number of submitted documents
- start
- limit
- The remaining lines contain data about documents, as specified above. Each document is printed on a separate line. Lines are separated by a newline character ("\n"). In case of an error, the server returns an HTML document with the text "[ERROR]" on the first line and an error message on the second line.
- sample request:
./?command=list-all&start=0&limit=2
- sample output:
[OK]
16 0 10
test01 2014-10-27 13:42:43 720 Document exported successfully.
test05 2014-10-22 14:56:43 720 Document exported successfully.
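The listing can be split into a collection summary and per-document records. This sketch assumes the document fields are tab-separated in the documented order (ID, submission time, status); that detail is not visible in the sample above, so the tabs in the example input are a reconstruction. `parse_list_all` is a hypothetical name.

```python
def parse_list_all(body: str):
    """Return (summary, documents) parsed from a list-all reply."""
    lines = body.strip("\n").split("\n")
    if lines[0].strip() != "[OK]":
        raise RuntimeError(lines[1] if len(lines) > 1 else "unknown error")
    # Second line: total number of documents, start, limit.
    total, start, limit = (int(value) for value in lines[1].split("\t"))
    summary = {"total": total, "start": start, "limit": limit}
    documents = []
    for line in lines[2:]:
        if not line.strip():
            continue
        # Assumption: ID, submission time, and status are tab-separated.
        doc_id, submitted, status = line.split("\t", 2)
        documents.append({"id": doc_id, "submitted": submitted, "status": status})
    return summary, documents


reply = (
    "[OK]\n"
    "16\t0\t10\n"
    "test01\t2014-10-27 13:42:43\t720 Document exported successfully.\n"
    "test05\t2014-10-22 14:56:43\t720 Document exported successfully.\n"
)
summary, documents = parse_list_all(reply)
```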