HTTP Interface - VincTheSecond/rextractor GitHub Wiki

The RExtractor HTTP interface allows users to submit new documents, query document states, and download exported data. The HTTP interface is designed primarily for applications that run on other servers. If the application runs on the same server, consider using the BASH interface instead. The available commands and their formats are listed below:

Server administration

Server start

This command is available only in the BASH interface.

Server stop

This command is available only in the BASH interface.

Server state

./?command=server-state
  • no GET and POST parameters required
  • returns an HTML document with component statuses. If everything is OK, the server returns "[OK]" on the first line and the states of the components on the following lines. Lines are separated by a newline character ('\n'). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=server-state
  • sample output:
[OK] 
Conversion server is ON.
NLP server is ON. 
Entity server is ON. 
Relation server is ON. 
Export server is ON.
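
The response format described above can be parsed with a few lines of code. The following is a minimal sketch, not part of RExtractor itself: the function name `parse_server_state` is hypothetical, and the code assumes that every component line has the form "<component> is <STATE>." as in the sample output.

```python
def parse_server_state(body: str) -> dict:
    """Parse a server-state response into {component name: True if ON}."""
    lines = [line.strip() for line in body.split("\n") if line.strip()]
    if not lines:
        raise ValueError("empty response")
    if lines[0] == "[ERROR]":
        # The error message follows on the second line.
        raise RuntimeError(lines[1] if len(lines) > 1 else "unknown error")
    if lines[0] != "[OK]":
        raise ValueError(f"unexpected first line: {lines[0]!r}")

    states = {}
    for line in lines[1:]:
        # Assumption: lines look like "NLP server is ON." (see the sample output).
        name, _, state = line.rstrip(".").rpartition(" is ")
        states[name] = state == "ON"
    return states


sample = (
    "[OK] \n"
    "Conversion server is ON.\n"
    "NLP server is ON. \n"
    "Entity server is ON. \n"
    "Relation server is ON. \n"
    "Export server is ON.\n"
)
states = parse_server_state(sample)
print(all(states.values()))
```

A caller could use the returned dictionary to wait until all components report ON before submitting documents.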

Document handling

Document state

./?command=document-state
  • GET parameters:
    • doc_id -- document identifier
  • returns an HTML document with the document status and submission time. If everything is OK, the server returns "[OK]" on the first line, the state of the document on the second line, and the submission time on the third line. Lines are separated by a newline character ('\n'). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=document-state&doc_id=test01
  • sample output:
[OK] 
720 Document exported successfully. 
2014-10-16 15:35:57 
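
A client will usually want the three response lines as structured values. The sketch below is an illustration only: the names `DocumentState` and `parse_document_state` are hypothetical, and it assumes that the state line starts with a numeric code followed by a message and that the timestamp uses the "YYYY-MM-DD HH:MM:SS" layout, both inferred from the sample output.

```python
from datetime import datetime
from typing import NamedTuple


class DocumentState(NamedTuple):
    code: int            # numeric state code, e.g. 720
    message: str         # human-readable state description
    submitted: datetime  # submission time


def parse_document_state(body: str) -> DocumentState:
    lines = [line.strip() for line in body.split("\n")]
    if lines[0] == "[ERROR]":
        raise RuntimeError(lines[1] if len(lines) > 1 else "unknown error")
    if lines[0] != "[OK]" or len(lines) < 3:
        raise ValueError("unexpected response format")

    # Assumption: "<code> <message>", as in "720 Document exported successfully."
    code, _, message = lines[1].partition(" ")
    # Assumption: timestamp layout taken from the sample output.
    submitted = datetime.strptime(lines[2], "%Y-%m-%d %H:%M:%S")
    return DocumentState(int(code), message, submitted)


sample = "[OK] \n720 Document exported successfully. \n2014-10-16 15:35:57 \n"
state = parse_document_state(sample)
print(state.code, state.submitted.isoformat())
```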

Document submission

./?command=document-submit
  • POST parameters:
    • doc_id -- document identifier
    • doc_content -- document content
    • doc_strategy -- extraction strategy to be applied on the document
  • returns an HTML document with the result of the submission. If everything is OK, the server returns "[OK]" on the first line. Lines are separated by a newline character ('\n'). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=document-submit
doc_id: test01
doc_content: <p>Test sentence</p>
doc_strategy: strategy_01
  • sample output:
[OK]
Submitted correctly.
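
A submission could be prepared with Python's standard library as sketched below. Several details are assumptions rather than documented behaviour: the base URL `http://localhost/rextractor/` is a placeholder for wherever the interface is installed, the helper name `build_submit_request` is hypothetical, and the code assumes that the server accepts the POST parameters as ordinary URL-encoded form data.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical location of the HTTP interface; replace with the real one.
BASE_URL = "http://localhost/rextractor/"


def build_submit_request(doc_id, doc_content, doc_strategy, base_url=BASE_URL):
    """Build (but do not send) a POST request for the document-submit command."""
    # Assumption: parameters are sent as application/x-www-form-urlencoded data.
    form = urlencode({
        "doc_id": doc_id,
        "doc_content": doc_content,
        "doc_strategy": doc_strategy,
    }).encode("utf-8")
    return Request(f"{base_url}?command=document-submit", data=form, method="POST")


request = build_submit_request("test01", "<p>Test sentence</p>", "strategy_01")
print(request.get_method(), request.full_url)

# Sending the request requires a running server:
#   from urllib.request import urlopen
#   body = urlopen(request).read().decode("utf-8")
#   ok = body.split("\n")[0].strip() == "[OK]"
```

Building the request separately from sending it keeps the example runnable without a server and makes the encoded payload easy to inspect.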

Document deletion

./?command=document-delete
  • GET parameters:
    • doc_id -- document identifier
  • returns an HTML document with the result of the operation. If everything is OK, the server returns "[OK]" on the first line. Lines are separated by a newline character ('\n'). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=document-delete&doc_id=test01
  • sample output:
[OK] 
Deleted.

HTML presentation of processed documents

Annotations in HTML format

./?command=content-html
  • GET parameters:
    • doc_id -- document identifier
  • returns an HTML document containing the original document with marked entities. Entities are marked with <span> HTML tags. A unique identifier for each text chunk is given in the id attribute. All annotations also have the CSS class chunk. If everything is OK, the server returns [OK] on the first line and the input HTML document on the following lines. Lines are separated by a newline character ('\n'). In case of an error, the server returns [ERROR] on the first line and an error message on the second line.
  • sample request:
./?command=content-html&doc_id=test01
  • sample output:
[OK] 
<p>Test <span id="1" class="chunk">sentence</span></p>
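
The marked chunks can be collected from the returned HTML with a small parser. This is a sketch using Python's standard `html.parser` module. The names `ChunkCollector` and `extract_chunks` are hypothetical, and the code relies only on what the description states: chunks are <span> elements with the CSS class chunk and an id attribute.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Collect the text of every <span class="chunk" id="..."> element."""

    def __init__(self):
        super().__init__()
        self.chunks = {}  # chunk id -> text inside the span
        self._open = []   # one entry per open <span>: chunk id or None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        chunk_id = attributes.get("id") if "chunk" in classes else None
        self._open.append(chunk_id)
        if chunk_id is not None:
            self.chunks.setdefault(chunk_id, "")

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every chunk span that is currently open.
        for chunk_id in self._open:
            if chunk_id is not None:
                self.chunks[chunk_id] += data


def extract_chunks(body: str) -> dict:
    status, _, html = body.partition("\n")
    if status.strip() != "[OK]":
        raise RuntimeError(html.strip() or "unexpected response")
    collector = ChunkCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


sample = '[OK] \n<p>Test <span id="1" class="chunk">sentence</span></p>\n'
chunks = extract_chunks(sample)
print(chunks)
```

The collected chunk ids are the values expected by the chunk_id parameter of the content-chunks command.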

Chunk details

./?command=content-chunks
  • GET parameters:
    • doc_id -- document identifier
    • chunk_id -- text chunk identifier
  • returns details about the specified text chunk. For each entity that uses the given text chunk, the server returns these data fields separated by \t:
    • identifier of the entity
    • original form of the entity from DBE
    • type of the entity from DBE
  • If everything is OK, the server returns [OK] on the first line and entity details on the following lines. Lines are separated by a newline character ('\n'). In case of an error, the server returns [ERROR] on the first line and an error message on the second line.
  • sample request:
./?command=content-chunks&doc_id=test01&chunk_id=1
  • sample output:
[OK] 
123	sentence	piece of text
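
Each entity line can be split on the tab character into the three documented fields. The following sketch uses hypothetical names (`ChunkEntity`, `parse_chunk_details`) and assumes that every data line contains exactly three tab-separated fields.

```python
from typing import NamedTuple


class ChunkEntity(NamedTuple):
    entity_id: str      # identifier of the entity
    original_form: str  # original form of the entity from DBE
    entity_type: str    # type of the entity from DBE


def parse_chunk_details(body: str) -> list:
    lines = body.split("\n")
    status = lines[0].strip()
    if status == "[ERROR]":
        raise RuntimeError(lines[1].strip() if len(lines) > 1 else "unknown error")
    if status != "[OK]":
        raise ValueError(f"unexpected first line: {status!r}")

    entities = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip empty trailing lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
        entities.append(ChunkEntity(*fields))
    return entities


sample = "[OK] \n123\tsentence\tpiece of text\n"
entities = parse_chunk_details(sample)
print(entities[0].entity_type)
```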

List of relations

./?command=content-relations
  • GET parameters:
    • doc_id -- document identifier
  • returns details about relations detected in the processed document. Relations are grouped by DBR query. The request returns an HTML document. For each DBR query, the server returns the following lines:
    • <h4>Query ID</h4>
    • <i>Query description</i>
    • List of relations. Data fields separated by \t:
      • DBR Query ID
      • Entity ID for subject
      • Ontological concept for subject
      • Text (textual form of specified entity) for subject
      • Entity ID for predicate
      • Ontological concept for predicate
      • Text (textual form of specified entity) for predicate
      • Entity ID for object
      • Ontological concept for object
      • Text (textual form of specified entity) for object
  • If everything is OK, the server returns [OK] on the first line and relation details on the following lines. Lines are separated by a newline character ('\n'). In case of an error, the server returns [ERROR] on the first line and an error message on the second line.
  • sample request:
./?command=content-relations&doc_id=test01
  • sample output:
[OK] 
<h4>Relation #05</h4>
<i>Definitions</i>
05	81	Defined entity	expense	561	hasDefinition	is	1297	Definition	an outflow of money
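
The grouped output can be turned into structured records by treating the <h4> and <i> lines as group headers and splitting every other line into the ten documented fields. This is only a sketch: the names `Triple`, `Relation`, and `parse_relations` are hypothetical, and the code assumes that each header element sits on its own line, as in the sample output.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    entity_id: str
    concept: str  # ontological concept
    text: str     # textual form of the entity


class Relation(NamedTuple):
    query_id: str
    subject: Triple
    predicate: Triple
    obj: Triple


def parse_relations(body: str) -> list:
    lines = body.split("\n")
    status = lines[0].strip()
    if status == "[ERROR]":
        raise RuntimeError(lines[1].strip() if len(lines) > 1 else "unknown error")
    if status != "[OK]":
        raise ValueError(f"unexpected first line: {status!r}")

    groups, current = [], None
    for line in lines[1:]:
        if not line.strip():
            continue
        heading = re.fullmatch(r"\s*<h4>(.*)</h4>\s*", line)
        description = re.fullmatch(r"\s*<i>(.*)</i>\s*", line)
        if heading:
            current = {"title": heading.group(1), "description": "", "relations": []}
            groups.append(current)
        elif description and current is not None:
            current["description"] = description.group(1)
        else:
            fields = [field.strip() for field in line.split("\t")]
            if current is None or len(fields) != 10:
                raise ValueError(f"unexpected line: {line!r}")
            current["relations"].append(Relation(
                fields[0], Triple(*fields[1:4]), Triple(*fields[4:7]), Triple(*fields[7:10]),
            ))
    return groups


sample = (
    "[OK] \n<h4>Relation #05</h4>\n<i>Definitions</i>\n"
    "05\t81\tDefined entity\texpense\t561\thasDefinition\tis\t1297\tDefinition\tan outflow of money\n"
)
groups = parse_relations(sample)
print(groups[0]["relations"][0].obj.text)
```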

Export of processed documents

Export of annotated document

./?command=export-document
  • GET parameters:
    • doc_id -- document identifier
  • returns an HTML document with annotated entities. If everything is OK, the server returns the original HTML document with new text chunks annotated with <annotation> tags. Lines are separated by a newline character ('\n'). In case of an error, the server returns "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=export-document&doc_id=test01
  • sample output:
<p>Test <annotation id="1">sentence</annotation></p>

Export of annotation description

./?command=export-description
  • GET parameters:
    • doc_id -- document identifier
  • returns an XML document with a description of annotated entities and relations. If everything is OK, the server returns an XML document. Lines are separated by a newline character ('\n'). In case of an error, the server returns an HTML document with the text "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=export-description&doc_id=test01
  • sample output:
<?xml version="1.0" encoding="utf-8"?>
<document>
      <metadata/>
      <entities/>
      <relations/>
</document>
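
Because a successful response is XML while an error response starts with "[ERROR]", a client has to check which of the two it received before parsing. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `parse_description` is hypothetical, and since the inner structure of the metadata, entities, and relations elements is not described here, the code only returns the top-level sections.

```python
import xml.etree.ElementTree as ET


def parse_description(body: str) -> dict:
    """Return the top-level sections of an export-description response."""
    text = body.lstrip()
    if text.startswith("[ERROR]"):
        lines = text.split("\n")
        raise RuntimeError(lines[1].strip() if len(lines) > 1 else "unknown error")

    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(text.encode("utf-8"))
    if root.tag != "document":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    # Map section name -> element; the inner structure is not interpreted here.
    return {child.tag: child for child in root}


sample = """<?xml version="1.0" encoding="utf-8"?>
<document>
      <metadata/>
      <entities/>
      <relations/>
</document>
"""
sections = parse_description(sample)
print(sorted(sections))
```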

Lists of documents

List of documents

./?command=list-(submit|convert|nlp|entity|relation|export)
  • GET parameters (optional):
    • fromDate -- date in format YYYY-MM-DD
    • fromDateTime -- date and time in format YYYY-MM-DDTHH:MM:SS
    • toDate -- date in format YYYY-MM-DD
    • toDateTime -- date and time in format YYYY-MM-DDTHH:MM:SS
  • returns an HTML document with a list of documents that have the status specified in the command. If everything is OK, the server returns an HTML document with document ids, one per line. Lines are separated by a newline character ('\n'). In case of an error, the server returns an HTML document with the text "[ERROR]" on the first line and an error message on the second line.
  • The date and time parameters can be used to restrict the list to documents that were exported in a specified time interval. The available combinations of datetime parameters are:
    • fromDate -- returns documents exported from fromDate, 00:00:00 to the present
    • fromDate, toDate -- returns documents exported from fromDate, 00:00:00 to toDate, 23:59:59
    • fromDateTime -- returns documents exported from fromDateTime to the present
    • fromDateTime, toDateTime -- returns documents exported from fromDateTime to toDateTime
  • sample request:
./?command=list-export&fromDate=2014-09-01
  • sample output:
test01
test02
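
Since only certain parameter combinations are listed as available, a client may want to check them before sending a request. The sketch below builds the relative query string for a list command. The helper name `build_list_query` is hypothetical, and treating a request without any date parameter as valid is an assumption based on the parameters being marked optional.

```python
from datetime import datetime
from urllib.parse import urlencode

STATUSES = ("submit", "convert", "nlp", "entity", "relation", "export")

# Combinations listed in the text, plus "no filter" (assumed to be valid
# because the parameters are optional).
ALLOWED_COMBINATIONS = {
    frozenset(),
    frozenset({"fromDate"}),
    frozenset({"fromDate", "toDate"}),
    frozenset({"fromDateTime"}),
    frozenset({"fromDateTime", "toDateTime"}),
}

FORMATS = {
    "fromDate": "%Y-%m-%d",
    "toDate": "%Y-%m-%d",
    "fromDateTime": "%Y-%m-%dT%H:%M:%S",
    "toDateTime": "%Y-%m-%dT%H:%M:%S",
}


def build_list_query(status: str, **filters: str) -> str:
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if frozenset(filters) not in ALLOWED_COMBINATIONS:
        raise ValueError(f"unsupported parameter combination: {sorted(filters)}")
    for name, value in filters.items():
        # strptime raises ValueError if the value does not match the format.
        datetime.strptime(value, FORMATS[name])
    return "./?" + urlencode({"command": f"list-{status}", **filters})


query = build_list_query("export", fromDate="2014-09-01")
print(query)
```

Note that `urlencode` percent-encodes the colons in datetime values, which is the standard encoding for query strings.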

Details about documents

./?command=list-all
  • GET parameters (optional):
    • start -- specifies the offset of the first document to return
    • limit -- specifies the maximum number of documents to return
    • order by -- specifies the key attribute for the sorting. Available values:
      • ctime -- timestamp of the document submission
      • status -- state of the processing of the document
      • id -- document identifier
    • order dir -- specifies the direction of the ordering. Possible directions
  • returns an HTML document with a list of documents ordered by the given attribute and direction. Each document is described by these data fields:
    • document ID
    • submission time
    • document status
  • If everything is OK, the server returns an HTML document with the text "[OK]" on the first line. The second line contains information about the whole collection. Its data fields are separated by a tab character (\t) and express:
    • total number of submitted documents
    • start
    • limit
  • The remaining lines contain data about documents, as specified above. Each document is printed on a separate line. Lines are separated by a newline character ('\n'). In case of an error, the server returns an HTML document with the text "[ERROR]" on the first line and an error message on the second line.
  • sample request:
./?command=list-all&start=0&limit=2
  • sample output:
[OK]
16	0	2
test01	2014-10-27 13:42:43	720 Document exported successfully.
test05	2014-10-22 14:56:43	720 Document exported successfully.
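
The paged listing can be parsed into a summary and a list of rows as sketched below. The names `DocumentRow`, `Listing`, and `parse_list_all` are hypothetical. The code assumes that the fields of each document line are tab-separated in the documented order (document ID, submission time, document status), and its sample data is modeled on the output above.

```python
from typing import NamedTuple


class DocumentRow(NamedTuple):
    doc_id: str
    submitted: str  # submission time, kept as text
    status: str     # e.g. "720 Document exported successfully."


class Listing(NamedTuple):
    total: int  # total number of submitted documents
    start: int  # offset of the first returned document
    limit: int  # maximum number of returned documents
    documents: list


def parse_list_all(body: str) -> Listing:
    lines = [line for line in body.split("\n") if line.strip()]
    if not lines:
        raise ValueError("empty response")
    status = lines[0].strip()
    if status == "[ERROR]":
        raise RuntimeError(lines[1].strip() if len(lines) > 1 else "unknown error")
    if status != "[OK]" or len(lines) < 2:
        raise ValueError("unexpected response format")

    # Second line: collection summary, tab-separated.
    total, start, limit = (int(value) for value in lines[1].split("\t"))
    documents = []
    for line in lines[2:]:
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(f"unexpected document line: {line!r}")
        documents.append(DocumentRow(*fields))
    return Listing(total, start, limit, documents)


sample = (
    "[OK]\n"
    "16\t0\t2\n"
    "test01\t2014-10-27 13:42:43\t720 Document exported successfully.\n"
    "test05\t2014-10-22 14:56:43\t720 Document exported successfully.\n"
)
listing = parse_list_all(sample)
print(listing.total, [row.doc_id for row in listing.documents])
```

A client can page through the whole collection by increasing start by limit until start reaches the reported total.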
āš ļø **GitHub.com Fallback** āš ļø