BASH Interface - VincTheSecond/rextractor GitHub Wiki

The RExtractor BASH interface offers several simple commands for working with the server and documents. The structure of the output is the same for all commands. If a command completes without errors, the application prints "[OK]" on the first line, followed by the requested data. If an error occurs, the application prints "[ERROR]" on the first line and an error description on the second line. The application exits with return value 0 on success and with value 1 if an error occurred.
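The output convention described above lends itself to a small wrapper. The following is a minimal sketch, not part of RExtractor itself: `rex_payload` is a hypothetical helper name, and it assumes the status line is exactly "[OK]" or "[ERROR]" with no surrounding whitespace.

```shell
# Hypothetical helper (not part of RExtractor): reads a command's output on
# stdin, checks the status line, and prints only the requested data.
rex_payload() {
    IFS= read -r status || :
    case "$status" in
        "[OK]")
            # Pass everything after the status line through unchanged.
            cat
            ;;
        "[ERROR]")
            # The second line carries the error description.
            IFS= read -r message || :
            printf 'rextractor error: %s\n' "$message" >&2
            return 1
            ;;
        *)
            printf 'unexpected status line: %s\n' "$status" >&2
            return 2
            ;;
    esac
}

# Usage (DOCUMENT_ID is a placeholder):
#   ./rextractor document-state DOCUMENT_ID | rex_payload
```

Reporting the error on STDERR keeps STDOUT clean for whatever consumes the data; a caller could equally rely on the exit status of `./rextractor` itself.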

Here is the list of available commands:

Server administration

Server start

./rextractor server-start
  • starts a daemon for each component, then it prints server status. It returns 0 if all daemons are running, otherwise 1.

Server stop

./rextractor server-stop
  • kills all daemons, then prints server status. It returns 0 if all daemons are running, otherwise 1.

Server status

./rextractor server-state
  • prints server status. It returns 0 if all daemons are running, otherwise 1.
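Because server-state and server-start both signal "all daemons running" through the exit status, a script can branch on it directly. The sketch below is illustrative only: `ensure_server` and the `REXTRACTOR` variable are hypothetical names, and the default path `./rextractor` assumes the script runs from the installation directory.

```shell
# Path to the CLI; override REXTRACTOR if it lives elsewhere (assumption).
REXTRACTOR=${REXTRACTOR:-./rextractor}

# Hypothetical helper: start the daemons only when server-state reports
# (via a non-zero exit status) that not all of them are running.
ensure_server() {
    if "$REXTRACTOR" server-state >/dev/null 2>&1; then
        return 0
    fi
    # The function's exit status is that of server-start.
    "$REXTRACTOR" server-start >/dev/null 2>&1
}

# Usage:
#   ensure_server || echo "RExtractor daemons failed to start" >&2
```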

Document handling

Document state

./rextractor document-state DOCUMENT_ID
  • returns the current state of the document. See [list of states](List of states).

Document submission

./rextractor document-submit FILE
  • submits a document to the system. A unique document ID is created from the filename without its suffix. Fails if the ID already exists.
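To know in advance which ID a submission will receive, the derivation can be approximated in the shell. This is a sketch under an assumption: it takes "suffix" to mean the last dot-separated extension only, which the text above does not spell out. The function name `doc_id_from_file` is hypothetical.

```shell
# Hypothetical helper: guess the document ID that document-submit would
# assign. Assumes "suffix" means only the last ".ext" part of the filename.
doc_id_from_file() {
    name=${1##*/}                 # drop any leading directories
    printf '%s\n' "${name%.*}"    # drop the last extension, if present
}

# Usage (the path is a made-up example):
#   id=$(doc_id_from_file /data/in/contract-42.html)
#   ./rextractor document-submit /data/in/contract-42.html
#   ./rextractor document-state "$id"
```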

Document deletion

./rextractor document-delete DOCUMENT_ID
  • deletes all data about the given document. Fails if the document is being processed by any component.

HTML presentation of processed documents

Annotations in HTML format

./rextractor content-html DOCUMENT_ID
  • prints the original document (an HTML document) with marked entities to STDOUT. Entities are marked with HTML tags. A unique identifier for each text chunk is given in the id attribute. All annotations also have the CSS class "chunk".

Chunk details

./rextractor content-chunks DOCUMENT_ID CHUNK_ID
  • returns details about the specified text chunk. For each entity that uses the given text chunk, it returns these data fields separated by \t:
    • identifier of the entity
    • original form of the entity from DBE
    • type of the entity from DBE
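The three tab-separated fields listed above can be read with awk. The following sketch assumes one entity per line and that the status line has already been removed (for example with `tail -n +2`); the function name `chunk_entities` is hypothetical.

```shell
# Hypothetical helper: reads content-chunks data lines on stdin and prints
# one labelled line per entity. Field order follows the list above:
# 1 = entity identifier, 2 = original form from DBE, 3 = entity type from DBE.
chunk_entities() {
    awk -F '\t' 'NF >= 3 { printf "id=%s type=%s form=%s\n", $1, $3, $2 }'
}

# Usage (DOCUMENT_ID and CHUNK_ID are placeholders):
#   ./rextractor content-chunks DOCUMENT_ID CHUNK_ID | tail -n +2 | chunk_entities
```

Using awk with an explicit tab separator, rather than a shell `read` loop with `IFS` set to a tab, keeps empty fields in place instead of collapsing them.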

List of relations

./rextractor content-relations DOCUMENT_ID
  • returns details about relations detected in the processed document. Relations are grouped by DBR query. The request returns an HTML document. For each DBR query, it returns the following lines:
    • Query ID
    • Query description
    • List of relations. Data fields separated by \t:
      • DBR Query ID
      • Entity ID for subject
      • Ontological concept for subject
      • Text (textual form of specified entity) for subject
      • Entity ID for predicate
      • Ontological concept for predicate
      • Text (textual form of specified entity) for predicate
      • Entity ID for object
      • Ontological concept for object
      • Text (textual form of specified entity) for object
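As an illustration of the ten-field relation record, the sketch below prints each relation as a readable subject, predicate, object triple. It assumes the relation lines have already been isolated as plain tab-separated text (how they are embedded in the returned document is not specified here), and `relation_triples` is a hypothetical name.

```shell
# Hypothetical helper: reads relation lines on stdin. Field order follows
# the list above: 1 = DBR query ID; 2-4 = subject (entity ID, concept, text);
# 5-7 = predicate; 8-10 = object.
relation_triples() {
    awk -F '\t' 'NF == 10 { printf "%s: %s -[%s]-> %s\n", $1, $4, $7, $10 }'
}
```

Lines that do not have exactly ten fields (such as the query ID and query description lines) are skipped by the `NF == 10` guard.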

Export of processed documents

Export of annotated document

This command is available only in the HTTP interface. The annotated document can be accessed directly through the file system. See the [description of directories](File structure) used in a RExtractor installation.

Export of annotation description

This command is available only in the HTTP interface. The annotated document can be accessed directly through the file system. See the [description of directories](File structure) used in a RExtractor installation.

List of documents

List of documents

./rextractor list-(submit|convert|nlp|entity|relation|export) FROM_DATE|NO TO_DATE|NO FROM_DATE_TIME|NO TO_DATE_TIME|NO
  • prints the list of documents with the specified status
  • FROM_DATE and TO_DATE must be in the format YYYY-MM-DD. FROM_DATE_TIME and TO_DATE_TIME must be in the format YYYY-MM-DDTHH:MM:SS. To leave a time constraint unspecified, use the special value "NO", which is valid for all time parameters.
  • Here is the list of available combinations of datetime parameters:
    • FROM_DATE -- returns documents exported from FROM_DATE, 00:00:00 to the present
    • FROM_DATE, TO_DATE -- returns documents exported from FROM_DATE, 00:00:00 to TO_DATE, 23:59:59
    • FROM_DATE_TIME -- returns documents exported from FROM_DATE_TIME to the present
    • FROM_DATE_TIME, TO_DATE_TIME -- returns documents exported from FROM_DATE_TIME to TO_DATE_TIME
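Since all four time parameters are positional, unused ones have to be filled with "NO". The sketch below builds the argument list and checks each value against the formats given above. The function names are hypothetical, and the check covers only the shape of the value (digits and separators), not whether the date actually exists.

```shell
# Accepts "NO" or a value shaped like YYYY-MM-DD.
is_date_or_no() {
    case "$1" in
        NO) return 0 ;;
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Accepts "NO" or a value shaped like YYYY-MM-DDTHH:MM:SS.
is_datetime_or_no() {
    case "$1" in
        NO) return 0 ;;
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints the four positional time arguments in order
# FROM_DATE TO_DATE FROM_DATE_TIME TO_DATE_TIME, defaulting each to "NO".
list_time_args() {
    from_date=${1:-NO}
    to_date=${2:-NO}
    from_dt=${3:-NO}
    to_dt=${4:-NO}
    is_date_or_no "$from_date" && is_date_or_no "$to_date" &&
        is_datetime_or_no "$from_dt" && is_datetime_or_no "$to_dt" || return 1
    printf '%s %s %s %s\n' "$from_date" "$to_date" "$from_dt" "$to_dt"
}

# Usage (the dates are made-up examples):
#   args=$(list_time_args 2015-01-01 2015-01-31) && ./rextractor list-export $args
```

Leaving `$args` unquoted in the usage line is deliberate: the validated values contain no whitespace, so word splitting yields exactly four arguments.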

Details about documents

./rextractor list-all START LIMIT
  • returns a list of documents submitted to the collection. Each document is described by these data fields (separated by \t):
    • document ID
    • submission time
    • document status
  • Parameter START specifies the offset of the first document to return. LIMIT specifies the maximum number of documents to return.
  • The output begins with information about the whole collection. Its data fields are separated by a tab character (\t) and give:
    • total number of submitted documents
    • start
    • limit
  • The remaining lines contain data about documents, as specified above. Each document is printed on a separate line.
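The two-part layout of the list-all output (one summary line, then one line per document) can be handled with a short awk script. This is a sketch that assumes the status line has already been removed, so that the summary is the first line on stdin; `list_all_report` is a hypothetical name.

```shell
# Hypothetical helper: reads list-all data on stdin.
# Line 1: total number of submitted documents, start, limit.
# Other lines: document ID, submission time, document status.
list_all_report() {
    awk -F '\t' '
        NR == 1 { printf "total=%s start=%s limit=%s\n", $1, $2, $3; next }
        NF >= 3 { printf "%s\t%s\n", $1, $3 }
    '
}

# Usage (first 10 documents; tail drops the "[OK]" status line):
#   ./rextractor list-all 0 10 | tail -n +2 | list_all_report
```

Comparing the reported total with START and LIMIT tells a caller whether another page needs to be requested.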