API Creating a concordance - czcorpus/kontext GitHub Wiki

HTTP API / Creating a concordance

Both simple and advanced query types are accepted, but API clients are strongly advised to use the advanced variant because the query is much easier to encode. The simple variant is designed to be convenient for web interface users, and that convenience comes at the cost of a complex internal structure and evaluation of the query.

Request

  • URL: /query_submit?format=json
  • HTTP Method: POST
  • content type: application/json

Body (all properties)

{
  "type": "concQueryArgs",
  "maincorp": "syn2020",
  "usesubcorp": null,
  "viewmode": "kwic",
  "pagesize": 40,
  "attrs": ["word","tag"],
  "attr_vmode": "visible-kwic",
  "base_viewattr": "word",
  "ctxattrs": [],
  "structs": ["text","p","g"],
  "refs": ["%3Ddoc.author"],
  "fromp": 0,
  "shuffle": 0,
  "queries": [
    {
      "qtype": "advanced",
      "corpname": "syn2020",
      "query": "[word=\"celou\"] [lemma=\"pravda\"]",
      "pcq_pos_neg": "pos",
      "include_empty": false,
      "default_attr": "word"
    }
  ],
  "text_types": {
    "doc.txtype_group": ["FIC: beletrie", "NMG: publicistika"]
  },
  "context": {
    "fc_lemword_wsize": [-5, 5],
    "fc_lemword": "",
    "fc_lemword_type": "all",
    "fc_pos_wsize": [-5, 5],
    "fc_pos": [],
    "fc_pos_type": "all"
  },
  "async": false
}

Body (minimal version)

{
  "type": "concQueryArgs",
  "queries": [
    {
      "qtype": "advanced",
      "corpname": "syn2020",
      "query": "[word=\"celou\"] [lemma=\"pravda\"]"
    }
  ]
}
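
The minimal body can be sent from Python roughly as follows. This is only a sketch using the standard library: the base URL `https://kontext.example.org` is a placeholder rather than a real deployment, the helper names `build_query_request` and `submit_query` are illustrative, and any authentication a real instance may require is left out.

```python
import json
import urllib.request

# Placeholder base URL (an assumption); replace it with your KonText instance.
BASE_URL = "https://kontext.example.org"


def build_query_request(corpname, cql, base_url=BASE_URL):
    """Prepare (but do not send) a POST request for /query_submit."""
    body = {
        "type": "concQueryArgs",
        "queries": [
            {"qtype": "advanced", "corpname": corpname, "query": cql}
        ],
    }
    return urllib.request.Request(
        url=base_url + "/query_submit?format=json",
        # json.dumps escapes the double quotes inside the CQL query.
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_query(request):
    """Send the prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no network access; submit_query(req) sends it.
req = build_query_request("syn2020", '[word="celou"] [lemma="pravda"]')
```

Keeping request construction separate from sending makes it easy to inspect the encoded body before anything reaches the server.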

Request body arguments

(For more technical insight, see interface ConcQueryArgs in public/files/js/models/query/common.ts)

| name | required / default | type | description |
|---|---|---|---|
| type | ✳️ | `concQueryArgs` | this is always a constant identifying the form type |
| usesubcorp | null | `string\|null` | null or name of user's subcorpus |
| viewmode | kwic | `kwic\|sen\|align\|null` | (align applies only for parallel corpora) |
| pagesize | 40 | `number\|null` | a positive number specifying size of the resulting page |
| attrs | word | `Array<string>\|null` | a list of KWIC's positional attributes to be retrieved |
| ctxattrs | word | `Array<string>\|null` | a list of non-KWIC positional attributes to be retrieved |
| attr_vmode | visible-kwic | `visible-all\|visible-kwic\|visible-multiline\|mouseover\|null` | this is useful mostly for GUI clients |
| base_viewattr | word | `string\|null` | the main attribute the flow of text will be based on |
| structs | null | `Array<string>\|null` | a list (possibly empty) of structural attributes to be shown |
| refs | null | `Array<string>\|null` | a list (possibly empty) of additional metadata attached to each row. Please note that, for historical reasons, the values must have the = prefix, which is encoded in URLs as %3D. For example, adding doc.author requires writing %3Ddoc.author (see the example JSON body above) |
| fromp | 0 | `number\|null` | a number specifying a starting page |
| shuffle | 0 | `0\|1` | if 1, the lines will be shuffled (this negatively affects performance) |
| queries | ✳️ | `Array<{}>\|null` | a list of objects, one per active corpus (normally 1 item; more than 1 for aligned corpora) |
| queries[].qtype | ✳️ | `simple\|advanced` | advanced (strongly advised, see the introduction of this section) |
| queries[].corpname | ✳️ | `string` | a corpus identifier |
| queries[].query | ✳️ | `string` | a JSON-encoded CQL query (e.g. `[word=\"their\"] [lemma=\"truth\"]`) |
| queries[].pcq_pos_neg | pos | `pos\|neg` | applies to aligned corpora queries |
| queries[].include_empty | false | `true\|false` | |
| queries[].default_attr | word | `string\|null` | a positional attribute applied to simplified CQL expressions (e.g. with default attribute word one can write "foo" instead of [word="foo"]) |
| text_types | {} | `{}\|null` | an object to restrict search to specified structural attributes and their respective values |
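
Two of the arguments above need some care when encoded: `refs` values carry a percent-encoded `=` prefix, and the CQL in `queries[].query` contains double quotes that must be escaped in JSON. The sketch below shows one way to handle both in Python. The helpers `encode_ref` and `cql_token` are hypothetical names, not part of KonText, and `cql_token` assumes the value itself contains no double quotes.

```python
import json
from urllib.parse import quote


def encode_ref(name):
    """Prefix a metadata attribute with '=' and percent-encode it."""
    # safe="" forces '=' to be encoded as %3D.
    return quote("=" + name, safe="")


def cql_token(attr, value):
    """Build one CQL position; assumes value contains no double quotes."""
    return '[{}="{}"]'.format(attr, value)


cql = " ".join([cql_token("word", "celou"), cql_token("lemma", "pravda")])

body = {
    "type": "concQueryArgs",
    "refs": [encode_ref("doc.author")],
    "queries": [{"qtype": "advanced", "corpname": "syn2020", "query": cql}],
}

# json.dumps takes care of escaping the quotes inside the CQL string.
encoded = json.dumps(body)
```

Letting the JSON library do the escaping avoids hand-written `\"` sequences, which are easy to get wrong.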

Response

  • HTTP status: 201 Created (if without errors)
  • content type: application/json
{
  "size": 110,
  "finished": true,
  "conc_args": {
    "maincorp": "syn2020",
    "viewmode": "kwic",
    "pagesize": 40,
    "attrs": "word,tag",
    "attr_vmode": "visible-kwic",
    "base_viewattr": "word",
    "structs": "text,p,g"
  },
  "query_overview": {
  },
  "Q": [ "~gUgICee6K2ka" ],
  "conc_persistence_op_id": "gUgICee6K2ka"
}

Important entries

| name | description |
|---|---|
| size | size of the resulting concordance (in tokens) |
| finished | if the request's async is set to false (which is recommended for API clients) then this is always true |
| conc_persistence_op_id | a public ID of the resulting concordance |
| conc_args | additional parameters affecting how the concordance is displayed |
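
A client will typically pull these entries out of the decoded response. The following is a minimal sketch; the `ConcordanceInfo` class and `parse_query_response` function are illustrative names, and the sample string is a shortened copy of the response shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class ConcordanceInfo:
    conc_id: str    # value of conc_persistence_op_id
    size: int       # size of the resulting concordance
    finished: bool  # value of the "finished" flag


def parse_query_response(text):
    """Extract the important entries from a /query_submit JSON response."""
    data = json.loads(text)
    return ConcordanceInfo(
        conc_id=data["conc_persistence_op_id"],
        size=data["size"],
        finished=data["finished"],
    )


# Shortened copy of the example response from this page.
SAMPLE = """
{
  "size": 110,
  "finished": true,
  "Q": ["~gUgICee6K2ka"],
  "conc_persistence_op_id": "gUgICee6K2ka"
}
"""

info = parse_query_response(SAMPLE)
```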