KonText HTTP API

Creating a concordance query

Both simple and advanced query types are supported, but the advanced variant is strongly recommended for API use because its query encoding is much simpler.

  • URL: /query_submit?format=json
  • HTTP Method: POST
  • content type: application/json

Request body:

{
  "type": "concQueryArgs",
  "maincorp": "syn2020",
  "usesubcorp": null,
  "viewmode": "kwic",
  "pagesize": 40,
  "attrs": ["word","tag"],
  "attr_vmode": "visible-kwic",
  "base_viewattr": "word",
  "ctxattrs": [],
  "structs": ["text","p","g"],
  "refs": [],
  "fromp": 0,
  "shuffle": 0,
  "queries": [
    {
      "qtype": "advanced",
      "corpname": "syn2020",
      "query": "[word=\"celou\"] [lemma=\"pravda\"]",
      "pcq_pos_neg": "pos",
      "include_empty": false,
      "default_attr":"word"
    }
  ],
  "text_types": {},
  "context":
  {
    "fc_lemword_wsize": [-5, 5],
    "fc_lemword": "",
    "fc_lemword_type": "all",
    "fc_pos_wsize": [-5, 5],
    "fc_pos": [],
    "fc_pos_type": "all"
  },
  "async": false
}

Parameters

name description
type always the constant concQueryArgs
usesubcorp null or name of user's subcorpus
viewmode kwic|sen|align (align works only for parallel corpora)
pagesize a positive number specifying size of the resulting page
attrs a list of positional attributes we want to retrieve
attr_vmode visible-all|visible-kwic|visible-multiline|mouseover
base_viewattr the main attribute the flow of text will be based on
ctxattrs TODO
structs a list (possibly empty) of structural attributes to be shown
refs a list (possibly empty) of additional metadata attached to each row
fromp a number specifying a starting page
shuffle 0|1
queries a list of objects, one per active corpus (normally 1 item; more than 1 for aligned corpora)
queries[].qtype advanced
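queries[].corpname name of the corpus the query applies to (e.g. syn2020)
queries[].query A CQL query (e.g. [word="their"] [lemma="truth"]); double quotes must be escaped as \" inside the JSON body
queries[].pcq_pos_neg applies to queries on aligned corpora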
queries[].include_empty true|false
queries[].default_attr a positional attribute applied to simplified CQL expressions
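
The request described above can be assembled in Python roughly as follows. This is a minimal sketch, not an official client: the base URL is an assumption taken from the example at the end of this page, the helper name build_query_request is made up for illustration, and authentication is not handled. The function only prepares the request; sending it is left to the caller.

```python
import json
import urllib.request

# Assumption: base URL taken from the example at the end of this page;
# replace it with the address of your own KonText installation.
BASE_URL = "https://www.korpus.cz/kontext"


def build_query_request(corpname, cql, base_url=BASE_URL):
    """Prepare (but do not send) a POST request for /query_submit."""
    body = {
        "type": "concQueryArgs",
        "maincorp": corpname,
        "usesubcorp": None,
        "viewmode": "kwic",
        "pagesize": 40,
        "attrs": ["word", "tag"],
        "attr_vmode": "visible-kwic",
        "base_viewattr": "word",
        "ctxattrs": [],
        "structs": ["text", "p", "g"],
        "refs": [],
        "fromp": 0,
        "shuffle": 0,
        "queries": [{
            "qtype": "advanced",
            "corpname": corpname,
            "query": cql,  # json.dumps takes care of escaping the quotes
            "pcq_pos_neg": "pos",
            "include_empty": False,
            "default_attr": "word",
        }],
        "text_types": {},
        "context": {
            "fc_lemword_wsize": [-5, 5],
            "fc_lemword": "",
            "fc_lemword_type": "all",
            "fc_pos_wsize": [-5, 5],
            "fc_pos": [],
            "fc_pos_type": "all",
        },
        "async": False,
    }
    return urllib.request.Request(
        url=base_url + "/query_submit?format=json",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("syn2020", '[word="celou"] [lemma="pravda"]')
# To actually send it: urllib.request.urlopen(req)
```

Writing the CQL query as a normal Python string and letting json.dumps serialize it avoids escaping the double quotes by hand.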

Response:

  • HTTP status: 201 Created (if without errors)
  • content type: application/json

{
  "size": 110,
  "finished": true,
  "conc_args": {
    "maincorp": "syn2020",
    "viewmode": "kwic",
    "pagesize": 40,
    "attrs": "word,tag",
    "attr_vmode": "visible-kwic",
    "base_viewattr": "word",
    "structs": "text,p,g"
  },
  "query_overview": {
  },
  "Q": [ "~gUgICee6K2ka" ],
  "conc_persistence_op_id": "gUgICee6K2ka"
}

Parameters

name description
size size of the resulting concordance (in tokens)
finished true once the concordance calculation is complete; if async is set to false then this is always true
conc_persistence_op_id a public ID of the resulting concordance
conc_args additional parameters affecting how the concordance is displayed

Displaying a concordance

  • URL: /view
  • HTTP Method: GET

Parameters (in URL)

name description
q concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones
format for API use, json is required (without it, an HTML page is returned)

Response

(only a subset of the most important entries is shown below)

{
  "kwiclen": 2,
  "Lines": [
    {
      "Left": [],  // see the following section for the description
      "Kwic": [],  // ditto
      "Right": []  // ditto
    },
    {
      "Left": ["..."],
      "Kwic": ["..."],
      "Right": ["..."]
    }
  ],
  "conc_persistence_op_id": "RSiw4GIgW08s",
  "concsize": 115,
  "result_arf": 51.31,
  "result_relative_freq": 0.94
}

The format of Left, Kwic, Right entries is as follows:

[
  {
    "str": "setměním", 
    "class": "", 
    "tail_posattrs": ["setmění", "NNNS7-----A----"]
  }
  // other items/positions
]

attribute description
str value of the token (or structure - e.g. <p>)
class type of the value - empty string (normal token), col0 coll (for KWIC), strc (structure)
tail_posattrs additional positional attributes for the position (e.g. tag, lemma,...) - based on attrs and attr_vmode
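
As an illustration of the structure above, the following sketch turns one item of Lines into a plain-text KWIC line. The helper names are made up for this example, and the sample line is invented except for the token taken from the snippet above.

```python
def tokens_to_text(tokens):
    """Join the `str` values of a Left/Kwic/Right list with spaces."""
    return " ".join(tok["str"] for tok in tokens)


def format_line(line):
    """Render one item of `Lines` as 'left <kwic> right'."""
    return "{} <{}> {}".format(
        tokens_to_text(line["Left"]),
        tokens_to_text(line["Kwic"]),
        tokens_to_text(line["Right"]),
    ).strip()


def kwic_attrs(line):
    """Return the tail_posattrs of each KWIC token (empty list if absent)."""
    return [tok.get("tail_posattrs", []) for tok in line["Kwic"]]


# Invented sample line; only the KWIC token comes from the example above.
line = {
    "Left": [{"str": "před", "class": "", "tail_posattrs": []}],
    "Kwic": [{
        "str": "setměním",
        "class": "col0 coll",
        "tail_posattrs": ["setmění", "NNNS7-----A----"],
    }],
    "Right": [{"str": ".", "class": "", "tail_posattrs": []}],
}
text = format_line(line)
attrs = kwic_attrs(line)
```

Joining tokens with single spaces is a simplification; structures (class strc) and punctuation may need different handling in a real client.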

Frequency distribution for text types

  • URL: /freqtt
  • HTTP Method: GET

Parameters (in URL)

name description
q concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones
fttattr dot-separated structure and structural attribute (e.g. doc.first_published)
ftt_include_empty 0,1 - if 1 then also values with no occurrences will be returned
flimit 0,1,...,N - a minimum absolute frequency of a searched phenomenon
format json (otherwise, an HTML page is returned)

Response:

  • HTTP status: 200 OK (if without errors)
  • content type: application/json

Important entries:

name (path) description
Blocks Frequency results (array)
Blocks[i] Frequency results entry
Blocks[i].Head Array of column headings, typically: 1) value of the respective structural attribute, 2) absolute frequency, 3) ipm
Blocks[i].Items Array of individual lines
Blocks[i].Items[j].Word[0].n Value of the respective structural attribute
Blocks[i].Items[j].freq Absolute frequency
Blocks[i].Items[j].rel Instances per million (ipm)
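
The paths listed above can be walked as in the following sketch. The function name freq_rows is hypothetical and the sample values are invented; only the key names come from the table.

```python
def freq_rows(response):
    """Yield (block index, attribute value, absolute freq., ipm) per item."""
    for i, block in enumerate(response.get("Blocks", [])):
        for item in block.get("Items", []):
            yield i, item["Word"][0]["n"], item["freq"], item["rel"]


# Invented sample containing only the entries described above.
sample = {
    "Blocks": [
        {
            "Items": [
                {"Word": [{"n": "2019"}], "freq": 12, "rel": 3.4},
                {"Word": [{"n": "2020"}], "freq": 7, "rel": 2.1},
            ]
        }
    ]
}
rows = list(freq_rows(sample))
```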

Frequency distribution for positional attributes

🚧

Two-dimensional frequency distribution

  • URL: /freqct
  • HTTP Method: GET

Parameters (in URL)

name description
q concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones
ctattr1 attribute applied for the 1st dimension (both positional and structural attributes are supported)
ctattr2 the same as ctattr1 but for the 2nd dimension
ctfcrit1 the 1st dimension criterion
ctfcrit2 the 2nd dimension criterion
ctminfreq a minimum frequency of included entries; the units are defined by the ctminfreq_type parameter
ctminfreq_type abs - absolute freq., pabs - percentile of abs. freq., ipm - instances per million, pipm - percentile of ipm

Response:

  • HTTP status: 200 OK (if without errors)
  • content type: application/json

name (path) description
freq_type "2-attribute" (this is mostly used by the client application)
attr1 matches the ctattr1 given in the request
attr2 matches the ctattr2 given in the request
data.data[i][0] matching 1st dimension value
data.data[i][1] matching 2nd dimension value
data.data[i][2] absolute frequency
data.data[i][3] base set size for i.p.m. calculation *️⃣

*️⃣ More information about base set size:

  1. in the case of a relationship between two structural attributes, the value is always 1000000, which should be interpreted as "not applicable"
  2. in the case of two positional attributes, the base set size equals the size of the respective concordance
  3. in the case of one positional and one structural attribute, the base set size is the number of tokens in the subcorpus specified by the respective structural attribute value (i.e. it is not affected by the respective concordance)
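
A sketch of how a client might derive i.p.m. values from data.data rows. It assumes the usual definition (absolute frequency divided by the base set size, times one million), which this page does not spell out, and it relies on the caller to say whether both attributes are structural, since the response row alone does not reveal that. The function names and sample numbers are invented.

```python
def ipm(abs_freq, base_size):
    """Instances per million, assuming ipm = abs_freq * 1e6 / base_size."""
    return abs_freq * 1_000_000 / base_size


def rows_with_ipm(rows, both_structural=False):
    """Return (value1, value2, abs_freq, ipm or None) for each data row.

    For two structural attributes the base set size is a placeholder
    (case 1 above), so ipm is reported as None (not applicable).
    """
    result = []
    for value1, value2, abs_freq, base_size in rows:
        rel = None if both_structural else ipm(abs_freq, base_size)
        result.append((value1, value2, abs_freq, rel))
    return result


# Invented rows in the data.data[i] layout described above.
rows = [["pravda", "fiction", 50, 2_000_000]]
with_ipm = rows_with_ipm(rows)
without_ipm = rows_with_ipm([["a", "b", 5, 1_000_000]], both_structural=True)
```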

Example (you must be logged in to KonText):

https://www.korpus.cz/kontext/freqct?q=~vMSCwEgqqSOu&ctfcrit1=0<0&ctfcrit2=0&ctattr1=lemma_lc&ctattr2=doc.txtype_group&ctminfreq=80&ctminfreq_type=pabs&format=json
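
When such a URL is built programmatically, it is safer to let a library encode the parameter values, since a criterion such as 0<0 contains a character that should be percent-encoded. A minimal sketch using only the Python standard library; the function name freqct_url is made up, and the base URL is the one from the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.korpus.cz/kontext"  # taken from the example above


def freqct_url(q, ctattr1, ctattr2, ctfcrit1, ctfcrit2,
               ctminfreq, ctminfreq_type, base_url=BASE_URL):
    """Build a /freqct URL with all parameter values percent-encoded."""
    params = {
        "q": q,
        "ctfcrit1": ctfcrit1,
        "ctfcrit2": ctfcrit2,
        "ctattr1": ctattr1,
        "ctattr2": ctattr2,
        "ctminfreq": ctminfreq,
        "ctminfreq_type": ctminfreq_type,
        "format": "json",
    }
    return base_url + "/freqct?" + urlencode(params)


url = freqct_url("~vMSCwEgqqSOu", "lemma_lc", "doc.txtype_group",
                 "0<0", "0", 80, "pabs")
# Decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
```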