Spec [WIP]: Extending the DataStore - ckan/ckan GitHub Wiki

Goal: Allow extensions to hook into the DataStore to tweak data ingestion and queries.

The main use case right now is allowing spatial queries with PostGIS

Preliminary proposal

This is very rough and bound to change!

  1. datastore_search action gets these data_dicts

     {
         "resource_id": "xxxx",
         "fields": ["field1", "field2", "geom"],
         "filters": {"field1": "x", "field2": 4, "bbox": [xmin, ymin, xmax, ymax]}
         "sort": ["field2 desc"]
     }
    
     {
         "resource_id": "xxxx",
         "fields": ["field1", "field2", "geom"],
         "filters": {"field1": "x", "field2": 4, "geom": "wkt_text or geojson"}
         "sort": ["field2 desc"]
     }
    
     {
         "resource_id": "xxxx",
         "fields": ["field1", "field2", "geom"],
         "filters": {"field1": "x", "field2": 4, "center": [lat,lon], "distance": x}
         "sort": ["distance", "field2 desc"]
     }
    
  2. db.search_data builds this query_dict object with the default supported options

     {
         'select': ['field1', 'field2'],
         'where': [
             ('field1', 'x',),
             ('field2', 4,),
         ],
         'order': ['field2 desc'],
     }
    
  3. Just before building the SQL query, we call any extensions passing the query_dict, as well as the original data_dict

     for plugin in PluginImplementations(IDataStore):
         query_dict = plugin.search_query(context, data_dict, query_dict)
    
  4. The spatial extension (or the NHM one) will check data_dict to see for any spatial params and modify query_dict accordingly:

     {
         'select': ['field1', 'field2', 'ST_AsGeoJSON(geom) as geom'],
         'where': [
             ('field1', 'x',),
             ('field2', 4,),
             ('ST_GeomFromText(wkt_text ,4326), 'wkt',)  # 2nd item (value) could be None
         ],
         'order': ['field2 desc'],
     }
    

    Q: Should we give access to extensions to 'rank_column', 'ts_query', 'from', 'limit' and 'offset'?

  5. db.search_data will use query_dict to create the final SQL

     SELECT field1, field2, ST_AsGeoJSON(geom) as geom
     FROM xxxx
     WHERE field1 = 'x' AND
         field2 = 4 AND
         ST_Intersects(geom, ST_GeomFromText(wkt_text ,4326))
     LIMIT 1000
     OFFSET 0
     ORDER BY field2 desc
    

Things that will need to change

  • Do not fail if you provide a filter that doesn't exist on the table, just ignore it.
  • Function that creates default query_dict from data_dict
  • Function that creates the SQL from the query_dict
  • ...

TODO

Extension points in create_indexes, create_table, ...