Code development workflow - geopython/pygeoapi GitHub Wiki
NOT SO FAST !!! First scroll and see if there is something unknown to you and then check the TL;DR at the end...
The code workflow is the sequence of steps and procedures necessary for code development, testing and implementation.
pygeoapi contribution guidelines and instructions on how to submit tickets are found in CONTRIBUTING.md.
Developers work on their own (project) forks; a fork is a personal sandbox where code is developed and implemented by its author. With time, code on the main project and the fork will start to diverge, since code from other branches and forks gets merged into the project. It is important that code from the project is regularly synced into the fork and the working branch.
Check the GitHub tutorial on how to fork and sync: fork-a-repo
pygeoapi-master ---FORK---> pygeoapi-user001-master
GitHub issues should be related to bugs, new feature requests, blue sky research etc. For bug reporting please follow the guideline on what to put in your bug report.
Code development should be oriented in such a way that it solves (or deals with) one issue only. Each issue tends to be associated with a branch, and code commits go into that specific branch. This also facilitates the pull request review process.
pygeoapi-master ---FORK---> pygeoapi-user001-master
\----------- pygeoapi-user001-issue4456
Don't forget to sync/merge the main pygeoapi-master into your fork's master, and then merge (or rebase) master into your working branch (see version-control-branching). This requires that you first configure a remote for the fork, indicating the upstream location of the main code:
git remote add upstream https://github.com/geopython/pygeoapi.git
A good programmer is one who writes clear and easy to understand code based on well-established guidelines, not one who writes smart code.
pygeoapi follows PEP 8, the Style Guide for Python Code, and the Python naming conventions. In a nutshell:
- `snake_case` for variables
- lower case for `modules` and `packages`
- upper case (CAPITALS) for `CONSTANTS`
- `UpperCaseCamel` for classes
- Methods can also be protected with `_` or private with `__`
- Variable name collision is avoided by adding an extra `_`, e.g. use `csv_` instead of `csv`
- English words only, with proper description of functionality and/or content.
- Follow OGC standard names (See: 4.1 pygeoapi API)
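The conventions above can be sketched in a few lines. This is only an illustration: `DEFAULT_DELIMITER`, `CsvRowReader` and its methods are made-up names and are not part of pygeoapi.

```python
import csv  # standard library module; its name is why ``csv_`` is used below

# constants are written in upper case
DEFAULT_DELIMITER = ','


class CsvRowReader:
    """UpperCaseCamel for classes (hypothetical class, not pygeoapi code)"""

    def __init__(self, delimiter=DEFAULT_DELIMITER):
        # snake_case for variables and attributes
        self.field_delimiter = delimiter

    def _split(self, line):
        """A leading underscore marks a protected helper"""
        return next(csv.reader([line], delimiter=self.field_delimiter))

    def read_row(self, line):
        """Public methods are also written in snake_case"""
        # the trailing underscore avoids shadowing the ``csv`` module
        csv_ = self._split(line)
        return csv_


reader = CsvRowReader()
row = reader.read_row('id,name,value')
```

Here `row` holds the three column names as a list of strings.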
PEP 8 style conventions help readability, but code should also be understandable. This can be achieved with simple English variable names, good comments, and consistency.
How to write code that everyone can read
Source: https://xkcd.com/1513/ and Geo-python
Documentation is what makes or breaks a project; thou shalt not say: "The code is already explanatory". If you wrote readable code it is already explanatory, BUT you still have to indicate what it does, how it can be run and, more importantly, the inputs/outputs, as Python is a loosely typed language. **Any pull request without proper code docstrings is automatically rejected.**
pygeoapi uses Python docstrings and reStructuredText
A good introduction to pydocs can be found in the links below:
Every single method/function/class should be documented using docstrings following reStructuredText (reST) syntax, for example:
class RasterioProvider(BaseProvider):
    """Rasterio Provider"""

    def __init__(self, provider_def):
        """
        Initialize object

        :param provider_def: provider definition

        :returns: pygeoapi.provider.rasterio_.RasterioProvider
        """
Python packages should also have a basic description in their __init__.py file, e.g.:
"""OGC process package, each process is an independent module"""
Note: Type hints are not yet supported
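Since type hints are not used, the docstring is where inputs and outputs get described. A minimal sketch for a plain function, assuming a made-up helper `bbox_area` that does not exist in pygeoapi:

```python
def bbox_area(minx, miny, maxx, maxy):
    """
    Calculate the area of a bounding box

    :param minx: minimum x coordinate (`float`)
    :param miny: minimum y coordinate (`float`)
    :param maxx: maximum x coordinate (`float`)
    :param maxy: maximum y coordinate (`float`)

    :raises ValueError: if a maximum is smaller than its minimum

    :returns: area of the bounding box as `float`
    """
    if maxx < minx or maxy < miny:
        raise ValueError('maximum values must not be smaller than minimums')
    return float((maxx - minx) * (maxy - miny))


area = bbox_area(0, 0, 2, 3)
```

The types live in the `:param:` and `:returns:` fields, which Sphinx picks up when building the API documentation.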
Read the Docs (RTD) is a very popular documentation platform and is used for the pygeoapi documentation. If pydocs are properly written, RTD will automatically build the content; this step should be done before a pull request.
RTD code is located in the docs folder, with folder organization and file content defined as in a generic Python RTD project. Before you proceed please read "How to set up your python project docs for success (https://towardsdatascience.com/how-to-set-up-your-python-project-docs-for-success-aab613f79626)" to get an idea of how things work.
pygeoapi RTD content is in the folder pygeoapi/docs/source. The *.rst files are the sources where documentation should be written/updated.
The Table of Contents (TOC) is defined in [index.rst](https://raw.githubusercontent.com/geopython/pygeoapi/master/docs/source/index.rst)
.. _index:

.. image:: /_static/pygeoapi-logo.png
   :scale: 50%
   :alt: pygeoapi logo

pygeoapi |release| documentation
==================================

:Author: the pygeoapi team
:Contact: pygeoapi at lists.osgeo.org
:Release: |release|
:Date: |today|

.. toctree::
   :maxdepth: 4
   :caption: Table of Contents
   :name: toc

   introduction
   how-pygeoapi-works
The TOC names then point to the individual *.rst files.
Remember pydocs and code comments?! RTD will automatically pull the content from the pygeoapi code and build the API documentation; **it is important that new packages and modules are added to openapi.rst**. For example, a package, a module/class and then a class would have the following syntax:
Provider
--------

.. automodule:: pygeoapi.provider
   :show-inheritance:
   :members:
   :private-members:
   :special-members:

Base class
^^^^^^^^^^

.. automodule:: pygeoapi.provider.base
   :show-inheritance:
   :members:
   :private-members:
   :special-members:

CSV provider
^^^^^^^^^^^^

.. automodule:: pygeoapi.provider.csv_
   :show-inheritance:
   :members:
   :private-members:
In the folder pygeoapi/docs:
#make help
make html
::
Running Sphinx v3.0.1
loading pickled environment... done
::
The HTML pages are in build/html
The documentation is then available in build/html, in Read the Docs style, and can be viewed in a browser: firefox build/html/index.html
GitHub has very good support for RTD, and you can even use it on your personal repository for the issue that you are working on.
First, you need to create an account on Read the Docs (Sign up). You can (and should) authenticate using your GitHub account; the following steps assume that you did.
Second, connect your Read the Docs account to GitHub on admin > control panel, Connected services > Connect to Github. This will allow you to choose the repository and branch from which RTD will import the documents and build them.
If the process was successful it should not be necessary to configure the webhooks manually.
Third, on the dashboard (Profile drop down > My projects) click Import a project and refresh to sync RTD and GitHub. You should be able to see your private pygeoapi project (<username/pygeoapi>); just add it.
By default RTD will build docs from master. You are expected to work on your fork in a specific branch (see: Issues and branches), therefore RTD should be set to use that branch.
On the project details, give a name related to the issue that you are working on, e.g. pygeoapi-532 (this will then be part of the public URL), and tick Edit advanced project options.
On the advanced options, type the name of the working/default branch, and select Python as the programming language.
Finally, on the project page click on Build project and enjoy the automation. In a few minutes you will have your documentation online :), for this example it will be something like: https://pygeoapi-532.readthedocs.io
Note: Every time you push to the default branch RTD will update the online documentation.
pygeoapi code uses or implements:
- an API first approach that is wrapped by a web framework (Flask or Starlette),
- Object oriented template pattern
- Plugins
- EAFP (it's easier to ask for forgiveness than permission)
- Prefer DRY (Don't Repeat Yourself) but when necessary WET (Write Everything Twice)
The API structure is defined in the pygeoapi/api.py module and its class API; this is the project's core. The method naming in class API is no coincidence: it follows OGC API names and definitions. For example, in OGC Features we have an endpoint defined as:
GET /collections
This REST endpoint describes the collections available; the associated method is:
def describe_collections(self, headers_, format_, dataset=None):
<VERB>_<OBJECT> is the standard terminology.
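As a rough illustration of the `<VERB>_<OBJECT>` naming, the sketch below maps two OGC-style routes to method names. `MiniAPI` is a hypothetical stand-in and its simplified signatures are assumptions; it is not the pygeoapi `API` class.

```python
class MiniAPI:
    """Hypothetical stand-in for an API class (not the pygeoapi code)"""

    def __init__(self, collections):
        """
        Initialize object

        :param collections: `dict` of collection name -> list of items
        """
        self.collections = collections

    def describe_collections(self):
        """GET /collections: <VERB>=describe, <OBJECT>=collections"""
        return sorted(self.collections)

    def get_collection_items(self, dataset):
        """GET /collections/{dataset}/items: <VERB>=get, <OBJECT>=collection_items"""
        return self.collections[dataset]


api_ = MiniAPI({'lakes': ['Lake Baikal'], 'obs': []})
names = api_.describe_collections()
items = api_.get_collection_items('lakes')
```

Reading a method name should be enough to guess which endpoint it serves, and the other way around.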
Web framework libraries are responsible for:
- HTTP requests/responses
- URL routing
- Configuration loading
REST endpoints defined by the OGC standards (see here for example) are supported by the web framework, each with its community's approaches, philosophies and perks.
Currently there are two web frameworks supported:
- Flask code (pygeoapi/flask_app.py)
- Starlette code (pygeoapi/starlette_app.py)
The pygeoapi project tends to use Flask as the default web framework. As a guideline, the function name should be identical (or very close) to the HTTP request route, e.g.:
@BLUEPRINT.route('/openapi')
def openapi():
    """
    OpenAPI endpoint

    :returns: HTTP response
    """
    with open(os.environ.get('PYGEOAPI_OPENAPI'), encoding='utf8') as ff:
        openapi = yaml_load(ff)
pygeoapi code is object oriented (classes) and implements the template method pattern (Wikipedia: template method pattern). The template method pattern is normally used in code bases that implement multiple components with overlapping functionality, behavior or properties.
The provider package contains the following modules:
.
├── __init__.py
├── base.py
├── elasticsearch_.py
├── geojson.py
:
The base.py module contains the parent classes that will be used in the specific data provider modules (e.g. geojson.py).
#base.py
class BaseProvider:
    """generic Provider ABC"""

    def __init__(self, provider_def):
        :

    def get_fields(self):
        raise NotImplementedError()

    def write(self, options={}, data=None):
        raise NotImplementedError()
class BaseProvider is the template from which the specific classes for each data provider are created; this template contains all the necessary methods.
You can see the base class being extended in module geojson.py:
from pygeoapi.provider.base import BaseProvider

class GeoJSONProvider(BaseProvider):
    """Provider class backed by local GeoJSON files
    :

    def get_fields(self):
        if os.path.exists(self.data):
            with open(self.data) as src:
                data = json.loads(src.read())
            fields = {}
            for f in data['features'][0]['properties'].keys():
                fields[f] = 'string'
            return fields
Looking at class GeoJSONProvider, there isn't a write method. If pygeoapi tries to call write, the call ends up in the base class and triggers a raise NotImplementedError(), which is then properly handled by the pygeoapi API.
This is the pygeoapi code approach: base classes define precisely what is expected and avoid duplication.
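The behaviour described above can be tried out with a self-contained sketch. The base class here is a simplified copy for illustration, and `ListProvider` and `try_write` are hypothetical names, not pygeoapi code.

```python
class BaseProvider:
    """Simplified stand-in for the base class"""

    def __init__(self, provider_def):
        self.name = provider_def['name']
        self.data = provider_def['data']

    def get_fields(self):
        raise NotImplementedError()

    def write(self, options=None, data=None):
        raise NotImplementedError()


class ListProvider(BaseProvider):
    """Hypothetical read-only provider backed by a list of dicts"""

    def get_fields(self):
        # only the supported method is overridden
        return {key: 'string' for key in self.data[0]}


def try_write(provider, data):
    """Caller side: report a missing capability instead of crashing"""
    try:
        provider.write(data=data)
        return 'written'
    except NotImplementedError:
        return 'write not supported'


provider = ListProvider({'name': 'demo', 'data': [{'id': '1', 'label': 'a'}]})
fields = provider.get_fields()
status = try_write(provider, {'id': '2'})
```

`ListProvider` never mentions `write`, yet the caller gets a well-defined signal because the base class spells out the full contract.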
Doubts!? Check these links:
Currently pygeoapi supports the following plugins:
- provider (data provider loading)
- formatter (export formats loading)
- process (available processes)
Plugin functionality is called in api.py, and only the necessary plugins are loaded on the basis of the configuration YAML file, for example:
#api.py
from pygeoapi.plugin import load_plugin
p = load_plugin('provider', get_provider_by_type(
    collections[k]['providers'], 'feature'))
It is then up to the code in api.py to use it.
Plugin code is located in module plugin.py (of course), and it is basically a class loader for other modules.
#: formatters and processes available
PLUGINS = {
    'provider': {
        'CSV': 'pygeoapi.provider.csv_.CSVProvider',
        'Elasticsearch': 'pygeoapi.provider.elasticsearch_.ElasticsearchProvider',  # noqa
        'GeoJSON': 'pygeoapi.provider.geojson.GeoJSONProvider',
        'OGR': 'pygeoapi.provider.ogr.OGRProvider',
        :
        :
    },
    'formatter': {
        'CSV': 'pygeoapi.formatter.csv_.CSVFormatter'
    }
}
The plugins structure is PLUGINS->(MODULE)->(TYPE=>CODE_LOCATION)
Read the pygeoapi docs on plugins for a full, detailed explanation.
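To make the "class loader" idea concrete, here is a minimal sketch of resolving a dotted path from such a dictionary with `importlib`. It is not the pygeoapi `load_plugin` implementation: `load_class` is a hypothetical helper, and the entries point at standard library classes so the example runs anywhere.

```python
import importlib

# same shape as PLUGINS above, but with illustrative standard library targets
PLUGINS = {
    'formatter': {
        'JSON': 'json.JSONEncoder',
    },
    'container': {
        'Ordered': 'collections.OrderedDict',
    }
}


def load_class(plugin_type, name):
    """
    Resolve a dotted path to a class object

    :param plugin_type: first level key (e.g. 'formatter')
    :param name: second level key (e.g. 'JSON')

    :returns: the class found at the configured code location
    """
    dotted_path = PLUGINS[plugin_type][name]
    module_name, class_name = dotted_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


encoder_class = load_class('formatter', 'JSON')
encoded = encoder_class().encode({'a': 1})
```

Only the module named in the selected entry is imported, which is why unused plugins cost nothing at start-up.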
The Python language is oriented to EAFP (it's Easier to Ask for Forgiveness than Permission) instead of LBYL (Look Before You Leap); this basically boils down to the use of exceptions in code.
EAFP states that you should try something and, if it fails, deal with the error:
#pygeoapi.provider.rasterio_.RasterioProvider
import rasterio
from pygeoapi.provider.base import ProviderConnectionError
try:
    self._data = rasterio.open(self.data)
    :
except Exception as err:
    LOGGER.warning(err)
    raise ProviderConnectionError(err)
In the above example the code tries to open the data source; if this raises an error, the code catches the exception, logs it and re-raises it as a ProviderConnectionError. There was a problem and we asked forgiveness in the exception handling section.
Using an LBYL approach the code would be:
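import os
if os.access(self.data, os.R_OK):
    self._data = rasterio.open(self.data)
else:
    # no exception object exists on this path, so build the message explicitly
    msg = f'Cannot read {self.data}'
    LOGGER.warning(msg)
    raise ProviderConnectionError(msg)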
Python's benevolent dictator for life disagrees with the motivation (you can read it here), but EAFP still sits well with "Explicit is better than implicit" and Don't Repeat Yourself (DRY).
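The two styles can be compared side by side without rasterio. The sketch below reads a JSON file instead of a raster; `DataSourceError`, `load_eafp` and `load_lbyl` are hypothetical names chosen for this example.

```python
import json
import os
import tempfile


class DataSourceError(Exception):
    """Hypothetical error type, standing in for a provider-specific exception"""


def load_eafp(path):
    """EAFP: just try to read, and translate any failure"""
    try:
        with open(path, encoding='utf-8') as src:
            return json.load(src)
    except (OSError, ValueError) as err:
        # json.JSONDecodeError is a subclass of ValueError
        raise DataSourceError(str(err)) from err


def load_lbyl(path):
    """LBYL: check the preconditions we can think of before reading"""
    if not os.path.isfile(path):
        raise DataSourceError(f'{path} is not a file')
    if not os.access(path, os.R_OK):
        raise DataSourceError(f'{path} is not readable')
    # the file may still vanish or hold invalid JSON after the checks
    with open(path, encoding='utf-8') as src:
        return json.load(src)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, 'good.json')
    with open(good, 'w', encoding='utf-8') as dst:
        json.dump({'type': 'FeatureCollection', 'features': []}, dst)

    loaded = load_eafp(good)

    outcomes = []
    for loader in (load_eafp, load_lbyl):
        try:
            loader(os.path.join(tmp, 'missing.json'))
            outcomes.append('ok')
        except DataSourceError:
            outcomes.append('failed')
```

Both loaders reject the missing file, but only the EAFP version also covers failures that no up-front check can predict, such as invalid content.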
For more info on EAFP versus LBYL:
Object oriented template pattern (class abstraction), plugins and EAFP are used to prevent repetition of code, aka Don't Repeat Yourself. Orthogonal to this (in the computer science sense) we have WET (Write Everything Twice), which promotes code duplication whenever necessary or efficient.
pygeoapi is required to be packaged and deployed on systems that have dependency/package distribution limitations; this also forces WET implementations in the code base. The code below is a good example of WET, since it relies only on the datetime library and does not take advantage of pendulum:
#api.py
if te['begin'] is not None and datetime_begin != '..':
    if datetime_begin < te['begin']:
        datetime_invalid = True
At the end of the day, DRY and WET should be implemented side by side and are not complete opposites... but let's keep the code dry and lean.
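A runnable sketch of the same trade-off, using only the standard library `datetime` module. `begins_before_extent` is a hypothetical helper written for this page and the inputs are assumed to be naive ISO 8601 strings; it is not the pygeoapi implementation.

```python
from datetime import datetime


def begins_before_extent(datetime_begin, extent_begin):
    """
    Check whether a requested start falls before the collection extent

    :param datetime_begin: ISO 8601 string, or '..' for an open start
    :param extent_begin: `datetime.datetime`, or `None` if unbounded

    :returns: `bool`
    """
    if extent_begin is None or datetime_begin == '..':
        return False
    # hand-written parsing and comparison instead of a third-party library
    return datetime.fromisoformat(datetime_begin) < extent_begin


te = {'begin': datetime(2000, 1, 1), 'end': None}
too_early = begins_before_extent('1999-12-31', te['begin'])
in_range = begins_before_extent('2001-05-01T12:00:00', te['begin'])
open_start = begins_before_extent('..', te['begin'])
```

A few extra lines of hand-written logic are the price paid for not adding another dependency.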
More on WET/DRY reasoning:
The pygeoapi project implements test driven development (TDD): in the development workflow, first write a unit test that fails, then write the pygeoapi code and keep going until the unit test passes. A very detailed TDD development workflow can be found here, along with extra advantages.
Code testing is done locally on the developer's computer and then later also on the CI/CD pipeline. Please take it seriously, you will be surprised where code can break.....
pygeoapi uses pytest for unit testing, as described in the pygeoapi testing documentation.
Tests are in the folder /tests and each Python module (*.py) bundles several tests based on global functionality or system. The root folder contains the pytest.ini that sets env variables.
New code should have new unit tests and pytest should be run locally to determine that things are OK, for example:
python -m pytest tests/test_api.py
Test code is grouped into modules with the naming convention test_<SYSTEM>_*, with dummy or specific config files in /tests. Supporting the test code we have multiple datasets with share-friendly licences in the subfolder tests/data. If your tests require extra data please add it, but always use small datasets.
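A minimal sketch of what a test module could look like, following the write-the-test-first idea. Both the function under test (`parse_bbox`) and the file it would live in (say `tests/test_bbox.py`) are hypothetical and only serve as an illustration.

```python
def parse_bbox(value):
    """
    Parse a comma separated bounding box (hypothetical function under test)

    :param value: string such as '-10,-20,10,20'

    :returns: `list` of four `float` values
    """
    parts = [float(v) for v in value.split(',')]
    if len(parts) != 4:
        raise ValueError('bbox requires 4 values')
    return parts


def test_parse_bbox():
    """pytest collects functions whose names start with ``test_``"""
    assert parse_bbox('-10,-20,10,20') == [-10.0, -20.0, 10.0, 20.0]


def test_parse_bbox_invalid():
    """An invalid bbox must raise ValueError"""
    try:
        parse_bbox('1,2,3')
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')
```

The tests use plain `assert` statements so the file has no import requirements; in a real test module `pytest.raises(ValueError)` would be the usual way to write the second test.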
Having pytest pass properly is the first step in determining whether the developed code integrates with pygeoapi and can be accepted in a pull request.
Flake8 is a code style checker; it checks for several of the PEP 8 requirements. You can run it file by file or in bulk:
find . -type f -name "*.py" | xargs flake8
All code for a PR has to be clean. Exceptionally, inline ignoring of errors can be added to the code (see inline ignoring errors).
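As an illustration of inline ignoring, the line below is deliberately longer than 79 characters and silences only the line-length check (E501) for that single line. The constant name and URL are made up for this example.

```python
# a string that cannot be wrapped sensibly; the comment at the end of the
# line tells flake8 to skip the line-length check (E501) for this line only
DOCS_URL = 'https://example.org/a/deliberately/long/path/that/does/not/fit/in/79/characters'  # noqa: E501
```

Naming the error code keeps every other check active on that line, which a bare `# noqa` would not.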
pygeoapi implements Travis as CI/CD. As good practice it is recommended that your working branch uses Travis to build pygeoapi on every push you do, and then later on the pull request. Building pygeoapi is a second level of test/integration after local code testing. To start running Travis:
- Sign in or up on https://travis-ci.org/ with your GitHub account. Give permission to access GitHub
- Click on your profile picture > settings
- Sync your repositories on Left panel > sync account. You should see all your personal GitHub repositories
- Activate pygeoapi; this will send you to the build dashboard.
- Every time you push to your working branch and/or master, Travis will build pygeoapi and hopefully you will get a green screen like this
The Travis configuration/implementation is defined in .travis.yml (at root level). It defines the Python versions to use in testing, Docker images for advanced data systems like Elasticsearch, code quality checks etc. If you are just making small code changes to pygeoapi you will likely not need to change anything in the file. If you are writing a full data provider implementation for database XPTO, then submit a new .travis.yml in your pull request.
Check the following tutorials:
https://imgs.xkcd.com/comics/automation.png
- Join us on gitter
- Fork pygeoapi into your private repository
- Pick or create an issue number
- Create a branch on your fork with a naming convention related to working issue
- Create pytest and code
- Check if tests pass, flake8 everything
- Write the docs
- Make a pull request
- Update content and code based on review
Yes, the TL;DR is more or less the contributing guidelines in CONTRIBUTING.md