awips ml usage guide - Unidata/awips-ml GitHub Wiki
This wiki contains information about using awips-ml.
- Interacting with the Containers
- User Customization
- Modifying the Containers
- Configuration
- Troubleshooting
Sometimes it is useful to step into the containers to check on logs, files, etc. To do this, run the following exec
command (this command assumes bash
is installed in the container being accessed):
docker exec -it [container_name] bash
To view running containers, run docker ps
which (if containers are running) should return something like this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
59abc5e97a8a awips-ml_edex "/usr/sbin/init" 31 minutes ago Up 31 minutes 0.0.0.0:388->388/tcp, :::388->388/tcp, 0.0.0.0:9581-9582->9581-9582/tcp, :::9581-9582->9581-9582/tcp edexc
e060cf1bb651 awips-ml_tf "/usr/bin/tf_serving…" 31 minutes ago Up 31 minutes 0.0.0.0:8500-8501->8500-8501/tcp, :::8500-8501->8500-8501/tcp tfc
f3bfa05deec5 awips-ml_process "python server/edex_…" 31 minutes ago Up 31 minutes processc
You can find [container_name]
in the NAMES
column of the output.
In general, anytime you modify any files, the containers need to be rebuilt. To do this, run the following commands:
docker-compose down
docker-compose build
docker-compose up
A heavier duty/more "blunt" way to do this is docker system prune
which will delete several types of information stored by docker. See documentation here.
awips-ml provides several interfaces for users to customize the data pre/post-processing within the processc
container. These interfaces are found in the /usr/
folder and there functionality is described below:
- environment.yml: This is an environment.yml file that users can use to include their own custom conda environment within the container.
- preproc.py: This is a custom pre-processing script that is invoked each time a new file is downloaded into the
processc
container prior. This script takes the numpy array from the transmitted netCDF file and corresponding variable. The output of this script should be a numpy array that matches the expected input dimensions of the tensorflow model hosted intfc
. - postproc.py: This is a custom post-processing script that is invoked each time model output is returned from the
tfc
container machine learning model. This script takes the outputted numpy array from the machine learning model. The output of this script should be a numpy array with dimensions that match the original netCDF file downloaded from the EDEX container. - custom_processc_script.sh: This is a bash script that allows users to include their own custom bash scripts to modify the
processc
container to accommodate their data processing workflows.
awips-ml also offers two ways for user's to customize the machine learning model endpoint that is deployed in tfc
. Users can:
- Include a model generating script in
tfc/etc
that generates the model from scratch, or - Include the pre-trained model weights from a
model.save()
command in thetfc/user_model
folder.
More instructions on how to use this functionality is given in tfc/Dockerfile
.
Sometimes it is useful to expose the TensorFlow model for testing purposes. Currently awips-ml uses a docker network for all intra-container networking. This means that no ports are visible outside of the docker network namespace by default. To expose the model ports in the tfc
container, add this line to the docker-compose.yml
file under the tf
section:
ports:
- 8500:8500
- 8501:8501
This will allow users to send data from the host OS (outside of the docker network namespace) over these ports: 8500
for the REST API, 8501
for gRPC. awips-ml uses the REST API.
awips-ml is composed of three containers and some other directories which are all configurable according to user needs.
-
edexc
: this is the container that runs the actual EDEX server. -
processc
: this is the container that takes data ingested byedexc
and preprocesses before sending totfc
and post-processes data recieved fromtfc
before sending back toedexc
. -
tfc
: this is the container where the TensorFlow machine learning model exists. -
server
: this directory has several common utilities used by different containers. Unless noted below, files in this directory should not be modified by users. -
docker-compose.yml
: This file controls howedexc
,processc
, andtfc
are launched/interact with each other. In general user configuration should not be necessary. In general/where possible user configurations exist in specific files. Users should (in general) not need to modify anyDockerfile
files.
This container has several configuration files that control the type of data ingested by EDEX and CAVE specific configuration. These files are all found in edexc/etc/conf
- files in edexc/etc/systemd
should not be modified by users. Unless noted below, files in edexc/etc/conf
should not be edited by users:
This file controls the type of data ingested by the EDEX container. Note that several example entries are commented out. Users should modify this file so that the EDEX container ingests relevant data.
Modifications can be made by uncommenting an existing line or adding their own. The string in quotes is a regex statement that matches patterns on the upstream LDM. For example:
REQUEST UNIWISC|NIMAGE "OR_ABI-L2-CMIPM1-M6C09_G17.*" iddc.unidata.ucar.edu # GOES Channel 9 Mesoscale 1
Is requesting OR_ABI-L2-CMIPM1-M6C09_G17.*
all GOES 17 (G17) Advanced Baseline Imager (ABI) Level 2 (L2) products with product name Cloud & Moisture Imagery (CMIP) in the Mesoscale 1 (M1) ABI scene. Channel 09 (M6C09) is the specific channel being requested which corresponds to Mid-level water vapor. Info on file naming conventions for ldmd.conf
can be found at the following links:
- http://edc.occ-data.org/goes16/getdata/#file-formats
- http://cimss.ssec.wisc.edu/goes/ABI_File_Naming_Conventions.pdf
The upstream LDM which the EDEX container gets data from is specified by iddc.unidata.ucar.edu
. Users must select an upstream LDM that is willing to serve them data.
The pqact.conf
file handles actions as the EDEX container ingests new data from the upstream LDM. Documentation on this file can be found here. A relevant example for GOES cloud and moisture data is:
NIMAGE ^/data/ldm/pub/native/satellite/GOES/([^/]*)/Products/CloudAndMoistureImagery/([^/]*)/([^/]*)/([0-9]{8})/([^/]*)(c[0-9]{7})(..)(.....)_ml.nc
FILE -close -edex /awips2/data_store/GOES/\4/\7/CMI-IDD/\5\6\7\8_ml.nc4 # handle inputs for awips-ml
NIMAGE ^/data/ldm/pub/native/satellite/GOES/([^/]*)/Products/CloudAndMoistureImagery/([^/]*)/([^/]*)/([0-9]{8})/([^/]*)(c[0-9]{7})(..)(.....).nc
EXEC /home/awips/anaconda3/envs/grpc_env/bin/python /server/trigger.py /awips2/data_store/GOES/\4/\7/CMI-IDD/\5\6\7\8.nc4 edex_container
Note that the two entries have similar pattern matching with different commands as described in the pqact.conf
documentation linked above. The major difference here is the inclusion of the EXEC
entry which calls a python script that alerts the EDEX container of a newly recieved file and sends it to the tfc
container.
Use this filename to change the hostname:
<hostname>[name].docker</hostname>
Additionally, the line
<time-offset>0</time-offset>
has a different value than the default (3600, more information here) because it was causing inconsistent behavior for EDEX file ingestion.
The tfc
container is designed to be lightweight in the sense that users only need to point to the location of their trained model. Users can do this by modifying tfc/Dockerfile
:
COPY ./tfc/models/[saved_model] /models/model
Where [saved_model]
is the location of the model they'd like to serve with the tfc
container. Note that [saved_model]
must conform to this directory structure:
[saved_model]/[version_number]/
because the underlying TensorFlow docker image in tfc
needs a version number to run.
This container does not have any configuration options associated with it.
This folder contains several configuration files/scripts used for handling data I/O from the edexc
/EDEX server. Users do not need to modify the container_servers.py
or trigger.py
files directly as these can be controlled with config.yaml
The main parameter to change in this file is variable_spec
- this is the netCDF
variable that is passed between edexc
and processc
(and eventually tfc
).
Besides this, config.yaml
controls several aspects of the inter-container networking and which ports the edexc
and processc
containers communicate with each other; in general these ports do not need to be modified as they are restricted to the docker network namespace so they shouldn't interfere with the host OS's network namespace.
This section covers common problems. If your question is not answered here, feel free to open a new issue for help.
- Try waiting for a few minutes to see if data loads - sometimes there is a lag between launching the EDEX container and when data is available.
- If no data eventually appears, interact with the container and check the LDM log by:
docker exec -it edexc bash
less /awips2/ldm/logs/ldmd.log
Within this log file, you should see something similar to:
20211021T183357.073945Z iddc.unidata.ucar.edu[1111] requester6.c:make_request:311 NOTE Upstream LDM-6 on iddc.unidata.ucar.edu is willing to be a primary feeder
If you do not see a message like this, that means that whatever upstream LDM specified in the ldmd.conf
file is rejecting your requests. Generally this means your IP address is being rejected. Contact the upstream LDM administrator for more information. In the case of Unidata LDM's, your IP address needs to be associated with a .edu
domain.
If the above conditions are true, try entering the container and running edex status
. If the output looks like:
[edex status]
postgres :: running :: pid 188
pypies :: running :: pid 268
qpid :: running :: pid 305
EDEXingest :: running :: pid 777 1906
EDEXgrib :: not running
EDEXrequest :: running :: pid 739 1919
ldmadmin :: not running
This could be indicative that the container was not shut down properly before being restarted. In this case, bring down the container (docker-compose down
), delete any stored data (docker system prune
), and then rebuild/relaunch the container (docker-compose build && docker-compose up
).
Generally it is convenient to launch a container in detached mode (docker-compose up -d
), however this means that you can't see the output of the container. If your container is crashing it can be convenient to launch the container normally (docker-compose up
) and view the output (especially for the processc
/tfc
containers).
Additionally it can be useful to look at the outputs of the containers themselves by attaching to the container process launched by docker-compose
; you can do this via:
docker attach [container_name]
If your data has been successfully transformed in your machine learning model and shows up in CAVE's Product Browser but nothing displays in the map view (except potentially a color bar) then you may need to clear CAVE's cache. This can be done by deleting the caveData directory (listed below, more information on this here)
- macOS:
/Users/[username]/Library/caveData
- Linux:
/home/[username]/caveData
- Windows:
C:\Users\[username]\caveData
This occurs if you are using a docker version >v4.3.0 which introduced a breaking change for awips-ml. awips-ml uses centos7 with no current plans to upgrade to centos8. Docker versions <v4.3.0 should work and awips-ml was developed using Docker v3.5.2. Downgrading to different Docker versions is possible by going to their website; the downgrade link to Docker v3.5.2 is here.
File an issue (ideally with a link to your forked awips-ml
repository). Useful places to look for logs within the edexc
container are:
awips2/edex/logs/edex-ingest-[product_type]-[date].log
awips2/ldm/logs/ldmd.log
- The output of the python script handling communication between
edexc
andtfc
can be viewed via the following command within theedexc
container:
sudo journalctl -fu listener_start.service