Zarr and NCDatasets - gher-uliege/Documentation GitHub Wiki
Zarr and NetCDF
Utilities
Convert from NetCDF to Zarr:
~/opt/netcdf-4.8.1/bin/nccopy ~/dataset.nc 'file:///tmp/test4#mode=nczarr,file'
Inspect Zarr file/directory:
~/opt/netcdf-4.8.1/bin/ncdump -h 'file:///tmp/test4#mode=nczarr,file'
NCDatasets and Julia:
Use a custom NetCDF library (PR pending https://github.com/JuliaPackaging/Yggdrasil/pull/3620):
using Preferences, NetCDF_jll
set_preferences!(NetCDF_jll, "libnetcdf_path" => "/home/abarth/opt/netcdf-4.8.1/lib/libnetcdf.so.19")
# restart Julia
using NetCDF_jll
# check file path
@show NetCDF_jll.get_libnetcdf_path()
using NCDatasets
ds = NCDataset("file:///tmp/test4#mode=nczarr,file")
Returns:
NCDataset: file:///tmp/test4#mode=nczarr,file
Group: /
Dimensions
longitude = 1000
time = 31
latitude = 500
Variables
v1 (1000 × 500 × 31)
Datatype: UInt8
Dimensions: longitude × latitude × time
Attributes:
add_offset = 1.0
scale_factor = 5.0
Apparently the Zarr specification does not have the concept of named dimensions. Xarray and NCZarr both add named dimensions, but unfortunately in incompatible ways. As a result, the Zarr object created above cannot be loaded by Xarray:
import xarray as xr
xr.open_zarr("file:///tmp/test4")
Returns:
KeyError: 'Zarr object is missing the attribute `_ARRAY_DIMENSIONS`, which is required for xarray to determine variable dimensions.'
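The condition behind this error can be checked without xarray. The sketch below assumes the store is a local directory in the Zarr v2 layout, where each array directory holds a `.zarray` file and an optional `.zattrs` JSON file. The attribute name `_ARRAY_DIMENSIONS` is taken from the error message above; the function name `arrays_missing_xarray_dims` is hypothetical and only serves as illustration.

```python
import json
from pathlib import Path


def arrays_missing_xarray_dims(store_path):
    """Return the arrays in a Zarr v2 directory store that lack `_ARRAY_DIMENSIONS`.

    Assumption: the store is a plain directory tree in which every array
    directory contains a `.zarray` file and, optionally, a `.zattrs` file.
    """
    store = Path(store_path)
    missing = []
    # Every `.zarray` file marks one array directory
    for zarray in sorted(store.rglob(".zarray")):
        array_dir = zarray.parent
        zattrs = array_dir / ".zattrs"
        # A missing `.zattrs` file is treated as an empty attribute set
        attrs = json.loads(zattrs.read_text()) if zattrs.exists() else {}
        if "_ARRAY_DIMENSIONS" not in attrs:
            missing.append(array_dir.relative_to(store).as_posix())
    return missing
```

Calling `arrays_missing_xarray_dims("/tmp/test4")` on the object created above would be expected to list `v1`, given the error reported by xarray, while a store written in xarray mode should yield an empty list.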
To create an Xarray-compatible Zarr object, use the xarray mode option:
~/opt/netcdf-4.8.1/bin/nccopy ~/dataset.nc 'file:///tmp/test5#mode=file,xarray'
import xarray as xr
ds = xr.open_zarr("file:///tmp/test5")
ds["v1"][0,0,0].values
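The same Zarr object can also be read from Julia with NCDatasets:
using NCDatasets
ds = NCDataset("file:///tmp/test5#mode=xarray,file")
# Julia indices start at 1, so the first element is [1,1,1]
ds["v1"][1,1,1]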
Unfortunately, the precise type information of attributes is lost. For example, floating-point attributes are returned as integers if they have no fractional part.
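One plausible explanation is that Zarr v2 stores attributes as JSON, which has a single number type. The sketch below assumes the writer emits whole-valued floats without a decimal point; the JSON string is a hypothetical stand-in for the `.zattrs` content, not a copy of the real file. It also shows one way to guard against this when unpacking values with the CF convention `unpacked = packed * scale_factor + add_offset`; the helper name `unpack` is made up for this example.

```python
import json

# Hypothetical `.zattrs` content, assuming whole-valued floats such as
# 1.0 and 5.0 were written without a decimal point.
zattrs_text = '{"add_offset": 1, "scale_factor": 5}'

attrs = json.loads(zattrs_text)
# Both values are now Python ints: the original float type is not recoverable
attr_types = {name: type(value).__name__ for name, value in attrs.items()}


def unpack(packed, attrs):
    """Apply CF-style unpacking, forcing the packing attributes to float."""
    scale = float(attrs.get("scale_factor", 1))
    offset = float(attrs.get("add_offset", 0))
    return packed * scale + offset
```

Casting explicitly keeps the unpacked values floating point regardless of how the attributes were parsed.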
More information at: