Regrid
Regrid#
[1]:
import xarray
import numpy
import pandas
import climtas
import xesmf
import dask.array
We have a large Dask dataset that we’d like to regrid to a different resolution, so we can compare it with a different dataset
[2]:
time = pandas.date_range('20010101', '20040101', freq='D', closed='left')
data = dask.array.random.random((len(time),50,100), chunks=(90,25,25))
lat = numpy.linspace(-90, 90, data.shape[1])
lon = numpy.linspace(-180, 180, data.shape[2], endpoint=False)
da = xarray.DataArray(data, coords=[('time', time), ('lat', lat), ('lon', lon)], name='temperature')
da.lat.attrs['standard_name'] = 'latitude'
da.lon.attrs['standard_name'] = 'longitude'
da
[2]:
<xarray.DataArray 'temperature' (time: 1095, lat: 50, lon: 100)> dask.array<random_sample, shape=(1095, 50, 100), dtype=float64, chunksize=(90, 25, 25), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 2001-01-01 2001-01-02 ... 2003-12-31 * lat (lat) float64 -90.0 -86.33 -82.65 -78.98 ... 78.98 82.65 86.33 90.0 * lon (lon) float64 -180.0 -176.4 -172.8 -169.2 ... 169.2 172.8 176.4
- time: 1095
- lat: 50
- lon: 100
- dask.array<chunksize=(90, 25, 25), meta=np.ndarray>
Array Chunk Bytes 41.77 MiB 439.45 kiB Shape (1095, 50, 100) (90, 25, 25) Count 104 Tasks 104 Chunks Type float64 numpy.ndarray - time(time)datetime64[ns]2001-01-01 ... 2003-12-31
array(['2001-01-01T00:00:00.000000000', '2001-01-02T00:00:00.000000000', '2001-01-03T00:00:00.000000000', ..., '2003-12-29T00:00:00.000000000', '2003-12-30T00:00:00.000000000', '2003-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
- lat(lat)float64-90.0 -86.33 -82.65 ... 86.33 90.0
- standard_name :
- latitude
array([-90. , -86.326531, -82.653061, -78.979592, -75.306122, -71.632653, -67.959184, -64.285714, -60.612245, -56.938776, -53.265306, -49.591837, -45.918367, -42.244898, -38.571429, -34.897959, -31.22449 , -27.55102 , -23.877551, -20.204082, -16.530612, -12.857143, -9.183673, -5.510204, -1.836735, 1.836735, 5.510204, 9.183673, 12.857143, 16.530612, 20.204082, 23.877551, 27.55102 , 31.22449 , 34.897959, 38.571429, 42.244898, 45.918367, 49.591837, 53.265306, 56.938776, 60.612245, 64.285714, 67.959184, 71.632653, 75.306122, 78.979592, 82.653061, 86.326531, 90. ])
- lon(lon)float64-180.0 -176.4 ... 172.8 176.4
- standard_name :
- longitude
array([-180. , -176.4, -172.8, -169.2, -165.6, -162. , -158.4, -154.8, -151.2, -147.6, -144. , -140.4, -136.8, -133.2, -129.6, -126. , -122.4, -118.8, -115.2, -111.6, -108. , -104.4, -100.8, -97.2, -93.6, -90. , -86.4, -82.8, -79.2, -75.6, -72. , -68.4, -64.8, -61.2, -57.6, -54. , -50.4, -46.8, -43.2, -39.6, -36. , -32.4, -28.8, -25.2, -21.6, -18. , -14.4, -10.8, -7.2, -3.6, 0. , 3.6, 7.2, 10.8, 14.4, 18. , 21.6, 25.2, 28.8, 32.4, 36. , 39.6, 43.2, 46.8, 50.4, 54. , 57.6, 61.2, 64.8, 68.4, 72. , 75.6, 79.2, 82.8, 86.4, 90. , 93.6, 97.2, 100.8, 104.4, 108. , 111.6, 115.2, 118.8, 122.4, 126. , 129.6, 133.2, 136.8, 140.4, 144. , 147.6, 151.2, 154.8, 158.4, 162. , 165.6, 169.2, 172.8, 176.4])
Here’s a variable on the target grid, that we’d like to regrid da
to
[3]:
t_lat = numpy.linspace(-90, 90, 10)
t_lon = numpy.linspace(0, 360, 50, endpoint=False)
t_da = xarray.DataArray(dask.array.zeros((len(t_lat), len(t_lon))), coords=[('lat', t_lat), ('lon', t_lon)])
t_da
[3]:
<xarray.DataArray 'zeros-18791f2cff3f47059e412e1e9c5470f9' (lat: 10, lon: 50)> dask.array<zeros, shape=(10, 50), dtype=float64, chunksize=(10, 50), chunktype=numpy.ndarray> Coordinates: * lat (lat) float64 -90.0 -70.0 -50.0 -30.0 -10.0 ... 30.0 50.0 70.0 90.0 * lon (lon) float64 0.0 7.2 14.4 21.6 28.8 ... 331.2 338.4 345.6 352.8
- lat: 10
- lon: 50
- dask.array<chunksize=(10, 50), meta=np.ndarray>
Array Chunk Bytes 3.91 kiB 3.91 kiB Shape (10, 50) (10, 50) Count 1 Tasks 1 Chunks Type float64 numpy.ndarray - lat(lat)float64-90.0 -70.0 -50.0 ... 70.0 90.0
array([-90., -70., -50., -30., -10., 10., 30., 50., 70., 90.])
- lon(lon)float640.0 7.2 14.4 ... 338.4 345.6 352.8
array([ 0. , 7.2, 14.4, 21.6, 28.8, 36. , 43.2, 50.4, 57.6, 64.8, 72. , 79.2, 86.4, 93.6, 100.8, 108. , 115.2, 122.4, 129.6, 136.8, 144. , 151.2, 158.4, 165.6, 172.8, 180. , 187.2, 194.4, 201.6, 208.8, 216. , 223.2, 230.4, 237.6, 244.8, 252. , 259.2, 266.4, 273.6, 280.8, 288. , 295.2, 302.4, 309.6, 316.8, 324. , 331.2, 338.4, 345.6, 352.8])
The main way to do a regridding in Xarray is to use xesmf, which uses the ESMF library to do the actual regridding. However data that is chunked horizontally produces an error message, which is an issue for very large grids.
[4]:
re = xesmf.Regridder(da, t_da, method='bilinear')
try:
re(da)
except Exception as e:
print("Error", e)
/home/swales/miniconda3/envs/dev/lib/python3.8/site-packages/dask/array/core.py:378: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
o = func(*args, **kwargs)
Error dimension lat on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, either rechunk into a single dask array chunk along this dimension, i.e., ``.chunk(lat: -1)``, or pass ``allow_rechunk=True`` in ``dask_gufunc_kwargs`` but beware that this may significantly increase memory usage.
/home/swales/miniconda3/envs/dev/lib/python3.8/site-packages/xesmf/frontend.py:466: FutureWarning: ``output_sizes`` should be given in the ``dask_gufunc_kwargs`` parameter. It will be removed as direct parameter in a future version.
dr_out = xr.apply_ufunc(
Instead you have to remove the horizontal chunking before doing a xesmf regrid.
[5]:
re(da.chunk({'lat': None, 'lon': None}))
[5]:
<xarray.DataArray 'temperature' (time: 1095, lat: 10, lon: 50)> dask.array<transpose, shape=(1095, 10, 50), dtype=float64, chunksize=(90, 10, 50), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 2001-01-01 2001-01-02 ... 2003-12-31 * lon (lon) float64 0.0 7.2 14.4 21.6 28.8 ... 331.2 338.4 345.6 352.8 * lat (lat) float64 -90.0 -70.0 -50.0 -30.0 -10.0 ... 30.0 50.0 70.0 90.0 Attributes: regrid_method: bilinear
- time: 1095
- lat: 10
- lon: 50
- dask.array<chunksize=(90, 10, 50), meta=np.ndarray>
Array Chunk Bytes 4.18 MiB 351.56 kiB Shape (1095, 10, 50) (90, 10, 50) Count 156 Tasks 13 Chunks Type float64 numpy.ndarray - time(time)datetime64[ns]2001-01-01 ... 2003-12-31
array(['2001-01-01T00:00:00.000000000', '2001-01-02T00:00:00.000000000', '2001-01-03T00:00:00.000000000', ..., '2003-12-29T00:00:00.000000000', '2003-12-30T00:00:00.000000000', '2003-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
- lon(lon)float640.0 7.2 14.4 ... 338.4 345.6 352.8
array([ 0. , 7.2, 14.4, 21.6, 28.8, 36. , 43.2, 50.4, 57.6, 64.8, 72. , 79.2, 86.4, 93.6, 100.8, 108. , 115.2, 122.4, 129.6, 136.8, 144. , 151.2, 158.4, 165.6, 172.8, 180. , 187.2, 194.4, 201.6, 208.8, 216. , 223.2, 230.4, 237.6, 244.8, 252. , 259.2, 266.4, 273.6, 280.8, 288. , 295.2, 302.4, 309.6, 316.8, 324. , 331.2, 338.4, 345.6, 352.8])
- lat(lat)float64-90.0 -70.0 -50.0 ... 70.0 90.0
array([-90., -70., -50., -30., -10., 10., 30., 50., 70., 90.])
- regrid_method :
- bilinear
The climtas.regrid functions support regridding horizontally chunked data. By default the regridding is bilinear, with weights generated by CDO, but you can also supply your own regridding weights or have the library generate them for you from ESMF or CDO.
[6]:
climtas.regrid.regrid(da, t_da)
[6]:
<xarray.DataArray 'temperature' (time: 1095, lat: 10, lon: 50)> dask.array<reshape, shape=(1095, 10, 50), dtype=float64, chunksize=(90, 10, 50), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 2001-01-01 2001-01-02 ... 2003-12-31 * lat (lat) float64 -90.0 -70.0 -50.0 -30.0 -10.0 ... 30.0 50.0 70.0 90.0 * lon (lon) float64 0.0 7.2 14.4 21.6 28.8 ... 331.2 338.4 345.6 352.8
- time: 1095
- lat: 10
- lon: 50
- dask.array<chunksize=(90, 10, 50), meta=np.ndarray>
Array Chunk Bytes 4.18 MiB 351.56 kiB Shape (1095, 10, 50) (90, 10, 50) Count 1124 Tasks 13 Chunks Type float64 numpy.ndarray - time(time)datetime64[ns]2001-01-01 ... 2003-12-31
array(['2001-01-01T00:00:00.000000000', '2001-01-02T00:00:00.000000000', '2001-01-03T00:00:00.000000000', ..., '2003-12-29T00:00:00.000000000', '2003-12-30T00:00:00.000000000', '2003-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
- lat(lat)float64-90.0 -70.0 -50.0 ... 70.0 90.0
- units :
- degrees_north
- standard_name :
- latitude
array([-90., -70., -50., -30., -10., 10., 30., 50., 70., 90.])
- lon(lon)float640.0 7.2 14.4 ... 338.4 345.6 352.8
- units :
- degrees_east
- standard_name :
- longitude
array([ 0. , 7.2, 14.4, 21.6, 28.8, 36. , 43.2, 50.4, 57.6, 64.8, 72. , 79.2, 86.4, 93.6, 100.8, 108. , 115.2, 122.4, 129.6, 136.8, 144. , 151.2, 158.4, 165.6, 172.8, 180. , 187.2, 194.4, 201.6, 208.8, 216. , 223.2, 230.4, 237.6, 244.8, 252. , 259.2, 266.4, 273.6, 280.8, 288. , 295.2, 302.4, 309.6, 316.8, 324. , 331.2, 338.4, 345.6, 352.8])