climtas.blocked#
Xarray operations that act per Dask block
groupby#
- climtas.blocked.blocked_groupby(da: xarray.core.dataarray.DataArray, indexer=None, **kwargs) climtas.blocked.BlockedGroupby [source]#
Create a blocked groupby
Mostly works like
xarray.groupby()
, however this will have better chunking behaviour at the expense of only working with data regularly spaced in time
grouping may be one of:
‘dayofyear’: Group by number of days since the start of the year
‘monthday’: Group by (‘month’, ‘day’)
>>> time = pandas.date_range('20020101', '20050101', freq='D', closed='left')
>>> hourly = xarray.DataArray(numpy.random.random(time.size), coords=[('time', time)])

>>> blocked_doy_max = blocked_groupby(hourly, time='dayofyear').max()
>>> xarray_doy_max = hourly.groupby('time.dayofyear').max()
>>> xarray.testing.assert_equal(blocked_doy_max, xarray_doy_max)
- Parameters
da (
xarray.DataArray
) – Groupby target
indexer/kwargs (Dict[dim, grouping]) – Mapping of dimension name to grouping type
- Returns
A BlockedGroupby for the given grouping
- class climtas.blocked.BlockedGroupby(da: xarray.core.dataarray.DataArray, grouping: str, dim: str = 'time')[source]#
A blocked groupby operation, created by
blocked_groupby()
Works like
xarray.core.groupby.DataArrayGroupBy
, with the constraint that the data contains no partial years
The benefit of this restriction is that no extra Dask chunks are created by the grouping, which is important for large datasets.
- apply(op: climtas.blocked.DataArrayFunction, **kwargs) xarray.core.dataarray.DataArray [source]#
Apply a function to the blocked data
self.da is blocked to replace the self.dim dimension with two new dimensions, ‘year’ and self.grouping. op is then run on the data, and the result is converted back to the shape of self.da.
Use this to e.g. group the data by ‘dayofyear’, then rank each ‘dayofyear’ over the ‘year’ dimension
- Parameters
op ((
xarray.DataArray
, **kwargs) -> xarray.DataArray
) – Function to apply
**kwargs – Passed to op
- Returns
xarray.DataArray
shaped like self.da
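The block/apply/unblock round trip described above can be sketched with plain NumPy. This is a conceptual sketch only, not climtas's implementation: three whole non-leap years are reshaped into ('year', 'dayofyear') axes, a rank is computed across the 'year' axis at each 'dayofyear', and the result is flattened back to the original time axis.

```python
import numpy as np

# Three whole non-leap years of synthetic daily data, as one long time series.
rng = np.random.default_rng(0)
daily = rng.random(3 * 365)

# Block: replace the time axis with ('year', 'dayofyear') axes.
blocked = daily.reshape(3, 365)

# Apply an operation across the 'year' axis at each 'dayofyear' --
# here a simple rank (1 = smallest of the three years for that day).
ranks = blocked.argsort(axis=0).argsort(axis=0) + 1

# Unblock: flatten back to the shape of the original series.
ranked_series = ranks.reshape(-1)
```

Because the reshape and the flatten are inverses, the result has the same shape as the input, matching apply()'s contract of returning a DataArray shaped like self.da.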
- block_dataarray() xarray.core.dataarray.DataArray [source]#
Reshape self.da to have a ‘year’ and a self.grouping axis
The self.dim axis is grouped up into individual years, then for each year that group’s self.dim is converted into self.grouping, so that leap years and non-leap years have the same length. The groups are then stacked together to create a new DataArray with ‘year’ as the first dimension and self.grouping replacing self.dim.
Data at a self.grouping value that only occurs in leap years (e.g. dayofyear 366) is NaN in non-leap years
- Returns
The reshaped
xarray.DataArray
See:
apply()
will block the data, apply a function and then unblock the data
unblock_dataarray() will convert a DataArray shaped like this method’s output back into a DataArray shaped like self.da
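The leap-year padding can be illustrated with a NumPy-only sketch (assumed behaviour based on the description above, not climtas code): each year is copied into a row of a (year, grouping) array padded to the longest year, so dayofyear 366 in a non-leap year becomes NaN.

```python
import numpy as np

# Two years of daily data: a non-leap year (365 days) then a leap year (366).
days_per_year = [365, 366]
series = np.arange(sum(days_per_year), dtype=float)

# Block into a (year, dayofyear) array padded to 366 columns;
# dayofyear 366 in the non-leap year stays NaN.
blocked = np.full((2, 366), np.nan)
start = 0
for i, n in enumerate(days_per_year):
    blocked[i, :n] = series[start:start + n]
    start += n
```

All rows now have the same length, which is what lets the grouped array keep a regular shape across leap and non-leap years.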
- max() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.max()
See:
reduce()
- mean() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.mean()
See:
reduce()
- min() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.min()
See:
reduce()
- nanpercentile(q: float) xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.nanpercentile()
over the ‘year’ axis
Slower than
percentile()
, but will be correct if there’s missing data (e.g. on leap days)
- Parameters
q (
float
) – Percentile within the interval [0, 100]
See:
reduce()
, percentile()
- percentile(q: float) xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.percentile()
over the ‘year’ axis
Faster than
nanpercentile()
, but may be incorrect if there’s missing data (e.g. on leap days)
- Parameters
q (
float
) – Percentile within the interval [0, 100]
See:
reduce()
, nanpercentile()
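The trade-off between the two percentile methods comes straight from NumPy's behaviour with NaN, which the leap-day padding introduces. A minimal demonstration:

```python
import numpy as np

# Values for one dayofyear across three years, with a NaN standing in for
# a dayofyear that does not exist in one year (e.g. day 366 in a non-leap year).
vals = np.array([1.0, 2.0, np.nan])

# numpy.percentile propagates the NaN through the result...
fast = np.percentile(vals, 50)

# ...while numpy.nanpercentile skips it, at extra computational cost.
correct = np.nanpercentile(vals, 50)
```

With missing data, `fast` is NaN while `correct` is 1.5 (the median of the two valid values), which is why nanpercentile() is the safe choice around leap days.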
- rank(method: str = 'average') xarray.core.dataarray.DataArray [source]#
Rank the samples using
scipy.stats.rankdata()
over the ‘year’ axis
- Parameters
method – See
scipy.stats.rankdata()
See:
apply()
- reduce(op: climtas.blocked.DataArrayFunction, **kwargs) xarray.core.dataarray.DataArray [source]#
Reduce the data over ‘year’ using op
self.da is blocked to replace the self.dim dimension with two new dimensions, ‘year’ and self.grouping. op is then run on the data to remove the ‘year’ dimension
Note there will be NaN values in the data when there isn’t a self.grouping value for that year (e.g. dayofyear = 366 or (month, day) = (2, 29) in a non-leap year)
Use this to e.g. group the data by ‘dayofyear’, then get the mean values at each ‘dayofyear’
- Parameters
op ((
xarray.DataArray
, **kwargs) -> xarray.DataArray
) – Function to apply
**kwargs – Passed to op
- Returns
xarray.DataArray
shaped like self.da, but with self.dim replaced by self.grouping
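Because of those NaN placeholders, a NaN-aware op is usually what you want when reducing over 'year'. A NumPy-only sketch of the idea (illustrative, not climtas code):

```python
import numpy as np

# (year, dayofyear) blocked data for a non-leap year and a leap year;
# day 366 of the non-leap year is NaN.
blocked = np.full((2, 366), np.nan)
blocked[0, :365] = 1.0
blocked[1, :] = 3.0

# Reducing over the 'year' axis with a NaN-aware op keeps day 366 usable.
doy_mean = np.nanmean(blocked, axis=0)
```

Here every ordinary dayofyear averages both years, while dayofyear 366 is computed from the leap year alone instead of being poisoned by the NaN.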
- sum() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.sum()
See:
reduce()
- unblock_dataarray(da: xarray.core.dataarray.DataArray) xarray.core.dataarray.DataArray [source]#
Inverse of
block_dataarray()
Given a DataArray constructed by
block_dataarray()
, returns an ungrouped DataArray with the original self.dim axis from self.da.
Data at a self.grouping value that only occurs in leap years is dropped for non-leap years
percentile#
- climtas.blocked.approx_percentile(da: Union[xarray.core.dataarray.DataArray, dask.array.core.Array, numpy.ndarray], q, dim: Optional[str] = None, axis: Optional[int] = None, skipna: bool = True)[source]#
Return an approximation of the qth percentile along a dimension of da
For large Dask datasets the approximation will compute much faster than
numpy.percentile()
If da contains Dask data, it will use Dask’s approximate percentile algorithm extended to multiple dimensions, see
dask.array.percentile()
If da contains Numpy data it will use
numpy.percentile()
- Parameters
da – Input dataset
q – Percentile to calculate in the range [0,100]
dim – Dimension name to reduce (xarray data only)
axis – Axis number to reduce
skipna – Ignore NaN values (like
numpy.nanpercentile()
)
- Returns
Array of the same type as da, otherwise as
numpy.percentile()
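Why is a chunked percentile only approximate? A crude NumPy illustration of the idea (dask.array.percentile's actual merge of per-chunk results is more sophisticated than the plain mean used here):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.random(10_000)

# Exact 90th percentile over the whole array.
exact = np.percentile(data, 90)

# A crude chunk-wise approximation: take the percentile of each chunk,
# then combine the per-chunk results.
chunks = np.split(data, 10)
approx = np.mean([np.percentile(c, 90) for c in chunks])
```

The combined per-chunk estimate is close to, but not exactly equal to, the global percentile; that small error is the price paid for never gathering the full dataset onto one worker.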
- climtas.blocked.dask_approx_percentile(array: dask.array.routines.array, pcts, axis: int, interpolation='linear', skipna=True)[source]#
Get the approximate percentiles of a Dask array along ‘axis’, using the ‘dask’ method of
dask.array.percentile()
.
- Parameters
array – Dask Nd array
pcts – List of percentiles to calculate, within the interval [0,100]
axis – Axis to reduce
skipna – Ignore NaN values (like
numpy.nanpercentile()
) if true
- Returns
Dask array with first axis the percentiles from ‘pcts’, remaining axes from ‘array’ reduced along ‘axis’
resample#
- climtas.blocked.blocked_resample(da: xarray.core.dataarray.DataArray, indexer=None, **kwargs) climtas.blocked.BlockedResampler [source]#
Create a blocked resampler
Mostly works like
xarray.resample()
, however unlike Xarray’s resample this will maintain the same number of Dask chunks
The input data is grouped into blocks of length count along dim for further operations (see
BlockedResampler
)
Count must evenly divide the size of each block along the target axis
>>> time = pandas.date_range('20010101', '20010110', freq='H', closed='left')
>>> hourly = xarray.DataArray(numpy.random.random(time.size), coords=[('time', time)])

>>> blocked_daily_max = blocked_resample(hourly, time='1D').max()
>>> xarray_daily_max = hourly.resample(time='1D').max()
>>> xarray.testing.assert_identical(blocked_daily_max, xarray_daily_max)

>>> blocked_daily_max = blocked_resample(hourly, time=24).max()
>>> xarray_daily_max = hourly.resample(time='1D').max()
>>> xarray.testing.assert_identical(blocked_daily_max, xarray_daily_max)
- Parameters
da (
xarray.DataArray
) – Resample target
indexer/kwargs (Dict[dim, count]) – Mapping of dimension name to count along that axis. May be an integer or a time interval understood by pandas (that interval must evenly divide the dataset).
- Returns
A BlockedResampler for the given counts
- class climtas.blocked.BlockedResampler(da: xarray.core.dataarray.DataArray, dim: str, count: int)[source]#
A blocked resampling operation, created by
blocked_resample()
Works like
xarray.core.resample.DataArrayResample
, with the constraint that the resampling is a regular interval, and that the resampling interval evenly divides the length along dim of every Dask chunk in da.
The benefit of this restriction is that no extra Dask chunks are created by the resampling, which is important for large datasets.
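When the interval evenly divides each chunk, blocked resampling reduces to a reshape plus a reduction within each chunk, which is why no chunks need to be split or merged. A NumPy-only sketch of the trick (conceptual, not the climtas implementation):

```python
import numpy as np

# Ten days of hourly samples as one long time axis of length 240.
hourly = np.arange(240, dtype=float)

# Resampling to daily with count=24 is a reshape plus a reduction:
# no data moves between blocks, so Dask chunk counts are preserved.
daily_max = hourly.reshape(-1, 24).max(axis=1)
```

Each row of the reshaped array is one day's 24 hours, and reducing along that row-local axis yields the daily maximum without any cross-chunk communication.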
- max() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.max()
- mean() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.mean()
- min() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.min()
- nanmax() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.nanmax()
- nanmin() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.nanmin()
- reduce(op: Callable, **kwargs) xarray.core.dataarray.DataArray [source]#
Apply an arbitrary operation to each resampled group
The function op is applied to each group. The grouping axis is given by axis; this axis should be reduced out by op (e.g. like
numpy.mean()
does)
- Parameters
op ((
numpy.array
, axis, **kwargs) -> numpy.array
) – Function to reduce out the resampled dimension
**kwargs – Passed to op
- Returns
A resampled
xarray.DataArray
, where every self.count values along self.dim have been reduced by op
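Any NumPy-style reduction that accepts an `axis` argument fits the op signature above. A sketch with a hypothetical helper (`resample_reduce` is illustrative only, not part of climtas) showing the daily range via `numpy.ptp`:

```python
import numpy as np

# Two days of hourly samples.
hourly = np.arange(48, dtype=float)

def resample_reduce(values, count, op, **kwargs):
    """Hypothetical helper: group 'values' into runs of 'count' and
    reduce each group along the grouping axis with 'op'."""
    return op(values.reshape(-1, count), axis=1, **kwargs)

# np.ptp (max - min) reduces out the grouping axis, giving the daily range.
daily_range = resample_reduce(hourly, 24, np.ptp)
```

Because op receives the grouping axis explicitly, anything from `numpy.median` to a custom ufunc-based reduction can be swapped in the same way.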
- sum() xarray.core.dataarray.DataArray [source]#
Reduce the samples using
numpy.sum()