Given a set of data (data) and a statistics function (statfunction) that
applies to that data, computes the bootstrap confidence interval for
statfunction on that data. Data points are assumed to be delineated by
axis 0.
- data: array_like, shape (N, ...) OR tuple of array_like all with shape (N, ...)
- Input data. Data points are assumed to be delineated by axis 0. Beyond this,
the shape doesn’t matter, so long as statfunction can be applied to the
array. If a tuple of array_likes is passed, then samples from each array (along
axis 0) are passed in order as separate parameters to the statfunction. The
type of data (single array or tuple of arrays) can be explicitly specified
by the multi parameter.
- statfunction: function (data, weights=(weights, optional)) -> value
- This function should accept samples drawn from data. It is applied
to these samples individually.
If using the ABC method, the function _must_ accept a named weights
parameter which will be an array_like with weights for each sample, and
must return a _weighted_ result. Otherwise this parameter is not used
or required. Note that numpy’s np.average accepts this. (default=np.average)
- alpha: float or iterable, optional
- The percentiles to use for the confidence interval (default=0.05). If this
is a float, the returned values are (alpha/2, 1-alpha/2) percentile confidence
intervals. If it is an iterable, alpha is assumed to be an iterable of
each desired percentile.
- n_samples: int, optional
- The number of bootstrap samples to use (default=10000)
- method: string, optional
- The method to use: one of ‘pi’, ‘bca’, or ‘abc’ (default=‘bca’)
- output: string, optional
- The format of the output. ‘lowhigh’ gives low and high confidence interval
values. ‘errorbar’ gives transposed abs(value-confidence interval value) values
that are suitable for use with matplotlib’s errorbar function. (default=‘lowhigh’)
- epsilon: float, optional (only for ABC method)
- The step size for finite difference calculations in the ABC method. Ignored for
all other methods. (default=0.001)
- multi: boolean, optional
- If False, assume data is a single array. If True, assume data is a tuple/other
iterable of arrays of the same length that should be sampled together. If None,
decide based on whether the data is an actual tuple. (default=None)
- confidences: tuple of floats
- The return value: the confidence interval values at the percentiles
specified by alpha
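As an illustration of the parameter conventions above, the following minimal sketch shows how a float or iterable alpha might map to percentiles, how ‘errorbar’ offsets relate to ‘lowhigh’ values, and what a statfunction with a named weights parameter can look like. The helper names (alpha_to_percentiles, to_errorbar, weighted_var) are hypothetical and are not part of the documented function; this is an assumption-laden sketch, not its actual implementation.

```python
import numpy as np

def alpha_to_percentiles(alpha):
    """Hypothetical helper: the percentiles implied by alpha."""
    if np.iterable(alpha):
        # An iterable is taken as the list of desired percentiles.
        return np.asarray(list(alpha), dtype=float)
    # A float gives the two-sided (alpha/2, 1-alpha/2) interval.
    return np.array([alpha / 2, 1 - alpha / 2])

def to_errorbar(value, lowhigh):
    """Hypothetical helper: transposed abs(value - interval value) offsets."""
    offsets = np.abs(value - np.asarray(lowhigh, dtype=float))
    # Shape (2, 1) for a scalar statistic: lower offset row, upper offset row.
    return offsets[np.newaxis].T

def weighted_var(data, weights=None):
    """Example statistic that accepts a named weights parameter.

    With weights=None, np.average falls back to the unweighted mean.
    """
    data = np.asarray(data, dtype=float)
    mean = np.average(data, weights=weights)
    return np.average((data - mean) ** 2, weights=weights)
```

A statistic written like weighted_var returns a weighted result whenever weights are supplied, which is the property the ABC method is described as requiring.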
- ‘pi’: Percentile Interval (Efron 13.3)
- The percentile interval method simply returns the 100*alpha-th percentiles of
the bootstrap samples’ values for the statistic. This is an extremely simple method of
confidence interval calculation. However, it has several disadvantages
compared to the bias-corrected accelerated method, which is the default.
- ‘bca’: Bias-Corrected Accelerated Non-Parametric (Efron 14.3) (default)
- This method is much more complex to explain. However, it gives considerably
better results, and is generally recommended for normal situations. Note
that in cases where the statistic is smooth, and can be expressed with
weights, the ABC method will give approximated results much, much faster.
- ‘abc’: Approximate Bootstrap Confidence (Efron 14.4, 22.6)
- This method provides approximated bootstrap confidence intervals without
actually taking bootstrap samples. This requires that the statistic be
smooth, and allow for weighting of individual points with a weights=
parameter (note that np.average allows this). This is _much_ faster
than all other methods for situations where it can be used.
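To make the difference between the first two methods concrete, here is a minimal sketch of the percentile and BCa calculations for a scalar statistic, written from the standard textbook formulas. The function names (bootstrap_stats, percentile_interval, bca_interval) and the seed parameter are assumptions made for this illustration; this is not the documented function’s implementation, and it ignores multi-dimensional statistics and tuple data.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_stats(data, statfunction, n_samples, rng):
    """Sorted statistic values over resamples drawn with replacement along axis 0."""
    data = np.asarray(data)
    n = data.shape[0]
    idx = rng.integers(0, n, size=(n_samples, n))
    return np.sort([statfunction(data[i]) for i in idx])

def percentile_interval(data, statfunction=np.average, alpha=0.05,
                        n_samples=10000, seed=0):
    """Percentile interval: quantiles of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    boot = bootstrap_stats(data, statfunction, n_samples, rng)
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

def bca_interval(data, statfunction=np.average, alpha=0.05,
                 n_samples=10000, seed=0):
    """BCa interval: percentile levels adjusted for bias and acceleration."""
    data = np.asarray(data)
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    boot = bootstrap_stats(data, statfunction, n_samples, rng)
    norm = NormalDist()

    # Bias correction: how far the bootstrap median sits from the estimate.
    theta_hat = statfunction(data)
    z0 = norm.inv_cdf(float(np.mean(boot < theta_hat)))

    # Acceleration, estimated from leave-one-out (jackknife) statistics.
    jack = np.array([statfunction(np.delete(data, i, axis=0)) for i in range(n)])
    diff = jack.mean() - jack
    a = np.sum(diff ** 3) / (6.0 * np.sum(diff ** 2) ** 1.5)

    # Adjust each nominal level, then read it off the bootstrap distribution.
    levels = []
    for p in (alpha / 2, 1 - alpha / 2):
        z = z0 + norm.inv_cdf(p)
        levels.append(norm.cdf(z0 + z / (1 - a * z)))
    return np.quantile(boot, levels)
```

For a symmetric sample and a statistic such as the mean, both corrections are close to zero and the two intervals nearly coincide; they diverge for skewed data or biased statistics.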
To calculate the confidence intervals for the mean of some numbers:
>>> boot.ci(np.random.randn(100), np.average)
Given some data points in arrays x and y, calculate the confidence intervals
for all linear regression coefficients simultaneously:
>>> boot.ci((x, y), scipy.stats.linregress)
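The tuple form of data means the arrays are resampled together, so each (x, y) pair stays intact. The sketch below illustrates that idea with plain NumPy and a simple percentile interval; resample_together and slope_intercept are hypothetical helpers, np.polyfit stands in for scipy.stats.linregress, and the synthetic x and y are made up for the example.

```python
import numpy as np

def resample_together(arrays, rng):
    """Draw one set of row indices and apply it to every array."""
    n = len(arrays[0])
    idx = rng.integers(0, n, size=n)
    return tuple(np.asarray(a)[idx] for a in arrays)

def slope_intercept(x, y):
    """Statistic taking the arrays as separate parameters, in order."""
    slope, intercept = np.polyfit(x, y, 1)
    return np.array([slope, intercept])

rng = np.random.default_rng(0)

# Synthetic data: a noisy line with slope 2 and intercept 1.
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Bootstrap distribution of both coefficients at once.
boot_stats = np.array([
    slope_intercept(*resample_together((x, y), rng))
    for _ in range(2000)
])

# Percentile interval per coefficient: one low and one high value each.
low, high = np.quantile(boot_stats, [0.025, 0.975], axis=0)
```

Here low and high each hold one value per coefficient, which is what "simultaneously" amounts to in this simplified setting.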
Efron, B. and Tibshirani, R. J., An Introduction to the Bootstrap. Chapman & Hall, 1993.