@uwdata/kde

0.0.2 • Public • Published

Fast Gaussian Kernel Density Estimation

Fast Gaussian kernel density estimation in 1D or 2D. Uses Deriche's approximation for accurate, linear-time O(N + K) estimation.
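
For comparison, the quantity being approximated can be computed directly by summing one Gaussian kernel per data point at every sample location, which costs O(N * K) rather than O(N + K). The sketch below is a naive reference only: naiveDensity1d is a hypothetical helper that is not part of this package, and it assumes sample points that include both ends of the extent.

```javascript
// Naive (exact) Gaussian KDE, for reference only: O(N * K) work.
// naiveDensity1d is a hypothetical helper, not part of @uwdata/kde.
function naiveDensity1d(data, bandwidth, extent, size) {
  const [lo, hi] = extent;
  const step = (hi - lo) / (size - 1); // assumes endpoint-inclusive samples
  const norm = 1 / (data.length * bandwidth * Math.sqrt(2 * Math.PI));
  const points = [];
  for (let i = 0; i < size; ++i) {
    const x = lo + i * step;
    let sum = 0;
    for (const d of data) {
      const z = (x - d) / bandwidth;
      sum += Math.exp(-0.5 * z * z); // unnormalized Gaussian kernel
    }
    points.push({ x, y: sum * norm });
  }
  return points;
}

const reference = naiveDensity1d([1, 2, 5, 5, 6, 9], 1, [0, 10], 11);
```

A reference like this can be useful for spot-checking approximate results on small inputs.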

All code is written as ES modules.

Build Instructions

To build a bundle (ESM module or minified UMD):

  1. Run yarn to install dependencies.
  2. Run yarn build to build the bundles.

Compiled bundles will be written to the dist directory.

API Documentation

1D Density Estimation


# kde.density1d(data[, options])

Creates a new 1D density estimator for the input data. Returns an estimator object that includes the methods listed below, and also provides an iterator over resulting density points.

  • data: An array of input data values for which to perform density estimation. The array values may be numbers or objects.
  • options: Options for configuring density estimation.
    • x: An accessor function for input values. By default this is the identity function, corresponding to data as an array of numbers. If the x option is not function valued, it will be treated as a key to look up on entries of the input data.
    • weight: An accessor function for weights. By default all input points are given the same weight. The weight values must sum to one. If the weight option is not function valued, it will be treated as a key to look up on entries of the input data.
    • bandwidth: The kernel bandwidth (standard deviation) to use. If unspecified, the bandwidth is automatically calculated using the nrd heuristic and the adjust option.
    • adjust: A fractional value by which to scale (adjust) an automatically calculated bandwidth. For example, an adjust value of 0.5 will result in half the automatically-determined bandwidth. This option is ignored if the bandwidth option is specified.
    • extent: The extent over which to compute kernel density estimation as a two-element array. Note that input data values outside the extent are ignored, potentially resulting in inaccurate densities relative to the full data. If unspecified, the extent is automatically calculated based on the input data extent and the pad option.
    • pad: The amount (in kernel bandwidths) by which to extend an automatically-calculated extent. The default value is 3, capturing 99% of the density from the most extreme points. Set this value to 0 to trim the density estimate to the minimum and maximum observed data points. This option is ignored if the extent option is provided.
    • size: The size (number of bins) to use for the internal grid. The default is 512 bins. The returned density estimate will include a total of size equally-spaced sample points over the extent.
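
The accessor and weight options above can be illustrated with a small sketch. The helpers toAccessor and normalizeWeights are hypothetical: they mirror the documented semantics (a non-function option is treated as a lookup key, and weights must sum to one) but are not taken from the package source.

```javascript
// Hypothetical helpers mirroring the documented option semantics.
// They are not part of @uwdata/kde.
function toAccessor(option, fallback) {
  if (option == null) return fallback;
  // A non-function option is treated as a key into each datum.
  return typeof option === 'function' ? option : d => d[option];
}

// Returns copies of the data with a new property holding weights
// rescaled so that they sum to one.
function normalizeWeights(data, weight, key = 'w') {
  const get = toAccessor(weight, () => 1);
  const total = data.reduce((sum, d) => sum + get(d), 0);
  return data.map(d => ({ ...d, [key]: get(d) / total }));
}

const rows = [
  { value: 1, count: 2 },
  { value: 5, count: 5 },
  { value: 9, count: 3 }
];
const weighted = normalizeWeights(rows, 'count');
// The result could then be passed along the lines of:
// kde.density1d(weighted, { x: 'value', weight: 'w' })
```

Normalizing up front keeps raw counts in the data while satisfying the sum-to-one requirement.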

Example

// perform 1D estimation with bandwidth = 1 over domain [0, 10]
// returns an iterator over [ { x, y }, ... ] points
kde.density1d([1, 2, 5, 5, 6, 9], { bandwidth: 1, extent: [0, 10] })

# density1d.grid()

Returns the internal grid array of total accumulated density values per bin. To instead produce an array of objects containing coordinate values and probability density function estimates, use density1d.points().
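
One way to sanity-check probability density estimates is to integrate them over the extent: for a sufficiently wide extent the area should be close to one. The sketch below applies the trapezoid rule to any iterable of { x, y } objects. The integrate helper and the stand-in uniform data are illustrative assumptions, not output from this package.

```javascript
// Trapezoid-rule integration over an iterable of { x, y } points.
// integrate is a hypothetical helper, not part of @uwdata/kde.
function integrate(points, x = 'x', y = 'y') {
  let area = 0;
  let prev = null;
  for (const p of points) {
    if (prev) area += 0.5 * (p[y] + prev[y]) * (p[x] - prev[x]);
    prev = p;
  }
  return area;
}

// Stand-in data: a uniform density of 0.1 over [0, 10].
const uniform = Array.from({ length: 11 }, (_, i) => ({ x: i, y: 0.1 }));
const area = integrate(uniform);
```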

# density1d.points([x, y])

Returns an iterator over objects containing a sample point (x) and density value (y).

  • x: The property name for the sample point (default "x").
  • y: The property name for the estimated density value (default "y").
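
Because the result is an iterator, it can be consumed once with for...of or collected with Array.from. As a sketch, the hypothetical peak helper below scans any iterable of point objects for the sample with the highest density. The generator stands in for an estimator's output, and the property names value and density are illustrative choices.

```javascript
// Finds the point with the largest density value in an iterable.
// peak is a hypothetical helper, not part of @uwdata/kde.
function peak(points, y = 'y') {
  let best = null;
  for (const p of points) {
    if (best === null || p[y] > best[y]) best = p;
  }
  return best;
}

// Stand-in for an iterator over points using custom property names.
function* fakePoints() {
  yield { value: 0, density: 0.1 };
  yield { value: 1, density: 0.4 };
  yield { value: 2, density: 0.2 };
}

const mode = peak(fakePoints(), 'density');
```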

# density1d.bandwidth([bandwidth])

Get or set the bandwidth (standard deviation) of the Gaussian kernel. Setting the bandwidth will update the estimator efficiently without re-performing binning. The extent will remain unchanged, even if previously determined automatically.
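
To see why a fixed extent matters, consider how an automatic extent depends on the bandwidth. The sketch below assumes the extent is the data range widened by pad bandwidths on each side, as the pad option describes. paddedExtent is a hypothetical helper, and the library's exact computation may differ.

```javascript
// Data range extended by `pad` bandwidths on each side.
// paddedExtent is a hypothetical helper, not part of @uwdata/kde.
function paddedExtent(values, bandwidth, pad = 3) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return [lo - pad * bandwidth, hi + pad * bandwidth];
}

const samples = [1, 2, 5, 5, 6, 9];
const initialExtent = paddedExtent(samples, 1);    // extent for bandwidth 1
const recomputedExtent = paddedExtent(samples, 2); // extent a fresh estimator might use
```

Under this assumption, raising the bandwidth from 1 to 2 on an existing estimator would keep the narrower initial extent, so more of the kernel mass from extreme points could fall outside it. Passing an explicit extent up front is one way to avoid that.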

2D Density Estimation


# kde.density2d(data[, options])

Creates a new 2D density estimator for the input data. Returns an estimator object that includes the methods listed below, and also provides an iterator over resulting density points.

  • data: An array of input data values for which to perform density estimation.
  • options: Options for configuring density estimation.
    • x: An accessor function for x-dimension input values. The default retrieves index 0. If the x option is not function valued, it will be treated as a key to look up on entries of the input data.
    • y: An accessor function for y-dimension input values. The default retrieves index 1. If the y option is not function valued, it will be treated as a key to look up on entries of the input data.
    • weight: An accessor function for weights. By default all input points are given the same weight. The weight values must sum to one. If the weight option is not function valued, it will be treated as a key to look up on entries of the input data.
    • bandwidth: The kernel bandwidths (standard deviations) to use. If array-valued, specifies the x- and y-bandwidths separately. If number-valued, sets both x- and y-bandwidths to the same value. If unspecified, the bandwidths are automatically calculated per dimension using the nrd heuristic and the adjust option.
    • adjust: A fractional value by which to scale (adjust) an automatically calculated bandwidth. For example, an adjust value of 0.5 will result in half the automatically-determined bandwidth. This option is ignored if the bandwidth option is specified.
    • extent: The extent over which to compute kernel density estimation along both the x- and y-dimensions. If an array of arrays is provided, specifies the x- and y-extents separately. If a single two-number array is provided, sets both x- and y-extents to the same value. Note that input data values outside the extent are ignored, potentially resulting in inaccurate densities relative to the full data. If unspecified, the extent is automatically calculated based on the input data extent and the pad option.
    • pad: The amount (in kernel bandwidths) by which to extend an automatically-calculated extent. The default value is 3, capturing 99% of the density from the most extreme points. Set this value to 0 to trim the density estimate to the minimum and maximum observed data points. This option is ignored if the extent option is provided.
    • size: The size (number of bins) to use for the internal grid. The default is [256, 256] bins. If array-valued, specifies the x- and y-sizes separately. If number-valued, sets both x- and y-sizes to the same value. The returned density estimate will include a total of size[0] * size[1] equally-spaced sample points over the extent.
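
Several 2D options accept either a single value or a per-dimension pair. The sketch below shows one way such values could be expanded to explicit x and y entries. toPair and toExtentPair are hypothetical helpers written from the descriptions above, not the package's internal code.

```javascript
// Expands a scalar to [value, value]; copies an existing pair.
// Hypothetical helper, not part of @uwdata/kde.
function toPair(value) {
  return Array.isArray(value) ? [value[0], value[1]] : [value, value];
}

// Expands a single [min, max] extent to [[min, max], [min, max]];
// leaves an array of arrays as separate x- and y-extents.
function toExtentPair(extent) {
  return Array.isArray(extent[0])
    ? [extent[0], extent[1]]
    : [extent, extent];
}

const bandwidths = toPair(1);               // same bandwidth for x and y
const sizes = toPair([256, 128]);           // separate x- and y-sizes
const extents = toExtentPair([0, 10]);      // same extent for x and y
const sampleCount = sizes[0] * sizes[1];    // total number of sample points
```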

Example

// perform 2D estimation with bandwidths [1, 1] over extent [[0, 10], [0, 10]]
// use default grid size ([256, 256])
// returns an iterator over [ { x, y, z }, ... ] points
const data = [[1, 1], [1, 2], [5, 4], [5, 3], [6, 2], [8, 7]];
kde.density2d(data, { bandwidth: 1, extent: [0, 10] })

// perform 2D estimation with different bandwidths and extent for x and y
// (reuses the data array defined above)
// returns an iterator over [ { x, y, z }, ... ] points
kde.density2d(data, { bandwidth: [1, 0.5], extent: [[1, 9], [1, 8]] })

# density2d.grid()

Returns the internal grid array of total accumulated density values per bin. To instead produce an array of objects containing coordinate values and probability density function estimates, use density2d.points().

# density2d.points([x, y, z])

Returns an iterator over objects containing sample points (x, y) and density value (z).

  • x: The property name for the x-dimension sample point (default "x").
  • y: The property name for the y-dimension sample point (default "y").
  • z: The property name for the estimated density value (default "z").

# density2d.bandwidth([bandwidth])

Get or set the bandwidths (standard deviations) of the Gaussian kernel. If array-valued, specifies the x- and y-bandwidths separately. If number-valued, sets both x- and y-bandwidths to the same value. Setting the bandwidth will update the estimator efficiently without re-performing binning. The extent will remain unchanged, even if previously determined automatically.

# density2d.heatmap([options])

Generates a heatmap image of the 2D density. Returns an HTML canvas element.

  • options: Options for heatmap image generation.
    • color: A color function that maps density values (normalized to the domain [0, 1]) to RGB color objects with r, g, b, and opacity properties.
    • clamp: Sets the range of density values to a given [min, max] array. Values below the minimum or above the maximum will be clamped to the provided values. Values within the clamped range are then normalized to the domain [0, 1].
    • canvas: An existing canvas element to draw into. By default a new canvas instance is created with dimensions matching the density estimator size option.
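
The clamp and color options can be pictured as a two-step mapping: clamp a density value to [min, max], rescale it to [0, 1], then convert it to a color object. The sketch below is an illustration under that reading. normalizeDensity and grayscale are hypothetical names, and the library's own normalization may differ in detail.

```javascript
// Clamps a value to [min, max], then rescales it to [0, 1].
// Hypothetical helper, not part of @uwdata/kde.
function normalizeDensity(value, [min, max]) {
  const clamped = Math.min(max, Math.max(min, value));
  return max > min ? (clamped - min) / (max - min) : 0;
}

// Example color function: maps t in [0, 1] to a gray level that
// darkens and becomes more opaque as density increases.
function grayscale(t) {
  const v = Math.round(255 * (1 - t));
  return { r: v, g: v, b: v, opacity: t };
}

const darkest = grayscale(normalizeDensity(0.08, [0, 0.05]));
// A function like grayscale could be supplied as the color option, e.g.:
// estimator.heatmap({ color: grayscale, clamp: [0, 0.05] })
```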

Utility Methods


# kde.nrd(data, accessor)

Calculates a suggested bandwidth for a set of numeric data values, using Scott's normal reference distribution (NRD) heuristic.
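
As a rough illustration, Scott's rule as implemented by R's bw.nrd is 1.06 * min(sd, IQR / 1.34) * n^(-1/5). The sketch below follows that formula for a plain array of numbers. nrdSketch is a hypothetical stand-in, and the quantile method and edge-case handling are assumptions that may not match kde.nrd exactly.

```javascript
// Linear-interpolation quantile of an ascending-sorted array.
function quantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const i = Math.floor(h);
  const a = sorted[i];
  const b = sorted[Math.min(i + 1, sorted.length - 1)];
  return a + (b - a) * (h - i);
}

// Scott's normal reference rule, following R's bw.nrd formula.
// nrdSketch is a hypothetical stand-in, not the package's kde.nrd.
function nrdSketch(values) {
  const sorted = values.filter(v => Number.isFinite(v)).sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const variance = sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance); // sample standard deviation
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  // Fall back to sd, then 1, if the robust spread is zero or undefined.
  const spread = Math.min(sd, iqr / 1.34) || sd || 1;
  return 1.06 * spread * Math.pow(n, -0.2);
}

const suggested = nrdSketch([1, 2, 3, 4, 5]);
```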

Package Sidebar

Install

npm i @uwdata/kde

Version

0.0.2

License

BSD-3-Clause

Collaborators

  • luke-s-snyder
  • jheer