Aggregator Interface
October 4, 2024 ยท View on GitHub
The Aggregator interface describes a type of class that performs aggregation.
Terminology
Aggregation is a 2-step process:
- Sort: Group a collection of data points by some property into bins.
- Aggregate: for each bin, calculate a numeric output (result) from some metrics (values) from all its members. Multiple results can be obtained independently (channels).
An implementation of the Aggregator interface takes the following inputs:
- The number of data points
- The group that each data point belongs to, by mapping each data point to a binId (array of integers)
- The values to aggregate, by mapping each data point in each channel to one value (number)
- The method (operation) to reduce a list of values to one number, such as SUM
And yields the following outputs:
- A list of binIds that data points get sorted into
- The aggregated values (result) as a list of numbers, comprised of one number per bin per channel
- The [min, max] among all aggregated values (domain) for each channel
Example
Consider the task of making a histogram that shows the result of a survey by age distribution.
- The data points are the list of participants, and we know the age of each person.
- Suppose we want to group them by 5-year intervals. A 21-year-old participant is assigned to the bin of age 20-25, with binId
[20]. A 35-year-old participant is assigned to the bin of age 35-40, with binId[35], and so on. - For each bin (i.e. age group), we calculate 2 values:
- The first channel is "number of participants". Each participant in this group yields a value of 1, and the result equals all values added together (operation: SUM).
- The second channel is "average score". Each participant in this group yields a value that is their test score, and the result equals the sum of all scores divided by the number of participants (operation: MEAN).
- As the outcome of the aggregation, we have:
- Bins:
[15, 20, 25, 30, 35, 40] - Channel 0 result:
[1, 5, 12, 10, 8, 3] - Channel 0 domain:
[1, 12] - Channel 1 result:
[6, 8.2, 8.5, 7.9, 7.75, 8] - Channel 1 domain:
[6, 8.5]
- Bins:
Methods
An implementation of Aggregator should expose the following methods:
setProps {#setprops}
Set runtime properties of the aggregation.
aggregator.setProps({
pointCount: 10000,
attributes: {...},
operations: ['SUM', 'MEAN'],
binOptions: {groupSize: 5}
});
Arguments:
pointCount(number) - number of data points.attributes(Attribute[]) - the input data.operations(string[]) - How to aggregate the values inside a bin, defined per channel.binOptions(object) - arbitrary settings that affect bin sorting.onUpdate(Function) - callback when a channel has been recalculated. Receives the following arguments:channel(number) - the channel that just updated
setNeedsUpdate {#setneedsupdate}
Flags a channel to need update. This could be a result of change in the input data or bin options.
aggregator.setNeedsUpdate(0);
Arguments:
channel(number, optional) - mark the given channel as dirty. If not provided, all channels will be updated.
update {#update}
Called after all props are set and before results are accessed. The aggregator should allocate resources and redo aggregations if needed at this stage.
aggregator.update();
preDraw {#predraw}
Called before the result buffers are drawn to screen. Certain types of aggregations are dependent on render time context and this is alternative opportunity to update just-in-time.
aggregator.preDraw();
getBin {#getbin}
Get the information of a given bin.
const bin = aggregator.getBin(100);
Arguments:
index(number) - index of the bin to locate it ingetBins()
Returns:
id(number[]) - Unique bin ID.value(number[]) - Aggregated values by channel.count(number) - Number of data points in this bin.pointIndices(number[] | undefined) - Indices of data points in this bin if available. This field may not be populated when using GPU-based implementations.
getBins {#getbins}
Get an accessor to all bin IDs.
const binIdsAttribute = aggregator.getBins();
Returns:
- A binary attribute of the output bin IDs, or
- null, if
updatehas never been called
getResult {#getresult}
Get an accessor to the aggregated values of a given channel.
const resultAttribute = aggregator.getResult(0);
Arguments:
channel(number) - the channel to retrieve results from
Returns:
- A binary attribute of the output values of the given channel, or
- null, if
updatehas never been called
getResultDomain {#getresultdomain}
Get the [min, max] of aggregated values of a given channel.
const [min, max] = aggregator.getResultDomain(0);
Arguments:
channel(number) - the channel to retrieve results from
Returns the domain ([number, number]) of the aggregated values of the given channel.
destroy {#destroy}
Dispose all allocated resources.
aggregator.destroy();
Members
An implementation of Aggregator should expose the following members:
binCount (number) {#bincount}
The number of bins in the aggregated result.
Source
modules/aggregation-layers/src/common/aggregator/aggregator.ts