API.md
March 15, 2026
Contents
- Initialization
- Aggregate state
- Mathematical functions
- Vectorized mathematical functions
- Updating a TDigest
- Merging TDigests
- Serialization
- Comparison
- Other methods and properties
Initialization
TDigest()
Creates a new TDigest instance.
from fastdigest import TDigest
digest = TDigest()
print(digest)
TDigest(max_centroids=1000)
Note: The max_centroids parameter controls how large the data structure is allowed to grow. A lower value means more compression, giving a smaller memory footprint and faster computation at the cost of some precision. The default value of 1000 offers a good balance of speed and precision.
Setting max_centroids to 0 disables compression entirely. This incurs a significant performance cost on all operations and is not recommended.
TDigest.from_values(x, w=None)
Creates a TDigest directly from any sequence of numeric values x. The optional weights w can be either a sequence of the same length as x, or a scalar that will be used as the weight for the entire batch.
Static method.
import numpy as np
from fastdigest import TDigest
digest = TDigest.from_values([2.71, 3.14, 1.42]) # from list
digest = TDigest.from_values((42,)) # from tuple
digest = TDigest.from_values(range(101)) # from range
digest = TDigest.from_values([1, 2], w=[1, 2]) # weighted individually
digest = TDigest.from_values([1, 2], w=2.0) # weighted with scalar
data = np.random.random(10_000)
digest = TDigest.from_values(data) # from NumPy array
print(f"{digest}: {len(digest)} centroids from {digest.n_values} values")
TDigest(max_centroids=1000): 988 centroids from 10000 values
Aggregate state
self.mass()
Returns the total weight (mass) of all ingested values.
Equivalent to float(n_values) if no weighted updates were used.
digest = TDigest.from_values(range(11))
print(f"Mass: {digest.mass()}")
Mass: 11.0
self.sum()
Returns the sum of all ingested values.
digest = TDigest.from_values(range(11))
print(f"Sum: {digest.sum()}")
Sum: 55.0
self.min()
Returns the lowest ingested value.
digest = TDigest.from_values(range(11))
print(f"Minimum: {digest.min()}")
Minimum: 0.0
self.max()
Returns the highest ingested value.
digest = TDigest.from_values(range(11))
print(f"Maximum: {digest.max()}")
Maximum: 10.0
Mathematical functions
self.quantile(q)
Estimates the value at the quantile q (between 0 and 1).
Inverse function of cdf(x).
Also available as vectorized quantile_vec(q).
from fastdigest import TDigest
import numpy as np
normally_distributed_data = np.random.normal(0, 1, 10_000)
digest = TDigest.from_values(normally_distributed_data)
print(f" Median: {digest.quantile(0.5):.3f}")
print(f"99th percentile: {digest.quantile(0.99):.3f}")
Median: 0.001
99th percentile: 2.274
self.percentile(p)
Estimates the value at the pth percentile.
Alias for quantile(p/100).
digest = TDigest.from_values(normally_distributed_data)
print(f" Median: {digest.percentile(50):.3f}")
print(f"99th percentile: {digest.percentile(99):.3f}")
Median: 0.001
99th percentile: 2.274
self.median()
Estimates the median value.
Alias for quantile(0.5).
digest = TDigest.from_values(normally_distributed_data)
print(f"Median: {digest.median():.3f}")
Median: 0.001
self.iqr()
Estimates the interquartile range (IQR).
Alias for quantile(0.75) - quantile(0.25).
digest = TDigest.from_values(normally_distributed_data)
print(f"IQR: {digest.iqr():.3f}")
IQR: 1.334
self.cdf(x)
Estimates the relative rank (cumulative probability) of the value x.
Inverse function of quantile(q).
Also available as vectorized cdf_vec(x).
digest = TDigest.from_values(normally_distributed_data)
print(f"cdf(0.0) = {digest.cdf(0.0):.3f}")
print(f"cdf(1.0) = {digest.cdf(1.0):.3f}")
cdf(0.0) = 0.500
cdf(1.0) = 0.846
self.probability(x1, x2)
Estimates the probability of finding a value in the interval [x1, x2].
Alias for cdf(x2) - cdf(x1).
digest = TDigest.from_values(normally_distributed_data)
prob = digest.probability(-2.0, 2.0)
prob_pct = 100 * prob
print(f"Probability of value between ±2: {prob_pct:.1f}%")
Probability of value between ±2: 95.4%
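Because the examples in this section draw a random sample, the printed estimates vary slightly from run to run. As a rough reference, the exact values for a standard normal distribution can be computed with Python's statistics.NormalDist. This sketch does not use fastdigest, and the variable names are illustrative only.

```python
from statistics import NormalDist

# Exact standard normal distribution, used only as a reference
# for the sample-based estimates shown in this section.
std_normal = NormalDist(mu=0.0, sigma=1.0)

exact_median = std_normal.inv_cdf(0.5)
exact_p99 = std_normal.inv_cdf(0.99)
exact_iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
exact_cdf_1 = std_normal.cdf(1.0)
# Same identity as probability(x1, x2): cdf(x2) - cdf(x1)
exact_prob_2 = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

print(f"Median:          {exact_median:.3f}")
print(f"99th percentile: {exact_p99:.3f}")
print(f"IQR:             {exact_iqr:.3f}")
print(f"cdf(1.0):        {exact_cdf_1:.3f}")
print(f"P(-2 <= x <= 2): {100 * exact_prob_2:.1f}%")
```

Estimates from a sample of 10,000 values should land close to these numbers, but not exactly on them.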
self.mean()
Calculates the arithmetic mean of the distribution as sum()/mass().
digest = TDigest.from_values(range(11))
print(f"Mean value: {digest.mean()}")
Mean value: 5.0
self.trimmed_mean(q1, q2)
Estimates the truncated mean between the two quantiles q1 and q2.
data = list(range(11))
data[-1] = 100_000 # extreme outlier
digest = TDigest.from_values(data)
mean = digest.mean()
trimmed_mean = digest.trimmed_mean(0.1, 0.9)
print(f" Mean: {mean}")
print(f"Trimmed mean: {trimmed_mean}")
Mean: 9095.0
Trimmed mean: 5.0
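To make the idea of a truncated mean concrete, here is a minimal sketch that computes it exactly on raw data. The function name exact_trimmed_mean is hypothetical, and the way partial weights are assigned at the cut points is an assumption; fastdigest works on centroids and may interpolate differently.

```python
def exact_trimmed_mean(values, q1, q2):
    """Mean of the mass between quantiles q1 and q2, computed on raw data."""
    data = sorted(values)
    n = len(data)
    lo, hi = q1 * n, q2 * n  # cut points in units of cumulative weight
    total = 0.0
    weight = 0.0
    for i, v in enumerate(data):
        # Each value occupies the rank interval [i, i + 1); keep only the
        # part of that interval that lies between the two cut points.
        overlap = min(hi, i + 1) - max(lo, i)
        if overlap > 0:
            total += overlap * v
            weight += overlap
    return total / weight  # raises ZeroDivisionError if q1 >= q2


data = list(range(11))
data[-1] = 100_000  # extreme outlier

plain_mean = sum(data) / len(data)
trimmed = exact_trimmed_mean(data, 0.1, 0.9)
print(f"        Mean: {plain_mean}")
print(f"Trimmed mean: {trimmed:.3f}")
```

With the outlier in the top 10% of the mass, it contributes nothing to the trimmed result, which is why the two means differ so strongly.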
self.mad()
Estimates the median absolute deviation (MAD) of the distribution.
digest = TDigest.from_values(range(101))
print(f"MAD: {digest.mad()}")
MAD: 25.0
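For reference, the MAD is the median of the absolute deviations from the median. The sketch below computes it exactly on raw data with the standard library; exact_mad is a hypothetical helper, not part of fastdigest.

```python
from statistics import median


def exact_mad(values):
    """Median absolute deviation computed exactly on raw data."""
    data = list(values)
    center = median(data)
    # Median of the distances between each value and the overall median.
    return median(abs(v - center) for v in data)


mad_reference = exact_mad(range(101))
print(f"MAD: {mad_reference}")
```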
self.var()
Estimates the population variance of the distribution.
normally_distributed_data = np.random.normal(0, 1, 10_000)
digest = TDigest.from_values(normally_distributed_data)
print(f"Variance: {digest.var():.3f}")
Variance: 1.010
self.std()
Estimates the standard deviation of the distribution.
Alias for var() ** 0.5.
digest = TDigest.from_values(normally_distributed_data)
print(f"Standard deviation: {digest.std():.3f}")
Standard deviation: 1.005
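The distinction between population and sample variance matters when comparing results against other tools. The following sketch uses only the standard library to show both on a small data set; it illustrates the definitions and is not how fastdigest computes its estimate.

```python
from statistics import pstdev, pvariance, variance

data = [float(v) for v in range(11)]

pop_var = pvariance(data)    # divides by n (population variance)
pop_std = pstdev(data)       # square root of the population variance
sample_var = variance(data)  # divides by n - 1 (sample variance)

print(f"Population variance: {pop_var:.3f}")
print(f"Population std:      {pop_std:.3f}")
print(f"Sample variance:     {sample_var:.3f}")
```

For large inputs the two variances are nearly identical, so the difference is mostly visible on small data sets like this one.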
self.is_normal()
Performs a Kolmogorov-Smirnov test to determine if the ingested data follows a normal distribution.
normally_distributed_data = np.random.normal(0, 1, 10_000)
normal_digest = TDigest.from_values(normally_distributed_data)
skewed_data = np.random.standard_gamma(5, 10_000)
skewed_digest = TDigest.from_values(skewed_data)
print(normal_digest.is_normal())
print(skewed_digest.is_normal())
True
False
Note: The significance threshold of the test can be adjusted via the optional argument alpha (0.05 by default).
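As a rough illustration of what a Kolmogorov-Smirnov normality check does, the sketch below runs one on raw data using only the standard library. Several things here are assumptions rather than documented behavior of is_normal(): the function name, fitting the mean and standard deviation from the same data, and the large-sample critical value. The inputs are deterministic quantile grids so the result does not depend on a random seed.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_normality_sketch(values, alpha=0.05):
    """One-sample K-S check against a normal fitted to the data (illustrative)."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(fmean(data), pstdev(data))
    # D is the largest gap between the empirical CDF and the fitted normal CDF.
    d_stat = 0.0
    for i, x in enumerate(data):
        f = fitted.cdf(x)
        d_stat = max(d_stat, (i + 1) / n - f, f - i / n)
    # Large-sample critical value for significance level alpha.
    d_crit = math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)
    return d_stat <= d_crit


n = 1000
probs = [(i + 0.5) / n for i in range(n)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]  # normal quantiles
skewed = [-math.log(1.0 - p) for p in probs]            # exponential quantiles

print(ks_normality_sketch(normal_like))
print(ks_normality_sketch(skewed))
```

A smaller alpha raises the critical value, so the check rejects normality less readily.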
Vectorized mathematical functions
These methods take a sequence (e.g. list, array) argument and return the results as a list.
They are significantly faster than looping over quantile(q)/cdf(x) when estimating many values at once.
self.quantile_vec(q)
Estimates the values at the quantiles q (between 0 and 1).
from fastdigest import TDigest
digest = TDigest.from_values(range(41))
results = digest.quantile_vec([0.25, 0.5, 0.75])
print(results)
[10.0, 20.0, 30.0]
self.cdf_vec(x)
Estimates the relative ranks (cumulative probabilities) of the values x.
digest = TDigest.from_values(range(41))
results = digest.cdf_vec([10, 20, 30])
print(results)
[0.25, 0.5, 0.75]
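To see why quantile and cdf are inverses of each other, here is a minimal exact reference on raw sorted data using linear interpolation between ranks. The helper names are hypothetical, and the rank convention is an assumption chosen for illustration; a t-digest interpolates between centroids rather than raw values.

```python
from bisect import bisect_right


def exact_quantile(sorted_data, q):
    """Linearly interpolated quantile of already sorted raw data."""
    pos = q * (len(sorted_data) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_data) - 1)
    frac = pos - lower
    return sorted_data[lower] + frac * (sorted_data[upper] - sorted_data[lower])


def exact_cdf(sorted_data, x):
    """Inverse of exact_quantile: relative rank of x in sorted raw data."""
    if x <= sorted_data[0]:
        return 0.0
    if x >= sorted_data[-1]:
        return 1.0
    # Largest index i with sorted_data[i] <= x < sorted_data[i + 1].
    i = bisect_right(sorted_data, x) - 1
    span = sorted_data[i + 1] - sorted_data[i]
    return (i + (x - sorted_data[i]) / span) / (len(sorted_data) - 1)


data = [float(v) for v in range(41)]
quartiles = [exact_quantile(data, q) for q in [0.25, 0.5, 0.75]]
ranks = [exact_cdf(data, x) for x in [10, 20, 30]]
print(quartiles)
print(ranks)
```

Feeding the output of one function into the other returns the original input, which is the relationship the vectorized methods preserve element by element.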
Updating a TDigest
self.update(x, w=None)
Updates a digest in-place with a single value x, with optional weight w.
from fastdigest import TDigest
digest = TDigest.from_values([1, 2, 3, 4, 5, 6])
digest.update(7)
digest.update(42, w=5.0)
print(f"{digest}: {digest.n_values} values, combined weight of {digest.mass()}")
TDigest(max_centroids=1000): 8 values, combined weight of 12.0
Note: This writes to a stack-allocated buffer before merging, which is significantly faster than batch_update for small ad-hoc updates, e.g. in streaming applications.
self.batch_update(x, w=None)
Updates a digest in-place by merging a sequence of many values x at once. The optional weights w can be either a sequence of the same length as x, or a scalar that will be used as the weight for the entire batch.
import numpy as np
digest = TDigest()
digest.batch_update([1, 2, 3, 4, 5, 6])
digest.batch_update(np.arange(7, 11)) # using numpy array
digest.batch_update([1, 2], w=[1, 2]) # weighted individually
digest.batch_update([1, 2], w=2.0) # weighted with scalar
digest.batch_update([5]) # can also just be one value ...
digest.batch_update([]) # ... or empty
print(f"{digest}: {digest.n_values} values, combined weight of {digest.mass()}")
TDigest(max_centroids=1000): 15 values, combined weight of 18.0
Note: This directly performs a merge, which is faster than looping over update if you have the data in advance.
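The difference between n_values and mass() in the outputs above comes down to how weights are counted. The following sketch reproduces only that bookkeeping in plain Python. WeightLedger is a hypothetical class written for illustration; it stores no centroids and is not part of fastdigest.

```python
from numbers import Real


class WeightLedger:
    """Tracks only n_values and mass, mirroring the weight semantics above."""

    def __init__(self):
        self.n_values = 0
        self.mass = 0.0

    def update(self, x, w=None):
        self.batch_update([x], w)

    def batch_update(self, xs, w=None):
        xs = list(xs)
        if w is None:
            weights = [1.0] * len(xs)        # unweighted: each value counts once
        elif isinstance(w, Real):
            weights = [float(w)] * len(xs)   # scalar weight applied to the batch
        else:
            weights = [float(v) for v in w]  # one weight per value
            if len(weights) != len(xs):
                raise ValueError("x and w must have the same length")
        self.n_values += len(xs)
        self.mass += sum(weights)


ledger = WeightLedger()
ledger.batch_update([1, 2, 3, 4, 5, 6])
ledger.batch_update(range(7, 11))
ledger.batch_update([1, 2], w=[1, 2])
ledger.batch_update([1, 2], w=2.0)
ledger.batch_update([5])
ledger.batch_update([])
print(f"{ledger.n_values} values, combined weight of {ledger.mass}")
```

Replaying the batch_update calls from the example gives the same value count and combined weight as the documented output.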
Merging TDigests
self.merge(other)
Creates a new TDigest instance from two digests.
Alias: + operator
from fastdigest import TDigest
digest1 = TDigest.from_values(range(50), max_centroids=1000)
digest2 = TDigest.from_values(range(50, 101), max_centroids=3)
merged = digest1 + digest2 # alias for digest1.merge(digest2)
print(f"{merged}: {len(merged)} centroids from {merged.n_values} values")
TDigest(max_centroids=1000): 53 centroids from 101 values
Note: When merging TDigests with different max_centroids parameters, the larger value is used for the new instance.
self.merge_inplace(other)
Updates a digest in-place with the centroids from another TDigest.
Alias: += operator
digest = TDigest.from_values(range(50), max_centroids=30)
tmp_digest = TDigest.from_values(range(50, 101))
digest += tmp_digest # alias for: digest.merge_inplace(tmp_digest)
print(f"{digest}: {len(digest)} centroids from {digest.n_values} values")
TDigest(max_centroids=30): 30 centroids from 101 values
Note: Using this method leaves the max_centroids parameter of the calling TDigest unchanged.
merge_all(digests)
Creates a new TDigest instance from an iterable of digests that are efficiently merged in a single operation.
Module-level function.
from fastdigest import TDigest, merge_all
# create a list of 10 digests from (non-overlapping) ranges
partial_digests = []
for i in range(10):
    partial_data = range(i * 10, (i+1) * 10)
    digest = TDigest.from_values(partial_data, max_centroids=30)
    partial_digests.append(digest)
# merge all digests and create a new instance
merged = merge_all(partial_digests)
print(f"{merged}: {len(merged)} centroids from {merged.n_values} values")
TDigest(max_centroids=30): 30 centroids from 100 values
Note: This function has an optional argument max_centroids. If None (default), the new instance inherits the largest max_centroids parameter of the input digests. Otherwise, the specified value is used.
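The rules for which max_centroids value a merged digest ends up with can be summarized in a few lines. This is a sketch of the documented rules only; merged_max_centroids is a hypothetical helper and does not merge any data.

```python
def merged_max_centroids(input_limits, max_centroids=None):
    """Pick the max_centroids of a newly created merged digest.

    input_limits: the max_centroids values of the digests being merged.
    max_centroids: explicit override, as accepted by merge_all().
    """
    if max_centroids is not None:
        return max_centroids  # an explicit value always wins
    return max(input_limits)  # otherwise inherit the largest input value


# merge / + operator: the larger of the two values is used
pairwise = merged_max_centroids([1000, 3])

# merge_all without override: largest value among all inputs
inherited = merged_max_centroids([30] * 10)

# merge_all with override: the specified value is used
overridden = merged_max_centroids([30] * 10, max_centroids=100)

print(pairwise, inherited, overridden)
```

merge_inplace is the exception to this pattern: it keeps the calling digest's own value regardless of the other digest.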
Serialization
self.to_dict()
Returns a dict representation of the TDigest.
import json
from fastdigest import TDigest
digest = TDigest.from_values(range(101), max_centroids=3)
tdigest_dict = digest.to_dict()
print(json.dumps(tdigest_dict, indent=2))
{
"max_centroids": 3,
"mass": 101.0,
"sum": 5050.0,
"min": 0.0,
"max": 100.0,
"n_values": 101,
"centroids": [
{
"m": 10.5,
"c": 22.0
},
{
"m": 49.5,
"c": 56.0
},
{
"m": 89.0,
"c": 23.0
}
]
}
Note: In the "centroids" list, each centroid is represented as a dict with keys "m" (mean) and "c" (count/weight). The "max_centroids", "mass", "sum", "min", "max" and "n_values" keys are optional; if missing, their values are inferred from the centroids or set to their defaults. This allows full backward compatibility with dicts created by the tdigest Python library.
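As an illustration of how aggregate values can be recovered from the centroids alone, the sketch below derives the mass and sum from a dict that contains only the "centroids" key. infer_aggregates is a hypothetical helper. How fastdigest itself infers "min", "max" and "n_values" is not specified here, so those keys are left out.

```python
def infer_aggregates(tdigest_dict):
    """Recover mass and sum from centroids when the keys are missing."""
    centroids = tdigest_dict["centroids"]
    mass = sum(c["c"] for c in centroids)            # total weight
    total = sum(c["m"] * c["c"] for c in centroids)  # weighted sum of means
    return {
        "mass": tdigest_dict.get("mass", mass),
        "sum": tdigest_dict.get("sum", total),
    }


# Centroids copied from the to_dict() output above, with all other keys dropped
minimal = {
    "centroids": [
        {"m": 10.5, "c": 22.0},
        {"m": 49.5, "c": 56.0},
        {"m": 89.0, "c": 23.0},
    ]
}

aggregates = infer_aggregates(minimal)
print(aggregates)
```

The recovered values agree with the "mass" and "sum" entries in the full dict, which is why those keys can be treated as optional.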
TDigest.from_dict(tdigest_dict)
Creates a new TDigest instance from the tdigest_dict.
Static method.
restored = TDigest.from_dict(tdigest_dict)
print(f"{restored}: {len(restored)} centroids from {restored.n_values} values")
TDigest(max_centroids=3): 3 centroids from 101 values
self.to_bytes()
Returns a serialized binary representation of the TDigest.
digest = TDigest.from_values(range(101), max_centroids=3)
with open("digest.bin", "wb") as f:
    f.write(digest.to_bytes())
Note: This is much faster and more efficient than to_dict.
TDigest.from_bytes(data)
Creates a new TDigest instance from the serialized binary data.
Static method.
with open("digest.bin", "rb") as f:
    restored = TDigest.from_bytes(f.read())
print(f"{restored}: {len(restored)} centroids from {restored.n_values} values")
TDigest(max_centroids=3): 3 centroids from 101 values
Note: You can also use the pickle module for serialization. This uses to_bytes/from_bytes internally but produces a different format that is not interchangeable with TDigest's native methods.
Comparison
self.equals(other)
Returns True if both TDigests have identical centroids, properties and max_centroids, otherwise False.
Raises TypeError if other is not a TDigest.
Alternative (without type strictness): ==, != operators
from fastdigest import TDigest
digest = TDigest.from_values(range(101))
restored = TDigest.from_dict(digest.to_dict())
print(f"digest == restored: {digest.equals(restored)}")
digest == restored: True
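The difference between the strict equals() method and the lenient == operator can be sketched with a small stand-in class. StrictEq is hypothetical and only demonstrates the pattern of suppressing the TypeError in the operator; it is not how fastdigest is implemented.

```python
class StrictEq:
    """Hypothetical stand-in for a class with a strict equals() method."""

    def __init__(self, payload):
        self.payload = payload

    def equals(self, other):
        # Strict comparison: refuses to compare against foreign types.
        if not isinstance(other, StrictEq):
            raise TypeError("other must be a StrictEq instance")
        return self.payload == other.payload

    def __eq__(self, other):
        # Lenient comparison: a type mismatch simply means "not equal".
        # Python derives != from this method automatically.
        try:
            return self.equals(other)
        except TypeError:
            return False


a = StrictEq([1, 2, 3])
b = StrictEq([1, 2, 3])

same = a == b
mismatch_eq = a == "not a digest"
mismatch_ne = a != "not a digest"

try:
    a.equals("not a digest")
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(same, mismatch_eq, mismatch_ne, raised_type_error)
```

Use the strict method when a type mismatch should surface as an error, and the operators when mixed-type comparisons are expected.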
Other methods and properties
self.copy()
Returns a copy of the instance.
self.is_empty()
Returns True if no data has been ingested yet.
self.max_centroids
Returns the max_centroids parameter. Can also be assigned to, changing future behavior of the instance.
self.n_values
Returns the total number of individual ingested values (disregarding weights).
Integer equivalent of mass() if no weighted updates were used.
self.n_centroids
Returns the number of centroids in the digest.
self.centroids
Returns the centroids as a list of (mean, weight) tuples.
Magic methods / operators
- self == other: alias for self.equals(other) but with TypeError suppressed → other types return False
- self != other: alias for not self.equals(other) but with TypeError suppressed → other types return True
- self + other: alias for self.merge(other)
- self += other: alias for self.merge_inplace(other)
- bool(digest): alias for not digest.is_empty()
- len(digest): alias for digest.n_centroids
- iter(digest): returns an iterator over digest.centroids
- copy(digest), deepcopy(digest): alias for digest.copy()
- str(digest), repr(digest): returns a string representation