README.md

May 19, 2026

throttled-py

🔧 High-performance Python rate limiting library with multiple algorithms (Fixed Window, Sliding Window, Token Bucket, Leaky Bucket & GCRA) and storage backends (Redis, In-Memory).

简体中文 | English

🔰 Installation | 🎨 Quick Start | 📝 Usage | ⚙️ Data Models | 📊 Benchmarks | 🍃 Inspiration | 📚 Version History | 📄 License

✨ Features

🔰 Installation

$ pip install throttled-py

Note: v3.x requires Python >=3.10. If you are using Python 3.8/3.9, install throttled-py<3.0.0.

1) Optional Dependencies

Starting from v2.0.0, only core dependencies are installed by default.

To enable additional features, install optional dependencies as follows (multiple extras can be comma-separated):

$ pip install "throttled-py[redis]"

$ pip install "throttled-py[otel]"

$ pip install "throttled-py[redis,otel]"

| Extra | Description |
| --- | --- |
| memory | In-Memory backend is available by default (the memory extra installs no additional dependencies). |
| redis | Use Redis as storage backend. |
| otel | Enable OpenTelemetry hook support. |
| fastapi | FastAPI integration with decorator-based rate limiting. |
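
To see which extras are usable in the current environment, a small stdlib-only check like the one below can help. This is a sketch rather than part of throttled-py: the mapping from extra name to importable module (redis, opentelemetry, fastapi) is an assumption based on the extras listed above.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module that the extra should provide.
EXTRA_MODULES = {
    "redis": "redis",
    "otel": "opentelemetry",
    "fastapi": "fastapi",
}


def available_extras() -> dict[str, bool]:
    """Report, for each extra, whether its module can be found for import."""
    return {
        extra: find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


for extra, installed in available_extras().items():
    print(f"{extra}: {'installed' if installed else 'missing'}")
```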

🎨 Quick Start

1) Core API

2) Example

from throttled import RateLimiterType, Throttled, utils

throttle = Throttled(
    # 📈 Use Token Bucket algorithm
    using=RateLimiterType.TOKEN_BUCKET.value,
    # 🪣 Set quota: 1,000 tokens per second (limit), bucket size 1,000 (burst)
    quota="1000/s burst 1000",
    # By default, the global MemoryStore is used as the storage backend.
)

def call_api() -> bool:
    # 💧 Deduct 1 token for key="/ping"
    result = throttle.limit("/ping", cost=1)
    return result.limited

if __name__ == "__main__":
    # 💻 Python 3.12.10, Linux 5.4.119-1-tlinux4-0009.1, Arch: x86_64, Specs: 2C4G.
    # ✅ Total: 100000, 🕒 Latency: 0.0068 ms/op, 🚀 Throughput: 122513 req/s (--)
    # ❌ Denied: 98000 requests
    benchmark: utils.Benchmark = utils.Benchmark()
    denied_num: int = sum(benchmark.serial(call_api, 100_000))
    print(f"❌ Denied: {denied_num} requests")
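
The denied count in the comments above follows from token bucket arithmetic: the bucket starts with 1,000 tokens and refills at 1,000 tokens per second, so over a run of about 0.82 seconds only around 1,800 requests can pass. The simulation below is a toy sketch of that reasoning, not the library's implementation; the TokenBucket class, the simulate helper, and the evenly spaced arrival times are all assumptions made for illustration.

```python
class TokenBucket:
    """Toy token bucket driven by an explicit timestamp in seconds."""

    def __init__(self, rate_per_sec: float, capacity: float) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # the bucket starts full
        self.last = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, capped at the bucket size.
        refill = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def simulate(total: int, req_per_sec: float) -> int:
    """Count denied requests when `total` calls arrive evenly spaced."""
    bucket = TokenBucket(rate_per_sec=1000, capacity=1000)
    return sum(not bucket.allow(i / req_per_sec) for i in range(total))


# Same volume and throughput as the benchmark comment above.
print(f"Denied: {simulate(100_000, 122_513)} requests")
```

The simulated count lands close to the 98,000 reported above; it will not match a real run exactly because real requests are not evenly spaced.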

3) Asynchronous

The core API is the same for synchronous and asynchronous code. Just replace from throttled import ... with from throttled.asyncio import ... in your code.

For example, here is 2) Example rewritten as asynchronous code:

import asyncio
from throttled.asyncio import RateLimiterType, Throttled, utils

throttle = Throttled(
    using=RateLimiterType.TOKEN_BUCKET.value,
    quota="1000/s burst 1000",
)


async def call_api() -> bool:
    result = await throttle.limit("/ping", cost=1)
    return result.limited


async def main():
    benchmark: utils.Benchmark = utils.Benchmark()
    denied_num: int = sum(await benchmark.async_serial(call_api, 100_000))
    print(f"❌ Denied: {denied_num} requests")

if __name__ == "__main__":
    asyncio.run(main())

📝 Usage

1) Basic Usage

Function Call

from throttled import Throttled

# Default: In-Memory storage, Token Bucket algorithm, 60 reqs / min.
throttle = Throttled()

# Deduct 1 request -> RateLimitResult(limited=False,
# state=RateLimitState(limit=60, remaining=59, reset_after=1, retry_after=0))
print(throttle.limit("key", 1))
# Check state -> RateLimitState(limit=60, remaining=59, reset_after=1, retry_after=0)
print(throttle.peek("key"))

# Deduct 60 requests (limited) -> RateLimitResult(limited=True,
# state=RateLimitState(limit=60, remaining=59, reset_after=1, retry_after=60))
print(throttle.limit("key", 60))

Decorator

from throttled import Throttled, exceptions

@Throttled(key="/ping", quota="1/m")
def ping() -> str:
    return "ping"

ping()

try:
    ping()  # Raises LimitedError
except exceptions.LimitedError as exc:
    print(exc)  # Rate limit exceeded: remaining=0, reset_after=60, retry_after=60

Context Manager

You can use a context manager to rate limit a code block. When access is allowed, the with statement yields a RateLimitResult.

If the limit is exceeded, or the retry timeout elapses, LimitedError is raised.

from throttled import Throttled, exceptions

def call_api():
    print("doing something...")

throttle: Throttled = Throttled(key="/api/v1/users/", quota="1/m")
with throttle as rate_limit_result:
    print(f"limited: {rate_limit_result.limited}")
    call_api()

try:
    with throttle:
        call_api()
except exceptions.LimitedError as exc:
    print(exc)  # Rate limit exceeded: remaining=0, reset_after=60, retry_after=60

Wait & Retry

By default, rate limiting returns RateLimitResult immediately.

You can specify a timeout to enable wait-and-retry behavior. The rate limiter will wait according to the retry_after value in RateLimitState and retry automatically.

The final RateLimitResult is returned once the request is allowed or the timeout is reached.

from throttled import RateLimiterType, Throttled, utils

throttle = Throttled(
    using=RateLimiterType.GCRA.value,
    quota="100/s burst 100",
    # ⏳ Set timeout=1 to enable wait-and-retry (max wait 1 second)
    timeout=1,
)

def call_api() -> bool:
    # ⬆️⏳ Function-level timeout overrides global timeout
    result = throttle.limit("/ping", cost=1, timeout=1)
    return result.limited

if __name__ == "__main__":
    # 👇 The actual QPS is close to the preset quota (100 req/s):
    # ✅ Total: 1000, 🕒 Latency: 35.8103 ms/op, 🚀 Throughput: 111 req/s (--)
    # ❌ Denied: 8 requests
    benchmark: utils.Benchmark = utils.Benchmark()
    denied_num: int = sum(benchmark.concurrent(call_api, 1_000, workers=4))
    print(f"❌ Denied: {denied_num} requests")
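
The wait-and-retry behavior described in this section can be pictured as a loop that sleeps for retry_after until the request passes or the time budget runs out. The sketch below is a generic illustration, not throttled-py's code: limit_with_timeout, try_acquire, and the injectable sleep and clock parameters are hypothetical names, and giving up early when the next wait would overshoot the deadline is an assumption about one reasonable policy.

```python
import time
from typing import Callable, Tuple


def limit_with_timeout(
    try_acquire: Callable[[], Tuple[bool, float]],
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if the request is still limited once the budget is spent.

    `try_acquire` returns (limited, retry_after_seconds) for one attempt.
    """
    deadline = clock() + timeout
    while True:
        limited, retry_after = try_acquire()
        if not limited:
            return False
        # Give up if waiting retry_after would overshoot the deadline.
        if clock() + retry_after > deadline:
            return True
        sleep(retry_after)


# Usage with a fake limiter that allows the third attempt.
attempts = iter([(True, 0.01), (True, 0.01), (False, 0.0)])
print(limit_with_timeout(lambda: next(attempts), timeout=1.0))
```

Injecting sleep and clock keeps the loop testable without real waiting.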

2) Storage Backends

Redis

Only minimal configuration is required. RedisStore supports connecting to Redis in standalone, sentinel, and cluster modes.

The following example uses Redis as the storage backend. options supports all Redis configuration items; see RedisStore Options.

from throttled import RateLimiterType, Throttled, store

@Throttled(
    key="/api/products",
    using=RateLimiterType.TOKEN_BUCKET.value,
    quota="1/m",
    store=store.RedisStore(
        # Standalone mode
        server="redis://127.0.0.1:6379/0",
        # Sentinel mode
        # server="redis+sentinel://:yourpassword@host1:26379,host2:26379/mymaster",
        # Cluster mode
        # server="redis+cluster://:yourpassword@host1:6379,host2:6379",
        options={}
    ),
)
def products() -> list:
    return [{"name": "iPhone"}, {"name": "MacBook"}]

products()  # Success
products()  # Raises LimitedError

In-Memory

By default, a global MemoryStore instance with a maximum capacity of 1024 is used as the storage backend when no storage backend is specified. Therefore, it is usually not necessary to manually create a MemoryStore instance.

Different instances have separate storage. If you want to throttle the same key at different places in your program, make sure each Throttled receives the same MemoryStore instance and uses a consistent quota.

The following example uses memory as the storage backend and throttles the same Key on ping and pong:

from throttled import Throttled, store

mem_store = store.MemoryStore()

@Throttled(key="ping-pong", quota="1/m", store=mem_store)
def ping() -> str: return "ping"

@Throttled(key="ping-pong", quota="1/m", store=mem_store)
def pong() -> str: return "pong"

ping()  # Success
pong()  # Raises LimitedError

3) Algorithms

The rate limiting algorithm is specified by the using parameter. The supported algorithms are Fixed Window, Sliding Window, Token Bucket, Leaky Bucket, and GCRA. For example:

from throttled import RateLimiterType, Throttled

throttle = Throttled(
    # 🌟 Specify the rate limiting algorithm
    using=RateLimiterType.FIXED_WINDOW.value,
    quota="1/m",
)
assert throttle.limit("key", 2).limited is True
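
To show why a cost of 2 is rejected under a quota of 1/m, here is a minimal fixed window counter. It is a sketch under stated assumptions, not the library's implementation: FixedWindowLimiter and is_limited are hypothetical names, denied requests are assumed not to consume quota, and old windows are never cleaned up.

```python
import time
from collections import defaultdict
from typing import Callable


class FixedWindowLimiter:
    """Toy fixed window counter: at most `limit` units per `period` seconds."""

    def __init__(
        self,
        limit: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.period = period
        self.clock = clock
        # (key, window index) -> quota already used in that window.
        self.used = defaultdict(int)

    def is_limited(self, key: str, cost: int = 1) -> bool:
        """Return True when the request must be denied."""
        window = int(self.clock() // self.period)
        if self.used[(key, window)] + cost > self.limit:
            return True
        self.used[(key, window)] += cost
        return False


limiter = FixedWindowLimiter(limit=1, period=60)
print(limiter.is_limited("key", cost=2))  # cost exceeds the whole window
```

Because the counter resets at each window boundary, a client can send up to twice the limit across a boundary; the sliding window and bucket algorithms trade some extra bookkeeping for smoother limits.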

4) Quota Configuration

from throttled import Throttled

throttle = Throttled(
    key="/api/ping",
    quota="100/s",
    # quota="100/s burst 200",
    # quota="100 per second",
    # quota="100 per second burst 200",
)


if __name__ == "__main__":
    print(throttle.limit())
  • [1] quota accepts a readable string with these patterns:

    • n / unit
    • n / unit burst <burst>
    • n per unit
    • n per unit burst <burst>
  • [2] unit supports s / m / h / d / w.

  • [3] burst means extra bucket capacity for traffic spikes, and takes effect for: TOKEN_BUCKET / LEAKING_BUCKET / GCRA.

  • [4] If burst is omitted in string mode, it defaults to n in the same rule. For example, 1/s is equivalent to 1/s burst 1.
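
The patterns above can be made concrete with a small parser. This is an illustrative sketch only, not throttled-py's parser: parse_quota, the regular expression, and the acceptance of full unit words such as second (suggested by the "100 per second" example) are assumptions.

```python
import re
from datetime import timedelta

# Period for each short unit listed above.
UNITS = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}
# Assumed long-form spellings, mapped to their short unit.
ALIASES = {"second": "s", "minute": "m", "hour": "h", "day": "d", "week": "w"}

PATTERN = re.compile(
    r"^\s*(?P<n>\d+)\s*(?:/|per)\s*(?P<unit>[a-z]+)"
    r"(?:\s+burst\s+(?P<burst>\d+))?\s*$",
    re.IGNORECASE,
)


def parse_quota(text: str) -> tuple[int, timedelta, int]:
    """Parse a quota string into (limit, period, burst)."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"Invalid quota: {text!r}")
    unit = match["unit"].lower()
    unit = ALIASES.get(unit, unit)
    if unit not in UNITS:
        raise ValueError(f"Unknown unit in quota: {text!r}")
    limit = int(match["n"])
    # Rule [4]: when burst is omitted, it defaults to n.
    burst = int(match["burst"]) if match["burst"] else limit
    return limit, UNITS[unit], burst


for quota in ("100/s", "100/s burst 200", "100 per second burst 200"):
    print(quota, "->", parse_quota(quota))
```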

⚙️ Data Models & Configuration

1) RateLimitResult

RateLimitResult represents the result after executing the RateLimiter for the given key.

| Field | Type | Description |
| --- | --- | --- |
| limited | bool | Whether the request was rate limited: True means it was denied, False means it was allowed to pass. |
| state | RateLimitState | The state of the RateLimiter for the given key after execution. |

2) RateLimitState

RateLimitState represents the current state of the rate limiter for the given key.

| Field | Type | Description |
| --- | --- | --- |
| limit | int | Limit represents the maximum number of requests allowed to pass in the initial state. |
| remaining | int | Remaining represents the maximum number of requests allowed to pass for the given key in the current state. |
| reset_after | float | ResetAfter represents the time in seconds for the RateLimiter to return to its initial state. In the initial state, Limit=Remaining. |
| retry_after | float | RetryAfter represents the time in seconds for the request to be retried, 0 if the request is allowed. |

3) Quota

Quota represents the quota limit configuration.

| Field | Type | Description |
| --- | --- | --- |
| burst | int | Optional burst capacity that allows exceeding the rate limit momentarily (supported by Token Bucket, Leaky Bucket, and GCRA). |
| rate | Rate | The base rate limit configuration. |

4) Rate

Rate represents the rate limit configuration.

| Field | Type | Description |
| --- | --- | --- |
| period | datetime.timedelta | The time period for which the rate limit applies. |
| limit | int | The maximum number of requests allowed within the specified period. |
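
To show how the four models fit together, the sketch below mirrors the tables above with plain dataclasses. These are stand-ins written for illustration, not the library's class definitions: the frozen dataclass form and the limited_message helper are assumptions, and only the field names and types come from the tables.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Rate:
    period: timedelta  # window the limit applies to
    limit: int  # requests allowed per period


@dataclass(frozen=True)
class Quota:
    rate: Rate
    burst: int  # extra capacity for traffic spikes


@dataclass(frozen=True)
class RateLimitState:
    limit: int
    remaining: int
    reset_after: float  # seconds until the limiter is back to its initial state
    retry_after: float  # seconds to wait before retrying, 0 if allowed


@dataclass(frozen=True)
class RateLimitResult:
    limited: bool  # True when the request was denied
    state: RateLimitState


def limited_message(result: RateLimitResult) -> str:
    """Hypothetical helper: format a message like the one LimitedError shows."""
    s = result.state
    return (
        f"Rate limit exceeded: remaining={s.remaining}, "
        f"reset_after={s.reset_after}, retry_after={s.retry_after}"
    )


# "1/m" expressed with the stand-in models, then a denied result.
quota = Quota(rate=Rate(period=timedelta(minutes=1), limit=1), burst=1)
denied = RateLimitResult(
    limited=True,
    state=RateLimitState(limit=1, remaining=0, reset_after=60, retry_after=60),
)
print(limited_message(denied))
```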

5) Store Configuration

Common Parameters

| Param | Description | Default |
| --- | --- | --- |
| server | Redis connection URL; it can be used to connect to Redis in any deployment mode. | "redis://localhost:6379/0" |
| options | Storage-specific configuration. | {} |

RedisStore Options

RedisStore is built on the Redis API provided by redis-py.

For Redis connection configuration, option names largely follow those of django-redis to reduce the learning cost.

| Parameter | Description | Default |
| --- | --- | --- |
| SOCKET_TIMEOUT | ConnectionPool parameters. | null |
| SOCKET_CONNECT_TIMEOUT | ConnectionPool parameters. | null |
| CONNECTION_POOL_KWARGS | ConnectionPool construction parameters. | {} |
| REDIS_CLIENT_KWARGS | RedisClient construction parameters. | {} |
| SENTINEL_KWARGS | Sentinel construction parameters. | {} |
| CONNECTION_FACTORY_CLASS | ConnectionFactory is used to create and maintain ConnectionPool. | Automatically selected via the server scheme by default.<br>Standalone: "throttled.store.ConnectionFactory"<br>Sentinel: "throttled.store.SentinelConnectionFactory"<br>Cluster: "throttled.store.ClusterConnectionFactory" |
| REDIS_CLIENT_CLASS | RedisClient import path. | Automatically selected by sync/async mode by default.<br>Sync (Standalone/Sentinel): "redis.client.Redis"<br>Async (Standalone/Sentinel): "redis.asyncio.client.Redis"<br>Sync (Cluster): "redis.cluster.RedisCluster"<br>Async (Cluster): "redis.asyncio.cluster.RedisCluster" |
| CONNECTION_POOL_CLASS | ConnectionPool import path. | Automatically selected via the server scheme and sync/async mode by default.<br>Sync (Standalone): "redis.connection.ConnectionPool"<br>Async (Standalone): "redis.asyncio.connection.ConnectionPool"<br>Sync (Sentinel): "redis.sentinel.SentinelConnectionPool"<br>Async (Sentinel): "redis.asyncio.sentinel.SentinelConnectionPool"<br>Cluster: "Disabled" |
| SENTINEL_CLASS | Sentinel import path. | Automatically selected by sync/async mode by default.<br>Sync: "redis.Sentinel"<br>Async: "redis.asyncio.Sentinel" |
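
Several defaults above are chosen from the scheme of the server URL. The sketch below shows one way such a URL could be split into a deployment mode and a host list using only the standard library. It is an illustration, not how throttled-py parses URLs: describe_server and the MODES mapping are hypothetical, and only the three schemes shown in the Redis example are covered.

```python
from urllib.parse import urlsplit

# Schemes taken from the Redis example in this README.
MODES = {
    "redis": "standalone",
    "redis+sentinel": "sentinel",
    "redis+cluster": "cluster",
}


def describe_server(server: str) -> dict:
    """Split a server URL into mode, host list, and trailing path segment."""
    parts = urlsplit(server)
    mode = MODES.get(parts.scheme)
    if mode is None:
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    # Drop optional "user:password@" credentials, then split the host list.
    hosts = parts.netloc.rpartition("@")[2].split(",")
    # For standalone this is the DB index; for sentinel, the service name.
    return {"mode": mode, "hosts": hosts, "path": parts.path.lstrip("/")}


print(describe_server("redis://127.0.0.1:6379/0"))
print(describe_server("redis+sentinel://:yourpassword@host1:26379,host2:26379/mymaster"))
```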

MemoryStore Options

MemoryStore is essentially an in-memory LRU cache with expiration support.

| Parameter | Description | Default |
| --- | --- | --- |
| MAX_SIZE | Maximum capacity. When the number of stored key-value pairs exceeds MAX_SIZE, entries are evicted according to the LRU policy. | 1024 |
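
The idea of an LRU cache with expiration can be sketched in a few lines with OrderedDict. This is a toy model to illustrate eviction and expiry, not MemoryStore's actual implementation: the ExpiringLRU class, its method names, and the lazy expiry on read are assumptions, and the sketch is not thread safe.

```python
import time
from collections import OrderedDict
from typing import Any, Callable


class ExpiringLRU:
    """Toy LRU cache whose entries also expire after a TTL in seconds."""

    def __init__(
        self,
        max_size: int = 1024,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self.clock() + ttl)
        self._data.move_to_end(key)  # mark as most recently used
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used

    def get(self, key: str, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on read
            return default
        self._data.move_to_end(key)
        return value


cache = ExpiringLRU(max_size=2)
cache.set("a", 1, ttl=60)
cache.set("b", 2, ttl=60)
cache.get("a")  # "a" becomes the most recently used entry
cache.set("c", 3, ttl=60)  # exceeds max_size, so "b" is evicted
print(cache.get("a"), cache.get("b"), cache.get("c"))
```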

6) Exception

All exceptions inherit from throttled.exceptions.BaseThrottledError.

LimitedError

Raised when a request is throttled, with a message such as: Rate limit exceeded: remaining=0, reset_after=60, retry_after=60.

| Field | Type | Description |
| --- | --- | --- |
| rate_limit_result | RateLimitResult | The result after executing the RateLimiter for the given key. |

DataError

Raised when a parameter is invalid, with a message such as: Invalid key: None, must be a non-empty key.

📊 Benchmarks

1) Test Environment

  • Python Version: 3.13.1 (CPython implementation)
  • Operating System: macOS Darwin 23.6.0 (ARM64 architecture)
  • Redis Version: 7.x (local connection)

2) Performance Metrics

Throughput in req/s, Latency in ms/op.

| Algorithm Type | In-Memory (Single-thread) | In-Memory (16 threads) | Redis (Single-thread) | Redis (16 threads) |
| --- | --- | --- | --- | --- |
| Baseline [1] | 1,692,307 / 0.0002 | 135,018 / 0.0004 [2] | 17,324 / 0.0571 | 16,803 / 0.9478 |
| Fixed Window | 369,635 / 0.0023 | 57,275 / 0.2533 | 16,233 / 0.0610 | 15,835 / 1.0070 |
| Sliding Window | 265,215 / 0.0034 | 49,721 / 0.2996 | 12,605 / 0.0786 | 13,371 / 1.1923 |
| Token Bucket | 365,678 / 0.0023 | 54,597 / 0.2821 | 13,643 / 0.0727 | 13,219 / 1.2057 |
| Leaky Bucket | 364,296 / 0.0023 | 54,136 / 0.2887 | 13,628 / 0.0727 | 12,579 / 1.2667 |
| GCRA | 373,906 / 0.0023 | 53,994 / 0.2895 | 12,901 / 0.0769 | 12,861 / 1.2391 |

  • [1] Baseline: In-Memory - dict[key] += 1, Redis - INCRBY key increment.
  • [2] In-Memory concurrent baseline uses threading.RLock for thread safety.
  • [3] Performance: In-Memory - each rate limiting call costs about as much as 2.5-4.5 dict[key] += 1 operations; Redis - about as much as 1.06-1.37 INCRBY key increment operations.
  • [4] Benchmark code: tests/benchmarks/test_throttled.py.
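
For a rough sense of how the single-thread In-Memory baseline in note [1] could be measured, the sketch below times dict[key] += 1 in a plain loop and reports throughput and latency in the same units as the table. It is not the project's benchmark code: the measure helper is hypothetical, and the loop overhead and hardware differences mean the numbers will not match the table.

```python
import time
from collections import defaultdict
from typing import Callable, Tuple


def measure(func: Callable[[], None], iterations: int) -> Tuple[float, float]:
    """Return (throughput in req/s, latency in ms/op) for a serial loop."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return iterations / elapsed, elapsed / iterations * 1000


counters = defaultdict(int)


def baseline() -> None:
    # The In-Memory baseline from note [1]: dict[key] += 1.
    counters["key"] += 1


throughput, latency_ms = measure(baseline, 100_000)
print(f"Throughput: {throughput:,.0f} req/s, Latency: {latency_ms:.4f} ms/op")
```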

🍃 Inspiration

Rate Limiting, Cells, and GCRA, by Brandur Leach

📚 Version History

See CHANGELOG

📄 License

The MIT License