The Three Pillars of TensorTrade

February 4, 2026 · View on GitHub

TensorTrade sits at the intersection of three complex domains. Understanding how they connect is essential before diving into code.

Learning Objectives

After reading this tutorial, you will understand:

What reinforcement learning brings to trading
What trading concepts you need for RL
How TensorTrade bridges these domains

The Three Domains

        ┌─────────────────────┐
        │   Reinforcement     │
        │      Learning       │
        │                     │
        │  • Agent learns     │
        │  • From rewards     │
        │  • Through actions  │
        └──────────┬──────────┘
                   │
    ┌──────────────┼──────────────┐
    │              │              │
    v              v              v
┌───────────┐ ┌───────────┐ ┌───────────┐
│  Trading  │ │TensorTrade│ │   Data    │
│  Domain   │ │           │ │Engineering│
│           │ │  Bridges  │ │           │
│ • Markets │ │   all 3   │ │ • OHLCV   │
│ • Orders  │ │  domains  │ │ • Features│
│ • Risk    │ │           │ │ • Streams │
└───────────┘ └───────────┘ └───────────┘

Pillar 1: Reinforcement Learning

RL is about learning through trial and error. An agent takes actions in an environment and receives rewards.

The RL Loop

┌─────────────────────────────────────────┐
│                                         │
│   Environment                           │
│   ┌─────────────────────────────────┐  │
│   │                                 │  │
│   │  State: [price, rsi, trend...]  │  │
│   │                                 │  │
│   └──────────────┬──────────────────┘  │
│                  │                      │
│                  v                      │
│   ┌─────────────────────────────────┐  │
│   │  Agent sees state, picks action │  │
│   │  Action: BUY / SELL / HOLD      │  │
│   └──────────────┬──────────────────┘  │
│                  │                      │
│                  v                      │
│   ┌─────────────────────────────────┐  │
│   │  Environment returns reward     │  │
│   │  Reward: +\$50 or -\$30           │  │
│   └──────────────┬──────────────────┘  │
│                  │                      │
│                  v                      │
│   Agent updates its policy to get      │
│   more rewards in the future           │
│                                         │
└─────────────────────────────────────────┘

Key RL Concepts for Trading

Concept	In Trading
State	Market data: price, volume, indicators
Action	Trade decision: buy, sell, hold
Reward	Profit/loss from the action
Policy	The strategy the agent learns
Episode	One complete trading period

Why RL for Trading?

Traditional algorithmic trading uses fixed rules:

IF RSI < 30 THEN BUY
IF RSI > 70 THEN SELL

RL discovers rules from data:

Agent learns: "When RSI < 30 AND volume is rising AND
trend is up, buying has historically led to profit"

The agent can discover patterns humans might miss.

Pillar 2: Trading Domain

You don't need to be a quant, but you need these fundamentals.

The Order Management System (OMS)

TensorTrade simulates a real trading system:

┌─────────────────────────────────────────────────┐
│                    Portfolio                     │
│  ┌──────────────┐      ┌──────────────────────┐│
│  │ USD Wallet   │      │ BTC Wallet           ││
│  │ \$10,000      │      │ 0.0 BTC              ││
│  └──────────────┘      └──────────────────────┘│
│                                                 │
│  Net Worth = USD + (BTC × Current Price)       │
└─────────────────────────────────────────────────┘
          │
          │ Agent says: BUY
          v
┌─────────────────────────────────────────────────┐
│                    Exchange                      │
│                                                 │
│  Order: BUY 0.1 BTC @ \$100,000                  │
│  Commission: 0.1% = \$10                         │
│                                                 │
│  Execution:                                     │
│    USD Wallet: \$10,000 - \$10,010 = -\$10        │
│    BTC Wallet: 0.0 + 0.0999 = 0.0999 BTC       │
└─────────────────────────────────────────────────┘

Key Trading Concepts

Concept	Definition	In TensorTrade
Position	What you own	Cash, BTC, or both
Commission	Trading fee	0.1% typical
Net Worth	Total value	USD + BTC value
P&L	Profit & Loss	Final - Initial
Slippage	Price movement during execution	Simulated

The Commission Problem

This is crucial. From our experiments:

Agent trades 2,000 times per month
Each trade costs 0.1% commission
2,000 × 0.1% × \$10,000 = \$2,000 in fees

Agent's direction prediction profit: +\$239
Commission cost: -\$2,000
Net result: -\$1,761 LOSS

The agent can predict direction, but trades too much.

Pillar 3: Data Engineering

RL agents learn from features. Better features = better learning.

OHLCV Data

The foundation of all trading data:

┌─────────────────────────────────────────────────┐
│                 One Candle (1 hour)             │
│                                                 │
│  Open:   \$100,000  (price at start)            │
│  High:   \$101,500  (highest price)             │
│  Low:    \$99,200   (lowest price)              │
│  Close:  \$100,800  (price at end)              │
│  Volume: 1,234 BTC (amount traded)             │
└─────────────────────────────────────────────────┘

Feature Engineering

Raw OHLCV data isn't enough. The agent needs derived features:

# Returns: How much did price change?
returns_1h = (close - prev_close) / prev_close

# RSI: Is market overbought/oversold?
rsi = compute_rsi(close, period=14)

# Trend: Is price above/below average?
trend = (close - sma_50) / sma_50

Scale-Invariant Features

Critical insight from experiments: Raw prices don't generalize.

BAD:  price = \$100,000  (depends on when you look)
GOOD: returns = +2.5%   (works at any price level)

BAD:  volume = 1,234 BTC  (varies wildly)
GOOD: volume_ratio = current / 20-day average  (normalized)

Our best models use 13 scale-invariant features, not 34 raw indicators.

How TensorTrade Connects Everything

┌────────────────────────────────────────────────────────────────┐
│                       TensorTrade                              │
│                                                                │
│  DATA LAYER                                                    │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  DataFeed: OHLCV + Derived Features                      │ │
│  │  Stream API for reactive data processing                 │ │
│  └──────────────────────────────────────────────────────────┘ │
│                              │                                 │
│                              v                                 │
│  ENVIRONMENT LAYER                                             │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  TradingEnv: Gym-compatible environment                  │ │
│  │  • Observer: Creates state from DataFeed                 │ │
│  │  • ActionScheme: Converts actions to orders              │ │
│  │  • RewardScheme: Computes learning signal                │ │
│  └──────────────────────────────────────────────────────────┘ │
│                              │                                 │
│                              v                                 │
│  EXECUTION LAYER                                               │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  OMS: Simulates real trading                             │ │
│  │  • Exchange: Price feed + commission                     │ │
│  │  • Broker: Order execution                               │ │
│  │  • Portfolio: Wallet management                          │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘