Track C: Full Introduction
February 4, 2026 · View on GitHub
New to both RL and trading? Start here.
What You'll Learn
This track covers:
- Basic trading concepts (what is buying/selling)
- Basic RL concepts (what is an agent)
- How TensorTrade combines them
- Your first training run
Prerequisites
- Python programming experience
- Basic math (percentages, basic algebra)
- No prior trading or ML experience required
The Journey
┌─────────────────────────────────────────────────────────────┐
│ │
│ START HERE │
│ │ │
│ v │
│ ┌───────────────────────────────────────────────────────┐│
│ │ What is Trading? ││
│ │ - Buying and selling assets ││
│ │ - Making money from price changes ││
│ │ - Markets and exchanges ││
│ └───────────────────────────────────────────────────────┘│
│ │ │
│ v │
│ ┌───────────────────────────────────────────────────────┐│
│ │ What is RL? ││
│ │ - Learning from trial and error ││
│ │ - Agent, environment, reward ││
│ │ - How policies are learned ││
│ └───────────────────────────────────────────────────────┘│
│ │ │
│ v │
│ ┌───────────────────────────────────────────────────────┐│
│ │ TensorTrade Overview ││
│ │ - How the pieces fit together ││
│ │ - Running your first script ││
│ │ - Understanding the output ││
│ └───────────────────────────────────────────────────────┘│
│ │ │
│ v │
│ CONTINUE TO CORE TUTORIALS │
│ │
└─────────────────────────────────────────────────────────────┘
What is Trading?
The Basic Idea
Trading is buying something hoping it will be worth more later.
You buy: 1 Bitcoin at \$100,000
Time passes...
Bitcoin is now: \$110,000
You sell: 1 Bitcoin at \$110,000
Profit: \$110,000 - \$100,000 = \$10,000
The Catch: You Can Also Lose
You buy: 1 Bitcoin at \$100,000
Time passes...
Bitcoin drops to: \$90,000
You sell: 1 Bitcoin at \$90,000
Loss: \$90,000 - \$100,000 = -\$10,000
The Goal
Predict which direction prices will move:
- If you think price goes UP → BUY first, sell later
- If you think price goes DOWN → Don't buy (or sell if you have it)
What is Reinforcement Learning?
The Basic Idea
Teach a computer to make decisions by rewarding good outcomes.
Like training a dog:
- Dog sits → Give treat (positive reward)
- Dog jumps on table → No treat (no reward)
- Over time, dog learns: sitting = treats
Like training a trader:
- Agent buys → Price goes up → Positive reward
- Agent buys → Price goes down → Negative reward
- Over time, agent learns: when to buy
The Key Terms
| Term | Meaning | In TensorTrade |
|---|---|---|
| Agent | The learner/decision maker | A neural network |
| Environment | Where the agent acts | Simulated market |
| State | What the agent sees | Market data (prices, indicators) |
| Action | What the agent does | Buy, sell, or hold |
| Reward | Feedback signal | Profit or loss |
The Loop
1. Agent sees market state (prices, indicators)
2. Agent decides: BUY, SELL, or HOLD
3. Environment shows result (price moved up/down)
4. Agent receives reward (profit or loss)
5. Agent adjusts strategy based on reward
6. Repeat thousands of times
After many iterations, agent learns patterns:
"When RSI is low AND trend is up, buying usually works"
How TensorTrade Works
The Setup
# 1. Get market data
data = fetch_btc_data() # Historical prices
# 2. Create simulated market
exchange = Exchange(commission=0.1%) # 0.1% fee per trade
portfolio = Portfolio(starting_cash=\$10,000)
# 3. Create the RL environment
env = TradingEnv(
data=data,
portfolio=portfolio,
reward_scheme=PBR, # Reward based on position returns
)
# 4. Train the agent
agent = PPO() # The learning algorithm
agent.train(env, iterations=100)
# 5. Test the agent
results = agent.evaluate(test_data)
print(f"Profit: ${results.profit}")
What Happens During Training
Episode 1 (random behavior):
Agent: BUY, SELL, BUY, SELL, HOLD, BUY...
Result: Lost \$500 (random trading loses money)
Reward: Negative
Episode 10 (starting to learn):
Agent: BUY when RSI low, HOLD when uncertain...
Result: Lost \$200 (still learning)
Reward: Less negative
Episode 100 (learned patterns):
Agent: BUY at support, SELL at resistance...
Result: Gained \$100 (patterns are working!)
Reward: Positive
Episode 1000 (refined strategy):
Agent: Trades only with high confidence...
Result: Gained \$239 (direction prediction works!)
Reward: Consistently positive
Your First Run
Step 1: Install
# Create Python environment
python3.12 -m venv tensortrade-env
source tensortrade-env/bin/activate
# Install TensorTrade
pip install -r requirements.txt
pip install -e .
Step 2: Run Demo
python examples/training/train_simple.py
Step 3: Understand Output
======================================================================
Episode 1/5
======================================================================
Step | Action | USD Balance | BTC Balance | Net Worth | Reward
----------------------------------------------------------------------
1 | BUY | \$0.00 | 0.099800 | \$10,000.00 | +0.00
What this means:
- Step 1: First hour of trading
- Action BUY: Agent decided to buy Bitcoin
- USD Balance $0: All cash was used to buy BTC
- BTC Balance 0.099800: How much Bitcoin we now have
- Net Worth $10,000: Total value (BTC value + cash)
- Reward +0.00: No reward yet (price hasn't changed)
The Big Discovery
After all our experiments, we found:
┌─────────────────────────────────────────────────────┐
│ │
│ The agent CAN predict market direction! │
│ │
│ At 0% commission: +\$239 PROFIT │
│ At 0.1% commission: -\$650 LOSS │
│ │
│ The problem: Too many trades │
│ 2000 trades × 0.1% commission = \$2000 in fees │
│ │
└─────────────────────────────────────────────────────┘
The challenge isn't prediction - it's trading discipline.
What's Next?
You have two options:
Option A: Learn More Theory First
- Trading for RL People - Deeper trading concepts
- RL for Traders - Deeper RL concepts
Option B: Jump Into Practice
- Your First Run - Detailed walkthrough
- First Training - Train a real agent
Quick Glossary
| Term | Simple Definition |
|---|---|
| Agent | The AI that makes trading decisions |
| Episode | One complete trading simulation |
| Commission | Fee paid for each trade |
| P&L | Profit and Loss (money made/lost) |
| Reward | Learning signal given to agent |
| BSH | Buy/Sell/Hold action scheme |
| PBR | Position-Based Returns (reward method) |
| PPO | The RL algorithm we use |
| Overfitting | Memorizing training data, failing on new data |
| Overtrading | Trading too often, paying too much in fees |
Key Takeaways
- Trading = buying low, selling high (hopefully)
- RL = learning from rewards
- TensorTrade = simulated market + RL agent
- The challenge = not prediction, but trading frequency
- Commission = the profit killer
Welcome to TensorTrade! Start with the foundations tutorials.