GitHub: https://github.com/Kiiiiii123/TradeWithRL

Introduction


As mentioned in previous blogs, the stock market is an excellent environment for reinforcement learning research. However, this tiny project is just for fun… 😂


OpenAI Gym Stock Trading Environment


We implement the StockTradingEnv class in CustomEnv.py, inheriting from the gym.Env base class.
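
A rough sketch of the skeleton is shown below. The Box bounds and the observation shape here are illustrative assumptions, not necessarily the values used in CustomEnv.py:

import gym
from gym import spaces
import numpy as np

class StockTradingEnv(gym.Env):
    """A stock trading environment following the OpenAI Gym interface."""

    def __init__(self, df):
        super(StockTradingEnv, self).__init__()
        self.df = df  # pandas DataFrame of daily stock data
        # the two spaces are described in the sections below
        self.action_space = spaces.Box(
            low=np.array([0, 0]), high=np.array([3, 1]), dtype=np.float16)
        self.observation_space = spaces.Box(
            low=-1, high=1, shape=(19,), dtype=np.float16)

    def reset(self):
        # restore the initial balance and holdings, return the first observation
        ...

    def step(self, action):
        # apply the action, advance one trading day,
        # return (observation, reward, done, info)
        ...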

  • Observation Space

Our trading strategy observes the parameters of every stock, such as opening price, closing price, volume, etc. Since some parameters take large values, the observations must be normalized into [-1, 1] to help the network training converge (see the sketch after the table).

| Parameter | Description | Details |
| --- | --- | --- |
| date | trading date | format: YYYY-MM-DD |
| code | stock code | format: sh.600000; sh: Shanghai, sz: Shenzhen |
| open | opening price | 4 decimal places; unit: yuan |
| high | highest price | 4 decimal places; unit: yuan |
| low | lowest price | 4 decimal places; unit: yuan |
| close | closing price | 4 decimal places; unit: yuan |
| preclose | previous day's closing price | 4 decimal places; unit: yuan |
| volume | trading volume | unit: shares |
| amount | turnover | 4 decimal places; unit: yuan |
| adjustflag | adjustment status | no adjust, forward adjust, backward adjust |
| turn | turnover rate (%) | 6 decimal places |
| tradestatus | trading status | 1: normal trading; 0: suspended |
| pctChg | price change (%) | 6 decimal places |
| peTTM | rolling price-earnings ratio | 6 decimal places |
| psTTM | rolling price-sales ratio | 6 decimal places |
| pcfNcfTTM | rolling price-cash-flow ratio | 6 decimal places |
| pbMRQ | price-book ratio | 6 decimal places |
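
As flagged above, here is a minimal normalization sketch. The per-column maxima (MAX_SHARE_PRICE, MAX_VOLUME) are hypothetical placeholder constants, not the repo's actual values:

import numpy as np

# hypothetical per-column scale constants
MAX_SHARE_PRICE = 5000       # yuan
MAX_VOLUME = 1e9             # shares

def normalize(value, max_value):
    # map [0, max_value] linearly onto [-1, 1]
    return 2 * value / max_value - 1

def make_observation(row):
    # row: one record of the stock DataFrame
    return np.array([
        normalize(row['open'], MAX_SHARE_PRICE),
        normalize(row['high'], MAX_SHARE_PRICE),
        normalize(row['low'], MAX_SHARE_PRICE),
        normalize(row['close'], MAX_SHARE_PRICE),
        normalize(row['volume'], MAX_VOLUME),
        # ... remaining columns scaled the same way
    ])
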
  • Action Space

We assume there are only three operations to choose from: buy, sell, and hold. We define the action as an array of length 2: the (discretized) value of action[0] selects the stock operation, while the continuous value of action[1] gives the buy or sell percentage.

| action[0] | Description |
| --- | --- |
| 1 | buy action[1] (percentage) |
| 2 | sell action[1] (percentage) |
| 3 | hold |

Note that when action[0] = 3, our trading strategy neither buys nor sells stock, and the value of action[1] is meaningless. Our trading agent learns this during the training process.
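
One way to decode such an action inside the environment is sketched below, assuming the Box bounds from the skeleton above (bucket k of the continuous first component corresponds to row k of the table); _take_action is a hypothetical helper, not the repo's exact code:

# inside StockTradingEnv
def _take_action(self, action):
    action_type = action[0]   # continuous in [0, 3]: bucket 1 / 2 / 3
    amount = action[1]        # percentage in [0, 1]
    if action_type < 1:
        # bucket 1 — buy: spend `amount` percent of the current balance on shares
        ...
    elif action_type < 2:
        # bucket 2 — sell: sell `amount` percent of the shares currently held
        ...
    else:
        # bucket 3 — hold: do nothing; `amount` is ignored
        pass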

  • Reward Function

When trained in a stock trading environment, the strategy is most concerned with its current profitability, so we use the current profit as our reward function.

self.net_worth = self.balance + self.shares_held * current_price
# profit relative to the initial account balance
profit = self.net_worth - INITIAL_ACCOUNT_BALANCE
reward = 1 if profit > 0 else -100

To accelerate the training of our strategy network and push it toward a profitable strategy, a large penalty (-100) is given whenever the profit is negative.
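
Putting the pieces together, step() could apply this reward roughly as follows (a sketch; _take_action and _next_observation are the hypothetical helpers from the sections above):

def step(self, action):
    self._take_action(action)
    self.current_step += 1

    current_price = self.df.loc[self.current_step, 'close']
    self.net_worth = self.balance + self.shares_held * current_price

    profit = self.net_worth - INITIAL_ACCOUNT_BALANCE
    reward = 1 if profit > 0 else -100

    # end the episode when the account is exhausted or the data runs out
    done = self.net_worth <= 0 or self.current_step >= len(self.df) - 1
    return self._next_observation(), reward, done, {}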

Trading Strategy


Since the value of action[1] is continuous, we use PPO, a policy-gradient-based optimization algorithm. We simply use the Python implementation from stable-baselines.
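
For reference, training with the PPO2 implementation from stable-baselines might look as follows; the hyperparameters and save path are illustrative:

import pandas as pd
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv

from CustomEnv import StockTradingEnv

df = pd.read_csv('stockdata/train/sz.002714.牧原股份.csv')
env = DummyVecEnv([lambda: StockTradingEnv(df)])  # PPO2 expects a vectorized env

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=20000)
model.save('ppo2_stock_trading')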

Experiment


  • Installation

# create and activate a virtual environment
virtualenv -p python3.6 venv
source ./venv/bin/activate
# install dependencies
pip install -r requirements.txt

  • Dataset

Our stock data comes from baostock, a free and open-source stock data platform. We can download stock data (as a pandas DataFrame) through its Python API.

>> pip install baostock -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
>> python get_stock_data.py

In get_stock_data.py, the historical stock data is split: everything up to 2019-11-29 forms the training set, and the final month (December 2019) forms the test set. The test set is necessary for verifying the effectiveness of our reinforcement learning strategy.

| Training set | Test set |
| --- | --- |
| 1990-01-01 ~ 2019-11-29 | 2019-12-01 ~ 2019-12-31 |
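
For reference, downloading daily bars through the baostock API follows this pattern (the field list matches the observation table above, and the date range matches the split table):

import baostock as bs
import pandas as pd

bs.login()
rs = bs.query_history_k_data_plus(
    "sz.002714",
    "date,code,open,high,low,close,preclose,volume,amount,"
    "adjustflag,turn,tradestatus,pctChg,peTTM,psTTM,pcfNcfTTM,pbMRQ",
    start_date='1990-01-01', end_date='2019-11-29',
    frequency='d', adjustflag='3')  # '3' = no adjustment

# collect the result rows into a DataFrame
rows = []
while (rs.error_code == '0') and rs.next():
    rows.append(rs.get_row_data())
df = pd.DataFrame(rows, columns=rs.fields)
bs.logout()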

Results


  • Single-Stock

    • Initial Balance: 10000

    • Stock Code: sz.002714 (Muyuan Foodstuff Co., Ltd.)

    • Training Set: stockdata/train/sz.002714.牧原股份.csv

    • Test Set: stockdata/test/sz.002714.牧原股份.csv

    • Simulating 20 trading days, the final profit is about 75 (see the backtest sketch below)
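
A minimal backtest sketch matching the bullets above (the model path carries over from the training sketch; the day count follows the last bullet):

import pandas as pd
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

from CustomEnv import StockTradingEnv

df_test = pd.read_csv('stockdata/test/sz.002714.牧原股份.csv')
env = DummyVecEnv([lambda: StockTradingEnv(df_test)])

model = PPO2.load('ppo2_stock_trading')
obs = env.reset()
for day in range(20):                 # simulate 20 trading days
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    env.render()                      # e.g. print balance, net worth, profit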


  • Multi-Stock

We choose 1002 stocks and train a trading strategy for each using the training data. Run visualize_bacth_testing.py, and we get the backtesting results of our strategies, as shown below: