ECHO-GL: Earnings Calls-driven Heterogeneous Graph Learning for Stock Movement Prediction
February 26, 2024
Abstract
Stock movement prediction plays an important role in quantitative trading. Existing models improve stock movement prediction by incorporating stock relations, but they face two limitations: the stock relations they construct are either insufficient or static. Such relations fail to capture complex, dynamic stock relations, which are shaped by many factors in the ever-changing financial market. To tackle these limitations, we propose ECHO-GL, a novel stock movement prediction model based on stock relations derived from earnings calls. ECHO-GL not only constructs comprehensive stock relations by exploiting the rich semantic information in earnings calls, but also captures the movement signals between related stocks through multimodal and heterogeneous graph learning. Moreover, ECHO-GL customizes learnable stock stochastic processes based on the post-earnings announcement drift (PEAD) phenomenon to generate temporal stock price trajectories, which can be easily plugged into any investment strategy with different time horizons to meet different investment demands. Extensive experiments on two financial datasets demonstrate the effectiveness of ECHO-GL on stock price movement prediction tasks, with high prediction accuracy and trading profitability.
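To give some intuition for a stochastic process whose simulated path can be read at several horizons, here is a toy sketch. ECHO-GL's actual processes are learnable and their form is not given in this README; the code below assumes a hand-picked log-price SDE whose drift decays exponentially after the announcement (a crude stand-in for PEAD) and simulates it with the Euler-Maruyama scheme in plain Python. The function `simulate_pead_path` and all of its parameters are hypothetical.

```python
import math
import random
from typing import List


def simulate_pead_path(x0: float, drift: float, decay: float, sigma: float,
                       horizon: float, n_steps: int, seed: int = 0) -> List[float]:
    """Euler-Maruyama simulation of a toy log-price SDE (hypothetical, not ECHO-GL's):

        dX_t = drift * exp(-decay * t) dt + sigma dW_t

    drift: initial post-announcement drift (sign would follow the earnings surprise)
    decay: how fast the drift fades after the announcement
    sigma: constant diffusion coefficient
    Returns the path [X_0, X_dt, ..., X_horizon] with n_steps + 1 points.
    """
    if n_steps <= 0 or horizon <= 0:
        raise ValueError("n_steps and horizon must be positive")
    rng = random.Random(seed)  # seeded so paths are reproducible
    dt = horizon / n_steps
    x = x0
    path = [x]
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift * math.exp(-decay * t) * dt + sigma * dw
        path.append(x)
    return path


if __name__ == "__main__":
    path = simulate_pead_path(x0=0.0, drift=0.05, decay=1.5, sigma=0.02,
                              horizon=1.0, n_steps=20, seed=42)
    dt = 1.0 / 20
    # Read the same simulated trajectory at two different horizons.
    for h in (0.25, 1.0):
        print(f"h={h}: log-price change {path[round(h / dt)]:.4f}")
```

Because the whole trajectory is generated in one pass, a short-horizon strategy can read an early point while a long-horizon strategy reads a later one from the same path. ECHO-GL learns its stochastic processes from data rather than fixing the drift and diffusion by hand as done here.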
About This Repo
This repository includes:
- code for the Earnings Calls-driven HeterOgeneous Graph Learning (ECHO-GL) model in our paper "ECHO-GL: Earnings Calls-driven Heterogeneous Graph Learning for Stock Movement Prediction."
- constructed earnings call-driven heterogeneous graphs (E-Graph in our paper), which model the complex stock relations derived from earnings calls.
Environment
Python version and packages required to run the code:
- Python >= 3.8
- PyTorch >= 2.0.1
- torchsde >= 0.2.5
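The installed versions can be compared against these minimums before running anything. A minimal sketch using only the standard library, assuming the PyPI distribution names `torch` and `torchsde`; the helpers `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and simply ignore pre-release and local tags (for strict comparisons, `packaging.version.Version` is the usual tool).

```python
import sys
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions listed in this README, keyed by (assumed) PyPI distribution name.
REQUIREMENTS: Dict[str, str] = {"torch": "2.0.1", "torchsde": "0.2.5"}


def parse_version(v: str) -> Tuple[int, ...]:
    """Keep the leading numeric release part, e.g. '2.0.1+cu118' -> (2, 0, 1)."""
    parts = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at tags such as 'rc1' or 'dev'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def check_environment() -> Dict[str, bool]:
    """Map each required package to True if it is installed and new enough."""
    results = {}
    for pkg, minimum in REQUIREMENTS.items():
        try:
            results[pkg] = meets_minimum(metadata.version(pkg), minimum)
        except metadata.PackageNotFoundError:
            results[pkg] = False
    return results


if __name__ == "__main__":
    print("python >= 3.8:", sys.version_info >= (3, 8))
    for pkg, ok in check_environment().items():
        print(f"{pkg}: {'OK' if ok else 'missing or too old'}")
```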
Data
All data, including the stock price data and the constructed E-Graph, are stored under the data folder.
Note that for the earnings call data, we adopt two widely studied earnings call datasets, Qin's [1] and MAEC [2], both of which provide pre-processed data.
Stock price data
We collect dividend-adjusted closing prices from Yahoo Finance. The collected price data is under the historical_price folder.
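Adjusted closing prices are typically turned into returns and movement labels before training. The labeling rule used in the paper is not stated in this README, so the sketch below assumes a simple binary rule (1 if the adjusted close is higher `horizon` trading days later, else 0); the functions `log_returns` and `movement_labels` are hypothetical and the prices are toy values, not data from the historical_price folder.

```python
import math
from typing import List


def log_returns(prices: List[float]) -> List[float]:
    """Daily log returns r_t = ln(p_t / p_{t-1}) from dividend-adjusted closes."""
    if any(p <= 0 for p in prices):
        raise ValueError("adjusted closing prices must be positive")
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def movement_labels(prices: List[float], horizon: int = 1) -> List[int]:
    """Hypothetical binary rule: label day t as 1 if p_{t+horizon} > p_t, else 0."""
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    return [int(prices[t + horizon] > prices[t]) for t in range(len(prices) - horizon)]


if __name__ == "__main__":
    closes = [100.0, 101.0, 99.0, 99.0, 102.0]  # toy values, not real data
    print(movement_labels(closes, horizon=1))  # [1, 0, 0, 1]
    print(movement_labels(closes, horizon=3))  # [0, 1]
```

Changing `horizon` yields labels for different prediction windows from the same price series; a flat day (equal prices) counts as "down" under this particular rule.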
E-Graph
In our paper, we introduce an earnings call-driven heterogeneous dynamic graph (termed E-Graph) that portrays comprehensive stock relations in the current market.
E-Graph encompasses four types of nodes (stock price nodes (P), earnings call text sentence nodes (S), topic nodes (O), and entity nodes (E)) and four types of edges (P-S, S-O, S-E, and E-E).
The E-Graph construction algorithm is described in Section 4.1 of our paper. The constructed E-Graph data is under the E-Graph folder.
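The node and edge types listed above define a small schema. The sketch below encodes it so that only the four E-Graph relations can be added; the class `EGraphSnapshot`, the helper `edge_key`, and the node ids are hypothetical, it does not reproduce the format of the released E-Graph files, and treating edges as undirected is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Node types listed in this README: stock price (P), earnings call sentence (S),
# topic (O), and entity (E).
NODE_TYPES: Set[str] = {"P", "S", "O", "E"}


def edge_key(type_a: str, type_b: str) -> Tuple[str, str]:
    """Canonical sorted key, so P-S and S-P map to the same relation (undirected: an assumption)."""
    a, b = sorted((type_a, type_b))
    return (a, b)


# Edge types listed in this README: P-S, S-O, S-E, and E-E.
EDGE_TYPES: Set[Tuple[str, str]] = {
    edge_key("P", "S"), edge_key("S", "O"), edge_key("S", "E"), edge_key("E", "E")
}


@dataclass
class EGraphSnapshot:
    """Hypothetical container for one E-Graph snapshot (not the repo's data format)."""
    node_type: Dict[str, str] = field(default_factory=dict)  # node id -> node type
    edges: Dict[Tuple[str, str], List[Tuple[str, str]]] = field(default_factory=dict)

    def add_node(self, node_id: str, ntype: str) -> None:
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node_id] = ntype

    def add_edge(self, u: str, v: str) -> None:
        # Look up both endpoint types and reject relations outside the schema.
        key = edge_key(self.node_type[u], self.node_type[v])
        if key not in EDGE_TYPES:
            raise ValueError(f"relation {key[0]}-{key[1]} is not an E-Graph edge type")
        # Group edges by relation type, so a heterogeneous GNN could apply
        # relation-specific weights to each group.
        self.edges.setdefault(key, []).append((u, v))


if __name__ == "__main__":
    g = EGraphSnapshot()
    g.add_node("AAPL", "P")  # toy ids, not taken from the released data
    g.add_node("s_0", "S")
    g.add_node("topic_revenue", "O")
    g.add_node("iPhone", "E")
    g.add_edge("AAPL", "s_0")
    g.add_edge("s_0", "topic_revenue")
    g.add_edge("s_0", "iPhone")
    print(sorted(g.edges))  # [('E', 'S'), ('O', 'S'), ('P', 'S')]
```

Keying edges by relation type mirrors how heterogeneous graph libraries organize data (for example, PyTorch Geometric's `HeteroData` indexes edges by `(src_type, relation, dst_type)` triples), which makes per-relation message passing straightforward.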
Code
| Script | Function |
|---|---|
| ECHO_GL.py | ECHO-GL model |
| container.py | ECHO-GL model container for training |
| run_ECHO_GL.py | Trains an ECHO-GL model |
Note that ECHO-GL is implemented within an integrated quantitative system that is still under development and cannot be open-sourced for the time being. Therefore, this repository contains only the code of the ECHO-GL model, which cannot be run on its own at present. We hope the code helps you better understand the ECHO-GL paper.
Once development of the quantitative system is complete, we will release the complete code as soon as possible, which we expect to happen in the near future.