The Bottle Rocket Pattern In The Stock Market

06 Oct 2017

An interesting day-trading pattern can be observed in the stock market. We call it the Bottle Rocket pattern. The easiest way to describe the pattern is with an example. Figures 1 and 2 show charts of Teladoc, Inc. (TDOC) trading on May 9, 2017.

Our approach to analyzing the Bottle Rocket pattern is to use Machine Learning. This document will compare the following systems:

Tensorflow: https://www.tensorflow.org/
H2O: https://www.h2o.ai/
scikit-learn: http://scikit-learn.org/stable/
CNTK: https://www.microsoft.com/en-us/cognitive-toolkit/

Statement of the problem

The challenge is to determine:

Is there information in the trade data that foreshadows the beginning of a Bottle Rocket pattern, and can Machine Learning predict a successful outcome? By a successful outcome we mean that the altitude of the bottle rocket (i.e., the price of the stock) continues upward by at least another 2%.
Can a Machine Learning algorithm detect the pattern early enough (in real time) to take advantage of the pattern?

Design of the database

The database, from which the training and testing datasets were derived, contains all the stocks on the major exchanges that have:

a total volume of at least 500,000 shares,
a market capitalization of at least $50 million,
and trade at least 600 times in 10-second intervals

at the time when the stock is added to the database.
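
The three inclusion criteria above can be expressed as a simple predicate. The following is a minimal sketch, not the code used to build the database: the `StockSummary` type and its field names are hypothetical, and it assumes that "trade at least 600 times in 10-second intervals" means at least 600 intervals that contain a trade.

```python
from dataclasses import dataclass

# Thresholds taken from the list above.
MIN_VOLUME = 500_000          # shares
MIN_MARKET_CAP = 50_000_000   # dollars
MIN_ACTIVE_INTERVALS = 600    # assumed: 10-second intervals containing a trade


@dataclass
class StockSummary:
    """Hypothetical per-stock summary; the field names are illustrative."""
    symbol: str
    total_volume: int
    market_cap: float
    active_intervals: int


def qualifies(stock: StockSummary) -> bool:
    """Return True only if the stock meets all three inclusion criteria."""
    return (
        stock.total_volume >= MIN_VOLUME
        and stock.market_cap >= MIN_MARKET_CAP
        and stock.active_intervals >= MIN_ACTIVE_INTERVALS
    )
```

Keeping the thresholds as named constants makes it easy to see how sensitive the size of the database is to each criterion.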

The format of the trade files is: trade-volume, price and time. The time field is in seconds since the market’s opening. The data is added every ten seconds, so there are a possible 2,340 trading intervals per day (6.5 hours x 60 minutes/hour x 60 seconds/minute = 23,400 seconds, divided into 10-second intervals). The dots on Figures 1 and 2 are the summaries of each 10-second interval. We also show the last 120 intervals before yesterday’s close (blue dots) to put today’s opening activity in perspective.
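
As a rough illustration of the interval arithmetic, the sketch below converts the 6.5-hour session into 10-second intervals (23,400 seconds, or 2,340 intervals) and groups raw trades into interval summaries. The choice of summary (total volume and last price per interval) is an assumption; the article does not say how each dot is computed.

```python
INTERVAL_SECONDS = 10
SESSION_SECONDS = int(6.5 * 60 * 60)                      # 23,400 seconds
INTERVALS_PER_DAY = SESSION_SECONDS // INTERVAL_SECONDS   # 2,340 intervals


def summarize(trades):
    """Group (volume, price, seconds_since_open) trades into 10-second intervals.

    Returns {interval_index: (total_volume, last_price)}.
    The summary fields are an assumption made for illustration.
    """
    summary = {}
    # Process trades in time order so the stored price is the latest one.
    for volume, price, seconds in sorted(trades, key=lambda t: t[2]):
        idx = int(seconds // INTERVAL_SECONDS)
        total, _ = summary.get(idx, (0, price))
        summary[idx] = (total + volume, price)
    return summary
```

For example, trades at 1, 9 and 12 seconds after the open fall into intervals 0, 0 and 1 respectively.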

Training and Testing Dataset format

As new 10-second interval data comes in, five predictors (also known as “features” in Machine Learning terminology) are computed. The training and testing datasets consist of these five predictor columns, plus a response variable, which is the actual outcome.

If we are doing “classification”, we use a response column (labeled “Altitude”) that contains “1” for a successful pattern (the rocket proceeded upward for another 2%), and “0” for a pattern that failed (the rocket fizzled out).

If we are doing “regression”, we use a response column (labeled “Gain”) that contains the actual percent gain. For example, the gain in the TDOC trade was 9.2%, as shown in Figure 2. The response column for “regression” analysis therefore contains floating-point numbers.
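
The two response columns are related: a "Gain" value can be turned into an "Altitude" label by comparing it with the 2% threshold. The sketch below assumes the gain is measured from the price at which the pattern was detected; the function names and the example prices are hypothetical.

```python
SUCCESS_THRESHOLD_PCT = 2.0  # "another 2%" from the text


def percent_gain(entry_price, exit_price):
    """Percent gain from the (assumed) detection price to the exit price."""
    return (exit_price - entry_price) / entry_price * 100.0


def altitude_label(gain_pct, threshold=SUCCESS_THRESHOLD_PCT):
    """Classification response: 1 for a successful pattern, 0 for a fizzle."""
    return 1 if gain_pct >= threshold else 0


# Hypothetical prices, chosen only to reproduce a 9.2% gain.
gain = percent_gain(100.0, 109.2)
label = altitude_label(gain)
```

With these illustrative prices the gain is about 9.2%, so the classification label is 1.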

The first 8 rows of the training dataset (train.csv) are:

Table 1. Training and Testing dataset format

Cleaning the data

We clean the data under manual supervision, using the HedgeTools application. Outliers are removed before the training and testing datasets are created. Therefore, it is not necessary to do any outlier removal on the datasets.

The Candidate Filter

The five predictor functions act as a filter that is applied before a trained Machine Learning algorithm is used. This is done for practical reasons: we must recognize a pattern within 10 seconds, and the trained algorithms that we currently use require calculations that last from 1 to 3 seconds. We therefore need to prune the candidate list down from about 2,000 to about 3, and the trained algorithm is then used on those 3 candidates. The design and implementation of the predictor filters are very simple (and hence fast). The filter must also be coarse enough not to filter out desirable patterns. This, of course, lets a lot of bad patterns through the filter, which results in a training dataset that is out of balance by about a factor of 8.
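
The two-stage idea can be sketched as follows. This is an illustration under assumptions, not the HedgeTools implementation: the candidate dictionaries, the `thrust` key, and ranking the survivors by a score are all hypothetical. The second function computes class weights with the common "balanced" heuristic, n_samples / (n_classes * class_count), which is one standard way to compensate for an imbalance such as roughly eight failed patterns per success.

```python
from collections import Counter


def prune(candidates, cheap_filter, score, keep=3):
    """Stage 1: run the fast filter on every candidate, keep the best few.

    Only the survivors are passed to the slow trained model (stage 2).
    """
    passed = [c for c in candidates if cheap_filter(c)]
    return sorted(passed, key=score, reverse=True)[:keep]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * cnt) for cls, cnt in counts.items()}


# Hypothetical candidates carrying a single predictor value.
candidates = [
    {"symbol": "A", "thrust": 1.0},
    {"symbol": "B", "thrust": 5.0},
    {"symbol": "C", "thrust": 3.0},
    {"symbol": "D", "thrust": 4.0},
    {"symbol": "E", "thrust": 2.5},
]
shortlist = prune(candidates, lambda c: c["thrust"] >= 2.0, lambda c: c["thrust"])

# Eight failures for every success, as an assumed reading of "a factor of 8".
weights = balanced_class_weights([0] * 8 + [1])
```

With this assumed 8:1 split, the rare successful class receives a weight eight times larger than the failed class, so a classifier that accepts per-class weights is not rewarded for always predicting failure.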

Speed is everything. In the case of a Deep Neural Network, the processing time is much worse than one second. The average is about 3.5 seconds! This puts extra importance on the accuracy of the prediction.

The Thrust Predictor

The amount of fuel burning in a Bottle Rocket is analogous to the trading volume of a stock. This is best illustrated by the chart shown in Figure 3.

The volume for today is shown as the red line. The volumes in the same interval periods during each of the previous 10 days are shown as gray lines, and the average volume over the last 10 days is shown as the black line. Figure 3 clearly shows that today’s volume is a significant increase over the volume in the preceding ten days.
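
The article does not give the exact formula for the Thrust predictor. One plausible sketch, assuming the quantity of interest is the ratio of today's volume to the 10-day average for the same interval, is shown below; the function names are hypothetical.

```python
def average_volume(history):
    """Per-interval average volume across prior days.

    history: list of per-day volume lists, all of the same length.
    """
    n_days = len(history)
    n_intervals = len(history[0])
    return [sum(day[i] for day in history) / n_days for i in range(n_intervals)]


def thrust_ratio(today, history):
    """Ratio of today's volume to the historical average, per interval.

    Returns None for an interval whose historical average is zero.
    """
    averages = average_volume(history)
    return [t / a if a > 0 else None for t, a in zip(today, averages)]
```

A ratio well above 1.0 corresponds to the red line in Figure 3 rising clearly above the black average line.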

The Velocity Predictor

This predictor is a function of the percent change in the price. The velocity is the slope of the red regression curve through the percent change in the price (blue dots).
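
As a sketch of the idea, the slope can be estimated with an ordinary least-squares line through the percent-change series, using the interval index as the x-axis. This assumes a straight-line fit; the regression curve in the figure may be of a different form.

```python
def percent_change(prices):
    """Percent change of each price relative to the first price in the window."""
    base = prices[0]
    return [(p - base) / base * 100.0 for p in prices]


def slope(ys):
    """Least-squares slope of ys against the interval index 0, 1, 2, ...

    Requires at least two points.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def velocity(prices):
    """Velocity sketch: percent price change per 10-second interval."""
    return slope(percent_change(prices))
```

The result is expressed in percent per interval, so a price that rises 1% in every interval gives a velocity of about 1.0.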

The OnBalRun Predictor

This predictor is a function of the On-Balance percent price change (blue dots).

The vwapGain Predictor

This predictor is a function of the volume-weighted price. The price data is the last 48 intervals (taken 8 minutes into the market session). This predictor is shown in Figure 6.
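
The volume-weighted average price (VWAP) over a window is the sum of price times volume divided by the total volume. The sketch below computes it over the last 48 intervals and then, as an assumption about what "vwapGain" measures, reports how far the latest price sits above or below that VWAP in percent.

```python
WINDOW = 48  # 48 ten-second intervals = 8 minutes


def vwap(prices, volumes):
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    total_volume = sum(volumes)
    if total_volume == 0:
        raise ValueError("no volume in window")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def vwap_gain(prices, volumes, window=WINDOW):
    """Percent difference between the latest price and the windowed VWAP.

    This definition of the gain is an assumption made for illustration.
    """
    recent_prices = prices[-window:]
    recent_volumes = volumes[-window:]
    reference = vwap(recent_prices, recent_volumes)
    return (recent_prices[-1] - reference) / reference * 100.0
```

Because high-volume intervals dominate the average, a late surge in both price and volume pulls the VWAP up more slowly than the price itself, which leaves a positive gain.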

Summary

Each row in the dataset is a result of supervised analysis that is done each day. As we detect Bottle Rocket patterns during the trading day (using the HedgeTools application), we log the candidates in a file. This log file is then used to create the dataset.

We began this dataset just after we learned that Donald Trump was elected President. We conjectured that the stock market would be different from any market we had seen before. Trading patterns come and go, and we were worried that the Bottle Rocket pattern might disappear. Even though the Trump market is notable for low volume (and a low VIX), the pattern still occurs. Professor Robert J. Shiller of Yale University may have an explanation as to why the Bottle Rocket pattern endures. But more on that later. Now, on to the Machine Learning analysis.