Hidden Markov Models - From Speech Recognition To Finance 3.0
- Adaptive Alph
- Jun 6, 2020
- 11 min read
Market Regimes
When human emotion is added to a dynamic process such as the financial market, unpredictable events take place: changes in macroeconomic variables, government policies and regulatory environments can cause new market regimes to emerge. These regimes establish new relationships in the correlations, variances and covariances of asset returns, which limits the effectiveness of traditional time series analysis, since that analysis depends on stationarity (the assumption that relationships between variables stay the same).

Say that a company releases a negative earnings forecast. First-level thinking would suggest that the company’s stock price should decrease in value. A second-level thinker, however, would not judge absolute earnings and would instead compare the earnings release with the expected value and the current market regime: a negative earnings report can have a positive impact on the stock price, but perhaps only if the forecast was less negative than expected and if the current market regime allows it.

Analysis in more than three dimensions requires computers. Just as Einstein’s physics improves upon Newtonian physics for predicting processes in the cosmos, and quantum physics in turn improves upon Einstein at sub-atomic scales, learning computers can improve upon human prediction because of a computer’s ability to quickly analyze vast amounts of information and complex non-linear relationships.

Two pioneers within financial machine learning (ML) are Peter Brown and Robert Mercer. According to Gregory Zuckerman’s excellent book, The Man Who Solved The Market, Brown and Mercer are two extremely different personalities, but as a team they created financial magic. When technology is involved, teamwork is important for idea generation, hypothesis testing and, most of all, avoiding bugs in the code!
Prior to joining Renaissance Technologies (RenTech) in the early 1990s to create complex financial prediction models, Brown and Mercer were part of the speech recognition group at IBM, which is still a leading AI company in the technology industry. Currently, the most popular approach to speech recognition is a hybrid model combining the hidden Markov model (HMM) and deep neural networks (DNN), both of which are advanced ML techniques. Sophisticated HMMs and DNNs also happen to be great tools for advanced time series prediction. My guess is that Peter Brown and Robert Mercer transferred their knowledge of speech recognition to financial market prediction, and together these two geniuses took RenTech to the next level.

Robert Mercer (Left) and Peter Brown (Right)
Global World means Non-Linear Relationships
The financial market is a dynamic beast, constantly growing larger as capital, labor and technology together spur on economic growth across the globe. As growth accelerates, more people, lifted from poverty, join the economy, leading to a more competitive and complex global marketplace. As a result, the strategies used by quantitative investors to generate excess market returns must increase in sophistication to keep up with the global competition. Research has shown that simple models based on trend following, mean reversion and econometric modeling used to extract alternative risk premia such as value, growth, momentum and size do not work as well as they have in the past. To prevent alpha decay, investors must invent new approaches or further increase the complexity of their existing models to capture alpha, or so-called excess return. To stay ahead of the curve, hedge funds reinvest their profits in both R&D and skilled workers.

Currently, the most successful quantitative firms have modeled themselves after RenTech, which adopted the “stay ahead of the curve” approach in the early 90s by moving away from the successful discretionary trading approach utilized by trading giants such as Paul Tudor Jones and George Soros. What separates RenTech from other systematic trading firms is that RenTech created an adaptive portfolio that adjusted its forecasts on its own, without human intervention. Trusting the computer to optimize allows for capturing non-linear relationships in a portfolio consisting of multiple signals generated by many types of models. Jordan Ellenberg’s excellent book on the power of mathematical thinking demonstrates in Figure 1 how non-linearity impacts our lives on a daily basis.
Figure 1

Figures 1 and 2 are representations from Jordan Ellenberg’s excellent book, “The Power of Mathematical Thinking,” and describe the importance of non-linear thinking. When conservative pundits argue that socialism or “Swedishness” is always bad, they are thinking linearly. The negatively sloped line in Figure 1 shows the level of prosperity on the Y-axis and Swedishness on the X-axis. Figure 1 concludes that the more Swedish a country is, the less prosperous that country will be. However, this is an example of false logic, because even the most hardcore conservatives understand that a government is important for providing basic needs to its people. Figure 2 below more accurately describes the relationship between Swedishness and prosperity as a non-linear parabola. In theory, there is a peak with just the right mix of libertopia and Sweden. Machine learning can abstract this concept into higher dimensions so that we can get closer to the peak in our mathematical prediction models.
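Figure 2’s parabola can be sketched numerically. The quadratic below and its coefficients are my own invention, purely to illustrate the point that the optimum is interior rather than at either extreme:

```python
import numpy as np

# Hypothetical concave relationship: prosperity = -(x - 0.6)**2 + 1,
# where x measures "Swedishness" on [0, 1]. The peak location 0.6 is
# made up for illustration -- the point is only that it is interior.
x = np.linspace(0, 1, 101)
prosperity = -(x - 0.6) ** 2 + 1

peak = x[np.argmax(prosperity)]
print(round(peak, 2))  # -> 0.6: neither pure libertopia (0) nor pure Sweden (1)
```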
Figure 2

Machine Learning
RenTech effectively created an ML-based portfolio when Simons and company decided to make the Medallion fund adaptive in the early 90s. RenTech’s flagship program has therefore learned from forecasting errors since the 90s, improving the portfolio for over 30 years with new data continuously being fed into the optimization step. For an adaptive portfolio to exist back in the 90s, RenTech must have invested heavily in technology, so Brown and Mercer got lucky: they joined at the right time to utilize their modeling skills to develop more powerful quantitative strategies. ML is a great tool for finding complex patterns hidden in market data, as a computer is not limited to three dimensions. Researchers such as Brown and Mercer must therefore rely on statistical significance rather than human intuition when building ML models to determine whether patterns are spurious or consistent. Brown and Mercer became successful researchers in financial market prediction because they possessed a combination of computer science, mathematics and perhaps also some financial knowledge. While working together at IBM, Brown and Mercer almost certainly used HMMs and perhaps even Neural Networks to detect patterns in language. The autocorrect on your phone is an example of an NN and an HMM combining to predict what you’re trying to say to a friend when texting them. Brown and Mercer then took their skills in language detection and applied them to financial markets. One application of this, which Zuckerman describes in his book, is that RenTech uses ML models to detect anomalies in financial report filings by public companies in the stock market.
Hidden Markov Models
The combination of a hidden Markov model (HMM), Bayes’ theorem and the Baum-Welch algorithm, plus the increase in computing power in the 90s, revolutionized quantitative finance. By applying Bayes’ theorem within the HMM framework, that is, inverting conditional probabilities, the HMM can infer probabilistic outcomes for any hidden Markov process, and one such process is financial market forecasting based on underlying market observations. These underlying observations can be both known and unknown parameters. The relationships between underlying observations and hidden states can be partially solved for thanks to the Baum-Welch algorithm. One of the creators of the Baum-Welch algorithm, Lenny Baum, was actually Jim Simons’s first investment partner at his original firm, Monemetrics, which later became RenTech. Prior to joining, Baum had developed the Baum-Welch algorithm to help forecast short-term patterns in chaotic environments other than the financial markets. However, when Baum first joined Renaissance, he actually developed simple trend models, not ML models like HMMs, because the technology to use the Baum-Welch algorithm for financial market prediction was not easily available until Brown and Mercer joined in the 90s. Quants are interested in forecasting price moves, so the hidden state in the HMM could be the likelihood of the stock market going up or down in the future, and the observations could be patterns in data such as historical prices, volatility, economic indicators and other variables.
Figure 3

Look at the transition and emission matrices below to see what all the probabilities mean.
First - A Simple Hidden Markov Model
Figure 3 depicts perhaps the simplest version of a hidden Markov model, as there are only two hidden weather states, sunny or rainy (S or R). The likelihood of estimating the current hidden weather state can be increased from random chance based on the observed emotion of person X, which is either happy or grumpy (H or G). For the purpose of our example, we can find out if person X is happy or grumpy by asking him. From Figure 3, we can also deduce that if the weather is sunny, then the probability that person X is happy, or P(H given S), is 0.8. We can then conclude that P(G given S) = 0.2, as these two conditional probabilities must sum to 1. By the same logic, P(H given R) = 0.4 and P(G given R) = 0.6. These four conditional probabilities are the so-called emission probabilities, and in Figure 3 they point towards the green and red emoji (see also the emission matrix below). The probabilities to the right and left of the sun and the rain cloud displayed in Figure 3 are the transition probabilities. For example, if the weather is sunny today, then the probability that the weather is also sunny tomorrow is 0.8 and the probability of rain tomorrow is 0.2. Likewise, if the weather is rainy today, then the probability of sunny tomorrow is 0.4 and the probability of rainy is 0.6. Finally, we have the initial probability of our hidden weather state, which is the probability that it is sunny or rainy on a random day without the observed emotion of person X.
Figure 4

Figure 4 is the matrix version of Figure 3. These probabilities can be updated in an ML fashion using the Baum-Welch algorithm described below.
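The matrices in Figure 4 can be written down directly in code. A minimal NumPy sketch (the state order [Sunny, Rainy] and observation order [Happy, Grumpy] are my own convention):

```python
import numpy as np

# Transition matrix: rows = today's hidden state, columns = tomorrow's.
# State order: [Sunny, Rainy]
transition = np.array([[0.8, 0.2],   # Sunny -> Sunny, Sunny -> Rainy
                       [0.4, 0.6]])  # Rainy -> Sunny, Rainy -> Rainy

# Emission matrix: rows = hidden state, columns = observed emotion.
# Observation order: [Happy, Grumpy]
emission = np.array([[0.8, 0.2],    # P(H given S), P(G given S)
                     [0.4, 0.6]])   # P(H given R), P(G given R)

# Every row is a probability distribution, so each row must sum to 1.
assert np.allclose(transition.sum(axis=1), 1.0)
assert np.allclose(emission.sum(axis=1), 1.0)
```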
Getting The Transition, Emission And Initial probabilities
Worth repeating is that the transition probability gives the odds of the next hidden state based on the current hidden state. By analyzing historical data, we can estimate this transition probability. For example, a time series recording sunny and rainy days gives us data on the historical transitions: we can measure how often there is sun two days in a row, or rain two days in a row, in order to develop a predictive algorithm for the weather. To estimate the emission probabilities, we would match the happy or grumpy emotion of person X against the historical weather time series. The combination of the emission and transition probabilities is the foundation of our HMM. If the emotion input of person X is missing, we can calculate the probability of the weather being sunny or rainy on any random day by solving equations (1)-(4). When the first three steps of the equation system are solved, we get the initial probabilities in step (4): P(S) = 0.67 and P(R) = 0.33.
(1) S = 0.8S + 0.4R (the stationarity condition)
(2) S + R = 1 (by the law of probability, the two states must sum to 1)
(3) Rearranging (1): 0.2S = 0.4R, so S = 2R
(4) Plug (3) into (2): 3R = 1, so R = 1/3 and S = 2/3
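Both steps, counting transitions from history and solving for the initial probabilities, can be sketched in a few lines. The weather sequence below is invented purely for illustration; the stationary solve uses the article’s transition matrix:

```python
import numpy as np

# Hypothetical weather history (S = sunny, R = rainy), invented
# purely to illustrate the counting step.
history = list("SSSRSSSSRRSSSSRRRSSS")

# Estimate a transition probability by counting consecutive day pairs.
pairs = list(zip(history, history[1:]))
n_ss = sum(1 for a, b in pairs if (a, b) == ("S", "S"))
n_s = sum(1 for a, b in pairs if a == "S")
print("estimated P(S tomorrow | S today):", n_ss / n_s)

# Using the article's transition matrix, solve for the initial
# (stationary) probabilities pi, where pi = pi @ T and pi sums
# to 1 -- the matrix form of equations (1)-(4).
T = np.array([[0.8, 0.2],
              [0.4, 0.6]])
A = np.vstack([T.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # -> approximately [0.667, 0.333], i.e. P(S) = 2/3, P(R) = 1/3
```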
Figure 5

To find out if today is S or R without the emotional input of person X, our best prior probability is S = 2/3 (0.67) and R = 1/3 (0.33), which we solved for in the equation system earlier. However, if we know that X is H, then in our example the updated posterior probability of S is 0.8. You can see this by removing all the grumpy days: there will be ten green emojis left, eight of them under suns and two under rain clouds, which equals 8/10. In this simple example, we can increase the strength of the prediction signal by using our HMM to gather more information and combining it with Bayes’ law to obtain a posterior probability that is greater than the prior.
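The emoji-counting argument is just Bayes’ rule in disguise. A quick sketch with the numbers from our example:

```python
# Bayes' rule: update the prior P(S) = 2/3 into the posterior P(S | Happy)
# using the emission probabilities from the example.
p_s, p_r = 2 / 3, 1 / 3              # prior (initial) probabilities
p_h_given_s, p_h_given_r = 0.8, 0.4  # P(Happy | Sunny), P(Happy | Rainy)

p_h = p_h_given_s * p_s + p_h_given_r * p_r  # total probability of Happy
posterior = p_h_given_s * p_s / p_h          # P(S | H) = P(H | S) P(S) / P(H)
print(posterior)  # -> approximately 0.8, matching the 8-of-10 emoji count
```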
Best guess for weather over two random days
For weather prediction over two consecutive days, we must use the emission probabilities from the emission matrix to increase our chance of forecasting the correct hidden states. Say, as an example, that we know from data that person X was happy yesterday and is grumpy today (HG). For these two days there are then four possible weather outcomes (SS, SR, RS, RR). To make the best guess about the weather states over these two days, we calculate the probability of each weather combination given that person X’s emotions are HG. To achieve this, we need to compute the conditional probabilities for all four weather transition scenarios. Please note that there are smarter algorithms than this manual computation, such as the Viterbi algorithm, when the chain of hidden states is longer than two days. For our example, however, we analyze the SR combination first. For the first day, we are unable to use the transition probability and must instead rely on the 0.67 random-day probability that yesterday was sunny. We can then infer from the transition matrix that the probability of rain today is 0.2. Remember that we know person X is HG, that P(H given S) = 0.8 and that P(G given R) = 0.6, so all we have to do for the SR combination is multiply the probabilities together (0.67 * 0.8 * 0.2 * 0.6) to get 0.064. If the same calculation is performed for the SS combination, we get a higher number, 0.085, and that is our maximum likelihood estimate or so-called best guess. In other words, given that person X is HG, the most likely weather combination is SS. So in this example, we would guess that the hidden states for yesterday and today are both sunny. This guess is much better than chance (ALMOST MAGIC).
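The brute-force enumeration above takes only a few lines of Python; with two days and two states, checking all four paths is cheap (for longer chains you would switch to Viterbi):

```python
from itertools import product

initial = {"S": 2 / 3, "R": 1 / 3}
transition = {("S", "S"): 0.8, ("S", "R"): 0.2,
              ("R", "S"): 0.4, ("R", "R"): 0.6}
emission = {("S", "H"): 0.8, ("S", "G"): 0.2,
            ("R", "H"): 0.4, ("R", "G"): 0.6}

observed = ["H", "G"]  # Happy yesterday, Grumpy today

# Joint probability of each two-day weather path together with the
# observed emotions HG.
scores = {}
for day1, day2 in product("SR", repeat=2):
    scores[day1 + day2] = (initial[day1] * emission[(day1, observed[0])]
                           * transition[(day1, day2)]
                           * emission[(day2, observed[1])])

best = max(scores, key=scores.get)
print(scores)
print("best guess:", best)  # -> SS, with probability ~0.085 as in the text
```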
The Obvious Problem
The SS weather conclusion is the optimal solution to our weather problem, as it maximizes the probability of being correct, given our assumptions, when the mood of person X is HG. However, the input parameters of our HMM are a big model weakness, because how do we know that the emission, transition and initial probabilities are estimated correctly? In the real world, there are most likely many factors that impact the emission probabilities of person X and the transition probabilities from one hidden state to the next. A further complexity is that the transition and emission probabilities evolve with time. For example, if person X goes to a psychologist, he might feel less grumpy about rainy days in the future, or perhaps global warming makes sunny weather more likely. What is certain, however, is that the more factors impacting person X, the harder it becomes to figure out the correct emission probabilities. Making predictions about the hidden states of the financial market based on underlying observations is an even more complex process than figuring out the weather. There is a multitude of person Xs in the market, and all of them have a different behavioral response to GDP numbers, interest rates, political uncertainty and other factors impacting the economy. The transition probabilities of stock prices also change all the time as the underlying observations change. In fact, correctly predicting the stock market’s hidden state tomorrow is the Holy Grail of finance, and ML gets closer to it than anything else.

Lenny Baum, a legendary mathematician!
The Partial Solution: Baum-Welch Algorithm
To make practical use of the HMM, the transition, emission and initial probabilities must be estimated correctly, but when analyzing a dynamic, noisy process like the financial markets, these probabilities are not perfectly measurable. Also, if the input parameters of an HMM are estimated from garbage data, then a financial HMM will make trash predictions. This learning problem, the failure to understand the probabilistic relationship between the observed and hidden states, needed to be solved in order to leverage HMMs for financial market prediction. Lenny Baum, who also happens to be one of Jim Simons’s earliest investment partners at RenTech, and his research partner, Lloyd Welch, therefore developed an algorithm that could partially solve the learning problem. This is extremely helpful in financial markets, as making money only requires the algorithm to be correct 51% of the time, so a partial solution is good enough. What is even more interesting is that Brown and Mercer published a lot of research on speech pattern recognition based on HMMs, and they most likely took advantage of the Baum-Welch algorithm in their financial models at RenTech, which might be one of the main reasons why RenTech is known as the greatest hedge fund in history.
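Zuckerman’s book does not publish RenTech’s actual models, so as a stand-in, here is a minimal sketch of the forward algorithm, the building block Baum-Welch re-runs on each iteration to score how well the current transition and emission estimates explain the observed data (numbers taken from our weather example):

```python
import numpy as np

def forward(obs, initial, transition, emission):
    """Forward algorithm: likelihood of an observation sequence under an HMM.

    Baum-Welch repeatedly evaluates these forward probabilities (together
    with a symmetric backward pass) while re-estimating the transition and
    emission matrices until the likelihood stops improving.
    obs is a list of observation indices (0 = Happy, 1 = Grumpy).
    """
    alpha = initial * emission[:, obs[0]]        # joint prob of state and first obs
    for o in obs[1:]:
        alpha = (alpha @ transition) * emission[:, o]
    return alpha.sum()

initial = np.array([2 / 3, 1 / 3])               # [Sunny, Rainy]
transition = np.array([[0.8, 0.2], [0.4, 0.6]])
emission = np.array([[0.8, 0.2], [0.4, 0.6]])    # columns: [Happy, Grumpy]

likelihood = forward([0, 1], initial, transition, emission)
print(likelihood)  # -> 0.208, the total probability of observing Happy then Grumpy
```

Note that 0.208 is exactly the sum of the four path probabilities (SS, SR, RS, RR) from the two-day example, which is what the forward recursion computes without enumerating every path.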
Conclusion
Peter Brown and Robert Mercer revolutionized finance by creating large-scale dynamic probability models to forecast the financial markets. Unlike our simple weather model, which is something academia has been able to model for a long time, the financial market is on another level and therefore far more complex to model. An almost infinite number of observations together explain the market regime, which in turn determines how the market moves up and down at any given moment. The dynamics between observed variables, other variables and the hidden states constantly change, and the Baum-Welch algorithm is therefore used to update the emission and transition probabilities so that it is possible to continuously make the optimal probabilistic prediction. Brown and Mercer were early to develop these types of HMMs, recursive models that could use Bayes’ theorem and the Baum-Welch algorithm to handle large amounts of data in order to make informed short-term forecasts in the markets. Later in their careers, Brown and Mercer developed even more ingenious ways to forecast financial markets, such as Neural Networks, which are more flexible than HMMs. In the end, researchers at RenTech understood the power of applying more complex models to find patterns that arise from non-linear interactions.
Thanks!
Stay adaptive!