
Basketball, Move 37, Creativity and Neural Networks Under Finance 3.0

  • Writer: Adaptive Alpha
  • Jul 8, 2020
  • 16 min read

Neural Network Evolution Versus NBA Player Transformation

The evolution of a neural network algorithm can be compared to the successful transformation of an aging NBA player. Just as an NBA player must adapt his game to keep up with his competitors, because father time is undefeated, a superior financial algorithm such as a self-improving neural network should learn and evolve to prevent alpha decay (decreasing returns due to competition).

A perfect example of a successful NBA player transformation is Michael Jordan. Sports fans know him as perhaps the most athletic athlete of all time, but his legendary status was not solidified until the game-winning jump shot against the Utah Jazz that secured the 1998 NBA championship. When Jordan entered the NBA, however, his shooting was his Achilles heel, and he had to practice for countless hours to make shooting second nature. With each practice shot, he marginally improved the likelihood of making the next, which is precisely how a neural network improves when training on new data. NBA history has other famous shooters, such as Stephen Curry, Larry Bird and Reggie Miller, who were natural shooters and far better ones than Jordan, but few players have improved their jumper over a career the way Jordan did, because his work ethic was second to none. Over his 15-year NBA career, Jordan shot below 20% from 3-point range five times, four of them during his first four seasons. In the 1995-96 season he shot a near-league-leading 42.7% from 3-point range, and his average over his final seasons was closer to 30%. Much like a quantitative researcher building a successful recursive ML model, Jordan's increased shooting efficiency resulted from continuous tweaking of his game and an innate ability to improve at the margin. This ability to recursively improve and capture non-linear relationships is what makes artificial neural networks well suited for financial time series regression.



Michael Jordan's most famous game winner, against the Utah Jazz in the 1998 NBA Finals



The Foundation Of Predictive Financial Neural Networks

The human brain is the foundation for neural network based decision-making, but instead of neurons determining the next action, a financial neural network algorithm depends on probabilistically weighted perceptrons for navigation. These perceptrons fire when certain thresholds are reached, which activates the next layer in the network. If enough perceptrons activate, the output layer of the neural network sends a signal to execute a buy or sell trade in a financial instrument. The main difference in forecasting capacity between classic linear regression and a neural network is that the latter tweaks its parameters in a non-linear fashion as new data arrives, adapting to changing market regimes to make more accurate predictions, while the former has fixed parameters and hopes that pre-existing market anomalies persist. The edge for a machine learning model such as a neural network is therefore that it can learn to capture new patterns and anomalies emerging from the data. Nothing illustrates a neural network's creativity better than "Move 37" in the complex game of Go, a move played in game 2 by the famous neural network driven program "AlphaGo" against the Go world champion Lee Sedol. "Move 37" left the world champion so stunned that he spent at least 15 minutes thinking about his next move. According to expert Go players, only a machine could have the foresight to play "Move 37", so please Google the move on YouTube for a full description.
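To make the threshold mechanics concrete, here is a minimal Python sketch of a single perceptron that fires once its weighted input crosses a threshold. The feature values, weights and threshold are hypothetical numbers chosen purely for illustration, not anything from a production trading system:

```python
import numpy as np

def perceptron_fires(inputs, weights, threshold=0.5):
    """Return True if the weighted sum of the inputs crosses the threshold."""
    activation = np.dot(inputs, weights)
    return activation >= threshold

# Hypothetical example: three features extracted from a price series
features = np.array([0.8, 0.1, 0.4])   # e.g. trend, volatility, momentum signals
weights = np.array([0.5, 0.2, 0.3])    # learned importance of each feature

if perceptron_fires(features, weights):
    print("Perceptron fires -- signal passed to the next layer")
```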



Neurons process and transmit information and are a core component of the brain


Machine Learning In Asset Management

The utilization of ML in business is broad, as learning algorithms can solve problems ranging from simple back office tasks to more complex infrastructure changes. In the asset management industry, ML can potentially impact at least five parts of the business: portfolio construction, risk management, capital management, execution and marketing. The most important business component for an asset manager is portfolio construction, as it is the largest contributor to the bottom line and therefore the most lucrative application area for ML.

The portfolio construction stream can be further divided into four sub-streams. The first is asset price prediction, which is straightforward: researchers apply ML methodologies to predict future values of securities. The second is predicting hard or soft financial events, for example earnings surprises, regime changes, corporate defaults or M&A transactions. The third is forecasting values that are not directly the price of securities, such as future revenue, volatility, firm valuation and credit rating. The final sub-stream is optimal execution, position sizing and portfolio optimization, which are examples of using ML to solve traditional optimization and simulation problems. The first three sub-streams involve developing the trading strategy, while the final sub-stream covers everything else, since traditional optimization includes weight optimization, optimal execution, risk management and capital management.

The four portfolio construction sub-streams use techniques that vary in both complexity and use case. In the asset management industry, the ML techniques in use include algorithms categorized as supervised learning, unsupervised learning and reinforcement learning. An artificial neural network belongs to the supervised learning category, in which the end goal, or purpose, of the algorithm is explicit. In both finance and basketball the objective function is to maximize risk-adjusted return, which means making high-probability asset allocations in finance and taking the equivalent high-probability shots in basketball. My opinion is that advanced trading programs should mix neural network based algorithms into the portfolio to maximize risk-adjusted returns.


Figure 1


Figure 1 is a diagram depicting two different types of machine learning techniques: supervised and unsupervised learning. The application of these two techniques varies with the data. If the data used to accomplish a task is discrete, then the ML algorithms applied are typically categorization or clustering algorithms. Financial data, however, is continuous, and in finance predictions are based on continuous time series data that updates with each new observation.


What is an Artificial Neural Network?

1. Input Layer

Complex neural networks consist of multiple layers; simpler versions have at least two, but most commonly a neural network consists of three types of layers. Each layer consists of connected perceptrons, similar to a tournament bracket in tennis. Each perceptron is designed to receive information from the perceptrons in the previous layer in a probabilistic and recursive fashion. The first layer is the input layer and contains features that extract important information from the raw data of a financial asset time series. Since most artificial neural networks in finance are supervised, the researcher decides, based on experience, which features can extract the most relevant information from the data. This method of choosing optimal features is called feature extraction and reduces raw data to a few parameters that hopefully explain the past behavior of an asset. Including features such as trend indicators, volatility and other price derivatives distills information into a more concentrated, manageable form so that the neural network generates optimal model output. Domain knowledge is therefore key to becoming a skilled feature extractor.
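As an illustration of feature extraction, here is a minimal sketch assuming a pandas Series of daily closing prices. The specific features (20-day momentum, 20-day realized volatility, distance from the 50-day moving average) are common examples chosen for illustration, not a recommended feature set:

```python
import pandas as pd

def extract_features(prices: pd.Series) -> pd.DataFrame:
    """Distill a raw price series into a few candidate input features."""
    returns = prices.pct_change()
    features = pd.DataFrame({
        "momentum_20d": prices.pct_change(20),                     # simple trend indicator
        "volatility_20d": returns.rolling(20).std(),               # realized volatility
        "dist_from_ma50": prices / prices.rolling(50).mean() - 1,  # price vs. trend
    })
    return features.dropna()  # drop warm-up rows where rolling windows are incomplete
```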


Figure 2

Figure 2 is a classic neural network with three separate layers. The input layer consists of feature perceptrons scanning financial data for information. After extracting the most valuable information from the data, each perceptron sends a signal to every perceptron in the next layer. This middle layer is the so-called hidden layer, which transforms the signals from the input layer to find complex non-linear patterns in the data. The final step in the algorithm is for each perceptron in the hidden layer to pass a refined signal to the output layer, which then commits to an action, buying or selling an asset, based on this refined signal. If there is a prediction error, the output layer sends an error signal travelling backwards, updating the perceptron weights in the hidden and input layers so that the network makes a more accurate prediction next time.


2. Hidden Layer

The second type of layer is the hidden layer, which connects to the input layer, to other hidden layers and finally to the output layer. The hidden layers are complex and their output is a bit of a black box. After the features/perceptrons in the input layer receive information from the financial time series, each input sends a weighted signal to each node in the next layer. The strength of that signal depends on the level of activation generated by the feature: if a feature detects a strong pattern emerging from the time series, its perceptron sends a stronger signal to the perceptrons in the hidden layer. The signals from all perceptrons/features in the input layer are sum-weighted to create a unified input for each perceptron in the hidden layer. The strength of this sum-weighted input also depends on the connection weights between each individual perceptron in the hidden layer and the perceptrons in the input layer (see Figure 2). The sum-weighted strength received by the hidden layer will therefore vary from perceptron to perceptron, which makes sense, as different perceptrons ultimately drive different end-actions by the neural network. A perceptron in the hidden layer is more likely to be activated if it receives a high sum-weighted input from the input layer. Important to note is that the sum-weighted input is first sent through a non-linear activation function. A classic choice is the sigmoid, which accepts sum-weighted inputs of any size, large or small, positive or negative. By handling both large and small numbers, the sigmoid can capture non-linear relationships, and for practical reasons it squashes each sum-weighted signal into the range 0 to 1, so that the activation can be interpreted like a probability. Because the sigmoid is smooth, the neural network as a whole becomes a continuous function, and it has been proven that neural networks can theoretically approximate any continuous function with arbitrary accuracy, which includes financial time series prediction. The sum-weighted signals are then sent through the hidden layers, and depending on the emerging patterns, certain perceptrons fire while others do not. If the activation level in each layer remains high throughout the network, one can conclude that the neural network has identified a significant pattern, and the output layer will commit to an action.
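Here is a minimal NumPy sketch of the forward pass just described, with made-up layer sizes and random weights standing in for trained ones:

```python
import numpy as np

def sigmoid(x):
    """Squash any real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(features, w_hidden, w_output):
    """One forward pass: input -> hidden layer -> output signal."""
    hidden_input = w_hidden @ features         # sum-weighted input per hidden perceptron
    hidden_activation = sigmoid(hidden_input)  # non-linear activation between 0 and 1
    output_input = w_output @ hidden_activation
    return sigmoid(output_input)               # strength of the trading signal

rng = np.random.default_rng(0)
features = np.array([0.9, -0.2, 0.4])  # hypothetical extracted features
w_hidden = rng.normal(size=(4, 3))     # 3 inputs -> 4 hidden perceptrons
w_output = rng.normal(size=(1, 4))     # 4 hidden perceptrons -> 1 output

signal = forward(features, w_hidden, w_output)
print(f"Output signal strength: {signal[0]:.3f}")
```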


Figure 3

Figure 3 displays how a single perceptron in the output layer receives information from multiple perceptrons/inputs in a previous input or hidden layer. Each input perceptron/feature, represented by x, is multiplied by its weight coefficient, represented by θ. The next step is to sum these x-θ products across all perceptrons to get the total value X. That X is plugged into the sigmoid function, and depending on the strength of the resulting output signal, the neural network makes a trading decision. If the prediction is wrong, the gradient descent algorithm adjusts the θ's so that the next prediction is more accurate.
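In symbols, with n input perceptrons, the computation the figure describes is

X = \sum_{i=1}^{n} \theta_i x_i, \qquad \sigma(X) = \frac{1}{1 + e^{-X}},

and the gradient descent correction nudges each weight against the error gradient, \theta_i \leftarrow \theta_i - \eta \, \partial E / \partial \theta_i, where E is the prediction error and \eta a small learning rate (n, E and \eta are standard notation added here, not symbols from the figure).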


3. Output layer

The final part of a neural network is the output layer. The output is determined by the sum-weighted signals the output layer receives from the final hidden layer. In finance, the neural network decision is binary: either add to or reduce a current position in the portfolio to match whatever position the model has forecasted to be optimal. However, no neural network is perfect, and like any mathematical model it will make forecast errors, especially early in training. At first, the researcher does not know the optimal weight for each perceptron and therefore initializes the training process by assigning random weights to the input features, then finds the optimal weights in a recursive fashion. The goal of training is to minimize the forecast error by finding the minimum of a loss function, for example the squared prediction error (y_pred − y_actual)². Imagine that we are standing on a mountain peak and our objective is to climb down by following the most efficient path. The first step in our mission is to determine the optimal direction of descent; the next action is to repeat step one until we have successfully reached the bottom of the mountain. In ML terminology, this method of climbing down the mountain step by step is called gradient descent. The same approach can be applied mathematically to a number of problems, minimizing distances for climbers as well as statistical prediction errors. The climber's objective is to minimize the height function and get down the mountain quickly; for Jordan the objective function is to minimize the probability of missing a jumper; and for quantitative researchers the objective function is to minimize the prediction error and so accomplish a high risk-adjusted return for their investors. For all minimization problems, the gradient points in the direction that maximally decreases the error. In math lingo, the gradient is the first-order derivative of a function, the slope of a curve; for a climber, the gradient slope is the optimal path of descent. In financial ML, the goal is to minimize the prediction error of a financial asset rather than a climber's height function. The process of finding this global/local minimum is iterative: guided by the gradient, the model shifts the perceptron weights to improve the forecast for each new data point received by the neural network. The gradient descent algorithm essentially fine-tunes the weights recursively with new training data to assign optimal weights to the perceptrons in all layers of the network. Ultimately, the gradient approach creates a neural network that adapts with time and hopefully yields a model with higher prediction power than a stationary classical regression model.
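Here is a minimal sketch of that training loop: gradient descent on the squared error of a single sigmoid perceptron, with invented data and learning rate purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, lr=0.5, epochs=500):
    """Gradient descent on the squared error of a single sigmoid perceptron."""
    rng = np.random.default_rng(42)
    theta = rng.normal(size=X.shape[1])  # start from random weights
    for _ in range(epochs):
        y_pred = sigmoid(X @ theta)
        error = y_pred - y               # forecast error per sample
        # Gradient of the mean squared error with respect to theta
        grad = X.T @ (error * y_pred * (1 - y_pred)) / len(y)
        theta -= lr * grad               # one step down the mountain
    return theta

# Hypothetical training data: 2 features -> binary up/down label
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]])
y = np.array([1.0, 1.0, 0.0, 0.0])
print("Learned weights:", train(X, y))
```

Each pass computes the error, follows the gradient one step downhill and repeats, exactly the mountain-descent routine described above.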


Figure 4

Figure 4 shows the true power of a deep neural network algorithm. Figure 4 differs slightly from Figure 2 in two ways. First, Figure 4 is a classification network, as the goal is to create an algorithm that separates blue data points from red data points; Figure 4 is therefore a discrete/supervised neural network rather than the continuous/supervised kind used in finance. Second, Figure 4 consists of two hidden layers, and a neural network with two or more hidden layers is considered a deep neural network, because the extra layers let the algorithm find more complex patterns. In the picture, one can clearly see how the neural network starts with inputs X and Y. These two inputs are passed through the non-linear activation function to create two regression lines. These two lines combined are not complex enough to achieve the blue/red separation objective. However, if another hidden layer is added with a higher-power function, such as a quadratic, in the second hidden layer, then the signal sent through the activation function to the final output layer can separate the data points perfectly.


What are the advantages of Neural Networks?

Three decision rules guide the major investing approaches in finance: the 80-20%, the 10-90% and the 51-49% rule. Value investors follow the 80-20% rule and typical venture capital investors follow the 10-90% rule. Due to the limitations of the human brain, these discretionary investors must either be correct more often than not, or make outsized gains on their infrequent wins compared to their frequent losses. Value investors might make ten stock picks at any given moment and hope to be correct on eight of them, which is 80% correct and 20% wrong. Venture capital investors instead invest in ten start-ups hoping that one of those early investments will pay off massively in the future, overcoming the frequent losses. The 51-49% decision rule is for systematic investors, as their systems only need a slight edge to become profitable, because machines can trade a nearly unlimited number of instruments and lack the limitations of the human brain. A systematic investor applying neural network technology can push this 51-49% rule to perhaps 53-47%, which radically changes the investment performance of a systematic investor. The reasons neural network models increase the odds of successful trades are actually straightforward, as the neural network framework is designed to optimize in accordance with probability theory. A big misconception is that neural networks are overfit to historical data and fail to make accurate predictions in the future, but that is true only of poorly built neural networks. It is true that more complex neural networks tend to fit historical data more closely, which is not necessarily good, as patterns may rhyme but rarely repeat exactly. Figure 5 demonstrates this overfitting concept in terms of the variance-bias trade-off between simple neural networks and their more complex counterparts. A proper neural network is, however, built to overcome well-known challenges in financial markets such as a low signal-to-noise ratio, non-stationarity (relationships change over time), and the new anomalies that appear when market participants arbitrage away old ones. Even more important, proper neural network models can overcome these difficulties much better than their classical regression counterparts. First, neural networks have features that separate signal from noise in the input layer, and if certain features become more important in the future, the neural network will place more weight on those strong features, while the parameters in classic regression remain the same. Second, if a new market regime is established, or if relationships between financial instruments change, a classic stationary model will make forecast errors, while the neural network relies on the gradient to recursively self-improve and make proper forecasts in accordance with the current regime. Finally, throughout financial market history there has been a number of persistent anomalies, ranging from the January effect and low price-to-book ratios to neglected stocks and small-firm outperformance. These anomalies are eradicated once enough investors become aware of them, and when investors line up to exploit the same anomaly, new anomalies are created. A neural network is optimized to find these new anomalies, probably in higher-dimensional space, that present themselves in the data and might go unnoticed by investors not relying on complex machine learning techniques. Unlike a traditional model, the neural network has a set of hidden layers that combine functions to find these complex non-linear relationships.
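A back-of-the-envelope simulation makes the 51-49% versus 53-47% point concrete; the trade counts and one-unit payoffs below are hypothetical:

```python
import numpy as np

def simulate_hit_rate(p_win, n_trades=10_000, n_paths=1_000, seed=7):
    """Average P&L of n_trades one-unit win/lose bets at hit rate p_win."""
    rng = np.random.default_rng(seed)
    wins = rng.random((n_paths, n_trades)) < p_win  # True where the bet wins
    pnl = np.where(wins, 1.0, -1.0).sum(axis=1)     # total P&L per simulated path
    return pnl.mean()

for p in (0.51, 0.53):
    print(f"hit rate {p:.0%}: average P&L over 10,000 trades = {simulate_hit_rate(p):,.0f}")
```

The expected edge per one-unit trade is 2p − 1, so a 53% hit rate earns roughly three times as much as a 51% hit rate over the same number of trades.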



Figure 5

Building models with high predictive power based on historical data is always a balancing act in statistical analysis, as overfitting to training data tends to correlate with complexity. A relatively more complex model is more likely to find false positives in historical training data, meaning patterns that are not actually predictive on future data. The blue line represents a model's prediction accuracy on historical training data, while the red line shows the same model's performance on unseen test/future live data. Based on the downward-sloping blue line, the model's prediction error on historical training data continuously decreases with higher model complexity. However, when a model is extremely complex, the U-shaped red line shows that the same model becomes inaccurate on test/live data: it has high variance, which means that beyond a certain complexity point the model overfits to the training data. The same red line also shows that an overly simple model will instead have high bias, which likewise leads to poor prediction results. Both high variance and high bias lead to inaccurate models, and the researcher must strike the optimal balance to achieve the right level of model complexity. The proper balance is accomplished through a statistical technique called regularization, which includes techniques such as dropout (randomly removing, say, 50% of the nodes in the neural network during training) and ensembling (averaging the predictions from multiple models to generate better results, all else equal). If the models in such an ensemble are almost equally good and have a correlation of less than one, then averaging them improves the risk-adjusted result almost by mathematical definition.
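For the regularization step, here is a minimal sketch of the two techniques just mentioned, inverted dropout and ensemble averaging. The stand-in "models" and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.5, training=True):
    """Inverted dropout: randomly zero nodes during training, rescale the rest."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def ensemble_prediction(models, features):
    """Average the forecasts of several (hypothetical) trained models."""
    return np.mean([m(features) for m in models], axis=0)

# Toy usage: dropout on a hidden-layer activation vector
hidden = np.array([0.7, 0.2, 0.9, 0.4])
print("Dropout-masked activations:", dropout(hidden))

# Toy usage: three stand-in 'models' that each map features to a forecast
models = [lambda f, w=w: float(np.dot(f, w)) for w in rng.normal(size=(3, 2))]
features = np.array([0.4, -0.1])
print("Ensemble forecast:", ensemble_prediction(models, features))
```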



Peter Brown and Robert Mercer - The First Adaptive Portfolio

Related to neural networks is building a portfolio construction of models that automatically self-corrects to optimally allocate risk across markets. The goal of automatic portfolio construction is to replace the discretionary decisions of an investment committee and instead let the system decide which models should have a greater influence on trades. If you want to learn more about quantitative investing and how ML can increase the performance of a quantitative portfolio, a great read is Gregory Zuckerman's book "The Man Who Solved the Market" about Renaissance Technologies, the greatest hedge fund in investment history. It was when founder and polymath Jim Simons hired the IBM computer and speech recognition scientists Peter Brown and Robert Mercer that Renaissance took off for the moon. Brown and Mercer realized that Renaissance lacked the coding skills necessary to create an adaptive, unified portfolio of models, and their specialty at IBM had been recursive code for detecting patterns in speech. To successfully adopt an adaptive portfolio construction, Renaissance Technologies invested heavily in Brown and Mercer's coding ideas, which combined the trading signals from each individual model into one unified portfolio. Brown and Mercer's portfolio approach makes for a much smoother optimization process: when the optimizer receives a signal from one of the models, it considers that signal in combination with the signals from all other models before executing a trade. If two models run counter signals, meaning one is long and the other short the same market, the optimization framework will have assigned weights to the two signals based on historical success. Important to note is that past success in this weighting process is not always measured by which signal has made the most historical profits. Even if a model signal has a lower Sharpe ratio (risk-adjusted return), other key characteristics can make a signal attractive within a portfolio context, such as time horizon, correlation with the portfolio, trading costs, leverage and risk parameters, among other constraints considered relevant by the optimization algorithm. The main weakness of a unified portfolio is the difficulty of explaining the portfolio exposure, as the optimization process assigns weights to different signals based on dimensions beyond human imagination. We are programmed to think in three dimensions, and when further dimensions are added it is all abstraction to us. Explainability is one of the most debated parts of ML, as complex exposures are hard, and sometimes impossible, to explain. However, ML and neural network devotees argue that anything guided by statistical significance rests on sounder logic, namely mathematics, than constructed human explanations of factors such as value, growth and size.
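A toy sketch of the unified-portfolio idea: the optimizer nets long and short signals from several models, weighted by their historically assigned scores. All signals and weights below are invented for illustration:

```python
import numpy as np

# Hypothetical signals from three models for the same market:
# +1 = long, -1 = short, magnitude = conviction
signals = np.array([+1.0, -0.6, +0.4])

# Hypothetical weights the optimizer assigned from historical performance,
# correlation with the book, trading costs, etc. (not just raw Sharpe)
weights = np.array([0.5, 0.3, 0.2])

net_position = np.dot(weights, signals)  # counter signals partially offset
print(f"Net target position: {net_position:+.2f}")  # trade only the netted exposure
```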


Peter Brown and Robert Mercer


Conclusion

Today, there are many ML algorithms that fall under different categories such as supervised learning, unsupervised learning and reinforcement learning. Artificial neural networks are supervised algorithms constructed to imitate human decision-making. Like most ML algorithms in finance, neural networks are regression based, which means they apply time series analysis to identify patterns in historical data and compare those patterns to the current market environment. Not all ML algorithms, neural networks included, are used to identify trading patterns or for portfolio construction; ML algorithms are also useful for risk management, capital management, execution and marketing. Still, portfolio construction is perhaps the most profitable use of ML in asset management, because that is how investment firms generate a profit. Portfolio construction consists of four sub-streams, and neural networks are especially useful for time series prediction. Neural networks come in many different sizes and shapes, but most commonly they have three types of layers, each with a specific purpose: the input layer extracts the most valuable information, the hidden layer finds complex non-linear patterns based on high-information signals from the input layer, and the output layer makes the appropriate probabilistic trading decision. If there is a forecast error, the gradient back-propagates the error so that the next prediction becomes more accurate as the weights of the perceptrons in each layer are updated. A complex neural network might overfit to historical data, so a talented researcher applies appropriate regularization techniques to avoid the overfitting issues described by the variance-bias tradeoff. We already know that ML has been applied successfully in finance at Renaissance Technologies, so the future for asset price prediction using adaptive ML models is bright!


Cheers :)

//Stay Adaptive!

 
 
 
