Algorithm Optimisation for Machine Learning
INTRODUCTION
Machine learning optimisation is an important part of all machine learning models. Whether used to classify an image in facial recognition software or to cluster users into like-minded customer groups, every type of machine learning model will have undergone a process of optimisation. In fact, machine learning itself can be described as solving an optimisation problem, as an optimisation algorithm is the driving force behind most machine learning models. The iterative learning performed by these models is an example of the process of optimisation: models learn and improve in order to minimise the degree of error within a loss function.
The aim of machine learning optimisation is to lower the degree of error in a machine learning model, improving the accuracy of its predictions. A machine learning model generally learns an approximation of the underlying relationship between input and output data from a set of training data. When facing new data in a live environment, the model can use this learned, approximated function to predict an outcome from the new data. For example, models are trained to perform classification tasks, labelling unseen data with learned categories. Another example is regression, the prediction of continuous outcomes such as forecasting stock market trends.
In both examples, the aim of machine learning optimisation is to minimise the degree of error between the real output and the predicted output. This error is measured by the loss function, which quantifies the difference between the real and predicted values. Machine learning optimisation aims to minimise this loss function. Optimisation algorithms are used to streamline this process beyond the capacity of any manual approach, using mathematical models to iteratively optimise a machine learning model.
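As a minimal sketch of the idea, a mean squared error loss can be written in a few lines of Python with NumPy; the function name and example values here are purely illustrative:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: the average squared difference between
    real and predicted values. Lower values mean a better fit."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Predictions close to the real values give a small loss
print(mse_loss([3.0, 5.0, 2.5], [2.8, 5.1, 2.9]))  # ~0.07
```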
This guide focuses mainly on the optimisation of model hyperparameters using an optimisation algorithm. Usually, the actual effects of different combinations of hyperparameters aren’t known. Optimisation algorithms are therefore leveraged to test and optimise combinations to create the most effective model configurations and learning processes. This guide explores the process of algorithm optimisation, including what it is, why it’s important, and examples of the different optimisation algorithms used in practice.
What is algorithm optimisation for machine learning?
Algorithm optimisation is the process of improving the effectiveness and accuracy of a machine learning model, usually through the tweaking of model hyperparameters. Machine learning optimisation uses a loss function as a way of measuring the difference between the real and predicted value of output data. By minimising the degree of error of the loss function, the aim is to make the model more accurate when predicting outcomes or classifying data points.
Optimising machine learning models usually focuses on tweaking the hyperparameters of the model. As the name suggests, machine learning is the process of a model or system learning from the data itself, often with very little human oversight. Hyperparameters are the elements of a model that are set by the developer or data scientist before training begins. These impact the learning process, and as a result can be tweaked to improve model efficiency.
An example of a hyperparameter is the total number of clusters or categories a model should classify data into. Other examples are the learning rate and the structure of the model. Hyperparameters are configured before the model is trained, in contrast to model parameters, which are found during the training phase of the machine learning lifecycle. Hyperparameters should be tweaked so the model can perform its given task in the most effective way possible.
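For instance, using scikit-learn (assuming it is installed), hyperparameters such as the number of clusters, the learning rate or the network structure are fixed when the model object is created, before any training data is seen:

```python
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Hyperparameters are set up front by the developer or data scientist...
kmeans = KMeans(n_clusters=5)                      # number of clusters
mlp = MLPClassifier(hidden_layer_sizes=(64, 32),   # model structure
                    learning_rate_init=0.001)      # learning rate

# ...whereas model parameters (cluster centres, network weights)
# are only found later, when .fit() is called on training data.
```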
Hyperparameter tuning or machine learning optimisation aims to improve the effectiveness of the model, and minimise the aforementioned loss function. The power of optimisation algorithms can be leveraged to find the most effective hyperparameter settings and configurations. Manually testing and tweaking hyperparameters would be a time-consuming task, which would prove impossible with black box models. Instead, optimisation algorithms are used to select and assess the best possible combinations of hyperparameters.
What is the need for optimisation algorithms?
The concept of hyperparameters in machine learning must first be clarified before understanding the need for optimisation algorithms. The model training process deals with achieving the best possible model parameters. For example, during training the weightings of features within the model can be established. Hyperparameters on the other hand are set before the training process by the developer or data scientist. These are model parameters used to configure the overall learning process of the model.
Examples of hyperparameters include the learning rate and the number of clusters used by the model. Optimised hyperparameters are therefore vital, as they ensure the model runs at peak efficiency, reducing the loss function and improving the effectiveness of the model as a whole. Each hyperparameter can be tweaked or changed until the optimal configuration is achieved, so that the model is as effective and accurate as possible.
Manual optimisation of hyperparameters can take a huge amount of time and resources, as a data scientist must cycle through different combinations and configurations. Optimisation algorithms are therefore used to streamline the process, effectively finding the optimum configuration of model hyperparameters. An optimisation algorithm will work through many different iterations to find the optimum configuration of the model, beyond what is possible by a human.
Another common issue with black box machine learning is that it can often be impossible to understand the effect of hyperparameters on the wider model. In these cases, manual optimisation by a developer wouldn't be possible. Optimisation algorithms are leveraged to improve model configurations even when derivatives are unknown.
The technique of cross validation is usually used to test these optimised hyperparameters on new and unseen data. The model is evaluated on held-out testing data, indicating whether it is overfit to the training data. This helps to gauge the model's ability to generalise when facing new data, an important consideration for any machine learning model. As a result, optimisation algorithms are an integral part of the machine learning process.
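A hedged sketch of this check using scikit-learn's cross_val_score (the dataset, model and hyperparameter value below are stand-ins for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy dataset standing in for real training data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Evaluate one candidate hyperparameter setting (here, 200 trees)
# on 5 different train/validation splits of the data.
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5)

# A large gap between training accuracy and these held-out scores
# would suggest the chosen hyperparameters overfit the training data.
print(scores.mean(), scores.std())
```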
What are the different algorithm optimisation techniques?
There are many different optimisation algorithms, with different variants of each technique. Although the techniques differ, the aim is generally the same: to minimise the loss or cost function, reducing the difference between the estimated and real values. Optimisation algorithms use different techniques to test and evaluate combinations of hyperparameters, finding the configurations that give the best model performance. The same algorithms are often used within the model itself to optimise its target function during training.
One way of grouping the many different optimisation algorithms is by whether the derivative of the target function being optimised can be established. If the function is differentiable, its derivative provides a valuable extra piece of information, which the algorithm can use to improve the direction and focus of its search. In some cases, however, derivatives may not be accessible or available, or noisy data may make them unhelpful. Derivative-free optimisation algorithms avoid using derivatives altogether, relying on the function values alone.
Optimisation algorithms for differentiable functions
For machine learning model functions that are differentiable, the function’s derivative can be leveraged during the optimisation process. The derivative can inform the direction or selection of each iteration of hyperparameter combinations. The result is a much more focused search area. This can mean optimisation algorithms can perform more effectively when compared to derivative-free optimisation algorithms.
Common optimisation algorithms when the function is differentiable include:
- Gradient Descent
- Fibonacci Search
- Line search
Gradient Descent
Gradient descent is a common technique in machine learning optimisation. The gradient of the loss function is measured and multiplied by the learning rate hyperparameter, giving the size of the step taken towards the minimum of the loss function. It's a common approach within the training of machine learning algorithms too. Gradient descent belongs to the wider group of first-order algorithms, which use the gradient, or first derivative, to move through the search space.
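A minimal sketch of the idea in NumPy, minimising a simple quadratic loss; the loss function, starting point and learning rate are chosen purely for illustration:

```python
import numpy as np

def loss(w):
    # Simple convex loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # First derivative of the loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0              # initial guess
learning_rate = 0.1  # hyperparameter controlling the step size

for step in range(100):
    # Move against the gradient, scaled by the learning rate
    w -= learning_rate * grad(w)

print(w, loss(w))  # w converges towards 3, loss towards 0
```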
Fibonacci Search
Fibonacci Search is an optimisation technique in the wider group of bracketing algorithms. It's generally used to find the minimum or maximum of a function within a specific range or bracket, narrowing the bracket around the optimum value at each step and so shrinking the search area with each iteration. A similar technique is the golden-section search, which again narrows its boundaries in the search for an optimum value. A sketch of this bracketing idea is shown below.
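The sketch below uses the closely related golden-section search, which shrinks a bracket around the minimum of a one-dimensional function at each iteration; the objective function and starting bracket are illustrative assumptions:

```python
import math

def golden_section_search(f, a, b, tol=1e-5):
    """Narrow the bracket [a, b] around the minimum of f at each step."""
    inv_phi = (math.sqrt(5) - 1) / 2  # ~0.618, reciprocal of the golden ratio
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: find the minimum of a simple quadratic in the bracket [0, 5]
print(golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0))  # ~2.0
```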
Line Search
The line search technique uses a descent direction to iteratively move towards an optimum of the target function. After the direction of movement is selected, the approach performs a bracketed search along that line for the best step size. Each iteration improves the value of the target function until no further improvement is achievable. It's part of a wider group of optimisation algorithms called local descent algorithms.
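As a rough sketch, a backtracking line search (one common variant) picks a descent direction and then shrinks the step size along that line until the objective genuinely improves; the objective function, constants and starting point below are all illustrative assumptions:

```python
import numpy as np

def f(x):
    # Illustrative objective: a 2D quadratic bowl with its minimum at (1, 2)
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)])

def backtracking_line_search(x, direction, alpha=1.0, beta=0.5, c=1e-4):
    """Shrink the step size alpha along `direction` until f decreases enough."""
    while f(x + alpha * direction) > f(x) + c * alpha * (grad_f(x) @ direction):
        alpha *= beta
    return alpha

x = np.array([5.0, -3.0])
for _ in range(50):
    direction = -grad_f(x)                         # choose a descent direction
    step = backtracking_line_search(x, direction)  # search along that line
    x = x + step * direction

print(x)  # converges towards (1, 2)
```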
Derivative-free optimisation algorithms
In some cases it can be challenging or impossible to obtain derivative information for the machine learning model's function. This could be because computing derivatives is too expensive, or because the data is so noisy that derivatives aren't useful. In black box machine learning, derivatives may be difficult to define or identify, and they can also be difficult to establish in simulation-based learning environments.
Derivative-free optimisation algorithms use only the values found in the objective functions. This approach is usually less effective or efficient compared to optimisation algorithms that use derivatives. This is because the algorithm has less information to inform the optimisation process.
Common examples of derivative-free optimisation algorithms include:
- Evolution algorithms
- Bayesian optimisation
- Random search
Evolution algorithms
Evolution algorithms are a common approach when optimising deep neural networks. The technique mirrors genetic or evolutionary selection processes to combine and assess hyperparameter combinations. Hyperparameters are combined, tested and evaluated, with the most successful combinations forming the next generation and informing the next round of testing. In this way, each generation becomes more optimised and effective, mirroring the process of natural selection.
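A very small, illustrative sketch of the idea follows; the scoring function, hyperparameter ranges and population sizes are assumptions for demonstration, not a production recipe (in practice the score would come from training and validating a real model):

```python
import random

def score(config):
    # Stand-in for training a model with `config` and measuring
    # validation accuracy; in practice this is the expensive step.
    return -(config["learning_rate"] - 0.01) ** 2 - (config["n_layers"] - 3) ** 2

def random_config():
    return {"learning_rate": random.uniform(0.0001, 0.1),
            "n_layers": random.randint(1, 8)}

def mutate(config):
    # Small random perturbation of a parent configuration
    child = dict(config)
    child["learning_rate"] *= random.uniform(0.5, 1.5)
    child["n_layers"] = max(1, child["n_layers"] + random.choice([-1, 0, 1]))
    return child

population = [random_config() for _ in range(10)]
for generation in range(20):
    # Keep the best-scoring configurations and breed the next generation
    population.sort(key=score, reverse=True)
    survivors = population[:5]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(5)]

print(max(population, key=score))  # best hyperparameter configuration found
```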
Bayesian optimisation
Bayesian optimisation is one of the most popular approaches for derivative-free optimisation algorithms. The technique refines the hyperparameter combinations with each iteration, by focusing on combinations which best meet the target function. This approach avoids a resource or time-intensive blanket approach in which all combinations of hyperparameters may be combined and tested. Instead, Bayesian optimisation is a sequence of refinements, with the model selecting hyperparameters with the most value to the target function.
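One hedged way to sketch this in Python is with the scikit-optimize library (assuming it is installed); gp_minimize builds a probabilistic surrogate model of the objective and uses it to choose the next hyperparameter combination to try. The objective function and search space below are stand-ins for illustration:

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

def objective(params):
    # Stand-in for training a model with these hyperparameters and
    # returning a validation loss to be minimised.
    learning_rate, n_estimators = params
    return (learning_rate - 0.01) ** 2 + (n_estimators - 150) ** 2 / 1e4

search_space = [Real(1e-4, 1e-1, name="learning_rate"),
                Integer(50, 500, name="n_estimators")]

# Each call refines the surrogate model and picks the next most
# promising combination, rather than trying every possibility.
result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print(result.x, result.fun)
```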
Random search
Random search is a straightforward and commonly used approach to machine learning optimisation. Hyperparameter configurations are sampled at random and combined to discover the most effective combinations. Because of the randomised nature of the search, it can surface new or unexpected hyperparameter combinations. If each hyperparameter configuration is mapped to a grid, the random search technique samples and combines values from that grid at random.
This is unlike Bayesian optimisation, which is more focused in its approach. Random search is usually limited to a specific number of iterations; if left unlimited, it could take a huge amount of time to complete. It's generally used to find the best combination of hyperparameters without the need for any manual checks, as in the sketch below.
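A hedged sketch using scikit-learn's RandomizedSearchCV, which samples a fixed number of random combinations from the grid rather than testing every one; the dataset, model and parameter ranges are illustrative:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset standing in for real training data
X, y = make_classification(n_samples=500, random_state=0)

# The grid of hyperparameter values to sample from
param_distributions = {"n_estimators": randint(50, 500),
                       "max_depth": randint(2, 20)}

# n_iter limits the search to 20 random combinations, each scored
# with 5-fold cross validation.
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions, n_iter=20, cv=5,
                            random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```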
Machine learning deployment for every organisation
Seldon moves machine learning from POC to production to scale, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon can give you the edge you need to supercharge your performance.
With Seldon Deploy, your business can efficiently manage and monitor machine learning, minimise risk, and understand how machine learning models impact decisions and business processes, meaning you know your team has done its due diligence in creating a more equitable system while boosting performance.
Deploy machine learning in your organisation effectively and efficiently.