AI Learning Types - Supervised Learning
Supervised learning uses labelled data to predict the outcome
Supervised learning involves training AI systems on labelled data, i.e. predefined input-output pairs. The AI system learns to predict the output based on the input, minimising the difference between its predictions and the actual outputs. Supervised learning is useful for tasks such as image classification, speech recognition, and natural language processing.
Whilst supervised learning is a powerful technique, it comes with a significant drawback: it needs a lot of labelled data. For instance, if you want to train an AI system to identify cats, you might need to provide it with 1,000 or even 10,000 pictures of cats. That's a massive amount of cat photos to input into the system!
Tesla cars were trained on over 1 million video clips before the neural network began working well, and it only started excelling after being trained on 1.5 million video clips.
Neural Networks in Supervised Learning
Supervised learning leverages neural networks by feeding them labelled data, letting the network make predictions, assessing the accuracy of those predictions, and iteratively adjusting the network's parameters to improve its performance.
A neural network is a series of interconnected nodes or "neurons" organised into layers: an input layer, one or more hidden layers, and an output layer. Each connection between nodes has an associated weight, which is adjusted during training.
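The layered structure described above can be sketched in a few lines of NumPy. The layer sizes here (4 inputs, 3 hidden neurons, 1 output) are illustrative, not taken from any particular model; the point is simply that each layer is a weight matrix (one weight per connection) plus a bias vector, initialised randomly before training.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny illustrative network: 4 inputs -> 3 hidden neurons -> 1 output.
# Each layer is a weight matrix plus a bias vector; the weights start
# as small random values and are adjusted during training.
layer_sizes = [4, 3, 1]
weights = [rng.normal(0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

print([w.shape for w in weights])  # [(4, 3), (3, 1)]
```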
Models continuously evaluate and adjust their predictions using mathematical techniques. The goal is always to get as close as possible to the true outputs provided in the labelled data. The ultimate goal is for the neural network to generalise well, meaning it can make accurate predictions on new, previously unseen data.
Training Data
In supervised learning, you provide the neural network with labelled data. This means for every input (e.g., an image), there's an associated known output (e.g., a label indicating what's in the image).
Forward Propagation
When you input data into a neural network, it undergoes a process called forward propagation. The input data passes through each layer of the network, undergoing transformations via weights and activation functions, until it produces an output.
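As a rough sketch of forward propagation, the snippet below pushes an input through two layers of a toy network: at each layer the data is multiplied by the weights, the bias is added, and an activation function (here, a sigmoid) is applied. The weights and input values are arbitrary placeholders for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Pass input x through each layer: linear transform, then activation."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(a @ w + b)
    return a

# Arbitrary illustrative weights: 4 inputs -> 3 hidden -> 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.5, (4, 3)), rng.normal(0, 0.5, (3, 1))]
biases = [np.zeros(3), np.zeros(1)]

x = np.array([0.2, 0.7, 0.1, 0.9])
output = forward(x, weights, biases)
print(output)  # a single value between 0 and 1
```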
Calculating Loss
The output from the forward propagation is compared to the known label from the training data, and the difference (or error) is calculated using a loss function. This function quantifies how far off a model's predictions are from the actual results. For instance, if a model predicts that an image is 80% likely to be a cat, but the image is actually of a dog, the loss function will produce a high value indicating a significant error.
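The cat/dog example above can be made concrete with binary cross-entropy, one common choice of loss function for yes/no classification (the source doesn't name a specific loss, so this is an illustrative choice):

```python
import numpy as np

def binary_cross_entropy(predicted, actual):
    """Loss is large when the prediction is confident but wrong."""
    eps = 1e-12  # avoid log(0)
    predicted = np.clip(predicted, eps, 1 - eps)
    return -(actual * np.log(predicted) + (1 - actual) * np.log(1 - predicted))

# Model says "80% likely a cat" but the image is a dog (label 0): high loss.
print(binary_cross_entropy(0.8, 0))  # ~1.61
# Same prediction when the image really is a cat (label 1): low loss.
print(binary_cross_entropy(0.8, 1))  # ~0.22
```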
Backpropagation
When the model makes a prediction, and the error is computed using the loss function, backpropagation determines how each neuron in the network contributed to the error. It then adjusts the weights of the network to reduce the error in its predictions. This involves computing the gradient of the loss function with respect to each weight by applying the chain rule, which indicates in which direction, and by how much, each weight should be adjusted.
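To make the chain rule step concrete, here is the gradient computed by hand for the simplest possible case: a single sigmoid neuron with a cross-entropy loss, where the chain of derivatives collapses to a well-known closed form. The input, label, and weights are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.5])   # input
y = 1.0                    # true label
w = np.array([0.1, -0.2])  # current weights

z = x @ w                  # pre-activation
p = sigmoid(z)             # the network's prediction

# Chain rule: dL/dw = dL/dp * dp/dz * dz/dw.
# For sigmoid + cross-entropy this collapses to (p - y) * x.
grad_w = (p - y) * x

# The prediction p is below the target y = 1, so the gradient is
# negative: gradient descent will *increase* both weights.
print(grad_w)
```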
Weight Adjustment
Using an optimisation algorithm, typically gradient descent, the weights in the network are adjusted in the direction that reduces the error.
Gradient descent can be thought of as a hiker trying to find the lowest point in a valley by always taking steps downhill. In the context of supervised learning, the "terrain" is the loss function, and the "lowest point in the valley" represents the optimal model parameters that produce the smallest prediction error.
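The hiker analogy can be sketched on a one-dimensional "valley". Here the loss is the toy function L(w) = (w - 3)^2, whose lowest point is at w = 3; each update steps opposite to the gradient, i.e. downhill:

```python
# Toy loss "valley": L(w) = (w - 3)^2, minimised at w = 3.
# The gradient dL/dw = 2 * (w - 3) points uphill, so we step the other way.
w = 10.0            # start far from the minimum
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # one downhill step

print(round(w, 4))  # 3.0 -- the hiker has reached the valley floor
```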
The model doesn't just go through this process once. It processes the labelled data multiple times, each time adjusting its parameters to reduce the error. With each iteration, the model aims to decrease the value produced by the loss function, meaning its predictions are getting closer to the actual outputs.
After many iterations, the model's adjustments to its parameters become smaller and smaller, indicating that it's found the best (or near-best) parameters for its predictions. At this point, we say the model has "converged" to an optimal solution.
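The full loop described above (forward pass, loss, gradients, weight adjustment, repeated over many passes) can be sketched end to end with a single-layer model on made-up labelled data. The data and hyperparameters here are arbitrary; the point is that the loss shrinks from one pass to the next as the model converges.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy labelled data: 2 input features, label 1 when their sum is positive.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.5
losses = []
for epoch in range(50):           # repeated passes over the labelled data
    p = sigmoid(X @ w + b)        # forward propagation
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    losses.append(loss)
    grad_w = X.T @ (p - y) / len(y)   # gradients from backpropagation
    grad_b = np.mean(p - y)
    w -= lr * grad_w                  # weight adjustment (gradient descent)
    b -= lr * grad_b

print(losses[0] > losses[-1])  # True: the loss shrinks as training proceeds
```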
Model Evaluation
After training, the neural network is evaluated on unseen data (test data) to assess its performance. If the network has been trained well, it should be able to make accurate predictions on this new data.
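A minimal sketch of this train/test split, again using an illustrative single-layer model on made-up data: the model is trained on one portion of the labelled data and its accuracy is measured only on examples it has never seen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labelled data: 300 examples, label 1 when the two features sum above 0.
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Hold out unseen test data: train on 250 examples, evaluate on the rest.
X_train, y_train = X[:250], y[:250]
X_test, y_test = X[250:], y[250:]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    p = sigmoid(X_train @ w + b)
    w -= lr * X_train.T @ (p - y_train) / len(y_train)
    b -= lr * np.mean(p - y_train)

# Accuracy on data the model never saw during training.
preds = (sigmoid(X_test @ w + b) > 0.5).astype(float)
accuracy = np.mean(preds == y_test)
print(accuracy)  # high accuracy suggests the model generalises well
```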
Example - Supervised Learning at Moorfields Eye Hospital
Moorfields has years of experience with age-related macular degeneration (AMD), including a large archive of eye scans divided into scans that show AMD and scans that don't. Patients referred to Moorfields will already have had 2D and 3D pictures of the back of their eye taken by their high-street optician. So Moorfields takes these OCT scans and uses AI to filter and triage the patients, get them in front of a doctor sooner, give them an earlier diagnosis and earlier treatment, and potentially save their sight.
Moorfields Eye Hospital developed a neural network model to analyse an eye scan and determine the presence of AMD. To train this model, the team fed it numerous eye scans, each labelled either "AMD" or "AMD-free". The model's underlying algorithm had to learn on its own how to distinguish between the scans, by extracting and recognising the relevant features indicative of AMD.
Model Evaluation: The team tested that the algorithm worked as they expected by passing unlabelled images into it and checking whether it diagnosed the same images with AMD as the specialists had. The results showed that the algorithm could triage these images with 94% accuracy. Some of the cases the algorithm got wrong were actually very challenging, and upon review the team realised that the algorithm's answer was valid, while their gold standard was open to debate.
Because every consultant needs to feel confident accepting the AI's diagnosis, or overruling it if they don't agree, they need to know the reasoning behind it. So the team actually built two neural networks: one to identify the disease features and another to use those features to make a diagnosis. The first network highlights areas that don't look normal or appear suspicious, and the second explains what is happening in those areas together with its recommendation, expressed as a percentage so that clinicians can see the system's confidence in its analysis.