For any machine- or deep-learning algorithm, we need:
Input data - samples and their properties. E.g., images represented by color pixels. Proper data representation is crucial
Examples of the expected output - expected sample annotations
Performance evaluation metrics - how well the algorithm's output matches the expected output. Used as a feedback signal to adjust the algorithm - the process of learning
Deep learning creates increasingly complex, layer-by-layer representations of the input data that maximize learning accuracy
Intermediate representations are learned jointly, with the properties of each layer updated depending on both the previous and the following layers
White, B.W.; Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Am. J. Psychol. 1963
Widespread belief that gradient descent would be unable to escape poor local minima during optimization, preventing neural networks from converging to an acceptable global solution
During the 1980s and 1990s, deep neural networks were largely abandoned
In 2006, deep belief networks revived interest in deep learning
In 2012, Krizhevsky et al. presented a convolutional neural network that significantly improved image recognition accuracy
GPU technologies enabled further development
Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006
$$\hat{y} = g\left(\sum_{i=1}^{m} x_i w_i\right)$$
https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
$$\hat{y} = g\left(w_0 + \sum_{i=1}^{m} x_i w_i\right)$$
$$\hat{y} = g\left(w_0 + X^T W\right)$$
https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
https://www.datasciencecentral.com/profiles/blogs/how-to-configure-the-number-of-layers-and-nodes-in-a-neural
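As a minimal illustration of the perceptron formula above, the sketch below evaluates $\hat{y} = g(w_0 + X^T W)$ with NumPy, assuming a sigmoid for $g$; the input values, weights, and bias are arbitrary examples, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, one common choice for the nonlinearity g."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_output(x, w, w0):
    """Compute y_hat = g(w0 + x^T w) for a single input vector x."""
    return sigmoid(w0 + np.dot(x, w))

# Example: two inputs with arbitrary weights and bias
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
print(perceptron_output(x, w, w0=0.1))
```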
Deep learning models are formed by multiple layers
A multi-layer perceptron (MLP) with more than two hidden layers is already a deep model
Most frequently used layers
Parameters of the neural network (weights and biases) are first randomly initialized
Small random subsets of input–target pairs from the training data set, so-called batches, are used iteratively to make small updates to the model parameters that minimize the loss between the predicted values and the observed targets
This minimization is performed by using the gradient of the loss function computed using the backpropagation algorithm
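A minimal NumPy sketch of this batch-wise training loop, assuming a toy linear model with a mean-squared-error loss; the data, learning rate, and batch size are illustrative choices, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: targets follow a noisy linear rule (illustrative only)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

# Randomly initialized parameters (weights and bias)
w, b = rng.normal(size=3), 0.0
lr, batch_size = 0.05, 32

for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Forward pass and prediction error on the batch
        err = Xb @ w + b - yb
        # Gradients of the batch MSE loss w.r.t. the parameters
        # (closed form here; deeper models obtain them via backpropagation)
        grad_w = 2 * Xb.T @ err / len(batch)
        grad_b = 2 * err.mean()
        # Small parameter update to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)  # learned weights should approach [2, -1, 0.5] and 0
```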
Need to represent infinitely many real numbers with a finite number of bit patterns
The approximation error is always present and can accumulate across many operations
Underflow occurs when numbers near zero are rounded to zero
Overflow occurs when numbers with large magnitude are approximated as ∞ or −∞
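A quick NumPy illustration of these effects, assuming 32-bit floats; the specific values are arbitrary.

```python
import numpy as np

# Overflow: exp of a large float32 exceeds the representable range -> inf
# (NumPy also emits a RuntimeWarning)
print(np.exp(np.float32(200.0)))      # inf

# Underflow: exp of a large negative float32 rounds to exactly zero
print(np.exp(np.float32(-200.0)))     # 0.0

# Accumulated rounding error: summing 0.1 ten times is not exactly 1.0
print(sum([0.1] * 10) == 1.0)         # False
```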
The activation function takes the sum of weighted inputs as its argument and returns the output of the neuron
$$a = f\left(\sum_{i=0}^{N} w_i x_i\right)$$
where index 0 corresponds to the bias term ($x_0 = b$, $w_0 = 1$).
Other functions: binary step function, linear (i.e., identity) activation function, exponential and scaled exponential linear unit, softplus, softsign
https://towardsdatascience.com/complete-guide-of-activation-functions-34076e95d044
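A short sketch of a few of these activation functions in NumPy; the function definitions and test values are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(0.0, z)

def softplus(z):
    """Smooth approximation of ReLU: log(1 + exp(z))."""
    return np.log1p(np.exp(z))

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), np.tanh(z), relu(z), softplus(z))
```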
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
We want to find the network weights that achieve the lowest loss
$$W^{*} = \underset{W}{\arg\min} \frac{1}{n}\sum_{i=1}^{n} L\left(f\left(x^{(i)}; W\right), y^{(i)}\right), \quad \text{where } W = \left\{W^{(0)}, W^{(1)}, \dots\right\}$$
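A minimal sketch of this empirical loss in NumPy, assuming a linear model for $f$ and a squared-error loss for $L$; both are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def empirical_loss(f, W, X, y, loss):
    """Mean loss over the training set: (1/n) * sum_i L(f(x_i; W), y_i)."""
    preds = np.array([f(x, W) for x in X])
    return np.mean([loss(p, t) for p, t in zip(preds, y)])

# Example with a linear model f(x; W) = W^T x and squared-error loss
f = lambda x, W: x @ W
mse = lambda p, t: (p - t) ** 2
X, y = np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([1.0, 0.0])
print(empirical_loss(f, np.array([0.5, -0.5]), X, y, mse))
```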
An optimization technique - finds a combination of weights for best model performance
Full batch gradient descent uses all the training data to update the weights
Stochastic gradient descent uses randomly selected subsets (single samples or mini-batches) of the training data
Gradient descent requires calculating the gradient by differentiating the cost function; we can use either first-order or second-order differentiation
Richards, Blake A., Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, et al. “A Deep Learning Framework for Neuroscience.” Nature Neuroscience 2019 - Box 1, Learning and the credit assignment problem
Initialize weights randomly, $W \sim \mathcal{N}(0, \sigma^2)$
Loop until convergence: compute the gradient $\frac{\partial J(W)}{\partial W}$ and update the weights, $W \leftarrow W - \eta \frac{\partial J(W)}{\partial W}$
Return weights
where $\eta$ is the learning rate. Choosing it well is critical - too small a rate may get stuck in local minima or converge slowly, too large a rate may overshoot minima entirely. Adaptive implementations exist
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/model_optimization.html
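A minimal NumPy sketch of this procedure, assuming the gradient of the loss is supplied as a function; the learning rate, tolerance, initialization scale, and the simple quadratic example loss are illustrative.

```python
import numpy as np

def gradient_descent(grad_J, n_weights, eta=0.1, tol=1e-6, max_iter=10_000, sigma=0.1):
    """Generic gradient descent: W <- W - eta * dJ/dW until the update is tiny."""
    rng = np.random.default_rng(42)
    W = rng.normal(0.0, sigma, size=n_weights)   # initialize weights ~ N(0, sigma^2)
    for _ in range(max_iter):
        g = grad_J(W)                            # gradient of the loss at the current W
        W -= eta * g                             # step against the gradient
        if np.linalg.norm(eta * g) < tol:        # crude convergence check
            break
    return W

# Example: minimize J(W) = ||W - 3||^2, whose gradient is 2 * (W - 3)
print(gradient_descent(lambda W: 2 * (W - 3.0), n_weights=2))
```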
Forward propagation computes the output by passing the input data through the network
The estimated output is compared with the expected output - the error (loss function) is calculated
Backpropagation propagates the error back through the network and updates the weights to minimize the loss, using the chain rule to recursively calculate gradients backward from the output
Each round of forward and backward propagation over a batch is one training iteration; a full pass over the training data is an epoch
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. “Learning Representations by Back-Propagating Errors,” 1986
Assuming a sigmoid activation function $\sigma(\cdot)$, at layer $L_1$ we have:
$$a^{1}_{0} = \sigma\left(\left[w^{1}_{00} \cdot x_0 + b^{1}_{00}\right] + \left[w^{1}_{01} \cdot x_1 + b^{1}_{01}\right]\right)$$
$$a^{1}_{1} = \sigma\left(\left[w^{1}_{10} \cdot x_0 + b^{1}_{10}\right] + \left[w^{1}_{11} \cdot x_1 + b^{1}_{11}\right]\right)$$
https://www.analyticsvidhya.com/blog/2020/04/comprehensive-popular-deep-learning-interview-questions-answers/
At layer $L_2$, we have:
$$\hat{y} = \sigma\left(\left[w^{2}_{00} \cdot a^{1}_{0} + b^{2}_{00}\right] + \left[w^{2}_{01} \cdot a^{1}_{1} + b^{2}_{01}\right]\right)$$
https://www.analyticsvidhya.com/blog/2020/04/comprehensive-popular-deep-learning-interview-questions-answers/
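A sketch of this 2-2-1 forward pass in NumPy, following the per-connection bias notation used above; the parameter values are random placeholders rather than values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, p):
    """Forward pass of the 2-2-1 network above (per-connection biases, as in the slides)."""
    # Hidden layer L1
    a1_0 = sigmoid(p["w1_00"] * x[0] + p["b1_00"] + p["w1_01"] * x[1] + p["b1_01"])
    a1_1 = sigmoid(p["w1_10"] * x[0] + p["b1_10"] + p["w1_11"] * x[1] + p["b1_11"])
    # Output layer L2
    return sigmoid(p["w2_00"] * a1_0 + p["b2_00"] + p["w2_01"] * a1_1 + p["b2_01"])

rng = np.random.default_rng(1)
params = {k: rng.normal() for k in
          ["w1_00", "b1_00", "w1_01", "b1_01",
           "w1_10", "b1_10", "w1_11", "b1_11",
           "w2_00", "b2_00", "w2_01", "b2_01"]}
print(forward(np.array([0.5, -0.3]), params))
```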
Back-propagation - a common method to train neural networks by updating their parameters (i.e., weights) using the derivative of the network's performance with respect to those parameters. A technique to calculate gradients through a chain of functions
$$\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$$
Review https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html
Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams. “Learning Representations by Back-Propagating Errors”, 1986, 4.
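A minimal sketch of this chain-rule product for a single sigmoid unit, assuming a squared-error loss $J = (\hat{y} - y)^2$ and $z_1 = w_1 x_1 + b$; this is an illustrative setup, not the two-layer network from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_w1(x1, w1, b, y):
    """dJ/dw1 = dJ/dy_hat * dy_hat/dz1 * dz1/dw1 for one sigmoid unit."""
    z1 = w1 * x1 + b
    y_hat = sigmoid(z1)
    dJ_dyhat = 2.0 * (y_hat - y)        # derivative of (y_hat - y)^2
    dyhat_dz1 = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    dz1_dw1 = x1                        # z1 is linear in w1
    return dJ_dyhat * dyhat_dz1 * dz1_dw1

print(grad_w1(x1=0.5, w1=0.2, b=0.1, y=1.0))
```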
A series of 10-15 min videos by deeplizard
Analytics Vidhya tutorial: Step-by-step forward and backpropagation, implemented in R and Python: https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
https://en.wikipedia.org/wiki/Vanishing_gradient_problem
Vanishing & Exploding Gradient Explained | A Problem Resulting From Backpropagation
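A small illustrative sketch of why gradients can vanish with sigmoid activations: backpropagation multiplies in one factor of $\sigma'(z) \le 0.25$ per layer, so the gradient shrinks roughly geometrically with depth. Weights are ignored here for simplicity, and the pre-activation value is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.5        # arbitrary pre-activation value, reused at every layer
grad = 1.0
for layer in range(1, 21):
    s = sigmoid(z)
    grad *= s * (1.0 - s)   # multiply in this layer's sigmoid derivative
    if layer % 5 == 0:
        print(f"after {layer:2d} layers: gradient factor ~ {grad:.2e}")
```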
Review the complete infographics at https://www.asimovinstitute.org/neural-network-zoo/