## Disconnected thoughts

A neural net can approximate any function.

- Is that a single layer, or multiple?
- Minsky and Papert showed the limitations of a single-layer perceptron (e.g. a single layer can't learn XOR), but multiple layers of perceptrons overcome this limitation. Two layers seem to be enough to approximate any mathematical function (see the XOR sketch at the end of this note).
- The [[Universal Approximation Theorem]] pins down the expressive power of neural nets: a network with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy (continuous functions, note, not [[computable functions]]).
- Data transformations
- Feature extraction
- Classification and regression
- ...

Because a deep NN can approximate any function, in theory you don't have to spend as much time on feature engineering; you can have the NN learn the features from the inputs, as well as their relationships and interactions.

- How can we align parts of a NN model? E.g. I don't want it to learn arbitrary features; I want to align it to concepts we can understand and reason about, or to combine the NN with formal methods.

## Loss and Metrics

Loss functions are meant for the optimizer and represent how "wrong" the model's prediction was. Metrics are meant to be exposed to the end user, to give a sense of other aspects of the model's quality and of how training is progressing / converging.

An example of a metric is error rate, i.e. the fraction of incorrect classifications: a 0% error rate means the model has predicted everything correctly. Error rate makes a poor loss function, though, because it isn't usefully differentiable, so the optimizer is usually fed something like cross-entropy instead (see the loss-vs-metric sketch at the end of this note).

## Epoch

A training round in which we've run through the entire training set once.

## Questions

- How do you extract theories from deep NNs?
- Could vanishing gradients be a feature? Could you make a NN self-pruning?
- Information-theoretic view of NNs
- How can I incorporate or learn partial PGMs?
- If you hard-code early convolutional layers to be more idealized filters, do you get better or worse results?
- How much data do you really need?
- Couldn't you have less data but weight its importance (i.e. what we do socially when we pattern match on what we think are widely accepted good ideas, or how we use aesthetics to pattern match)? See the weighted-loss sketch at the end of this note.
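## Sketches

To make the XOR point concrete: below is a minimal sketch (assuming PyTorch; the layer width, learning rate, and epoch count are arbitrary choices, not anything canonical) of a two-layer network learning XOR, which a single linear layer provably cannot. The loop also illustrates what an epoch is: each iteration is one full pass over the (four-row) training set.

```python
import torch
import torch.nn as nn

# XOR truth table: inputs and targets.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A single linear layer can't separate XOR; one hidden layer can.
model = nn.Sequential(
    nn.Linear(2, 8),   # hidden layer; 8 units is an arbitrary choice
    nn.ReLU(),
    nn.Linear(8, 1),   # output logit
)

loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(2000):   # each pass over all four rows is one epoch
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(torch.sigmoid(model(X)).round())  # usually converges to 0, 1, 1, 0
```

Depending on the random initialization a small ReLU net can occasionally get stuck on XOR, so treat the epoch count and convergence as typical rather than guaranteed.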
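The loss-vs-metric split in code: cross-entropy is differentiable and drives the optimizer, while error rate is computed only for the human reading the training log. The `error_rate` helper here is my own illustration, not a library function, and the batch is fabricated.

```python
import torch
import torch.nn.functional as F

def error_rate(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Metric for humans: fraction of incorrect classifications."""
    preds = logits.argmax(dim=1)
    return (preds != targets).float().mean().item()

# Fabricated batch: 5 examples, 3 classes.
logits = torch.randn(5, 3)
targets = torch.tensor([0, 2, 1, 1, 0])

loss = F.cross_entropy(logits, targets)  # differentiable: feeds the optimizer
err = error_rate(logits, targets)        # not differentiable: reporting only

print(f"loss={loss.item():.4f}  error_rate={err:.2%}")
```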
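And a sketch of the "less data, weighted importance" question: most frameworks let you compute an unreduced per-sample loss and apply your own weights, so examples you judge more trustworthy pull harder on the gradient. The weights below are invented purely for illustration; how you would actually derive them is exactly the open question.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)           # 4 examples, 3 classes (fabricated)
targets = torch.tensor([0, 1, 2, 1])

# Hypothetical per-sample importance: trust the first two examples more.
weights = torch.tensor([2.0, 2.0, 0.5, 0.5])

per_sample = F.cross_entropy(logits, targets, reduction="none")
loss = (weights * per_sample).sum() / weights.sum()  # weighted mean
print(loss)  # scalar you'd backprop through as usual
```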