Since 2012, deep neural networks have achieved impressive practical success in many domains, yet their theoretical properties remain poorly understood. I will discuss why neural network optimization, which is based on local greedy steps, tends to converge to:
1) A global minimum, even though many local minima exist.
2) A specific “good” global minimum in which the network function is surprisingly “simple” (even though many “bad” global minima exist).
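A minimal sketch of the second phenomenon (an illustration, not taken from the talk): in an overparameterized linear model there are infinitely many global minima that fit the data exactly, yet plain gradient descent started from zero converges to the minimum-norm interpolating solution, i.e. an implicit bias toward a "simple" solution. All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params = 5, 20          # more parameters than data: many global minima
X = rng.standard_normal((n_samples, n_params))
y = rng.standard_normal(n_samples)

w = np.zeros(n_params)               # zero initialization keeps iterates in the row space of X
lr = 0.01                            # step size below 2 / lambda_max(X^T X) for convergence
for _ in range(20000):
    grad = X.T @ (X @ w - y)         # gradient of the squared loss 0.5 * ||Xw - y||^2
    w -= lr * grad

# Gradient descent reaches the minimum-norm solution among all interpolating ones
w_min_norm = np.linalg.pinv(X) @ y
print(np.allclose(X @ w, y, atol=1e-5))        # fits the data (a global minimum)
print(np.allclose(w, w_min_norm, atol=1e-5))   # and it is the "simplest" (min-norm) one
```

The key point: the loss has a whole affine subspace of global minima, but the local greedy dynamics never leave the row space of `X`, so the algorithm itself selects the lowest-norm minimizer without any explicit regularization.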
|Title|Why Do Neural Networks Converge to “Simple” Solutions?|
|---|---|
|Study materials|About the deep learning era|