Came across this wonderful explanation of why neural networks with a hidden layer are universal approximators. Although not very helpful for practical purposes, it gives an intuitive feel for why neural networks produce reasonable results.
The basic idea is to analyze a sigmoid neuron's output σ(wx + b) as one varies the weight w and the bias b. For a large weight w, the sigmoid approaches a step function, and the bias controls where the step occurs (at s = -b/w). The chapter shows with animations that a pair of such near-step sigmoids sums to a "bump" (a step up followed by a step down), and that multiple pairs of these step functions can be used to approximate any 1D function. An extension of this idea to 2D is also shown.
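The construction can be sketched numerically. Below is a minimal NumPy sketch, not code from the chapter: the names `step`, `bump`, and `approximate`, the weight w = 1000, and the choice of target function are all my own illustrative choices.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def step(x, s, w=1000.0):
    # With a large weight w, sigmoid(w*x + b) with b = -w*s
    # approximates a step that jumps from 0 to 1 at x = s.
    return sigmoid(w * (x - s))

def bump(x, left, right, height):
    # A step up at `left` minus a step up at `right`:
    # roughly `height` on [left, right] and ~0 elsewhere.
    return height * (step(x, left) - step(x, right))

def approximate(f, x, n_bumps=50):
    # Piecewise-constant approximation of f on [0, 1]: one bump per
    # subinterval, with height f(midpoint). More bumps -> better fit.
    edges = np.linspace(0.0, 1.0, n_bumps + 1)
    out = np.zeros_like(x, dtype=float)
    for a, b in zip(edges[:-1], edges[1:]):
        out += bump(x, a, b, f((a + b) / 2.0))
    return out

x = np.linspace(0.05, 0.95, 400)
target = np.sin(2 * np.pi * x)
approx = approximate(lambda t: np.sin(2 * np.pi * t), x)
max_err = np.max(np.abs(approx - target))
```

Each `bump` is exactly a pair of hidden sigmoid neurons sharing an output weight, so `approximate` is the visual proof's single-hidden-layer network: enough bumps drive the approximation error as low as desired.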
See details here: http://neuralnetworksanddeeplearning.com/chap4.html