
Questions about the Xavier paper (paper title: Understanding the difficulty of training deep feedforward neural networks)


1. Why is this not achieved by pushing W toward 0?

2. For tanh, why does the first layer saturate before the later layers?
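The saturation the paper discusses is easy to reproduce at the level of a single forward pass. The sketch below (my own illustration, not code from the paper; layer width, depth, and the 0.99 saturation threshold are assumptions) pushes random data through a deep tanh network and measures, per layer, the fraction of units whose activation magnitude exceeds 0.99. Note this only shows saturation at initialization; the paper's observation about layer 1 saturating first is a training-time dynamic, which this static sketch does not capture.

```python
import numpy as np

rng = np.random.default_rng(0)

def saturation_per_layer(init_std_fn, n_layers=5, width=256, n_samples=1000):
    """Forward random inputs through a tanh MLP and return, for each
    layer, the fraction of units with |activation| > 0.99 (saturated)."""
    h = rng.standard_normal((n_samples, width))
    stats = []
    for _ in range(n_layers):
        # init_std_fn(fan_in) gives the weight standard deviation
        W = init_std_fn(width) * rng.standard_normal((width, width))
        h = np.tanh(h @ W)
        stats.append(float(np.mean(np.abs(h) > 0.99)))
    return stats

# Overly large init: std = 1, so pre-activations have std ~ sqrt(width)
# and tanh saturates almost everywhere.
naive = saturation_per_layer(lambda fan_in: 1.0)

# Xavier/Glorot-style init: std = sqrt(1 / fan_in) keeps the
# pre-activation variance near 1, so few units saturate.
xavier = saturation_per_layer(lambda fan_in: np.sqrt(1.0 / fan_in))

print("saturated fraction per layer, naive :", [f"{s:.2f}" for s in naive])
print("saturated fraction per layer, xavier:", [f"{s:.2f}" for s in xavier])
```

With the large initialization nearly every unit sits on the flat tails of tanh from the very first layer, while the fan-in-scaled initialization keeps activations in the near-linear regime, which is exactly the motivation for the variance-preserving condition the paper derives.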