Sigmoid neurons simulating perceptrons, part I
Multiplying w and b by c > 0 does not change the sign of w⋅x+b, and a perceptron only looks at that sign, so the output of every perceptron, and therefore of the whole network, is unchanged. Of course, if c were negative the signs would flip and the argument would break down.
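A quick numerical check of this, as a minimal sketch (the weights, bias, and inputs are arbitrary illustrations, not from the text):

import numpy as np

def perceptron(w, b, x):
    # Perceptron rule: output 1 if w.x + b > 0, otherwise 0.
    return 1 if np.dot(w, x) + b > 0 else 0

w, b = np.array([0.7, -1.2]), 0.3          # arbitrary weights and bias
for x in [np.array([1.0, 0.5]), np.array([-2.0, 1.0])]:
    for c in [1, 5, 100]:                  # positive scalings only
        assert perceptron(c * w, c * b, x) == perceptron(w, b, x)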
Sigmoid neurons simulating perceptrons, part II
As c → ∞, c(w⋅x+b) goes to +∞ wherever w⋅x+b > 0 (so the sigmoid outputs 1) and to −∞ wherever w⋅x+b < 0 (so it outputs 0), so the sigmoid network behaves exactly like the perceptron network. But for any input with w⋅x+b = 0, scaling c changes nothing: the sigmoid outputs 0.5 however large c is, which differs from the perceptron, so the equivalence fails on those inputs.
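A small numerical illustration of both halves of the argument (a sketch; the sample values of w⋅x+b and the scalings are made up):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# As c grows, sigmoid(c*z) approaches the perceptron's 0/1 step for z != 0,
# but stays exactly at 0.5 whenever z = 0, no matter how large c gets.
for z in [0.2, -0.2, 0.0]:                     # sample values of w.x + b
    print(z, [round(sigmoid(c * z), 4) for c in (1, 10, 1000)])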
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
Use the binary encoding of the digits 0-9, with one new output neuron per bit and a negative bias on each:
Bit 1 (least significant): large positive weight from digits 1, 3, 5, 7, 9; large negative (or near-zero) weight from 0, 2, 4, 6, 8.
Bit 2: large from 2, 3, 6, 7; small from 0, 1, 4, 5, 8, 9.
Bit 3: large from 4, 5, 6, 7; small from 0, 1, 2, 3, 8, 9.
Bit 4: large from 8, 9; small from all other digits.
One concrete choice of weights and biases is sketched below.
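A minimal sketch of one such choice (the magnitudes ±10 and the bias -5 are my own picks; anything comfortably larger than the 0.99/0.01 margin works):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# W[j, d] is the weight from old-output neuron d to bit neuron j:
# +10 if bit j of digit d is 1, -10 otherwise; every bit neuron gets bias -5.
W = np.array([[10.0 if (d >> j) & 1 else -10.0 for d in range(10)] for j in range(4)])
b = np.full((4, 1), -5.0)

def old_output(digit):
    # Old output layer: correct digit at 0.99, every other neuron at 0.01.
    a = np.full((10, 1), 0.01)
    a[digit] = 0.99
    return a

for d in range(10):
    bits = sigmoid(W @ old_output(d) + b).round().astype(int).flatten()
    assert list(bits) == [(d >> j) & 1 for j in range(4)]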
Prove the assertion of the last paragraph.
Use the Cauchy-Schwarz inequality on ΔC ≈ ∇C⋅Δv: with ‖Δv‖ = ε fixed, |∇C⋅Δv| ≤ ‖∇C‖‖Δv‖, and equality holds if and only if Δv and ∇C are linearly dependent. So ΔC is made most negative by pointing Δv exactly opposite to ∇C, i.e. Δv = −η∇C with η = ε/‖∇C‖, which is the assertion.
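Written out as a short derivation (LaTeX sketch):

\[
\Delta C \approx \nabla C \cdot \Delta v, \qquad \|\Delta v\| = \epsilon \ \text{fixed}.
\]
By the Cauchy-Schwarz inequality,
\[
|\nabla C \cdot \Delta v| \le \|\nabla C\|\,\|\Delta v\| = \epsilon\,\|\nabla C\|,
\]
with equality if and only if $\Delta v$ is a scalar multiple of $\nabla C$. The most negative value, $\Delta C \approx -\epsilon\,\|\nabla C\|$, is therefore attained by
\[
\Delta v = -\eta\,\nabla C, \qquad \eta = \frac{\epsilon}{\|\nabla C\|},
\]
which is exactly the gradient-descent step.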
I explained gradient descent when C is a function of two variables, and when it's a function of more than two variables. What happens when C is a function of just one variable? Can you provide a geometric interpretation of what gradient descent is doing in the one-dimensional case?
In one dimension the gradient is just the slope C′(v), so gradient descent simply walks downhill along the curve: at every step move a little in the direction in which C decreases, i.e. against the sign of the slope.
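A minimal sketch of the one-dimensional case (the cost C(v) = v^2, the starting point, and the learning rate are arbitrary choices for illustration):

def C(v):  return v ** 2            # example cost with its minimum at v = 0
def dC(v): return 2 * v             # its slope

v, eta = 3.0, 0.1                   # arbitrary start and learning rate
for _ in range(100):
    v -= eta * dC(v)                # step against the sign of the slope
print(v)                            # ends up very close to 0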
An extreme version of gradient descent is to use a mini-batch size of just 1. That is, given a training input x, we update the weights and biases using the gradient of the cost for that single input, then pick another training input and update again, and so on; this procedure is known as online, on-line, or incremental learning. Name one advantage and one disadvantage of online learning, compared to stochastic gradient descent with a mini-batch size of, say, 20.
Advantage: it is fast; each update only needs the gradient for a single training example, so the weights get updated far more often for the same amount of computation.
Disadvantage: it is less accurate; the gradient from one example is a noisy estimate of the true gradient compared with the average over a mini-batch of 20, so the path toward the minimum wanders more.
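A sketch contrasting the two update rules on a made-up linear-regression problem (the model, data, and learning rate are illustrative assumptions only):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                                  # toy inputs
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=1000)     # toy targets

def grad(w, xb, yb):
    # Gradient of the quadratic cost on a (mini-)batch for a linear model.
    return xb.T @ (xb @ w - yb) / len(yb)

eta = 0.05
w_online, w_batch = np.zeros(2), np.zeros(2)
for i in range(0, 1000, 20):
    for j in range(i, i + 20):
        w_online -= eta * grad(w_online, X[j:j+1], y[j:j+1])    # 20 cheap, noisy steps
    w_batch -= eta * grad(w_batch, X[i:i+20], y[i:i+20])        # one averaged, steadier step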
Chapter 2
Backpropagation with a single modified neuron
In the backward pass, wherever the factor σ′(z) appears for that one neuron, use f′(z) instead (and use f for it in the forward pass); every other neuron and every other equation of backpropagation stays the same.
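A sketch of where the change lands in the backward pass (the position of the modified neuron and the choice f = tanh are hypothetical placeholders):

import numpy as np

def sigmoid(z):       return 1.0 / (1.0 + np.exp(-z))
def sigmoid_prime(z): return sigmoid(z) * (1.0 - sigmoid(z))

l_star, j_star = 1, 2                         # hypothetical layer/index of the modified neuron
def f_prime(z): return 1.0 - np.tanh(z) ** 2  # derivative of the hypothetical replacement f

def act_prime(l, z):
    # Activation derivative for layer l: sigma' everywhere except the one modified neuron.
    d = sigmoid_prime(z)
    if l == l_star:
        d[j_star] = f_prime(z[j_star])
    return d

# Backpropagation itself is untouched; act_prime simply replaces sigma' wherever it appears:
#   delta_L = grad_a(C) * act_prime(L, z_L)
#   delta_l = (W_{l+1}.T @ delta_{l+1}) * act_prime(l, z_l)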
Backpropagation with linear neurons
If the activation function is linear, σ(z) = z, then σ′(z) = 1, so every σ′ factor in the backpropagation equations becomes 1.
So for the output layer the error is simply δ^L = ∂C/∂a^L, the gradient of the cost with respect to the output activations; for the earlier layers we just propagate it back through the transposed weight matrices; and w and b are updated from δ in the usual way, as written out below.
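Written out, assuming the quadratic cost for the output-layer error (a sketch):

\[
\delta^L = \nabla_a C = a^L - y, \qquad
\delta^l = (w^{l+1})^T \delta^{l+1}, \qquad
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j, \qquad
\frac{\partial C}{\partial b^l_j} = \delta^l_j .
\]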