I. Forward and Backward Propagation in a Neural Network

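This section walks through the computations of an artificial neural network step by step, using a concrete example. Figure 2-7-1 shows the topology of a simple neural network. The first layer is the input layer, with two neurons [math]i_1[/math] and [math]i_2[/math] and bias term [math]b_1[/math]; the second layer is the hidden layer, with two neurons [math]h_1[/math] and [math]h_2[/math] and bias term [math]b_2[/math]; the third layer is the output layer, with two outputs [math]o_1[/math] and [math]o_2[/math]. The number on each connection is the weight passed between the two neurons, denoted [math]w_i[/math]. The activation function is the Sigmoid function by default.[br][center][img]https://s21.ax1x.com/2025/02/17/pEM9vfs.jpg[/img][br]Figure 2-7-1 A simple artificial neural network[/center]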

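The initial data for each layer are as follows:[br][br] Input layer: [math]i_1=0.05,\ i_2=0.1,\ b_1=0.35[/math];[br][br] Hidden layer: [math]w_1=0.15,\ w_2=0.2,\ w_3=0.25,\ w_4=0.3,\ b_2=0.6[/math];[br][br] Output layer: [math]o_1=0.01,\ o_2=0.99,\ w_5=0.4,\ w_6=0.45,\ w_7=0.5,\ w_8=0.55[/math];[br][br] Goal: after the inputs [math]i_1[/math] and [math]i_2[/math] (0.05 and 0.1) pass through the network, the outputs should be as close as possible to the target values [math]o_1[/math] and [math]o_2[/math] (0.01 and 0.99).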

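(1) Forward propagation[br][br] 1. Input layer → hidden layer[br][br] Figure 2-7-2 shows the computation in the hidden layer, taking [math]h_1[/math] as the example.[br][center][img]https://s21.ax1x.com/2025/02/17/pEMCp60.jpg[/img][br] Figure 2-7-2 Computation in the hidden layer[/center]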

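First compute the input that the input-layer neurons deliver to the hidden neuron (here [math]h_1[/math]), denoted [math]Net_{h1}[/math]; then compute the value after the activation function (here Sigmoid) is applied, denoted [math]Out_{h1}[/math].[br][br] The weighted input sum [math]Net_{h1}[/math] of neuron [math]h_1[/math] is:[br][br] [math]Net_{h1}=w_1i_1+w_2i_2+b_1\cdot1=0.15\times0.05+0.2\times0.1+0.35\times1=0.3775[/math]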

The output [math]Out_{h1}[/math] of neuron [math]h_1[/math] is:[br][br] [math]Out_{h1}=\frac{1}{1+e^{-Net_{h1}}}=\frac{1}{1+e^{-0.3775}}=0.593269992[/math][br] [br] In the same way, the output of neuron [math]h_2[/math] is [math]Out_{h2}=0.596884378[/math].

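[br] 2. Hidden layer → output layer[br][br] Figure 2-7-3 shows the computation in the output layer, taking [math]o_1[/math] as the example.[br][center][img]https://s21.ax1x.com/2025/02/17/pEMCPmT.jpg[/img][br] Figure 2-7-3 Computation in the output layer[/center]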

[math]Net_{o1}=w_5\cdot Out_{h1}+w_6\cdot Out_{h2}+b_2\cdot1[/math][br][br] [math]Net_{o1}=0.4\times0.593269992+0.45\times0.596884378+0.6\times1=1.105905967[/math][br][br] [math]Out_{o1}=\frac{1}{1+e^{-Net_{o1}}}=\frac{1}{1+e^{-1.105905967}}=0.75136507[/math][br][br] Likewise, [math]Out_{o2}=0.772928465[/math].[br] [br] This completes the forward pass of the network. The outputs obtained are [0.75136507, 0.772928465], still far from the desired values [0.01, 0.99]. The next step is to propagate the error backward and update the weights, repeating until the output meets the requirement.
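The forward pass above can be reproduced with a few lines of Python. This is a minimal sketch rather than a definitive implementation: the function names `sigmoid` and `forward` are illustrative, and the wiring (that [math]w_3, w_4[/math] feed [math]h_2[/math] and [math]w_7, w_8[/math] feed [math]o_2[/math]) is an assumption read off the figure, chosen because it reproduces the values quoted in the text.

```python
import math

def sigmoid(x):
    """Logistic (Sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(i1, i2, w, b1, b2):
    """Forward pass of the 2-2-2 network; w maps 1..8 to the weights w_1..w_8."""
    # Input layer -> hidden layer (both hidden neurons share the bias b1).
    out_h1 = sigmoid(w[1] * i1 + w[2] * i2 + b1)
    out_h2 = sigmoid(w[3] * i1 + w[4] * i2 + b1)   # assumed wiring for h2
    # Hidden layer -> output layer (both output neurons share the bias b2).
    out_o1 = sigmoid(w[5] * out_h1 + w[6] * out_h2 + b2)
    out_o2 = sigmoid(w[7] * out_h1 + w[8] * out_h2 + b2)   # assumed wiring for o2
    return out_h1, out_h2, out_o1, out_o2

weights = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.30, 5: 0.40, 6: 0.45, 7: 0.50, 8: 0.55}
out_h1, out_h2, out_o1, out_o2 = forward(0.05, 0.10, weights, b1=0.35, b2=0.60)
print(out_h1, out_h2, out_o1, out_o2)
```

The four printed numbers should agree with the hand-computed values of [math]Out_{h1}[/math], [math]Out_{h2}[/math], [math]Out_{o1}[/math] and [math]Out_{o2}[/math] to the precision shown in the text.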

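(2) Backward propagation[br][br] 1. Compute the total error (squared error)[br][br] [math]E_{total}=\sum\frac{1}{2}\left(target-output\right)^2[/math] [br][br] Because there are two outputs, the errors of [math]o_1[/math] and [math]o_2[/math] are computed separately, and the total error is their sum.[br][br] [math]E_{o1}=\frac{1}{2}\left(target_{o1}-out_{o1}\right)^2=\frac{1}{2}\times\left(0.01-0.75136507\right)^2=0.274811083[/math][br][br] [math]E_{o2}=0.023560026[/math][br] [br] [math]E_{total}=E_{o1}+E_{o2}=0.274811083+0.023560026=0.298371109[/math][br][br] 2. Weight update, hidden layer → output layer[br][br] Take the weight [math]w_5[/math] as the example. To find out how much [math]w_5[/math] contributes to the total error, take the partial derivative of the total error with respect to [math]w_5[/math]. For this example we do not write out a full objective function (with regularization, for instance); we only show the iteration using the functions at hand. By the chain rule:[br][br] [math]\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial w_5}[/math]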

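To make this chain rule clearer, Figure 2-7-4 shows how the error propagates backward from the output layer to the hidden layer.[br][center][img]https://s21.ax1x.com/2025/02/17/pEMCuX6.jpg[/img][br] Figure 2-7-4 Backward propagation from the output layer to the hidden layer[/center]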

The three partial derivatives are now computed one by one:[br] [br] ① Compute [math]\frac{\partial E_{total}}{\partial out_{o1}}[/math].[br] [br] [math]E_{total}=\frac{1}{2}\left(target_{o1}-out_{o1}\right)^2+\frac{1}{2}\left(target_{o2}-out_{o2}\right)^2[/math][br][br] [math]\frac{\partial E_{total}}{\partial out_{o1}}=2\times\frac{1}{2}\left(target_{o1}-out_{o1}\right)^{2-1}\times\left(-1\right)+0[/math][br][br] [math]\frac{\partial E_{total}}{\partial out_{o1}}=-\left(target_{o1}-out_{o1}\right)=-\left(0.01-0.75136507\right)=0.74136507[/math][br][br] ② Compute [math]\frac{\partial out_{o1}}{\partial net_{o1}}[/math].[br][br] [math]out_{o1}=\frac{1}{1+e^{-net_{o1}}}[/math][br][br] It is easy to show that the derivative of the Sigmoid function [math]f(x)[/math] is [math]f(x)[1-f(x)][/math].[br][br] [math]\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}\left(1-out_{o1}\right)=0.75136507\times\left(1-0.75136507\right)=0.186815602[/math][br][br] ③ Compute [math]\frac{\partial net_{o1}}{\partial w_5}[/math].[br][br] [math]net_{o1}=w_5\cdot out_{h1}+w_6\cdot out_{h2}+b_2\cdot1[/math][br][br] [math]\frac{\partial net_{o1}}{\partial w_5}=1\cdot out_{h1}\cdot w_5^{\left(1-1\right)}+0+0=out_{h1}=0.593269992[/math][br][br] ④ Multiply the three factors:[br] [br] [math]\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial w_5}[/math][br][br] [math]\frac{\partial E_{total}}{\partial w_5}=0.74136507\times0.186815602\times0.593269992=0.082167041[/math]
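One way to gain confidence in a hand-derived gradient is to compare it with a numerical estimate. The sketch below is an illustration, not part of the original derivation: it treats [math]E_{total}[/math] as a function of [math]w_5[/math] alone, with all other quantities fixed at the values quoted above, and compares the chain-rule product with a central finite difference. The helper name `total_error` and the step size `eps` are assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def total_error(w5):
    """E_total as a function of w_5 alone; everything else is held at its initial value."""
    out_h1, out_h2 = 0.593269992, 0.596884378   # hidden outputs from the forward pass
    out_o1 = sigmoid(w5 * out_h1 + 0.45 * out_h2 + 0.60)
    out_o2 = 0.772928465                        # does not depend on w_5
    return 0.5 * (0.01 - out_o1) ** 2 + 0.5 * (0.99 - out_o2) ** 2

# Analytic gradient: product of the three chain-rule factors.
out_h1 = 0.593269992
out_o1 = sigmoid(0.40 * out_h1 + 0.45 * 0.596884378 + 0.60)
analytic = -(0.01 - out_o1) * out_o1 * (1.0 - out_o1) * out_h1

# Numerical gradient: central difference with a small step eps (assumed value).
eps = 1e-6
numeric = (total_error(0.40 + eps) - total_error(0.40 - eps)) / (2.0 * eps)
print(analytic, numeric)
```

The two printed values should agree closely with each other and with the hand-computed result for [math]\frac{\partial E_{total}}{\partial w_5}[/math].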

The backward propagation for this layer can be summarized in a general formula:[br][br] [math]\frac{\partial E_{total}}{\partial w_5}=-\left(target_{o1}-out_{o1}\right)\cdot out_{o1}\left(1-out_{o1}\right)\cdot out_{h1}[/math][br][br] Let [math]\delta_{o1}[/math] denote the error term of output neuron [math]o_1[/math], that is,[br][br] [math]\delta_{o1}=\frac{\partial E_{total}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}=\frac{\partial E_{total}}{\partial net_{o1}}[/math][br][br] [math]\delta_{o1}=-\left(target_{o1}-out_{o1}\right)\cdot out_{o1}\left(1-out_{o1}\right)[/math][br][br] The partial derivative of the total error [math]E_{total}[/math] with respect to [math]w_5[/math] can therefore be written as:[br] [br] [math]\frac{\partial E_{total}}{\partial w_5}=\delta_{o1}\cdot out_{h1}[/math][br][br] By gradient descent, the updated value of [math]w_5[/math] is[br][br] [math]w_5^+=w_5-\eta\frac{\partial E_{total}}{\partial w_5}=0.4-0.5\times0.082167041=0.35891648[/math][br] [br] where [math]\eta[/math] is the learning rate, set to 0.5 for this calculation.[br][br] Applying the general formula to the other three output-layer weights gives[br][br] [math]w_6^+=0.408666186[/math][br] [br] [math]w_7^+=0.511301270[/math][br] [br] [math]w_8^+=0.561370121[/math]
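The general formula lends itself to a short loop over the output neurons. The following is a minimal sketch under the same wiring assumption as the figure suggests ([math]w_5, w_6[/math] feed [math]o_1[/math]; [math]w_7, w_8[/math] feed [math]o_2[/math]); the variable names are illustrative, and the forward-pass values are copied from the text rather than recomputed.

```python
# Values from the forward pass and the initial data.
out_h = [0.593269992, 0.596884378]      # out_h1, out_h2
out_o = [0.75136507, 0.772928465]       # out_o1, out_o2
target = [0.01, 0.99]
w_out = [[0.40, 0.45], [0.50, 0.55]]    # [[w5, w6], [w7, w8]]; row k feeds output k (assumed wiring)
eta = 0.5                               # learning rate

# delta_o = -(target - out) * out * (1 - out) for each output neuron.
delta_o = [-(t - o) * o * (1.0 - o) for t, o in zip(target, out_o)]

# dE_total/dw = delta_o * out_h, followed by one gradient-descent step.
w_out_new = [[w - eta * d * h for w, h in zip(row, out_h)]
             for row, d in zip(w_out, delta_o)]
print(delta_o)
print(w_out_new)
```

The nested list `w_out_new` holds the updated [math]w_5^+, w_6^+[/math] in its first row and [math]w_7^+, w_8^+[/math] in its second.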

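3. Weight update, input layer → hidden layer (or hidden layer → hidden layer when there are several hidden layers)[br][br] The update follows essentially the same idea as for the output layer, with one difference. When the derivative of the total error with respect to [math]w_5[/math] was computed in the output layer, the chain ran [math]out_{o1}\longrightarrow net_{o1}\longrightarrow w_5[/math]. For a weight between hidden layers (or between the input layer and the hidden layer), taking [math]w_1[/math] as the example, the chain runs [math]out_{h1}\longrightarrow net_{h1}\longrightarrow w_1[/math], and [math]out_{h1}[/math] receives error from both [math]E_{o1}[/math] and [math]E_{o2}[/math]. Each of these contributions therefore has to be propagated back in turn, as shown in Figure 2-7-5.[br][center][img]https://s21.ax1x.com/2025/02/17/pEMChHU.jpg[/img][br] Figure 2-7-5 Calculation of the total error at the output layer[/center]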

[math]\frac{\partial E_{total}}{\partial w_1}=\frac{\partial E_{total}}{\partial out_{h1}}\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_1}[/math][br][br] ① Compute [math]\frac{\partial E_{total}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial out_{h1}}+\frac{\partial E_{o2}}{\partial out_{h1}}[/math].[br] [br] [math]\frac{\partial E_{o1}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial out_{h1}}[/math][br][br] [math]\frac{\partial E_{o1}}{\partial net_{o1}}=\frac{\partial E_{o1}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}=0.74136507\times0.186815602=0.138498562[/math][br][br] [math]net_{o1}=w_5\cdot out_{h1}+w_6\cdot out_{h2}+b_2\cdot1[/math][br] [br] [math]\frac{\partial net_{o1}}{\partial out_{h1}}=w_5=0.4[/math][br][br] [math]\frac{\partial E_{o1}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial out_{h1}}=0.138498562\times0.4=0.055399425[/math][br][br] In the same way, [math]\frac{\partial E_{o2}}{\partial out_{h1}}=-0.019049119[/math][br][br] so that [math]\frac{\partial E_{total}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial out_{h1}}+\frac{\partial E_{o2}}{\partial out_{h1}}=0.055399425+\left(-0.019049119\right)=0.036350306[/math]

② Compute [math]\frac{\partial out_{h1}}{\partial net_{h1}}[/math].[br] [br] [math]out_{h1}=\frac{1}{1+e^{-net_{h1}}}[/math][br][br] [math]\frac{\partial out_{h1}}{\partial net_{h1}}=out_{h1}\left(1-out_{h1}\right)=0.593269992\times\left(1-0.593269992\right)=0.241300709[/math]

③ Compute [math]\frac{\partial net_{h1}}{\partial w_1}[/math].[br][br] [math]net_{h1}=w_1\cdot i_1+w_2\cdot i_2+b_1\cdot1[/math][br] [br] [math]\frac{\partial net_{h1}}{\partial w_1}=i_1=0.05[/math]

④ Multiply the three factors:[br] [br] [math]\frac{\partial E_{total}}{\partial w_1}=\frac{\partial E_{total}}{\partial out_{h1}}\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_1}=0.036350306\times0.241300709\times0.05=0.000438568[/math][br][br] The backward propagation for this layer can again be summarized in a general formula. To simplify the notation, let [math]\delta_{h1}[/math] denote the error term of hidden neuron [math]h_1[/math]:[br][br] [math]\frac{\partial E_{total}}{\partial w_1}=\left(\sum_o\frac{\partial E_{total}}{\partial out_o}\cdot\frac{\partial out_o}{\partial net_o}\cdot\frac{\partial net_o}{\partial out_{h1}}\right)\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_1}[/math][br][br] [math]\frac{\partial E_{total}}{\partial w_1}=\left(\sum_o\delta_o\cdot w_{ho}\right)\cdot out_{h1}\left(1-out_{h1}\right)\cdot i_1[/math][br] [br] Writing [math]\delta_{h1}=\left(\sum_o\delta_o\cdot w_{ho}\right)\cdot out_{h1}\left(1-out_{h1}\right)[/math], this becomes[br][br] [math]\frac{\partial E_{total}}{\partial w_1}=\delta_{h1}\cdot i_1[/math]
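The hidden-layer formula can be sketched in the same style as the output layer. The code below is an illustration under the wiring assumed from the figure ([math]w_5, w_7[/math] leave [math]h_1[/math]; [math]w_6, w_8[/math] leave [math]h_2[/math]); the list names are hypothetical, and the forward-pass values are copied from the text.

```python
i = [0.05, 0.10]                        # inputs i1, i2
out_h = [0.593269992, 0.596884378]      # out_h1, out_h2
out_o = [0.75136507, 0.772928465]       # out_o1, out_o2
target = [0.01, 0.99]
w_out = [[0.40, 0.45], [0.50, 0.55]]    # [[w5, w6], [w7, w8]] (assumed wiring)

# Output-layer error terms, as in the previous general formula.
delta_o = [-(t - o) * o * (1.0 - o) for t, o in zip(target, out_o)]

# delta_h[j] = (sum over outputs k of delta_o[k] * w_out[k][j]) * out_h[j] * (1 - out_h[j])
delta_h = [sum(delta_o[k] * w_out[k][j] for k in range(2)) * out_h[j] * (1.0 - out_h[j])
           for j in range(2)]

# grad_hidden[j][m] is dE_total/dw for the weight from input m to hidden neuron j,
# so grad_hidden[0] covers w1, w2 and grad_hidden[1] covers w3, w4.
grad_hidden = [[delta_h[j] * i[m] for m in range(2)] for j in range(2)]
print(grad_hidden)
```

The entry `grad_hidden[0][0]` corresponds to [math]\frac{\partial E_{total}}{\partial w_1}[/math] computed by hand above.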

Finally, update the weight [math]w_1[/math]:[br] [math]w_1^+=w_1-\eta\cdot\frac{\partial E_{total}}{\partial w_1}=0.15-0.5\times0.000438568=0.149780716[/math][br][br] Applying the general formula to the other three weights of this layer gives[br] [br] [math]w^+_2=0.19956143[/math][br] [br] [math]w_3^+=0.24975114[/math][br] [br] [math]w^+_4=0.29950229[/math]
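Putting the forward pass and the two update rules together, one complete training step and its repetition can be sketched as follows. This is a minimal sketch under stated assumptions: the biases [math]b_1[/math] and [math]b_2[/math] are left fixed (the text updates only the weights), all gradients are computed from the weights before any of them is changed, and the function names and the flat list layout for [math]w_1,\dots,w_8[/math] are choices made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, x, b1, b2):
    """Return (hidden outputs, network outputs) for weights w = [w1..w8]."""
    out_h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b1),
             sigmoid(w[2] * x[0] + w[3] * x[1] + b1)]
    out_o = [sigmoid(w[4] * out_h[0] + w[5] * out_h[1] + b2),
             sigmoid(w[6] * out_h[0] + w[7] * out_h[1] + b2)]
    return out_h, out_o

def total_error(out_o, target):
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, out_o))

def train_step(w, x, target, b1, b2, eta):
    """One forward pass plus one update of w1..w8 (biases are left fixed)."""
    out_h, out_o = forward(w, x, b1, b2)
    delta_o = [-(t - o) * o * (1 - o) for t, o in zip(target, out_o)]
    # Hidden error terms use the output weights *before* they are updated.
    delta_h = [(delta_o[0] * w[4 + j] + delta_o[1] * w[6 + j]) * out_h[j] * (1 - out_h[j])
               for j in range(2)]
    grad = [delta_h[0] * x[0], delta_h[0] * x[1],          # w1, w2
            delta_h[1] * x[0], delta_h[1] * x[1],          # w3, w4
            delta_o[0] * out_h[0], delta_o[0] * out_h[1],  # w5, w6
            delta_o[1] * out_h[0], delta_o[1] * out_h[1]]  # w7, w8
    return [wi - eta * g for wi, g in zip(w, grad)]

x, target = [0.05, 0.10], [0.01, 0.99]
b1, b2, eta = 0.35, 0.60, 0.5
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]   # w1..w8

w_after_one = train_step(w, x, target, b1, b2, eta)
error_after_one = total_error(forward(w_after_one, x, b1, b2)[1], target)

w_final = w
for _ in range(10000):
    w_final = train_step(w_final, x, target, b1, b2, eta)
final_out = forward(w_final, x, b1, b2)[1]
final_error = total_error(final_out, target)
print(error_after_one, final_error, final_out)
```

Computing every gradient before applying any update matters here: the hidden-layer derivation uses the original [math]w_5,\dots,w_8[/math], so updating the output weights first would give slightly different values for [math]w_1^+,\dots,w_4^+[/math].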

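This completes one round of error backpropagation. The updated weights are then put back into the model and the computation is repeated, iterating until the required number of iterations or the required error is reached. In this example, after the first iteration the total error [math]E_{total}[/math] drops from 0.298371109 to 0.291027924. After 10,000 iterations the total error is 0.000035085 and the output is [0.015912196, 0.984065734], against target values of [0.01, 0.99], which shows that the training works well.[br][br] A few remarks:[br][br] First, this example is a neural network with a single hidden layer and two neurons in every layer. In practice, feature extraction can be quite complex, for example in image, audio or video recognition, and more than one hidden layer is needed to obtain a satisfactory model. The forward and backward passes then involve a large amount of computation and demand considerable computing power, and the activation and objective functions may differ, but the basic pattern and the core principle stay the same.[br][br] Second, the input neurons [math]i_1[/math] and [math]i_2[/math] in this example correspond, in a large data set, to the feature variables of the training data: the input layer has as many neurons as the samples have feature variables.[br][br] Third, when training on many samples, the parameters need not be changed after every single sample. Instead, a batch of samples (called a Batch or Mini-Batch) is fed in, the gradients of these samples are averaged, and the parameters are changed according to that average. In other words, each sample in the batch goes through one forward and one backward pass, the average gradient is computed, and then the next round begins.[br][br] Fourth, a neural network can be used for regression as well as for classification. For regression, the output layer usually consists of a single neuron that outputs the value of interest. For classification (the main application area of artificial neural networks), take binary classification as an example, as shown in Figure 2-7-6: let [math]f(x,y)=z=\omega_1x+\omega_2y+b[/math]; then points with [math]\omega_1x+\omega_2y+b-z>0[/math] are labeled class A and points with [math]\omega_1x+\omega_2y+b-z<0[/math] are labeled class B.[br][center][img]https://s21.ax1x.com/2025/02/17/pEMPDr6.png[/img][br] Figure 2-7-6 Regression analysis with an artificial neural network[/center]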

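The output layer can be a single neuron, with class A trained to 1 and class B trained to 0. It can also have two neurons, with class A trained to [1, 0] and class B to [0, 1]. For a nine-class problem, the multi-dimensional output layer can be expressed in Matlab as[br][br] [1, 0, 0, 0, 0, 0, 0, 0, 0]→1, [0, 1, 0, 0, 0, 0, 0, 0, 0]→2, [0, 0, 1, 0, 0, 0, 0, 0, 0]→3, [0, 0, 0, 1, 0, 0, 0, 0, 0]→4, [0, 0, 0, 0, 1, 0, 0, 0, 0]→5, [0, 0, 0, 0, 0, 1, 0, 0, 0]→6, [0, 0, 0, 0, 0, 0, 1, 0, 0]→7, [0, 0, 0, 0, 0, 0, 0, 1, 0]→8, [0, 0, 0, 0, 0, 0, 0, 0, 1]→9.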
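The mapping between class labels and output vectors listed above can be sketched in a few lines. The example is written in Python rather than Matlab, and the helper names `one_hot` and `decode` are hypothetical. Reading the predicted class from the largest output is a common convention assumed here, not something the text prescribes.

```python
def one_hot(label, num_classes):
    """Return the target vector for a 1-based class label, e.g. 3 -> [0, 0, 1, 0, ...]."""
    if not 1 <= label <= num_classes:
        raise ValueError("label out of range")
    return [1 if k == label else 0 for k in range(1, num_classes + 1)]

def decode(outputs):
    """Map an output vector back to a 1-based label by picking the largest entry."""
    return max(range(len(outputs)), key=lambda k: outputs[k]) + 1

print(one_hot(3, 9))
print(decode([0.10, 0.05, 0.80, 0.02, 0.00, 0.00, 0.01, 0.01, 0.01]))
```

With real network outputs the entries are values between 0 and 1 rather than exact zeros and ones, which is why `decode` compares magnitudes instead of testing for equality with 1.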
