（10-2-01）TensorFlow深度信念网络实战：深度信念网络应用实战：程序缺陷预测(1)

最新推荐文章于 2024-07-08 21:47:14 发布

码农三叔

最新推荐文章于 2024-07-08 21:47:14 发布

阅读量756

点赞数 23

分类专栏：《零基础学习TensorFlow》文章标签： tensorflow 网络人工智能 python

本文链接：https://blog.csdn.net/asd343442/article/details/139969476

版权

《零基础学习TensorFlow》专栏收录该内容

65 篇文章 2 订阅

订阅专栏

10.2 深度信念网络应用实战：程序缺陷预测

经过本章前面内容的学习，已经基本了解了深度信念网络的基础知识。在本节的内容中，将通过具体实例的实现过程，详细讲解在在TensorFlow中使用深度信念网络的方法。本实例的功能是，使用 DBN 模型分别进行ANN、LogisticRegression、GaussianNB 和 RandomForestClassifier 的训练和集成，用于在有缺陷的代码和干净的代码之间进行分类。本项目程程序保存在ipynb文件中，将在谷歌的Colaboratory中调试运行。

10.2.1 使用DBN模型进行ANN处理

ANN是人工神经网络（Artificial Neural Network）的简称，是指由大量的处理单元(神经元) 互相连接而形成的复杂网络结构，是对人脑组织结构和运行机制的某种抽象、简化和模拟。以数学模型模拟神经元活动，是基于模仿大脑神经网络结构和功能而建立的一种信息处理系统。编写文件JustInTime_SW_Pred_v1.ipynb使用DBN模型进行ANN处理，具体实现流程如下：

（1）导入谷歌的库colab，查看驱动器的位置，代码如下：

from google.colab import drive
drive.mount('/content/drive/')

执行后会输出：

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).

（2）导入需要用到的库，代码如下：

import tensorflow as tf
import sklearn.metrics
from sklearn.metrics import log_loss, accuracy_score
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

（3）导入CSV文件，并获取文件中的数据数目，代码如下：

df= pd.read_csv('/content/drive/My Drive/dataset/bugzilla.csv')
def normalize(x):
  x = x.astype(float)
  min = np.min(x)
  max = np.max(x)
  return (x - min)/(max-min)

def view_values(X, y, example):
    label = y.loc[example]
    image = X.loc[example,:].values.reshape([-1,1])
    print(image)

print("Shape of dataframe: ", df.shape)   #train

执行后会输出：

Shape of dataframe:  (4620, 17)

（4）查看前5行数据中的信息，代码如下：

df.head()

执行后会输出：

class RBM(object):
    
    def __init__(self, input_size, output_size, 
                 learning_rate, epochs, batchsize):
        #定义超参数
        self._input_size = input_size
        self._output_size = output_size
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batchsize = batchsize
        
        #使用零矩阵初始化权重和偏差
        self.w = np.zeros([input_size, output_size], dtype=np.float32)
        self.hb = np.zeros([output_size], dtype=np.float32)
        self.vb = np.zeros([input_size], dtype=np.float32)

（5）创建受限玻尔兹曼机类RBM，然后定义需要的超参数，代码如下：

    #向前传递
    def prob_h_given_v(self, visible, w, hb):
        return tf.nn.sigmoid(tf.matmul(visible, w) + hb)
    #向后传递
    def prob_v_given_h(self, hidden, w, vb):
        return tf.nn.sigmoid(tf.matmul(hidden, tf.transpose(w)) + vb)
    #采样函数
    def sample_prob(self, probs):
        return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

（6）分别创建向前传递函数、向后传递函数和采样函数，其中h为隐藏层，v为可见层。代码如下：

#向前传递

def prob_h_given_v(self, visible, w, hb):

return tf.nn.sigmoid(tf.matmul(visible, w) + hb)

#向后传递

def prob_v_given_h(self, hidden, w, vb):

return tf.nn.sigmoid(tf.matmul(hidden, tf.transpose(w)) + vb)

#采样函数

def sample_prob(self, probs):

return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

（7）创建训练函数train()，为了在训练时更新权重，我们设置执行收缩发散功能，并将误差定义为MSE（均方误差）。代码如下：

    def train(self, X):
        _w = tf.placeholder(tf.float32, [self._input_size, self._output_size])
        _hb = tf.placeholder(tf.float32, [self._output_size])
        _vb = tf.placeholder(tf.float32, [self._input_size])
        
        prv_w = np.zeros([self._input_size, self._output_size], dtype=np.float32)
        prv_hb = np.zeros([self._output_size], dtype=np.float32)
        prv_vb = np.zeros([self._input_size], dtype=np.float32)
        
        cur_w = np.zeros([self._input_size, self._output_size], dtype=np.float32)
        cur_hb = np.zeros([self._output_size], dtype=np.float32)
        cur_vb = np.zeros([self._input_size], dtype=np.float32)
        v0 = tf.placeholder(tf.float32, [None, self._input_size])
        h0 = self.sample_prob(self.prob_h_given_v(v0, _w, _hb))
        v1 = self.sample_prob(self.prob_v_given_h(h0, _w, _vb))
        h1 = self.prob_h_given_v(v1, _w, _hb)
      
        positive_grad = tf.matmul(tf.transpose(v0), h0)
        negative_grad = tf.matmul(tf.transpose(v1), h1)
        
        update_w = _w + self.learning_rate * (positive_grad - negative_grad) / tf.to_float(tf.shape(v0)[0])
        update_vb = _vb +  self.learning_rate * tf.reduce_mean(v0 - v1, 0)
        update_hb = _hb +  self.learning_rate * tf.reduce_mean(h0 - h1, 0)
        #还将误差定义为MSE
        err = tf.reduce_mean(tf.square(v0 - v1))
        
        error_list = []

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            
            for epoch in range(self.epochs):
                for start, end in zip(range(0, len(X),  self.batchsize),range(self.batchsize,len(X), self.batchsize)):
                    batch = X[start:end]
                    cur_w = sess.run(update_w, feed_dict={v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    cur_hb = sess.run(update_hb, feed_dict={v0: batch,  _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    cur_vb = sess.run(update_vb, feed_dict={v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    prv_w = cur_w
                    prv_hb = cur_hb
                    prv_vb = cur_vb
                error = sess.run(err, feed_dict={v0: X, _w: cur_w, _vb: cur_vb, _hb: cur_hb})
                print ('Epoch: %d' % epoch,'reconstruction error: %f' % error)
                error_list.append(error)
            self.w = prv_w
            self.hb = prv_hb
            self.vb = prv_vb
            return error_list

（8）编写函数rbm_output()，功能是从RBM已学习的生成模型生成新特征。代码如下：

    def rbm_output(self, X):
        
        input_X = tf.constant(X)
        _w = tf.constant(self.w)
        _hb = tf.constant(self.hb)
        _vb = tf.constant(self.vb)
        out = tf.nn.sigmoid(tf.matmul(input_X, _w) + _hb)
        hiddenGen = self.sample_prob(self.prob_h_given_v(input_X, _w, _hb))
        visibleGen = self.sample_prob(self.prob_v_given_h(hiddenGen, _w, _vb))
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            return sess.run(out), sess.run(visibleGen), sess.run(hiddenGen)

（9）开始进行训练，为了实现更平衡的数据帧，使用函数drop()删除不必要的字符串列。代码如下：

deleted = 0
for i in range(0, df.shape[0]-1):
  if deleted == 1230:
    break
  elif df.iloc[i].bug == 0:
    df = df.drop(df.index[i])
    deleted = deleted + 1

#删除不必要的字符串列
df = df.drop(['commitdate','transactionid'], axis=1)

#分割df
train_X = df.iloc[:,:-1].apply(func=normalize, axis=0)
train_Y = df.iloc[:,-1]

# df=df.drop(['transactionid'], axis=1)
print(df.head())

执行后会输出：

   ns  nm  nf   entropy        la  ...       npt  exp    rexp  sexp  bug
0   1   1   3  0.579380  0.093620  ...  0.666667  143  133.50   129    1
1   1   1   1  0.000000  0.000000  ...  1.000000  140  140.00   137    1
3   1   1   8  0.685328  0.016039  ...  1.000000  579  479.25   550    0
5   1   1  16  0.760777  0.018308  ...  0.750000  595  495.25   566    0
7   2   2  33  0.816160  0.095682  ...  0.727273  482  382.25   474    0

[5 rows x 15 columns]

（10）在训练时设置限制玻尔兹曼机的参数，代码如下：

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

outputList = []
error_list = []

#遍历每个RBM输入输出列表
for i in range(0, len(rbm_list)):
    print('RBM', i+1)
    #训练新的RBM
    rbm = rbm_list[i]
    err = rbm.train(inputX)
    error_list.append(err)

    #返回输出层
    #sess.run(out), sess.run(visibleGen), sess.run(hiddenGen)
    outputX, reconstructedX, hiddenX = rbm.rbm_output(inputX)
    outputList.append(outputX)
    inputX= hiddenX

执行后会输出：

deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
Epoch: 0 reconstruction error: 0.113313
Epoch: 1 reconstruction error: 0.117359
Epoch: 2 reconstruction error: 0.114953
Epoch: 3 reconstruction error: 0.111951
Epoch: 4 reconstruction error: 0.110032
Epoch: 5 reconstruction error: 0.111399
Epoch: 6 reconstruction error: 0.111426
Epoch: 7 reconstruction error: 0.108291
Epoch: 8 reconstruction error: 0.109852
Epoch: 9 reconstruction error: 0.107341
Epoch: 10 reconstruction error: 0.103858
Epoch: 11 reconstruction error: 0.107152
Epoch: 12 reconstruction error: 0.108607
Epoch: 13 reconstruction error: 0.102160
Epoch: 14 reconstruction error: 0.106119
Epoch: 15 reconstruction error: 0.107382
Epoch: 16 reconstruction error: 0.108130
......
Epoch: 145 reconstruction error: 0.125590
Epoch: 146 reconstruction error: 0.122321
Epoch: 147 reconstruction error: 0.125713
Epoch: 148 reconstruction error: 0.127262
Epoch: 149 reconstruction error: 0.123992

（12）使用for循环遍历上面的每个RBM输入输出列表，并循环绘制重建误差和RBM的可视化图，代码如下：

i = 1
for err in error_list:
    print("RBM",i)
    pd.Series(err).plot(logy=False)
    plt.xlabel("Epoch")
    plt.ylabel("Reconstruction Error")
    plt.show()
    i += 1

执行效果如图10-6所示。

图10-6 可视化图

码农三叔

关注

23
点赞
踩
12

收藏

觉得还不错? 一键收藏
打赏
0
评论
（10-2-01）TensorFlow深度信念网络实战：深度信念网络应用实战：程序缺陷预测(1)

经过本章前面内容的学习，已经基本了解了深度信念网络的基础知识。在本节的内容中，将通过具体实例的实现过程，详细讲解在在TensorFlow中使用深度信念网络的方法。本实例的功能是，使用 DBN 模型分别进行ANN、LogisticRegression、GaussianNB 和 RandomForestClassifier 的训练和集成，用于在有缺陷的代码和干净的代码之间进行分类。本项目程程序保存在ipynb文件中，将在谷歌的Colaboratory中调试运行。
复制链接

扫一扫