October 26: AlexNet Study Notes


Pierce_KK

Category: CNN Basics and AlexNet

Article link: https://blog.csdn.net/Pierce_KK/article/details/83421850


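Implementing the AlexNet network

An introduction to AlexNet: https://www.cnblogs.com/gongxijun/p/6027747.html

It includes some calculations of layer dimensions that I still have not fully worked out.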
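As a starting point for the dimension calculations, here is a minimal sketch of the standard output-size formula for a convolution or pooling window. The helper name conv_output_size is my own, and the layer settings (11x11 kernel with stride 4, then 3x3 max pooling with stride 2) are the usual first AlexNet stage, assuming the 227x227 input used later in these notes.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution or pooling window (floor division)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# First AlexNet stage, assuming a 227x227 input.
conv1 = conv_output_size(227, kernel_size=11, stride=4)   # 11x11 kernel, stride 4
pool1 = conv_output_size(conv1, kernel_size=3, stride=2)  # 3x3 max pool, stride 2
print(conv1, pool1)  # 55 27
```

The same function can be applied layer by layer to follow the rest of the network.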

The following articles go into more detail and are worth reading for a deeper understanding:

https://www.cnblogs.com/alexanderkun/p/6917984.html

https://www.cnblogs.com/alexanderkun/p/6917985.html

https://www.cnblogs.com/alexanderkun/p/6918045.html

A hands-on implementation framework: https://blog.csdn.net/MargretWG/article/details/70491745?locationNum=1&fps=1

It is based on this article: https://kratzert.github.io/2017/02/24/finetuning-alexnet-with-tensorflow.html

Another article on the topic: https://blog.csdn.net/btbujhj/article/details/73302970

It is also quite detailed and includes some worked algorithm examples.

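1. Using __init__

(Adapted from Zhihu)

If a class defines an __init__ method, that method is called automatically whenever an instance of the class is created. It is normally used to initialize the instance's attributes. For example:

class testClass:
    # self is the instance being created (testman below). The name is only a
    # convention; with me instead, self.Name would be written me.Name.
    def __init__(self, name, gender):
        # Left side: instance attribute. Right side: __init__ parameter. They are
        # different things, hence the capital letter (usually self.name = name).
        self.Name = name
        self.Gender = gender  # usually self.gender = gender
        print('hello')  # shows that __init__ runs as soon as the instance is created

testman = testClass('neo', 'male')

This creates testman, an instance of testClass. Because the class defines __init__, creating an instance requires arguments that match its parameters. self is the new instance itself and is not passed explicitly, so two arguments are supplied here. Once this statement runs, the attributes Name and Gender of testman are initialized to 'neo' and 'male'.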

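1.5 Using lambda functions

Reading the code should make the idea clear.

# -*- coding: utf-8 -*-
# Anonymous functions with lambda
def add(x, y):  # named add so the built-in sum is not shadowed
    return x + y
print('common use', add(3, 5))

p = lambda x, y: x + y
# A lambda has no return statement; the value of its expression is the return value
print('lambda uses', p(7, 8))

Output:

common use 8
lambda uses 15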

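2. dropout & dropout rate

Dropout was proposed by Hinton and colleagues in 2012 and can be used as a trick to prevent overfitting. The abstract of Hinton's paper states that randomly ignoring half of the feature detectors for each training case (setting the values of half of the hidden nodes to 0) greatly reduces overfitting. This weakens the interactions between feature detectors, where an interaction means that some detectors can only do their job when certain other detectors are present.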

https://www.jianshu.com/p/b5e93fa01385

https://blog.csdn.net/stdcoutzyx/article/details/49022443

https://www.cnblogs.com/zyber/p/6824980.html
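To make the idea concrete, here is a minimal NumPy sketch of "inverted" dropout, the variant that rescales the surviving units at training time. The function name and the rng argument are my own, not taken from the articles above. keep_prob plays the same role as the keep_prob placeholder used later in these notes, and the dropout rate is 1 - keep_prob.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(activations.shape) < keep_prob   # True where the unit survives
    return activations * mask / keep_prob              # keeps the expected value unchanged

rng = np.random.default_rng(0)
hidden = np.ones((4, 8), dtype=np.float32)

dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(dropped)   # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

With keep_prob=1 nothing is dropped, which is why inference is normally run with keep_prob set to 1.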

3. input_channels = int(x.get_shape()[-1])

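The get_shape() function

This method returns the dimensions of a tensor, that is, the size along each axis. For a 2-D matrix it gives the number of rows and columns.

import tensorflow as tf

with tf.Session() as sess:
    A = tf.random_normal(shape=[3, 4])

    print(A.get_shape())  # method is called: prints the shape
    print(A.get_shape)    # no parentheses: prints the bound method itself

Output:
(3, 4)
<bound method Tensor.get_shape of <tf.Tensor 'random_normal:0' shape=(3, 4) dtype=float32>>

Note: the first output is the shape itself (a TensorShape, which prints like a tuple). The second is not a shape at all: without the parentheses, Python prints the bound method object, whose representation happens to include the tensor. To read the size of a single dimension, index the result:

A.get_shape()[0]

This is the first dimension. In TensorFlow 1.x the indexed value is a Dimension object, which is why the expression in the heading wraps it in int().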

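4. lambda functions

(That is, anonymous functions.)

Python has the anonymous function lambda. As the name suggests, an anonymous function is a function or subroutine that does not need to be bound to an identifier (a function name).

# -*- coding: UTF-8 -*-
f = lambda x, y, z: x + y + z

print(f(1, 2, 3))
print(f(4, 5, 6))

Output:
6
15

Points to note when using lambda:

lambda defines a single-line function; for anything more complex, define a regular function with def
a lambda's parameter list can contain several parameters, e.g. lambda x, y: x + y
a lambda body cannot contain statements, and it is limited to a single expression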
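The points above can be illustrated with a short sketch. The (class name, probability) pairs are invented for the example. A lambda is convenient as a one-expression key function, while anything that needs statements goes into a def.

```python
# Hypothetical (class_name, probability) pairs, invented for illustration.
predictions = [("llama", 0.12), ("zebra", 0.85), ("sea lion", 0.03)]

# A lambda works well as a short key function: sort by probability, highest first.
ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
print(ranked[0])  # ('zebra', 0.85)

# Anything that needs statements (assignments, several steps) belongs in a def.
def describe(pair):
    name, prob = pair
    return "{}: {:.0%}".format(name, prob)

print([describe(p) for p in ranked])
```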

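5. Using with tf.variable_scope(name) as scope:

tf.variable_scope() is mainly used together with tf.get_variable() to share variables. Note that the signature quoted below is that of the related tf.name_scope.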

'''
Signature: tf.name_scope(*args, **kwds)
Docstring:
Returns a context manager for use when defining a Python op.
'''

These two blog posts explain it in great detail:

https://www.cnblogs.com/adong7639/p/8136273.html

tf.get_variable and tf.variable_scope
https://www.aliyun.com/jiaocheng/519743.html

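6. weights = tf.get_variable

This is a function that creates a variable:

tf.get_variable(name, shape, initializer): name is the variable's name, shape is its dimensions, and initializer is how it is initialized.

With tf.Variable, a naming conflict is resolved automatically by the system, which gives the new variable a unique name. With tf.get_variable(), a conflict is not resolved: an error is raised unless the enclosing variable scope has reuse enabled.

Given these properties, use tf.get_variable() whenever variables need to be shared. In other cases the two can be used in much the same way.

TensorFlow has two ops for creating variables, tf.Variable() and tf.get_variable(). The article below describes the differences between them: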

https://blog.csdn.net/u012436149/article/details/53696970
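The naming behavior described above can be mimicked with a toy registry in plain Python. This is only a sketch of the semantics as I understand them, not TensorFlow code: the class ToyVariableStore and its methods are invented for illustration, and real variable scopes do considerably more.

```python
class ToyVariableStore:
    """Toy stand-in for TensorFlow 1.x variable naming. Not TensorFlow code."""

    def __init__(self):
        self.variables = {}

    def variable(self, name, value):
        # Like tf.Variable: always creates, renaming on conflict (w, w_1, w_2, ...).
        unique, suffix = name, 0
        while unique in self.variables:
            suffix += 1
            unique = "{}_{}".format(name, suffix)
        self.variables[unique] = value
        return unique

    def get_variable(self, name, value=None, reuse=False):
        # Like tf.get_variable: reuse=True returns the existing entry,
        # reuse=False refuses to create a second variable with the same name.
        if reuse:
            if name not in self.variables:
                raise ValueError("Variable {} does not exist".format(name))
            return name
        if name in self.variables:
            raise ValueError("Variable {} already exists".format(name))
        self.variables[name] = value
        return name

store = ToyVariableStore()
print(store.variable("w", 1.0), store.variable("w", 2.0))   # w w_1
print(store.get_variable("fc8/weights", 0.0))               # fc8/weights
print(store.get_variable("fc8/weights", reuse=True))        # same entry, shared
try:
    store.get_variable("fc8/weights", 0.0)                   # conflict without reuse
except ValueError as err:
    print("error:", err)
```

The point of the sketch is that sharing only works when lookups go through one name-keyed store, which is the role tf.get_variable() plays.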

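7. tf.split & tf.concat

tf.split( value, num_or_size_splits, axis=0, num=None, name='split' )

This function splits a tensor. It takes the tensor to split plus the split parameters and returns the resulting pieces.
value is the tensor to be split.

Take a 3-D tensor as an example: a 20 * 30 * 40 tensor my_tensor is like a cake 20 cm long, 30 cm wide, and 40 cm high, where every cubic centimeter is one element.

There are two ways to split:
1. If num_or_size_splits is an integer, it is the number of smaller tensors the tensor is cut into, and axis selects which dimension is cut (counting from 0). Calling tf.split(my_tensor, 2, 0) returns two 10 * 30 * 40 tensors.
2. If num_or_size_splits is a vector, the tensor is cut into as many pieces as the vector has entries, again along the dimension given by axis. For example, tf.split(my_tensor, [10, 5, 25], 2) returns three tensors of sizes 20 * 30 * 10, 20 * 30 * 5, and 20 * 30 * 25. The entries of the vector must sum to the size of the original tensor along axis (10 + 5 + 25 = 40).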
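The two splitting modes can be tried without TensorFlow using NumPy's np.split as a rough analogue. One difference to keep in mind: np.split takes cut positions rather than piece sizes, so the sizes are converted with a cumulative sum.

```python
import numpy as np

my_tensor = np.zeros((20, 30, 40))

# Integer argument: two equal pieces along axis 0.
halves = np.split(my_tensor, 2, axis=0)
print([h.shape for h in halves])   # [(10, 30, 40), (10, 30, 40)]

# np.split takes cut positions rather than piece sizes, so the sizes
# [10, 5, 25] are converted to the positions [10, 15] with a cumulative sum.
sizes = [10, 5, 25]
pieces = np.split(my_tensor, np.cumsum(sizes)[:-1], axis=2)
print([p.shape for p in pieces])   # [(20, 30, 10), (20, 30, 5), (20, 30, 25)]
```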

tf.concat( ) and tf.stack( )

https://www.cnblogs.com/mdumpling/p/8053474.html
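The difference between the two is easy to see from output shapes. The sketch below uses the NumPy counterparts np.concatenate and np.stack as an analogy, on arrays invented for the example: concatenation extends an existing axis, while stacking adds a new one.

```python
import numpy as np

a = np.zeros((20, 30, 10))
b = np.ones((20, 30, 10))

# Concatenation joins along an existing axis, so that axis grows.
joined = np.concatenate([a, b], axis=2)
print(joined.shape)    # (20, 30, 20)

# Stacking creates a new axis, so the rank goes up by one.
stacked = np.stack([a, b], axis=0)
print(stacked.shape)   # (2, 20, 30, 10)
```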

8. tf.nn.xw_plus_b

tf.nn.xw_plus_b(x, weights, biases)

is equivalent to tf.matmul(x, weights) + biases

# -*- coding: utf8 -*-
import tensorflow as tf
x = [[1, 2, 3], [4, 5, 6]]
w = [[7, 8], [9, 10], [11, 12]]
b = [[3, 3], [3, 3]]
result1 = tf.nn.xw_plus_b(x, w, [3, 3])  # 1-D bias, broadcast over the rows
result2 = tf.matmul(x, w) + b            # the same bias written out as a 2x2 matrix

# Only constants are involved, so no variable initializer is needed
with tf.Session() as sess:
    print(sess.run(result1))
    print(sess.run(result2))

Result:

[[ 61  67]
 [142 157]]
[[ 61  67]
 [142 157]]
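The same arithmetic can be checked with plain NumPy, as a sketch that does not need a TensorFlow session. The 1-D bias is broadcast across both rows of the matrix product.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
w = np.array([[7, 8], [9, 10], [11, 12]])
b = np.array([3, 3])

# x has shape (2, 3), w has shape (3, 2); the 1-D bias is broadcast over both rows.
result = x @ w + b
print(result)
# [[ 61  67]
#  [142 157]]
```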

Some functions used in the debugging part

Part 1

import os
import cv2
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt


#  mean of imagenet dataset in BGR
imagenet_mean = np.array([104., 117., 124.], dtype=np.float32)

#  add path of testImages
current_dir = os.getcwd()
image_dir = os.path.join(current_dir, 'images')

# This code locates our dataset

# Note that OpenCV reads images in BGR order, not the more familiar RGB, so they must be converted before display

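os.getcwd()

In Python, os.getcwd() returns the current path.

Its prototype is:

os.getcwd()

The function takes no arguments and returns the current working directory. Note that this is not the directory that contains the script but the directory from which the script is run.

os.path.join()

os.path.join(os.getcwd(), 'data') takes the current directory and joins 'data' onto it to form a new path.

The following is excerpted from https://www.cnblogs.com/donfaquir/p/9042673.html

In practice, I used the following code: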

import os
path = "F:/gts/gtsdate/"
b = os.path.join(path,"/abc")

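The output is:

'F:/abc'

rather than what I expected:

"F:/gts/gtsdate/abc"

The reason is that the second argument to os.path.join(), "/abc", starts with /. A component that starts with a separator is treated as rooted, so the directories before it are discarded (on Windows the drive letter is kept).
Removing that character fixes it:

b = os.path.join(path, "abc")

An introduction to commonly used os.path methods: http://www.cnblogs.com/wuxie1989/p/5623435.html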
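The behavior can be reproduced on any platform with the standard-library modules ntpath and posixpath, which implement the Windows and POSIX flavors of os.path. A minimal sketch using the path from the excerpt above:

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules

path = "F:/gts/gtsdate/"

# A component that starts with a separator is rooted: earlier directories are dropped.
print(ntpath.join(path, "/abc"))                 # F:/abc
print(ntpath.join(path, "abc"))                  # F:/gts/gtsdate/abc

# Under POSIX rules there is no drive letter, so a rooted component replaces everything.
print(posixpath.join("/gts/gtsdate/", "/abc"))   # /abc
print(posixpath.join("/gts/gtsdate/", "abc"))    # /gts/gtsdate/abc
```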

Part 2

#get list of all images
img_files = [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.jpeg')]

#load all images
imgs = []
for f in img_files:
    imgs.append(cv2.imread(f))
    
#plot images
fig = plt.figure(figsize=(15,6))
for i, img in enumerate(imgs):
    fig.add_subplot(1,3,i+1)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.axis('off')

# At this point the images in our dataset have been read in

img_files = [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.jpeg')]

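Get familiar with this idiom.

First, the path is built with os.path.join(), which appends f to image_dir. The resulting paths make up img_files.

The values of f come from a for loop combined with an if test.

f is each file in the image_dir folder whose name ends in .jpeg, that is, the image files. (Note: in my experience a file that was simply renamed to end in .jpeg was not picked up; check the file's properties to confirm its actual name and type.)

os.listdir() returns a list of the names of the files and folders in the given directory.

The list is in arbitrary order (wrap it in sorted() if a fixed order is needed). It does not include the special entries '.' and '..' even though they are present in the directory. The syntax of listdir() is:

os.listdir(path)  (path: the directory to list)

It returns the list of files and folders under the given path.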
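The list-comprehension idiom can be tried in isolation with a throwaway directory. In this sketch the file names are invented, sorted() is added to get a repeatable order, and lower() is my own addition so that .JPEG is matched as well.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as image_dir:
    # Hypothetical file names, created empty just to exercise the filter.
    for name in ["zebra.jpeg", "llama.JPEG", "notes.txt", "sealion.jpeg"]:
        open(os.path.join(image_dir, name), "w").close()

    # sorted() fixes the order; lower() makes the extension test case-insensitive.
    img_files = [os.path.join(image_dir, f)
                 for f in sorted(os.listdir(image_dir))
                 if f.lower().endswith(".jpeg")]

    names = [os.path.basename(p) for p in img_files]
    print(names)  # ['llama.JPEG', 'sealion.jpeg', 'zebra.jpeg']
```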

for i, img in enumerate(imgs):

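i holds the index of the image and img holds the image data.

enumerate() turns an iterable (such as a list, tuple, or string) into a sequence of (index, item) pairs, yielding each item together with its position. It is typically used in a for loop.

seq = ['one', 'two', 'three']
for i, element in enumerate(seq):
    print(i, element)

Output:
0 one
1 two
2 three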

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

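Most color images we see in everyday life are RGB, but image processing often needs other color representations such as grayscale, binary, HSV, or HSI. OpenCV provides the cvtColor() function for color-space conversions. Here is the definition of cvtColor: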

cvtColor(InputArray src, OutputArray dst, int code, int dstCn=0 );

For details, see https://blog.csdn.net/keith_bb/article/details/53470170
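For the specific BGR to RGB case used in these notes, the conversion amounts to reversing the channel axis. The NumPy sketch below shows this on a tiny made-up image. It illustrates the idea only and is not how OpenCV implements cvtColor.

```python
import numpy as np

# A hypothetical 2x2 image in BGR order whose pixels are all pure blue.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255   # channel 0 is blue in BGR

# Reversing the last axis turns B, G, R into R, G, B.
rgb = bgr[..., ::-1]
print(bgr[0, 0], rgb[0, 0])   # [255   0   0] [  0   0 255]
```

Without this step matplotlib would show the blue pixels as red, since plt.imshow expects RGB.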

Part 3

from alexnet import AlexNet
from caffe_classes import class_names

#placeholder for input and dropout rate
x = tf.placeholder(tf.float32, [1, 227, 227, 3])
keep_prob = tf.placeholder(tf.float32)


#create model with default config ( == no skip_layer and 1000 units in the last layer)
#  Ha, the most crucial step is written so casually
model = AlexNet(x, keep_prob, 1000, [])


#define activation of last layer as score
score = model.fc8


#create op to calculate softmax 
softmax = tf.nn.softmax(score)

def __init__(self, x, keep_prob, num_classes, skip_layer, weights_path = 'DEFAULT'):

model = AlexNet ( x, keep_prob, 1000, [ ] )

Let's read this model against the __init__ signature:

self.X = x
self.KEEP_PROB = keep_prob
self.NUM_CLASSES = 1000
self.SKIP_LAYER = [] (empty)
weights_path == 'DEFAULT'

score = model.fc8 ### use the activation of the last layer as the score

Again, compare the definition with the call:

def fc ( x, num_in, num_out, name, relu = True ) :

self.fc8 = fc ( dropout7, 4096, self.NUM_CLASSES, relu = False, name='fc8' )
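Because fc8 is created with relu = False, its output is a vector of raw scores, and the softmax op turns those scores into probabilities. Here is a minimal NumPy sketch of that step. The softmax function is my own and the score values are made up, with four classes instead of 1000.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical raw scores for one image and four classes (AlexNet has 1000).
score = np.array([[2.0, 1.0, 0.1, -1.0]], dtype=np.float32)
probs = softmax(score)

best = np.argmax(probs)          # index of the most probable class
print(best, probs[0, best])      # class index and its probability
print(probs.sum())               # the probabilities sum to 1 (up to rounding)
```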

Part 4

with tf.Session() as sess:
    
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    
    # Load the pretrained weights into the model
    model.load_initial_weights(sess)
    
    # Create figure handle
    fig2 = plt.figure(figsize=(15,6))
    
    # Loop over all images
    for i, image in enumerate(imgs):
        
        # Convert image to float32 and resize to (227x227)
        img = cv2.resize(image.astype(np.float32), (227,227))
        
        # Subtract the ImageNet mean
        img -= imagenet_mean
        
        # Reshape as needed to feed into model
        img = img.reshape((1,227,227,3))
        
        # Run the session and calculate the class probability
        probs = sess.run(softmax, feed_dict={x: img, keep_prob: 1})
        
        # Get the class name of the class with the highest probability
        class_name = class_names[np.argmax(probs)]
        
        # Plot image with class name and prob in the title
        fig2.add_subplot(1,3,i+1)
        plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        plt.title("Class: " + class_name + ", probability: %.4f" %probs[0,np.argmax(probs)])
        plt.axis('off')
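The preprocessing inside the loop above can be sketched with NumPy alone. A random array stands in for an image that has already been resized to 227x227, which is an assumption made so that the example does not need OpenCV. The mean values are the BGR means defined in Part 1.

```python
import numpy as np

# ImageNet channel means in BGR order, as defined in Part 1.
imagenet_mean = np.array([104., 117., 124.], dtype=np.float32)

# Stand-in for an image already resized to 227x227 (random pixels, BGR, uint8).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(227, 227, 3), dtype=np.uint8)

img = image.astype(np.float32)          # convert first so the result can go negative
img -= imagenet_mean                    # shape (3,) is broadcast over every pixel
batch = img.reshape((1, 227, 227, 3))   # add the leading batch axis

print(batch.shape, batch.dtype)         # (1, 227, 227, 3) float32
```

The reshape is needed because the placeholder x expects a batch of one image.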

model.load_initial_weights(sess)

This calls the load_initial_weights method of the model instance. Its prototype is:

def load_initial_weights(self, session):

************************

# Convert image to float32 and resize to (227x227)
img = cv2.resize(image.astype(np.float32), (227,227))

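What astype() does:

When we read data from a text file into a NumPy array, the default dtype is often float64.

Sometimes we want some columns as integers. Assigning the dtype attribute directly (for example b.dtype = 'int32') goes wrong: the existing bytes are reinterpreted rather than converted, so the values become meaningless and, with a 4-byte integer type, the array length doubles.

The fix is astype: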

>>> b = np.array([1.23,12.201,123.1])
>>>
>>> b
array([   1.23 ,   12.201,  123.1  ])
>>> b.dtype
dtype('float64')
>>> c = b.astype(int)
>>> c
array([  1,  12, 123])
>>> c.dtype
dtype('int32')
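The difference between reinterpreting bytes and converting values can be seen side by side. In this sketch view() stands in for assigning to the dtype attribute, since both reinterpret the same memory, and np.int32 is chosen explicitly so the result does not depend on the platform's default integer size.

```python
import numpy as np

b = np.array([1.23, 12.201, 123.1])        # float64, 8 bytes per element

# view() reinterprets the same bytes, much like assigning to b.dtype would.
reinterpreted = b.view(np.int32)            # 4-byte integers over 24 bytes
print(len(reinterpreted))                   # 6, and the values are meaningless

# astype() converts each value and returns a new array (truncating toward zero).
converted = b.astype(np.int32)
print(converted, converted.dtype)           # [  1  12 123] int32
```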