http://blog.csdn.net/gavin__zhou/article/details/52142696
How FCN works
I covered the theory in my previous post; see the FCN theory post for reference.
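Code
FCN has an official implementation; see the official FCN code.
However, that is not the code I used. I used a third-party port of the official code, implemented with the Chainer framework (source: the Chainer framework repository). If you have used Keras, Chainer should not feel particularly unfamiliar. Chainer: a neural network framework
The code I used is the Chainer implementation of FCN, available at https://github.com/wkentaro/fcn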
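Installation
Installation is simple: either pip or a source install works. However, after installing several times on my machine, I found that with pip the variable fcn.data_dir ends up pointing to the dist-packages directory of the system Python, and that directory requires root permission. For this reason I do not recommend installing directly with pip. The problem is described in: the fcn.data_dir issue
In the end I installed from source. I recommend creating a virtual environment with virtualenv; in practice this was the least error-prone approach.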
Clone the code
git clone https://github.com/wkentaro/fcn.git --recursive
Install with virtualenv
sudo pip install virtualenv  # install virtualenv
Create the virtual environment directory
virtualenv test-fcn
cd test-fcn
Activate the virtual environment
source ./bin/activate
Clone the fcn code
git clone https://github.com/wkentaro/fcn.git --recursive
cd fcn
Install fcn
python setup.py develop
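To confirm that the source install avoided the permission problem described above, it can help to check whether the directory that fcn.data_dir points to is writable by the current user. The following is only a minimal sketch: `describe_data_dir` is a hypothetical helper name, not part of fcn, and the only fcn attribute it touches is the `fcn.data_dir` variable mentioned above.

```python
import os


def describe_data_dir(path):
    """Report whether `path` exists and whether the current user can write to it."""
    exists = os.path.isdir(path)
    writable = exists and os.access(path, os.W_OK)
    return {"path": os.path.abspath(path), "exists": exists, "writable": writable}


if __name__ == "__main__":
    try:
        import fcn  # only importable after the install steps above
    except ImportError:
        print("fcn is not importable in this environment")
    else:
        # fcn.data_dir is the variable discussed in the installation notes
        print(describe_data_dir(fcn.data_dir))
```

Run it inside the activated virtual environment; if `writable` is False, the data directory is probably still under a root-owned location.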
Demo
Download the VOC2012 dataset and put it under fcn/data/pascal/VOC2012
1. Convert the Caffe model to a Chainer model
./scripts/caffe_to_chainermodel.py
2. Load the model and run segmentation
./scripts/fcn_forward.py --img-files data/pascal/VOC2012/JPEGImages/2007_000129.jpg
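Training on your own data
It took me nearly a month of trial and error to get training working, so I am writing the details down here for reference.
Preparing your own dataset
Lay the dataset out like SegmentationClass in VOC2012. The figure below is an example: the upper image is the original and the lower one is the segmentation map.
Note that every label must be drawn in one specific color. We got this wrong many times and only found the cause by tracing through the code. The RGB value for each class index is: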
Index | RGB value |
---|---|
0 | (0,0,0) |
1 | (128,0,0) |
2 | (0,128,0) |
3 | (128,128,0) |
4 | (0,0,128) |
5 | (128,0,128) |
6 | (0,128,128) |
7 | (128,128,128) |
8 | (64,0,0) |
9 | (192,0,0) |
10 | (64,128,0) |
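Only the first 10 classes (plus the background at index 0) are listed here; the values for further classes follow from this code: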
<pre><code class="language-python">import numpy as np


def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)


def labelcolormap(N=256):
    cmap = np.zeros((N, 3))  # N is the number of classes
    for i in range(0, N):
        id = i
        r, g, b = 0, 0, 0
        for j in range(0, 8):
            r = np.bitwise_or(r, (bitget(id, 0) << 7-j))
            g = np.bitwise_or(g, (bitget(id, 1) << 7-j))
            b = np.bitwise_or(b, (bitget(id, 2) << 7-j))
            id = (id >> 3)
        cmap[i, 0] = r
        cmap[i, 1] = g
        cmap[i, 2] = b
    cmap = cmap.astype(np.float32) / 255  # scale the RGB values to [0, 1]
    return cmap


# Method of the dataset class (excerpt); it relies on self.target_names
def _label_rgb_to_32sc1(self, label_rgb):
    assert label_rgb.dtype == np.uint8
    label = np.zeros(label_rgb.shape[:2], dtype=np.int32)
    label.fill(-1)  # pixels whose color is not in the table stay -1
    cmap = fcn.util.labelcolormap(len(self.target_names))
    cmap = (cmap * 255).astype(np.uint8)  # convert back to integer values
    for l, rgb in enumerate(cmap):
        mask = np.all(label_rgb == rgb, axis=-1)
        label[mask] = l
    return label</code></pre>
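For checking individual colors by hand, the same bit manipulation can be written without NumPy. This is a dependency-free sketch of the colormap logic above, not code from the fcn repository; `voc_color` and `build_lookup` are hypothetical helper names introduced here for illustration.

```python
def bitget(value, idx):
    """Return bit `idx` of `value` as 0 or 1."""
    return (value >> idx) & 1


def voc_color(index):
    """Return the (R, G, B) tuple that the colormap assigns to a class index."""
    r = g = b = 0
    for j in range(8):
        # Bits 0, 1, 2 of the index feed R, G, B, filling from the high bit down.
        r |= bitget(index, 0) << (7 - j)
        g |= bitget(index, 1) << (7 - j)
        b |= bitget(index, 2) << (7 - j)
        index >>= 3
    return (r, g, b)


def build_lookup(n_class):
    """Map each RGB color to its class index (the inverse of the colormap)."""
    return {voc_color(i): i for i in range(n_class)}


if __name__ == "__main__":
    for i in range(4):
        print(i, voc_color(i))
    lookup = build_lookup(21)
    # Colors outside the table map to -1, mirroring label.fill(-1) above.
    print(lookup.get((128, 0, 0), -1), lookup.get((1, 2, 3), -1))
```

A lookup like this is a convenient way to verify that a hand-drawn label image uses only colors the loader will recognize: any pixel whose color is missing from the dictionary would be read as -1.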
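If you draw the label images with this color table, the code reads the segmentation ground truth correctly.
Put the original images in fcn/data/pascal/VOC2012/JPEGImages
Put the segmentation images in fcn/data/pascal/VOC2012/SegmentationClass
Then, in fcn/data/pascal/VOC2012/ImageSets/Segmentation, write train.txt, trainval.txt, and val.txt, each listing the IDs of the images used for the corresponding task.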
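The three list files can be generated rather than written by hand. The sketch below assumes that image IDs are file names without the extension, that trainval.txt is the union of train.txt and val.txt (the VOC convention), and that a random 80/20 split is acceptable; the function names, the split ratio, and the seed are all illustrative choices, not something the fcn code prescribes.

```python
import os
import random


def split_ids(image_ids, val_fraction=0.2, seed=0):
    """Shuffle IDs reproducibly and split them into (train, val) lists."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


def write_split_files(image_ids, out_dir, val_fraction=0.2, seed=0):
    """Write train.txt, val.txt and trainval.txt (one image ID per line)."""
    train, val = split_ids(image_ids, val_fraction, seed)
    splits = {
        "train.txt": train,
        "val.txt": val,
        "trainval.txt": sorted(train + val),
    }
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for name, ids in splits.items():
        with open(os.path.join(out_dir, name), "w") as f:
            f.write("\n".join(ids) + "\n")
    return splits


if __name__ == "__main__":
    jpeg_dir = "fcn/data/pascal/VOC2012/JPEGImages"
    out_dir = "fcn/data/pascal/VOC2012/ImageSets/Segmentation"
    if os.path.isdir(jpeg_dir):
        ids = [os.path.splitext(f)[0] for f in os.listdir(jpeg_dir)
               if f.lower().endswith(".jpg")]
        write_split_files(ids, out_dir)
```

Each ID listed should have both a JPEG in JPEGImages and a label image in SegmentationClass.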
Modifying the code
fcn/scripts/fcn_train.py
<pre><code class="language-python"># setup optimizer
# lr must be small; with a large value the program reports an error. I used 1e-9.
optimizer = O.MomentumSGD(lr=1e-10, momentum=0.99)
optimizer.setup(model)

# train
trainer = fcn.Trainer(
    dataset=dataset,
    model=model,
    optimizer=optimizer,
    weight_decay=0.0005,
    test_interval=1000,
    max_iter=100000,
    snapshot=4000,
    gpu=gpu,
)</code></pre>
fcn/fcn/pascal.py
<pre><code class="language-python"># Replace with your own class names, in the same order as the color table
target_names = np.array([
    'background',
    'aeroplane',
    'bicycle',
    'bird',
    'boat',
    'bottle',
    'bus',
    'car',
    'cat',
    'chair',
    'cow',
    'diningtable',
    'dog',
    'horse',
    'motorbike',
    'person',
    'potted plant',
    'sheep',
    'sofa',
    'train',
    'tv/monitor',
])</code></pre>
fcn/fcn/util.py
<pre><code class="language-python">import numpy as np


# Change max_size to match your actual images
def resize_img_with_max_size(img, max_size=500*500):
    """Resize image with max size (height x width)"""
    from skimage.transform import rescale
    height, width = img.shape[:2]
    # float() keeps this a true division under Python 2 as well
    scale = float(max_size) / (height * width)
    resizing_scale = 1
    if scale < 1:
        resizing_scale = np.sqrt(scale)
        img = rescale(img, resizing_scale, preserve_range=True)
        img = img.astype(np.uint8)
    return img, resizing_scale</code></pre>
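To see what max_size does in practice, the arithmetic can be isolated from the image handling. The sketch below reproduces only the scale computation from the function above (it does not resize anything); `resizing_scale` is a hypothetical stand-alone helper, and the example image sizes are made up for illustration.

```python
import math


def resizing_scale(height, width, max_size=500 * 500):
    """Return the per-side scale factor that caps height * width at max_size."""
    scale = float(max_size) / (height * width)
    # The area shrinks by `scale`, so each side shrinks by sqrt(scale).
    return math.sqrt(scale) if scale < 1 else 1.0


if __name__ == "__main__":
    for h, w in [(375, 500), (750, 1000)]:
        s = resizing_scale(h, w)
        print((h, w), "->", (int(round(h * s)), int(round(w * s))))
```

A 375 x 500 image is already below the 250000-pixel cap and is left alone, while a 750 x 1000 image is scaled by about 0.577 per side, to roughly 433 x 577. Raising max_size keeps more detail at the cost of more GPU memory.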
fcn/fcn/models/fcn32s.py
<pre><code class="language-python">def __init__(self, n_class=21):  # set n_class to your number of classes
    self.n_class = n_class
    super(self.__class__, self).__init__(
        conv1_1=L.Convolution2D(3, 64, 3, stride=1, pad=100),
        conv1_2=L.Convolution2D(64, 64, 3, stride=1, pad=1),
        conv2_1=L.Convolution2D(64, 128, 3, stride=1, pad=1),
        conv2_2=L.Convolution2D(128, 128, 3, stride=1, pad=1),
        conv3_1=L.Convolution2D(128, 256, 3, stride=1, pad=1),
        conv3_2=L.Convolution2D(256, 256, 3, stride=1, pad=1),
        conv3_3=L.Convolution2D(256, 256, 3, stride=1, pad=1),
        conv4_1=L.Convolution2D(256, 512, 3, stride=1, pad=1),
        conv4_2=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv4_3=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_1=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_2=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_3=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        fc6=L.Convolution2D(512, 4096, 7, stride=1, pad=0),
        fc7=L.Convolution2D(4096, 4096, 1, stride=1, pad=0),
        score_fr=L.Convolution2D(4096, self.n_class, 1, stride=1, pad=0),
        upscore=L.Deconvolution2D(self.n_class, self.n_class, 64, stride=32, pad=0),
    )
    self.train = False</code></pre>
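Training
./scripts/fcn_train.py
- It creates a directory named SegmentationClassDataset_db under fcn/data/, which stores the training images as pickle data. If you change the original training images, you must delete this directory; otherwise the pickle data in it is read by default as the source image data.
- It creates a directory named snapshot under fcn, which holds the saved models, log files, and so on. When retraining, I recommend deleting this directory.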
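Since both directories have to be removed by hand before retraining on changed data, a small cleanup helper can save a step. This is a sketch under the assumption that the two directories live exactly where described above (fcn/data/SegmentationClassDataset_db and fcn/snapshot); `remove_stale_dirs` is a hypothetical name, and the deletion is irreversible, so back up any models you want to keep first.

```python
import os
import shutil


def remove_stale_dirs(fcn_root,
                      names=("data/SegmentationClassDataset_db", "snapshot")):
    """Delete the cached dataset and snapshot directories under fcn_root, if present.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for name in names:
        path = os.path.join(fcn_root, *name.split("/"))
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains the fcn checkout.
    for path in remove_stale_dirs("fcn"):
        print("removed", path)
```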
Using your own trained model
./scripts/fcn_forward.py -c path/to/your/model -i path/to/your/image
The results are saved under fcn/data/forward_out