ConvNetJS Source Code Walkthrough, Part 1

Richard_More

Category: Neural Networks. Tags: javascript, convolutional neural networks

Article link: https://blog.csdn.net/Richard_More/article/details/50588860

1. Background

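A recent article noted that Karpathy's JavaScript-based convolutional neural network library is among the most downloaded projects on GitHub. See its official homepage, the GitHub code, and the blog, which is still being updated. There are two reasons for its popularity. The first is JavaScript itself: it is a scripting language, so users need only a browser such as IE, with no other compiled packages to install, to run the programs, and it also lends itself to visualization. The second is that convolutional neural networks have become a popular and effective way to process images; see Fei-Fei Li's TED talk, "How we're teaching computers to understand pictures".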

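I used to write everything in Python and have only spent the past week reading up on JavaScript. Here I try to walk through ConvNetJS from the viewpoint of someone new to JavaScript. This article is mainly about the structure of the JavaScript code and goes light on theory, so it best suits readers who have just started learning JavaScript.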

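I also spent that week reading 《JavaScript高级程序》, which a friend recommended; a free PDF can be found online. For JavaScript programming I suggest reading Chapter 7 on objects and Chapter 8 on function expressions. Complete beginners should read the first six chapters first.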

2. JavaScript code walkthrough

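Download the code from GitHub, unzip it, and open the src folder, where the main classes are implemented. The compile step described on the official site is optional: it essentially concatenates the source files in src, in a fixed order, into convnet.js in the build folder. The fancy demos on the official site only run after compiling and after downloading the data they need (the GitHub repository does not include it). Since those demos are all built on the convnet API, understanding them requires starting from the convnet basics. I will cover the demos in later updates. The next few articles analyze the js files in the src folder.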

We follow the order used in the official documentation:

convnet_init.js; convnet_util.js; convnet_vol.js; convnet_vol_util.js;

Layer (convnet_layers_dotproducts.js; convnet_layers_dropout.js; convnet_layers_input.js; convnet_layers_loss.js; convnet_layers_nonlinearities.js; convnet_layers_normalization.js; convnet_layers_pool.js); convnet_net.js; convnet_trainers.js.

This first article covers the first four files and explains the overall code structure and the Vol data structure.

2.1 convnet_init.js

This file contains a single line of code:

 var convnetjs = convnetjs || { REVISION: 'ALPHA' };

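This line creates an object named convnetjs, unless one already exists, with a property REVISION set to 'ALPHA'. All the code that follows keeps adding definitions to this convnetjs object.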
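The `x = x || {...}` idiom can be tried in isolation. The sketch below uses a made-up namespace `myLib` (not part of ConvNetJS) and simulates two files that each begin with the same initialization line, to show that the second one does not wipe out what the first one set up.

```javascript
// Hypothetical namespace for illustration; not part of ConvNetJS.
var myLib = myLib || { REVISION: 'ALPHA' };   // first load: myLib is undefined, so the literal is used

myLib.hello = function() { return 'hi'; };    // a later file attaches a member

// Simulate a second file that starts with the same kind of line:
var myLib = myLib || { REVISION: 'OTHER' };   // myLib already exists, so it is kept as-is

console.log(myLib.REVISION);       // ALPHA: the original object survives
console.log(typeof myLib.hello);   // function: members added earlier are still there
```

This is why every source file can safely be loaded on its own or concatenated with the others.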

2.2 convnet_util.js

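This file goes on to define a number of shared utility members on convnetjs. The whole file is a single immediately invoked function expression (the pattern the book covers under closures): an anonymous function that is called right away, with the convnetjs object created in convnet_init.js passed in as its argument. Wrapping the file this way keeps the variables defined inside it local, so they do not leak into the global scope. For details, see the closures section in Chapter 8 of 《JavaScript高级程序》. The other construct to note is the function expression. For example, randf is defined by assigning an anonymous function to the variable randf; it takes two numbers a and b and returns a random number between them. randf is thus a variable that holds a function, much like a function pointer in C, except that C requires the variable's type to be declared in advance and JS does not. Finally, once the helper functions are defined, each one is assigned to a property of convnetjs.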

(function(global) {
  "use strict";
  var randf = function(a, b) { return Math.random()*(b-a)+a; };
  // ....... (remaining helper definitions elided)

  global.randf = randf;
  global.randi = randi;
  global.randn = randn;
  global.zeros = zeros;
  global.maxmin = maxmin;
  global.randperm = randperm;
  global.weightedSample = weightedSample;
  global.arrUnique = arrUnique;
  global.arrContains = arrContains;
  global.getopt = getopt;
  global.assert = assert;
  
})(convnetjs);

The code above is an abridged excerpt of the file.
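To see what the wrapper hides and what it exports, here is a minimal self-contained sketch of the same pattern. The namespace `demo`, the counter `callCount`, and `getCallCount` are invented for this illustration; only `randf` follows the excerpt above.

```javascript
// Hypothetical namespace standing in for convnetjs.
var demo = {};

(function(global) {
  "use strict";

  // Local to the wrapper function: not visible from outside.
  var callCount = 0;

  // Function expression: an anonymous function assigned to a variable.
  var randf = function(a, b) {
    callCount++;
    return Math.random() * (b - a) + a;   // a random number between a and b
  };

  var getCallCount = function() { return callCount; };

  // Export only what callers should see.
  global.randf = randf;
  global.getCallCount = getCallCount;
})(demo);

var r = demo.randf(2, 5);
console.log(r >= 2 && r <= 5);       // true
console.log(demo.getCallCount());    // 1
console.log(typeof callCount);       // undefined: the local never leaked out
```

The exported functions still reach `callCount` after the wrapper has returned, which is the closure behavior the book chapter describes.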

2.3 convnet_vol.js

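Vol is a property of the global convnetjs object and is the basic data unit of the algorithm. It records the shape of the data: width, height, and depth. Convolutional networks mostly process images, for example a 25×25 RGB image, so each data point has to be stored as a three-dimensional array. For an ordinary neural network the data can be stored along a single dimension, in which case the width and height are 1 (the constructor sets sx = sy = 1 and depth to the length of the list). Vol can therefore be seen as the custom data structure of the whole code base, like a struct in C. Its definition is split across two files: convnet_vol.js contains the main body of the class, and convnet_vol_util.js adds two public helper functions that work with it. [One comment in the code below, on the array-type check, is my own annotation.]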

(function(global) {
  "use strict";


  // Vol is the basic building block of all data in a net.
  // it is essentially just a 3D volume of numbers, with a
  // width (sx), height (sy), and depth (depth).
  // it is used to hold data for all filters, all volumes,
  // all weights, and also stores all gradients w.r.t. 
  // the data. c is optionally a value to initialize the volume
  // with. If c is missing, fills the Vol with random numbers.
  var Vol = function(sx, sy, depth, c) {
    // this is how you check if a variable is an array. Oh, Javascript :)
    if(Object.prototype.toString.call(sx) === '[object Array]') {   // (my note) this long call checks the type of the input; typeof would only return 'object'
      // we were given a list in sx, assume 1D volume and fill it up
      this.sx = 1;
      this.sy = 1;
      this.depth = sx.length;
      // we have to do the following copy because we want to use
      // fast typed arrays, not an ordinary javascript array
      this.w = global.zeros(this.depth);
      this.dw = global.zeros(this.depth);
      for(var i=0;i<this.depth;i++) {
        this.w[i] = sx[i];
      }
    } else {
      // we were given dimensions of the vol
      this.sx = sx;
      this.sy = sy;
      this.depth = depth;
      var n = sx*sy*depth;
      this.w = global.zeros(n);
      this.dw = global.zeros(n);
      if(typeof c === 'undefined') {
        // weight normalization is done to equalize the output
        // variance of every neuron, otherwise neurons with a lot
        // of incoming connections have outputs of larger variance
        var scale = Math.sqrt(1.0/(sx*sy*depth));
        for(var i=0;i<n;i++) { 
          this.w[i] = global.randn(0.0, scale);
        }
      } else {
        for(var i=0;i<n;i++) { 
          this.w[i] = c;
        }
      }
    }
  }


  Vol.prototype = {
    get: function(x, y, d) { 
      var ix=((this.sx * y)+x)*this.depth+d;
      return this.w[ix];
    },
    set: function(x, y, d, v) { 
      var ix=((this.sx * y)+x)*this.depth+d;
      this.w[ix] = v; 
    },
    add: function(x, y, d, v) { 
      var ix=((this.sx * y)+x)*this.depth+d;
      this.w[ix] += v; 
    },
    get_grad: function(x, y, d) { 
      var ix = ((this.sx * y)+x)*this.depth+d;
      return this.dw[ix]; 
    },
    set_grad: function(x, y, d, v) { 
      var ix = ((this.sx * y)+x)*this.depth+d;
      this.dw[ix] = v; 
    },
    add_grad: function(x, y, d, v) { 
      var ix = ((this.sx * y)+x)*this.depth+d;
      this.dw[ix] += v; 
    },
    cloneAndZero: function() { return new Vol(this.sx, this.sy, this.depth, 0.0)},
    clone: function() {
      var V = new Vol(this.sx, this.sy, this.depth, 0.0);
      var n = this.w.length;
      for(var i=0;i<n;i++) { V.w[i] = this.w[i]; }
      return V;
    },
    addFrom: function(V) { for(var k=0;k<this.w.length;k++) { this.w[k] += V.w[k]; }},
    addFromScaled: function(V, a) { for(var k=0;k<this.w.length;k++) { this.w[k] += a*V.w[k]; }},
    setConst: function(a) { for(var k=0;k<this.w.length;k++) { this.w[k] = a; }},


    toJSON: function() {
      // todo: we may want to only save d most significant digits to save space
      var json = {}
      json.sx = this.sx; 
      json.sy = this.sy;
      json.depth = this.depth;
      json.w = this.w;
      return json;
      // we wont back up gradients to save space
    },
    fromJSON: function(json) {
      this.sx = json.sx;
      this.sy = json.sy;
      this.depth = json.depth;


      var n = this.sx*this.sy*this.depth;
      this.w = global.zeros(n);
      this.dw = global.zeros(n);
      // copy over the elements.
      for(var i=0;i<n;i++) {
        this.w[i] = json.w[i];
      }
    }
  }


  global.Vol = Vol;
})(convnetjs);

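The structure mirrors the previous file: Vol is defined and then assigned to the Vol property of convnetjs. The difference is that what gets defined here is a class. The definition of Vol is an excellent example for learning how classes are written in JS. Beginners may want to read Chapter 7 of the book on class definitions first and then come back to Vol. On to the details: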

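The definition of Vol consists of two statements. The first has the form var Vol = function(sx, sy, depth, c) {this.....}; see p. 145 of the book on the constructor pattern. The second has the form

Vol.prototype = {
  get: function(x, y, d) {
    var ix=((this.sx * y)+x)*this.depth+d;
    return this.w[ix];
  },
  set: function(x, y, d, v) {
    var ix=((this.sx * y)+x)*this.depth+d;
    this.w[ix] = v;
  },

  ......

}

This form defines the shared members of the class: every instance reaches the same function objects through the prototype instead of holding its own copies. That matters, because defining the functions separately on each instance would waste a lot of memory.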
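The sharing can be checked directly. The following miniature uses a made-up `Point` class (not ConvNetJS code) built with the same constructor-plus-prototype pattern as Vol.

```javascript
// Hypothetical miniature of the constructor + prototype pattern.
var Point = function(x, y) {
  // Per-instance data lives on `this`.
  this.x = x;
  this.y = y;
};

Point.prototype = {
  // Shared by every instance through the prototype chain.
  norm: function() { return Math.sqrt(this.x * this.x + this.y * this.y); }
};

var p = new Point(3, 4);
var q = new Point(6, 8);

console.log(p.norm());                   // 5
console.log(q.norm());                   // 10
console.log(p.norm === q.norm);          // true: one function object, two instances
console.log(p.hasOwnProperty('norm'));   // false: norm is not copied onto the instance
console.log(p.hasOwnProperty('x'));      // true: data is stored per instance
```

In Vol the same split applies: sx, sy, depth, w, and dw are per-instance data, while get, set, clone, and the rest live once on Vol.prototype.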

The official description of Vol reads as follows:

The entire library is based around transforming 3-dimensional volumes of numbers. These volumes are stored in the Vol class, which is at the heart of the library. The Vol class is a wrapper around:

a 1-dimensional list of numbers (the activations, in field .w)
their gradients (field .dw)
and lastly contains three dimensions (fields .sx, .sy, .depth).

// create a Vol of size 32x32x3, and filled with random numbers
var v = new convnetjs.Vol(32, 32, 3);
var v = new convnetjs.Vol(32, 32, 3, 0.0); // same volume but init with zeros
var v = new convnetjs.Vol(1, 1, 3); // a 1x1x3 Vol with random numbers
 
// you can also initialize with a specific list. E.g. create a 1x1x3 Vol:
var v = new convnetjs.Vol([1.2, 3.5, 3.6]);
 
// the Vol is a wrapper around two lists: .w and .dw, which both have 
// sx * sy * depth number of elements. E.g:
v.w[0] // contains 1.2
v.dw[0] // contains 0, because gradients are initialized with zeros
 
// you can also access the 3-D Vols with getters and setters
// but these are subject to function call overhead
var vol3d = new convnetjs.Vol(10, 10, 5);
vol3d.set(2,0,1,5.0); // set coordinate (2,0,1) to 5.0
vol3d.get(2,0,1) // returns 5.0

With the code analysis above, this example should now be easy to follow.
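The getters and setters all rely on one formula that maps (x, y, d) to a position in the flat .w array. The sketch below pulls that formula out into a standalone function; `volIndex` is a name chosen here for illustration and is not a ConvNetJS function.

```javascript
// Standalone copy of the flat-index formula used by Vol.get/set.
// volIndex is a hypothetical helper name, not part of ConvNetJS.
function volIndex(sx, depth, x, y, d) {
  return ((sx * y) + x) * depth + d;
}

var sx = 4, sy = 3, depth = 2;   // a 4 x 3 x 2 volume has 24 entries

console.log(volIndex(sx, depth, 0, 0, 0));   // 0
console.log(volIndex(sx, depth, 0, 0, 1));   // 1  : depth varies fastest
console.log(volIndex(sx, depth, 1, 0, 0));   // 2  : then x
console.log(volIndex(sx, depth, 0, 1, 0));   // 8  : then y (one full row is sx * depth entries)
console.log(volIndex(sx, depth, 3, 2, 1));   // 23 : last entry, sx * sy * depth - 1
```

So all depth values of one pixel sit next to each other in memory, pixels of one row follow each other, and rows are stacked last.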

2.4 convnet_vol_util.js

(function(global) {
  "use strict";
  var Vol = global.Vol; // convenience

  // Volume utilities
  // intended for use with data augmentation
  // crop is the size of output
  // dx,dy are offset wrt incoming volume, of the shift
  // fliplr is boolean on whether we also want to flip left<->right
  var augment = function(V, crop, dx, dy, fliplr) {
    // note assumes square outputs of size crop x crop
    if(typeof(fliplr)==='undefined') var fliplr = false;
    if(typeof(dx)==='undefined') var dx = global.randi(0, V.sx - crop);
    if(typeof(dy)==='undefined') var dy = global.randi(0, V.sy - crop);
    
    // randomly sample a crop in the input volume
    var W;
    if(crop !== V.sx || dx!==0 || dy!==0) {
      W = new Vol(crop, crop, V.depth, 0.0);
      for(var x=0;x<crop;x++) {
        for(var y=0;y<crop;y++) {
          if(x+dx<0 || x+dx>=V.sx || y+dy<0 || y+dy>=V.sy) continue; // oob
          for(var d=0;d<V.depth;d++) {
           W.set(x,y,d,V.get(x+dx,y+dy,d)); // copy data over
          }
        }
      }
    } else {
      W = V;
    }

    if(fliplr) {
      // flip volume horizontally
      var W2 = W.cloneAndZero();
      for(var x=0;x<W.sx;x++) {
        for(var y=0;y<W.sy;y++) {
          for(var d=0;d<W.depth;d++) {
           W2.set(x,y,d,W.get(W.sx - x - 1,y,d)); // copy data over
          }
        }
      }
      W = W2; //swap
    }
    return W;
  }

  // img is a DOM element that contains a loaded image
  // returns a Vol of size (W, H, 4). 4 is for RGBA
  var img_to_vol = function(img, convert_grayscale) {

    if(typeof(convert_grayscale)==='undefined') var convert_grayscale = false;

    var canvas = document.createElement('canvas');
    canvas.width = img.width;
    canvas.height = img.height;
    var ctx = canvas.getContext("2d");

    // due to a Firefox bug
    try {
      ctx.drawImage(img, 0, 0);
    } catch (e) {
      if (e.name === "NS_ERROR_NOT_AVAILABLE") {
        // sometimes happens, lets just abort
        return false;
      } else {
        throw e;
      }
    }

    try {
      var img_data = ctx.getImageData(0, 0, canvas.width, canvas.height);
    } catch (e) {
      if(e.name === 'IndexSizeError') {
        return false; // not sure what causes this sometimes but okay abort
      } else {
        throw e;
      }
    }

    // prepare the input: get pixels and normalize them
    var p = img_data.data;
    var W = img.width;
    var H = img.height;
    var pv = []
    for(var i=0;i<p.length;i++) {
      pv.push(p[i]/255.0-0.5); // normalize image pixels to [-0.5, 0.5]
    }
    var x = new Vol(W, H, 4, 0.0); //input volume (image)
    x.w = pv;

    if(convert_grayscale) {
      // flatten into depth=1 array
      var x1 = new Vol(W, H, 1, 0.0);
      for(var i=0;i<W;i++) {
        for(var j=0;j<H;j++) {
          x1.set(i,j,0,x.get(i,j,0));
        }
      }
      x = x1;
    }

    return x;
  }
  
  global.augment = augment;
  global.img_to_vol = img_to_vol;

})(convnetjs);
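The index arithmetic in augment (crop with an offset, then mirror left to right) is easier to follow on a tiny example. The sketch below applies the same arithmetic to a plain 2D array instead of a Vol; `cropAndFlip` is a made-up helper, and it does the crop and the flip in one pass, whereas augment does them in two steps.

```javascript
// Hypothetical helper mirroring augment's index arithmetic on a plain 2D array
// (grid[y][x]). Source cells outside the grid are left at 0, as in augment.
function cropAndFlip(grid, crop, dx, dy, fliplr) {
  var h = grid.length, w = grid[0].length;
  var out = [];
  for (var y = 0; y < crop; y++) {
    var row = [];
    for (var x = 0; x < crop; x++) {
      // When flipping, output column x reads from mirrored column crop - x - 1.
      var cx = fliplr ? crop - x - 1 : x;
      var srcX = cx + dx, srcY = y + dy;
      var inside = srcX >= 0 && srcX < w && srcY >= 0 && srcY < h;
      row.push(inside ? grid[srcY][srcX] : 0);
    }
    out.push(row);
  }
  return out;
}

var grid = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
];

console.log(JSON.stringify(cropAndFlip(grid, 2, 1, 0, false)));  // [[2,3],[5,6]]
console.log(JSON.stringify(cropAndFlip(grid, 2, 1, 0, true)));   // [[3,2],[6,5]]
console.log(JSON.stringify(cropAndFlip(grid, 2, 2, 2, false)));  // [[9,0],[0,0]]
```

The last call shows the out-of-bounds case: only the cell that overlaps the source is copied, and the rest of the crop stays zero.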
