Deep Learning: Doubly Easy and Doubly Powerful with GraphLab Create

Note: Many of the code snippets below can take a very long time to run without GPU acceleration. Please install the GPU version of GraphLab Create to follow along.

One of machine learning’s core goals is classification of input data. This is the task of taking novel data and assigning it to one of a pre-determined set of labels, based on what the classifier learns from a training set. For instance, a classifier could take an image and predict whether it is a cat or a dog.

The pieces of information fed to a classifier for each data point are called features, and the category each data point belongs to is called a ‘target’ or ‘label’. Typically, the classifier is given data points with both features and labels so that it can learn the correspondence between the two. Later, the classifier is queried with new data points and tries to predict which category each belongs to. A large group of these query data points constitutes a prediction set, and the classifier is usually evaluated on its accuracy, or how many prediction queries it gets right.
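To make this concrete, here is a toy illustration (the data below is made up purely for demonstration): in GraphLab Create, a training set is just an SFrame in which some columns hold the features and one column holds the target label.

>>> import graphlab
>>> # Toy data: 'height' and 'weight' are features, 'label' is the target.
>>> toy_data = graphlab.SFrame({'height': [30.0, 25.0, 70.0, 80.0],
...                             'weight': [4.0, 3.5, 30.0, 35.0],
...                             'label':  ['cat', 'cat', 'dog', 'dog']})
>>> toy_data.print_rows()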

There are many methods to perform classification, such as SVMs, logistic regression, deep learning, and more. To read about the different methods GraphLab Create supports, I invite you to read the API Documentation. Today, however, we’ll focus on deep learning methods, which have recently been shown to give incredible results on challenging problems. Yet this comes at the cost of extreme sensitivity to model hyper-parameters and long training times. This means that one can spend months testing different model configurations, much too long to be worth the effort.

This blog post focuses on minimizing these pains, and exploring how GraphLab Create 1.1 makes deep learning Easy.

What is Deep Learning?

Before we start, let's explore the idea of deep learning. 'Deep learning' is a phrase being thrown around everywhere in the world of machine learning. In fact, it's even been in The New York Times. It seems to be helping make tremendous breakthroughs, but what is it? It's a methodology for learning high-level concepts about data, frequently through models that have multiple layers of non-linear transformations. Let's take a moment to analyze that last sentence. 'Learning high-level concepts about data' means that deep learning models take data, for instance raw pixel values of an image, and learn abstract ideas like 'is animal' or 'is cat' about that data. OK, easy enough, but what does having 'multiple layers of non-linear transformations' mean? Conceptually, all this means is that you have a composition of simple non-linear functions, forming a complex non-linear function that can map things as complex as raw pixel values to an image category. Let's illustrate this with a simple example:


f(x) = cos(ax)

g(x) = exp(bx)

f(g(x)) = cos(a · exp(bx))


Notice how the composition of functions f(g(x)) is much more complex than either f(x) or g(x). Furthermore, by adjusting the values of a and b you can adjust the mapping between input and output. These values, called parameters, are what a deep learning model learns. This same idea of composition is used many, many times within deep learning models, and it enables learning very complex relationships between input and output. This complexity is what allows deep learning models to attain such amazing results.
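As a minimal sketch in plain Python (NumPy here is just for illustration and is not part of GraphLab Create), the composition above looks like this, with a and b playing the role of learnable parameters:

>>> import numpy as np
>>> a, b = 2.0, 0.5                   # parameters a learning algorithm would tune
>>> f = lambda x: np.cos(a * x)       # simple non-linear function
>>> g = lambda x: np.exp(b * x)       # another simple non-linear function
>>> h = lambda x: f(g(x))             # composition: cos(a * exp(b * x))
>>> h(1.0)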

The most common class of methods within the deep learning domain, and the one GraphLab Create uses, is Deep Neural Networks (DNNs). Deep Neural Networks are simply artificial neural networks with many hidden layers. To learn more about artificial neural networks and deep learning, click here.

Typically, DNNs are used for classification of input, frequently images. As I mentioned before, they are very good at this. So should we simply take whatever algorithm we used for image classification before and replace it with a DNN?

Not so fast.

Before you can do this, you have to choose how many layers your network has. And how many hidden units each layer has. And how to initialize the model parameter values (also known as weights). And how much L2 regularization to apply. There’s a lot more, too. Basically, a deep learning model is a machine with many confusing knobs and dials (called hyper-parameters: parameters that are not learned by the algorithm) that will not work if set randomly.
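To give a feel for how many knobs there are, here is a purely illustrative configuration (not a real GraphLab Create API call; the names below are made up) of the sort of thing you would otherwise have to specify by hand:

>>> # Illustrative only: typical hyper-parameters a practitioner would have to pick.
>>> hyper_params = {
...     'num_hidden_layers': 3,            # network depth
...     'hidden_units': [512, 256, 128],   # width of each hidden layer
...     'weight_init': 'gaussian',         # how to initialize the weights
...     'learning_rate': 0.001,            # gradient step size
...     'l2_regularization': 0.0005,       # weight penalty
... }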

Making Deep Learning Easy with GraphLab Create

GraphLab Create allows you to get started with neural networks without being an expert by eliminating the need to choose a good architecture and hyper-parameter starting values. Based on the input data, the neuralnet_classifier.create() function chooses an architecture to use and sets reasonable values for the hyper-parameters. Let’s check this out on MNIST, a dataset composed of handwritten digits where the task is to identify the digit:

>>> data = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/mnist/sframe/train')
>>> model = graphlab.neuralnet_classifier.create(data, target='label')

Evaluating this model on held-out test data tells us how well it performs:

>>> testing_data = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/mnist/sframe/test')
>>> model.evaluate(testing_data)

{'accuracy': 0.9803000092506409, 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 65
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  974  |
 |      2       |        0        |   3   |
 |      5       |        0        |   1   |
 |      6       |        0        |   7   |
 |      8       |        0        |   6   |
 |      9       |        0        |   5   |
 |      0       |        1        |   1   |
 |      1       |        1        |  1128 |
 |      2       |        1        |   1   |
 |      6       |        1        |   3   |
 |     ...      |       ...       |  ...  |
 +--------------+-----------------+-------+
 [65 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}

We got 98% accuracy. This is deep learning made Easy!
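If you want per-image predictions rather than aggregate accuracy, the standard predict() call works here too (a quick sketch using the test set loaded above):

>>> # Predict a digit label for each test image; returns one prediction per row.
>>> predictions = model.predict(testing_data)
>>> predictions.head(5)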


When Deep Learning Made Easy Is Not Easy Enough: Making Deep Learning Doubly Easy


Although GraphLab Create tries to choose a good architecture and hyper-parameters, this automatic process often isn't enough. Optimal settings are extremely problem-specific, and it’s impossible to determine them without good intuition, lots of experience, and many PhD students.

Yet, when good hyper-parameter settings come together, results are very strong. What’s more, it’s not uncommon for the task you want to solve to be related to something that has already been solved. Take, for example, the task of distinguishing cats from dogs. The famous ImageNet Challenge, for which DNNs are the state of the art, asks the trained model to categorize input into one of 1000 classes (as Jay described in a previous post). Shouldn't features that distinguish between categories like lions and wolves also be useful for discriminating between cats and dogs?

The answer is a definitive yes. It is accomplished by simply removing the output layer of the 1000-category Deep Neural Network, taking the signals that would have been propagated to that output layer, and feeding them as features to any classifier for our new cats vs. dogs task. The training procedure breaks down something like this:

  • Stage 1: Train a DNN classifier on a large, general dataset. A good example is ImageNet, with 1000 categories and 1.2 million images. GraphLab hosts a model trained on ImageNet so you can skip this step in your own implementation. Simply load the model with graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
  • Stage 2: The outputs of each layer in the DNN can be viewed as a meaningful vector representation of each image. Extract these feature vectors from the layer prior to the output layer for each image in your task.
  • Stage 3: Train a new classifier with those features as input for your own task.  

At first glance, this seems even more complicated than just training the deep learning model. However, Stage 1 is re-usable for many different problems, and GraphLab is hosting the model so you don't have to train it yourself. Stage 2 is easy to do with GraphLab's API (as shown below), and Stage 3 is typically done with a simpler classifier than a deep learning model, so it's easy to build yourself. In the end, this pipeline means no hyper-parameter tuning, faster training, and better performance even in cases where you don't have enough data to train a conventional deep learning model. What's more, this technique is effective even if your Stage 3 classification task is relatively unrelated to the task Stage 1 was trained on.


This idea was first explored by Donahue et al. (2013), and was used for the Dogs vs. Cats competition as described for nolearn's ConvNetFeatures. In our NeuralNetClassifier API, we put this functionality into the .extract_features() method. Let’s explore the GraphLab Create API on the Cats vs. Dogs dataset. To get a feel for what we're trying to accomplish, a few sample images from the dataset are shown below:

[Sample images from the Cats vs. Dogs dataset: cat.302, cat.318, dog.6038, dog.6179]

First, let's load in the model trained on ImageNet. This corresponds to the end of Stage 1 in our pipeline:

>>> pretrained_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')

Now, let's load in the cats vs dogs images. We resize because the original ImageNet model was trained on 256 x 256 x 3 images:

>>> cats_dogs_sf = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/cats_vs_dogs/cats_dogs_sf')
>>> cats_dogs_sf['image'] = graphlab.image_analysis.resize(cats_dogs_sf['image'], 256, 256, 3)

And extract features, per Stage 2 of our pipeline:

>>> cats_dogs_sf['features'] = pretrained_model.extract_features(cats_dogs_sf)
>>> cats_dogs_train, cats_dogs_test = cats_dogs_sf.random_split(0.8)

And now, let's train a simple classifier as described in Stage 3:

>>> simple_classifier = graphlab.classifier.create(cats_dogs_train, features=['features'], target='label')

And now, to see how our trained model did, we evaluate it:

>>> simple_classifier.evaluate(cats_dogs_test)
{'accuracy': 0.9545091779728652, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  2406 |
 |      0       |        1        |   73  |
 |      1       |        0        |  155  |
 |      1       |        1        |  2378 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns]}

We get ~95% accuracy! I don’t know about you, but that feels like a pretty good number. For comparison’s sake, let’s try using just the .create() method.

>>> model = graphlab.neuralnet_classifier.create(cats_dogs_train, target='label', features=['image'])
>>> model.evaluate(cats_dogs_test)
{'accuracy': 0.6049019694328308, 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  922  |
 |      1       |        0        |  415  |
 |      0       |        1        |  1600 |
 |      1       |        1        |  2163 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns]}

Accuracy is a disappointing 60%. Clearly, combining a simple classifier with the extracted features helped tremendously. And you STILL didn’t have to tune the architecture or hyper-parameters. You don’t even have to take the time to train a NeuralNet classifier; you can just repurpose one that already exists. Sounds like if using .create() was Easy, then using .extract_features() is Doubly Easy!
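As a usage sketch, here is how you might classify brand-new images with this pipeline: load them, resize to the 256 x 256 x 3 shape the ImageNet model expects, extract features, and predict. The directory path is hypothetical, and the availability and arguments of load_images may differ across versions:

>>> # Hypothetical folder of new pet photos to classify
>>> new_images = graphlab.image_analysis.load_images('my_new_pet_photos/')
>>> new_images['image'] = graphlab.image_analysis.resize(new_images['image'], 256, 256, 3)
>>> new_images['features'] = pretrained_model.extract_features(new_images)
>>> new_images['predicted_label'] = simple_classifier.predict(new_images)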

Making Doubly Easy also Doubly Powerful


It’s always important to make sure any machine learning technique is consistent in its usefulness, and that its success is not a fluke. In order to do that, I tested it on the CIFAR-10 dataset developed by Alex Krizhevsky. The CIFAR-10 dataset has 50000 training images and 10000 test images divided into 10 classes. Each image is 32x32 pixels. A few examples from each category are shown below:

[Sample images from each of the 10 CIFAR-10 categories]

Let's repeat the procedure we just went through for the Cats vs Dogs dataset:

>>> cifar_train = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/cifar_10/cifar_10_train_sframe')
>>> cifar_test = graphlab.SFrame('http://s3.amazonaws.com/GraphLab-Datasets/cifar_10/cifar_10_test_sframe')
# Preprocess: resize to the input size the ImageNet model expects
>>> cifar_train['image'] = graphlab.image_analysis.resize(cifar_train['image'], 256, 256, 3)
>>> cifar_test['image'] = graphlab.image_analysis.resize(cifar_test['image'], 256, 256, 3)
# Stage 2: extract features with the pretrained model
>>> cifar_train['features'] = pretrained_model.extract_features(cifar_train)
>>> cifar_test['features'] = pretrained_model.extract_features(cifar_test)
# Stage 3: train a simple classifier on the extracted features
>>> classifier = graphlab.classifier.create(cifar_train, features=['features'], target='label')
# Evaluate
>>> classifier.evaluate(cifar_test)

The evaluation gives:

{'accuracy': 0.9478, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 100
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  733  |
 |      0       |        1        |   25  |
 |      0       |        2        |   76  |
 |      0       |        3        |   19  |
 |      0       |        4        |   13  |
 |      0       |        5        |   7   |
 |      0       |        6        |   8   |
 |      0       |        7        |   26  |
 |      0       |        8        |   58  |
 |      0       |        9        |   23  |
 |     ...      |       ...       |  ...  |
 +--------------+-----------------+-------+
 [100 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}

We get almost 95% accuracy! In fact, the results are better than any published result and are on par with the winning results from the Kaggle competition. Human performance is about 94%, to give some perspective. Clearly, feature extraction makes deep learning not only Doubly Easy, but also Doubly Powerful.

Deep learning models are powerful, and are now easier to use than ever before. Download GraphLab Create, load in our ImageNet model, and tell us your deep learning success stories!
