Spark 2.4 Deep Learning Pipelines (Keras) Image Classifier


(Original link) This is a demo from a Spark Summit 2018 talk. It walks through image classification with Keras and how to run the same classification with Spark, and is shared here for learning purposes.

keras_dlp_image_classifier(Python)


Part 1: Exploring and Classifying Images with Pretrained Models

We will use Keras with TensorFlow as the backend, and download VGG16 from Keras.

VGG16

from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions, VGG16
import numpy as np
import os

vgg16Model = VGG16(weights='imagenet')
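As an optional sanity check (not part of the original demo), Keras models expose a summary() method that prints the layer stack and parameter counts of the network we just downloaded:

# optional: inspect the VGG16 architecture layer by layer
vgg16Model.summary()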

Function to predict category

def predict_images(images, m):
  for i in images:
    print('processing image:', i)
    img = image.load_img(i, target_size=(224, 224))
    # convert to a NumPy array in the format Keras expects
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = m.predict(x)
    # decode the results into a list of tuples (class, description, probability)
    print('Predicted:', decode_predictions(preds, top=3)[0], '\n')

Classify African and Indian Elephants

Load images and predict their classes with the pretrained VGG16 model

African and Indian Elephants

elephants_img_paths = ["/dbfs/brooke/spark-summit-sf/elephants/" + path for path in os.listdir("/dbfs/brooke/spark-summit-sf/elephants/")]
predict_images(elephants_img_paths, vgg16Model)

processing image: /dbfs/brooke/spark-summit-sf/elephants/african_elephant_1.jpg
Downloading data from https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json

...

processing image: /dbfs/brooke/spark-summit-sf/elephants/indian_elephant_1.jpeg

Predicted: [('n01871265', 'tusker', 0.63030255), ('n02504013', 'Indian_elephant', 0.3172723), ('n02504458', 'African_elephant', 0.052417696)]

processing image: /dbfs/brooke/spark-summit-sf/elephants/indian_elephant_3.jpeg

Predicted: [('n03980874', 'poncho', 0.27748922), ('n02504013', 'Indian_elephant', 0.15591854), ('n03884397', 'panpipe', 0.118131705)]

Now run the same function over hamburger and hotdog images:

hotdog_img_paths = ["/dbfs/brooke/spark-summit-sf/hotdog/" + path for path in os.listdir("/dbfs/brooke/spark-summit-sf/hotdog/")]
predict_images(hotdog_img_paths, vgg16Model)

processing image: /dbfs/brooke/spark-summit-sf/hotdog/hamburger_1.jpeg

Predicted: [('n07697313', 'cheeseburger', 0.98272187), ('n07693725', 'bagel', 0.0061851414), ('n07613480', 'trifle', 0.005066536)]

processing image: /dbfs/brooke/spark-summit-sf/hotdog/hotdog_1.jpeg

Predicted: [('n07697537', 'hotdog', 0.9997451), ('n07697313', 'cheeseburger', 8.424303e-05), ('n07615774', 'ice_lolly', 4.0920193e-05)] ...

DeepImagePredictor

Let's make these predictions in parallel on our Spark cluster!

from pyspark.ml.image import ImageSchema
from sparkdl.image import imageIO
from sparkdl import DeepImagePredictor

nerds_df = ImageSchema.readImages("brooke/spark-summit-sf/nerds/")

predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="VGG16", decodePredictions=True, topK=5)
predictions_df = predictor.transform(nerds_df).cache()
predictions_df.count()

INFO:tensorflow:Froze 32 variables. Converted 32 variables to const ops.
INFO:tensorflow:Froze 0 variables. Converted 0 variables to const ops.
Out[5]: 8

display(predictions_df)
image | predicted_labels (the image column renders as a thumbnail preview in the notebook and is omitted here)
[{"class":"n03045698","description":"cloak","probability":0.39211166},{"class":"n03787032","description":"mortarboard","probability":0.091029376},{"class":"n03404251","description":"fur_coat","probability":0.08471853},{"class":"n04371774","description":"swing","probability":0.056981083},{"class":"n04370456","description":"sweatshirt","probability":0.028172707}]
[{"class":"n04350905","description":"suit","probability":0.41138414},{"class":"n02916936","description":"bulletproof_vest","probability":0.10152984},{"class":"n03763968","description":"military_uniform","probability":0.09318812},{"class":"n04591157","description":"Windsor_tie","probability":0.07702819},{"class":"n02669723","description":"academic_gown","probability":0.03608404}]
[{"class":"n03630383","description":"lab_coat","probability":0.23492548},{"class":"n04591157","description":"Windsor_tie","probability":0.09487544},{"class":"n03838899","description":"oboe","probability":0.049194943},{"class":"n04350905","description":"suit","probability":0.043708242},{"class":"n03832673","description":"notebook","probability":0.041520175}]
 [{"class":"n04350905","description":"suit","probability":0.25813994},{"class":"n01440764","description":"tench","probability":0.03799466},{"class":"n03838899","description":"oboe","probability":0.03496751},{"class":"n02883205","description":"bow_tie","probability":0.033893984},{"class":"n03394916","description":"French_horn","probability":0.03332546}]
[{"class":"n03595614","description":"jersey","probability":0.3530513},{"class":"n04370456","description":"sweatshirt","probability":0.13232166},{"class":"n03942813","description":"ping-pong_ball","probability":0.097091846},{"class":"n03141823","description":"crutch","probability":0.018438742},{"class":"n04270147","description":"spatula","probability":0.017245641}]
[{"class":"n03000247","description":"chain_mail","probability":0.14295407},{"class":"n02672831","description":"accordion","probability":0.10376813},{"class":"n02787622","description":"banjo","probability":0.069579415},{"class":"n02804610","description":"bassoon","probability":0.061210092},{"class":"n03838899","description":"oboe","probability":0.058611386}]
[{"class":"n03630383","description":"lab_coat","probability":0.6628539},{"class":"n04317175","description":"stethoscope","probability":0.12459004},{"class":"n04370456","description":"sweatshirt","probability":0.038363792},{"class":"n04039381","description":"racket","probability":0.0132558},{"class":"n03595614","description":"jersey","probability":0.008680802}]
[{"class":"n04350905","description":"suit","probability":0.17892571},{"class":"n02883205","description":"bow_tie","probability":0.0947369},{"class":"n04591157","description":"Windsor_tie","probability":0.08621924},{"class":"n04162706","description":"seat_belt","probability":0.07562429},{"class":"n03630383","description":"lab_coat","probability":0.06052583}]

 

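The decoded predictions come back as a structured Spark column, so we can keep processing them with ordinary DataFrame operations. The snippet below is a rough sketch (not from the original demo) of pulling out just the top-1 description and probability per image; it assumes predicted_labels is an array of (class, description, probability) structs, as the JSON output above suggests, and printSchema() will confirm the exact layout.

from pyspark.sql.functions import col

# inspect the schema produced by DeepImagePredictor
predictions_df.printSchema()

# assumption: predicted_labels is an array of structs, ordered by probability
top1_df = predictions_df.select(
  col("image.origin").alias("path"),
  col("predicted_labels").getItem(0).getField("description").alias("top_label"),
  col("predicted_labels").getItem(0).getField("probability").alias("top_probability"))
display(top1_df)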

Let's change the model to InceptionV3

inception = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3", decodePredictions=True, topK=5)
inception_df = inception.transform(nerds_df).cache()
inception_df.count()

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels.h5

INFO:tensorflow:Froze 378 variables. Converted 378 variables to const ops.
INFO:tensorflow:Froze 0 variables. Converted 0 variables to const ops.
Out[7]: 8

display(inception_df)
image | predicted_labels (the image column renders as a thumbnail preview in the notebook and is omitted here)
[{"class":"n04350905","description":"suit","probability":0.14165702},{"class":"n04479046","description":"trench_coat","probability":0.11945703},{"class":"n03404251","description":"fur_coat","probability":0.027757034},{"class":"n04370456","description":"sweatshirt","probability":0.024130477},{"class":"n02837789","description":"bikini","probability":0.021668304}]
[{"class":"n02916936","description":"bulletproof_vest","probability":0.81811774},{"class":"n03787032","description":"mortarboard","probability":0.036147255},{"class":"n03763968","description":"military_uniform","probability":0.02605114},{"class":"n02669723","description":"academic_gown","probability":0.01372046},{"class":"n04350905","description":"suit","probability":0.008794671}]
[{"class":"n04479046","description":"trench_coat","probability":0.054543473},{"class":"n03838899","description":"oboe","probability":0.042113375},{"class":"n03630383","description":"lab_coat","probability":0.0317195},{"class":"n02787622","description":"banjo","probability":0.029045274},{"class":"n02804610","description":"bassoon","probability":0.026370155}]
[{"class":"n04350905","description":"suit","probability":0.7987358},{"class":"n02883205","description":"bow_tie","probability":0.053425536},{"class":"n04591157","description":"Windsor_tie","probability":0.011151478},{"class":"n02992529","description":"cellular_telephone","probability":0.0053684525},{"class":"n03763968","description":"military_uniform","probability":0.0039382246}]
[{"class":"n03595614","description":"jersey","probability":0.15574361},{"class":"n03942813","description":"ping-pong_ball","probability":0.05970348},{"class":"n04370456","description":"sweatshirt","probability":0.048369024},{"class":"n02804610","description":"bassoon","probability":0.034532476},{"class":"n03838899","description":"oboe","probability":0.03400313}]
[{"class":"n03763968","description":"military_uniform","probability":0.117212564},{"class":"n04350905","description":"suit","probability":0.035018962},{"class":"n02787622","description":"banjo","probability":0.033046678},{"class":"n04584207","description":"wig","probability":0.032433487},{"class":"n04317175","description":"stethoscope","probability":0.028688557}]
[{"class":"n03630383","description":"lab_coat","probability":0.36856785},{"class":"n04317175","description":"stethoscope","probability":0.037452906},{"class":"n03832673","description":"notebook","probability":0.03503557},{"class":"n04350905","description":"suit","probability":0.028838113},{"class":"n03787032","description":"mortarboard","probability":0.020943912}]
[{"class":"n03763968","description":"military_uniform","probability":0.9151895},{"class":"n03787032","description":"mortarboard","probability":0.012946242},{"class":"n04350905","description":"suit","probability":0.007009363},{"class":"n02669723","description":"academic_gown","probability":0.0068341326},{"class":"n02883205","description":"bow_tie","probability":0.0064667594}]

 


Part 2: Transfer Learning with Deep Learning Pipelines (DLP)

Deep Learning Pipelines provides utilities to perform transfer learning on images, which is one of the fastest ways (in both lines of code and run time) to start using deep learning. With Deep Learning Pipelines it takes only a few lines of code.

The idea behind transfer learning is to take the knowledge a model has learned on one task and reuse it to build another model for a similar task.
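To make that concrete, the standalone Keras sketch below (not part of the original notebook) shows roughly what DeepImageFeaturizer does for us: run each image through InceptionV3 with its classification head removed and keep the pooled activations as a fixed-length feature vector, which a simple model such as logistic regression can then learn from.

from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input as iv3_preprocess
from keras.preprocessing import image
import numpy as np

# InceptionV3 without its final softmax layer; global average pooling turns the
# last convolutional feature maps into a single feature vector per image
base_model = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

def featurize(img_path):
  # InceptionV3 expects 299x299 RGB inputs
  img = image.load_img(img_path, target_size=(299, 299))
  x = np.expand_dims(image.img_to_array(img), axis=0)
  return base_model.predict(iv3_preprocess(x))[0]  # feature vector for one image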

from pyspark.ml.image import ImageSchema
from pyspark.sql.functions import lit
from sparkdl.image import imageIO

img_dir = 'dbfs:/brooke/spark-summit-sf'
cats_df = ImageSchema.readImages(img_dir + "/cats").withColumn("label", lit(1))
dogs_df = ImageSchema.readImages(img_dir + "/dogs").withColumn("label", lit(0))

cats_train, cats_test = cats_df.randomSplit([.8, .2], seed=42)
dogs_train, dogs_test = dogs_df.randomSplit([.8, .2], seed=42)

train_df = cats_train.unionAll(dogs_train).cache()
test_df = cats_test.unionAll(dogs_test).cache()
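As a quick sanity check (not in the original notebook), we can count the rows per label to confirm that both classes made it into each split:

# label 1 = cats, label 0 = dogs
train_df.groupBy("label").count().show()
test_df.groupBy("label").count().show()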
display(train_df.select("image", "label"))
image | label (image previews omitted)
1
1
1
1
1
0
0
0
0
0

 


Build the MLlib Pipeline

Use DeepImageFeaturizer and LogisticRegression

from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
from sparkdl import DeepImageFeaturizer 

featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
p = Pipeline(stages=[featurizer, lr])

p_model = p.fit(train_df)

Evaluate the Accuracy

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

pred_df = p_model.transform(test_df).cache()
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test set accuracy = " + str(evaluator.evaluate(pred_df.select("prediction", "label"))*100) + "%")

Test set accuracy = 100.0%
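Accuracy on such a small test set is easy to saturate, so a second metric can be a useful check. The sketch below (not in the original notebook) computes the area under the ROC curve with Spark ML's BinaryClassificationEvaluator, using the rawPrediction column that LogisticRegression already adds to the output:

from pyspark.ml.evaluation import BinaryClassificationEvaluator

# AUC for the binary cats-vs-dogs task
binary_evaluator = BinaryClassificationEvaluator(labelCol="label",
                                                 rawPredictionCol="rawPrediction",
                                                 metricName="areaUnderROC")
print("Test set AUC = " + str(binary_evaluator.evaluate(pred_df)))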

display(pred_df.select("image", "label", "probability"))
image | label | probability (image previews omitted; the probability column is displayed in Spark's internal vector representation)
1 | [1,2,[],[0.07983768538504338,0.9201623146149567]]
1 | [1,2,[],[0.0735124824751803,0.9264875175248197]]
1 | [1,2,[],[0.0688419453818859,0.9311580546181142]]
0 | [1,2,[],[0.9475188514834973,0.0524811485165027]]
0 | [1,2,[],[0.9026450467442289,0.0973549532557711]]
0 | [1,2,[],[0.7731177886783923,0.2268822113216076]]

 

