How to build a recommendation engine using Apache’s Prediction IO Machine Learning Server

by Vaghawan Ojha

This post will guide you through installing Apache Prediction IO machine learning server. We’ll use one of its templates called Recommendation to build a working recommendation engine. The finished product will be able to recommend customized products depending upon a given user’s purchasing behavior.

The Problem

You’ve got a bunch of data and you need to predict something accurately so you can help your business grow its sales, customers, profits, conversion, or whatever the business need is.

Recommendation systems are probably the first step everyone takes toward applying data science and machine learning. Recommendation engines use data as an input and run their algorithms over it. Then they output models from which we can make predictions about what a user is really going to buy, or what a user may like or dislike.

Enter Prediction IO

“Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.” — Apache Prediction IO documentation

The very first look at the documentation makes me feel good because it’s giving me access to a powerful tech stack for solving machine learning problems. What’s more interesting is that Prediction IO gives access to many templates, which are helpful for solving the real problems.

The template gallery consists of many templates for recommendation, classification, regression, natural language processing, and more. It makes use of technologies like Apache Hadoop, Apache Spark, Elasticsearch, and Apache HBase to make the machine learning server scalable and efficient. I’m not going to talk much about Prediction IO itself, because you can read up on it on your own here.

So back to the problem: I have a bunch of data from user purchase histories, which consists of user_id, product_id, and purchased_date. Using these, I need to make customized predictions/recommendations for each user. For this problem, we’ll use the Recommendation template with the Prediction IO machine learning server. We’ll make use of the Prediction IO event server as well as bulk data import.

So let’s get started. (Note: this guide assumes that you’re using an Ubuntu system for the installation.)

Step 1: Download Apache Prediction IO

Go to the home directory of your current user and download the latest 0.10.0 Prediction IO Apache incubator release. I assume you’re in the following dir (/home/you/):

git clone git@github.com:apache/incubator-predictionio.git

Now go to the directory `incubator-predictionio` where we cloned the Prediction IO repo. If you cloned it into a different directory, make sure your terminal is inside that dir.

Now let’s check out the current stable version of Prediction IO, which is 0.10.0:

cd incubator-predictionio # or any dir where you have cloned pio
git checkout release/0.10.0

Step 2: Let’s Make A Distribution Of Prediction IO

./make-distribution.sh

If everything went OK, you will get a message like this in your console:

However, if you encounter something like this:

then you’ll have to remove the .ivy2 dir in your home directory (by default this folder is hidden). You need to remove it completely and then run ./make-distribution.sh again for the build to successfully generate a distribution file.

Personally, I’ve faced this issue many times, and I’m not sure this is the proper way to get through it. But removing the .ivy2 folder and running the make-distribution command again works.

Step 3: Extract The Distribution File

After the successful build, we will have a file called PredictionIO-0.10.0-incubating.tar.gz inside the directory where we built our Prediction IO. Now let’s extract it into a directory called pio.

mkdir ~/pio
tar zxvf PredictionIO-0.10.0-incubating.tar.gz -C ~/pio

Make sure the tar.gz filename matches the distribution file that you have inside the original predictionIo directory. If you forgot to check out the 0.10.0 version of Prediction IO, you’re sure to get a different file name, because by default the version would be the latest one.

Step 4: Prepare For Downloading Dependencies

cd ~/pio
#Let’s make a vendors folder inside ~/pio/PredictionIO-0.10.0-incubating where we will save hadoop, elasticsearch and hbase.
mkdir ~/pio/PredictionIO-0.10.0-incubating/vendors

Step 5: Download and Setup Spark

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz

If your current directory is ~/pio, the command will download spark inside the pio dir. Now let’s extract it. Depending upon where you downloaded it, you might want to change the command below.

tar zxvfC spark-1.5.1-bin-hadoop2.6.tgz PredictionIO-0.10.0-incubating/vendors
# This will extract the spark setup that we downloaded and put it inside the vendors folder of our fresh pio installation.

Make sure you ran mkdir PredictionIO-0.10.0-incubating/vendors earlier.

Step 6: Download & Setup ElasticSearch

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz
#Let’s extract elastic search inside vendors folder.
tar zxvfC elasticsearch-1.4.4.tar.gz PredictionIO-0.10.0-incubating/vendors

Step 7: Download and Setup Hbase

wget http://archive.apache.org/dist/hbase/hbase-1.0.0/hbase-1.0.0-bin.tar.gz
#Let’s extract it.
tar zxvfC hbase-1.0.0-bin.tar.gz PredictionIO-0.10.0-incubating/vendors

Now let’s edit hbase-site.xml to point the HBase configuration to the right dir. Assuming you’re inside the ~/pio dir, you can run this command and edit the hbase conf:

nano PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-site.xml

Replace the configuration block with the following configuration.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/you/pio/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/you/pio/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/zookeeper</value>
  </property>
</configuration>

Here “you” signifies your user dir. For example, if you’re doing all this as a user “tom”, then it would be something like file:///home/tom/…
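If you’d rather not edit the paths by hand, a sed one-liner can substitute your real home directory for the /home/you placeholder. This is just a sketch: it demonstrates the substitution on a scratch copy, and HBASE_SITE is where you’d point at your actual conf file instead.

```shell
# Demo on a scratch copy; point HBASE_SITE at your real
# PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-site.xml instead.
HBASE_SITE=/tmp/hbase-site.xml
cat > "$HBASE_SITE" <<'EOF'
<value>file:///home/you/pio/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/data</value>
EOF
# Replace the "/home/you" placeholder with the current user's home directory.
sed -i "s|/home/you|$HOME|g" "$HBASE_SITE"
cat "$HBASE_SITE"
```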

Make sure the right files are there.

Now let’s set up JAVA_HOME in hbase-env.sh.

nano PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-env.sh
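The relevant line in hbase-env.sh just needs JAVA_HOME pointed at your JDK root (not its bin directory). The path below is an assumption for OpenJDK 8 on Ubuntu; match it to whatever your system actually uses:

```shell
# In hbase-env.sh: uncomment/set JAVA_HOME to your JDK root.
# /usr/lib/jvm/java-8-openjdk-amd64 is an example path for OpenJDK 8 on Ubuntu.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```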

If you’re unsure about which version of JDK you’re currently using, follow these steps and make the necessary changes if required.

We need Java SE Development Kit 7 or greater for Prediction IO to work. Now let’s make sure we’re using the right version by running:

sudo update-alternatives --config java

By default I’m using:

java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

If you’re using a version below 1.7, you should change the java config to use a version of java that is 1.7 or greater. You can change that with the update-alternatives command as given above. In my case, the command sudo update-alternatives --config java outputs something like this:
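If you’d like to check this in a script, here’s a small helper (my own sketch, not part of Prediction IO) that takes a Java version string and reports whether it meets the 1.7 minimum:

```shell
# Return success (0) if a Java version string is 1.7 or newer.
# Handles legacy "1.8.0_121" style and modern "9" / "11.0.2" style strings.
java_version_ok() {
  ver="$1"
  major=${ver%%.*}     # component before the first dot ("1", "9", "11")
  rest=${ver#*.}       # remainder after the first dot
  minor=${rest%%.*}    # second component ("7", "8", ...)
  if [ "$major" -gt 1 ]; then
    return 0           # modern versioning: 9, 10, 11...
  fi
  [ "$minor" -ge 7 ]   # legacy versioning: 1.7.0, 1.8.0_121...
}

# Example: feed it the version reported by `java -version` (which prints to stderr):
# java_version_ok "$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')" && echo "Java is new enough"
```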

If you have any trouble setting this up, you can follow this link.

Now let’s export the JAVA_HOME path in the .bashrc file inside /home/you/pio.

Considering you’re on ~/pio dir, you could do this: nano .bashrc
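The lines to add to .bashrc look something like this. The JDK path is an example for OpenJDK 8 on Ubuntu; use the path shown by your own update-alternatives output:

```shell
# Append to .bashrc, then reload it with: source .bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # example path; adjust to your JDK
export PATH=$PATH:$JAVA_HOME/bin
```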

Don’t forget to run source .bashrc after you set up the java home in the .bashrc.

Step 8: Configure the Prediction IO Environment

Now let’s configure pio-env.sh to give a final touch to our Prediction IO machine learning server installation.

nano PredictionIO-0.10.0-incubating/conf/pio-env.sh

We’re not using PostgreSQL or MySQL for our event server, so let’s comment out those sections and have a pio-env.sh something like this:

#!/usr/bin/env bash
#
# Copy this file as pio-env.sh and edit it for your site's configuration.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
SPARK_HOME=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6

POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-9.4-1204.jdbc41.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.37.jar

# ES_CONF_DIR: You must configure this if you have advanced configuration for
#              your Elasticsearch setup.
ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.4.4/conf

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# with Hadoop 2.
HADOOP_CONF_DIR=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6/conf

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# with HBase on a remote cluster.
HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
# PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
# PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD=root

# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=root
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=root

# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=firstcluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.4.4

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.0.0

Step 9: Configure cluster name in ElasticSearch config

Since this line PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=firstcluster points to our cluster name in the ElasticSearch configuration, let’s replace a default cluster name in ElasticSearch configuration.

nano PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.4.4/config/elasticsearch.yml
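For example, you can set the cluster name with a small snippet like this. It’s a sketch: ES_YML defaults to a scratch path here so you can try it safely; point it at the elasticsearch.yml opened above to make the real change.

```shell
# ES_YML defaults to a scratch file; set it to
# PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.4.4/config/elasticsearch.yml
ES_YML="${ES_YML:-/tmp/elasticsearch.yml}"
touch "$ES_YML"
if grep -q '^#\?cluster\.name:' "$ES_YML"; then
  # Uncomment and overwrite an existing cluster.name line.
  sed -i 's/^#\?cluster\.name:.*/cluster.name: firstcluster/' "$ES_YML"
else
  echo 'cluster.name: firstcluster' >> "$ES_YML"
fi
grep '^cluster.name:' "$ES_YML"
```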

Step 10: Export The Prediction IO Path

Let’s now export the Prediction IO path so we can freely use the pio command without pointing to its bin every time. Run the following command in your terminal:

PATH=$PATH:/home/you/pio/PredictionIO-0.10.0-incubating/bin; export PATH


Step 11: Give Permission To Prediction IO Installation

sudo chmod -R 775 ~/pio

This is vital because if we didn’t give permission to the pio folder, the Prediction IO process won’t be able to write log files.

Step 12: Start Prediction IO Server

Now we’re ready to go. Let’s start our Prediction IO server. Before running this command, make sure you exported the pio path described above.

pio-start-all
# If you forgot to export the pio path, it won't work and you'll manually have to point to the pio bin path.

If everything is OK up to this point, you’ll see output something like this.

Note: If you forget to give permission, there will be issues writing logs, and if your JAVA_HOME path is incorrect, HBase won’t start properly and will give you an error.

Step 13: Verify The Process

Now let’s verify our installation with pio status. If everything is OK, you will get an output like this:

If you encounter errors in HBase or any other backend storage, make sure everything started properly.

Our Prediction IO Server is ready to implement the template now.

Implementing the Recommendation Engine

A recommendation engine template is a Prediction IO engine template that uses collaborative filtering to make personalized recommendations to users. It can be used in an e-commerce site, a news site, or any application that collects user event histories to give users a personalized experience.

We’ll implement this template in Prediction IO with a little eCommerce user data, just to do a sample experiment with the Prediction IO machine learning server.

Now let’s back to our home dir cd ~

Step 14: Download the Recommendation Template

pio template get apache/incubator-predictionio-template-recommender MyRecommendation

It will ask for a company name and an author name; input them when prompted. Now we have a MyRecommendation template inside our home dir. Just a reminder: you can put the template anywhere you want.

Step 15: Create Our First Prediction IO App

Now let’s go inside the MyRecommendation dir cd MyRecommendation

After you’re inside the template dir, let’s create our first Prediction IO app called ourrecommendation.

You will get output like this. Please remember that you can give your app any name, but for this example I’ll be using the app name ourrecommendation.

pio app new ourrecommendation

This command will output something like this:

Let’s verify that our new app is there with this command:

pio app list

Our new app should now appear in the list.

Step 16: Import Some Sample Data

Let’s download the sample data from the gist, and put it inside an importdata folder inside the MyRecommendation folder.

mkdir importdata

Copy the sample-data.json file that you just created into the importdata folder.

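I can’t reproduce the gist’s exact contents here, but for orientation: each line of the import file is one JSON event in Prediction IO’s batch-import format, and the recommendation template reads “rate” and “buy” events. A sketch with made-up user/product IDs and timestamps:

```shell
# Write two illustrative events in Prediction IO's batch-import format
# (one JSON object per line). The real sample-data.json has the same shape.
mkdir -p importdata
cat > importdata/sample-data.json <<'EOF'
{"event":"rate","entityType":"user","entityId":"user1","targetEntityType":"item","targetEntityId":"product5","properties":{"rating":4},"eventTime":"2017-01-01T09:39:45.618-08:00"}
{"event":"buy","entityType":"user","entityId":"user1","targetEntityType":"item","targetEntityId":"product30","eventTime":"2017-01-02T09:39:45.618-08:00"}
EOF
wc -l importdata/sample-data.json
```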
Finally, let’s import the data into our ourrecommendation app. Assuming you’re inside the MyRecommendation dir, you can do this to batch-import the events:

pio import --appid 1 --input importdata/sample-data.json

(Note: make sure the appid in the command is the same as the appid of ourrecommendation that you were given when you created the app)

Step 17: Build The App

Before building the app, let’s edit the engine.json file inside the MyRecommendation directory to put our app name inside it. It should look something like this:

Note: Don’t copy this, just change the “appName” in your engine.json.

{  "id": "default",  "description": "Default settings",  "engineFactory": "orgname.RecommendationEngine",  "datasource": {    "params" : {      "appName": "ourrecommendation"    }  },  "algorithms": [    {      "name": "als",      "params": {        "rank": 10,        "numIterations": 5,        "lambda": 0.01,        "seed": 3      }    }  ]}

Note: the “engineFactory” is automatically generated when you pull the template in step 14, so you don’t have to change it. In my case, it’s my org name, which I put in the terminal prompt during installation of the template. In your engine.json you just need to modify the appName; please don’t change anything else in there.

In the same dir where our MyRecommendation engine template lies, let’s run this pio command to build our app.

pio build

(Note: if you want to see all the messages during the building process, you can run pio build --verbose)

It can take some time to build our app, since this is the first time. From the next time on, it takes less time. You should get an output like this:

Our engine is now ready to train our data.

Step 18: Train The Dataset

pio train

If you get an error like the one below in the middle of the training, you may have to change the number of iterations inside your engine.json and rebuild the app.

Let’s change the numIterations in engine.json, which is 20 by default, to 5:

“numIterations”: 5,

Now let’s build the app with pio build, then run pio train again. The training should complete successfully. After finishing the training you will get a message like this:

Please note that this training works just for small data sets. If you want to try a large data set, we’d have to set up a standalone spark worker to accomplish the training. (I will write about this in a future post.)

Step 19: Deploy and Serve the Prediction

pio deploy
# By default it will take the 8000 port.

We will now have our prediction io server running.

Note: to keep it simple, I’m not discussing the event server in this post, since that would make it even longer; we’re focusing on a simple use case of Prediction IO.

Now let’s get the prediction using curl.

Open up a new terminal and hit:

curl -H "Content-Type: application/json" \
-d '{ "user": "user1", "num": 4 }' http://localhost:8000/queries.json

In the above query, user signifies the user_id in our event data, and num means how many recommendations we want to get.

Now you will get the result like this:

{"itemScores":[{"item":"product5","score":3.9993937903501093},{"item":"product101","score":3.9989989282500904},{"item":"product30","score":3.994934059438341},{"item":"product98","score":3.1035806376677866}]}

That’s it! Great Job. We’re done. But wait, what’s next?

Important Notes:

  • The template we used uses the ALS algorithm with explicit feedback; however, you can easily switch to implicit feedback depending upon your need.

  • If you’re curious about Prediction IO and want to learn more you can do that on the Prediction IO official site.

  • If your Java version is not suitable for the Prediction IO specification, you are sure to run into problems. So make sure you configure this first.

  • Don’t run any of the commands described above with sudo, except when giving permissions. Otherwise you will run into problems.

  • Make sure your java path is correct, and make sure to export the Prediction IO path. You might want to add the Prediction IO path to your .bashrc or profile as well depending upon your need.

Update 2017/07/14: Using Spark To Train Real Data Sets

We have spark installed inside our vendors folder; with our current installation, the spark bin is in the following dir:

~/pio/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/sbin

From there we have to set up a spark primary and a replica to execute our model training and accomplish it faster. If your training seems to be stuck, we can use the spark options to accomplish the training tasks.

Start the Spark primary
~/pio/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/sbin/start-master.sh

This will start the spark primary. Now let’s browse the spark primary’s web UI by going to http://localhost:8080/ in the browser.

Now let’s copy the primary URL to start the replica worker. In our case the primary spark URL is something like this:

spark://your-machine:7077 (your-machine signifies your machine’s name)

~/pio/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/sbin/start-slave.sh spark://your-machine:7077

The worker will start. Refresh the web UI; this time you will see the registered worker. Now let’s run the training again:

pio train -- --master spark://localhost:7077 --driver-memory 4G --executor-memory 6G

Great!

Special Thanks: Pat Ferrel From Action ML & Marius Rabenarivo

Translated from: https://www.freecodecamp.org/news/building-an-recommendation-engine-with-apache-prediction-io-ml-server-aed0319e0d8/
