Build and deploy simple ML tools by stacking the coolest libraries

Online courses are very well designed for learning the concepts of Data Science and Machine Learning. But in the end, one always wonders whether the role of a Data Analyst or Data Scientist is just to answer questions by coding things on their own 🤷‍♂️.

Let’s imagine the following discussion:

Ben: Hey, the marketing team and I would like to know when the best time is to invest in advertising for product A next month. Is it possible to predict future Google search trends?

You: Hmmm, yeah, I think I can look at Google Trends to see how trends evolve over time. And we should be able to apply some forecasting methods to get an idea of the searches next month.

Ben: That would be perfect!

You: Okay, I’m going to export the Google Trends data for a few keywords. Then I’m going to code a small model and send you the graph of the predictions.

Ben (the next day): Thanks for the report. We changed our minds a bit, can you do the same for product B?

You: Yeah, sure.

Ben (2 hours later): And finally, we would like to see the forecasts for the whole quarter, to know whether it would be wiser to wait a little bit.

You: Mmmh yes…

Ben (2 hours later): And is it possible to compare C and D over the next 6 months?

You: 😓

You know what I mean…

Allowing other people to use, interact with, and modify a model without getting into the code makes things much more interesting for everyone.

In this tutorial, we will walk through the steps of building this trend prediction application with very simple tools. In the end it will look like this:

Let’s see what it will look like 🙄

All the code and the GitHub repo are included in the tutorial.

On more complex problems it will of course be more difficult, but the mindset is the same: bring information, through data, directly to where it is needed.

Step #0: Environment setup

The first thing to do before rushing in headlong is to prepare your work environment.

Virtual Environment

Here (as in any project) we will be working with several packages. To keep full control over the tools you use, it is always recommended to work in a virtual environment.

We will use Anaconda, a Python distribution that offers a package management tool called Conda. It will allow us to easily install and update the libraries we need for our development.

Download the latest version of Anaconda, then run the program “Anaconda Prompt”. We could also use the GUI, but we’re not going to spend too much time on this part.

From the Anaconda command prompt, create a new environment, activate it, and install the basic packages.

conda create -n trends-predictor python
conda activate trends-predictor
conda install pandas numpy matplotlib

We will install the other packages as we go along.

To edit your code easily, choose your preferred IDE and, if needed, configure it to use your virtual environment. To keep it simple, I will use Spyder, which is installed with Anaconda. To launch it, type spyder in your command prompt.

Git/GitHub

We’re starting to look good! One last thing: we initialize our git repository and link it to GitHub. I’ll skip the details here, as there are plenty of online tutorials.

Why is it used here? Spoiler: at the end we will deploy our application with Heroku, and it’s very simple to do so from a GitHub repo.

From git bash, type:

echo "# trends-predictor" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/elliottrabac/trend-predictor.git
git push -u origin master

Okay, now that everything is ready, it’s time to code. We will do this in several steps:

  1. Access data automatically

  2. Make predictions for the future

  3. Make your tool accessible to everyone

Step #1: Access the data 🧲

There are plenty of ways to access data. Here we could download the files as CSV and then import them into our program. The problem is that this is very manual…
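For comparison, the manual route could look roughly like the sketch below. It assumes the export starts with a category line and a blank line before the header row, which is how Google Trends CSV downloads have typically looked; the sample text and the helper name load_trends_csv are made up for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical sample mimicking a Google Trends CSV export (layout is an assumption).
SAMPLE_EXPORT = """Category: All categories

Week,sunglasses: (Worldwide)
2020-06-07,88
2020-06-14,92
2020-06-21,100
"""


def load_trends_csv(text):
    """Parse an exported CSV into a list of (date, interest) tuples."""
    lines = text.splitlines()
    # Skip the preamble: the table starts at the first line containing a comma.
    start = next(i for i, line in enumerate(lines) if "," in line)
    reader = csv.reader(io.StringIO("\n".join(lines[start:])))
    next(reader)  # drop the header row ("Week,<keyword>: (<region>)")
    return [(date.fromisoformat(day), int(value)) for day, value in reader]


rows = load_trends_csv(SAMPLE_EXPORT)
print(rows[-1])
```

Every new keyword or date range means another download and another run of this kind of parsing, which is exactly the repetition we want to avoid.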

To automate all this, we can use scraping tools: either directly with requests that parse the HTML content of a page, or even by reproducing the actions of a user with Selenium.

But we will make it better and simpler. There is a package called “pytrends”, which is designed to pull Google Trends data using Python.

💡 Tip: Before coding something, always check whether the work has already been done. There is probably already a library or a GitHub repo that does the job.

First of all, we need to install that package. The pytrends project provides comprehensive documentation of its API.

Install the package with pip (in your virtual environment):

pip install pytrends

Pull Google Trends

Let’s create a new Python script and start by importing the necessary packages:

import pandas as pd
import pytrends
from pytrends.request import TrendReq

Using pytrends is very simple and well documented, so we will go straight to the point.

All you have to do is establish the connection with Google (with TrendReq), build the query with the keywords you are looking for (in the form of a list), and apply the desired method. Here we want the evolution of searches as shown in Google Trends’ Interest Over Time section, so we use: interest_over_time()

This returns a pandas.DataFrame with the data we need. To prepare for the next step (prediction), we delete the isPartial column and rename the others. Within a function, it looks like this:

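import pandas as pd
import matplotlib.pyplot as plt

from pytrends.request import TrendReq


def get_data(keyword):
    # pytrends expects a list of keywords, so wrap the single keyword
    keywords = [keyword]
    # open a session with Google Trends and build the query
    pytrend = TrendReq()
    pytrend.build_payload(kw_list=keywords)
    # one row per date, one column per keyword, plus an isPartial flag
    df = pytrend.interest_over_time()
    df.drop(columns=['isPartial'], inplace=True)
    # move the date index into a regular column
    df.reset_index(inplace=True)
    # the forecasting step expects the columns to be named ds and y
    df.columns = ["ds", "y"]
    return df


keyword = "Sunglasses"
df = get_data(keyword)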
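To see what the reshaping inside get_data() does without calling Google, here is a sketch on a hand-made frame that imitates the layout of interest_over_time() (a date index, one column per keyword, and an isPartial flag). The sample values and the helper name to_prophet_frame are illustrative; the guard for an empty result is an extra precaution based on the assumption that a keyword without data yields an empty frame, and it is not part of the original function.

```python
import pandas as pd


def to_prophet_frame(raw):
    """Turn a single-keyword trends frame into the ds/y layout used later."""
    if raw.empty:
        # Assumption: a keyword without data comes back as an empty frame.
        raise ValueError("no trend data returned for this keyword")
    # Drop the flag column, then turn the date index into a regular column.
    df = raw.drop(columns=["isPartial"]).reset_index()
    df.columns = ["ds", "y"]
    return df


# Made-up sample shaped like the pytrends output for one keyword.
raw = pd.DataFrame(
    {"Sunglasses": [70, 85, 100], "isPartial": [False, False, True]},
    index=pd.to_datetime(["2020-06-07", "2020-06-14", "2020-06-21"]).rename("date"),
)
df = to_prophet_frame(raw)
print(df)
```

The result has exactly two columns, ds for the dates and y for the interest values.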

Step #2: Make forecasts 🔮

Now it’s time to predict the future! We are dealing here with a time series forecasting problem.

Time series forecasting is the use of a model to predict future values based on previously observed values. — Wikipedia

To predict the next values, we have a whole range of possible tools and concepts. We can use statistical methods such as ARMA, ARIMA, or even SARIMAX models 🤯 Today, there are also very powerful Deep Learning models for this kind of problem, such as the classic MLP, CNN, RNN and their more advanced forms.
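Before reaching for any of these models, it can help to have a trivial baseline in mind. The sketch below is a seasonal naive forecast (each future value repeats the value observed one season earlier). It is not one of the models listed above and not what this tutorial uses; the function name and the sample numbers are invented for illustration.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast `horizon` steps by repeating the last observed season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Cycle through the last season as many times as needed.
    return [last_season[step % season_length] for step in range(horizon)]


# Two made-up "years" of quarterly interest values.
history = [20, 80, 100, 30, 25, 85, 95, 35]
baseline = seasonal_naive(history, season_length=4, horizon=6)
print(baseline)  # [25, 85, 95, 35, 25, 85]
```

Any real model should at least beat a baseline like this one to be worth its complexity.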

We will keep it simple and efficient and use Facebook’s Prophet. Like any model, it has its advantages and disadvantages, but we are not here to debate which algorithm to use (online courses are very good for that).

We start by installing fbprophet in our work environment:

conda install -c conda-forge fbprophet

Make predictions

We will create a new function make_pred() that takes as parameters the data set and the length of the period to predict.

Very simply, we create a new Prophet() object and fit the model to the dataset. The dataset must always have two columns named ds and y (which is why we renamed them earlier).

We extend the column containing the dates with make_future_dataframe(). Then we predict the future values with predict(), which returns a dataframe with the predictions. Of course, you can play here with some “hyperparameters”. As always, all the information is in the documentation.
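To make the “extend the dates” step concrete, here is a plain-Python sketch of the idea: keep the historical dates and append a number of new ones after the last observation. It assumes a daily step, which I believe is Prophet’s default frequency; the helper extend_dates is hypothetical and only mimics the behaviour, it is not Prophet’s code.

```python
from datetime import date, timedelta


def extend_dates(dates, periods, step=timedelta(days=1)):
    """Return the historical dates followed by `periods` future dates."""
    last = max(dates)
    # Each future date is one more `step` after the last observation.
    future = [last + step * k for k in range(1, periods + 1)]
    return list(dates) + future


# Weekly history (made-up), extended by three daily steps.
history_dates = [date(2020, 6, 7), date(2020, 6, 14), date(2020, 6, 21)]
extended = extend_dates(history_dates, periods=3)
print(extended[-3:])
```

Under that daily-step assumption, a value of periods=365 covers roughly one year, even when the historical points are spaced a week apart.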

Plot the forecasts

But don’t forget our friend Ben: the goal is to send him a good and understandable graph 📈.

Here we use Prophet’s plot() method. And since a little detail doesn’t hurt, we also give him some additional information with plot_components().

That’s it! Now we have our graphs with the forecasts for the desired keyword.

import pandas as pd
import matplotlib.pyplot as plt


from fbprophet import Prophet


from pytrends.request import TrendReq


def get_data(keywords):
    keywords = [keywords]
    pytrend = TrendReq()
    pytrend.build_payload(kw_list=keywords)
    df = pytrend.interest_over_time()
    df.drop(columns=['isPartial'], inplace=True)
    df.reset_index(inplace=True)
    df.columns = ["ds", "y"]
    return df


def make_pred(df, periods):
    prophet_basic = Prophet()
    prophet_basic.fit(df)
    future = prophet_basic.make_future_dataframe(periods=periods)
    forecast = prophet_basic.predict(future)
    fig1 = prophet_basic.plot(forecast, xlabel="date", ylabel="trend", figsize=(10, 6))
    fig2 = prophet_basic.plot_components(forecast)
    forecast = forecast[["ds", "yhat"]]


    return forecast, fig1, fig2


keyword = "Smart Watch"
df = get_data(keyword)
forecast, fig1, fig2 = make_pred(df, 365)
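With the forecast frame in hand, Ben’s original question (when is interest expected to peak?) becomes a short lookup. The sketch below uses a hand-made frame with the same ds and yhat columns that make_pred() returns; the numbers and the helper best_day are illustrative assumptions.

```python
import pandas as pd


def best_day(forecast, start, end):
    """Return the row with the highest predicted value between two dates."""
    window = forecast[(forecast["ds"] >= start) & (forecast["ds"] <= end)]
    if window.empty:
        raise ValueError("the forecast does not cover this period")
    return window.loc[window["yhat"].idxmax()]


# Made-up forecast shaped like the output of make_pred().
sample_forecast = pd.DataFrame({
    "ds": pd.date_range("2020-07-01", periods=5, freq="D"),
    "yhat": [61.2, 64.8, 70.1, 68.3, 66.0],
})
peak = best_day(sample_forecast, pd.Timestamp("2020-07-02"), pd.Timestamp("2020-07-05"))
print(peak["ds"].date(), peak["yhat"])
```

Changing the start and end dates is all it takes to answer the follow-up questions about a quarter or a six-month window.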

If you are using the Spyder IDE, you can see your dataframes and graphs in the Variable Explorer and Plots panes.

[Figure: Graphs in Spyder IDE]

Step #3: Send it on the web 🚀

Now we get to the best part!

You can make predictions on your own and send them as a report. But here we’re going to let anyone choose keywords and prediction options, all in a user-friendly interface!

Create a web app

We use a library that’s growing strongly, Streamlit (you’ll be able to check that with the trend prediction application at the end 😉)

No need for web development skills, no need to build an interface with Flask or Django: everything is done in a single script, in a few lines of code.

Start by installing the library.

pip install streamlit

And import it into your script.

import streamlit as st

Without going into details, Streamlit runs the script line by line and displays the Streamlit elements in the interface.

We start by adding a title to our page with:

st.write("""
# Trend Predictor App :crystal_ball:
### This app predicts the **Google Trend** you want!
""")

Now we put some elements in the sidebar, which will allow the user to choose the parameters.

  • keyword will be displayed as a text field and will take the value the user types in; the default value here is “Sunglasses”.

  • periods will be displayed as a slider between 7 and 365 days, with a default value of 90.

  • details is a boolean variable that is displayed as a checkbox.

st.sidebar.write("""
## Pick a keyword and a forecasting period :dizzy:
""")
keyword = st.sidebar.text_input("Keyword", "Sunglasses")
periods = st.sidebar.slider('Prediction time in days:', 7, 365, 90)
details = st.sidebar.checkbox("Show details")

To display the graphs, we use:

st.pyplot(fig1)
if details:  # if the details checkbox is ticked
    st.write("### Details :mag_right:")
    st.pyplot(fig2)

There are a lot of great tools in this library and, here again, the documentation is amazing! ✨ We just add a decorator above our get_data() function:

@st.cache(suppress_st_warning=True)

The @st.cache decorator tells Streamlit to store the result of the function: when it is called again with the same inputs, the function body is not run a second time and the stored result is reused.
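The caching idea can be illustrated in plain Python with functools.lru_cache. This is only an analogy for what the decorator achieves, not how Streamlit implements it, and slow_lookup is a made-up stand-in for the call to Google Trends.

```python
from functools import lru_cache

calls = []  # records every time the function body really runs


@lru_cache(maxsize=None)
def slow_lookup(keyword):
    """Stand-in for an expensive request, e.g. fetching trend data."""
    calls.append(keyword)
    return f"data for {keyword}"


slow_lookup("Sunglasses")   # first call: the body runs
slow_lookup("Sunglasses")   # same argument: served from the cache
slow_lookup("Smart Watch")  # new argument: the body runs again
print(calls)  # ['Sunglasses', 'Smart Watch']
```

In the app, this is what keeps a move of the slider from triggering a new request for a keyword that has already been fetched.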

We will add some configuration elements to our page with:

st.beta_set_page_config(page_title="Trend Predictor",
                        page_icon=":crystal_ball:",
                        layout='centered',
                        initial_sidebar_state='auto')

Not bad at all! Let’s see what it looks like locally.

To do so, open a terminal in the folder where your script is located and type:

streamlit run trend-prediction.py

🤩🤩🤩

import pandas as pd
import matplotlib.pyplot as plt
import streamlit as st


from fbprophet import Prophet


from pytrends.request import TrendReq


# get google trends data from keyword list
@st.cache
def get_data(keyword):
    keyword = [keyword]
    pytrend = TrendReq()
    pytrend.build_payload(kw_list=keyword)
    df = pytrend.interest_over_time()
    df.drop(columns=['isPartial'], inplace=True)
    df.reset_index(inplace=True)
    df.columns = ["ds", "y"]
    return df


# make forecasts for a new period
def make_pred(df, periods):
    prophet_basic = Prophet()
    prophet_basic.fit(df)
    future = prophet_basic.make_future_dataframe(periods=periods)
    forecast = prophet_basic.predict(future)
    fig1 = prophet_basic.plot(forecast, xlabel="date", ylabel="trend", figsize=(10, 6))
    fig2 = prophet_basic.plot_components(forecast)
    forecast = forecast[["ds", "yhat"]]


    return forecast, fig1, fig2


# set streamlit page configuration
st.beta_set_page_config(page_title="Trend Predictor",
                        page_icon=":crystal_ball:",
                        layout='centered',
                        initial_sidebar_state='auto')


# sidebar
st.sidebar.write("""
## Choose a keyword and a prediction period :dizzy:
""")
keyword = st.sidebar.text_input("Keyword", "Sunglasses")
periods = st.sidebar.slider('Prediction time in days:', 7, 365, 90)
details = st.sidebar.checkbox("Show details")


# main section
st.write("""
# Trend Predictor App :crystal_ball:
### This app predicts the **Google Trend** you want!
""")
st.write("Evolution of interest:", keyword)


df = get_data(keyword)
forecast, fig1, fig2 = make_pred(df, periods)


st.pyplot(fig1)
    
if details:
    st.write("### Details :mag_right:")
    st.pyplot(fig2)

Deploy it on the web

One last effort before you can send the link to Ben 💪.

You can finish your work by deploying the application on the web. And Heroku is the perfect tool to do this! Easy to use, free, fast.

I’m not going to lengthen this tutorial with all the deployment details; just search for “Streamlit Heroku” on Google.

🛑 One thing to be careful about! When you create your requirements.txt file, you need to add an extra library. We installed Prophet with conda, but Heroku will need pystan as well. Remember to add it and it will work.

Conclusion

In this tutorial we have seen that it is possible to combine different libraries, each taking care of one step of the building process. Of course this project is very “theoretical”: the predictions are basic, and the app does not let you tune the model’s parameters to the data set.

But the most important thing here is the overall mindset. This approach at least lets you test things quickly, validate hypotheses, and ship things to your team!

Happy learning!🥂 E.T.

About me 👨‍💻:

I am an engineering student whose greatest passion is learning new things. After 5 years in mechanical engineering, I learned data science through the incredible resources that can be found online. I try to give back and continue to learn by writing a few posts.

Feel free to give me feedback on this article or contact me to discuss any topic!

Translated from: https://towardsdatascience.com/build-and-deploy-simple-ml-tools-by-stacking-coolest-librairies-f7cb94bad53d
