Yhat Whitepaper: Data Science in Practice

cumei1658

Original link: https://www.pybloggers.com/2017/01/yhat-whitepaper-data-science-in-practice/

This blogpost is an excerpt of our most popular whitepaper about how data science gets applied in the real world. You can also download the full whitepaper PDF if you’d like.

What It’s About

In this whitepaper we introduce five common applications of data science that build upon that definition and goal. We debunk the impression that data science is some type of obscure black magic and give you concrete examples of how it is applied in reality. You’ll learn how real companies are using data science to make their products and day-to-day operations better. Last but not least, we describe the data science life cycle and explain Yhat’s role in getting models into production.

Application 1: Recommender Systems

Recommender systems, also known as recommender engines, are one of the most well-known applications of data science. They are a subclass of information filtering systems: systems that cut through the noise of all options and present users with just the subset of options they’ll find appealing. The data being filtered can range from products on an e-commerce site to dating matches that appear as you search for ‘the one.’

Recommender systems offer a more intelligent approach to information filtering than a simple search algorithm by introducing users to items they might not have otherwise discovered. Recommender systems generally take either a collaborative or content-based approach to filtering. Collaborative filtering considers a user’s previous behavior, as well as the behavior of similar users. Content-based filtering provides recommendations based on discrete attributes or assigned characteristics of the items.
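
To make the distinction concrete, here is a minimal sketch of both approaches in plain Python. Everything in it is an assumption made for illustration: the users, items, ratings, attribute tags, the "rated 4 or higher" rule for liked items, and the choice of cosine and Jaccard similarity. It is not taken from the whitepaper or from any company mentioned in it.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana": {"kettle": 5, "toaster": 4, "blender": 1},
    "ben": {"kettle": 4, "toaster": 5, "mixer": 4},
    "cara": {"blender": 5, "mixer": 2, "juicer": 5},
}

# Hypothetical item attributes used by the content-based approach.
# Nobody has rated "waffle_iron" yet.
attributes = {
    "kettle": {"kitchen", "heating"},
    "toaster": {"kitchen", "heating"},
    "blender": {"kitchen", "blades"},
    "mixer": {"kitchen", "baking"},
    "juicer": {"kitchen", "blades"},
    "waffle_iron": {"kitchen", "heating"},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def collaborative_recommend(user, ratings):
    """Rank unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


def jaccard(a, b):
    """Overlap between two attribute sets."""
    return len(a & b) / len(a | b)


def content_recommend(user, ratings, attributes):
    """Rank unseen items by attribute overlap with items the user rated 4+."""
    liked = [item for item, rating in ratings[user].items() if rating >= 4]
    scores = {}
    for item, attrs in attributes.items():
        if item in ratings[user]:
            continue
        scores[item] = max(
            (jaccard(attrs, attributes[l]) for l in liked), default=0.0
        )
    return sorted(scores, key=scores.get, reverse=True)


print(collaborative_recommend("ana", ratings))
print(content_recommend("ana", ratings, attributes))
```

In this toy data, the collaborative ranking can only surface items that some other user has already rated, while the content-based ranking can also surface the unrated waffle iron because it shares attributes with items the user liked. That trade-off is one reason a system might combine the two.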

Data scientists at energy software company Tendril opted for a hybrid approach that combines both collaborative and content-based filtering. Tendril provides analytics and consumer solutions to energy suppliers, including insight into which energy products consumers would most likely consider. “We use Support Vector Regression models to predict household energy consumption to provide our clients with in-depth, personalized information about their customers,” explains Mark Gately, Data Analytics Manager at Tendril. “This detailed information is also used in recommendation models, which help match eligible customers with new or existing energy products.”

Application 2: Credit Scoring

If you have ever applied for a credit card or a loan, you’re likely already familiar with the concept of credit scoring. What you may be less aware of is the set of decision management rules that evaluates, behind the scenes, how likely an applicant is to repay debts.

The first general purpose credit scoring algorithm, now known as the FICO score, was introduced in 1989. The FICO score is still one of the most widely used models in the United States today, though peer-to-peer and direct lending organizations have focused on developing new techniques over the past few years. These new machine learning models and algorithms capture innovative factors and relationships that traditional loan scorecards couldn’t, like how applicants manage monthly cash flow or whether friends or community members would endorse the applicant.
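
As a rough illustration of how a score can turn into a lending decision, the sketch below uses a logistic model over three applicant features. This is a stand-in technique chosen for simplicity, not the FICO algorithm and not any lender’s actual model. The feature names, weights, bias, and approval threshold are all hypothetical; a real model would learn its weights from historical repayment data.

```python
from math import exp

# Hypothetical weights, chosen by hand for illustration only.
WEIGHTS = {
    "cash_flow_ratio": 1.8,   # monthly income divided by monthly expenses
    "endorsements": 0.4,      # number of community endorsements
    "missed_payments": -0.9,  # count of past missed payments
}
BIAS = -2.0


def repayment_probability(applicant):
    """Map applicant features to a probability with the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def decide(applicant, threshold=0.5):
    """Approve when the estimated probability clears the threshold."""
    if repayment_probability(applicant) >= threshold:
        return "approve"
    return "review"


# Two hypothetical applicants.
strong = {"cash_flow_ratio": 1.5, "endorsements": 3, "missed_payments": 0}
weak = {"cash_flow_ratio": 0.8, "endorsements": 0, "missed_payments": 2}

print(decide(strong), round(repayment_probability(strong), 2))
print(decide(weak), round(repayment_probability(weak), 2))
```

The point of the sketch is only that signals such as cash-flow management or endorsements can enter a model as ordinary features alongside traditional ones; how much each should count is exactly what a trained model is meant to estimate.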

One such company is Ferratum Bank, a pioneer in financial technology and mobile consumer lending since 2005. “We developed complex statistical and machine learning models to enable smarter lending decisions,” explains Scott Donnelly, Director of Business Lending at Ferratum Bank. “By getting creative with our approach and adopting innovative technologies, we’ve been able to reinvent how both consumers and businesses obtain loans. This has allowed us to reach prospective customers that in the past may have been overlooked by traditional banking institutions.”

Application 3: Dynamic Pricing

You walk out of the store, arms full of groceries, only to realize that a torrential downpour began as you perused the produce inside. You struggle to retrieve your phone, check your favorite ride app and are dismayed to find…a 2.1x surge!? Welcome to your first lesson on dynamic pricing.
