储存过程调用存储过程_拥有功能存储的重要性

最新推荐文章于 2024-07-08 21:49:15 发布

weixin_26704853

最新推荐文章于 2024-07-08 21:49:15 发布

阅读量402

点赞数

文章标签： python java 数据库编程语言 c++

原文链接：https://medium.com/data-for-ai/the-importance-of-having-a-feature-store-c91419b08dcf

版权

储存过程调用存储过程

Although feature stores play a vital role in data strategy, it’s still difficult to find information about them online. But understanding what feature stores are and why they’re important is crucial, especially in today’s world of increasing data governance and business problems being solved by machine learning models. Indeed, feature stores should be a fundamental part of your company’s entire machine learning operation.

尽管功能存储在数据策略中起着至关重要的作用，但是仍然很难在线找到有关它们的信息。但是，了解功能存储区及其重要性很重要，尤其是在当今数据治理和机器学习模型正在解决的业务问题日益增多的世界中。实际上，功能存储应该是公司整个机器学习操作的基本组成部分。

Among other benefits they offer, three specific advantages of feature stores make them invaluable: they enable the simple reuse of features across the company; they make it simple to standardize feature definitions and naming conventions; and they enable businesses to achieve consistency between the models a data scientist develops offline and the models when they are deployed online.

它们提供的其他好处中，功能存储的三个特定优点使它们变得无价：它们使整个公司中的功能得以简单重用；它们使标准化功能定义和命名约定变得容易。它们使企业能够实现数据科学家离线开发的模型与在线部署的模型之间的一致性。

什么是功能库？ (What is a feature store?)

Because “store” can have a number of meanings, it’s important to clarify that in the term “feature store” relates to “storage.” The store is actually a centralized software library that contains many functions, where each function creates a single feature from a standardized input (data). These features can be later fed into machine learning algorithms aimed to solve different problems.

由于“商店”可能有多种含义，因此必须澄清术语“功能商店”与“存储”相关。该商店实际上是一个包含许多功能的集中式软件库，其中每个功能都通过标准化输入(数据)创建一个功能。这些功能可以在以后用于解决不同问题的机器学习算法中。

When operating machine learning systems at scale, data professionals usually need to engineer large numbers of features in order to train their models. If the model is successful at solving the problem for which it was created and is deployed in production, the exact same features should later be created in the production environment to be fed to the model running in production. A feature store becomes an invaluable resource to data scientists during this process.

在大规模运行机器学习系统时，数据专业人员通常需要设计大量功能来训练他们的模型。如果模型成功解决了创建问题并在生产中部署的问题，则稍后应在生产环境中创建完全相同的功能，以馈入生产中运行的模型。在此过程中，功能存储成为数据科学家的宝贵资源。

Feature stores also allow data scientists to streamline the way features are maintained, paving the way to more efficient processes while ensuring that features are properly stored, documented and tested. Many projects and research assignments across a company use the same features. With a feature store, data scientists can quickly access the features they need and avoid doing repeat work. Feature stores also offer a tested and QAed way to create the feature and know that it’s reliable.

功能存储还使数据科学家可以简化功能的维护方式，为更高效的流程铺平道路，同时确保正确存储，记录和测试功能。公司中的许多项目和研究任务都使用相同的功能。借助功能存储，数据科学家可以快速访问所需的功能，并避免重复工作。功能存储还提供了经过测试且经过质量检查的方法来创建功能，并知道其可靠性。

为什么我们需要功能存储？ (Why do we need feature stores?)

There are a few feature-specific challenges data scientists face that the use of feature stores helps alleviate. These include:

数据科学家面临的一些特定于功能的挑战，即使用功能存储有助于缓解这些挑战。这些包括：

Features are not reused. A common obstacle data scientists face is spending time redeveloping features when using previously developed features or ones developed by other teams would have sufficed. Feature stores allow data scientists to avoid repeat work.
功能不会重复使用。 数据科学家面临的一个常见障碍是，在使用以前开发的功能或其他团队开发的功能就足够时，要花时间重新开发功能。功能存储使数据科学家可以避免重复工作。
Feature definitions vary. Different teams at any one company might define and name features differently. Moreover, accessing the documentation of a specific feature (if it exists at all) is often challenging. Feature stores address this issue by keeping features and their definitions organized and consistent. The documentation of the feature store helps you create a standardized language around all of the features across the company. You know exactly how every feature is computed and what information it represents.
功能定义各不相同。 任一公司的不同团队可能会以不同的方式定义和命名功能。此外，访问特定功能的文档(如果存在的话)通常具有挑战性。要素存储通过保持要素及其定义的组织性和一致性来解决此问题。功能存储区的文档可帮助您围绕公司的所有功能创建标准化的语言。您确切地知道每个功能是如何计算的以及它代表什么信息。
There is inconsistency between training and production features. Production and research environments often use different technologies and programming languages. The data streaming in to the production system needs to be processed into features in real time and fed into a machine learning model. For the modelling effort to be effective, the model developed offline in research needs to provide the exact same prediction as the model deployed online given the same data as input. Having a feature store that is environment agnostic (online and offline) suggests that given the same data, the model will be fed the same feature exactly.‍
培训和生产功能之间存在不一致之处。 生产和研究环境通常使用不同的技术和编程语言。流到生产系统中的数据需要实时处理为功能，然后输入到机器学习模型中。为了使建模工作有效，在给定相同数据作为输入的情况下，在研究中离线开发的模型需要提供与在线部署的模型完全相同的预测。具有与环境无关的功能存储(在线和脱机)表明，如果给定相同的数据，则将为模型精确地提供相同的功能。

特色商店的好处 (Feature store benefits)

When a company embraces feature stores, it allows data professionals across teams to follow the same general workflow for any machine learning use case — regardless of the challenges currently addressed (such as classification and regression, time series forecasting etc.) This workflow is typically implementation-agnostic, which means it can be easily adopted for use with new algorithm types and frameworks, such as classical ML algorithm alongside the newer deep learning frameworks.

当一家公司采用功能存储时，它允许跨团队的数据专业人员针对任何机器学习用例遵循相同的通用工作流程，而无需考虑当前面临的挑战(例如分类和回归，时间序列预测等)。此工作流程通常是实现方式-不可知论，这意味着它可以轻松地用于新算法类型和框架，例如经典ML算法以及更新的深度学习框架。

Another major benefit of using feature stores is the time savings it creates. The stage in any modelling effort where features are created tends to be the most time-consuming; this sensitive process requires that features be calculated correctly, with thousands of features being created at a time and computed in a production environment in the exact same way they were computed offline during research. The use of a feature store makes the process of creating features much more streamlined and efficient.

使用功能存储的另一个主要好处是可以节省时间。在任何建模工作中，创建特征的阶段通常都是最耗时的。这个敏感的过程要求正确地计算特征，一次要创建数千个特征，并在生产环境中以与研究过程中脱机计算完全相同的方式对其进行计算。使用要素存储使创建要素的过程更加简化和高效。

我的建议：集中式功能存储 (My recommendation: a centralized feature store)

My team have gained much value from building and maintaining a centralized feature store where different data professionals across the company can create and manage canonical features to be used by other members of the team. This allows data scientists to easily add features they’ve built into a shared feature store. Once features are there, they are easy to consume both online (in production) and offline (in research), simply by referencing a feature’s simple canonical name.

我的团队从构建和维护集中式功能存储中获得了很多价值，在该功能存储中，公司中不同的数据专业人员可以创建和管理可供团队其他成员使用的规范功能。这使数据科学家可以轻松地将其内置的功能添加到共享功能存储中。有了功能后，只需引用功能的简单规范名称，就可以轻松地在线(在生产中)和离线(在研究中)使用它们。

Today, we have thousands of features in our feature store that are used in a variety of machine learning projects across the company and across all domains. Our data scientists are adding new features all the time, with new features calculated automatically and updated daily. This has allowed our team members to avoid repeat work, and easily access a wealth of data they need for modelling and research purposes.

如今，我们的功能存储区中拥有成千上万的功能，这些功能已在公司和所有领域的各种机器学习项目中使用。我们的数据科学家一直在增加新功能，并自动计算并每天更新新功能。这使我们的团队成员避免了重复工作，并轻松访问了他们用于建模和研究目的所需的大量数据。