Building a Big Data Platform: How to Build Your Data Platform Like a Product

Over the past few years, many companies have embraced data platforms as an effective way to aggregate, handle, and utilize data at scale. Despite the data platform's rising popularity, little literature exists on what it actually takes to build one successfully.

Barr Moses, CEO & co-founder of Monte Carlo, and Atul Gupte, former Product Manager for Uber’s Data Platform Team, share advice for designing a data platform that will maximize the value and impact of data on your organization.

Your company likes data. A lot. Your boss requested additional headcount this year to beef up your data engineering team (Presto and Kafka and Hadoop, oh my!). Your VP of Data is constantly lurking in your company’s Eng-Team Slack channel to see “how people feel” about migrating to Snowflake. Your CEO even wants to become data-driven, whatever that means. To say that data is a priority for your company would be an understatement.

To satisfy your company’s insatiable appetite for data, you may even be building a complex, multi-layered data ecosystem: in other words, a data platform.

At its core, a data platform is a central repository for all data, handling the collection, cleansing, transformation, and application of data to generate business insights. For most organizations, building a data platform is no longer a nice-to-have but a necessity, with many businesses distinguishing themselves from the competition based on their ability to glean actionable insights from their data, whether to improve the customer experience, increase revenue, or even define their brand.

Much in the same way that many view data itself as a product, data-first companies like Uber, LinkedIn, and Facebook increasingly view data platforms as “products,” too, with dedicated engineering, product, and operational teams. Despite their ubiquity and popularity, however, data platforms are often spun up with little foresight into who is using them, how they’re being used, and what engineers and product managers can do to optimize these experiences.

Whether you’re just getting started or are in the process of scaling one, we share five best practices for avoiding these common pitfalls and building the data platform of your dreams:

Align your product's goals with the goals of the business

It’s important to align your platform’s goals with the overarching data goals of your business. Image courtesy of John Schnobirch on Unsplash.

For several decades, data platforms were viewed as a means to an end rather than as "the end" itself, that is, the core product you're building. In fact, although data platforms powered many services, feeding rich insights to the applications that power our lives, they didn't receive the respect and attention they truly deserve until very recently.

When you’re building or scaling your data platform, the first question you should ask is: how does data map to your company’s goals?

To answer this question, you have to put on your data platform product manager hat. Unlike product managers who own a specific product area, a data platform product manager must understand the big picture rather than area-specific goals, since data feeds into the needs of every other functional team, from marketing and recruiting to business development and sales.

For instance, if your business's goal is to increase revenue (go big or go home!), how does data help you achieve that goal? For the sake of this thought experiment, consider the following questions:

  • What services or products drive revenue growth?
  • What data do these services or products collect?
  • What do we need to do with the data before we can use it?
  • Which teams need this data? What will they do with it?
  • Who will have access to this data or the analytics it generates?
  • How quickly do these users need access to this data?
  • What, if any, compliance or governance checks does the platform need to address?

By answering these questions, you’ll have a better understanding of how to prioritize your product roadmap, as well as who you need to build for (often, the engineers) versus design for (the day-to-day platform users, including analysts). Moreover, this holistic approach to KPI development and execution strategy sets your platform up for a more scalable impact across teams.

Gain feedback and buy-in from the right stakeholders

It goes without saying that receiving both buy-in upfront and iterative feedback throughout the product development process are necessary components of the data platform journey. What isn’t as widely understood is whose voice you should care about.

Yes, you need the ultimate sign-off from your CTO or VP of Data on the finished product, but their decisions are often informed by their trusted advisors: staff engineers, technical program managers, and other day-to-day data practitioners.

While developing a new data cataloging system for her company, one product manager we spoke with at a leading transportation company spent three months trying to sell her VP of Engineering on her team's idea, only to be shut down in a single email by his chief of staff.

Consider different tactics based on the DNA of your company. We suggest following these three concurrent steps:

  1. Sell leadership on the vision.
  2. Sell your actual users on the brass tacks and day-to-day use cases.
  3. Apply a customer-centric approach, no matter who you’re talking to. Position the platform as a means of empowering different types of personas in your data ecosystem, including both your data team (data engineers, data scientists, analysts, and researchers) and data consumers (program managers, executives, business development, and sales, to name a few categories). A great data platform will enable the technical users to do their work easily and efficiently, while also allowing less technical personas to leverage rich insights or put together visualizations based on data without much assistance from engineers and analysts.

There are a variety of data personas you have to consider when you're building a data platform for your company, including engineers, data scientists, product managers, business function users, and general managers. (Image courtesy of Atul Gupte)

At the end of the day, it’s important that this experience nurtures a community of data enthusiasts that build, share, and learn together. Since your platform has the potential to serve the entire company, everyone should feel invested in its success, even if that means making some compromises along the way.

Prioritize long-term growth and sustainability vs. short-term gains

Data solutions with short-term usability in mind are often easier to get off the ground, but over time, end up being more costly than platforms built with sustainability in mind. (Image courtesy of Atul Gupte.)

Unlike other types of products, data platforms don't succeed simply because they are first to market. Since data platforms are almost exclusively internal tools, we've found that the best data platforms are built with sustainability in mind rather than for feature-specific wins.

Remember: your customer is your company, and your company’s success is your success. This is not to say that your roadmap won’t change several times over (it will), but when you do make changes, do it with growth and maturation in mind.

For instance, Uber’s big data platform was built over the course of five years, constantly evolving with the needs of the business; Pinterest has gone through several iterations of their core data analytics product; and leading the pack, LinkedIn has been building and iterating on its data platform since 2008!

Our suggestion: choose solutions that make sense in the context of your organization, and align your plan with these expectations and deadlines. Sometimes, quick wins as part of a larger product development strategy can help you achieve internal buy-in, as long as they aren't shortsighted. Rome wasn't built in a day, and neither will your data platform be.

Sign off on baseline metrics for your data and how you measure them

It doesn’t matter how great your data platform is if you can’t trust your data, but data quality means different things to different stakeholders. Consequently, your data platform won’t be successful if you and your stakeholders aren’t aligned on this definition.

To address this, it's important to set baseline expectations for data reliability: your organization's ability to deliver high data availability and health throughout the entire data life cycle. Setting clear Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for software application reliability is a no-brainer. Data teams should do the same for their data pipelines.
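
As an illustration of what an SLI/SLO pair for a data pipeline might look like, here is a minimal Python sketch of a freshness objective. It is not taken from the authors: the `FreshnessSLO` class, the `freshness_sli` and `meets_slo` helpers, the practice of sampling staleness at fixed check times, and the thresholds (at most 6 hours stale, 95% of checks passing) are all hypothetical choices for this example. The SLI is the fraction of checks at which the latest successful load was recent enough; the SLO is the target that fraction must reach.

```python
import bisect
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FreshnessSLO:
    """Hypothetical freshness objective for one table or pipeline output."""
    max_staleness: timedelta  # newest data may be at most this old at a check
    target: float             # fraction of checks that must pass, e.g. 0.95


def freshness_sli(load_times, check_times, max_staleness):
    """SLI: fraction of check times at which the latest successful load
    (at or before the check) is no older than max_staleness.

    load_times must be sorted in ascending order.
    """
    if not check_times:
        raise ValueError("need at least one check time")
    fresh = 0
    for t in check_times:
        i = bisect.bisect_right(load_times, t)  # number of loads at or before t
        if i > 0 and t - load_times[i - 1] <= max_staleness:
            fresh += 1
    return fresh / len(check_times)


def meets_slo(load_times, check_times, slo):
    """True if the measured SLI reaches the SLO target."""
    return freshness_sli(load_times, check_times, slo.max_staleness) >= slo.target


# Example: a pipeline meant to load every 6 hours misses its 18h and 24h runs.
start = datetime(2024, 1, 1)
loads = [start + timedelta(hours=h) for h in (0, 6, 12, 30)]
checks = [start + timedelta(hours=h) for h in range(0, 36, 3)]  # every 3 hours
slo = FreshnessSLO(max_staleness=timedelta(hours=6), target=0.95)
sli = freshness_sli(loads, checks, slo.max_staleness)
print(sli, meets_slo(loads, checks, slo))  # 0.75 False
```

In this made-up scenario the 18h and 24h loads never land, so the checks at hours 21, 24, and 27 see stale data: 9 of 12 checks pass, the SLI is 0.75, and the hypothetical 95% objective is missed. Freshness is only one possible indicator; volume, schema, or null-rate checks could be expressed the same way.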

This isn’t to say that different stakeholders will have the same vision for what “good data” looks like; in fact, they probably won’t, and that’s OK. Instead of fitting square pegs into round holes, it’s important to create a baseline metric of data reliability and, as with building a new platform feature, gain sign-off on the lowest common denominator.

We suggest choosing a novel measurement (like this one for data downtime) that will help data practitioners across the company align on baseline quality metrics.
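
The link to the suggested metric did not survive in this copy, so the following is only an assumed illustration of how a data downtime measurement could be computed. For this sketch, data downtime is defined as the number of incidents times the sum of the mean time to detection (TTD) and mean time to resolution (TTR), which equals the per-incident sum of TTD + TTR. The `DataIncident` record, the `data_downtime` helper, and the 12-hour monthly budget are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataIncident:
    """Hypothetical incident record; field names are illustrative."""
    started: datetime   # when the data first became missing, wrong, or stale
    detected: datetime  # when a person or monitor noticed the problem
    resolved: datetime  # when the data was trustworthy again


def data_downtime(incidents):
    """Assumed metric: N * (mean TTD + mean TTR).

    Summing (TTD + TTR) per incident gives the same total, so we do that.
    """
    total = timedelta(0)
    for inc in incidents:
        if not (inc.started <= inc.detected <= inc.resolved):
            raise ValueError(f"timestamps out of order: {inc}")
        time_to_detection = inc.detected - inc.started
        time_to_resolution = inc.resolved - inc.detected
        total += time_to_detection + time_to_resolution
    return total


# Example: two incidents in one (hypothetical) month.
day = datetime(2024, 3, 1)
incidents = [
    # Detected after 4 hours, fixed 2 hours later: 6 hours of downtime.
    DataIncident(day, day + timedelta(hours=4), day + timedelta(hours=6)),
    # Detected after 30 minutes, fixed 90 minutes later: 2 hours of downtime.
    DataIncident(
        day + timedelta(days=2),
        day + timedelta(days=2, minutes=30),
        day + timedelta(days=2, hours=2),
    ),
]
monthly_budget = timedelta(hours=12)  # hypothetical agreed baseline
downtime = data_downtime(incidents)
print(downtime, downtime <= monthly_budget)  # 8:00:00 True
```

Keeping TTD and TTR as separate terms, rather than only tracking the total, shows whether the bigger lever is faster detection (better monitoring) or faster resolution (better runbooks); the budget itself is the kind of baseline stakeholders would need to sign off on.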

Know when to build vs. buy

One of the first decisions you have to make is whether to build the platform from scratch or to purchase the technology (or several supporting technologies) from a vendor.

While companies like — you guessed it — Uber, LinkedIn, and Facebook have opted to build their own data platforms, often on top of open source solutions, it doesn’t always make sense for your needs. While there isn’t a magic formula that will tell you whether to build vs. buy, we’ve found that there is value in buying until you’re convinced that:

  • The product needs to operate using sensitive/classified information (e.g., financial or health records) that cannot be shared with external vendors for regulatory reasons
  • Specific customizations are required for it to work well with other internal tools/systems
  • These customizations are niche enough that a vendor may not prioritize them
  • There is some other strategic value to building vs. buying (e.g., a competitive advantage for the business or a boost to hiring talent)

One VP of Data Engineering at a healthcare startup we spoke with noted that in his 20s, he would have wanted to build. But now, in his late 30s, he would almost exclusively buy.

“I get the enthusiasm,” he says, “But I’ll be darned if I have the time, energy, and resources to build a data platform from scratch. I’m older and wiser now — I know better than to NOT trust the experts.”

When it comes to where you could be spending your time — and more importantly, money — it often makes more sense to buy a tried and true solution with a dedicated team to help you solve any issues that arise.

What's next?

Building a data platform is an exciting journey that benefits from a product development perspective. Image courtesy of memegenerator.net.

Building your data platform as a product will help you ensure greater consensus around data priorities, standardize on data quality and other key KPIs, foster greater collaboration, and, as a result, bring unprecedented value to your company.

In addition to serving as a vehicle for effective data management, reliability, and democratization, the benefits of building a data platform as a product include:

  • Guiding sales efforts (giving you insights on where to focus your efforts based on how prospective customers are responding)
  • Driving application product road maps
  • Improving the customer experience (helping teams learn what your service pain points are, what's working, and what's not)
  • Standardizing data governance and compliance measures across the company (GDPR, CCPA, etc.)

Building a data platform might seem overwhelming at first blush, but with the right approach, your solution has the potential to become a force multiplier for your entire organization.

Want to learn more about building a reliable data platform? Reach out to Barr Moses and the Monte Carlo Team.

This article was co-written by Barr Moses and Atul Gupte.

Translated from: https://towardsdatascience.com/how-to-build-your-data-platform-like-a-product-6677e8abe318