COVID-19 has forced nearly every organization to adapt to a new workforce reality: distributed teams. We share four key tactics for turning your remote data team into a force multiplier for your entire company.

It’s month 6 (or is it 72? It’s hard to tell) of the global pandemic, and despite the short commute from your bedroom to the kitchen table, you’re still adjusting to this new normal.


Your team is responsible for all the same tasks (handling ad-hoc queries, fixing broken pipelines, implementing new rules and logic, etc.), but troubleshooting broken data has only gotten harder. It’s difficult enough to identify the root cause of a data downtime incident when you’re all 5 feet away from each other; it’s 10 times harder when you’re working in different time zones.

Distributed teams aren’t novel; in fact, they’ve become increasingly common over the last few decades. Working during a pandemic, however, is new for everyone. While this shift widens the geographic talent pool, collaborating at this scale entails unforeseen hurdles, particularly when it comes to working with real-time data.

Your daily standup only gets you so far.


Here are 4 essential steps to managing a great distributed data team:


Document all the things

Institutional knowledge about which tables and columns are “good” or “bad” breaks down when teams are distributed. One data scientist we spoke with at a leading e-commerce company told us that it takes 9 months of working on a team to develop a spidey-sense for what data lives where, which tables are the ‘right’ ones, and which columns are healthy vs. experimental.

The answer? Consider investing in a data catalog or lineage solution. Such technologies provide one source of truth about a team’s data assets, and make it easy to understand formatting and style guidelines for data input. Data catalogs become particularly important when data governance and compliance come into play, both of which are top of mind for data teams in financial services, healthcare, and many other industries.
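To make the idea concrete, here is a minimal sketch of what a single catalog entry might record. It is not the schema of any particular catalog product; the class names, the status values, and the example table `analytics.orders` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical health labels a team might agree on."""
    HEALTHY = "healthy"
    EXPERIMENTAL = "experimental"
    DEPRECATED = "deprecated"


@dataclass
class ColumnEntry:
    name: str
    description: str
    # Default to EXPERIMENTAL so nothing is treated as trusted by accident.
    status: Status = Status.EXPERIMENTAL


@dataclass
class TableEntry:
    name: str
    owner: str
    description: str
    upstream: list[str] = field(default_factory=list)  # simple lineage: source tables
    columns: list[ColumnEntry] = field(default_factory=list)

    def healthy_columns(self) -> list[str]:
        """Return the names of columns marked as safe to rely on."""
        return [c.name for c in self.columns if c.status is Status.HEALTHY]


# Illustrative entry; the table, owner, and columns are made up.
orders = TableEntry(
    name="analytics.orders",
    owner="data-platform",
    description="One row per completed order.",
    upstream=["raw.shop_orders"],
    columns=[
        ColumnEntry("order_id", "Primary key.", Status.HEALTHY),
        ColumnEntry("predicted_ltv", "Model output, still being validated."),
    ],
)

print(orders.healthy_columns())
```

Even a record this small answers the questions a new teammate would otherwise need months to absorb: who owns the table, where its data comes from, and which columns are safe to build on.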

Set SLAs and SLOs for data

It’s important to ensure alignment not just among data team members but with data consumers (e.g., marketing, executives, or operations teams), too. To do so, we suggest taking a page out of the site reliability engineering book and setting and aligning on clear service level agreements (SLAs) and service level objectives (SLOs) for data. SLAs that set expectations around data freshness, volume, and distribution, as well as the other pillars of observability, will be crucial here.
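As a rough illustration, a freshness and volume objective for a single table could be written down and checked as in the sketch below. The `DataSLO` class, the `check_slo` function, and the thresholds are hypothetical; in practice the last load time and row count would come from your warehouse's metadata rather than being passed in by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DataSLO:
    """A hypothetical objective agreed with data consumers for one table."""
    table: str
    max_staleness: timedelta  # freshness: how old the latest load may be
    min_rows: int             # volume: smallest acceptable row count per load


def check_slo(
    slo: DataSLO,
    last_loaded_at: datetime,
    row_count: int,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return a list of human-readable breaches (empty if the SLO is met)."""
    # Compare in UTC so teammates in different time zones see the same result.
    now = now or datetime.now(timezone.utc)
    breaches = []

    staleness = now - last_loaded_at
    if staleness > slo.max_staleness:
        breaches.append(
            f"freshness: {slo.table} last loaded {staleness} ago "
            f"(limit {slo.max_staleness})"
        )

    if row_count < slo.min_rows:
        breaches.append(
            f"volume: {slo.table} has {row_count} rows "
            f"(minimum {slo.min_rows})"
        )

    return breaches


# Illustrative values only.
slo = DataSLO("analytics.orders", max_staleness=timedelta(hours=6), min_rows=1000)
now = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)

for breach in check_slo(slo, now - timedelta(hours=8), row_count=10, now=now):
    print(breach)
```

Writing the objective down as data, rather than leaving it implicit, gives data consumers and the data team the same definition of "late" or "incomplete" to point to.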

Katie Bauer, a Data Science Manager at Reddit, suggests distributed data teams maintain a central document with expected delivery dates for important projects, and review that document weekly.

“Instead of pinging my team for updates throughout the week when questions arise from stakeholders, I can easily visit this document for answers,” she said. “This keeps us focused on delivering our work and avoids unnecessary diversions.”

Invest in self-serve tooling

Investing in self-serve data tools (including cloud warehouses like Snowflake and Redshift, as well as data analytics solutions, like Mode, Tableau, and Looker) will streamline data democratization no matter the location or persona of the data user.


Similarly, self-serve version control systems help everyone stay on the same page when collaborating on larger workflows, which becomes extremely important when leveraging real-time data across time zones.


Prioritize data reliability

Industries that are responsible for managing PII and other sensitive customer information, like healthcare and financial services, have a low tolerance for mistakes. Data teams need confidence that data is secure and accurate across their pipeline, from consumption to output. The right processes and procedures around data reliability can prevent data downtime incidents and restore trust in your data.

For many years, data quality monitoring was the primary way in which data teams caught broken data, but this isn’t cutting it anymore, particularly when real-time data and distributed teams are the norm. Our remote-first world calls for a more comprehensive solution that can seamlessly track the five pillars of data observability and other important data health metrics tailored to the needs of your organization.

Remember: it’s OK to not be OK

We hope these tips help you accept and even embrace the data world’s new normal.


On top of this more tactical advice, however, it never hurts to remember that it’s OK to not be OK. Emilie Schario, GitLab’s first data analyst who is now an internal strategy consultant, put it best: “This is not normal remote work. What it takes to be successful during a period of forced remote work in a global pandemic is different from what it means to be remote-as-usual.”

We’d love to hear your advice for leading distributed teams! Reach out to Barr Moses with your words of wisdom.

This article was written by Will Robins & Barr Moses.

