A Zero-Downtime Data Import Plan for Databases
In addition to wasted time and sleepless nights, data quality issues lead to compliance risks, lost revenue to the tune of several million dollars per year, and erosion of trust — but what does bad data really cost your company? I’ve created a novel data downtime calculator that will help you measure the true financial impact of bad data on your organization.
What’s big, scary, and keeps even the best data teams up at night?
If you guessed the ‘monster under your bed,’ nice try, but you’d be wrong. The answer is far more real, all-too-common, and you’re probably already experiencing it whether or not you realize it.
The answer? Data downtime. Data downtime refers to periods of time when your data is partial, erroneous, missing, or otherwise inaccurate, ranging from a few null values to completely outdated tables. These data fire drills are time-consuming and costly, corrupting otherwise excellent data pipelines with garbage data.
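To make the definition concrete, here is a minimal Python sketch that flags two of the symptoms named above: null values in required columns, and a table whose newest record is too old. It assumes rows arrive as a list of dicts with timezone-aware `updated_at` timestamps. The function names, column names, and thresholds are hypothetical and are not taken from any particular tool.

```python
from datetime import datetime, timedelta, timezone


def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def is_stale(rows, timestamp_column, max_age, now=None):
    """True if the newest timestamp is older than `max_age`, or if there is none."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamps = [row[timestamp_column] for row in rows if row.get(timestamp_column) is not None]
    if not stamps:
        return True
    return now - max(stamps) > max_age


def downtime_symptoms(rows, required_columns, timestamp_column, max_age,
                      max_null_rate=0.0, now=None):
    """Return readable descriptions of data downtime symptoms (hypothetical helper)."""
    issues = []
    for column in required_columns:
        rate = null_rate(rows, column)
        if rate > max_null_rate:
            issues.append(f"{column}: {rate:.0%} null")
    if is_stale(rows, timestamp_column, max_age, now):
        issues.append(f"{timestamp_column} is stale")
    return issues


# Usage with made-up rows: one missing email, newest update 3 days old.
now = datetime(2020, 6, 1, tzinfo=timezone.utc)
rows = [
    {"email": "a@example.com", "updated_at": now - timedelta(days=3)},
    {"email": None, "updated_at": now - timedelta(days=4)},
]
print(downtime_symptoms(rows, ["email"], "updated_at", timedelta(days=1), now=now))
# ['email: 50% null', 'updated_at is stale']
```

A real pipeline would read rows from a warehouse rather than a Python list, but the two checks map directly onto "a few null values" and "completely outdated tables."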
The true cost of bad data
One CDO I spoke with recently told me that his 500-person team spends 1,200 cumulative hours per week tackling data quality issues, time that could otherwise go toward activities that drive innovation and generate revenue.
To demonstrate the scope of this problem, here are some fast facts about just how much time data teams waste on data downtime:
50–80 percent of a data practitioner’s time is spent collecting, preparing, and fixing “unruly” data. (The New York Times)
40 percent of a data analyst’s time is spent on vetting and validating analytics for data quality issues. (Forrester)
27 percent of a salesperson’s time is spent dealing with inaccurate data. (ZoomInfo)
50 percent of a data practitioner’s time is spent on identifying, troubleshooting, and fixing data quality, integrity, and reliability issues. (Harvard Business Review)
Based on these numbers, as well as interviews and surveys conducted with over 150 different data teams across industries, I estimate that data teams spend 30–40 percent of their time handling data quality issues instead of working on revenue-generating activities.
The cost of bad data is more than wasted time and sleepless nights; there are serious compliance, financial, and operational implications that can catch data leaders off guard, impacting both your team’s ROI and your company’s bottom line.
Compliance risk
For several decades, the medical and financial services sectors, responsible for protecting personally identifiable information (PII) and stewarding sensitive customer data, were the poster children for compliance.
Now, with nearly every industry handling user data, companies from e-commerce sites to dog food distributors must follow strict data governance mandates such as GDPR, CCPA, and other privacy protection regulations.
Bad data can manifest in any number of ways, from a mistyped email address to misreported financials, and can have serious ramifications down the road. In Vermont, for instance, outdated information about whether a customer wants to renew their annual subscription to a service can spell the difference between a seamless user experience and a class action lawsuit. Such errors can lead to fines and steep penalties.
Lost revenue
It’s often said that “time is money,” but for any company seeking the competitive edge, “data is money” is more accurate.
One of the most explicit links I’ve found between data downtime and lost revenue is in financial services. In fact, one data scientist at a financial services company that buys and sells consumer loans told me that a field name change can result in a $10M loss in transaction volume, or a week’s worth of deals.
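One way a team might catch a field rename like this before it reaches downstream consumers is to compare each incoming batch's columns against the schema those consumers expect. The Python sketch below assumes column names are available as plain lists; the loan-table column names and both helper functions are invented for illustration. A rename shows up as one missing column plus one unexpected column.

```python
def schema_drift(expected_columns, actual_columns):
    """Compare expected vs. actual column names; a rename appears in both lists."""
    expected, actual = set(expected_columns), set(actual_columns)
    return {
        "missing": sorted(expected - actual),      # columns consumers rely on that vanished
        "unexpected": sorted(actual - expected),   # new columns nobody planned for
    }


def assert_schema(expected_columns, actual_columns):
    """Raise before loading if the batch no longer matches the expected schema."""
    drift = schema_drift(expected_columns, actual_columns)
    if drift["missing"] or drift["unexpected"]:
        raise ValueError(f"schema drift detected: {drift}")


# Hypothetical loan table where `principal` was renamed upstream.
expected = ["loan_id", "principal", "interest_rate"]
actual = ["loan_id", "principal_amount", "interest_rate"]
print(schema_drift(expected, actual))
# {'missing': ['principal'], 'unexpected': ['principal_amount']}
```

This sketch treats any added column as drift; a team might instead fail only on missing columns and merely log unexpected ones.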
Behind these numbers is the reality that firefighting data downtime incidents not only wastes valuable time but tears teams away from revenue-generating projects. Instead of making progress on building new products and services that can add material value for your customers, data engineering teams spend time debugging and fixing data issues. A lack of visibility into what’s causing these problems only makes matters worse.
Erosion of data trust
The insights you derive from your data are only as accurate as the data itself. In fact, it’s my firm belief that numbers can lie and using bad data is worse than having no data at all.
Data won’t hold itself accountable, but decision makers will hold your team accountable. Over time, bad data can erode the organization’s trust in your data team as a revenue driver. After all, if you can’t rely on the data powering your analytics, why should your CEO? And for that matter, why should your customers?
To help you mitigate your data downtime problem, we put together a Data Downtime Cost Calculator that factors in how much money you’re likely to lose dealing with data downtime fire drills instead of working on revenue-generating activities.
Your Data Downtime Cost Calculator
The annual cost of your data downtime can therefore be measured by the engineering time and other resources you need to spend resolving it.
I’d propose that the right data downtime calculator factors in the cost of labor to tackle these issues, your compliance risk (in this case, we used the GDPR upper-tier fine of up to 4 percent of annual global revenue), and the opportunity cost of losing stakeholder trust in your data. Per earlier estimates, you can assume that around 30 percent of an engineer’s time will be spent tackling data issues.
Bringing this all together, your Data Downtime Cost Calculator is:
Labor Cost: ([Number of Engineers] × [Annual Salary of Engineer]) × 30%
+
Compliance Risk: [4% of Your Revenue in 2019]
+
Opportunity Cost: [Revenue you could have generated if you had moved faster, released X new products, and acquired Y new customers]
= $ Annual Cost of Data Downtime
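The formula above translates directly into a few lines of Python. This is a minimal sketch, not the author's calculator: the 30 percent share of engineering time and the 4 percent compliance term are the article's assumptions and are exposed as defaults you can change, while the opportunity cost is passed in as a single number because the article leaves its estimation to you. The class and function names are hypothetical, and all figures in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class DowntimeInputs:
    num_engineers: int
    annual_salary: float                # average annual salary per engineer
    annual_revenue: float               # revenue used for the compliance-risk term
    opportunity_cost: float             # your own estimate of revenue forgone
    time_on_data_issues: float = 0.30   # share of engineering time spent on data issues
    compliance_rate: float = 0.04       # compliance risk as a share of revenue


def annual_downtime_cost(inputs: DowntimeInputs) -> dict:
    """Sum labor cost, compliance risk, and opportunity cost, per the formula above."""
    labor = inputs.num_engineers * inputs.annual_salary * inputs.time_on_data_issues
    compliance = inputs.annual_revenue * inputs.compliance_rate
    total = labor + compliance + inputs.opportunity_cost
    return {
        "labor": labor,
        "compliance": compliance,
        "opportunity": inputs.opportunity_cost,
        "total": total,
    }


# Illustrative numbers only.
cost = annual_downtime_cost(
    DowntimeInputs(num_engineers=10, annual_salary=150_000,
                   annual_revenue=20_000_000, opportunity_cost=250_000)
)
print(f"${cost['total']:,.0f}")
# $1,500,000
```

Returning the breakdown rather than only the total makes it easy to see which term dominates: in this made-up example, the compliance term ($800,000) outweighs labor ($450,000).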
Keep in mind that this equation will vary by company, but we’ve found that our framework can get most teams started.
Measuring the cost of your data downtime is the first step towards fully understanding the implications of bad data at your company. Fortunately, data downtime is avoidable. With the right approach to data reliability, you can keep the cost of bad data at bay and prevent bad data from corrupting good pipelines in the first place.
Have another way to measure the impact of data downtime? Would love to hear from you!
Translated from: https://towardsdatascience.com/how-to-calculate-the-cost-of-data-downtime-c0a48733b6f0