How to navigate the challenges of the data modeling process

Data modeling and curation can help businesses more efficiently use data they've collected. There are challenges, however -- beginning with ensuring data quality.

George Lawton

  

Enterprises are adopting data modeling and curation practices to bring order to their data and create business value more quickly. It sounds easy on paper, but managers should consider several challenges to making the data modeling process work effectively.

At a high level, the biggest challenges include ensuring that the data correlates with the real world and that it can be woven into existing business processes and analytics apps. Business managers frequently underestimate the amount of time it takes to clean data.

Other challenges include data curation and modeling across disparate sources and data stores, as well as ensuring security and governance in the process.

Challenge No. 1: Ensuring data quality

Data quality and the amount of effort required to address data quality issues are usually the biggest challenges in this area, said Ryohei Fujimaki, founder and CEO of dotData, a data science automation platform.

"The traditional process to deal with data quality involves a lot of hard-coded business logic, which makes the data pipeline very difficult to maintain and to scale," he said.

Cleaning and preparing data in an automated way typically requires a significant upfront investment in data engineering to improve the data sources, data transportation and data quality. These efforts can be stymied when managers don't know the potential business value of a project.
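
To make that maintenance problem concrete, the sketch below contrasts hard-coded cleanup with cleaning rules expressed as data, which are easier to extend as sources change. It is a minimal illustration in Python, assuming a pandas DataFrame and hypothetical column names (order_id, amount, region); it is not drawn from dotData's platform.

```python
# A minimal sketch: expressing cleaning rules as data rather than hard-coding
# them inline, so the pipeline is easier to maintain and scale.
# Column names and sample values are hypothetical.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": [100.0, -5.0, -5.0, None],
    "region": ["US ", "emea", "emea", "APAC"],
})

# Declarative rule table: each rule is a name plus a callable.
cleaning_rules = {
    "drop_duplicates": lambda df: df.drop_duplicates(subset="order_id"),
    "drop_missing_amount": lambda df: df.dropna(subset=["amount"]),
    "non_negative_amount": lambda df: df[df["amount"] >= 0],
    "normalize_region": lambda df: df.assign(region=df["region"].str.strip().str.upper()),
}

def clean(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """Apply each rule in order, logging how many rows it removed."""
    for name, rule in rules.items():
        before = len(df)
        df = rule(df)
        print(f"{name}: {before - len(df)} rows removed")
    return df

cleaned = clean(orders, cleaning_rules)
print(cleaned)
```

Because the rules live in a table rather than in scattered if-else blocks, adding or retiring a rule does not require rewriting the pipeline itself.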

Challenge No. 2: Identifying contributors to dirty data

If the data at the source is not accurate, everything in the rest of the data modeling process that's built on that data will topple like dominoes.

"All of the decisions or insights generated will be exponentially inaccurate, and this is what most businesses are facing today," said Kuldip Pabla, senior vice president of engineering at K4Connect, a technology platform for seniors and individuals living with disabilities.

Data inaccuracy could creep in during the creation and acquisition of data, during the cleaning process or even when annotating data. For instance, in the healthcare or elder care markets, inaccurate health data could be the deciding factor between life and death. A wrong decision based on inaccurate data could lead to severe consequences, Pabla said.
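
One common mitigation is to validate records at the point of acquisition so that dirty data never reaches downstream models. The sketch below is a hypothetical illustration; the field names and plausibility ranges are assumptions, not K4Connect's actual checks.

```python
# A minimal sketch of validating records at acquisition time, before they
# enter downstream models. Field names and ranges are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class VitalsReading:
    resident_id: str
    heart_rate: int      # beats per minute
    spo2: float          # blood-oxygen saturation, 0-100

def validate(reading: VitalsReading) -> list:
    """Return a list of problems; an empty list means the reading is accepted."""
    problems = []
    if not reading.resident_id:
        problems.append("missing resident_id")
    if not 20 <= reading.heart_rate <= 250:
        problems.append(f"implausible heart_rate: {reading.heart_rate}")
    if not 0 <= reading.spo2 <= 100:
        problems.append(f"spo2 out of range: {reading.spo2}")
    return problems

readings = [
    VitalsReading("r-001", 72, 97.5),
    VitalsReading("", 400, 96.0),   # dirty record: no ID, impossible heart rate
]

for r in readings:
    issues = validate(r)
    print("accepted" if not issues else f"rejected: {issues}")
```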

Challenge No. 3: Enabling fitness for purpose

Fitness for purpose means the data is correct and trustworthy for its intended uses according to the rules that govern it. According to Justin Makeig, director of product management at MarkLogic, an operational database provider, "The real difficulty is that correct and trustworthy are very contextual. They vary [based] on the data and how it's being used."

For example, a health insurance provider might use the same data about patients to analyze the quality of care and to perform billing. Those are very different concerns with very different rules, even if they both use the same or overlapping data.
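
A simple way to picture this is to run the same record through different rule sets depending on its intended use. The following sketch is illustrative only; the field names and rules are assumptions and do not reflect MarkLogic's API.

```python
# A minimal sketch of "fitness for purpose": the same record is checked
# against different rule sets depending on how it will be used.
patient_visit = {
    "patient_id": "p-123",
    "diagnosis_code": "E11.9",
    "attending_physician": None,   # missing clinical detail
    "billing_code": "99213",
    "insurance_member_id": "M-789",
}

# Rules for care-quality analytics: clinical fields must be present.
quality_of_care_rules = ["patient_id", "diagnosis_code", "attending_physician"]

# Rules for billing: financial identifiers must be present.
billing_rules = ["patient_id", "billing_code", "insurance_member_id"]

def fit_for_purpose(record: dict, required_fields: list) -> bool:
    return all(record.get(field) is not None for field in required_fields)

print("fit for care analytics:", fit_for_purpose(patient_visit, quality_of_care_rules))  # False
print("fit for billing:", fit_for_purpose(patient_visit, billing_rules))                 # True
```

The same record passes one check and fails the other, which is exactly why "correct and trustworthy" cannot be judged without knowing the intended use.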

Challenge No. 4: Breaking down data silos

Another challenge to the data modeling process is siloed systems, which often run on legacy architectures and undermine the effectiveness of predictive analytics tools. In addition, newer data sources, such as third-party apps, premium data sources, and self-service BI and machine learning curated data sets, are growing in importance.

"Knowing what data you have available [and] efficiently empowering data-driven cultures of self-service users is no easy feat," said Jen Underwood, senior director at DataRobot, an automated machine learning platform.

Privacy regulations also create new obstacles to bringing these data silos together efficiently.

Challenge No. 5: Avoiding the Las Vegas effect

Analytics users can also inadvertently create new data silos in the cloud when they use SaaS-specific analytics platforms like Salesforce Einstein to curate and model data sets outside of centralized data engineering efforts. Organizations need to strike a balance between the benefits of ad hoc analytics on these platforms and making it easier to adopt centralized data management practices.

"It's the Las Vegas effect: What happens in Vegas, stays in Vegas," said Jean-Michel Franco, senior director of data governance products at Talend.

For example, a business user might use Tableau to fix errors in data that comes from Salesforce. Although this fixes the immediate problem, it also creates a new data silo.

Overall, this approach is also inefficient. Research from IDC shows that a typical data professional spends only 19% of their time analyzing data, while the other 81% is dedicated to finding data, preparing it to meet requirements, and protecting or sharing it.

Challenge No. 6: Starting fresh or cleaning old data

Enterprises need to confront a sunk-cost bias when deciding on a curation and data modeling process. Many enterprises have assembled massive stores of data without figuring out how they fit into the business process.

Managers may feel a sense of loss when data engineers suggest that throwing this data out and starting over could reduce the cost and effort required to clean and model the data for specific goals.

"There is always a tradeoff between taking in more data and cleaning the data you already have," said Josh Jones, manager of analytics at Aspirent, an analytics consulting service. There are also often important negotiations between managers about who owns the data and who is responsible for cleaning it.

"Often, one group does not want to take on all the work," Jones said.

Different business units may also have different requirements for how they want to clean the data.

Challenge No. 7: Understanding what the business cares about

"The biggest challenge to data curation and modeling remains the same: Does the business care?" said Goutham Belliappa, vice president of AI engineering at Capgemini. If a business cannot see the value of data, then it will have no motivation to curate it.

Excel remains the No. 1 BI tool in most organizations, which is a fundamental demonstration that quality is secondary to immediacy, he said. Curation needs to add value to the data modeling process.

"What is bad data for one person could be good data for another," Belliappa said. For instance, bad sales data from a customer accuracy perspective is clearly bad for sales. But this same data might be ideal for implementing AI to identify challenges in the current sales process.

"Too often, people try and curate toward a single perspective, leading to other areas of the business disengaging," Belliappa said.

This was last published in February 2019

source: https://searchdatamanagement.techtarget.com/feature/How-to-navigate-the-challenges-of-the-data-modeling-process
