A Journey from Data Warehousing to Big Data Insights – What I learnt from Henry Ford, Albert Einstein and Rusty Sears

The pictures are from the whiteboard in my office. You might have heard of Henry and Albert, but Rusty is the VP of Enterprise Data Services and Big Data at Regions Bank. He made that statement on a call several months ago and it’s been on my whiteboard ever since. The quotes remind me every day that all companies are facing huge issues from the explosion of Big Data, which is breaking traditional architectures. This disruption presents a great opportunity to get new insights that are transformative to the business, but to seize it the existing architecture and way of doing things has to change: more of the same has already proven not to be the right answer.

Syncsort, Tableau and Cloudera delivered a webinar on this topic. Paul Lilford, Matt Brandwein, Jorge Lopez and I worked together for several months building the content, and at every meeting it was interesting how all of us were seeing exactly the same disruption from different viewpoints. I began my career in data warehousing 15 years ago, and since then I’ve helped design, build and implement not just data warehouses but also the products and tools, like ETL, that are used to create them, around the world and across every vertical. But what has completely blown my mind over the last 18 months is the incredible disruption Hadoop has caused to traditional data architectures, and from what I can tell that’s just a small taste of what’s to come.

Early data warehouses were insanely successful: they exposed the fact that business users have an unquenchable thirst for more information, and once they are hooked they just want more and more. The only comparison that comes close is a drug addiction: business users always want more trusted data, more frequently, of a higher quality and with fewer restrictions, all while dealing with a limited budget.

Fast forward to today, and data warehouse and business intelligence infrastructure has become a victim of its own success. Like a teenager hitting adolescence, warehouses have experienced massive growing pains. At the same time, the data they provide has become such a critical part of corporate infrastructure that, for example, many quarterly reports are based on it.

If you’re delivering your financial results based on the data warehouse, then you have to trust and govern that information or your executives could go to jail. As a result, business intelligence projects have continued to receive funding despite growing costs: Gartner estimates the average cost of just the data integration component at between 500K and 1M USD.

So what are the essential elements of the new information architecture that liberate an organization’s access to data insights?

1)  You need an enterprise data hub (EDH) which serves a number of roles:

  1. It provides an active archive (the ultimate enterprise data staging area) for all data of every type, cost-effectively retained in full fidelity for any time period, which automatically provides a compliance archive.
  2. It is a single place to transform data from its raw format into structured formats for consumption by existing systems like data warehouses and marts. With tools like DMX-h you can bring in legacy sources, such as mainframe data, that were previously unavailable to the warehouse and reporting tools, and combine that data with other sources in full fidelity.
  3. Business users get agile access to specific data sets to explore and analyse directly, reducing the backlog of business intelligence requests and freeing capacity on existing systems.
  4. You can bring new analytic workloads to the EDH, including products like Syncsort DMX-h running directly IN it and Tableau running AGAINST it, generating greater insight and more value from data and driving revenue and profit.

2)     Migrate data staging. ELT (ETL running as SQL in the RDBMS) is an EDW resource hog, consuming up to 80% of storage and workload in some cases. By collecting, processing and distributing data using an EDH (ETL on Hadoop) you can reduce costs and free up warehouse resources. Today, warehouse databases have less capacity to run end-user queries and reports because of ELT and dormant data, so “data retention windows” became necessary instead of keeping the full data volume. In addition, this creates a maintenance nightmare: a spaghetti-like architecture that customers call the onion effect, because new layers of ELT SQL are added around the existing ones. Nobody wants to (or knows how to) change the existing code without breaking it, and if you do have to make changes, everyone involved ends up crying. TDWI estimates it takes upwards of eight weeks to add a single column to a table, and I’ve regularly seen six-month wait times for adding new columns or new data sources.

3)     You need a tool like Tableau to provide self-service connection to and visualization of the data, not just for business users doing reporting but also for data scientists who want to explore and analyze data and discover new insights that may even lead to new report requests from the warehouse.
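
The offload argument in point 2 can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative: the function name, the capacity units and the 75% offload figure are my own assumptions, and only the "up to 80%" ELT share comes from the text above.

```python
def capacity_after_offload(total_capacity, elt_share, offloaded_fraction):
    """Estimate warehouse capacity split after moving ELT work to an EDH.

    total_capacity     -- warehouse capacity in any unit (hypothetical)
    elt_share          -- fraction of capacity consumed by ELT (0..1)
    offloaded_fraction -- fraction of that ELT work moved to the hub (0..1)

    Returns (elt_load_remaining, capacity_left_for_queries).
    """
    for name, value in (("elt_share", elt_share),
                        ("offloaded_fraction", offloaded_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    elt_load = total_capacity * elt_share
    remaining_elt = elt_load * (1.0 - offloaded_fraction)
    return remaining_elt, total_capacity - remaining_elt


if __name__ == "__main__":
    # Worst case from the text: ELT uses 80% of the warehouse.
    # The 75% offload figure is purely an assumption for illustration.
    for offloaded in (0.0, 0.75):
        elt, queries = capacity_after_offload(100.0, 0.8, offloaded)
        print(f"offloaded={offloaded:.0%}: ELT={elt:.1f}, queries={queries:.1f}")
```

Under these assumed numbers, the capacity available for end-user queries and reports grows from a fifth of the warehouse to most of it, which is the effect the post describes as capacity being released for new problems.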

All these topics are covered in the webinar, but if you take one thing away from reading this, please remember: it’s not just Rusty’s warehouse that has become like a barge ship, and it’s not a bad thing either. Data warehouses are great at solving a specific problem, but to discover new insight we need to do something different. Ultimately, successful customers are deploying an enterprise data hub downstream of the warehouse to enhance the staging area, the dirty secret of every data warehouse. They are definitely not getting rid of the RDBMS; if anything, the capacity the hub releases there is being used to solve new problems.

This new paradigm delivers a seamless end-to-end solution from data to insight, and it’s evolving rapidly to become easier and less expensive. Don’t be held back by “old school” notions of how to handle the explosion of data. What has worked in the past is not the answer for today. Like me, remind yourself every day of the wise words of Henry, Albert and Rusty.

 

- See more at: http://vision.cloudera.com/a-journey-from-data-warehousing-to-big-data-insights-what-i-learnt-from-henry-ford-albert-einstein-and-rusty-sears/#sthash.II7xtlkp.dpuf