3-phased approach to delivering a lakehouse with data mesh

Quick introduction. For most of my career, I've actually been on your side of the stage. I ran data and analytics platforms globally for companies like Nike, Zendesk, American Eagle, Philips, and Concur. I began my career working at the NSA, in the intelligence community, so first and foremost, security is always gonna be at my core.

About a year ago, I switched to, I'll say, the other side of the table; I was a chief data officer at a company that was recently acquired by Databricks. I reached out directly to the founders here at Dremio because, frankly, I believed the technology they'd built offered something that I'd needed for a very long time.

The talk has a bunch of buzzwords in it. We're gonna navigate around that. What I really wanna highlight, and make real for you all, is stuff that we're actually seeing in the field. I'm also gonna highlight some of it from, frankly, my own pain, and I imagine I'll get a bunch of head shaking in the crowd too, because I know a lot of you feel this pain as well. Obviously, that's why we're here: to learn from each other.

I'd say let's just jump right in. At Dremio, I live in multiple areas between product strategy, our go-to-market messaging, and, for about half of my week, every week, time with customers actually helping design data architectures and connecting the data transformation with the actual digital transformation. Because, as you'll see in one of my slides, nobody funds architecture.

I learned very early in my career, the hard way, that in order to really drive transformation, you've got to understand the past, respect the present, and then go define the future. This is why I won't say things like "legacy," because obviously those are operational systems that run the business today. It's about respecting those while trying to drive the transformation and bringing along the people, not just being a skunkworks organization.

Let's do a quick review of what got us here today. The one thing I want you all to anchor on is that transformation, especially in technology, doesn't happen because of technology. Those engineering teams aren't often funded to drive technical innovation, and frankly, most companies don't cross the chasm because technology is cool. If you look at this view a slightly different way, what you really see is: better decisions needed to be made faster, teams needed more access to resources, and customers demanded more near-real-time and continuous experiences. But what did that mean for the core infrastructure team?

I don't think there was a material change or difference until we hit the lakehouse piece of this picture. What I mean by that is, if you go back to the OGs in the room who are still here today, your Teradatas and so forth, they still run, in my mind, like 60 to 70% of the infrastructure in the world. Why? Because it works. But what did it mean to scale that up?

I remember trying to convert closets into server rooms to expand capacity. There just had to be a better way, and a faster way. When I was at Philips, the teams used to ask me how I got my capacity, and I'd say: well, I call Rackspace once a quarter and tell them what I'm gonna do, and stuff shows up. It just doesn't work that way anymore.

So then we shifted up and to the right. I like to attribute this to the Martin Fowlers of the world and really going to a domain-driven design: can we start logically separating those things? And what was cool about Fowler at the time was that it made complete sense, right when Hadoop was spinning up, and at the same time, on the technical side, this is where open source really started hitting its boom.

I used to say, when I was on customer advisory boards, this room will innovate faster for you than you ever will, if you let us. And I think that's the cool thing: everybody now starts expecting openness in your architecture from those points.

But what happened? Let me ask: how many people in here know more than four really good Hadoop people? And I mean really good. Okay, one, two hands, maybe there's a third I'm missing. I know like four, and I've worked in a lot of places; at one time at my agency, I had a 90-person Hadoop team on just one implementation. So how well does the technology work? Not that well, right? But again, it still runs the ecosystems that are out there today, so you've got to give it the respect it still deserves.

So then we shifted into the cloud data warehouse. Massive respect to Snowflake; I was Snowflake's second-biggest customer at Nike, and I would have spent anything on Snowflake because it gave me two things I couldn't get elsewhere. One, separated storage and compute: I didn't need to go find the closet space. Two, it gave me the ability to be cloud agnostic. That started really becoming a thing, where I'm going: no, I want the compute capabilities of an AWS competitor, but I want the infrastructure maturity of an AWS, and then I wanna run my Databricks ecosystem somewhere else, in another cloud, and not have to worry about that piece. So again, it was about building the right experiences for my customers.

But we all still generally had the same problem: we're continuing to get locked in to different proprietary pieces, not just formats. Even with the Databricks piece, even with it open, it's like, well, good luck trying to take your code base out of that. I don't know if any of you have done that type of migration; same thing with Snowflake. But again, it was giving you capabilities you couldn't find anywhere else in the world. And I think this is where the material shift really started hitting, where you see a lot of the best parts of the ecosystem in the world coming together.

And this is where I say massive respect to Databricks for coining the term lakehouse, because it allows you now to start being agnostic to a lot of these underlying sources. You're now thinking about open as a priority in your architecture, underneath everything. One thing I usually tell customers is that it shouldn't take you 7 to 10 years to rip out a Dremio if you don't want to use it anymore; it should take you 1 to 2, because of the use cases tied to it, not because it's hard. And I think that's the point of these patterns: most folks are going, I want cloud agnostic, I want things built on open source, or at least an orientation to open within my architecture ecosystem.

When I was at Zendesk, I had over 250 SaaS applications powering Zendesk. How could you ever connect all of that? It needed to be easy and flexible, and I think this is where you're starting to see the change. Looking ahead now, I'm not going to differentiate mesh from fabric. I could get into a fussy debate over that if anybody wants to have one, but I don't think that's the point that matters.

The point that matters is that mesh really started introducing the idea that I can put more power in the business users' hands, and take a lot of the pain that shouldn't be in data engineers' hands and put it in the business, but in a way that's not the data-cowboy problem we had years ago. We talked about data democratization, where people got themselves in trouble with self-service and took down operational systems. With these patterns, we now have the capability across multiple platforms to do that safely, and I think that's where we're gonna start seeing things move forward.

And you've got some companies, like dbt, that are now letting you ask: can I have reuse of my code base? And with Dremio donating Nessie, can I start thinking about my data products, or even data assets, whether that's a table, a model, or otherwise, and actually treat them the same way I would the actual core product of my company? That means code checks, that means full-on testing, and that means letting me roll back, because somebody changed an upstream Salesforce column and it took out five of my top 10 dashboards. And yes, that's happened to me before.

Being able to quickly roll that back matters: when that happened to me, it took me a whole day to figure out why. Why can't we just do that in seconds or minutes? Because you can't always forecast those problems.
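A minimal sketch of that idea, treating a published data asset like code with a contract check and an instant rollback to the last known-good version. All the names here (EXPECTED_COLUMNS, publish_or_rollback, the column names) are hypothetical illustrations, not a real Dremio, dbt, or Nessie API:

```python
# Hypothetical schema "contract" for a data asset: the columns the
# downstream dashboards depend on (illustrative names, not a real API).
EXPECTED_COLUMNS = {"account_id", "opportunity_stage", "amount_usd"}

def check_contract(new_columns: set) -> bool:
    """Fail fast if an upstream change (e.g. a renamed Salesforce column)
    broke the schema the dashboards depend on."""
    missing = EXPECTED_COLUMNS - new_columns
    if missing:
        print(f"contract violation, missing columns: {sorted(missing)}")
        return False
    return True

def publish_or_rollback(new_columns: set, versions: list) -> set:
    """Publish the new version only if the contract holds; otherwise keep
    serving the last known-good version (a rollback in seconds, not a day)."""
    if check_contract(new_columns):
        versions.append(new_columns)
    return versions[-1]

versions = [EXPECTED_COLUMNS]  # version history; last entry is what we serve
# Upstream renamed opportunity_stage -> stage_name; the check catches it
broken = {"account_id", "stage_name", "amount_usd"}
served = publish_or_rollback(broken, versions)
print(served == EXPECTED_COLUMNS)  # we are still serving the known-good version
```

The point is not the ten lines of Python; it's that the asset gets the same gates your product code gets, so a bad upstream change never reaches the top-10 dashboards in the first place.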

So, all that said, that was a really cool diagram of views, but most folks still have the Oracles of the world, the Teradatas, the Hadoop; they have multi-cloud; they're looking at lakehouse as a pattern; they're looking at mesh. This is the world we still operate in, and it's still kind of nasty. So we need a maturity path to get there and to start separating things.

And my fun note down here, and we can even get into debates on data contracts and such, is oriented around that dark data. We still kind of want it, because we know there's probably a wicked-smart data scientist somewhere, whom we haven't hired yet, who's going to transform our company. I still hope for it, at least.

So I wanna point down to the bottom. The thing not to ignore in all of this, in the macroeconomic climate we're operating in, is the compelling economics. The one thing I continue to hear is: for every single feature we release, or every single thing you're talking about integrating with, what does that mean for TCO? I'm constantly being asked to do more with less, and I'm incessantly being asked by my business to do more. So we still need to drive better transformation experiences while, at the same time, we still have all that legacy architecture underneath us.

And I said I wouldn't say legacy. So what does that mean? It means our patterns need to shift and change; we can't keep doing things the same way. ETL shifts more into ELT. Yes, there's danger there, but how we navigate it matters, and we need to start thinking about how to make it simpler for folks to interact with the data, and not just put all of that straight on central engineering teams.

That means finding common languages like SQL that are easier for folks to understand and learn, and using the momentum of gen AI that allows SQL to be auto-generated from natural language. That's where everybody's starting.
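The ELT shift described above (land the raw data first, then transform with SQL inside the engine) can be sketched with Python's stdlib sqlite3 module. The table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw records as-is, with no upfront
# transformation pipeline between the source and the engine.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, region TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "emea"), (2, 900, "amer"), (3, 4300, "emea")],
)

# Transform: a plain-SQL view downstream teams query directly -- and the
# kind of statement a gen-AI assistant could produce from natural language
# like "show me revenue by region in dollars".
conn.execute("""
    CREATE VIEW revenue_by_region AS
    SELECT region, SUM(amount_cents) / 100.0 AS revenue_usd
    FROM raw_orders
    GROUP BY region
""")

print(dict(conn.execute("SELECT region, revenue_usd FROM revenue_by_region")))
```

The danger the talk flags is real: raw data lands ungoverned before any transformation runs, so the access and contract controls have to live in the engine rather than in the pipeline.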

So this is kind of the view. And you might say: well, that's cool, but what does this mean? How do we make it real?

So I do want to call out a couple of customers that are building out their data mesh with Dremio, but not in the normal way. The cool part is that I want to start with MSK. MSK was initially looking at doing data virtualization, and they were like, well, can you just help us do this? Because I have way too many CSVs and way too many emails going out into the world; I can't really govern that, and I have no idea what's happening to my data. My initial response, when I was chatting with the lead there, was: yeah, I don't know if there's a company I've ever worked at that knows where all its data is, but it shouldn't be that bad.

Just by connecting and doing the virtualization piece, and then doing both the mesh and the lakehouse types of architecture, combining those best-of-breed principles, they immediately found they could eliminate about 90% of their Kafka footprint. Because one, I don't need data copies everywhere; two, I don't need to be incessantly enriching all of this; and three, I didn't know what was happening after the copy moved anyway. There had to be a better way. And what that in turn also turned into is four data engineers' worth of efficiency, not just on the platform side, that they were able to free up for new opportunities.

So again, it's not an "I'm worried about layoffs" story. It's a "hey, I'm thinking about what more I can do" story: I'm freeing up people by thinking about the problem differently. And my favorite example on here is actually TransUnion. TransUnion, in a similar way, said: can we go after some really meaty, nasty problem in the world?

This came after our initial work with them on a Hadoop and Oracle modernization initiative. The really cool thing they decided was: what if we started combining external data resources with internal data resources? What could happen? Could we do something transformative? What they found is that they were able to take farmers in India with no previous access to capital and give over 80 million people new access to capital. Then they said: could this even scale? This sounds really cool. They brought it to the US and hit another 20 million. So they basically gave over 100 million people who never had access to formal capital in the farming industry new access to capital.

They also freed up 14 data engineers in the process. So, all this is to say: what does that process, or phasing, look like?

So here's the three-phase piece. The first phase I call unified data access, and at its core it's truly more of that virtualization type of story. But I want to be extremely clear: don't stop at virtualization. Virtualization doesn't fully help you on the performance side, and it doesn't help you define ownership of the assets being created. What it does is make it easier to get access to the data while, at the same time, for the central teams, it abstracts away the underlying sources.

So the teams don't go directly to Oracle, and they don't go directly into BigQuery. The teams interact at a higher level, and the semantic layer also moves up to a higher level. If you want to make any changes underneath, it's much easier and faster to do so, because you're not disrupting their day-to-day life.
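One way to picture "moving the semantic layer up": consumers query a stable semantic name while the platform team swaps the physical source underneath, and the consumer query never changes. A hedged sketch with stdlib sqlite3, with all table and view names invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical source today: an (illustrative) Oracle-backed customer table.
conn.execute("CREATE TABLE oracle_customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO oracle_customers VALUES (1, 'Acme')")

# Semantic layer: teams only ever query dim_customers, never the physical table.
conn.execute("CREATE VIEW dim_customers AS SELECT id, name FROM oracle_customers")

consumer_query = "SELECT name FROM dim_customers WHERE id = 1"
print(conn.execute(consumer_query).fetchone())  # ('Acme',)

# Platform team migrates to a new warehouse table and repoints the view.
# No downstream team has to change a line of SQL.
conn.execute("CREATE TABLE warehouse_customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO warehouse_customers VALUES (1, 'Acme')")
conn.execute("DROP VIEW dim_customers")
conn.execute("CREATE VIEW dim_customers AS SELECT id, name FROM warehouse_customers")

# The exact same consumer query keeps working against the new source.
print(conn.execute(consumer_query).fetchone())  # ('Acme',)
```

That decoupling is what makes the underlying Oracle or BigQuery swap "much easier and faster": the change is confined to one view definition instead of every consumer's SQL.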

So, all that said, you do phase one to move into phase two. If you already have a data lake, this is much faster and easier to do. This is the part where you start moving in your "modern data stack," and I use my air quotes there, because what I really mean is this is where you start finding your TCO. This is where you start asking: what patterns or platforms can I start bringing in or replacing?

I had a leader once hit me on the head and say: if you ever try to bring another piece of software in here, you need to deprecate at least two others. This is where that starts to happen. And what do you normally see? This is where you're finding your TCO. I think most of the platforms I ever brought in, in my past, I self-funded: I renegotiated my BI license, I switched from an Adobe implementation to GA4. Those are the pieces where you can go: cool, I can now self-fund and show the innovation that allows me to fund it longer term.

And I think this is where you should also expect the engineering team to drive better performance and a better experience for the engineers. I've got a couple of examples here. With Renaissance Re, for example, with S3 and Dremio, just plugging it in, not using reflections or anything else, their performance alone dramatically improved. And when they started using reflections, obviously going from 33 to 3 is big; it's just not as dramatic as going from 4200 to 33.

These are the pieces where you can create quick wins and show fast value. And if you think about that in the context of actual cost, here's an example. A lot of people ask for benchmarks, but every vendor that ever sends you a benchmark report will show they're faster, so I prefer never to show those. What I'd rather do is show stuff customers have sent us unsolicited.

So this is an example from a Fortune 50 customer, looking at the infrastructure alone: not taking account of resourcing, not taking account of data movement and copies and so forth, just infrastructure. And this is the most mature Hadoop implementation I've ever seen. They saw 120K in savings. And the last one here, we'll just say our other big lakehouse competitor, for which, frankly, I still can't get a benchmark report that shows a material difference between us, which I probably shouldn't say out loud. This was really cool when we got sent it: when we start talking about TCO, that's a material difference, not just performance. We can talk about that over at the booth if you're interested.

So what's phase three? We've made the sources we've connected agnostic, and we've started bringing in new modern platforms and patterns. This is the part where you go: cool, do you want to talk about enterprise mesh? Do you wanna talk about enterprise fabric? And what does that mean on the technology side? Most companies live in phases one and two, to be honest with you; the more advanced companies start shifting into three. But we all know you don't start with the whole enterprise: you start on use cases, or you start in organizations, or you start in hot spaces.

But the part you want to continually look toward is maturing into that phase three. Phase three is where you're fully running an open architecture and ecosystem. You're operating with much more flexibility in your implementations, to be able to say: if I want to rip a Dremio out, cool, I can. If I wanna switch from one vendor to another, I can. The formats aren't locking me in, and how I'm running it is not locking me in.

That was all cool, but remember the ugly spaghetti slide: we still need to be able to get there. How are we thinking about the Oracle migrations? How are we thinking about the Hadoop migrations and modernizations? Being able to hit those as a first step matters.

Here are a couple of examples. I'll grab one like The Hartford. This is a good example where, again, a lot of this was just being able to run faster, smarter, and also cheaper. I need all of those things, and all three are possible now.

I'm at time, but I'm happy to dive into any of these examples. I'll throw this up here real quick: we do have a free lakehouse option, so if you want to get in, just try it. My biggest recommendation to you: don't listen to me, try it yourself.

The second one: if you want to chat, we're happy to. The proof's always in the pudding. Otherwise, thank you very much, and again, we're happy to dive into any of these examples over at the booth, right over here to the right, or your left.
