You are a data governance person, and we're going to talk to you a little bit today about what that means. We're trying to treat data as your differentiator for all of your funded business initiatives throughout your organization. And that means you need skills: you have to know how to understand, how to curate and how to protect that data.
The good news for you is that we have experts here: Gabe from Sempra, Mirco from Natera and Ruben from BMW. And we're going to give you the cumulative knowledge we've gained over 70-plus years in this field to talk to you about how to do those things better, including how to manage your data for generative AI applications.
My name is IA Scheim. Let's get started. We're skipping the agenda.
So this is going to be a very informal conversation, all of us chiming in, but I wanted to talk a little bit about the overall vision at AWS. We call this AWS for Data; you may have heard of it. Really, again, it's that idea of data being your differentiator.
So, quick question for you: how many of you have released, let's say, a major project in the last three months? Give me your hands. OK, now keep your hands up. No, no, no, keep them up. How many of those projects shipped exactly as scoped? Exactly. None. And if you have your hand up, come up here, you're speaking next time.
But none of them shipped exactly as scoped, and because of that you need adaptability. You need a comprehensive set of solutions that you can adapt and plug in and plug out. That's the first of these AWS for Data pillars. The second is integration, because when you have all those changing pieces, you need them to be integrated at both the UX layer and the data layer.
Now, the good news for you is that there are, I don't know, 60 sessions about those things at re:Invent, so check out your session agenda. That is not what we're talking about today. We're talking about the third pillar, which is data governance.
So when we talk about data governance, we're talking about three things: finding and accessing data, keeping the data secure, and enabling controls. Those of you who've worked in data governance a long time: do you have top talent pounding on your door saying, "I want to be on data governance, it's a cool place to be"? See, laughter is what we're getting here. Laughter. That is not how it works. And why is that? Because data governance has the perception of being bureaucratic and slow.
I talked to a customer who said that when they need a new field added to a report, it takes them nine months, and see, nobody's laughing at that, because that sounds plausible. In large part this is because data governance is viewed as the stopper, the bureaucratic piece that says let's write it down, let's discuss it, let's write a mission statement, and this can take a long time. But we can't treat data governance that way.
Especially with how quickly the industry is moving, and with new capabilities like generative AI, data governance instead needs to be the enabler for innovation. There are ways to put in the controls that you need and govern that access, and to really be the engine behind your funded business initiatives like generative AI.
So those are the things we're going to talk to you about today. How do you do that? Well, at AWS we group this into a few categories, the first one being understanding your data. What is the context of that data? Do I know where it came from? Do I know who has transformed it? Do I understand the potential best uses for this information? That's the understand category, and you're going to hear specifics about this later from Ruben when he talks about the BMW example.
The second category is curate. How do you make sure that your data is fit for use and fresh enough for the purposes you need it for? Not just for users but also for engines, for models, for applications; you need to make sure you're curating the data for all of those use cases. And you're going to hear later from Mirco about the curate case at Natera.
The bottom section is protect. This is where we worry about compliance and governing the access and life cycle of the data. We're going to touch on that lightly today, but we're also going to focus on the middle of this circle, which often gets overlooked. The middle is about how I get my people and processes together to make sure all of this happens. How do I organize for success? How do I report back and show that we're having an impact on our funded business initiatives? And you'll see analytics and ML governance on the circle because they need to be best friends: your ML governance team, your analytics governance team and your data governance team need to work hand in hand.
And you'll hear from Gabe from Sempra how they have organized for success and tackled some of these things. It's not just the technology. I think people come to this session and maybe think we're just going to give them a bunch of AWS services. We're not going to do that here, because a lot of the conversation revolves around the people and the process, and how you change that organizational mindset. Of course, we will talk about some services and solution architectures, you're going to get all of that goodness too, but there will be a fair focus on how to get the people and the process together.
And we do that by following this high-level scheme. First of all, we talk a lot about data as a lifestyle: data-driven decision making, having a data-driven company. You need that sort of data lifestyle in your organization, and guess what, this requires a lot of work. If you think about lifestyle changes you make for yourself, none of them happen in three months. January and February don't count; if you just go to the gym in January and February, sorry, that doesn't count. You need a sustained, long-term lifestyle change.
And how do you do that? You need quick wins. If you decide you want to run a 5K, how are you going to do that? You're not just going to go right to the 5K; you're going to develop a strategy for how to get there. My husband actually did this. It's called Couch to 5K, and it's a whole program you can follow, but it's a measurable goal: you know when you've reached success, and then you can push that success further. It's the same with setting up your data lifestyle. You need quick wins, you need measurable impact on the business, and you need a way to report that out. And as you fold in more services and more capabilities, you get better and faster as you go. That's the data flywheel we're talking about here.
So now that I've teed that up, I'm going to hand it to Gabe to talk a little bit about how they manage that governance process at Sempra. So what I heard was Couch to Data Governance. That's what we're going to talk about today.
Hi, everybody. My name is Gabe Mica. I am the director of digital innovation at San Diego Gas & Electric, and I also support Southern California Gas Company. Those two companies represent Sempra California, and the service territory we're talking about runs from Central California all the way down to the Mexican border, serving 25 million consumers. So a tremendous amount of energy infrastructure, a tremendous amount of data, and a huge opportunity to solve some really big problems with data, and some really key problems for our customers as well.
So the story I'm going to tell very briefly today is a story about opportunity. Three years ago, we didn't have a data platform to innovate on, a data platform where we could move really quickly. One of my primary roles with the company is to test out emerging technologies, 10 to 12 every single year, and what we found was that our on-prem data systems just didn't meet the needs of that.
So we had to retrench and rethink how we were going to continue to innovate at the speed we wanted to. We did that over the past couple of years by doing three things. One: building out a greenfield architecture with what we call the data mesh. I think we're all going to refer to things as the data mesh as we build our stories; it'll be pretty interesting.
The second is rethinking governance, and, you know, governance without an operating model: you really need to pair those together if you're going to do it right. And the third is keeping innovation front and center. We're going to speak to some machine learning, some AI, and how we apply that, and that's what we keep front and center, because there is a direct tie between our data, the quality and trust and confidence we have in it, and our ability to leverage things like generative AI where we can trust what's coming out the other end. Very tightly connected.
We'll start with the data mesh and the data. We all know data can be a company's greatest asset, if we trust it. But we also know that we have trust issues with data, right? Either it's not accessible, or there are gaps in the data, or we have data quality issues.
So what we've tried to do over the past couple of years is build this up in a greenfield way, where we can start to solve some of those trust issues with our architecture, embedding some of that governance into the technology we're leveraging.
So what we have here is two sides of the data mesh. We have the data producers and the data consumers and the data producers are really responsible for bringing the data in from those 600 systems across 11 data domains. So responsible for data management, data engineering, data storage, data quality, those types of things and producing good data sets and good data products into the data mesh that can be consumed at the speed that we really need to consume that data.
On the right-hand side we have data consumers. Those are our traditional product teams, the people building out solutions: analytics, advanced analytics and artificial intelligence types of use cases. And our mantra is pretty simple: integrate once, consume indefinitely. We want to integrate that data into the data mesh in a way that's trusted, so those product teams can consume it and solve problems as quickly as possible. And we'll talk about how that's actually accelerating our innovation through that framework.
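To make "integrate once, consume indefinitely" concrete, here is a minimal sketch of a producer/consumer split around a shared mesh. All the names (`DataProduct`, `Mesh`, the example fields) are hypothetical illustrations, not Sempra's actual implementation: a producer publishes a dataset with its contract once, and any number of consumer teams read the same trusted product.

```python
from dataclasses import dataclass

# Hypothetical sketch: DataProduct and Mesh are illustrative names,
# not the speaker's real platform components.
@dataclass
class DataProduct:
    name: str
    domain: str      # one of the data domains (e.g. grid)
    owner: str       # producer team accountable for quality
    schema: dict     # column name -> type, the published contract

class Mesh:
    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct):
        # A producer integrates a source system once by publishing a product.
        self._products[product.name] = product

    def consume(self, name: str) -> DataProduct:
        # Any number of consumer teams read the same trusted product.
        return self._products[name]

mesh = Mesh()
mesh.publish(DataProduct("meter_readings", "grid", "grid-data-team",
                         {"meter_id": "string", "kwh": "double"}))
print(mesh.consume("meter_readings").owner)  # grid-data-team
```

The point of the sketch is the asymmetry: integration work happens once on the publish side, while consumption is a cheap lookup against an already-trusted contract.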
But we also knew we had to change. A few years ago we had a small but centralized data organization, and we just weren't meeting the needs of the entire enterprise, we weren't reaching all of the enterprise, and we weren't innovating at the speed we wanted to.
So we started to move some of these data teams out, more closely aligned with the business, and focused our central data team on this central hub. We've moved from centralized to more of a federated hub-and-spoke model. The central hub is responsible for the data mesh, the platform, and the data governance and data standards on that platform, really enabling all of the teams out there doing data management to do it as efficiently as possible.
And what we have around the outside is the spokes.
Those are the teams more closely aligned with the business. They're making decisions on behalf of the business, prioritizing the right data work at the right time to drive the most value, and that's where we want them to be, and where we're seeing a lot of really good traction. I was asked to talk a little bit about how we measure success: we measure success by how fast we can actually solve business problems, and how fast we can do that with data that we trust.
With our work, we test out 10 or 12 emerging technologies every single year. Last year it was taking us about six to eight weeks to complete an innovation proof of concept or MVP. We can do that in about 21 days now because of our investments here. And at the end of those 21 days we're starting to build an opinion: does this technology actually work based on the data we have? How much is this going to cost from a cost-benefit perspective? Is this something we actually want to invest in? How does it fit within our architecture and our culture? And we're doing that because we're investing in the data mesh and everything I talked about in the past couple of slides, but we're also investing in composable architecture. We're leveraging infrastructure as code, and we're thinking of both data and how we build solutions as composable components that we can pull together to solve problems faster and faster.
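The "composable components" idea above can be sketched in a few lines: each step is a small reusable function, and a proof of concept is just a composition of existing pieces. The step names below are hypothetical, not SDG&E's actual components; the sketch only illustrates why composition shortens a PoC from weeks to days.

```python
from functools import reduce

# Illustrative composable components (hypothetical names, not real ones).
def extract(source):
    return list(source)                              # pull raw records

def clean(records):
    return [r for r in records if r is not None]     # drop bad rows

def enrich(records):
    return [{"value": r, "unit": "kwh"} for r in records]

def compose(*steps):
    # Chain components left to right into one pipeline.
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# A new proof of concept reuses existing components instead of
# building its own ingestion from scratch.
poc_pipeline = compose(extract, clean, enrich)
print(poc_pipeline([3.2, None, 4.1]))
# [{'value': 3.2, 'unit': 'kwh'}, {'value': 4.1, 'unit': 'kwh'}]
```

In practice the same shape applies at the infrastructure level: infrastructure-as-code modules play the role of `extract`/`clean`/`enrich`, so each 21-day experiment assembles rather than rebuilds.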
And we don't think we're going to slow down here. We continue to invest, and we continue to see the results in being able to solve problems faster and faster. So, really proud of what we're doing there. To put it all in one slide: we start with our central data team in the middle. This is the hub, responsible for the standards, driving best practices and standardization across the enterprise, and responsible for the data catalog and governing it.
And on the left hand side, those are those data producer teams. Those are the teams that are more closely aligned with the business, bringing that data in from those 600 systems, producing good quality data into the data mesh for consumption. On the right hand side, those product teams, those solution teams building out analytics, advanced analytics and artificial intelligence. And with that, I think we get to move into our first question.
Yeah. So the first question: you talked a bit about the people and culture, and we teased that in the very beginning. What were the biggest challenges you saw in building that new data lifestyle, that new data culture, at Sempra? Data as a lifestyle?
Yeah, I think there's one main thing. When we started to move from centralized to more of a federated model, more decentralized but really federated with hub and spoke, we saw the success as being able to balance empowerment of those data producer teams with accountability, right? We've started to align our data teams more closely with the business. They have more skin in the game, they're making decisions, they're closer to the action, but we also need to balance that with accountability. And from an accountability perspective, we're really talking about data quality. A big effort we have right now is focused on making data quality visible and making it real for those data producer teams. Bringing that around data quality was one of the big challenges.
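"Making data quality visible" can start with a couple of simple per-dataset metrics that producer teams see and own. The metrics and thresholds below are illustrative assumptions, not Sempra's actual checks; the sketch just shows how completeness and freshness can be reported in a few lines.

```python
from datetime import date

# Hedged sketch: two simple quality metrics per dataset so producer
# teams can be held accountable. Metric choices are assumptions.
def quality_report(rows, required_fields, max_age_days, today):
    complete = sum(all(r.get(f) is not None for f in required_fields)
                   for r in rows)
    freshest = max(r["updated"] for r in rows)
    return {
        "completeness_pct": round(100 * complete / len(rows), 1),
        "age_days": (today - freshest).days,
        "fresh": (today - freshest).days <= max_age_days,
    }

rows = [
    {"meter_id": "a1", "kwh": 3.2, "updated": date(2023, 11, 20)},
    {"meter_id": "a2", "kwh": None, "updated": date(2023, 11, 25)},
]
print(quality_report(rows, ["meter_id", "kwh"], 7, date(2023, 11, 27)))
# {'completeness_pct': 50.0, 'age_days': 2, 'fresh': True}
```

Publishing a report like this alongside each data product is one way to make the empowerment/accountability balance concrete: the producer team that owns the data also owns the numbers.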
Yeah, it was and still is, and I don't know if you're ever done with that, quite frankly. But I'll turn it over to Mirco for a response on that as well.
Sure, yeah. Amazing story, I really like the way you presented that. On our end, it's about how we decentralized the data story to the different application teams. The application teams basically had to take over the additional responsibility of building, creating, managing and working with data, which they didn't have to do when there was a centralized team taking over everything. For them it was really one step removed, and now they had to take over that responsibility, and changing that culture is not an easy task. And how did you convince them to take that responsibility? It really helps that they can access the data and use it from the first release of their functionality, immediately, and measure how they are actually doing from a functional standpoint as well. With that, they're very motivated to get it in.
Nice, that little data flywheel.
Yeah. Well, that's a great transition to your section, Mirco, which is really that curate portion of the wheel. Take it away.
Sure. My name is Mirco Balzer. I'm a VP of engineering at Natera. Natera is a leader in cell-free DNA sequencing. Last quarter we just passed 10 million samples in our labs in California and Texas, and we're working in three major areas where we're helping patients improve their health: women's health, organ health and oncology. So you can imagine data has been a key element in our business from day one. And obviously, as a small startup, you can start with a handful of systems; you have the ability to go directly to the source and access and leverage the data from day one.
But that's not going to scale, and I think a lot of people here are in larger organizations where you have to have the right governance in place. So we moved into a centralized gatekeeper environment, where we had dedicated teams, working similarly to what Gabe just mentioned, extracting the data, preparing the data and then exposing it to the business. This worked well in the beginning, but it quickly became a bottleneck: they were not able to keep up with innovating and moving the data, and when the business asked for new information and insights, they couldn't keep up and provide that.
So with that, we moved to a decentralized approach, where different data domains are responsible for their own data, managing it themselves and pushing it into the data mesh we're using, while we still have the ability to centrally govern the whole solution and manage the governance aspect of it.
To be able to do that, you can't just come in and say, let's distribute everything and we're good to go. A big aspect, and it has been mentioned before, is really changing the culture. For us, changing the culture was about making sure we can improve and speed up the flywheel we have for bringing new innovation into our organization.
With that, we basically moved the flywheel directly to the application teams, having them go through learn, grow, fail and improve in very quick iterations with every release of an application, and making sure data is a first-class citizen.
With that, we also changed the way we architect our solution. There are two main ways we look at architecture approaches today: the data fabric approach, where we're centralized, and the data mesh approach, where the responsibility sits in the different domains. For the centralized part, we are using Amazon DataZone to manage the catalog and access to the data, which helps us quite a bit to get the governance aspect going.
Another big piece, specifically in ingestion, is changing how the application teams manage the ETL. The extraction and transformation are very dependent on the application you have. In a traditional setup, you replicate the data, run it through an ETL pipeline and then move it into your data store. This is usually managed by the central team, but we didn't want the application teams to have that dependency on a central team.
So what we did was move the extraction and transformation into the application teams as well, and for us the push data ingestion approach had a lot of advantages. We still use the traditional pull approach for off-the-shelf solutions and some legacy systems, still a fairly traditional ETL: we can use AWS DMS to extract the data, use data pipelines to transform it into the shape we want, and then move it into the data store.
This is scheduled and centrally managed, so we try to avoid it as much as possible, but there are still some systems that work that way. The way we want to implement it is to move the responsibility for the data, the schema and the integrity of the data itself to the application teams: have them push events to an event bus, in our case Apache Kafka, and then work through Kinesis and Lambda to allow them to do some last-minute transformations if needed before pushing it into the data store.
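The push model above boils down to this shape: the application team owns the schema and emits events, and a Lambda-style handler applies a last-minute transformation before the record lands in the data store. The field names and transformation below are hypothetical, invented purely for illustration, not Natera's real payloads.

```python
import json

# Minimal sketch of the push model. The event shape and field names
# (sample_id, result, _debug) are illustrative assumptions.
def handler(event, datastore):
    record = json.loads(event["body"])
    # Last-minute transformation: normalize units, drop internal fields.
    record["result_ng_ml"] = record.pop("result") / 1000.0
    record.pop("_debug", None)
    datastore.append(record)
    return {"status": "stored"}

store = []
handler({"body": json.dumps({"sample_id": "s-1", "result": 420,
                             "_debug": "x"})}, store)
print(store)  # [{'sample_id': 's-1', 'result_ng_ml': 0.42}]
```

The key design point is who owns what: the producing application owns schema and integrity, while the handler only does light, last-minute shaping, so there is no dependency on a central ETL team in the release path.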
So that's the approach we're taking: using AWS Glue crawlers to create the catalog and then exposing it through Athena and Redshift. Those are the key aspects we changed, from a cultural standpoint as well.
We are not just providing data internally for our teams and our business operations side; we also have data as a product available, which we expose to our partners, be it pharma, where we do biomarker discovery, patient clinical trial matching or real-world clinical response. So a key aspect of this core data we're ingesting is also to create that domain where we have data as a product, which we can then expose to our partners and other organizations we work with.
As a summary, what we really did to improve the innovation flywheel was move the responsibility from the central team to the application teams, making sure they treat data as a first-class citizen. They're the owners of it and the first users of it, so you basically get high-quality, rich data out of their systems directly. We then introduced the governance aspect centrally with DataZone, and have the ability to really manage access and the catalog of the data available in our organization. This helped us improve that flywheel and create valuable insights that we can use either internally or with our partners.
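The central governance layer described here can be reduced to a simple pattern: producers stay owners of their assets, but every read goes through a subscription that an owner approves, roughly how a DataZone-style catalog mediates access. The class and method names below are invented for the sketch, not DataZone's API.

```python
# Hedged sketch of owner-approved access through a central catalog.
# Catalog, register, request_access are hypothetical names.
class Catalog:
    def __init__(self):
        self.assets = {}        # asset name -> owning domain
        self.grants = set()     # (consumer, asset) pairs approved

    def register(self, name, domain):
        self.assets[name] = domain

    def request_access(self, consumer, name, approved_by_owner):
        # Access is only granted when the owning domain approves.
        if approved_by_owner:
            self.grants.add((consumer, name))

    def can_read(self, consumer, name):
        return (consumer, name) in self.grants

catalog = Catalog()
catalog.register("oncology_panel_results", "oncology")
catalog.request_access("biomarker-team", "oncology_panel_results",
                       approved_by_owner=True)
print(catalog.can_read("biomarker-team", "oncology_panel_results"))  # True
print(catalog.can_read("ad-hoc-user", "oncology_panel_results"))     # False
```

The sketch captures the split the speakers keep returning to: domains decentralize ownership, while the catalog centralizes discovery and access decisions.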
Perfect. So, Mirco, we talked about part of that curate goal being fit-for-use data, making sure it is good for the application it's being exposed to. Can you talk a little bit about how you measured that? How did you define what good enough is?
Yeah, it's usually pretty hard to figure out what is good enough. One approach we're taking is moving that responsibility to the application teams. The application teams know the data the best; they're working with the data day in, day out, so we basically make sure they are producing the best data at the source. There's the famous phrase: garbage in, garbage out.
Same thing happens there: if you don't have good data at the source, you're going to be trying to improve it over time, but it's always challenging. And Ruben, I'm sure you have some thoughts on this fit for use and how to define it.
Thanks for this good question. So, we are BMW; we are a very big organization, approximately 150,000 employees. So for us it's also important, Gabe, you mentioned data quality: can you trust the data, can you work with it? For us, fit-for-use data means that we really see it as a product, a premium product. BMW is a premium car manufacturer, so we also want premium data products. And we built a one-stop data shop which we call the data portal. It's somewhat similar to DataZone; I will show it later. There you can shop for data, as we say, and it's clear for the users what the data set is about. It's all maintained and curated by the business, so we really involve the business here.
You can see what the data is about, you can see the data quality checks that we already run, and you can even preview the data. And this is where we think it's really fit for use: does it fit my use case? This is how we want to make it available for our business.
It sounds like you're not talking about a central policy that defines fit for use. You're talking about transparency to those end business users, who then have enough information to make that decision. It's mainly about the transparency: is it really fit for use, can I use it right away? And I will show later in my section how we do that, and how we enforce this data governance while still giving enough flexibility to our product teams.
Perfect. So why don't you take it away and take us through that understand part of the wheel?
Yeah, thank you very much. Um so I think I hope everybody here heard of BMW.
It's one of the biggest car manufacturers in the world, and I will tell the story of how we did it. It seems like a pattern, from Gabe to Mirco: they also thought about what is central, what is decentralized, do we follow data mesh, do we follow data fabric? And it's nearly the same story for us. Different company sizes, but the same story.
So how did we start at BMW? Many, many years ago, we also started with on-premises data, I think much like you did. We had one central instance, one central big Hadoop cluster where we built our data lake, and we had several teams working on that: teams who ingested the data from all the data sources, and data engineering teams who built the data products for our use cases. And you can imagine what happens on one very big instance.
First of all, we had shared resources, with a central team working on them, but we had very many business requirements, many use cases that had to be built, and we couldn't fulfill the requirements; our capacity wasn't there. That's point number one. Point number two: it's hard to scale. If you have all these use cases on one platform, how do you scale it? How do you react to the demands in compute? And how do you anticipate whether upgrading the software will break a pipeline for a different team? This is where we said: innovation is somehow happening outside of our platform, and we don't want that. We really want a good offering for our users, for our customers. So we said, OK, we need a different solution.
That was the centralized solution, and then we said, OK, we have to do something different. This is when we started our journey with AWS some years ago. At that point in time we didn't have DataZone as a service, just to keep that in mind. What we wanted was innovative freedom for our data providers and data consumers, and the data mesh concept, but we also needed some strategic standardization. It's a big company, so: fit-for-use data.
How do we trust the data that we have? How do we really make it into data products? And this is how we did it with a data fabric concept: we provide a platform, a big platform, which we call the Cloud Data Hub, and in front of it you have the data portal as the main interaction point for our users. We have 10 petabytes of data and 6,000 users, so you can see the scale. You can imagine all the telemetry data coming from the cars; we have production data, we have customer data, everything.
So we also need to follow the governance approach. This is why we have the data portal, where the data products have to be built in a certain way: they have to be published to a catalog so that you can search the data. When you've found some data that's interesting for you as a potential consumer, you can also see the data lineage information: where does the data come from, who else is using it? We have a so-called SQL app where you can even preview the data, so you can look at it: is it really the fit-for-use data that I need? And I would say, try that with an SAP solution; it will not work.
And then, of course, we also have governance processes on top of that. You can access the data in our data portal right away and do all the governance processes there. And yes, data governance sounds like not the most appealing job, but for us it's really important. We don't just offer hardware or something; data governance is at the center of our concept, so that we can enable the data products for the whole company. It must be transparent: what is the data set about, and can I use it? Is it right for my use case? We also offer some optional services, like the data ingest: here we offer blueprints for the ingestion part, which teams can use but don't have to. All the other services, they have to use, and then we combine this with the data fabric.
In the middle you have the data fabric concept, and we combine it with the data mesh. On the left side you see the providers, on the right side the consumers. The providers know best how to ingest the data and how to build the data products, and the consumers know best how to use the data in their applications: do they need an API, do they need it for reporting? So for the providers and the consumers we offer the maximum flexibility we can, but they still have to follow the central data fabric concept and the data governance, so that the data products are really understandable for the whole company.
And now I want to show you our data portal and how we did that. It's all built on AWS, and the data portal I'm going to show you is our one-stop data shop: you can find the data, explore the data and then get access to it.
What you see here is the landing page of our data portal. Every BMW employee can log in and search for data, for example, vehicle sensor data. You can see which data sets are already there, and which use cases and providers already use or provide the data, and then you get a list of proposals. But the thing is, we now have over 6,000 data sets, so how do you find the right one?
In this screen you see a slightly different view of the data sets; you could also see it for the use cases and providers. You can see the data set name and description, and we cluster our data sets into business objects: is it purchasing data, is it customer data? We also have a three-layer concept: source, prepared and semantic. The semantic layer is really the fit-for-use data for our business, where the business KPIs are defined, and you can use it right away.
You can see whether the data set is released, deprecated or still planned: all the information you need to answer the question, is it fit for use? And here's a more detailed view, one example of a data set. You have the data set name and description, and at the top right you can see the data quality checks; we also have automated data quality checks. You can see the metadata store with all the metadata: is it well maintained or not?
You can also see how many consumers already consume this data set, in this case 13, and then you have different tabs like summary, lineage and resources. In the summary tab you can see who the data engineers are who built the data set, and who the data stewards are. These are the business functions responsible for curating the data and granting access, so they are the contact persons for our users to get additional information.
And if you want to understand more about the data set, we also offer so-called data lineage. It's also automated; it comes in via the ingest, and it's all fed by API. In the middle you can see the data set, and on the left you can see where the data comes from, the data source. The blue boxes are all the jobs that were built to create this data product, really down to table level: which column was dropped, where was a filter applied, where did we apply joins. In the middle it's mainly the Glue Catalog that you see, with the databases, the tables, and all the information there. And on the right-hand side you can see who consumes the data product and which transformations they applied for it.
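To make the idea concrete, here is a minimal sketch of the kind of lineage graph behind such a view, assuming each job reports (upstream, job, downstream) edges via an ingest API. The data set and job names are hypothetical, not BMW's actual tables:

```python
from collections import defaultdict

# Lineage edges as each job might report them: (upstream, job, downstream)
edges = [
    ("raw.sensor_events", "job_filter_invalid", "prepared.sensor_events"),
    ("prepared.sensor_events", "job_join_vehicle_master", "semantic.vehicle_kpis"),
    ("raw.vehicle_master", "job_join_vehicle_master", "semantic.vehicle_kpis"),
]

def upstream_of(target: str) -> set[str]:
    """Walk the lineage graph backwards to find every source of a data set."""
    parents = defaultdict(set)
    for up, _job, down in edges:
        parents[down].add(up)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(upstream_of("semantic.vehicle_kpis"))
```

The same edge list, rendered left to right, gives exactly the portal view described above: sources on the left, jobs as the blue boxes, the data product in the middle, and consumers (the reverse traversal) on the right.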
Last but not least, what are we talking about here? It's a very big data platform, again all built on AWS. We really enabled the company to break up the data silos: 30% of the data products are reused across different divisions, so production uses sales data and the other way around. With this data platform we have already created 1.9 billion in business value, and we have 6,000 monthly users and 20,000 annual users on the platform. So this is how we built our journey. It's quite similar to Gabe's and Mirco's, with the same question of centralize it or decentralize it, and I hope you got a good idea of how we did that at BMW.
Part of what we talk about in the understand phase is understanding the business context, and you showed a lot of metadata. How did you capture all of that metadata to establish context?

I think you get a good understanding of how we capture the business context at BMW. Most of it is curated by our business functions, and it's still mainly manual; we'll have an outlook later on how we want to automate that. The lineage information and the data quality checks come in automatically from our jobs. It's all API-driven.
And Gabe, how do you capture any of that context for the data that you gather?

I was just mesmerized by what Ruben was saying; I was going to jump to the next question. But I'm actually going to tackle that. A lot of what we're talking about today is building solutions that are fit for purpose, getting back to the fit-for-use comment. And I'll talk a little bit about what we have here, which is build versus buy. Every time I hear that, I'm thinking: how many times have we purchased software off the shelf that doesn't actually meet our needs? We end up customizing and customizing, it still doesn't meet our needs, we have to buy something else, and it becomes overly complex really quickly.
So when we talk about building things that are fit for purpose and fit for use on AWS, we're primarily leaning on serverless architectures. That's really where we want to be. I don't know how both of you feel, but we're building things that solve our industry-specific problems, because we have the ability to build things that are fit for use and no more. We're not dealing with a lot of that extra stuff you buy off the shelf. But thoughts on that, Ruben?
Yeah, I think it's what you mentioned: do you need flexibility or do you need standardization? When we started our journey at BMW, a lot of the features you saw today in our data portal were quite similar to DataZone, but when we started, DataZone was not available. So we started this journey together with AWS, and the question was, do we want to buy it or do we want to make it? We scanned the market and we didn't see a proper solution for this whole cataloging and end-to-end journey for our users, so we decided to build this data portal. The question is always whether we still want to continue this journey, and we can reflect on whether we want to use DataZone. But the point is, what happens if, for example, we have to go multi-cloud, so we also have to use Azure or something? Is that integrated with DataZone? So I think we'll still need some of the flexibility of building our own. But of course we also reflect on whether it's worth it, or whether a service is coming up that covers the requirements we already have.

And as part of that, you guys did some co-development on DataZone, right, and fed a bunch of requirements in as well?

Yes, we're collaborating with AWS here, and some of the ideas are shared between DataZone and our data portal.

Perfect. OK.
Whereas with this data, we're feeding into our analytics, advanced analytics, and artificial intelligence models. How do we know whether we trust that data? Do we trust the result? Do we trust the decision that's coming out the other side? So I think it's extremely important to see data governance as a way to increase trust, so that ultimately we can move faster. If I trust the data, I can spend less time checking it and more time solving the actual business problem.
How are you guys capturing that trust, or understanding whether people trust the data?

I think if you ask the data scientists, they love spending time in the data, and they spend 80% of their time wrangling the data and 20% solving the problem. With data governance, we're actually trying to swap that. What we're really trying to get to now is, how can we flip that? How can we spend less time wrangling the data, because we trust it, and spend most of our time solving problems and moving the business forward?
OK. So the next question: you've talked a bit about new business models already and about treating data as a product. Can you talk a little more about how you go about that process and what you need to make sure the data is ready for that move?
Yeah. In general, the initial business model of Natera is really about the clinical tests we're doing, be it in the women's health, oncology, or organ health space: building new products where we can help one individual patient who has a question about their health. That's the initial business model. Obviously, when we're getting millions of samples in house, we now have the ability to create whole cohorts of different patients, using the data across patients to actually find patterns, find biomarkers, find other insights that we can then expose and provide to pharma companies or others doing drug development, or to help individual cohorts of patients be part of a trial, for example.
So in our case, it's really about leveraging the data we're creating on one end for our individual patients and the tests we're running, and then being able to use that for the greater good by sharing the health insights that we're generating. That's the aspect we're using to create those specific data-as-a-product models, where we're working with different partners.
And of course, when the data is in a slightly different use case, you have the protection part of the wheel, right? Making sure that the access is right, and the compliance.
Yeah, that's very crucial: making sure we anonymize the data. We're basically working in a cohort environment; this is not patient-specific, it's really across multiple patients that we're analyzing the data. So a very crucial aspect there is to work within HIPAA and privacy requirements, to be able to extract the data in a way that we can work with and then expose to the different partners.
And then the last one here is for you, Ruben. You teased earlier that you had some things to say about generative AI, and we talked a little bit about generating all of the metadata on your side. Do you want to talk a little bit about the BMW approach?
Yes. Do you want to go to the next slide? We'll show you now. We've also heard of gen AI. This is the data portal you just saw, and what we want to introduce is Maya; we call it the Maya AI assistant. This will help our business users really explore the data we have in the cloud and in the data portal. We have so many data sets there, so how do you find the right data? This is how you can interact: you can ask a question like, please give me all the electric car sales in 2022 for a specific month, and it will return the SQL query, and also some graphs and tables that you can use right away. It's all based and trained on the data we have, and it's also trained on the documentation and training material that we have. So this is the moonshot we have for enabling our users to better interact with our data products.
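A minimal sketch of the shape of such a text-to-SQL assistant is below, with a stubbed-out model standing in for the actual LLM call. The prompt format, schema, and function names are illustrative assumptions, not BMW's implementation:

```python
def build_prompt(question: str, table_schemas: dict[str, list[str]]) -> str:
    """Ground the model in the catalog's table schemas so it can emit valid SQL."""
    schema_text = "\n".join(
        f"TABLE {name} ({', '.join(cols)})" for name, cols in table_schemas.items()
    )
    return (
        "You translate business questions into SQL.\n"
        f"Available tables:\n{schema_text}\n"
        f"Question: {question}\nSQL:"
    )

def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns one canned query.
    return ("SELECT month, SUM(units) FROM sales "
            "WHERE drivetrain = 'electric' AND year = 2022 GROUP BY month")

def ask(question: str, table_schemas: dict[str, list[str]]) -> str:
    return stub_model(build_prompt(question, table_schemas))

schemas = {"sales": ["month", "year", "drivetrain", "units"]}
sql = ask("All electric car sales in 2022 by month", schemas)
print(sql)
```

The key design point the panel alludes to is the grounding step: the assistant is given the catalog's own metadata and documentation, so the generated SQL refers to real, curated tables rather than hallucinated ones.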
And one part of this will be all the metadata that you saw in our data portal. It's still a challenge to maintain all that data, and Maya should also help maintain it, make proposals, or even update the metadata descriptions automatically, so that the data is fit for use, of good quality, and it's clear what the data product is about. This is our mock-up of how we want to enable it. It's still in development, but we just launched the first release of Maya, and this is our outlook on how we want to make the governance functions more automated.

Why is it called Maya?

It's "My AI assistant." It just sounded good.

Nice.
OK. Well, I promised you that you'd hear some best-kept secrets from this group: things you wish you had known when you started your data governance journey.
So I'm going to start. We talk so much about data governance being a supporter of the funded business initiatives at your company. One of the tips I got really early on, when we were working with many customers in workshops to define their funded business initiatives, is that this sometimes gets translated into "what is your business value." A lot of you in this room have probably done this as well, where your business value is things like speed, agility, or moving to the cloud. That's not a funded business initiative. A funded business initiative is separate: it's a specific initiative that shows up in your company filings or in your company goals for the year, that says this is how we want to move the company forward, with an improvement to customer experience, launching in a new country, or investing in smart agriculture to change the way you behave in that market. That is your funded business initiative. It is not speed, it is not agility, it's not adaptability. It's that subtle switch that I think is really hard for people to understand in the beginning: it's not about the IT goals, it's about the funded business initiative goals.
What about you, Gabe? What do you wish someone had told you in the beginning?
Yeah. Three years ago, we were firefighting. We were centralized, trying to solve all the problems for everyone, which we knew we just couldn't do, and it was really hard to take a step back. What I wish I had done then was slow down. Slow down, take a step back, and understand: what's the actual data platform that's going to meet the needs of the organization? What's the operating model? How are we going to govern this? And have a very clear data strategy moving forward; we'd be much further along at this point. But that's the way it goes. It's always hard when you're in the throes of everything to step back and really ask: is this what we should be investing in, is this the most valuable thing we should be doing? Having a clear data strategy is only going to help that.
Yeah, for Natera, one of the core values in the company is "show me the data." That obviously goes back to the clinical tests: do we really have enough proof that this test is working, and that we can offer it to the public? But it also means "show me the data" for any argument you're making in the organization. If you're just coming up and saying, oh, I want to do this and this and this, but you don't really have the data behind it, it's not that much of a value. So for us, "show me the data" is a key aspect, and with that, the best-kept secret is basically the data culture in the company itself: making sure that each of our team members thinks about how we can actually measure things, how we can expose that, and how we can use the data to help everyone else.
Yeah, perfect. It's a good point; what Mirco mentioned was similar for us. So what would I have wanted to know before we started? The same thing: really involve the business in the topic. Data is not an IT project. We can still see it when we onboard new customers to our platform. Show me the data: we tell them, you have to publish your data to our data catalog, you have to make it available in the SQL Lab for others to explore, and for some business departments that's still a challenge. But this is how we enable this whole data transformation with our platform. At the center of it is data governance: really making the data transparent and turning it into a data product. This is something we learned over the years: involve the business, and bring transparency and data governance.
It sounds boring, but it's really important for this whole data transformation. Otherwise it will just be some blob storage sitting somewhere on a drive, and that doesn't work. To really create value, you need this data governance framework, or the data strategy you mentioned.
Yeah, perfect. So we've given you a lot of advice, but let's talk a little bit about how you can get started and what you can do next. There's a new masterclass from AWS on data governance. It talks a lot about that funded business initiative part, those cycles of the wheel, and the capabilities you need to build along the way.
It's an on-demand video series, and the videos are short, which I know everyone loves: none of them is longer than 15 minutes. There's also a workbook that goes with it, with hands-on exercises you can do at your company to establish and elevate your data governance practice. So that's a great place to start.
Secondly, we also offer a data maturity assessment. This helps you assess where you're at across all of the understand, curate, and protect capabilities, and it includes recommendations. It's offered by the services arm, so talk to your AWS Professional Services group and they can get you started on the data maturity assessment.
They also have workshops that deep dive into data governance specifically at your company. There's a new AWS workshop, and there are some workshops at re:Invent as well, with a hands-on, pragmatic approach to data governance. They're probably full, but there's usually some waiting room reserved, so check out the data governance workshop on your re:Invent calendar.
And with that, here are the other re:Invent sessions that also apply to data governance. Make sure you've favorited them so you can drop by some of them. Some of them, like I said, are those hands-on workshops, but there are also chalk talks where you can ask the presenters specific questions; those are much more interactive, more intimate sessions. So that's another place to go.
And see, we've ended early for one reason: so you can fill out your survey. You have two minutes to do it, so make sure you're filling it out. We'd love to hear your feedback on the content we've delivered, as well as on this new panel format for us.
Any other closing remarks from you guys?

How do you fill out the survey? What do we have to do?

It's in the mobile app. So open your mobile app and fill it out.
We're available for questions afterwards as well, and there are a number of people here with the orange badges. These are people from AWS: if you have questions about DataZone or some of the other solutions we've talked about, find them in the hallway after the session and they can answer some questions for you too.
Thank you guys.
Thank you.
Thank you.
Mhm.