Understand your data with business context

Hey everyone, welcome, and I hope you're having a fun first day at the conference. It was nice to chat with many of you to understand why you're here and what you're planning to learn. One thing that has come out of all those conversations is that we have figured out where and how to store the data across all the different storage options that are available. The big challenge we're now trying to solve is: how do we make this data available to everyone across the organization? If this resonates with you, can I have a show of hands?

Let's see. Oh, that's good. OK, you've come to the right place.

In today's session, we are going to talk about Amazon DataZone and the capabilities it offers. Amazon DataZone is a new data management service that we announced at re:Invent last year, and it has been generally available since October 4th, 2023. What Amazon DataZone does is help you build an active metadata layer across your data, so every user across the organization can get to the data portal. It's like a data marketplace of sorts, where they can find, understand, and subscribe to the data that they want to use for their analysis.

I'm Priya Taani, senior product manager with Amazon DataZone, and with me are Leo and Gwen, who will share some of the details we want to talk to you about on Amazon DataZone.

All right. So what are we going to talk about? First, let's take a look at why we have data catalogs and how they have evolved over time. Then I'll dive a little deeper into Amazon DataZone, specifically the enterprise business data catalog capabilities. Then we'll see how it actually works in the product. And finally, we'll have a customer story, where we'll hear from Gwen about how Naira is building a solution to make its data available to all of its users. Then we'll have a few minutes left for Q&A.

All right. Let's see where data catalogs started. In the 1990s and 2000s, we all had data catalogs. They've been around for a while, and they predominantly started because companies wanted to keep an inventory of data; they wanted to know what data existed. Usually this was tasked to the IT teams, just to know what data was available, and it was more of a passive collection of data to know what's there.

Then came the big wave of big data around 2010. This bombarded those data catalogs with data that varied in speed, size, format, and accuracy. It challenged those data catalogs because they couldn't expand or adapt quickly. They took longer implementation cycles and eventually saw lower adoption among their users. Added to that, because of the variety of data coming in, we saw the rise of data stewardship, because we wanted governance around the data and also the ability to build some business context so everyone understands the data. But that was not enough.

In recent years, we are seeing that the data catalog has to be active; it has to be on at all times. Metadata is becoming data itself, which you want to actively manage, and you want to allow its various users to collaborate with each other across the platform, understand the data, and build applications based on the data that's available.

So what do we need in current business data catalogs? First, the catalog is no longer restricted to structured data. It's no longer restricted to rows and columns; it needs to hold data of various types. It can be a dashboard, it can be a SQL query, it can be just a link or a job, it can be anything. With that data in the catalog, people can come and find it and use it for anything they want to take forward in their analysis.

Next, metadata has to be on at all times. The metadata has to be searchable and analyzable, and it has to be maintained like data itself.

The third need is for it to provide end-to-end visibility and embedded collaboration for its users, so you know where the data originated and where it's being used.

And finally, it has to be API-driven, so your data platform team can bring in the solution it needs and build integrations between the services it has. It should also give all data users a seamless experience, where they are able to find the data, understand it, and quickly switch to the tools they want to use to continue the analysis.

And that's why we built Amazon DataZone. This new service helps you bridge the gap between data producers and consumers. The data producers are the experts in the data and own the data. They have the autonomy to decide what data they want to share with the rest of the organization and how to enhance and enrich it for everyone to understand.

Then come the data consumers, who can search, understand, and subscribe to the data and get approval from the data producers to use it for their analysis. That builds the data flywheel, where data consumers can also become data producers. This is what Amazon DataZone tries to do: it helps build that handshake between data producers and consumers.

So let's look under the hood at what capabilities are built into Amazon DataZone to provide the handshake you need between data producers and consumers. First is the domain. So what is a domain? A domain is nothing but how your teams are structured: who owns the data, and who understands the data better? It forms the container that shows the ownership of data and where the data originates.

In that domain, one of the foundational blocks is the business data catalog. The business data catalog helps you catalog all of your data, so anybody wanting to know whether your organization has certain data can search for it in one single place.

When data people want to collaborate, they form containers called projects and environments. A project is pretty much tied to a business use case; it's where you collaborate with data, people, and tools, and where all of your permissions and access are managed for the data that's been requested.

And none of this comes without governance and access control. You want to make sure the right data is accessed by the right person for the right purpose. DataZone provides this governance and access control, where data producers get to know who's using what data for what purposes and also have control over how consumers can access the data and continue the analysis.

All these capabilities are available from a data portal, which is like a central marketplace for all of your data, where all the data people can log in. They can be a producer, where they produce data and share it with the rest of the organization, or they can be a consumer, where they search and find data and quickly switch from DataZone to the tool of their choice to continue the analysis. And it goes without saying: if you don't want to use the portal, we also have APIs that help you build a programmatic layer to integrate with your existing processes and with the other services you have in your platform.

Today we are going to dive deeper into the capabilities of the business data catalog, because we are focusing on how we can make this data visible and accessible to all users across the organization.

So what are the core themes of the data catalog in DataZone? First, you should be able to catalog all of your data. In DataZone you can bring in and catalog all the data that you have, and then the producer can decide whether they want to publish it or not. When I say publish, I mean that once the data is published, anybody in the organization will be able to search for it. When it's in the inventory, it is just a record for you to know that this data exists; it might not be the time to publish it yet, so you can publish it at a later time.
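The difference between an inventory asset and a published asset can be illustrated with a small sketch. This is only a toy model under stated assumptions: the `Asset` and `Catalog` classes and the `traffic_ticket` asset name are hypothetical and are not DataZone types or APIs. It shows nothing more than the rule described above, that search returns published assets only.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for a cataloged asset; not a DataZone type."""
    name: str
    description: str = ""
    published: bool = False  # False means the asset sits in the inventory only


@dataclass
class Catalog:
    assets: list = field(default_factory=list)

    def add_to_inventory(self, asset: Asset) -> None:
        # Cataloging records that the data exists; it does not expose it.
        self.assets.append(asset)

    def publish(self, name: str) -> None:
        # The producer decides when an inventory asset becomes discoverable.
        for asset in self.assets:
            if asset.name == name:
                asset.published = True
                return
        raise KeyError(name)

    def search(self, text: str) -> list:
        # Consumers only ever see published assets.
        needle = text.lower()
        return [
            a.name
            for a in self.assets
            if a.published and needle in f"{a.name} {a.description}".lower()
        ]


catalog = Catalog()
catalog.add_to_inventory(Asset("traffic_ticket", "Parking and traffic tickets"))
before = catalog.search("ticket")  # asset is inventory-only at this point
catalog.publish("traffic_ticket")
after = catalog.search("ticket")   # asset has been published
```

Keeping `published` as a flag on the same record, instead of copying the asset into a second list, mirrors the idea that publishing is a later decision about an asset that is already cataloged.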

Once you have the data cataloged, you can enhance it, enrich it, and augment it with so much more information that it becomes valuable for all the consumers looking to understand the data. Think of going to amazon.com: when you look at a particular product page, you understand everything about the product before you decide whether you want to buy it or not. That is similar to the experience we want to provide to all the data consumers who want to work with data, understand it, and use it for their analysis.

Added to that, we are focusing on introducing a lot of automation that alleviates redundant manual work, which is very error-prone, and also speeds up the process. If your assets number in the tens or hundreds, you can still do things manually. But what if that becomes millions? It becomes unmanageable, and your data assets might be incomplete. So we are introducing a lot of automation capabilities that help you build the context for the data, so everyone across the organization understands it. Then comes the need to manage and maintain the catalog and also make it visible across the organization.

For this, you have all the foundational blocks that help you make an asset understandable and also help you build some standardization for the assets, so everyone knows what to look for and knows whether to trust the data or not.

I'll dive a little deeper into what those foundational blocks are. DataZone provides you all these building blocks so you can build a catalog in a way that the data consumer can understand the data and work with it.

Next is embedded collaboration. So yes, we have the data, and we are publishing it and making it visible and accessible. But what are consumers going to do with it? How are they going to take it forward? Eventually, everyone wants to work with a tool of their choice and take the data forward for analysis. So how can we build a seamless workflow between searching, subscribing, and moving to the tool of their choice so they can work with the data?

So Amazon DataZone focuses on building that integrated experience, so consumers don't have to struggle with understanding a lot of technical capabilities but can focus on the work they have to do and switch to the tool of their choice.

All right, we spoke about the building blocks. Now let's take a look at them. An enterprise-wide business data catalog needs some foundational elements.

First are the organizational domains. You want to organize your data in a way that everyone understands where the data is coming from, which team owns it, and who's going to curate it so that everyone understands and trusts the data.

Next is the business glossary, which helps you standardize a lot of the technical mumbo jumbo into business terms, so everyone understands it in simple language. Metadata forms are something we've introduced in Amazon DataZone that gives you the power to bring some consistency to the assets you want to publish.

Again, think of going to amazon.com: when you look at a product, you always look for the manufacturer details. You want to see who manufactured it, and you want to see the product details. Metadata forms are the equivalent that can be added to your assets, so everyone knows what to look for when they're searching for an asset, understands where it's coming from, and can decide whether this is the data they want to work with.

That's the power metadata forms give data producers while they're publishing an asset. And finally, metadata curation: you bring a lot of technical metadata into DataZone, and the more you curate it and enhance it with business information, the more information the consumer gets to make a good judgment about whether they want to use the data and whether it is even the right data for the analysis they want to drive and take forward.

So let's see. First are assets. You bring all the assets into DataZone and catalog them. With DataZone, you can catalog not only structured data but also the variety of other data that's available, be it a dashboard, a SQL query, or links, and you can build the business workflow around them.

Next is the business glossary, where you can standardize terms across teams, because an account for a marketing team means one thing and an account for a sales team can mean something else. The glossary helps you bridge those definitions of what an account means, so when you know that data is coming from a particular team, you can understand it better.

And finally, metadata forms are what help you add additional information. As an admin, you can determine what you need as part of every asset that is being published and make the form available for all publishers to fill in as part of an asset before they publish it. Think of a metadata form like ownership: you can record who the technical owner of the data is and who needs to be contacted for more information, and have that as part of the asset. Any consumer finding the data can look up where it's coming from and whom to contact, and reach out to that person if they want to learn more about the data. And everything you're building across these assets, glossaries, and metadata forms is searchable and helps you build filters in your search.

So you get the right subset of results to dive deeper into. This slide is just to quickly show what kinds of assets you can catalog. You are no longer limited to rows and columns of structured data; you can now also catalog a lot of other types of data. Be it an ML model, a dashboard, or a SQL query, you can bring it into the catalog and enrich it with business metadata, so people can find it and take a dashboard that's already built and enhance it further for a particular business use case, or take an ML model for further analysis.

This is how a particular asset will look when a consumer searches in DataZone. First, you have the business name and description of the asset, which help any consumer know what the asset is and get more details from the description. Then you have the glossary terms, classifying or grouping assets in a way that lets you quickly search and understand the data, and whether it's usable or something you want to take forward for analysis. Then there is the technical metadata: you want to know where the data came from. This is derived from the source, so the consumer can understand all the technical information about the data.

And finally there are the metadata forms, with which you can standardize across all of the assets being published in the catalog, so everyone looking for data can find it and understand it. In this particular example, we see the ownership metadata form, which gives you details about the owner of the data. If you have any questions, you know whom to contact and don't have to wonder which team to reach out to. And of course, we have introduced versioning in DataZone, which helps you understand the versions and know which version you are using for your analysis.

Everything we saw looked very manual, right? There's so much work that happens behind it. So what can we do to simplify it? The first thing we introduced at GA is automation that helps you bring the technical metadata into the catalog. We leverage the strengths of AWS Glue to have crawlers fetch the technical metadata and bring it to us, and then we share it with the producers, who can build the business context around it.

At GA, we also introduced automated name generation for assets and columns, so you don't have to manually enter the business names for the assets or the columns. DataZone generates the names while you're ingesting, for you to review and approve if you think they're right, or edit, and that makes it simpler for you to enhance the descriptions of the assets and the columns.

One cool thing that's coming: tomorrow, look out for Adam's announcement, where we'll be sharing more information about additional automation that's going to be introduced in DataZone. This is one space we are investing in a lot, so that a lot of redundant work can be eliminated and a lot of automation can be infused, especially with all the generative AI capabilities that AWS offers.

So we saw that with all these building blocks we are able to bring data in. We leverage the strengths of the Glue Data Catalog, which is where we bring the metadata from, and we leverage the richness of Lake Formation to provide the access permissions and have them applied to the data. So any team working together in a project can actually get access to the data, and as you add or remove people from the project, they either get access to the data or lose access to it.
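The idea that access follows project membership can be sketched in a few lines. This is a simplified illustration, not how DataZone or Lake Formation implement grants: the `Project` class, the user name, and the asset name are all hypothetical. The only point it makes is that the grant belongs to the project, so adding or removing a member changes what that person can reach.

```python
class Project:
    """Hypothetical model of a project whose members inherit its data grants."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members = set()
        self.subscribed_assets = set()

    def add_member(self, user: str) -> None:
        self.members.add(user)

    def remove_member(self, user: str) -> None:
        self.members.discard(user)

    def subscribe(self, asset: str) -> None:
        # Access is granted to the project, not to individual users.
        self.subscribed_assets.add(asset)

    def can_access(self, user: str, asset: str) -> bool:
        return user in self.members and asset in self.subscribed_assets


project = Project("ticket-analysis")
project.subscribe("traffic_ticket")
project.add_member("analyst")
while_member = project.can_access("analyst", "traffic_ticket")
project.remove_member("analyst")
after_removal = project.can_access("analyst", "traffic_ticket")
```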

The consumers can then share the data with their team members, do their analysis, and also become producers at the end of the day. So now we've seen what DataZone offers as a whole from a capability perspective. Let me invite Leo to show how it actually works in the product, so you can see it work end to end.

Perfect. Thank you, Priya.

Hi everyone. Priya just walked us through everything that DataZone offers related to the data catalog. Now let's see how to do it. Let's see a demo. Let's see the service in action. OK?

Before starting with the demo itself, let me walk you through the use case. We have two personas. In this case, we have two teams, the traffic team and the trials team. Why? Because we are going to analyze traffic ticket data. OK? A very standard use case.

Julia, the data engineer, is the data producer. OK? She wants to create her own business glossary and business glossary terms. She wants to document the data asset that she is about to publish. And she also wants to publish the data. So once she harvests the metadata and enriches it with business information, she wants to share it as part of the enterprise data catalog.

Then we have Susan. Susan is a BI specialist, and she wants to build a dashboard. The only thing she wants, and it's very simple, is to search for and identify the right data asset for her use case. In this case it could be a SQL query, it could be a dashboard, and so on. And in the middle, of course, is DataZone, working to help data producers document and produce data and to help consumers find the right data asset and consume it. OK?

So let's go to the computer and see the demo. OK? Let's start from the Amazon DataZone home page. This is the data portal. Remember that DataZone offers an out-of-the-console experience, so the only thing you need as a consumer is a URL that is generated by DataZone. Your users, using SSO, are going to be able to access the data portal. OK? No IAM users, no IAM roles, just the data portal link and the SSO user, and you are going to be able to access this UI. OK.

The first thing we have to do is select the project that we are going to use to produce the data asset and then go to the catalog. OK? Here you can see three different options: browse data catalog, glossaries, and metadata forms.

So let's start with glossaries. OK? Here you have all the business glossaries. For this example, I created about 20, OK? You can have even more, and on the left side of the screen you can see the list of the business glossaries that we already created.

If I want to see the details of one specific business glossary, I just need to click on it, and you can see the description of the business glossary. OK? You can also see the terms that are part of that business glossary, with their descriptions, OK? If you want to go deeper on any of the terms, you can click on any of them and you are going to see a README section, OK?

As you can see here, you can put whatever information you want. So if you want to enrich the description of your business glossary term with some documentation that is going to help consumers understand the meaning of the term, you can do it. OK?

You also have the option to create relationships between terms. OK? For example, for this demo we have verdict, and we associate that term with trials. Why? Because a verdict is part of a trial. But also, if you have two terms that are very similar, you can associate them with each other. OK, perfect.

So what happens next? Right now we have just 20 or 25 business glossaries, but in the future you may have 100 or maybe 1,000. So if you want to search for a specific term, you can also use this search field that we have here to search for a specific term and see its information.

If you look here, you can see the description, and if you want to enrich that business glossary term even more, as I mentioned before, you have the README section, and you can also create relationships between different terms. OK? This is the way to do it. Perfect.

So we saw the list, and we saw how to search for a specific term. Now let me show you how to create a business glossary. We click create and we put in the name of the glossary, of course. In this case it is going to be classification. Then we need to select the project that is going to be the owner of this business glossary. Then, of course, the description, and you have the option to create it enabled or disabled. In this case, I'm going to leave it enabled, and I'm going to create the glossary.

As you see here, the glossary is empty, so we need to start adding new terms. For this demo, I'm going to add four terms, so four classifications in this case. The first one is going to be confidential. OK? I'm going to put in a description and create the term, and we are going to repeat this operation four times. Remember something: this is something that I'm doing manually right now, but you can automate it using the APIs, OK? Everything that is supported by the UI you can reproduce using the APIs, OK?

So now we add sensitive and restricted, again with descriptions. Perfect. So now we have our new business glossary, as you can see here, with all the terms, and all of them enabled. You have the option to enable or disable terms depending on your use case. OK?
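The manual steps above, creating the glossary and then adding one term at a time, could be automated along these lines. Treat this as a rough sketch and not as the demo's actual code: it assumes a client that behaves like `boto3.client("datazone")` with `create_glossary` and `create_glossary_term` operations, and the parameter names shown should be checked against the current API reference before use. The term descriptions are placeholders, and the fourth term, Public, is inferred from the value selected later in the demo. A small recording stand-in is included so the sketch runs without AWS credentials.

```python
def create_glossary_with_terms(client, domain_id, project_id, name, description, terms):
    """Create a glossary, then one term per entry in `terms`.

    `client` is assumed to behave like boto3.client("datazone"); the operation
    and parameter names are assumptions to verify against the API reference.
    """
    glossary = client.create_glossary(
        domainIdentifier=domain_id,
        owningProjectIdentifier=project_id,
        name=name,
        description=description,
        status="ENABLED",  # the demo leaves the glossary enabled
    )
    term_ids = []
    for term_name, short_description in terms.items():
        term = client.create_glossary_term(
            domainIdentifier=domain_id,
            glossaryIdentifier=glossary["id"],
            name=term_name,
            shortDescription=short_description,
            status="ENABLED",
        )
        term_ids.append(term["id"])
    return glossary["id"], term_ids


class RecordingClient:
    """Offline stand-in (hypothetical) that records calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def create_glossary(self, **kwargs):
        self.calls.append(("create_glossary", kwargs))
        return {"id": "glossary-1"}

    def create_glossary_term(self, **kwargs):
        self.calls.append(("create_glossary_term", kwargs))
        return {"id": f"term-{len(self.calls) - 1}"}


# Placeholder descriptions; only the term names come from the demo.
CLASSIFICATION_TERMS = {
    "Confidential": "Placeholder description for confidential data.",
    "Sensitive": "Placeholder description for sensitive data.",
    "Restricted": "Placeholder description for restricted data.",
    "Public": "Placeholder description for public data.",
}

client = RecordingClient()
glossary_id, term_ids = create_glossary_with_terms(
    client,
    domain_id="example-domain-id",    # hypothetical identifier
    project_id="example-project-id",  # hypothetical identifier
    name="Classification",
    description="Data classification levels.",
    terms=CLASSIFICATION_TERMS,
)
```

Passing the client in as an argument keeps the function testable offline; with real credentials, the same function would be called with an actual DataZone client in place of `RecordingClient`.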

Now let's talk about metadata forms. Metadata forms allow you to enrich your technical metadata with business metadata. What we are going to do here, and I'm going to show you in a minute, is create a form that is going to be attached to your data asset so that you can add business-related information. For this demo, I already created two forms that we are going to use during the rest of the demo. But we are also going to create a new one, a simple one, just to show you how to do it.

As I mentioned before, you can customize your metadata forms in whatever way you want, so you can add a good number of different fields. OK? As you can see here, you have the data source, an example of usage, and so on. OK.

Something you can do here is create fields that are a string, an integer, or a date, but also, and this is the one we are showing right now, a field whose type is glossary. What this means is that you can link this field to a glossary that you already created as part of your business glossary. OK. How does this work? The value that is selected for this field has to come from a business glossary term. OK. You are going to see it in a minute as part of the demo, but just to let you know, metadata forms are one of the ways you have to associate business glossary terms with a data asset. OK?

Perfect. So let's create our metadata form. For this demo, we're going to add just two fields to make it fast. Here we are going to add data classification. We put in the name, but it's also asking me for a technical name. Why is this? Because we may want to change or manipulate these metadata forms using the APIs, and the technical name is the one that you are going to use when you are manipulating the metadata forms through the APIs, OK?

Perfect. Now that we've created the form, we need to add fields. In this case, we are going to add two, as I mentioned before. We put a proper name here. This one is going to be the string type, OK? Because we are just going to put a name there. Of course, a description; remember that we need to document everything. Here you have the option to set the maximum and minimum length, OK, in case you want to validate. You also have the option to make this field searchable and required. Searchable means that when your consumers try to find this specific data asset, they can search by this field. OK. And required means that when you're documenting the data asset, you need to fill out this information in order to publish it. OK? In this case, we're going to leave it as just required for this field.

Then we are going to create a second one, and this one is going to be glossary-based, OK? It's the example that I shared with you before. Again, the business name, then the technical name and a short description. Here on field type, you can see that I'm selecting glossary, which is different from the other one, where I selected string. I have to select the glossary, in this case the business glossary that we just created. We have the option to allow selecting multiple values if we want to; for this example, I'm going to leave it empty. And for this example, I'm going to select searchable and also required, OK? And I'm going to create the field.

Something you have to remember is that after you finish adding your fields, you need to enable the metadata form. OK? This is a way for you to have something like a draft, OK? You can finish the metadata form, and once you get the approvals or once you review it, that is when you enable it. OK, perfect.
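To make the field options concrete, here is a small sketch of a form with one string field and one glossary-based field, plus a check that mimics what "required" means at publish time. Everything here is an illustrative assumption: the `FormField` class, the `validate_form` helper, the technical names, and the length limits are hypothetical and do not reflect DataZone's internal model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormField:
    """Hypothetical field definition mirroring the options shown in the demo."""
    technical_name: str
    field_type: str                          # "string" or "glossary" in this sketch
    required: bool = False
    searchable: bool = False
    min_length: Optional[int] = None         # only checked for string fields
    max_length: Optional[int] = None
    glossary_terms: frozenset = frozenset()  # allowed values for glossary fields


def validate_form(fields, values):
    """Return the problems that would block publishing; an empty list means OK."""
    problems = []
    for f in fields:
        value = values.get(f.technical_name)
        if value is None or value == "":
            if f.required:
                problems.append(f"{f.technical_name}: required")
            continue
        if f.field_type == "string":
            if f.min_length is not None and len(value) < f.min_length:
                problems.append(f"{f.technical_name}: shorter than {f.min_length}")
            if f.max_length is not None and len(value) > f.max_length:
                problems.append(f"{f.technical_name}: longer than {f.max_length}")
        elif f.field_type == "glossary" and value not in f.glossary_terms:
            # A glossary field only accepts values that are glossary terms.
            problems.append(f"{f.technical_name}: not a glossary term")
    return problems


DATA_CLASSIFICATION_FORM = [
    FormField("approver_name", "string", required=True, min_length=1, max_length=64),
    FormField(
        "privacy_classification",
        "glossary",
        required=True,
        searchable=True,
        glossary_terms=frozenset({"Confidential", "Sensitive", "Restricted", "Public"}),
    ),
]

ok = validate_form(
    DATA_CLASSIFICATION_FORM,
    {"approver_name": "Sam Smith", "privacy_classification": "Public"},
)
bad = validate_form(
    DATA_CLASSIFICATION_FORM,
    {"approver_name": "", "privacy_classification": "Secret"},
)
```

Restricting the glossary field to a fixed set of terms is what gives the standardization benefit: two producers cannot describe the same classification with two different spellings.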

Now we are going to add business metadata to a data asset. We already created the business glossary. We already created the metadata forms. Now let's document a data asset using this information. OK? The first thing we have to do is go to our project, the producer project, the one that is generating the data, and look for one of our undocumented data assets. OK? For that, I have to go to the data inventory and select the asset that I'm going to document. In this case, the name is traffic ticket. OK, perfect.

Here you have all the metadata from that data asset. If you notice, at the top of the screen you have a green notification. It says that Amazon DataZone automatically generated some recommendations about the metadata that we have here. OK? We are going to see that in action in a minute, but just to let you know what that notification is about.

Perfect. This is the main page, OK, where you document the data asset in a general way. You have a README section, similar to the business glossary term, but in this case at the asset level. So let's add information about this data asset and how to use it. OK? I copy and paste some information that I previously generated, but just to let you know, you can put here whatever you want, OK, to document your data asset. Then I'm going to assign three business glossary terms to this data asset. OK? Why? Because I want to give more context to our data consumers. OK? By adding these business glossary terms, I'm going to help our consumers classify the data asset itself.

In this case, I select parking, OK? Because this is a parking ticket table. I'm now selecting GPS data, because it is one of the data sources. OK? And for the last one, I'm going to select traffic police, OK? Because it's the entity that is generating the data. You can select many more; for this example, to keep it short, I selected just three. And then you can add your glossary terms.

As you can see here, we already have our README section, and now we have our glossary terms. OK? And if you hover over each term, you are going to see its description. After that, here you have the first metadata form. This one comes by default, and I call it the technical metadata form. It comes from the Glue table; it was harvested from the Glue table that we used to gather the metadata for this data asset. You have here, for example, the S3 location of the asset, and you also have the Glue database that this table belongs to, and so on. You also have a business metadata form called ownership that we added to this data asset when we harvested the metadata, OK? In it you are going to have the information related to the owner of the data asset, and you have the option to add more metadata forms if you want to.

So let's add the one that we created in the previous part of the demo, in this case data classification. OK? It is now part of my data asset. I'm going to put in the approver name, in this case Sam Smith, and then the privacy classification, OK? Let's say public, if I'm not mistaken. As you can see, because I linked this field to a business glossary, I was able to get the list of terms. OK. So this is a way for you to standardize the terms that are going to be used across your different data assets. OK? Then we save it, and now the general information of our data asset is complete. OK?

As I mentioned before, at the top we have a green message that lets us know that DataZone has some recommendations to improve the quality of our metadata. OK? For that, we need to go to the schema section, OK? Let me show you in a second.

Perfect. Here you can see all the columns that are part of this data asset, and you are going to see a green icon next to each column. This means that DataZone, from the technical name of the column, was able to identify a better, human-readable option for that column. OK?

Here, if you look at the last one, the technical name was location one, with a number. DataZone detected that and improved it by suggesting location, space, one, a more human-readable form. Again, this is a very simple example, but DataZone is able to make recommendations based on complex technical names. OK?

Perfect. So imagine that you like all the recommendations from DataZone. The only thing you have to do is go to the green notification that we got and accept all of them. Here you have the confirmation button, and that's it. Something important to mention here is that we are improving the business metadata. The technical name of that column, in this case in the Glue catalog, is going to remain the same. We are changing the metadata from the business context point of view. So that is the first change that we implement at the column level.
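As a rough illustration of the kind of suggestion described here, the sketch below derives a readable business name from a technical column name with simple regular-expression rules. This is an assumption for illustration only: DataZone's recommendations are generated by the service and are not this rule-based function, and the exact output format ("Location 1") is a guess. The function name and the sample columns are hypothetical.

```python
import re


def humanize_column_name(technical_name: str) -> str:
    """Suggest a readable business name; the technical name itself is untouched."""
    # Replace underscores and hyphens with spaces.
    spaced = re.sub(r"[_\-]+", " ", technical_name)
    # Insert a space at camelCase boundaries (lowercase followed by uppercase).
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", spaced)
    # Insert a space between letters and digits, in both directions.
    spaced = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", spaced)
    spaced = re.sub(r"(?<=\d)(?=[A-Za-z])", " ", spaced)
    return " ".join(word.capitalize() for word in spaced.split())


# Business names live alongside the technical names; nothing is renamed at the source.
technical_columns = ["officer", "set_fine_amount", "location1"]
business_names = {col: humanize_column_name(col) for col in technical_columns}
```

The dictionary keeps the technical name as the key, which mirrors the point above: accepting a recommendation changes the business metadata, not the column name in the Glue catalog.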

But what else can we do here? You have the option to add a business-related description to each column, and you also have the option to assign business glossary terms to each column if you want to. In the previous screen, we added business glossary terms using metadata forms and also at the general level. Here we are adding business glossary terms at the column level.

So here we have the column officer. I'm not going to put in any description for this example, but I'm going to add a business glossary term, in this case traffic police. Perfect. Now I'm going to save it. As you can see, that term is now associated with that specific column. Let's do it once more to show you a different example. In this case, for set fine amount, it's the same procedure: we look for "fine", select one of our business glossary terms, and assign it to the column.

So now you have two columns associated with terms. As I mentioned before, you can add terms to each column, and you can also add descriptions to every single one of them. OK, perfect.

So what happens now? Once we have all the information complete, once we have added all the business context that we want to our data asset, we can go ahead and publish the data asset. We click this button, and we receive a confirmation message asking if we really want to publish that data asset. Let's do it. We say yes, and now we have to wait a couple of seconds for the asset to be published. What publish means is that everybody who has access to the data catalog is going to be able to search for this specific data asset.

OK. Now let me show you how to search for a data asset from the consumer side and request access to it.

OK. For that, the first thing we have to do is change the project. I'm able to change projects because I have an admin user for this demo. In real life, you will only see the projects that you really have access to.

As you can see here, this project is subscribed to zero assets, so let me show you how to change that. The first thing I have to do is go to the search bar and look for a specific keyword. In this case, we are going to look for tickets. Here you have a list of different assets related to the word ticket, but you have the option to apply filters on top of your search.

In this case, I'm going to filter by the assets that are related to traffic police. If I click traffic police, the result list is narrowed down to just one. That was just one filter, but you can also filter by the type of asset, Redshift or Glue, for example. You can also filter by the project that owns the data asset.
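The keyword-plus-filters flow shown in the portal can be sketched as a small in-memory search. This is only an illustration of faceted filtering, not DataZone's search implementation; the asset names, project names, and the `search` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    asset_type: str           # e.g. "Glue" or "Redshift"
    owning_project: str
    glossary_terms: frozenset = frozenset()


# Hypothetical published assets, loosely modeled on the demo.
CATALOG = [
    Asset("Parking tickets 2023", "Glue", "traffic-producers", frozenset({"Traffic Police"})),
    Asset("Support tickets", "Redshift", "helpdesk", frozenset({"Customer Support"})),
    Asset("Event ticket sales", "Glue", "events", frozenset({"Sales"})),
    Asset("Officer roster", "Glue", "traffic-producers", frozenset({"Traffic Police"})),
]


def search(catalog, keyword, glossary_term=None, asset_type=None, owning_project=None):
    """Match the keyword first, then narrow the results with optional filters."""
    results = [a for a in catalog if keyword.lower() in a.name.lower()]
    if glossary_term is not None:
        results = [a for a in results if glossary_term in a.glossary_terms]
    if asset_type is not None:
        results = [a for a in results if a.asset_type == asset_type]
    if owning_project is not None:
        results = [a for a in results if a.owning_project == owning_project]
    return results


all_ticket_assets = search(CATALOG, "ticket")
traffic_police_only = search(CATALOG, "ticket", glossary_term="Traffic Police")
```

Each filter only ever removes results, so applying the glossary-term filter on top of the keyword search narrows the list the same way clicking traffic police did in the demo.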

Once we identify the asset that we want to go deeper on and check whether it is the right one, you click on it and it takes you to the general information page.

As you can see here, you can see the README section that was added by the data producer. You can also see the metadata forms that were added when we were documenting the data asset. And you can see the business glossary terms that were associated with each column.

So as a consumer, you can come here and get all the information you need to decide whether this is the right data asset or not. Once I identify that I want to consume this data asset, the only thing I have to do is click subscribe and start the subscription process.
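The subscribe step in the portal also has a programmatic counterpart. The sketch below only builds the request payload so it can run without AWS credentials. The keyword names follow my understanding of the boto3 DataZone client's `create_subscription_request` operation and should be treated as an assumption to verify against the API reference; the identifiers and the helper function are hypothetical placeholders.

```python
def build_subscription_request(domain_id, listing_id, project_id, reason):
    """Assemble the arguments for a subscription request (no AWS call is made here)."""
    return {
        "domainIdentifier": domain_id,
        "requestReason": reason,
        # The published asset (listing) the consumer wants access to.
        "subscribedListings": [{"identifier": listing_id}],
        # The consumer project that will receive access if the request is approved.
        "subscribedPrincipals": [{"project": {"identifier": project_id}}],
    }


# Placeholder identifiers; real values come from your DataZone domain.
request = build_subscription_request(
    domain_id="dzd_example",
    listing_id="listing-123",
    project_id="project-456",
    reason="Analyze traffic ticket trends",
)

# With credentials configured, the request could then be submitted, assuming the
# parameter names above match the current API:
#   import boto3
#   boto3.client("datazone").create_subscription_request(**request)
```

The request is made on behalf of a project rather than an individual user, which matches the demo, where the consumer first switches to the project that should be subscribed.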

OK, perfect. With that, we conclude the demo. We already saw what DataZone offers and how to do it. And now Gwen is going to walk us through how customers are adopting DataZone. Thank you.

Thanks, Leo. I'm excited to share with you Natera's story: what can be unlocked when you start using a data catalog. If you weren't able to join our data governance session this morning, I'm excited to introduce you to Natera for the first time. We're a health care company focused on genetic testing in the organ health, oncology, and women's health spaces. We use cell-free DNA to answer specific health questions for patients, clinicians, and families.

Data is incredibly important to us. It helps us answer questions and frame new ones that our patients don't know to ask yet. It's been a great time to be in this space, bringing value to patients and transforming lives. We've been growing rapidly into new verticals, new business models, and new products. Every year, we seem to find a new place where information helps transform treatment decisions and inspire new treatment paths for incredibly painful conditions.

As a company, this growth is awesome, and it brings us challenges. As we grow and expand, we also grow and expand the amount of data we have. It's a new set of resources, tons of information that we need to make sure we keep safe: our patients' trust and information. As we add that data, we see many different data types come in. It's tough to understand: what did it look like yesterday? Two months ago? Two years ago? Is this the same resource that I'm looking at?

We're at the cutting edge of technology for genetics, but also in the data space. We're excited to use new technologies and bring them into our tool stack. But that also comes with growth for our people: the opportunity to try new things, to figure out what these technologies are and whether they work for us. As you grow your tool stack, you have a learning curve for how to use all these new resources. I often hear from our team: I know the data's there and the tools exist, I should be able to do this, but how do I connect it all? That connection is important. With our complex data and our complex ways of accessing it, trying to find what something is called and how it relates to everything else challenges our team's ability to innovate.

As I've asked our organization what the challenges with data are and what they need to be successful and power our growth, we found that a data catalog is the path forward for our organization, ensuring that we can provide the information in the terms that our team understands, not just the technical or scientific terms that are useful to the producing teams. In the health care space, data privacy and patient information protection are critical.

We need to continue our commitment to privacy and have access to the data for innovation while maintaining trust. At times, that can feel like it's at odds with self-service access. We need to both protect the data and make it accessible so that we can move forward. That self-service access, being able to use the data that is available and correct for that user, will help us power data collaboration, where we can see the same picture because we have access to the same information. For us, that data catalog solution has been DataZone.

We're excited to power our transformation towards a data mesh infrastructure using this catalog: taking information from our producers, using a standard data pipeline to pull it into storage, and using Glue to make that technical metadata available to the data catalog, where we can add additional context and, from there, access it through the data portal.

Having the data is not enough. If I have a book on the shelf but I've never read it, is it adding value? Is it helping me push forward? Once we have the information in our data catalog, we need to be able to consume it, to use it to answer our data questions. For us, this is an easy integration with Redshift, QuickSight, Athena, and other tools. This is a tool stack, and a data catalog is one piece of the puzzle.

We also need to make sure the processes are in place to bring in this information so that it is useful. Leo showed us a great way to import the technical metadata through the Glue job and add additional business context, so that the data catalog is serving up the information users need. Let's move from theory to some practice. What have we seen since we implemented this tool?

It's been exciting to reduce the time spent searching for data. We know what assets are there and we can find them. It's very different from when we started our transformation, hearing: Where is it? What is it? I don't know what data set this is. It's easier to use that data. I can find it and start to answer my business questions, going directly into QuickSight so I can share with others.

We have repeatable patterns for bringing in those data assets. Every time we grow, and we're looking to grow and grow and grow, we'll be able to do this in a repeatable fashion so that we don't miss out on the data we're creating. We can add business context. Teams have started to understand each other because they have a shared language, a shared place to say this is the technical term and this is how it meets the business need. More than agreeing on language, they also have a direct relationship fostered through that tool, where consumers can start to ask for data they need or will need and have a direct line to who makes that data.

Last, continuing our commitment to privacy and access, we're excited about the strong access controls DataZone gives us, to be able to tune what data people can see to the right role. We're at the initial phase of our rollout. We're excited about the transformation we're seeing, and in partnering with DataZone, we're excited about where this can go.

As I said earlier, having data isn't enough. Being able to see that a book is on the shelf is great, but we want to be able to use the data in our catalog. Integrating with Snowflake, for us, would bring in more data sources that we can search, understand, and leverage in a more powerful way. Simplifying permissions makes it easy to use our data. We're excited to leverage this and roll it out to the business, where you can set up a dashboard really quickly once you find that data. Integrating with SageMaker will empower our data science team to pull in more powerful models and leverage those access controls to make sure we're training on the right data.

I'm excited about where DataZone is taking us. In leveraging a tool as part of our transformation, there are also people and processes in place to help power this. Ultimately, this is about making information accessible. I'm going to hand it off to Priya to wrap us up.

Thanks, Gwen. It's exciting to hear about Natera and the solution that you're building, and we can't wait to partner with you on your next steps.

All right. Now that we've discussed the challenge, the solution, and the capabilities, do you see the light at the end of the tunnel, where you make data available and accessible to all the users across your organization? DataZone strives to do just that. We want to make it super simple for data producers to make data accessible and discoverable for all users across the organization, and similarly for data consumers to find and understand data and make the best judgment about taking it forward for their analysis. And DataZone plugs into data governance.

If you've noticed, we had the panel earlier today and also a series of sessions that dive deeper into each of the governance pillars. With DataZone, you can make data easily findable, accessible, and shareable while keeping it safe and secure. And with the governance and access controls that DataZone provides, you can also build a lot of auditing and compliance reports.

With that, I would like to wrap up by sharing all the sessions that we have. I know some of you showed interest in Redshift access controls. We have a whole lot of sessions, talks, and workshops aligned to different capabilities of DataZone, so feel free to attend them or reach out to us with more questions. And data governance is people, process, and tool.

DataZone fits into the tool. But first, we need to understand how the people and process work. For that very reason, we introduced a master class, a series of videos in which a service leader shares the best practices he has observed about what works and how to establish the people and process that make DataZone successful. With that, I would like to share the superhero, if you're following that at this conference. And now I would like to open it up for questions.
