Drive innovation with AWS Mainframe Modernization Data Replication

So we are just a few hours away from the re:Play party tonight, and I'm really looking forward to it after all the hard work and all the learning we've done throughout the week. Before we get into the presentation, I have one quick question, just by a raise of hands: how many of you heard about AI and gen AI throughout the week, whether in keynotes or anywhere else? It looks like some people are raising two hands, which means they attended too many sessions. The biggest theme of this year's re:Invent is AI and gen AI, so you have probably heard about it in every session.

The key ingredient to unlock and start building AI and gen AI solutions is data, which is the lifeblood of all these solutions. So over the next 15 minutes, we'll walk you through how to unlock the value of that data. My name is Raul, and I lead AWS Mainframe Modernization architecture at AWS. I'm joined by Tendü, CTO of Precisely, and Babu, director of engineering at Citizens Bank.

This is our agenda for today. First, I'll take ten minutes to walk you through the value of mainframe data, the challenges of unlocking that data, and the solution AWS provides to accelerate innovation. After that, Tendü will explain Precisely's architecture and both the functional and nonfunctional features of the product. And then Babu will walk you through a real-world scenario of how he implemented this on the Citizens Bank mainframes.

Mainframes run mission-critical workloads. A lot of enterprise data is locked on the mainframe, and it is a gold mine for generating customer experiences and innovation. I'll give you a real-world scenario. To come to re:Invent, I had to make a flight reservation and a hotel reservation about a month back. Then, the day before, I had to check in, fly the next day, take an Uber to the airport, come from the airport to the hotel, and stay here on a hotel reservation. All of these things generated massive amounts of transactional data, which is all sitting in a transaction system somewhere on a mainframe.

Now, if the bank can tap into that data, they really know what I've been doing throughout the week, and that data can be used to generate a rich customer experience. So this data is locked, and if we find a way to unlock all of it, it generates innovation faster. Every organization wants to be data driven, but only one in four has succeeded in doing it. The biggest roadblock to accomplishing that is harnessing the data that sits on siloed platforms. It could be DB2, IMS, and VSAM on a mainframe; it could be SQL Server or Oracle on distributed platforms. If this data stays locked in all these siloed systems, you will not be able to generate value out of it.

So in order to unlock it and start generating innovation, you don't need a massive data platform and you don't need to replicate the entire database up front; you can always start small, just by streaming out your transactional data. You can build a small application, learn from it, iterate and innovate on top of it, and add more to it. So you can start small, like the flywheel you see on the left of the screen. It's an innovation flywheel: start small and incrementally iterate and innovate on top of it.

So it's time for a poll. If you have a mobile device, please scan the QR code, or if you have a challenge scanning the QR code, you can use the URL at the bottom and enter the number. We'll give you a few seconds to tell us how you would take value out of the data generated on the mainframe. It could be real-time analytics, it could be operational insights.

The poll results are coming in; we'll give it five more seconds. All right, we have some results. Let's see them on the screen. Wow, no surprise, as we expected: all of the above. That means everybody has a need to extract and replicate the data in real time. So let's see how we can do this. Let's move on to the next slide.

All right. So we all want to extract the data and generate value out of it. But what are the challenges? I started replicating data from mainframes around 15 years back, in the 2008 time frame. The first real use case was replicating data on the same mainframe from one database to another; it was IMS to DB2, that was the first application. Then later we started replicating across mainframes in different regions because we wanted an active-active point-of-sale application, so we used replication for that.

Later, we replicated the mainframe data to an on-prem Cassandra database so that we could start generating customer experiences. Over the last five-plus years, we have been replicating onto the cloud. So after all these projects and conversations with our customers, we have learned that there are many challenges to accomplishing this.

The first one you encounter is: what is the tool that really solves the problem? There are so many tools available in the market, so you first have to do the research: will it solve your problem, and what happens if you have to support one more use case with the same tool? You have to do all that research and uplift. Then on top of it, how do you set up the tool, and what is the licensing cost? Do you have to sign up for upfront capital licensing fees before you can start using the product? And once you identify a product, the biggest hurdle we faced, at one of the banks I worked at, was that they run massive batch processing: they do 100,000 database updates a second in the middle of the night. Some of the tools we first tested couldn't scale up to that need.

Then later, since I worked in a financial organization for some time, there are a lot of regulatory concerns where you have to show a near-real-time account balance. If you show the wrong balance, there is a chance you are disappointing the customer, and at the same time you are violating the regulations. We observed a lot of these challenges ourselves and heard them from our customers.

How do we solve these challenges at AWS? We listen to our customers and gather their feedback, and from that feedback we identified a way to curate a set of products and create the AWS Mainframe Modernization service. As part of that service, this year we launched a data replication capability in partnership with Precisely.

The key benefit is that you don't have to do all this analysis of going back to the market and finding the right tool that solves the problem. The tool is readily available right out of your AWS console: you can go to the Marketplace, subscribe to it, start working with it the same day, and within a short time you will be replicating your data. It comes with a pay-as-you-go model, so you don't have to sign up for any upfront cost. Since it comes through the AWS console, if you use it, you pay for it; if you stop using it, you don't pay for it.

The second thing is that the tool provides the largest choice of source and target databases for your replication. The source could be DB2, IMS, or VSAM. The target could be any database that runs on AWS: PostgreSQL, Aurora, Redshift, DynamoDB. And if you really want to scale for volume and have high availability and scalability, you can also stream the data into Amazon MSK, which is Managed Streaming for Apache Kafka, into Kafka topics, and from there replicate and store into the rest of the databases. This solution can scale: one of our customers, the biggest user on the AWS platform, streams 200,000 database updates per second using this tool. So the tool can scale, and it also provides high availability and observability right out of CloudWatch. Tendü will explain all of these architecture features in her session.

How does this whole tool work? There are three steps to the end-to-end solution. Step one is integrating the mainframe with the AWS platform. When you go to the Marketplace and subscribe to the AMI, it comes with software that needs to be installed on the mainframe: a capture and publish agent that runs on the mainframe and streams the data, and an apply agent that runs on AWS and receives that data over TCP/IP. Once you have the data, you move on to step two: deciding where to store it. If your use case is operational, it could be a relational database like PostgreSQL. If your use case involves transactional data that is non-relational, it could be DynamoDB. Or if your use case is analytics, you could use Amazon EMR, which is Elastic MapReduce, for big data, or Redshift for your data warehouse applications.
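
As an editorial aside, here is a minimal sketch of what step two might look like once the apply side hands you a decoded change event and you choose DynamoDB as the target. The event shape, table name, and field names are illustrative assumptions, not the product's actual schema.

```python
# Hypothetical shape of one replicated change event after decoding;
# field names are illustrative, not the product's schema.
import boto3
from decimal import Decimal

change_event = {
    "table": "ACCOUNT",            # source DB2/VSAM/IMS object
    "op": "UPDATE",                # INSERT / UPDATE / DELETE
    "key": {"account_id": "1234567890"},
    "after": {"balance": "1523.75", "branch": "BOS-014"},
}

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("account_balance")   # assumed target table

def land_change(evt):
    """Write the post-image of an INSERT/UPDATE into DynamoDB, or remove the item on DELETE."""
    if evt["op"] in ("INSERT", "UPDATE"):
        item = {**evt["key"],
                "balance": Decimal(evt["after"]["balance"]),
                "branch": evt["after"]["branch"]}
        table.put_item(Item=item)
    elif evt["op"] == "DELETE":
        table.delete_item(Key=evt["key"])

land_change(change_event)
```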

Now that you have the data, you have to figure out what to do with it. Typically, we see three kinds of use cases. One is business insights: using QuickSight with natural-language queries, you can generate a dashboard very quickly, so your business can start building dashboards right away. You can also build machine learning models using Amazon SageMaker, or use a generative AI service like Amazon Lex to build a chatbot, which is a conversational application. There are three broad categories in which we see customers innovating with this data.

The first category is business insights, which is analytics. Typically, all this analytics used to be done with batch processing, but with real-time data you can start generating real-time analytics and insights. I'll give you an example. Let's say you are going on vacation for Christmas and you have booked a trip, for example to Disney, and made a large purchase. Imagine your bank sends you a quick message that says, congratulations on your purchase, you are now eligible for a 12-month installment plan on your big purchase for a small fee. How good would that be compared to receiving it in a letter a month later? The experience is different: how you engage and how you serve your customers is different. So real-time data really changes how you work with your customers.
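
To make the idea concrete, here is a purely illustrative rule that could run against each replicated card transaction; the threshold, message, and data class are made up, not a real bank policy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CardTransaction:
    account_id: str
    merchant: str
    amount: float

INSTALLMENT_THRESHOLD = 3000.00   # illustrative business rule

def installment_offer(txn: CardTransaction) -> Optional[str]:
    """Return an offer message if the purchase qualifies, else None."""
    if txn.amount >= INSTALLMENT_THRESHOLD:
        return (f"Congratulations on your {txn.merchant} purchase of ${txn.amount:,.2f}! "
                f"You are eligible to split it into 12 monthly installments.")
    return None

# In a real pipeline this rule would be evaluated on each replicated transaction event.
print(installment_offer(CardTransaction("1234567890", "Disney", 4200.00)))
```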

The second is digital operations: a lot of this data can be integrated with web and mobile so that you can do fraud notifications and push notifications. The third is incremental modernization. Many of our customers, and maybe many of you, attended yesterday's session with Cigna and IT Bank; they are doing incremental modernization. They are slowly strangling their mainframe and migrating their data onto AWS. They basically start with incremental read-only services, strangling their application by building a CQRS pattern, which is Command Query Responsibility Segregation. With that pattern, you move your command functions one at a time, slowly strangle the mainframe application, and migrate onto AWS.
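
Here is a minimal, in-memory sketch of the CQRS idea described above: queries are served from the replicated copy on AWS, while commands still go to the system of record until that function is strangled off. All class and method names are hypothetical.

```python
class ReplicatedReadModel:
    """Stand-in for the AWS-side copy kept current by CDC replication (query side)."""
    def __init__(self):
        self._accounts = {"1234567890": {"balance": 1523.75}}

    def get_balance(self, account_id: str) -> float:
        return self._accounts[account_id]["balance"]

class MainframeCommandGateway:
    """Stand-in for the existing write path, e.g. a CICS transaction (command side)."""
    def post_payment(self, account_id: str, amount: float) -> None:
        print(f"Routing payment of {amount} for {account_id} to the mainframe")

class BankingFacade:
    """Routes reads to the replica and writes to the mainframe."""
    def __init__(self, reads: ReplicatedReadModel, writes: MainframeCommandGateway):
        self.reads, self.writes = reads, writes

    def balance_inquiry(self, account_id: str) -> float:       # query
        return self.reads.get_balance(account_id)

    def make_payment(self, account_id: str, amount: float):    # command
        self.writes.post_payment(account_id, amount)

facade = BankingFacade(ReplicatedReadModel(), MainframeCommandGateway())
print(facade.balance_inquiry("1234567890"))
facade.make_payment("1234567890", 250.00)
```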

These are some of our success stories: customers who have deployed this solution successfully.

The first one is Citizens Bank; you will hear from Babu later.

The second one is Global Payments, which we have talked about. They are modernizing their payments and credit card issuing platform on AWS.

And at AAA Life Insurance, they have developed their customer analytics platforms on AWS.

And a Canadian life insurance company saved $1.8 million by decommissioning their typical batch ETL processes and building their customer 360 project using real-time replication.

The backbone of all of this is Precisely's Connect product. Please welcome Tendü to explain the architecture.

My name is Tendü and I'm the Chief Technology Officer at Precisely. If you are unfamiliar with Precisely, we are the leader in data integrity. We define data integrity as trusted data with maximum accuracy, consistency, and context.

Precisely has 50-plus years of experience and expertise in mainframes, and we are really excited about this partnership and about delivering data replication as part of the AWS Mainframe Modernization service.

Before I talk about the replication service, as you can tell, we love polls. Let's do another poll and understand how you are extracting data out of your mainframes. Please scan.

Are you using batch files? Are you doing real-time streaming? Both batch and real time? Or are you unable to extract data? Which is fine; that's why we are here. Let's give it another five seconds.

Let's see the results. This is not a surprise: most of you said both batch and real time, and that really depends on the use case and your business requirements. You may have one use case where batch extraction is all good, and another where you need near real time or real time. That flexibility is important as you select tools, because you want the tool to support you as your use cases and modernization effort evolve.

Let's go back to the slides, please.

When we talk about trusted data, the foundation of trusted data starts with easy access to that data and making it usable. We talked about data silos. There is no day that I don't have a conversation with a customer whose major challenge is data silos, whether across the businesses the data is divided among or across multiple data platforms. And if anything, the data platform ecosystem is getting more and more complex every day.

So the data replication solution we have, whether real time or batch, basically takes data from mainframe sources such as DB2, VSAM, and IMS and replicates it into AWS cloud data stores or into Kafka via Amazon MSK, the managed streaming for Apache Kafka service, in real time, near real time, and batch.

It's a very lightweight solution: for those of you who are familiar with MIPS utilization, there is insignificant impact on MIPS utilization on the mainframe. And it's highly scalable. Scale is very important because mainframes are renowned for their high performance and scale, so for any solution we pick for modernization, we have to pay close attention to performance and scale.

We have a financial services customer in the US that is achieving 200,000 transactions replicated per second. We have a very large online retail customer in Europe doing about 300,000 transactions replicated per second. So it's in the range of hundreds of thousands of transactions replicated per second, and it is resilient.

What does that mean? Wouldn't you like your solution to restart from the point of failure instead of from the very beginning of the replication? Something fails during the connection to the cloud, which happens, right? Of course, it never happens in our environments. Connections to the cloud may fail, and resiliency saves a lot of time because the solution restarts from the point of failure when that connection fails.

And if you are familiar with mainframe data, you understand that replicating mainframe data is never just replicating raw bytes. The data is encoded differently, and if it is VSAM or IMS, it's hierarchical data. In order to make it usable in the target format, whether that's RDS, or JSON in Kafka feeding into these stores, Redshift, or other formats on S3, you have to translate and transform it. The solution takes care of that; you don't have to worry.

The data replication also leverages COBOL copybooks for understanding the metadata and automatically mapping it to the target. That's actually very important: with one of our banking customers we were doing a proof of concept and they gave us an 85-page COBOL copybook; COBOL copybooks can be really complex. Two other vendors went before us, and they both failed: they took three-plus days and could not make sense of the copybook, while the data replication tool automatically understands that metadata and can map it to the target metadata.

And finally, we talked about performance: low latency matters. Latency is often limited by the bandwidth you have on your network. The good news is that this data replication is now available through the AWS Mainframe Modernization service, and in fact, on Tuesday we also launched the same service for IBM i.

Let's look at what happens during replication. It's very simple: we capture the change data, publish it, and then apply it on the target side. The controller is important because it takes care of the multiple appliers and multiple captures, as well as start and stop. We have to keep track of where we are in that reading so that we can restart from the point of failure when it happens. The configuration utilities are also important because you want visibility into what's happening: you want to understand the statistics and the performance, how many captures there are, how the appliers are configured, et cetera. That's where the configuration utilities play a role.
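
The product's controller manages this bookkeeping internally; purely to illustrate the restart-from-point-of-failure idea, here is a minimal sketch of an applier that persists its last applied log position and resumes from it. File name and event shapes are invented.

```python
import json
import os

CHECKPOINT_FILE = "applier.checkpoint"   # illustrative; the real controller keeps this state itself

def load_checkpoint() -> int:
    """Return the last applied log position, or 0 if we have never run."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_applied_position"]
    return 0

def save_checkpoint(position: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_applied_position": position}, f)

def apply_changes(change_stream):
    """Apply changes in order, persisting the position after each applied change,
    so a restart resumes from the point of failure rather than from the beginning."""
    start = load_checkpoint()
    for position, change in change_stream:
        if position <= start:
            continue                     # already applied before the failure
        print(f"applying change '{change}' at log position {position}")
        save_checkpoint(position)

stream = [(1, "INSERT ACCT 42"), (2, "UPDATE ACCT 42"), (3, "DELETE ACCT 17")]
apply_changes(stream)    # run it twice: the second run applies nothing new
```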

Could you please raise your hands if you are a technical user, so I have a sense of the crowd? OK. So this is a bit of a crowded picture, right? Let's dissect it into simpler pieces in terms of the architecture.

The important piece on the left side is that we support multiple types of sources: DB2, VSAM, and IMS. And it's often VSAM and IMS that are a little more complicated than DB2 replication, as you can imagine. The change data is captured from the log streams and published, and the publisher works closely with the controller to keep track of where we are reading from the log.

On the right side, the cloud side, the applier applies the changes. You see that the publisher publishes the encrypted data over TCP/IP to the applier, and the applier is highly parallel; that's why you have those multiple workers. In a typical scenario you may not need further parallelization; however, with very large volumes of data, some of our larger customers want even more parallelization and horizontal scaling, and they use Amazon MSK, managed streaming for Kafka, to do that.

You can write and apply the changes directly to the targets you see on the right side, whether that's Redshift, DynamoDB, or RDS, or you can use Kafka as we spoke about. The controller works with the applier to make sure there is an acknowledgement back, verifying that the data that was published was written to the target.

Are we all good with this architecture? All right. So let's talk about some use cases, because that's the only way to give good examples of how this architecture is applied. Architectures are only as good as the value they bring to the business.

So there are two main categories of use cases that we see as very common. For the first one: most of you have a data-driven initiative and are going through your digital journey, and in fact many of the CEOs of your organizations are rolling out initiatives to become a data-driven company. This involves new workloads in the cloud.

It was interesting for me to see the poll results from Raul: after "all of the above," the highest percentage was AI and generative AI, and we want to understand how you plan to do that with the mainframe data.

These initiatives all require critical data assets that live in the transactional systems on mainframes, like customer reference data, to be part of that analytics, whether it's machine learning, generative AI, or the business insights you are driving. So that's use case category one. It's very common and low risk, and many organizations actually start there, by making the data available.

The second use case is a bit of a bigger journey, a multiyear journey in larger organizations where modernization is going on as part of a digital transformation. There are workloads running on mainframes for historical reasons, and you are refactoring or replatforming those workloads to run on the cloud so you can have more agility in your organization and become nimble for new products and services. That's the second use case, because in order to refactor or replatform, you need the data associated with that application or workload.

The Canadian life insurance company is the one Raul also mentioned. They had incredible savings, but their goal was first to have an API platform feeding all of the applications in the organization. Over the years, they recognized that in order to enable the other lines of business in the organization, their costs were increasing something like six times, and their ability to bring new products and services to market was slowing down.

They implemented this enterprise API data platform, feeding data from DB2 and IMS, and in fact they also had Oracle, SQL Server, and other data sources, enabling all of the analytics applications and business insights in the organization. Their savings were significant from a MIPS perspective; however, the real-time delivery of data and the agility it enabled in the organization were even more impactful.

Managing change, taking a change from the source data through to applying it on the target, was a manual process that took 30 days, and they reduced that to 30 minutes.

I will not go through this architecture; it is the same architecture. It's just that the right-hand side is different, because in their case the targets were S3 and Redshift, and they used SageMaker ML and AI capabilities on AWS to leverage this data.

The second use case is modernization. This is a luxury automotive manufacturer, and in their case they are going through a multiyear digital journey. Their goal is to decouple the data and the applications on the mainframes.

They want to migrate these workloads from mainframes to the cloud by the end of 2029. Their business challenge was driven by retiring skills.

How many of you have a shortage of skills in your organization? Yeah, and this shortage of skills is interesting, right? We either have a shortage of skills in cloud computing, AI, and gen AI, or in the retiring skills around mainframes, IBM i, et cetera.

So we are in this spectrum of skills shortage, and they wanted to reduce that risk as well as their maintenance challenges. Again, the lack of agility in bringing new services and products to market was a challenge for them. The benefits they started seeing by decoupling the workloads, bringing the workloads over and first bringing the critical data associated with those workloads, were significant, and they are still on that journey, achieving one phase and one iteration every year.

As you can imagine, they reduced their costs during these modernization efforts as well. Again, I will not go over this architecture here. The important piece is that in their case the target was Aurora PostgreSQL, and for scalability they used Kafka through Amazon MSK, along with transformation from there into Aurora.

Now, when you consider your functional requirements, you also have to think about nonfunctional requirements, because on mainframes you are running highly scalable workloads, and when you bring them to the cloud, you want to achieve the same level of scalability, performance, latency, and so on.

We went through many of these already, so I will highlight the two important ones. Observability matters because you want to understand the pipeline. Did you listen to Dr. Werner Vogels' keynote this morning? He gave an example about houses in Amsterdam: at some point, people were looking at these old buildings in Amsterdam and seeing that some buildings were using 3,000 kilowatt-hours of energy and some were using 2,000 kilowatt-hours.

And they were trying to understand what contributes to that because the houses were built in the same years and pretty much the same number of floors, etc. It turned out that the buildings that were utilizing more energy had their meter in the basement and the buildings which utilized less energy had their meter in the hallway.

The first thing you saw when you entered the hallway was the utilization. Creating visibility is so important, especially as the complexity of the data ecosystem evolves and applications get more and more complex with AI and generative AI. So being able to observe this and monitor the statistics and performance through CloudWatch, like you do for any other cloud application, is really important.

The second one I will highlight here is flexibility, because you start with one use case and one data set, and your business requirements may differ: one use case might be near real time and another real time. You want that flexibility in the tool, so it can accommodate both and be future-fit.

Please ignore all the little boxes here. The takeaway from this slide is that the solution is highly available, and you see number one, number two, and number three at the top: highly available at three levels.

The first one is at the LPAR level. What's an LPAR? It's a logical partition; you can think of them as separate mainframes, each running its own operating system. If one LPAR fails on the capture side, there is failover to another LPAR.

The second level of high availability comes from EC2 on the apply side: if an EC2 instance fails, there is failover to another EC2 instance. And finally, there is failover at the cloud region level.

And we talked about observability already. The point of this slide is that you can use your existing monitoring tools and CloudWatch, like you do for any other cloud application, and even the capture statistics from the mainframe are retrieved and provided to you through CloudWatch. You don't need another monitoring tool for this replication.
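
The service surfaces its own statistics in CloudWatch; the sketch below only illustrates the CloudWatch side of such monitoring with boto3, publishing a custom replication-lag metric and alarming on it. The namespace, metric name, and dimension values are assumptions, not names the product emits.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def report_replication_lag(lag_seconds: float) -> None:
    """Publish a custom replication-lag data point for one pipeline."""
    cloudwatch.put_metric_data(
        Namespace="Custom/MainframeReplication",          # assumed namespace
        MetricData=[{
            "MetricName": "ReplicationLagSeconds",
            "Dimensions": [{"Name": "Pipeline", "Value": "db2-to-aurora"}],
            "Value": lag_seconds,
            "Unit": "Seconds",
        }],
    )

# Alarm if average lag stays above 60 seconds for three consecutive 1-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="db2-to-aurora-replication-lag",
    Namespace="Custom/MainframeReplication",
    MetricName="ReplicationLagSeconds",
    Dimensions=[{"Name": "Pipeline", "Value": "db2-to-aurora"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=60,
    ComparisonOperator="GreaterThanThreshold",
)

report_replication_lag(4.2)
```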

Three key takeaways. You have had three days of fully loaded schedules, and thank you for joining today; it's the last day of the conference. Three things to remember. One, start small: pick a use case and the critical data set associated with that use case.

Two, think about both functional and nonfunctional requirements from the start. And three, measure. That's the only way we can prove return on investment and make the case internally to executive leadership or other business functions. So measure, test, and move on to the next use case and data set.

And we have some exciting new capabilities that we'll talk about at the very end. Ultimately, follow a business-driven approach to your modernization. Thank you for your time. With that, I introduce you to Babu, who leads mainframe modernization for the core banking platforms at Citizens Bank, and he's going to share their learnings and strategies for success. Thank you, Tendü.

Yeah, my name is Babu, and I manage core banking modernization at Citizens Bank. Like many other banks, our main systems run on mainframes; they have been running for years and providing great value. However, five years back we started our journey to modernize.

What does modernizing mean for our bank? Our customers want more and more digital experiences. Imagine the bank 20 years ago versus now: back then you went to a branch and talked to a person, they clicked something on a keyboard and performed the operations; now the bank is in your pocket. Your mobile device has all the details, and you can make transactions, withdraw money, deposit money, transfer money, whatever you want; even check deposits came to your mobile device, right?

The same goes for online experiences. We have to provide richer digital experiences to our customers, and they expect that too. And to do that, data is the key ingredient that makes the digital experience richer.

So when we looked into this, the question was: how do we make the data available to our digital experiences? Like any other mainframe shop, we can access mainframe data using the same old methods: MQ, CICS transactions, COBOL program execution, and so on.

This has two drawbacks. One is the overall response time: going all the way back to the mainframe, aggregating the data, and getting it into the front-end portal adds to the response time. The second is that it costs money on the mainframe in terms of MIPS.

To optimize this, one option is to modernize the core. Modernizing the core means taking the processes and everything and rewriting them on a modern, next-generation core platform. That's a huge investment, it takes a long time, and we don't unlock business value quickly.

The second option is upgrading our legacy systems: looking at the mainframe itself for opportunities to arrange the data differently and so on. It still carries the MIPS cost, and we may be able to improve the response time, but it has one drawback: business continuity suffers. While we're doing this, I may not be able to deliver growth opportunities to the business by modifying my mainframe, because I have limited capacity.

The last option we looked into is this: everything the digital experiences need comes down to data, so how do we move the data closer to the digital experiences? That's high-speed replication of the data.

Once we decided that, the next stage was: what tools do we need to do this? When you talk about high-speed replication of data, imagine you are watching your favorite sports game live in your family room, sitting on the couch. What is happening there?

The light changing somewhere in the stadium is captured on a device, we transmit it over the air or over cables or satellites, and you are able to see on your couch what is happening there. Between what the person in the stadium sees and what you see on the couch, there isn't any real difference; it's nearly real time, and you feel there is no difference between the two.

That's the kind of high-speed data replication we were looking for. In order to do this, we established fundamental principles: we cannot change our legacy systems in any way to transmit events; we have to capture the data as it changes.

The first thing we need is a tool that, any time the data changes, captures the change to the individual data points and transmits it somewhere. That's what Precisely's capture does. To transmit it, we had to land it somewhere.

Like I said, our digital experiences already run on AWS, so it's obvious we needed to bring the data to AWS, and we did. Once it got into AWS, we needed to scale it horizontally and expand it, so we needed some kind of piping mechanism; we use Kafka for that.

Once it's in Kafka, we have to process it and make it ready for consumption, because I cannot aggregate the data on a per-call basis; we have to make sure it is ready for consumption, and that is the processing we do on OpenShift. The final stage is that we store the data in MongoDB and expose it through APIs.
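
As an illustration of that Kafka-to-MongoDB leg, here is a minimal consumer sketch using the kafka-python and pymongo libraries: it reads a change event from a topic and upserts a consumption-ready document. The topic name, broker endpoint, database names, and event fields are assumptions, not Citizens' actual configuration.

```python
import json

from kafka import KafkaConsumer          # kafka-python
from pymongo import MongoClient

consumer = KafkaConsumer(
    "account-changes",                                # assumed topic name
    bootstrap_servers="my-msk-broker:9092",           # assumed MSK endpoint
    group_id="digital-experience-materializer",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
accounts = MongoClient("mongodb://localhost:27017")["banking"]["accounts"]

for message in consumer:
    evt = message.value                               # one replicated change event
    # Shape the record for consumption up front, so the API layer never
    # has to aggregate data on a per-call basis.
    accounts.update_one(
        {"_id": evt["account_id"]},
        {"$set": {"balance": evt["balance"], "last_txn_ts": evt["timestamp"]}},
        upsert=True,
    )
```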

Once we decided on all the tools, the next step was to put them in an order that gives us a scalable application. This is similar to what Tendü presented, so I will not concentrate on the left side of the diagram.

On the right side, there are two ways data is manufactured on the mainframe. One is through online transactions, which is CICS executing something: our colleagues performing banking operations that change the customer data, mainly in the call center, at bank tellers, or in back-office operations, plus financial transactions coming in that we post in real time.

Batch is different: batch does the heavy processing at night, and it manufactures data. Heavy processing in the sense that, you can imagine, we have to calculate interest for 4 million customers, 4 to 6 million accounts, on a given statement date.

So batch creates a large amount of data in a short time frame. Because of this, we had to separate the online architecture and the batch architecture. In the coming slides, I will explain the challenges we faced, and that will make it clear why we needed two different approaches for batch and online.

The key point to remember is how we measure each. On the online side, when a change happens on the mainframe, the replicated change here should lag by milliseconds, not even seconds. On the batch side, it's not about latency, it's about elapsed time.

If batch finishes on the mainframe at time X, then within minutes I should be done replicating all that data on the cloud side. Those are two different measurements, and that's why we needed two different architectures.

Coming to the next slide: what challenges did we face? Number one, you heard about COBOL copybooks; Tendü just mentioned that large COBOL copybook, and it's a similar problem everywhere. COBOL is a beautiful language: it takes the same storage and redefines it a hundred different ways, and then it gets sliced and diced in all sorts of ways.

The way you understand the data is itself a program in COBOL, right? That's one challenge. On top of it, the legacy systems were written to be I/O-friendly, and, at least on our side, they use assembler programs to compress the data and store it in the file structures.

And when you want to read it, you have to uncompress it first. That was the first hurdle we faced.

The second is that with real-time transactions, CICS and so on, the transactions happen in a particular order, which raises sequential and transactional consistency: how do we maintain it?

The third one is batch, which we just explained: a huge amount of data is created in a short time frame. How do we cope with the elapsed-time requirements we have?

The last one is that some legacy systems on the mainframe are designed with dual images: batch operates on one image and online operates on the other, and at the end of the day, once batch is done, they switch. How do we mimic that switchover and cope with it from the cloud replication perspective?

OK, going into the first one, the compressed structures. This is very interesting. Like I told you, there are a lot of assembler programs that take the byte stream of the data, compress it in a particular way, and throw it into the file. When you read that data back, the same assembler program reads it and puts it into the different parts of the COBOL program so the COBOL programs can understand it.

Keep in mind our aim here: we are using a CDC tool that sits underneath the data structures these programs run on, so we cannot use those same programs to uncompress the data; we only get the byte streams. We had to run the decompression on that data ourselves, which meant going back to understand how the assembler programs compress the data into the storage files and replicating the same logic on the cloud side.

The reason we chose Precisely's CDC is that, on the apply side, you can write your own string-operation scripts to slice and dice the information, map it to the different COBOL copybooks, and convert it, finally emitting it from the Precisely apply step in the target format. That helped us solve this issue.
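
Citizens' compression is site-specific assembler logic, so the sketch below only shows the flavor of the decoding problem: interpreting raw mainframe bytes on the apply side, here an EBCDIC character field and a COBOL COMP-3 (packed decimal) field. Field layouts and scale are illustrative.

```python
from decimal import Decimal

def decode_ebcdic(raw: bytes) -> str:
    """Decode an EBCDIC (code page 037) character field and strip trailing blanks."""
    return raw.decode("cp037").rstrip()

def unpack_comp3(raw: bytes, scale: int = 2) -> Decimal:
    """Unpack a COBOL COMP-3 packed-decimal field: two digits per byte,
    with the last nibble carrying the sign (0xD means negative)."""
    digits = []
    for b in raw[:-1]:
        digits.append(str(b >> 4))
        digits.append(str(b & 0x0F))
    digits.append(str(raw[-1] >> 4))
    sign = "-" if (raw[-1] & 0x0F) == 0x0D else ""
    return Decimal(sign + "".join(digits)) / (10 ** scale)

# 0x12 0x34 0x5C packs the digits 12345 with a positive sign -> 123.45 at scale 2
print(unpack_comp3(bytes([0x12, 0x34, 0x5C])))    # Decimal('123.45')
print(decode_ebcdic(bytes([0xC2, 0xD6, 0xE2])))   # "BOS"
```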

Next is sequential and transactional consistency. This one is very interesting, and it mainly applies to online transactions: how do we maintain these properties? Before we go into how we solved it, why is it important? For transactional consistency, in any financial system you see two things: an account, which has a balance, and a transaction, which tells you how the account balance changes.

These two data points are stored in independent structures, but the way consistency is maintained is that a single posting transaction makes two operations: it inserts or modifies a row in the transaction table, and it updates the balance on the account side. When we stream this data to the cloud and replicate it into MongoDB, we have to make sure we never apply the account change without the corresponding transaction data being present, and we cannot have the transaction present without the account update.

If we don't maintain that consistency, what it looks like is: my balance is updated, but I can't see the transaction that corresponds to the balance update, or vice versa. If we display that online, we look like kindergarteners who don't know math, right? We don't want that, and that's why we need to maintain it.

The second is sequential consistency. Across transactions, if a field changes from A to B and then from B to C, you cannot process those changes in a different order; you have to process them in the same order, otherwise the data on the cloud side ends up diverging from the source. And within a transaction itself, there is a sequence of operations.

It gets very complicated in legacy systems: in order to update a field, you probably delete it first and then insert it. If you perform those operations in a different order within the transaction, you end up losing data, and you don't want that. To solve this, Precisely Connect, and the AWS side as well, let you maintain the commit stream.

Basically, we can understand how the mainframe commits the data and replicate the same commit stream in the same sequence on the cloud side, and the order of operations within a commit unit of work is maintained. That's how we solved this. This is only a problem for real-time replication; for batch it's a different story, so let's come to batch.
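
Here is a minimal in-memory sketch of that idea: changes are grouped by commit unit of work and applied together, in log order, so an account balance never lands without its matching transaction. The log format and data shapes are invented for illustration.

```python
# One change = (commit_id, table, operation, key, values); list order is the log order.
log = [
    (1, "TRANSACTION", "INSERT", "txn-001", {"account": "42", "amount": -250.00}),
    (1, "ACCOUNT",     "UPDATE", "42",      {"balance": 1273.75}),
    (2, "TRANSACTION", "INSERT", "txn-002", {"account": "42", "amount": -50.00}),
    (2, "ACCOUNT",     "UPDATE", "42",      {"balance": 1223.75}),
]

accounts, transactions = {}, {}

def apply_commit(changes):
    """Apply every operation of one commit unit of work together and in order,
    so the account dimension and the transaction dimension stay in step."""
    for _, table, op, key, values in changes:
        target = accounts if table == "ACCOUNT" else transactions
        if op == "DELETE":
            target.pop(key, None)
        else:
            target[key] = values

# Group by commit id; Python dicts preserve insertion order, so the original
# commit sequence is kept.
commits = {}
for change in log:
    commits.setdefault(change[0], []).append(change)

for commit_id, changes in commits.items():
    apply_commit(changes)

print(accounts["42"], len(transactions))   # balance and its transactions move together
```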

For batch, we are not operating on a transactional basis. We let the mainframe process do its bulk data changes on its side and let everything run. What we take is the net amount of change the batch made, through a file-comparison approach: Precisely has a separate tool for this, a differ, which compares today's image of the data in the file against the previous image and creates a net change.

When we transmit the net change to the cloud side, we don't need to maintain the transactional sequencing, because each key changes only once, whether it was inserted, updated, or deleted. So you can replicate it much faster on the cloud side, because you are not replaying every transaction the mainframe did; you are only applying the net change that occurred.
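
A toy net-change diff over two keyed file images, just to show the principle; real record layouts and the Precisely tooling are considerably more involved.

```python
def net_change(previous: dict, current: dict) -> dict:
    """Compare yesterday's and today's file images keyed by record key and
    emit one net operation per key, no matter how many times batch touched it."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = [k for k in previous if k not in current]
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return {"insert": inserts, "update": updates, "delete": deletes}

yesterday = {"42": {"balance": 1000.00}, "17": {"balance": 55.10}}
today     = {"42": {"balance": 1012.30}, "99": {"balance": 250.00}}

print(net_change(yesterday, today))
# {'insert': {'99': ...}, 'update': {'42': ...}, 'delete': ['17']}
```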

On the batch side, this is also where Kafka helps. While batch is going on, we don't process the online stream, and we can use Kafka in such a way that we create partitions to run in parallel: instead of one partition, we can use 100 partitions and perform the same apply operations 100 ways in parallel.

You split the mainframe file into blocks, and each block of the file is transmitted to a different partition. That's how we achieved the batch throughput.
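
A sketch of the partitioning idea, assuming a stable hash of the record key decides the partition: work spreads across (here) 100 partitions for parallel appliers, while all changes for the same key stay on one partition and therefore stay ordered. The partition count and keys are illustrative.

```python
import hashlib

NUM_PARTITIONS = 100   # illustrative; sized to the batch elapsed-time target

def partition_for(record_key: str) -> int:
    """Stable hash so every change for the same account lands on the same partition."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

net_changes = [("42", "UPDATE"), ("17", "DELETE"), ("99", "INSERT"), ("42", "UPDATE")]

partitions = {}
for key, op in net_changes:
    partitions.setdefault(partition_for(key), []).append((key, op))

# Each partition's list can now be applied by a separate consumer in parallel.
for p, ops in sorted(partitions.items()):
    print(p, ops)
```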

The last one is very interesting, and we are still debating how to do it smoothly. We figured out one way, but we are still experimenting with it because of the dual images: should we maintain dual images on the cloud side too, with batch operations going to the batch data and online operations going to the online data? We also need to take an outage to do the switchover.

We have to make sure that on our side the cache is available while the mainframe is out, while the mainframe is in its dark period; only when the mainframe is available again can we take our own downtime. Timing all of this gave us a lot of challenges. We figured out a way at a smaller scale, and we are still fine-tuning it to avoid any downtime on the cache side.

OK, so the moral of the story is that we achieved faster response times, increased our availability, can serve transactions at a higher TPS, and reduced operating cost. On response times, to give you an example: in one use case, my legacy path making a call to the mainframe used to take 600 to 700 milliseconds; now it takes less than 100 milliseconds. From a TPS perspective, we can easily achieve 200 TPS for inquiries on the cloud side, and if we want, we can horizontally scale the cloud APIs.

On reducing operating cost: since we are not making the trip to the mainframe and executing a mainframe CICS transaction, we save that cost. And now that we are extracting the data out of the mainframe using the CDC tool, the business can think of a lot more use cases.

For example, say I made a transaction at XYZ. Now that I know about it immediately, I can create a process that sends some kind of coupon to the customer saying, if you do this, you will get X amount of benefit. There are a lot of opportunities the business is thinking of once we can listen to the changes happening on the core data, and those implementations will come in the future. Raul can explain better what they are planning and how it unlocks many more opportunities. Thank you, Babu.

There is no compression algorithm for experience and for learning from real-world implementations, so thanks, Babu, for sharing the lessons learned.

So far, we have spent some time learning about data replication and how it solves business problems. But from an AWS perspective, the AWS Mainframe Modernization service supports all the patterns you need to modernize your mainframe. It doesn't matter whether you choose to keep the mainframe and augment on top of it, as you see on the right side of the screen; we already talked about Precisely for augmentation.

We also support file transfer capabilities: what we launched yesterday as part of this re:Invent is the BMC AMI Cloud file delivery capability. You can go to your AWS console, it connects to the mainframe, and you can scan all the mainframe files available from the AWS console. You can transfer files and schedule file transfers directly from AWS without even having to access the mainframe.

So we provide that capability, which allows you to unlock file delivery and start building your analytics use cases. If you choose to modernize and migrate your applications off the mainframe, we provide two capabilities. One is replatforming with Micro Focus, which keeps your COBOL but replaces the operating system, databases, and everything else. And if you want to really modernize the entire software stack when you move the application off the mainframe, say replacing your COBOL with Java and your JCL with Groovy scripts,

we have an automated tool, AWS Blu Age; Blu Age was acquired by AWS two years back and is fully incorporated into the AWS Mainframe Modernization service, and it accelerates your modernization. Some of our customers are doing rewrites, some are buying COTS products off the market, and we support all of these patterns, whatever your choice; they are all part of the AWS Mainframe Modernization service.

What differentiates what AWS provides for mainframe modernization? We categorize it into three pillars. The first is the product: we launched the Mainframe Modernization service right out of the console, which provides fully managed runtime capabilities for analyzing, developing, transforming, testing, and running the applications.

Then the people: we have the largest concentration of experts within AWS who understand both mainframe and cloud technologies and who help you solve your problems. And finally the process: together with many of our partners, we pledged to train 5,000 partner engineers this year to support our mainframe modernization acceleration programs, and we also have the MAP program, which now covers mainframes and provides investments to accelerate your modernization.

These are the eight new announcements we made as part of re:Invent. The first one is application testing: if you want to migrate your application, you have to capture how the application runs on the mainframe and replay that on AWS, so we provide an application testing service to accelerate your testing.

We already talked about the BMC file delivery capability, which is part of BMC AMI Cloud. We also partnered with Precisely, which we just spent a whole hour learning about. We introduced the UniKix product as part of the replatform offering, which comes from NTT DATA. And the last one is that we are introducing artificial intelligence across the mainframe modernization life cycle, whether it is assessment, transformation, business rules, reverse engineering, or testing, and finally automated validation of security.

We are happy to connect with you and showcase demos of how artificial intelligence accelerates modernization throughout the life cycle, and we can spend some time talking about it; please catch up with me right after this. There were many sessions throughout re:Invent, and I know this is the last day and one of the last sessions, but I hope you caught the ones you wanted; if not, they will all be available on YouTube pretty soon. We would also like to get your feedback in the survey. We have five minutes, so if anybody has any questions, we are happy to take them; otherwise, we are also available right at the end.

余额充值