Bringing workloads together with event-driven architecture

My name is Viraj Mahapatra and I'm a Principal Serverless Specialist SA with AWS. I mainly focus on financial services customers. I have good experience with Java, and then I moved on to Groovy and Grails - I treated myself as an SME in Groovy and Grails. At AWS, these are the three things I focus on the most: building EDA applications for industry use cases, running Java effectively on serverless, and, very recently, event-driven patterns for generative AI.

If you want to follow me on LinkedIn or X, these are my handles, and you'll get more information towards the end. But yeah, that is me.

What we are going to cover today - I'm fortunate to have Nick with me, who is going to introduce himself in a bit, from JP Morgan Chase. He's going to talk about merchant services and how event-driven architecture has simplified the whole setup.

Then we'll switch gears and talk about EDA as a concept. We'll see some other use cases that apply to different industries.

And then, most importantly, some takeaways - how you can apply some patterns with containers - and then we'll summarize our learnings. I'll also give you some resources you can start with on this topic.

So do you want to introduce yourself Nick?

Yeah, so like you said, I'm Nick Stumpo. I have been at JP Morgan for a little over 12 years now, started as a developer - C++, Java, Scala, all that jazz. I do think a lot for work, so in my spare time, I like to work with my hands - brew beer, repair watches, play board games. I do have a full event driven architecture to automate my beer brewing. So if anybody wants to talk about that later, I'm happy to do so.

I don't have an X account, so sorry, you can't follow me there, but my LinkedIn name is up there if you're curious.

So I'm here today to talk about merchant services. For those of you that don't know what merchant services is - we're a merchant acquirer, which is part of the process of paying with a credit card and other forms of payments.

Something I've learned when giving these talks is that it's very important to go through who the major entities are and how credit cards work, because it's not a concept everybody knows, even though almost everybody uses one every day.

So who are those entities first? You have the consumer - that's you, that's me, you're the person that wants to buy something. You also have a bank - if you're using a credit card, you do at least. And that bank has relationships with payment networks - Visa, Mastercard, Discover, Amex, etc. They can issue credit cards that you can then use at merchants.

Merchants are people that sell you things - think Amazon, Walmart, Google, Apple, places you go to buy things.

And then finally you have the merchant acquiring bank - that's us. For those of you that do know this space, I understand this is a simplification - don't hold me to it - but it'll work for the presentation.

So thinking about the flow - you as a customer want to buy something, so you go to your bank and say "Hey, I'd like to have a line of credit because I don't want to use cash every day." So you go to your bank, they have a relationship with card networks, they create a credit card number for you with a limit and everything you're used to seeing when you log into your bank's website.

And then you can go once you have that card to the merchant and the merchant can give that to the acquiring bank in order to make a payment with the networks.

So today we're gonna talk about that bank. As I said, we are part of JP Morgan Chase & Co. Payments is our line of business, specifically we're the largest merchant acquirer in the United States. We also are the largest e-com acquirer in Europe.

We process more than 5,000 transactions per second, sustained, all day every day. This weekend was way more than that - it was a huge weekend, kind of a nail biter for us.

And we process well over a trillion dollars in payments this year, we actually just passed two trillion I think in September or October.

So while that's fantastic, the majority of those payments are running on mainframes - we have two different mainframe systems, not gonna get into it, but the majority are running through that. It's on a 30 plus year old codebase. Everything's batch, and that makes it super hard to introduce change, it makes it super scary to introduce change.

And we wanted to alleviate that. So luckily, my bosses agreed a couple years ago, we got together, started a plan to try to modernize that and go through an event driven architecture as a way we thought we could do that.

So let's talk about what we need - we needed a global solution. Today, those mainframes are primarily in the United States, a lot of our customers are growing and it's becoming a global economy. We need to meet our customers where they do business.

It also needed to continue to be safe, secure, reliable - there's nothing worse than when you're trying to buy something, you put your credit card in and it doesn't work, people hate that. And JP Morgan is known for keeping that up, so we wanted to continue that.

It also needed to be configurable, and it needed to be an evolutionary solution - the interesting thing about payments is that it changes all the time; it's an area ripe for disruption. Every week, it seems like there's a new wallet you need to accept, a new thing like Klarna or PayPal or Alipay or Paytm or whatever it is.

People want to pay with them, the merchants don't know what they want, but they know as soon as somebody wants to pay with it, we need to be able to take it. So that's what we had to build.

Let's talk a little bit about a use case. Let's first of all, put those entities back up that we talked about. What we're gonna do here is kind of walk through a use case and see some events and see some of the architecture we have.

So let's say I'm Nick, which I am, and it's the holiday season, which it is, and I wanna buy something for my friend. So I go to my favorite online retailer and I find something great that's $100.

What do I do? I give my credit card to the merchant and they do what's called an authorization for $100. So they call us, we're the acquiring bank, they say "Hey, is this good for $100?"

We turn around and figure out ok, which network is this associated with? We mark that we've got that payment received, and we go ahead and call that network who turns around and calls your bank. And this is what's authorizing the transaction - you'd see this when you log into your credit card portal and you see like pending transactions or authorized transactions.

We're not actually moving money yet, we're just making sure you have enough open to buy in order to complete the transaction. Assume I do, so they approve the payment, we then mark that approved, we tell the merchant "Yup, Nick's good for that money" awesome.

They give me a little thing that says "Hey, your order is shipped." As soon as they ship that order though, they want the real money, they don't just want the promise of money.

So what do they do? They initiate what's called the payment capture process. That is they call us again and say "Hey, I'd like to actually do this payment because I'm shipping the goods out to the user."

So we say "Hey, payment's captured, awesome." That kicks off a series of steps that help us integrate this new event system with the more batch-based systems that a lot of the payments world still runs on.

So we can use AWS components like Glue and Lambda and other serverless things, along with those Kubernetes based applications to continue this process and build some files.

So we build those files for this, let's say I'm the only person transacting in the world today, so we're gonna give a file to the network and say "Hey, we need $100." They're gonna say "Cool, give that to the card issuers, they need $100."

And this is where it gets a little interesting, because this is where people start making money. Your points, your cash back, your whatever benefits you have on your credit card, those aren't free because you're getting money from somebody. There's risk that the issuing bank is taking with you in order to extend a line of credit, etc.

So for that, they take their cut, let's just say $2. Then the networks, which think they provide a ton of value as well because they have a whole relationship with the issuing bank, say "Ok cool, we're gonna give you $97."

So we get $97 back from this. When we do, we have another serverless component that's gonna kind of get us back into our event driven architecture and say "Thanks, payment settled, awesome."

I see that I'm getting $97 back from the payment network. Finally, those merchants actually want their money, so we have a funding system that drops us back into Kubernetes that's saying "Ok cool, this payment's settled, I can go ahead and pay that out."

We too want our cut, so we're gonna give them say $96.50 back, again fake numbers, but you get the idea. And then finally it's not shown here, but you would go and pay your bank back the full $100, and that kind of closes the cycle.

So what I didn't talk about is that you saw events going off to the side, and some of them are driving our differentiating flow - payment captured directly started clearing, payment settled directly drove funding.

But there's a whole host of other services that do care about the additional events. So let's take an example of that.

We get a payment received - that's cool, but we might also want to report on that so that our merchants can see how much they sent us, whether it got approved or declined or whatever.

Payment approved - also good, but now maybe risk cares, or data and reporting. Data and reporting cares about everything.

Another one - payment captured, we saw this driving our main flow, but we also might want to do notifications, we might want to start billing and pricing based on that, there might be value added services they're getting that are going to be keyed off of that.

And then finally, payment settled - here again driving funding, but I think you saw it go to billing and pricing and to fraud as well. It's not a one-way street, though; it doesn't only flow from our main flow. You could have another domain come in, such as client onboarding, which says, "Hey, we have a net-new merchant that none of you knew about before - go ahead and record that onboarding, and understand how they authorize." So that comes in, and then all these other services can learn about it.
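To make those lifecycle events concrete, here's a hypothetical TypeScript sketch of what one of these payment events might look like - the field names are illustrative, not JPMC's actual schema:

```typescript
// A hypothetical shape for a payment lifecycle event - illustrative only.
interface PaymentEvent {
  eventType:
    | "payment.received"
    | "payment.approved"
    | "payment.captured"
    | "payment.settled";
  paymentId: string;
  merchantId: string;
  amount: { value: number; currency: string };
  occurredAt: string; // ISO-8601 timestamp
}

// The capture event that kicks off clearing, and that billing, pricing,
// and notifications can independently subscribe to.
const captured: PaymentEvent = {
  eventType: "payment.captured",
  paymentId: "pay-0001",
  merchantId: "merch-0042",
  amount: { value: 100, currency: "USD" },
  occurredAt: new Date().toISOString(),
};
```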

Cool. So we have this cool new architecture - but does it work? Yes. Yes, it works, and we're happy about that. That's why we're here talking. But I'm not gonna lie to you, it's been a bumpy road. Events can get out of hand very quickly. We were super chatty, and we've had to dial that back. We were duplicative, our events got big, and it became tough to wield, so we had to really rein that in. It's a process of understanding your state machines, understanding which events are truly important and which things read models can take care of - important questions to answer. But in the end, we have a system that's more observable than anything we've ever had. We have a system that's fault tolerant - I don't want to say more fault tolerant than we've ever had, because we have a very fault tolerant system, but it's fault tolerant. And we have a solution that can handle the scale of JP Morgan Chase on the cloud, which we honestly weren't sure would work. But we've seen it work, and that's amazing.

Our event-driven architecture allows us to add new products - those new methods of payment we talked about - quicker than we ever could before on the old systems, in modern languages that let us move faster with less risk. Additionally, other areas of Chase that would never have gotten our data before can start listening to these events, and we can create new synergies with other lines of business.

So I'd like to finish up here with a quote from my boss - or my boss's boss, I guess. I'm not gonna read the whole thing to you, but I did want to highlight what's sitting there in yellow. It's good that a product leader can recognize this: when you have thoughtful engineering, and you're using the tools that people like AWS can provide to you, you can rethink what's possible and do things you didn't think you could do.

So with that, I'm gonna turn it back over to Viraj, and he's gonna talk a little bit about another use case.

Thank you, Nick. That was fantastic, and thank you for showing us the journey of merchant services.

So now let's take a step back and think about why event-driven architecture is necessary. This is not a new concept altogether - event-driven architecture has been around. What we're trying to focus on today is how you can get those niceties and best practices while operating on cloud native AWS services. But before we go there, let's take a few minutes to visualize the same merchant services flow Nick just talked about. How would it look if you did it in a traditional synchronous architecture, without using any events?

It would look something like this. You have merchant services, which is talking to the networks - and, through the networks, to the card issuing companies - but also talking to clearing services, settlement services, and risk and fraud services. All of those communications happen synchronously. So what problems might come up? Most critically, you'll see that this kind of architecture creates tight coupling between systems. What we saw in the previous slides that Nick showed was agility: when new domains need to interact with the entire service, they can just plug into the event broker, which was Kinesis Data Streams in the previous example. But in this case you have tight coupling, so the merchant service has to know exactly which services must be called, and in what order, to process a request - that's called temporal coupling.

On top of that, you have multiple points of failure. Let's say the settlement service used to send back a response in 200 milliseconds, but in one particular sprint some additional code was added to that domain and now it responds in one second. Now the entire system has to slow down - merchant services was performing better, but it now has to perform the way the settlement service performs. So you get varying degrees of performance across services. You also add external dependencies as you grow the system and the business - more boxes get added here, and that hampers your total productivity. At the end of the day, it all comes down to user experience, and this will drive a bad user experience.

What we saw in Nick's presentation was a reliable, resilient, and independently scalable system. If I redraw the diagram from the previous slide, the main piece that comes into the picture when we talk about event-driven architecture is the event broker. The event broker that you see here is responsible for decoupling the producers from the consumers, and that's where you get the flexibility. There is a trade-off, though: with synchronous architecture, when a client calls, you are bound to send a response immediately. But in event-driven architecture, you have to think a little differently - you have to think about asynchronous communication.

The beauty here is that the merchant service can just produce an event to the event broker, and whichever consumer system or domain wants to consume that event can consume it and then send an event back to the broker if necessary. Any kind of notification that has to go to the end user goes through a different domain - maybe a notification service using WebSockets, IoT Core, or email - and overall the end user experience will be better. We also saw that as soon as you add a new domain, like customer onboarding, you are not touching any other domain. You get good isolation while adding new features.

So as a whole, what is the nature of the event-driven architecture we touched on? Again, as I mentioned earlier, you have to think a little differently: events are asynchronous in nature. The moment we start thinking about the asynchronicity of events, rather than the traditional synchronous way of communicating, it becomes easier to grasp event-driven architecture. At the core of event-driven architecture you have event routers or event brokers - the core component that decouples your consumers from your producers, so the producer can send events at its own pace and consumers can consume those events at their own pace. And most importantly, when you have spiky workloads - when a producer is producing events in a large spike - you need some kind of event store on the consumer side to make sure it doesn't overwhelm the consumers. Amazon SQS is one AWS example you can use as an event store on the consumer side: it buffers the messages until the services are available to pick them up and work on them. So three core things: asynchronous events; event routers or event brokers as the central nervous system of event-driven architecture; and event stores wherever a spiky workload could overwhelm a consumer. On top of that, let's compare and contrast.
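As a minimal sketch of the producer side - assuming an EventBridge custom bus and the AWS SDK for JavaScript v3, with the bus name, source, and event fields invented for illustration - publishing a custom event looks roughly like this:

```typescript
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const client = new EventBridgeClient({});

// Fire-and-forget: the producer publishes at its own pace and has no
// knowledge of which consumers (if any) are subscribed.
export async function publishClaimRequested(customerId: string): Promise<void> {
  await client.send(new PutEventsCommand({
    Entries: [{
      EventBusName: "claims-insurance-bus", // assumed custom bus name
      Source: "fnol.api",                   // assumed source identifier
      DetailType: "Claim.Requested",
      Detail: JSON.stringify({ customerId, filedAt: new Date().toISOString() }),
    }],
  }));
}
```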

When we talked about the traditional, synchronous way, the degrees of freedom you have are relatively low because of the tight coupling. If you want to add a new component, you have to test every system that's already there, even the ones you didn't change. And the interactions are directed commands. For example, if you say "please send an email confirmation," the caller has to wait until the email confirmation is sent before it moves forward - you get an acknowledgement or a response back, and then the next step happens. These are all directed commands, and you lose degrees of freedom.

What you get with event-driven architecture is observable events. Think about the example of filing an insurance claim. A producer says "customer X has just filed a claim," and that's the event that goes to the event broker. I can have multiple consumers from different domains subscribe to that type of event and work simultaneously. One entity can say, "let me go verify the policy details." Another can go through the documents uploaded by the insured. Another can take care of notifications and future communications with the end user. They all work independently, so they don't have to know there's a different consumer sitting next to them working on the same type of event. If in the future I want to add a fraud service - something to check the filed claim for fraud - I can simply add it as a different domain. And the beauty of this is that neither the producer knows a fraud domain has been added, nor do the other three consumers know a fraud domain is working on the events. So you can imagine the isolation you get with event-driven architectures.

Now, talking about insurance claims processing - let's see with a demo how this whole setup works, and how we can bring different types of workload together in this example. Then we'll move forward.

So traditionally, when you go for insurance, you get quotes from insurance companies. Once you're happy with a quote, you register online via a web application or your device, and you provide your personal information - driver's license, email address, and social security number. Optionally, you're asked to upload an image of your driver's license or of your car. This is the registration process, where the insurance company validates that your information is correct: they do a KYC process, validate the email, and check that all of the provided information is good. Then they onboard you as their customer. So now you are insured.

However, the reason for having insurance in the first place is to handle the worst-case scenarios in our lives as swiftly as possible. So let's assume one day you're driving to work and you're suddenly in an accident, and now you have to deal with the whole claims process. First of all, once you've recovered from the incident and you're safe, the next set of steps starts: you may file a police report, you capture the information of the other party, and so on. Finally, once you've captured all of that information, including images of the damaged car, you file a claim - and that process is called First Notice of Loss. Anybody from the insurance domain already knows what FNOL is: First Notice of Loss. That is where the initial handoff happens. You hand all your details to the insurance company, saying, "Hey, I'm filing my claim, here are all the details of the incident - let's work on that claim."

So let's see in practice how it happens. I have a sample application here which is pretty simple - plain vanilla, it doesn't cover everything, but it has enough to showcase how you onboard a customer, how documents get uploaded, and how the whole claims process works. It's a simple form where I've preloaded some of the information already: first name, last name, email address, a dummy SSN, the address, and the make and model of the car I'm going to insure. Then I submit that information to onboard. So let's submit this.

Now we got a response back from the backend. It looks like a synchronous response, but actually it is an event that has come back from the backend, via a notification service, to the browser. It says the customer is accepted - all of the provided information looks good - so you are partially onboarded, because you still have to provide driver's license information and so forth.

So let's say I'm in a jolly mood and I want to provide a wrong driver's license - my brother's driver's license - just to test how the system works. I upload that driver's license, it goes to an S3 bucket, and that kicks off different processes. One of them processes the document using Textract: it extracts the information from the driver's license and figures out whether the first name and last name on the license match what you provided in the form. If they don't match, fraud is detected immediately - there's some document fraud happening, and the first name doesn't match.

So you get a notification back in the browser immediately. Let's go and fix that - I have the right driver's license now. The same document processing happens: once you upload a document, it's processed using Textract. Now everything looks good, so no fraud is detected, and I got that event back for the particular customer ID shown in the label.

Now, if you remember the form, I said I have a green car. If I use a car image that is not green, the same thing happens: it processes the document, figures out the color is not green, and says there is document fraud - your car color doesn't match. But let's skip that part; I'll upload a green car image. Any time I upload any kind of document, whether it's a driver's license or a car image, it goes through document processing. This time the car color has been detected as green, and no fraud is detected.

In order to detect the color of the car, I'm using Rekognition Custom Labels, where I have trained a model to figure out the color of the car when I upload an image. So it figures out it's green.
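As a rough sketch of that detection step - assuming a Custom Labels model has already been trained and started, with a hypothetical model ARN and label names:

```typescript
import {
  RekognitionClient,
  DetectCustomLabelsCommand,
} from "@aws-sdk/client-rekognition";

const rekognition = new RekognitionClient({});

// Run the trained Custom Labels model over the uploaded car image.
// CAR_MODEL_ARN and the label names are assumptions about the demo's model.
export async function detectCarLabels(bucket: string, key: string): Promise<string[]> {
  const result = await rekognition.send(new DetectCustomLabelsCommand({
    ProjectVersionArn: process.env.CAR_MODEL_ARN!, // trained model version
    Image: { S3Object: { Bucket: bucket, Name: key } },
    MinConfidence: 80,
  }));
  return (result.CustomLabels ?? [])
    .map((label) => label.Name ?? "")
    .filter(Boolean); // e.g. ["green", "bumper-dent"]
}
```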

Now this is all good - the customer has been onboarded. Sometime in the future, you have to file a claim because there was an incident. When you file a claim, this is the information you provide - and it's not limited to just this; a First Notice of Loss can include much more. But this captures some of the details: where did the incident happen, was it a rear-end collision or something else, the number of passengers, was a police report filed, and the information of the other parties involved in the incident. When everything looks good, you submit the claim.

If there is any error, or you provide data that doesn't match, you'll get a claim rejected event. But this time, since we have all good data, we got a claim accepted event back in the browser.

Now my job is to upload images of my damaged car. I mentioned I have a green car - if I upload a damaged car that is not green, I'll get a document processed and a fraud detected message. But let's go ahead and upload the green car with a bumper dent. Again, when I upload this document, Rekognition Custom Labels figures out that there is a bumper dent - it's not hard coded; it's Rekognition Custom Labels figuring out that the color is green, the damage is a bumper dent, and no fraud was detected.

One thing that happened very quickly, which we didn't see right away: as soon as no fraud was detected, I immediately got two events - one is settlement finalized, and the other is vendor finalized.

So when everything looks good with my claim and the damaged car information is good, I can immediately fire an event where settlement can come up and say, ok, everything looks good, let me go settle your claim and figure out whether you have to pay $100 or $200 out of pocket - the rest of the payment we can take care of.

Now, once settlement is finalized, nobody asks the vendor domain - nobody directs a command saying "now go and finalize my vendor." The moment settlement is finalized, another domain wakes up: the vendor domain, which sees that the settlement has been finalized and says, let me go do my work - that is, finalize a rental car vendor. We'll dive deeper into each of those domains.

So while your car is being repaired or is in the shop, you'll need a rental car to get around. The vendor system makes sure that you have a rental car in place.

So that's the whole demo of how this event-driven architecture works. Now we'll jump into the underlying architecture to show how the choreography of events actually happens. That way it will be pretty clear, and then we can go from there. Ok?

Back to the presentation. What we saw is this architecture. The blue boxes are the domains that actually did the work. The red box is the event broker - in this case, I'm using EventBridge. And you'll see some dotted boxes on the left, which are the APIs: the sign-up API, the FNOL API used to file a claim, the claims API and customers API to read that information back as a GET request, and the notification service to send notifications back.

When we started with the sign-up process, that was a POST call that went from my browser to an API, which got converted to a customer submitted event. In my event broker, EventBridge, I have a rule that says: any time you get a customer submitted event, send it to the customer service. The customer service works on it, and you get a customer accepted event back.

Once the customer accepted event arrives, the notification service sees it and sends it back to the end user, so they know the customer was accepted. That's the blue card that came up the moment I filled out the form and submitted it - that's the notification sending back the event type. Then we uploaded some images: a driver's license and a car image.

The moment we uploaded an image, I got an object created event from S3 that triggered the document service. We did all the processing with Textract and figured out whether we have the data or not. Once the document processed event was emitted by the document service, you saw two lines go out: one to the fraud service, the other to the notification service.

So nobody asked the fraud service to go check for fraud - the fraud service was waiting for that document processed event so it could immediately wake up and start working on fraud determination. This time, with the wrong driver's license, we got a fraud detected event, and we notified the customer.

After fixing the images, we filed a claim - that's the FNOL API call - and a claim requested event was emitted. The claim requested event goes to the claims service, the claim was accepted, and we notified the customer.

Notice one thing: since I'm using EventBridge, I have rules defined so I can selectively target a particular domain based on the event type. With some other brokers, you might have to call every consumer, and each consumer has to bail out, deciding "I need this event" or "I don't." But with EventBridge, the rules define whether I should call the customer service, the claims service, or the document processing service.

Then we uploaded images of the damaged car. The same process happened with the document service, and this time it again checked whether there was any document fraud. We did not see any document fraud when we uploaded the car image, so fraud was not detected - and you saw that the settlement service immediately consumed that event and then published the settlement finalized event back to the system.

What you don't see here is the vendor service and what it had in place - we'll talk about that later. But at the core, the event broker, Amazon EventBridge, was in play to choreograph those events.

With Amazon EventBridge, you can have events coming from AWS services or your own custom events - in this case, I used custom events - or you can get events from your SaaS applications via a partner event bus.

Talking about buses: EventBridge is based on buses. You have the default event bus, custom event buses, and partner event buses. In my example, I used a custom event bus called the claims bus - or claims insurance bus.

And I was talking about rules: I have defined rules based on event type. If I see a customer.submitted event coming in, the target should be the customer service. If I see a claim.requested event coming in, I define a rule that the target should be the claims service. So I can selectively deliver to the consumers that actually take those events and work on them, without anyone having to bail out.
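As a minimal CDK sketch of those rules - assuming a stack where customerServiceFn and claimsServiceFn are Lambda functions defined elsewhere, with the bus and detail-type names modeled on the demo:

```typescript
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";

// Inside a CDK stack: one rule per event type, each delivering only to
// the domain that owns it.
const bus = new events.EventBus(this, "ClaimsBus", {
  eventBusName: "claims-insurance-bus",
});

new events.Rule(this, "CustomerSubmittedRule", {
  eventBus: bus,
  eventPattern: { detailType: ["Customer.Submitted"] },
  targets: [new targets.LambdaFunction(customerServiceFn)],
});

const claimRequestedRule = new events.Rule(this, "ClaimRequestedRule", {
  eventBus: bus,
  eventPattern: { detailType: ["Claim.Requested"] },
  targets: [new targets.LambdaFunction(claimsServiceFn)],
});
```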

And then we have a bunch of targets - it can be any of those services, and there are more; you can definitely check out the EventBridge targets documentation.

Now, let's dive deep into one of those domains where we use a Lambda function as the consumer - that's the compute I chose for the claims service. When the FNOL API submitted a claim requested event to the event broker, the claim requested event actually got sent to an Amazon SQS queue - that's the EventBridge target. If you remember the slide where I talked about the nature of event-driven architecture, we had event stores. The claims service is a good place to put an event store, because when catastrophic events happen - an earthquake, a flood, or multiple car damages during winter and icy conditions - you'll see a spike in claim applications. That's where you need an event store, so the Lambda function behind it doesn't get overwhelmed.

When a spike of traffic comes in - you might not need this SQS queue as a buffer for onboarding, though you could use it there too - the claims service makes a good example: if you have a spike in traffic, you can have SQS as the buffer, and the Lambda function can poll from that SQS queue at its own pace and work on it. That's what the claims service does. Once it's done, it sends a claim accepted event or a claim rejected event, based on the information you provided.
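A minimal CDK sketch of that buffering pattern, reusing the hypothetical claimRequestedRule and claimsServiceFn from the earlier sketch:

```typescript
import * as sqs from "aws-cdk-lib/aws-sqs";
import * as targets from "aws-cdk-lib/aws-events-targets";
import { SqsEventSource } from "aws-cdk-lib/aws-lambda-event-sources";

// EventBridge -> SQS -> Lambda: the queue absorbs a spike in claims so
// the function can poll at its own pace instead of being overwhelmed.
const claimsQueue = new sqs.Queue(this, "ClaimsQueue");

claimRequestedRule.addTarget(new targets.SqsQueue(claimsQueue));

claimsServiceFn.addEventSource(new SqsEventSource(claimsQueue, {
  batchSize: 10, // hand up to 10 buffered claims to each invocation
}));
```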

Next: event-driven with AWS Step Functions. Again, talking about different types of workload - when you upload a document to the bucket, we saw the object created event go to the bus, and that triggers a state machine. In this example, I have a state machine that figures out whether it's a driver's license image or a car image. If it's a driver's license image, go down this path and use Textract to go from unstructured to structured data. If it's a car image, go through Rekognition Custom Labels and figure out the color of the car and whether it has any damage. Once all the information is in, the Step Functions workflow reconciles the data and puts it back on the event broker, saying: here is the structured data I got from the unstructured images, with the event type document processed.

Now the downstream systems that are eager to subscribe to the document processed event can work on it - the fraud domain being one of them.

How did I use event-driven with AWS Fargate? Think of the settlement service. Assume the settlement service is one of the domains already running - it's a microservice, and it has been integrating with other domains using HTTP endpoints. So you have an HTTP endpoint, and AWS Fargate runs a Spring Boot application behind a load balancer, backed by a settlement table.

Now, how do I integrate this with the event-driven architecture I've already set up? You go to this team and say, hey, we're running an event-driven architecture, and we want the settlement service to integrate with it so the integration is seamless. They'd say, oh yeah, but I have an HTTP endpoint - how would you integrate with that? Do I have to make any changes on my side? I don't have the resources to make those changes.

Then whoever is maintaining the event-driven architecture for the whole claims process can say: don't worry, we have EventBridge, and we can have API destinations call your HTTP endpoint. Any time a fraud not detected event comes in, we can set our target to an API destination, which is the HTTP endpoint for the settlement service. The only change the settlement service team has to make is to send an event back called settlement finalized, so that the rest of the domains know the settlement has been finalized.
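A minimal CDK sketch of that integration - the endpoint URL and the API-key secret are placeholders, and the bus is the one from the earlier sketches:

```typescript
import * as cdk from "aws-cdk-lib";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";

// EventBridge calls the settlement team's existing HTTP endpoint for
// them - no code change needed on their side.
const connection = new events.Connection(this, "SettlementConnection", {
  authorization: events.Authorization.apiKey(
    "x-api-key",
    cdk.SecretValue.secretsManager("settlement-api-key"), // assumed secret
  ),
});

const settlementEndpoint = new events.ApiDestination(this, "SettlementEndpoint", {
  connection,
  endpoint: "https://settlement.example.com/settle", // placeholder URL
});

new events.Rule(this, "FraudNotDetectedRule", {
  eventBus: bus,
  eventPattern: { detailType: ["Fraud.Not.Detected"] },
  targets: [new targets.ApiDestination(settlementEndpoint)],
});
```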

How did I use event-driven with Amazon EKS? The vendor service in the sample application - the one that arranges the rental car for your claim - is the same kind of setup, but in this case I'm using Amazon EKS, which talks to different car rental services and asks for quotes. If I want to integrate with this, I could very well use the API destination I showed earlier, but I wanted to try something different. In this case, I'll go the SQS route: any time a settlement finalized event is published to the event broker, I want it sent to an SQS queue, and I want the EKS service to poll from that queue. And if you see the small icon on the EKS icon - it's called KEDA. Has anybody heard of KEDA? Yes?

Ok, so I used KEDA. Let's talk about what KEDA does. Any time you have messages in an SQS queue, you want your EKS pods to scale. Lambda scales automatically when you have messages in the queue - that's taken care of by AWS. But when you're running on EKS, you want to scale the pods based on the messages coming into the queue. That's what KEDA does: it's Kubernetes event-driven autoscaling, so you can scale your containers based on events, and KEDA provides built-in scalers for you.

AWS services are present as built-in scalers in that whole setup - SQS queues, Kinesis Data Streams, DynamoDB, DynamoDB Streams, and more - and you can scale based on the number of messages or events in those services. In my example, for the vendor service I used AWS CDK, our infrastructure as code tool, in a way that's similar to how you'd write a manifest file in a Kubernetes setup. I have a cluster - you see it in the first line - and I added a Helm chart for KEDA, plus a manifest: the apiVersion is keda.sh/v1alpha1, and the most important part is the kind I've defined, which is ScaledObject. In the ScaledObject, I set the Kubernetes Deployment to scale - the vendor service - and my trigger is SQS: the queue from earlier decides when to scale my deployment up or down, in this case when there's a queue depth of five messages. So you get event-driven scaling for EKS pods by doing this, and you can model the authentication after the pod's IAM role, so you don't have to build your own authentication for it. The whole point is that your pods can scale not just on CPU or memory, but based on the number of messages in the queue.
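A minimal CDK sketch of that setup, assuming an existing eks.Cluster (cluster) and the SQS queue (vendorQueue) targeted by the settlement finalized rule; chart details, names, and region are illustrative:

```typescript
// Install KEDA into the cluster via its Helm chart.
cluster.addHelmChart("Keda", {
  chart: "keda",
  repository: "https://kedacore.github.io/charts",
  namespace: "keda",
});

// Declare the ScaledObject that scales the vendor-service Deployment
// on SQS queue depth.
cluster.addManifest("VendorServiceScaler", {
  apiVersion: "keda.sh/v1alpha1",
  kind: "ScaledObject",
  metadata: { name: "vendor-service-scaler", namespace: "default" },
  spec: {
    scaleTargetRef: { name: "vendor-service" }, // the Deployment KEDA scales
    triggers: [{
      type: "aws-sqs-queue",
      metadata: {
        queueURL: vendorQueue.queueUrl,   // the settlement-finalized queue
        queueLength: "5",                 // target ~5 messages per replica
        awsRegion: "us-east-1",
        identityOwner: "operator",        // authenticate via the pod's IAM role
      },
    }],
  },
});
```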

Now, KEDA scales the pods. To scale the nodes, you'd need something like Karpenter - so there is a difference between how you use Karpenter and KEDA.

So let's see the benefits of this kind of setup. As a recap, we have an extensible architecture - I literally added the vendor service to this entire system in three days. I already had the sample application; the last thing I did was add the vendor service, and it took me two or three days just to add and test it. I extended the architecture, and if I need to add a few more blue boxes, I should be able to add them without touching anything already here. So it's an extensible architecture. You also get a polyglot architecture: if different teams own different domains and work in different languages, you don't have to ask them to move to a single language of choice. They can keep their languages, as long as they can publish and subscribe to events through the broker. And third-party integrations, which are very common in the insurance industry - how would you do those? With Amazon EventBridge and API destinations, you can call any third-party integration; they do the work, and you provide a webhook so the third party can call back into your system - and then you're back in the game of choreographing events.

So the biggest takeaways: you have a highly extensible architecture based on events, and seamless integration with polyglot teams and domains. You add new capabilities without needing to test the entire architecture - if I want to add generative AI capabilities to my Step Functions in the document service, I can do that without touching any other domain. I can orchestrate inside a domain - for the customer service, I have a Step Functions workflow that does the KYC process and then sends a presigned URL to upload images; that's a whole orchestration using Step Functions. And I can seamlessly integrate with existing workloads like the settlement and vendor services, which are not built on Lambda but on container services like Fargate and EKS. So I can bring different types of workload together and make them work seamlessly using event-driven architecture.

Last but not least, we reduced the area of impact with this setup. When I added the vendor service, I practically didn't test any other system - as long as the vendor service was working, I was happy. Now you'll see a similar setup, a similar application.

Here is Serverless Video. The one I talked about - insurance claims - I built from scratch; our serverless developer advocates have built Serverless Video similarly with event-driven architecture. You can scan this QR code and you'll be forwarded to an app where you can see our DAs and other Amazon folks streaming videos. You can watch those videos on demand, and it's live streaming - if somebody is streaming right now, you should be able to see it. So what it does is live streaming and on-demand video, and while a stream is live you can like the video and send emojis; it's very interactive. There are also multiple capabilities that kick in right after a stream is done - you can go and see what the app does with the video that just completed. Some of those capabilities include the titles generated for each video, which are all driven by generative AI - we'll see how that works. But the most important part of the whole Serverless Video setup is the plug-in architecture. The way I was able to add the vendor and settlement services to my domain, and the way Nick was able to add billing and pricing and risk and fraud to their setup, our DA team added a plug-in architecture where everybody contributed plugins to Serverless Video, and those plugins act on the video that got recorded.

So for example, I can validate whether the video is long enough to watch, or whether it has the right content; whether I should translate the video to a different language; whether I should transcribe it and get some text out of it. All of that is done through a set of plugins. And then I have a choice of whether to use a Lambda function or go to ECS Fargate to process some of the videos, as we'll see.

The high-level architecture is again similar to what we've covered. The boxes on the left are toward the front end; we have an event broker and our backend, which are the boxes in blue; and at the bottom you'll see the extensible plugins. Different services emit or produce events to the event bus, and on the right you have services that consume those events and work on them. The entire architecture can look overwhelming, but if you put a magnifying glass on each service and the concepts we've talked about, it's pretty seamless. We have a front-end application hosted on S3, and an access layer that provides the APIs for the front-end application - like my sign-up API and FNOL API. Events are emitted to the event bus, and then the video post-processing service figures out which compute to choose: if I have a longer video that can't run in a Lambda function, which has a 15-minute timeout, maybe I run it on ECS Fargate.

Those are the choices that can be made in this service. And then there's a Step Functions workflow with a bunch of lifecycle hooks that bring those plugins to life, and those plugins are the extensions that add capabilities: do I want to moderate, do I want to figure out whether transcription is required, should I test before I publish, et cetera.

And finally, similar to my notification service example where events come back to the browser, there's a real-time IoT topic that sends notifications back to the front end. Ok, so this is the entire architecture.

The extensible plug-in architecture looks like this. I contributed the content moderation plug-in you see at the top right: any time anybody streams a video, my content moderation plug-in looks at the video and figures out whether there's any explicit language or inappropriate gestures in it. If so, it filters that and flags that the video is not good for playback. That's a Step Function using Rekognition content moderation.

Now let's talk about the generative AI part. How did we use generative AI? First, we transcribe the video once we get it, which gives us the text of the entire video. Once we have the text, we emit an event that the transcription is done, and the next step can start: we provide that transcript to a state machine, the text goes to Bedrock, and the prompt we provide is "generate a video title that doesn't contain any offensive words for the following truncated transcript."

That transcript came from the video we just stored in an S3 bucket. Once we send it to Bedrock, Bedrock takes its time and generates a title - something like "AWS announced this new feature yesterday." I did a live stream there, and it gave me a title that exactly fits the video. So go to the expo floor and see how it works - you'll enjoy it.

Once the title is created, it's emitted back to the event bus, so a different plug-in can use it, or the web front end can use it to display it.

We talked about this: if you have core functionality working on a video in a Lambda function and you want to move it over to ECS, you can do that easily. If a video is 20 minutes or longer, I want it to run as an ECS task - I can port the same code over and run it in an ECS task, as long as the signature matches.

If I want to run it in a Lambda function - you see at the bottom right, that's the method signature to run it in a Lambda function - anything inside that can be ported over to my ECS task, and it does the same thing when the video is longer than 20 minutes. So you get the choice of a different compute. That's how you bring different workloads together.
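A minimal TypeScript sketch of that idea - the VideoJob shape, the processVideo body, and the environment variable are assumptions for illustration, not the actual Serverless Video code:

```typescript
// One core function, two entry points: a Lambda handler for short
// videos and an ECS task entry for long ones.
interface VideoJob {
  bucket: string;
  key: string;
  durationMinutes: number;
}

async function processVideo(job: VideoJob): Promise<void> {
  // ...the shared post-processing work (moderate, transcribe, title, etc.)
  console.log(`processing s3://${job.bucket}/${job.key}`);
}

// Lambda entry point, used when the video fits within the 15-minute limit.
export const handler = async (event: VideoJob): Promise<void> => processVideo(event);

// ECS entry point: the same logic, with the job passed in via an
// environment variable on the task definition.
if (process.env.VIDEO_JOB) {
  processVideo(JSON.parse(process.env.VIDEO_JOB) as VideoJob).catch((err) => {
    console.error(err);
    process.exit(1);
  });
}
```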

This is all purpose-built: if you find you have to move outside of a Lambda function and do something on ECS Fargate, you have the choice, and you can easily do that and orchestrate the work in Step Functions.

Now, let's look at some of the EDA patterns with containers on AWS. We've covered some of them, but let's look at more.

How can you orchestrate containers for large-scale data processing? Let's say you have documents that get uploaded to an S3 bucket. That creates an object created event, which goes to EventBridge, and we have Step Functions as a target. Now, when you have bigger files or many objects stored in the S3 bucket, Step Functions has a feature called Distributed Map. If you haven't heard about it, you can ask at the Serverless Video booth or the Serverless Espresso booth, or ask any of our serverless folks - we'll talk more about Distributed Map. It's one of the most powerful features in Step Functions: a map-reduce style capability you can run at scale, and the whole setup is serverless.

So given a number of objects, I can run a Distributed Map and preprocess that data, and then put a message in a queue. Now let's say you already have that processing logic on EKS somewhere - you have a run job, and you don't want to port it over to some other compute. You want to reuse what you have on EKS, like I did with the vendor service.

From Step Functions, I can say: call my EKS run job and wait for it to complete. Once that job is completed in EKS, it sends back a task token, and Step Functions moves to the next step. So it keeps continuing - you're reusing what you have on EKS, and you see I've put KEDA in there because I want to scale the EKS pods based on the SQS messages coming in.

That's how it becomes event-driven while you're still orchestrating the entire workflow with Step Functions. Once that run job is complete, I can reconcile in a similar manner: I can call another EKS job, using KEDA again to scale based on the number of messages in the queue, and wait for it. Once it succeeds, I come back to the orchestration - I can reconcile, send an email, send another event - all while integrating with what I already had, the EKS run jobs.
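A minimal sketch of the callback half of that pattern, assuming the Step Functions state used the .waitForTaskToken service integration and the token traveled in the SQS message the EKS job consumed:

```typescript
import { SFNClient, SendTaskSuccessCommand } from "@aws-sdk/client-sfn";

const sfn = new SFNClient({});

// Called by the EKS job when its work is done. The task token arrived
// in the queued message, because the state paused on .waitForTaskToken.
export async function completeRunJob(taskToken: string, result: unknown): Promise<void> {
  await sfn.send(new SendTaskSuccessCommand({
    taskToken,
    output: JSON.stringify(result), // becomes the state's output
  }));
}
```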

Similarly, how can you trigger ECS tasks based on EventBridge rules? If you want to schedule a single task, you can do that with EventBridge Scheduler. If you haven't heard of it, it's a feature added to EventBridge at last year's re:Invent - I highly encourage you to check it out. You can schedule jobs with fine-grained scheduling - it handles daylight saving time as well - using different types of schedules, like cron expressions, one-time dates, or recurring rates. And you can trigger an ECS task from an EventBridge Scheduler schedule.

What if you want to trigger multiple tasks? You can still use EventBridge Scheduler, which emits an event to the default event bus, and a rule can trigger multiple tasks in parallel. Say at 9 p.m. I want to trigger three different ECS tasks - you can do that.
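A minimal sketch of the single-task case with the AWS SDK for JavaScript v3 - every ARN below is a placeholder for your own cluster, role, and task definition:

```typescript
import { SchedulerClient, CreateScheduleCommand } from "@aws-sdk/client-scheduler";

const scheduler = new SchedulerClient({});

// A nightly 9 p.m. schedule that launches an ECS task.
export async function createNightlySchedule(): Promise<void> {
  await scheduler.send(new CreateScheduleCommand({
    Name: "nightly-ecs-task",
    ScheduleExpression: "cron(0 21 * * ? *)",
    ScheduleExpressionTimezone: "America/New_York", // DST handled for you
    FlexibleTimeWindow: { Mode: "OFF" },
    Target: {
      Arn: "arn:aws:ecs:us-east-1:123456789012:cluster/jobs",
      RoleArn: "arn:aws:iam::123456789012:role/scheduler-ecs-role",
      EcsParameters: {
        TaskDefinitionArn:
          "arn:aws:ecs:us-east-1:123456789012:task-definition/nightly-job:1",
      },
    },
  }));
}
```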

How do you have streams and queues scale an EKS deployment? We saw it with an SQS queue; you can do the same thing with DynamoDB Streams using KEDA. This is how it looks: the ScaledObject with the trigger set to DynamoDB Streams instead of the SQS queue, and you can say that when my shard count is two or more, I want to scale. The same goes for Kinesis Data Streams, based on shard count. What we saw earlier was SQS; all of these are ScaledObjects. Now, last but not least.

This is my favorite part, and I have folks here who love this: hexagonal architecture enables you to build event-driven architecture. If you are already using hexagonal architecture, you will be able to adapt to event-driven architectures easily.

Let's take an example with ECS Fargate. I have my business logic running on Spring Boot, or Node.js with Express, or Python with Flask - that's my domain logic. I want to abstract that domain logic using hexagonal architecture. The idea is that your domain logic is abstracted away behind ports and adapters, so the domain logic stays the same and you can have multiple ports and adapters work with it.

So we have different ports: if I want an HTTP endpoint call, that's one port and one adapter; if I have something else to be called, that's a different port and a different adapter. In this example, say you have an HTTP endpoint, which is an HTTP adapter. You do a PUT request, and a port goes and calls your domain logic. The domain logic doesn't know whether it's an HTTP call, a message-driven call, or a WebSocket call. And you have a separate adapter for DynamoDB as well, for put-item calls.

So when the domain logic does create order, it doesn't know whether it's actually persisting to DynamoDB or Aurora or any other database - all it has to do is call create order. Think about the interfaces you can build this way; they're highly extensible.

Using those interfaces and adapters, I can extend this kind of setup so that instead of a /order request coming in, I have Amazon EventBridge sending to an SQS queue, and that becomes my order requested event - with an adapter that sends an event back to EventBridge called order completed. Those are the choices.
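A minimal TypeScript sketch of those ports and adapters - all names here are illustrative:

```typescript
// Ports: the domain declares what it needs; it never names DynamoDB,
// EventBridge, HTTP, or SQS.
interface Order { id: string; total: number; }

interface OrderRepository {
  save(order: Order): Promise<void>;
}

interface EventPublisher {
  publish(detailType: string, detail: object): Promise<void>;
}

// Domain logic: identical whether the request arrived over HTTP or as
// an "order requested" event, and whatever database sits behind it.
class OrderService {
  constructor(
    private readonly orders: OrderRepository,
    private readonly events: EventPublisher,
  ) {}

  async createOrder(order: Order): Promise<void> {
    await this.orders.save(order);
    await this.events.publish("Order.Completed", { orderId: order.id });
  }
}

// Swapping a PUT /order call for an SQS-delivered "order requested"
// event is just a new inbound adapter calling the same createOrder port.
```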

Now, the choice of AWS services is not limited to what we've shown here. You can use any compute service or API service shown here, and likewise for the event routers and event stores - whatever fits your business. The concept of EDA stays the same.

As a summary: integration becomes easy when you adopt event-driven architecture, and it facilitates evolutionary architecture - you can evolve really well. You get fine-grained scaling, because you only scale the domain that needs scaling. You can bring multiple compute options together. And last but not least, you reduce your fault impact drastically, because you are dealing with events and event-driven architecture.

There are other sessions you can check out that talk more about this. The top one is my chalk talk, API306, which I'm giving this evening and repeating tomorrow if you want to learn more. We also have Skill Builder - you can scan this QR code to learn more about event-driven architectures and serverless in general.

Again, I highly encourage you to visit the re:Invent expo and look at the Serverless Espresso and modern apps zones and the village, where you can see more of these examples.

Finally, I'll stop here. You can scan this QR code to get all the resources, and eventually you'll get the slide deck as well, where you can learn more. You can reach out to me if you have more questions - let's talk outside. But thank you for being here, and please take the time to provide your feedback. How did we do? We improve based on that.
