Accelerating innovation with high performance computing on AWS

Ravi: Hey everyone. My name is Ravi F, and I'm the Principal Semiconductor Industry Advisor at AWS. With me is Henk Coenen from NXP, and we're going to talk about HPC, or high performance computing, and NXP's move to the cloud.

First of all, I'd like to set the stage with what high performance computing is and how it shows up in our everyday lives. With HPC, we can build designs faster using simulation techniques like CFD, or computational fluid dynamics. We use it for drug discovery, like the Covid vaccine, for example, and for lots of genomic analysis, and the list goes on and on.

One thing we've actually left off this list, and it's the key part of today's talk, is semiconductor design. Semiconductor design is absolutely an HPC application. But what makes it different from what we might call traditional HPC is that it's not tightly coupled: the individual jobs don't need to communicate with each other while they run. So this is essentially embarrassingly parallel HPC.
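To make "embarrassingly parallel" concrete, here is a minimal Python sketch, assuming a regression made of independent simulation jobs. `SimJob`, `run_simulation`, and `run_regression` are hypothetical names, and the pass/fail rule is a placeholder rather than a real simulator call. Threads are used only to keep the sketch portable; because no job depends on another, the same `map` could just as well be handed to a process pool or a cluster batch scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SimJob:
    """One independent simulation job (hypothetical: a testbench plus a random seed)."""
    testbench: str
    seed: int


def run_simulation(job: SimJob) -> tuple[SimJob, bool]:
    """Hypothetical stand-in for invoking a simulator on one job.

    The job reads only its own inputs and produces only its own result,
    so it never has to talk to any other job while it runs.
    """
    passed = (job.seed % 7) != 0  # placeholder pass/fail rule, not a real check
    return job, passed


def run_regression(jobs: list[SimJob], workers: int) -> dict[SimJob, bool]:
    # Independent jobs can run in any order on any number of workers;
    # the only coordination is collecting the results at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_simulation, jobs))


if __name__ == "__main__":
    jobs = [SimJob("tb_uart", seed) for seed in range(1, 21)]
    results = run_regression(jobs, workers=8)
    failures = [job.seed for job, ok in results.items() if not ok]
    print(f"{len(results)} jobs, failing seeds: {failures}")
```

Tightly coupled HPC codes, by contrast, exchange data between processes at every step, which is why they need low-latency interconnects, while a workload shaped like this one scales out by simply adding workers.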

All of that is changing a bit as we go into the future. HPC traditionally requires a lot of compute and storage, and the problems we're dealing with just get bigger and bigger every day. As a result, most customers' environments tend to be limited in compute and storage, and many customers wind up waiting for results. From survey data that we have, we know that over 72% of these organizations have reported delayed or canceled jobs, and many are also running on old hardware.

But the older your hardware, the longer it takes for these workloads to complete. That's where the cloud comes in: you get access to the latest and greatest hardware, so you can really accelerate your time to results and get rid of technical debt, which is a real problem in this space.

Talking about chips, some of you may or may not know that Amazon itself does a lot of chip design. Within our own cloud, we have Arm-based processors called the Graviton series. We have our own training and inference chips too, Trainium and Inferentia. In addition to that, there's a whole slew of chips that we have: they go into the fulfillment centers, into our robots, into all of the consumer devices you have, and it goes on and on.

So Amazon, being a big chip designer, has also figured out how to use the cloud for very advanced chip design. Our Graviton processors, for example, are on sub-10-nanometer nodes, 5 nanometer, and maybe progressing even further down. Our teams know how to design these very complex chips on AWS.

This is the chip design flow. It may be familiar to some of you and not to others, but essentially the chip design flow is a very complex process. It often takes 60 to 100 different applications, running at very high levels of parallelism and using a ton of storage of different types. I won't belabor this chart, but we have customers like NXP who are running the entire process, front to back, on AWS and achieving tape-out and production successfully.

One of the very interesting use cases on AWS, of which there are many, is secure collaboration. In this era of multi-party, or multi-IP-party, design, you want to be able to collaborate in a very secure and auditable way. One of the solutions we have published is the secure collaboration chamber, which is being used across the industry: not just in semiconductor design, but across HPC, in multi-company and even multi-country collaborations.

That's just one example; we have many solutions in this space, and I'll point out toward the end of the presentation where you can get more information. But at this point, let me turn it over to Henk, who's going to talk about NXP's journey to the cloud.

Henk: Hello. So, the cloud. Is that something new? Everybody has been talking about the cloud since 2006. But already in 1974, a German singer made a song about the cloud: "Über den Wolken muss die Freiheit wohl grenzenlos sein." Above the clouds, freedom must be limitless. So it's nothing new.

But that freedom, if you look at it from a compliance, SOX, or security point of view, means we also have to protect ourselves. For those who know it, at the beginning of the 2000s, the new century, Henry Chesbrough wrote a book about open innovation. Open innovation is all about the ecosystem: businesses work together, and have to work together, to create new innovations.

Also, in 2003, I put this in my signature: we should not secure ourselves or control ourselves out of business. Otherwise there will be no innovation.

Now, going forward: my name is Henk Coenen, and I head a cloud center of excellence at NXP. NXP is a semiconductor company (semiconductors are ICs, or chips), and we are a global company with 34,000 employees, of which about one third, around 11,000, are engineers, which is typical for a high-tech company. Those engineers are distributed all around the world.

We have design centers in China, in Korea, in the US, and in northern Europe, all distributed. Now, what are we making? NXP makes ICs, semiconductor devices, for smart devices, or IoT as we call it. And what is characteristic of those devices is that they all sense, they think, they connect, and they act.

Let's take an example: NXP makes devices for automotive. Think about your car: most modern cars have lane keeping. So there's a radar in it that senses whether the car is left or right of the lane. It has to think: am I on the left, am I on the right, do I have to do something? It probably has to connect to the engine or the steering wheel to do something, and then act. And we do that in a secure way and in a safe way. Safe means: do not change too fast. Secure means: am I only connecting to what I'm allowed to connect to?

These are the kinds of ICs we make, by the way, for a car, which is the most complex smart device, followed by your smartphone and then maybe by the thermostat in your home.

I think this picture has already been presented, but this is basically our flow. You do front-end design and verification. Typically, that verification involves a lot of simulation: for one IC, you sometimes need 20,000 cores to run one simulation, and millions and millions of small files for just one design.

If you then go to back-end verification, where we simulate how the IC works, whether it should work, and whether it is producible at the manufacturing sites, you have massive compute environments and jobs that sometimes run for 3 to 4 weeks. And last but not least, we also manufacture our ICs, ourselves or externally, and test them.

If you look at front-end and back-end verification, we are typically running 40 designs in parallel, there's high demand, and we can never predict exactly when the tape-out will be. So what are our challenges? Time to market: our product has to be ready for our customers at the right time.

Think about mobile phones, which are sold around the Christmas period. If we are two weeks too late, we might as well wait a year, and that's not good. The same is true for the slots we have at the manufacturing sites. So time to market is really essential. Then, resource intensity.

If you think about the advances we make in technology nodes, ICs are sometimes at 5 nanometer or 40 nanometer, and it's even moving to 3 nanometer. Between those technology nodes, you typically need 2.5 times more compute power. Demand is also difficult to forecast, since we never know exactly when the tape-out is, and we have a very heterogeneous resource profile.
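As a rough illustration of why demand climbs so quickly, the sketch below assumes that the roughly 2.5x-per-node figure quoted here compounds across successive node transitions. That compounding, the function name `projected_compute`, and the baseline number are assumptions for illustration, not NXP planning data.

```python
def projected_compute(base_core_hours: float, node_steps: int, factor: float = 2.5) -> float:
    """Estimate compute demand after `node_steps` technology-node transitions.

    Assumes each transition multiplies demand by `factor` (about 2.5x, per the talk).
    Compounding across steps is a simplifying assumption for illustration only.
    """
    if node_steps < 0:
        raise ValueError("node_steps must be non-negative")
    return base_core_hours * factor ** node_steps


if __name__ == "__main__":
    baseline = 1_000_000.0  # hypothetical core-hours for one design at today's node
    for steps in range(4):
        print(f"{steps} node step(s): {projected_compute(baseline, steps):,.0f} core-hours")
```

Under this assumption, two node transitions already mean about 6.25 times the baseline (2.5 × 2.5), the kind of step change that is hard to absorb with fixed on-premises capacity.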

If you look over the last 10 years (I think it's 10 years), we see steep growth in both storage and compute. And sometimes we do a design in the US and then in India. So, as on the previous slide, it's really the cloud that helps us with flexibility in scaling up, having sufficient capacity available, and, last but not least, being on time.

Capacity is always a balance between cost, functionality, and quality, and in our case cost can go up with the cloud. Now, another case. I told you about design work, production, and test. If you look at this very simplified view of our process: on the left-hand side, you see the logical design, which we call new product introduction. Then it goes to a foundry, where you get it on a wafer; the wafer is sliced, then it goes on reels, and then it's packaged. At the end of the day, it ends up in a solution, in this case a car.

Our intention is always to build products that are 100% OK, so zero defective parts per billion: no failures at all. But sometimes we get a question that an NXP product, in this case in a car, is not working perfectly. I'm not saying failing, but not working perfectly. So we get a question, and then we have to connect back to all our data; you see all those red dots there. We have all the information from the different steps in different databases. But when that question comes, we typically have to answer it in only two or three days, and in the past we had an enormous lead time in finding all the data for that specific device.

So what did we do? We collected all the data using graph technology, Amazon Neptune on AWS, to very rapidly connect the data back. In the past, it took us 2 to 3 days to find all the data; now we can do it in a couple of hours, which leaves us more time to really analyze what is going on.
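The talk does not describe NXP's graph model, so the following is only a minimal sketch of the idea: each production record links to the upstream records it came from, and answering a customer question becomes a traversal from one device back through package, reel, die, wafer, and design. The node IDs and the `UPSTREAM` mapping are hypothetical. In Amazon Neptune the equivalent traversal would be written as a Gremlin or openCypher query; a plain breadth-first search over a Python dictionary stands in for it here.

```python
from collections import deque

# Hypothetical traceability links: each record lists the upstream records it came from.
# IDs and step names are illustrative only, not NXP's actual data model.
UPSTREAM: dict[str, list[str]] = {
    "device:D42": ["package:P7"],
    "package:P7": ["reel:R3", "test:T19"],
    "reel:R3": ["die:W11-D05"],
    "die:W11-D05": ["wafer:W11"],
    "wafer:W11": ["lot:L2", "design:NPI-88"],
}


def trace_back(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every upstream record reachable from `start`, nearest first (BFS order)."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, []):  # records with no entry have no upstream links
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    for record in trace_back("device:D42", UPSTREAM):
        print(record)
```

The benefit of the graph model is that this lookup becomes one traversal over linked records instead of a series of manual searches across separate per-step databases.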

Those were two examples, and for NXP it's all about reducing cost in the right balance with time to market and quality. Time to market is speed, of course; quality obviously has to be the best; and cost has to be reduced and optimized. From an NXP perspective, that is the right balance, and the cloud helps with that. But how did we do it? That's essential, because it's not about one product; it's about the total corporation. I gave you the example of the simulation and verification flow, but we have more applications running in the cloud.

So we said, we are going to set up a cloud center of excellence that does not report to the IT infrastructure team, and we did that on purpose. The reason is that the cloud means a paradigm shift, also in your way of working, so you should not approach it the same way as traditional IT. The vision there is accelerated, sustainable use of cloud technologies across NXP, while also reducing the risks, both financial and security.

From that perspective, we put the team in place. The cloud center of excellence focuses on four topics. The first is governance and finance: we set the policies, we control them, and we also safeguard that everybody lives up to them. I gave you two examples. Then there is what we call security and operations: we provide the cloud landing zones that all of NXP's applications and systems run on. Obviously, that is fully automated: we automate our operations using CI/CD pipelines and infrastructure as code, we operate with zero downtime during changes, and we have best-in-class security posture management.

Then we have development: for all the new applications that come into the cloud, we contribute and help them move to the cloud. And last but not least, we train our people, not only my own team but the total NXP organization, on cloud technologies, and not only on cloud technologies but also on how to do DevOps and how to change your way of working. In our case, training means not only classroom training but also hands-on training, practice, et cetera.

Henk: Let me share a few strategic decisions. We started with a cloud-first strategy, and we have now rephrased it as a cloud-smart strategy, because some people considered going to the cloud a mandatory step, whereas it should be a business case. Restricted multi-cloud: clouds are loosely coupled. So if you bring your data to AWS, make sure that all the data and all those applications remain in AWS (or in a competitor of AWS), because otherwise your egress costs go sky-high. Workload-driven: the application owner, the business owner, is responsible for deciding and making the business case to go to the cloud. Did we start with everything? No, we started with only one application, to prove that everything works, and committed to only one cloud provider. Which cloud provider? We flipped a coin and started with that, as simple as that, because you have to prove that it works.

Now, about the paradigm shift. We all know traditional IT (between quotes, by the way, so no value judgment attached): they control and have end responsibility for the infrastructure and the applications, and as a consequence they also enforce. If you go to the cloud and want to really benefit from the flexibility the cloud offers, you should give people more freedom, and that freedom comes with responsibilities. That means the IT organization should move to an enable-and-support function instead of an enforce-and-control function. To do that, we made a cloud policy, and we defined seven golden rules.

Our consumer, which we call the cloud consumer (that is you, in this case), is accountable for the application and must use the landing zones we have in place. The funding is not in IT but with the cloud consumer, who is responsible for keeping the application running and secure. You can read all the rules here. Now, you may of course ask what the cloud center of excellence does in terms of funding.

Here you see the cloud consumer and the role of the cloud center of excellence. On spend, the cloud consumer immediately gets all the costs: whether they spend 1,000 or 100K per month, forecasted or not, they get the bill. The cloud center of excellence provides a tool where they can see exactly how much they are spending, and obviously we also help and consult them on all this.
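A minimal sketch of this showback idea, assuming billing line items carry a cost-allocation tag that names the accountable cloud consumer. `LineItem`, the `cloud-consumer` tag key, and the sample figures are hypothetical; a real setup would read exported billing data rather than hand-built objects. Spend without the tag is surfaced as `UNALLOCATED` so that it cannot quietly fall back onto a central IT budget.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LineItem:
    """One billing line item (hypothetical shape: resource, cost, and tags)."""
    resource_id: str
    cost_usd: float
    tags: dict[str, str] = field(default_factory=dict)


def spend_by_consumer(items: list[LineItem], tag_key: str = "cloud-consumer") -> dict[str, float]:
    """Sum cost per cloud consumer; items missing the tag are grouped as 'UNALLOCATED'."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        owner = item.tags.get(tag_key, "UNALLOCATED")
        totals[owner] += item.cost_usd
    return dict(totals)


if __name__ == "__main__":
    items = [
        LineItem("i-sim-001", 1000.0, {"cloud-consumer": "verification"}),
        LineItem("i-sim-002", 250.0, {"cloud-consumer": "verification"}),
        LineItem("graph-db-01", 500.0, {"cloud-consumer": "traceability"}),
        LineItem("bucket-scratch", 40.0),  # untagged: nobody owns this spend yet
    ]
    for consumer, total in sorted(spend_by_consumer(items).items()):
        print(f"{consumer:>14}: ${total:,.2f}")
```

Keeping the unallocated bucket visible is a deliberate design choice: it turns missing ownership into something the cloud center of excellence can chase, rather than a cost nobody sees.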

The same is true for optimization. At the end of the day, the cloud consumer is responsible for making their solution as cost-effective as possible; my team helps and gives them strong advice on how to move forward. That is how we cope with cost.

Now look at it from a security perspective, because there is always risk in the security world. Together with our cyber team, which sets the cyber governance and controls, my team implements the policies, so the guardrails, et cetera, in this case on AWS. The moment we see something wrong in a certain application, we have tooling in place for that; then, via ServiceNow, an incident is assigned to the cloud consumer, and it's his or her responsibility to get it resolved. They can ask my team to help, but it is their responsibility.
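The guardrail-then-route pattern can be sketched as below, assuming each resource records its accountable owner. The two rules, the `Resource` and `Incident` types, and the sample data are hypothetical, and the returned incidents stand in for tickets in an incident-management system; no real ticketing or AWS API is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Resource:
    resource_id: str
    owner: str       # the accountable cloud consumer
    encrypted: bool
    public: bool


@dataclass(frozen=True)
class Incident:
    resource_id: str
    assignee: str
    rule: str


# Hypothetical guardrails: rule name -> predicate that returns True when compliant.
GUARDRAILS: dict[str, Callable[[Resource], bool]] = {
    "encryption-at-rest": lambda r: r.encrypted,
    "no-public-access": lambda r: not r.public,
}


def evaluate(resources: list[Resource]) -> list[Incident]:
    """Check every resource against every guardrail and open one incident per
    violation, assigned to the resource's owner rather than to the central team."""
    incidents: list[Incident] = []
    for res in resources:
        for rule, compliant in GUARDRAILS.items():
            if not compliant(res):
                incidents.append(Incident(res.resource_id, res.owner, rule))
    return incidents


if __name__ == "__main__":
    fleet = [
        Resource("bucket-a", "team-x", encrypted=True, public=False),
        Resource("bucket-b", "team-y", encrypted=False, public=True),
    ]
    for incident in evaluate(fleet):
        print(f"{incident.assignee}: {incident.resource_id} violates {incident.rule}")
```

The central team owns the rule set, while fixing each violation is left to the assignee, which mirrors the split of responsibilities described in the talk.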

So they have to fix it if there is something. We also have a security governance portal in place: once per month, we report the exact state of all our applications. By the way, we have more than 200 applications running in the cloud. So my team not only sets the policies but also reports on all those aspects.

So how did we move forward? We started, I would say, three years back, with what we called the "build a runway" phase: build a landing platform, hire a CCoE team, start with the policies, et cetera, and start with one workload to prove that everything works. Once that was working, and we also had business sponsors for it, we said, let's start what we then called the enable-and-promote phase.

We found a partner, TCS by the way, who runs our operations, and the CCoE was extended to also act as an enablement team: not only the technical part, but also helping and consulting to move other solutions forward in the architecture board that you have in big enterprises.

We also introduced the cloud-smart policy, and we have mandatory trainings on cloud and DevOps. We had a number of promotion projects, which we internally called lighthouse projects. And we started a cloud migration office, which was there to help everybody who wanted to move to the cloud.

After that had been running for a year, we basically enforced it: we said, let's close a number of data centers and make smart decisions about which applications move to the cloud. One of the benefits was that when we started moving the cost from IT to the application owners, suddenly we could clean up 100 applications, because once the application owners had to pay the bill themselves, they decided those applications were no longer important.

With that, I'd like to stop. Thank you.
