How City of Boston and Granite Construction modernize Oracle ERP apps

So, this is going to be a little bit of an intro about me and my background in construction. I do have a bachelor's degree in computer science.

And a little bit about Granite Construction, too: we build roads, and we're in the transportation sector.

OK. So we are in the water and wastewater sector, the industrial sector, renewable energy, tunneling, mining, and federal work for military airfield paving, dams, and fish ladders. And then, finally, we are in the materials business as well, where we do crushed aggregates, concrete aggregates, and specialty sands and rocks.

So, a little setting of the stage before we dive into the content. Prior to 2023, most of our landscape was a mix of on-prem applications and SaaS applications. We hosted a lot of our applications on premises, and specifically looking at our JD Edwards footprint, it was all on-prem Windows virtual machines on a two-node Oracle RAC cluster. Within JD Edwards itself, if you know anything about ERP, we utilize most of the main modules: distribution, financial management, manufacturing, inventory. We have approximately 3,000 users across several hundred sites, and prior to the project we're going to talk through, we were on the 9.2.5.4 platform.

So why am I here, and why did we eventually decide that we needed to complete some type of migration? Well, in September of 2022, our data center gave us a notice to vacate. So we had a year from that notice to migrate every single one of our applications out of the data center, including our ERP system, and our enterprise architecture team, from a strategic standpoint, said, well, let's go to the cloud. Everybody else is doing it.

So while the presentation today will mostly dive into our Oracle ERP system on AWS, I do want to call attention to all of the other applications that we were migrating to AWS at the same time as our ERP system. And it's funny, the things that you find as you go through an exercise like this: we found applications that were running that we didn't even know about. So it was a good exercise.

So the approach that we took was to stand up new infrastructure on this new cloud platform, whatever that might be. Instead of migrating our existing environment, we said, let's just stand up a new instance of JD Edwards in the cloud. In doing that, we were able to run both instances in parallel, so we didn't really impact the business other than for testing and the weekend of go-live.

So with that in mind, I was like, hmm, I feel like this is an opportunity for a little bit of scope creep. Disclaimer, by the way: my presentation was very funny until I met with Amazon legal and they removed all my GIFs. So if you don't laugh today, please thank Amazon legal, because it's not going to be as funny. Anyway.

So, scope creep: at the same time, why not upgrade to Release 23, which is the latest on JD Edwards, and why not upgrade to 64-bit? With our platform prior to 2023, we were stuck; we couldn't do any other upgrades because we were in 32-bit land. So with this big of an undertaking, and because it touched essentially every single application that Granite owned, we decided that we needed help.

So we kicked off a vendor selection process, and we started with an RFP, or request for proposal. In that RFP, we laid out everything, well, what we thought was everything, that we wanted as part of this migration and upgrade. We wanted to know about the company's strengths. What's the size of the company? What does their profile look like? What's their experience? Have they done a migration like this in the past? What do their partnerships look like with the different cloud providers? What's their cost, and how is it structured? Is it fixed? Is it time and materials? Is it a hybrid model? What's their proposed schedule? How long do they think it's going to take to migrate our Oracle ERP? And how are they planning to do it? What's their solution for doing that?

We did some research and found several vendors that were pretty well known for doing a project like this. So we submitted this RFP to five different vendors and waited for their responses. When those responses came back, we put together an RFP selection committee who rated all of them. There were certain categories, and we'll go through those in a second, as far as what we rated each of the RFP responses on. They were rated 1 to 5, and some of the categories had more weight than others; for example, the company itself may have had a lower weight than what their solution looked like.

So, speaking of the company: looking at their company as far as a rating of 1 to 5, how comfortable are we with this vendor? What's their size? What's their reputation? Could we talk to other people that have used them in the past? How about their JDE and cloud partnerships? What does their cloud migration experience look like? How does their solution look? Is it feasible? Does it look like something that we at Granite are going to be able to accomplish, knowing how Granite is? And DR was a really big thing for Granite regarding our Oracle ERP, so the DR solution was one of the categories that had the most weight for us.

Out of this, we also let the vendors know that there was a possible managed services option that we could also get some information on. We looked at cost, schedule, people, and process, and you can tell that this was a super time-intensive process. We spent a lot of time going through and rating all of these vendors. And then we decided in the end that none of that mattered and we picked the cheapest one. I'm just kidding. We picked the one with the highest score.

So in the end, we selected a vendor called ERP Suites. I give a lot of the success of the project to Granite, but also to the vendor. They brought a lot of knowledge and expertise that we didn't have in house, so this partnership really helped make this project successful.

So with ERP Suites now on board, we said, OK, we have this ERP system; what cloud should we put it on? Can you help us determine the best option for Granite, given this set of requirements? From a Granite standpoint, we were really only interested in looking at Oracle Cloud Infrastructure and Amazon Web Services.

Going through the different areas of comparison, we focused on Oracle database licensing. What does scaling up look like? What does scaling out look like? If we needed to do either of those, would we have to purchase additional licenses? There were some pros and cons to that, and we did the same exercise for WebLogic licensing. And then cost was a big one as well, especially around egress and ingress charges.

Granite has all of these other applications that I listed out here, plus others, and not all of them were going into the same cloud provider. So we needed to understand what ingress and egress charges looked like from one cloud provider to the other. And like I said, disaster recovery was a big one; we wanted to know what cloud options there were for replicating between the different tenancies and what technologies were available.

So, disaster recovery: I've said that this is probably the biggest one, and I'm going to highlight it again. DR was the biggest one for Granite. We didn't have any in-house knowledge on OCI, and other apps were already migrating to AWS. So the cloud that we selected? Anybody want to guess? Anybody? AWS. Otherwise I wouldn't be here, right? Come on.

So yes, we decided to select AWS, and we felt like AWS offered the overall lowest complexity but also offered Granite the best DR solution that we could get. So we move into implementation; fast forward a little bit, and we start really diving into the project timeline.

We spent 12 weeks planning: identifying third-party vendors, trying to get those vendors on board, and designing and building out this environment in AWS. Then we spent another 12 weeks implementing and completing retrofits. For those of you that aren't familiar, whenever you do an upgrade on Oracle JD Edwards, if you have any customizations, all of those customizations get wiped out.

So we had to go back as part of the project and add all of our custom code back into the platform. We spent a lot of time doing retrofits, testing, and mock cutovers, and you can see a detailed breakdown of the timeline. We went through a couple of mock cutovers in there, and we spent a significant amount of time doing system and integration testing. That was mostly IT testing, because why would we put users into a broken system? Let's make sure that it actually works before we open the doors for them to get in there and start testing.

And then we started building out the infrastructure, and here Terraform with the AWS provider was super helpful. With Terraform, we were able to create the VPC, the subnets, and the security groups, provision servers, provision databases, apply firewall segmentation to the servers, and configure backups, and we did all of this in two hours. That's a pretty big win. As we went through the mock cutovers, we started putting together what a go-live would look like. We built out a punch list, including all of the tasks not only specific to ERP but also for any of the bolt-on applications to that ERP, and how those integrations would be impacted by the migration of E1.
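To make the infrastructure-as-code idea concrete: Granite used Terraform, but here is a minimal sketch of the same kinds of provisioning calls written with Python and boto3 instead, purely for illustration. Every CIDR range, name, port, and AMI ID in it is a placeholder, not anything from the actual environment.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Create a VPC and a subnet (CIDR ranges are made up for illustration)
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

# A security group acting as the firewall-segmentation layer
sg = ec2.create_security_group(
    GroupName="jde-enterprise-server",
    Description="JD Edwards enterprise server tier",
    VpcId=vpc_id,
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 6017,   # example port only; real JDE ports come from the implementation
        "ToPort": 6017,
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
    }],
)

# Provision an application server (AMI ID and instance type are placeholders)
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="x2iedn.4xlarge",
    MinCount=1,
    MaxCount=1,
    SubnetId=subnet["Subnet"]["SubnetId"],
    SecurityGroupIds=[sg["GroupId"]],
)
```

The point is the repeatability: whether it's Terraform or a script like this, the whole stack can be rebuilt from source in a couple of hours rather than weeks of manual setup.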

Our go-live punch list was 216 items. We were working around the clock to get all of them completed, and as you marked an item complete on the punch list, it created a nice little graphic in the background, so at any point in time you could get into the system and see a status update of where we were, which part of the project was in flight, what was happening, and who was doing it.

Prior to the migration, we knew we had a lot of moving pieces, and we had to figure out a way to get everybody organized and keep everybody on the same page. So we started a Teams meeting, and that Teams meeting stayed open the entire weekend of go-live. If we ran into issues, people were able to hop onto that bridge and we worked through them, and we used it for coordinating: tagging people, tapping people on the shoulder, saying, hey, you're up next.

So the Friday before the Monday of go-live, we brought the system down for everybody but a handful of users, and we told those users, hey, we need you to get into the system, but please don't complete any transactions that are going to update the database. At this point, we want you to get in there, run some data integrity reports, and do some data validation. We ended up running some SQL so that we could sum row counts, and we didn't stop at just the row counts.

Specifically in areas like payroll, we even summed totals to ensure that we could make audit happy that everything was migrated successfully. Then, post-migration, we did the opposite: we brought the system back up for that handful of users, let them run those same reports, ran those same SQL queries, and compared the before and after data. In the end, it was a success. Now, because I had to get rid of my GIFs, this is my attempt at drawing a touchdown stick figure.
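To give a flavor of what those checks can look like, here is a minimal sketch of the before/after comparison. The table and column names are placeholders rather than Granite's actual queries, and `src_conn` / `tgt_conn` are assumed to be DB-API connections (for example, created with the python-oracledb driver) to the on-prem and AWS databases.

```python
# Run the same row-count and column-sum queries against source and target,
# then diff the results. All object names below are illustrative only.
CHECKS = {
    "time_entry_rows":  "SELECT COUNT(*) FROM proddta.f0618",          # example JDE table
    "payroll_totals":   "SELECT SUM(gross_amount) FROM payroll_summary",  # hypothetical view/column
}

def capture(conn):
    """Run every check query on one database and return the results by name."""
    results = {}
    cur = conn.cursor()
    for name, sql in CHECKS.items():
        cur.execute(sql)
        results[name] = cur.fetchone()[0]
    return results

def compare(before, after):
    """Print OK/MISMATCH for each check so auditors can see the evidence."""
    for name in CHECKS:
        status = "OK" if before[name] == after[name] else "MISMATCH"
        print(f"{name}: on-prem={before[name]} aws={after[name]} -> {status}")

# before = capture(src_conn)   # against the on-prem database, pre-cutover
# after  = capture(tgt_conn)   # against the AWS database, post-migration
# compare(before, after)
```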

So it was a success, everybody was happy, and we felt pretty comfortable at this point. We were not going to turn back; we were not going to go back to our on-prem system no matter what happened from this point forward. We were able to migrate our entire Oracle ERP in less than 48 hours of downtime because of the approach that we took.

Now, I'm a little bit nervous about presenting this performance section, because I thought I was technical, I really thought I was technical as a developer, until we start talking about performance and it just goes way over my head. For the slides I'm going to present, I really want to give credit to ERP Suites, because they helped me put this information together for you today, and I'm going to do my best to articulate it. Hopefully it makes sense; if it doesn't, just tell me it does so that I can feel good and feel like I actually helped you learn something today.

When you're talking about JD Edwards, there are essentially two big bottlenecks. Bottleneck number one is your database server. Bottleneck number two is your enterprise server. The bottleneck on the database server is your disk I/O, and the bottleneck on the enterprise server is your single-core clock speed.

Looking at these two in regard to batch processing or application processing, your database server is the bottleneck 90% of the time. The other 10% of the time, when your database server isn't the bottleneck, it's your enterprise server. So how can we fix this? How do we help the database server so that it's not the bottleneck? Well, we minimize going to the disk, because disk I/O is where we slow down. To avoid going to the disk, we have to add more RAM. I think there's a saying like "just throw more RAM at it"; in this case, I think that's true. And when you do have to go to the disk, just make sure that you have a really fast disk.

On the enterprise server it's pretty simple: just increase the speed of the clock. But the question is, when is increasing the clock speed actually helpful? To explain it the way it was explained to me, we're going to use a road analogy. Think of a core as a lane on a highway; to add another core, we simply add another lane, and the cars are the traffic going back and forth. The clock speed in this example is the speed limit on that road. So the single-core clock speed is equivalent to the speed limit: the faster the clock speed, the faster the speed limit for the individual car.

Adding another lane might be helpful when it's congested, when we've got a lot of traffic. Say we have four lanes total and we've got gridlock: if we double that and add another four lanes, maybe we don't have that gridlock problem, that congestion, anymore. Once the congestion is gone, that's when it makes sense to increase your clock speed, and that's where you would see the performance gain.

So if you have a relatively free highway, increasing the clock speed is what helps at that point.

Now we start talking about optimizing the storage on AWS, and we'll start with two different storage classes. One is io2, which I believe is now called io2 Block Express, and the other is gp3.

For io2, let's take a scenario where we have 400 gigabytes of disk space and we want to provision it with 12,000 IOPS. If you were like me before the start of this project and have no idea what IOPS are, they're the input/output operations per second against the disk. So in this scenario, with 400 gigs and 12,000 IOPS, we're looking at a monthly price of approximately $830.

Now, if we look at gp3 with a similar scenario, we've got that same 400 gigabytes of disk space. The interesting thing about gp3 is that each volume you provision automatically comes with 3,000 IOPS included. So in this scenario, I'm not going to provision 12,000 IOPS; since I already have 3,000, I only need to add an additional 9,000. And in this scenario, we dropped the price from approximately $830 to $77.

Now, we can take this a step further. Remember how I mentioned that every gp3 volume comes with those 3,000 IOPS provisioned? If we split that 400 gigabytes into four 100-gig volumes, then right off the bat I already have 3,000 IOPS per volume and I don't have to provision any additional IOPS. At this point, the bottom scenario is giving me the same amount of disk space and the same number of IOPS as the top scenario, and the monthly price drops to $32 a month.
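To double-check the arithmetic, here is a small Python sketch that reproduces those three price points. It assumes approximate EBS list prices of $0.125/GB-month plus $0.065 per provisioned IOPS for io2, and $0.08/GB-month plus $0.005 per IOPS above the 3,000 included per volume for gp3; actual rates vary by region and over time, so treat the numbers as illustrative.

```python
# Assumed (approximate) EBS list prices; not authoritative.
def io2_monthly(gb, iops):
    return gb * 0.125 + iops * 0.065

def gp3_monthly(gb_per_volume, volumes, total_iops):
    included = 3000 * volumes            # 3,000 IOPS come free with each gp3 volume
    extra = max(0, total_iops - included)
    return gb_per_volume * volumes * 0.08 + extra * 0.005

print(io2_monthly(400, 12_000))      # ~$830/month: one 400 GB io2 volume, 12,000 IOPS
print(gp3_monthly(400, 1, 12_000))   # ~$77/month:  one 400 GB gp3 volume, 9,000 extra IOPS
print(gp3_monthly(100, 4, 12_000))   # ~$32/month:  four 100 GB gp3 volumes, no extra IOPS
```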

Taking that into consideration, we then had to start thinking about which of the two classes of the X2 instance family makes the most sense for Granite. Is it the X2iezn, which has the faster clock speed, or the X2iedn, which gives us more IOPS?

In order to make that decision, we had to look at our existing platform and understand our usage characteristics. To do that, we used a tool called Database Current State Investigation, or DB CSI, to capture those performance characteristics. This is a free tool provided by AWS, and there are two versions of it, and this is where it gets really important.

The first version is a licensed version, so you have to be licensed for the Diagnostics Pack. If you run this tool and you are not licensed for that pack, it's going to generate the same pretty graphs as it would if you were licensed, and if you were to get into an audit with Oracle, you would owe them a bunch of money. So before you run the licensed Diagnostics Pack version, make sure that you have the appropriate license.
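As a rough illustration of one pre-flight check (this is not part of the tool itself), you could query Oracle's control_management_pack_access parameter before running the licensed version. Keep in mind the parameter only reflects how the database is configured, not whether you actually own the license, so confirm entitlement against your Oracle agreement as well; the connection details below are placeholders.

```python
import oracledb  # the python-oracledb driver (formerly cx_Oracle)

# Placeholder credentials and DSN for illustration only.
conn = oracledb.connect(user="perf_ro", password="***", dsn="jdeprod-db:1521/JDEPROD")
cur = conn.cursor()

# This parameter controls whether AWR/ADDM (Diagnostics Pack) features are enabled.
cur.execute(
    "SELECT value FROM v$parameter WHERE name = 'control_management_pack_access'"
)
(pack_access,) = cur.fetchone()
print("Management pack access:", pack_access)  # NONE, DIAGNOSTIC, or DIAGNOSTIC+TUNING

if pack_access == "NONE":
    print("Diagnostics Pack features are disabled -- use the Statspack version instead.")
```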

The other version uses Statspack. With that tool, we were able to grab about a week's worth of data, specifically focused around month-end, since that was going to be our highest-utilization time period.

One of the graphs that came out of that tool is our average database CPU usage. From a CPU standpoint, and I know it's hard to read, our max average here is sitting at about 4.1 cores. So from a CPU standpoint, we're not really concerned, other than making sure it fits in our current licensing landscape.

As far as our memory allocation, you can see that we're pretty consistent, right at about 500 gigabytes of memory, and we stayed consistent throughout that week's worth of data. This will be important in the next step, when we dive in a little bit deeper and then look at our number of IOPS.

A little bit interesting: we stayed pretty close to the bottom, and then occasionally we would spike up to 50,000 or 60,000 IOPS. What we learned is that that was due to database backups, and we also run a tool called StarQuest that replicates data out of our Oracle database into a data lake that sits up in Azure. So the spikes are brief, but they do get fairly intense.

When you look at it from an average IOPS standpoint, our average max was right at 5,230. So that was our average, but we have to keep in mind that we still had spikes up to that 50,000 or 60,000.

Thinking about this in terms of an instance type, we know we need that 500 gigabytes of memory, so we have to look at the 4xlarge on the X2iezn instance. And when we do that, remember what I said earlier about gp3 storage: we get 3,000 IOPS automatically.

In this scenario, we're going to look at five 1-terabyte drives, and we automatically get 3,000 IOPS per volume. We're going to go ahead and max those out, adding an additional 13,000 IOPS per volume, for a total of 80,000 IOPS. So we're looking at 80,000 IOPS and 24 megabytes per second.

Now, if you look at the instance's max IOPS, you see that we're at a threshold there of 20,000 IOPS. So even though on the storage side we're provisioning 80,000 IOPS, the VM is going to throttle us down to 20,000 IOPS. In this scenario, the VM throttles our IOPS by almost 75%. And again, I had another GIF in here, but instead I did some more cartooning, and this is me sad because it throttles us so far.

Take that same scenario and now look at the X2iedn: the same 80,000 IOPS and 24 megabytes per second. Here you can see that it throttles us to 65,000 IOPS. So we're still limited, it still throttles us down, but we're in much better shape here because we're only being throttled by 18.75%. This one made me happy, so I drew a little happy cartoon.
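To make the throttling math concrete, here is a short sketch using the per-instance ceilings quoted in the talk, 20,000 IOPS for the X2iezn and 65,000 for the X2iedn; the 4xlarge size is assumed for both, and the limits come from the slides rather than a lookup of AWS documentation. Only the percentages are computed.

```python
# 80,000 IOPS provisioned on the storage side: five gp3 volumes at 16,000 IOPS each.
PROVISIONED_IOPS = 80_000

# Per-instance usable-IOPS ceilings as quoted in the talk (assumed sizes).
INSTANCE_IOPS_LIMIT = {
    "x2iezn.4xlarge": 20_000,
    "x2iedn.4xlarge": 65_000,
}

for instance, limit in INSTANCE_IOPS_LIMIT.items():
    effective = min(PROVISIONED_IOPS, limit)
    throttled_pct = 100 * (PROVISIONED_IOPS - effective) / PROVISIONED_IOPS
    print(f"{instance}: usable {effective:,} IOPS (throttled by {throttled_pct:.2f}%)")

# x2iezn.4xlarge: usable 20,000 IOPS (throttled by 75.00%)
# x2iedn.4xlarge: usable 65,000 IOPS (throttled by 18.75%)
```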

Coming back to this screen: which one is better for Granite? In the end, Granite chose the option with more IOPS, the X2iedn.

And then we said, OK, we've done all this work of trying to architect a really good system. How are we going to figure out whether we're doing better or worse than our current on-prem platform?

So we looked at measuring performance. We looked at batch performance, analyzed records per second on the upgraded platform versus on-prem, and sorted those results by something called the misery index.

If you know anything about economics, the misery index refers to the inflation rate added to the unemployment rate. But here, for performance analysis, the misery index is the average wait time multiplied by the number of executions, and we were able to use it to gauge the impact to the business.
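As a small illustration of that ranking, here is a sketch of the calculation. The batch job names and numbers are made up for the example; only the formula itself, average wait time times number of executions, comes from the talk.

```python
# Illustrative batch-job stats; the names and figures are invented.
batch_jobs = [
    {"job": "R31410 work order processing", "avg_wait_sec": 42.0, "executions": 310},
    {"job": "R09801 general ledger post",   "avg_wait_sec": 12.5, "executions": 2200},
    {"job": "R43500 purchase order print",  "avg_wait_sec": 3.1,  "executions": 900},
]

# Misery index = average wait time x number of executions.
for j in batch_jobs:
    j["misery_index"] = j["avg_wait_sec"] * j["executions"]

# Sort so the jobs hurting the business most float to the top.
for j in sorted(batch_jobs, key=lambda j: j["misery_index"], reverse=True):
    print(f'{j["job"]:<35} misery={j["misery_index"]:>10,.0f}')
```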

We did all of this analysis, sorted all of the data by the misery index, and were surprised to find we had an 81% average batch performance increase post-upgrade. That is a significant win. And this is where, again, I had a GIF of a mic drop and I had to take it out, but I feel like this is a mic drop moment, so I drew myself doing a little mic drop for you.

This was incredible, and our users could not be happier.

Looking at disaster recovery, I mentioned early on that this was something that was super important to Granite. Over here on the left is where our E1 environment is hosted, in the us-west-2 region, and on the right, for the DR piece, we are hosted in the us-east-2 region. Between the two regions, we set up VPC peering for the network connectivity.

In the target region, we have the DR staging subnet and the DR recovery subnet, and we utilize AWS Elastic Disaster Recovery (DRS) to perform the recovery from the source to the target. This was huge for Granite, to be able to get a DR solution like this.
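For context on what a DRS drill can look like in code, here is a hedged sketch using the boto3 Elastic Disaster Recovery ("drs") client. It is an illustration rather than Granite's actual runbook: the region matches the target region from the talk, the source-server IDs depend entirely on your own DRS setup, and the key input is the drill flag.

```python
import boto3

# DRS is managed from the recovery (target) region.
drs = boto3.client("drs", region_name="us-east-2")

# Find the replicating source servers (the protected E1 servers).
servers = drs.describe_source_servers()["items"]
server_ids = [{"sourceServerID": s["sourceServerID"]} for s in servers]

# isDrill=True launches recovery instances for a test without interrupting
# ongoing replication; set it to False only for an actual failover.
job = drs.start_recovery(isDrill=True, sourceServers=server_ids)
print("Started recovery job:", job["job"]["jobID"])
```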

Thinking about RPO and RTO: RPO, recovery point objective, is the maximum amount of data, measured in time, that Granite can tolerate losing. Our target before we started the project was one hour, and when we did our DR test, we found that we could do it in approximately 10 minutes. And then there's the RTO.

RTO, recovery time objective, is the maximum length of time it takes us to restore normal operations after some type of event. Our target was around four hours, and when we did our DR test, we found that we could do it in about three hours and 10 minutes. So for both of those, we came in under our target, which was super awesome.

Now, I guess I should say it sounds like everything went really well, right? But all projects have challenges. One thing we found out is that you can plan all you want, but until you actually get into the weeds, you can't really plan for all of the risks that are going to come out.

All of our third-party vendors were great until we needed them. No, I'm just kidding. They were great, but we quickly found out that just because we had to get out of the data center in a year, they didn't really care; they had their own priorities, their own projects, and their own things that they were dealing with.

So we found out that our timeline was not their timeline, and it took some negotiating there. And despite all of our testing efforts, our system integration testing and our UAT testing, and Granite has 900 automated test scripts that we run as part of any upgrade we do, despite all of that, even after go-live we still found issues. We also ran into some resource constraints.

I don't know if anybody else has those out there, but that's a big thing; I think it's a big problem at Granite. The other challenge was that while we were moving to AWS and doing an upgrade, somebody somewhere said, hey, this is a great time to do some security hardening; let's try this and see how it goes.

If we had to do it over, I would recommend removing that from the project scope. Things that went really well: we felt like our timeline was good, we aligned with the vendor on it, and we had adequate time for testing. We did end up having to switch our go-live date one time because of one of our third-party vendors that owns a pretty important part of our ERP; it's the software that does sales and use tax.

We couldn't get them on board early enough, so we did have to delay go-live for a couple of weeks to get through that, but overall our timeline worked out pretty well. Our communication: we were constantly communicating, probably to the point that it was annoying. We were sending out executive updates and holding meetings to talk to the business about where we were in the process.

The sign-off process we used for this project is the same sign-off process we use for anything E1-related, so the business already knew what we expected of them and exactly what they needed to do. Team expertise: like I said, we didn't go wrong hiring ERP Suites. So that's nice.

Our approach of standing up the new environment in the cloud and running both in parallel: we had really no pushback from the business at all, because we didn't impact their day-to-day lives.

Our budget: we ended up coming in several hundred thousand dollars under budget for this project, so that made lots of people happy. Our outcomes: after we did our first DR test, our director of IT infrastructure and operations said, man, that was fast; the last vendor took two days to complete the DR. And our CIO said this was the smoothest JD Edwards migration project that he and the business have ever experienced.

And this slide is really a fun slide, which is why I included it: we had 1,048 sign-offs. For all of the testing that we completed, the business has to go through and sign off that they agree with the test results, so we had 1,048 of those.

We had 19 final approvals; that goes up to the director level, where the directors sign off on the sign-offs. We had a total of 1,593 PDF documents that constituted our proof of testing, and we had 110 before-and-after migration validation reports proving that there were no data integrity issues.

All of this totaled up to seven gigs of information that we gave to our auditors, and the screenshot I'm about to show you is a real depiction of the auditor's face.

All right, guys, that is all I have for you today. Thank you for coming to my session.
