DCASS Final Presentation

Hi, everyone! Thank you all for coming to this presentation. I'm Li Shuai, a project intern. This presentation will introduce a project I completed during my internship.

--------------------next page------------------

My presentation consists of three parts: the background of my project, a comparison between the old practice and the new solution, and the benefits of the rewrite.

--------------------next page------------------

So, the first part is the background of my project. First of all, let me introduce a system related to my project: DCASS. DCASS stands for Derivatives Clearing and Settlement System and serves as a single clearing, settlement and risk management system; its role in my project is to provide the zip files I need. These files contain clearing and settlement files, risk management reports and pre-trade risk management reports, and their uses differ: some are needed by downstream jobs, while others are uploaded to OnDemand after download, which I will mention later.

-------------------next page-------------------

The picture here shows the architecture of our business, and the task of my project is the 'Presto' part. First, I request files from Comet, and the Comet app retrieves certain files from DCASS through SFTP. After the files are downloaded into a shared folder, I handle them and upload some of them. Briefly, my project's task is to trigger the Comet job to download files, scan the shared folder to check for them, and handle the files based on their file type.
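To make the "handle the files based on their file type" step concrete, here is a minimal sketch in Python. The folder paths, the type names, and which type goes to the downstream job versus OnDemand are assumptions for illustration only, not the actual project code.

```python
import os
import shutil
import zipfile

# Minimal sketch of the dispatch-by-type step. All paths, type names and the
# mapping of type -> destination are illustrative assumptions.
DOWNSTREAM_DIR = "/tmp/downstream_staging"   # assumed folder watched by downstream jobs
ONDEMAND_QUEUE = []                          # assumed in-memory list of files to upload later

def handle_file(zip_path, file_type):
    """Unzip one downloaded file and route it according to its type."""
    extracted_dir = os.path.splitext(zip_path)[0]
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extracted_dir)
    if file_type == "downstream":                # e.g. clearing and settlement files (assumed)
        shutil.copytree(extracted_dir, DOWNSTREAM_DIR, dirs_exist_ok=True)
    else:                                        # e.g. risk management reports (assumed)
        ONDEMAND_QUEUE.append(extracted_dir)     # queued for the OnDemand upload step
```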

------------------next page----------------------------

Here I want to mention two things. The first is that the program deals with three kinds of files; they need to be handled differently after downloading, and their priorities and quantities differ. In the table, from top to bottom, the priority of the files decreases while the number of files increases. The second is that DCASS has a limitation: there cannot be more than 5 downloads within 5 minutes, otherwise the account will be locked. So, if we want to download all the files automatically, the interval between two downloads should be at least one minute.
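Because of that limit, any automated download loop has to throttle itself. Below is a minimal sketch of such a throttle in Python, based on the "at least one minute between downloads" rule above; the function and variable names are mine, not the project's.

```python
import time

MIN_INTERVAL_SECONDS = 60          # keeps us under 5 downloads per 5 minutes
_last_request = None               # monotonic time of the previous download request

def throttled(trigger_download, filename):
    """Call trigger_download(filename), sleeping first if the last call was under a minute ago."""
    global _last_request
    now = time.monotonic()
    if _last_request is not None and now - _last_request < MIN_INTERVAL_SECONDS:
        time.sleep(MIN_INTERVAL_SECONDS - (now - _last_request))
    _last_request = time.monotonic()
    trigger_download(filename)
```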

-------------------next page-------------------------

The second part is the comparison of the old solution and the new solution. Here is a flow chart illustrating the procedure of the current program. First, it reads the file list and saves the filenames in memory. Then, for each file, it triggers the Comet job and loops over folder scans, waiting for the download to complete. After all the downloads have finished, it traverses the files in the order of the filename list and performs follow-up operations according to file type.

This solution has a few performance problems, which I list on the left; the main issue is efficiency. It does not start to provide the files users need until every single file has been downloaded, and downloading all the files is a long process that takes more than four hours because of the limitation mentioned before. So downstream users have to wait for hours to get a file that may have been downloaded hours earlier. Not only that, the files are handled in the order of the filename list instead of by file type, which can mean a file with high priority is handled last while files that are not urgently needed are ready first. Another problem is the long interval between folder scans, during which the program does nothing; this is a waste of time and makes the whole run last longer. The last point is that the program simply exits when a file operation throws an exception. When we notice the failure, we have to repeat the process from the beginning and wait another four hours.
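For contrast, here is a minimal sketch of that old flow in Python. The folder path, scan interval and helper functions are placeholders I made up to illustrate the structure, not the real code.

```python
import os
import time

# Minimal sketch of the old flow: download every file first, busy-waiting
# between folder scans, then process everything in filename-list order.
SHARED_FOLDER = "/tmp/dcass_shared"   # assumed download location
SCAN_INTERVAL_SECONDS = 120           # assumed long interval between folder scans

def trigger_comet_job(filename):
    """Placeholder for the call that asks Comet to fetch one file from DCASS."""
    print(f"requesting {filename} from Comet")

def process_file(path):
    """Placeholder for the follow-up operation (unzip, rename, upload, ...)."""
    print(f"processing {path}")

def old_flow(file_list):
    # Phase 1: download everything before handling anything.
    for name in file_list:
        trigger_comet_job(name)
        path = os.path.join(SHARED_FOLDER, name)
        while not os.path.exists(path):
            time.sleep(SCAN_INTERVAL_SECONDS)   # program does nothing while waiting
    # Phase 2: only now handle files, in filename order rather than by priority.
    for name in file_list:
        process_file(os.path.join(SHARED_FOLDER, name))  # any exception ends the whole run
```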

----------------------next page-------------------------------

So, to solve the current problems, the new practice follows a different flow. When I save the list into memory, I classify the files by type and download them in order of priority. For each file in the list, once I find it in the target folder, instead of busy-waiting I continue on to handle the file I just got. In this way, downstream jobs do not even need to wait for the whole program to finish to know whether the files they need are ready: they can rely on an Autosys job called File Watcher to check for the file instead of waiting for the result of the program. This way, a downstream job can get its first file and start executing about 30 seconds after the program starts, instead of waiting for hours. Moreover, because the waiting time is used effectively, the duration of the whole process is reduced. The last point is exception handling. When an exception is encountered, the program will not simply exit; instead, it records the exception in the log and in an email. To get the missing files, you can either rerun the whole program, which does not cost much time because most files have already been downloaded, or create a new file list containing only the missing files and pass its path to the program as an argument.
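A minimal sketch of this new flow, again in Python with made-up helper and parameter names, might look like the following. It shows the three ideas above: priority ordering, handling each file as soon as it lands, and logging failures instead of exiting.

```python
import logging
import os
import time

logger = logging.getLogger("dcass_download")

def new_flow(ordered_files, shared_folder, trigger_download, handle_file):
    """ordered_files: list of (filename, file_type) pairs, already sorted so
    that higher-priority types come first. trigger_download and handle_file
    are the same kind of placeholders used in the earlier sketches."""
    failed = []
    for name, file_type in ordered_files:
        trigger_download(name)                       # subject to the one-minute throttle
        path = os.path.join(shared_folder, name)
        while not os.path.exists(path):
            time.sleep(1)                            # short poll; the file usually lands quickly
        try:
            handle_file(path, file_type)             # File Watcher can pick the result up right away
        except Exception:
            logger.exception("failed to handle %s", name)   # log (and email) instead of exiting
            failed.append(name)
    return failed    # a rerun can be driven by a file list containing only these names
```

In this shape, a retry only needs a file list built from the returned `failed` names, which matches the "pass its path to the program as an argument" option described above.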

--------------next page---------------------

This table compares the time cost of the old practice and the new practice. For downloading a single file, the new solution is about 90 seconds faster, and the new solution needs only about 2 minutes to prepare the files for downstream jobs. For OnDemand users, the new solution also delivers the files about three and a half hours earlier than the old one. Finally, the total run time is reduced to 2 hours 40 minutes, which means it saves about two hours in total.

--------------next page---------------------

This page shows a comparison between the logs of the two programs. The upper one records the old program downloading a file, and the lower one shows the whole process of the new program, from downloading a file to the subsequent processing. Let's focus on the timestamps on the left. In the old program's log, from 2 minutes to 3 minutes 56 seconds, the old program needs 116 seconds just to download a file; in the new program's log, from 24 seconds to 52 seconds, it needs only 28 seconds to prepare a required file, including download, unzip and rename.

---------------next page--------------------

The benefits of the new practice are briefly listed here: it shortens the running time of the program, downstream jobs can get the processed files and start quickly, files are handled based on their priority, exceptions are handled better, and the arguments and profiles are more flexible and reasonable. All of these benefits save time and improve efficiency.

--------------next page--------------------

Q&A
