Processing MySQL data with Hadoop: Hadoop and MySQL integration

Latest recommended article published on 2023-06-28 17:37:21

weixin_39820244

Article tags: hadoop, processing mysql

We would like to implement Hadoop on our system to improve its performance.

The process works like this:

Hadoop will gather data from the MySQL database, then process it.

The output will then be exported back to the MySQL database.

Is this a good implementation? Will this improve our system's overall performance?

What are the requirements and has this been done before? A good tutorial would really help.

Thanks

Solution

Although this is not a typical Hadoop use case, it might make sense in the following scenario:

a) You have a good way to partition your data into inputs (such as an existing partitioning scheme).

b) The processing of each partition is relatively heavy. As a rough figure, I would say at least 10 seconds of CPU time per partition.

If both conditions are met, you will be able to apply any desired amount of CPU power to your data processing.
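As a rough illustration of condition a), here is a minimal sketch of how a numeric key range could be cut into independent inputs, one per map task. The function names, the `orders` table, and the `id` column are hypothetical and only serve the example; it assumes the table has a reasonably evenly distributed integer key.

```python
def split_key_range(min_id, max_id, num_splits):
    """Divide the inclusive range [min_id, max_id] into contiguous (lo, hi) pairs."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_splits = min(num_splits, total)  # never create empty splits
    base, extra = divmod(total, num_splits)
    ranges = []
    lo = min_id
    for i in range(num_splits):
        # The first `extra` splits take one additional key each.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def split_queries(table, key_column, ranges):
    """Build one SELECT per split; each task would run exactly one of them."""
    return [
        f"SELECT * FROM {table} WHERE {key_column} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    # Hypothetical table and column names, used only for illustration.
    for query in split_queries("orders", "id", split_key_range(1, 10, 3)):
        print(query)
```

If the key is skewed, equal-width ranges produce uneven partitions, so an existing partitioning scheme in the database is usually a better starting point.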

If you are doing a simple scan or aggregation, I think you will not gain anything. On the other hand, if you are going to run CPU-intensive algorithms on each partition, then your gain can indeed be significant.
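The trade-off above can be made concrete with a back-of-the-envelope model. This is only a sketch under stated assumptions: every task pays a fixed overhead (startup plus moving data out of MySQL), tasks run in waves across the available workers, and the overhead and worker numbers below are hypothetical, not measurements.

```python
import math


def estimated_speedup(num_partitions, cpu_seconds_per_partition,
                      overhead_seconds_per_task, num_workers):
    """Ratio of single-machine time to estimated cluster wall-clock time."""
    # Single machine: process every partition one after another, no overhead.
    serial_seconds = num_partitions * cpu_seconds_per_partition
    # Cluster: tasks run in waves of `num_workers`; each task pays the overhead.
    waves = math.ceil(num_partitions / num_workers)
    parallel_seconds = waves * (cpu_seconds_per_partition + overhead_seconds_per_task)
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    # Hypothetical figures: 1000 partitions, 50 workers, 5 s overhead per task.
    heavy = estimated_speedup(1000, 10.0, 5.0, 50)  # CPU-heavy partitions
    light = estimated_speedup(1000, 0.1, 5.0, 50)   # simple scan per partition
    print(f"heavy work per partition: about {heavy:.1f}x")
    print(f"light work per partition: about {light:.2f}x")
```

Under these assumed numbers the heavy case gains substantially, while the light case ends up no faster than a single machine because the per-task overhead dominates.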

I would also mention a separate case: if your processing requires massive data sorting. I do not think MySQL will be good at sorting billions of records. Hadoop will do it.
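To show why sorting distributes well, here is a single-process sketch of the range-partitioned approach commonly used for a total-order sort in MapReduce: route each record to a key range, sort each range independently, then concatenate the ranges in order. The function name and the boundary values are illustrative assumptions; on a cluster each bucket would be handled by a separate reducer.

```python
import bisect


def range_partitioned_sort(records, boundaries):
    """Sort records by bucketing on sorted `boundaries`, then sorting each bucket."""
    # One bucket per key range: (-inf, b0), [b0, b1), ..., [b_last, +inf).
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        buckets[bisect.bisect_right(boundaries, record)].append(record)

    result = []
    for bucket in buckets:
        bucket.sort()          # independent work, so it can run in parallel
        result.extend(bucket)  # buckets are already in global key order
    return result


if __name__ == "__main__":
    print(range_partitioned_sort([5, 3, 9, 1, 7, 3], [3, 6]))
```

Because every key in one bucket is smaller than every key in the next, no final merge step is needed, which is what lets the work scale with the number of machines.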