We would like to implement Hadoop on our system to improve its performance.
The process works like this:
Hadoop will gather data from a MySQL database and then process it.
The output will then be exported back to the MySQL database.
Is this a good implementation? Will this improve our system's overall performance?
What are the requirements and has this been done before? A good tutorial would really help.
Thanks
Solution
Although this is not a typical use of Hadoop, it might make sense in the following scenario:
a) You have a good way to partition your data into inputs (for example, existing partitioning).
b) The processing of each partition is relatively heavy. As a rule of thumb, I would suggest at least 10 seconds of CPU time per partition.
If both conditions are met, you will be able to apply as much CPU power as you want to your data processing.
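To illustrate condition (a), here is a minimal sketch in Python of splitting a numeric primary-key range into contiguous partitions, each of which could be read with its own bounded query and handed to a separate task. The table name `events`, the column `id`, and both helper names are hypothetical. This is not Hadoop's API, only the idea behind range-based input splits.

```python
def split_range(lo, hi, parts):
    """Split the inclusive key range [lo, hi] into at most `parts` contiguous ranges."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and lo <= hi")
    total = hi - lo + 1
    parts = min(parts, total)  # never create empty partitions
    base, extra = divmod(total, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # Spread the remainder over the first `extra` ranges.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_queries(table, key, lo, hi, parts):
    """Build one bounded SELECT per partition (table and key names are hypothetical)."""
    return [
        f"SELECT * FROM {table} WHERE {key} BETWEEN {a} AND {b}"
        for a, b in split_range(lo, hi, parts)
    ]


if __name__ == "__main__":
    for query in partition_queries("events", "id", 1, 1000, 4):
        print(query)
```

Even-sized key ranges only give even-sized work if the keys are roughly uniformly distributed; with skewed keys, some partitions will be much heavier than others.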
If you are doing a simple scan or aggregation, I think you will not gain anything. On the other hand, if you are going to run some CPU-intensive algorithms on each partition, then your gain can indeed be significant.
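A rough way to see why the per-partition CPU cost matters: every task carries some fixed startup cost, so light partitions end up dominated by overhead. The sketch below is a back-of-envelope model, not a benchmark. The function names, the 5-second overhead, and the worker count are all assumptions chosen for illustration, and the model ignores the time needed to move data out of and back into MySQL.

```python
import math


def estimated_wall_time(partitions, cpu_sec_per_partition, workers, overhead_sec_per_task):
    """Estimate wall-clock seconds when partitions run in waves of `workers` tasks."""
    waves = math.ceil(partitions / workers)
    return waves * (cpu_sec_per_partition + overhead_sec_per_task)


def speedup(partitions, cpu_sec_per_partition, workers, overhead_sec_per_task):
    """Ratio of single-process time (no task overhead) to the parallel estimate."""
    serial = partitions * cpu_sec_per_partition
    parallel = estimated_wall_time(
        partitions, cpu_sec_per_partition, workers, overhead_sec_per_task
    )
    return serial / parallel


if __name__ == "__main__":
    # Assumed setup: 100 partitions, 20 workers, 5 s of startup overhead per task.
    for cpu_sec in (0.5, 10, 60):
        print(cpu_sec, round(speedup(100, cpu_sec, 20, 5), 2))
```

Under these assumed numbers, the estimated speedup on 20 workers is below 2x at 0.5 seconds of CPU per partition and above 13x at 10 seconds, which is the intuition behind the rule of thumb.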
I would also mention a separate case: if your processing requires massive data sorting. I do not think MySQL will be good at sorting billions of records. Hadoop will do it.
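The reason a large sort distributes well is that chunks can be sorted independently and the sorted runs then merged in a single pass. The following is a small in-memory sketch of that sort-then-merge pattern in Python; it is only an illustration of the idea, not Hadoop code, and the function names and chunk size are hypothetical.

```python
import heapq


def sort_in_chunks(records, chunk_size):
    """Sort fixed-size chunks independently, as separate workers could."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [
        sorted(records[i:i + chunk_size])
        for i in range(0, len(records), chunk_size)
    ]


def merge_sorted_runs(runs):
    """Lazily merge already-sorted runs into one sorted stream."""
    return heapq.merge(*runs)


if __name__ == "__main__":
    data = [5, 3, 9, 1, 7, 2, 8]
    runs = sort_in_chunks(data, 3)
    print(list(merge_sorted_runs(runs)))
```

Because the merge step only needs to look at the head of each run, it can stream data that would not fit in memory on a single machine.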