Topic of Interest: Improving the efficiency of machine learning algorithms in real-time data processing

Research Question: How can the efficiency of machine learning algorithms be improved for real-time data processing applications?
Annotated Bibliography:

  1. Title: Data Dimension Reduction makes ML Algorithms efficient
    Authors: Wisal Khan, Muhammad Turab, Waqas Ahmad, Syed Hasnat Ahmad, Kelash Kumar, Bin Luo
    Summary: This study investigates how data dimension reduction (DDR) methods affect the efficiency of machine learning algorithms. The authors show that both supervised and unsupervised learning can benefit from pre-processing with DDR methods such as auto-encoders (AE) and principal component analysis (PCA), which can greatly speed up machine learning algorithms. Two well-known datasets, MNIST and FashionMNIST, were used in the experiments. After DDR pre-processing was applied, the results showed a significant increase in accuracy and a decrease in processing time. The authors compared the performance of several supervised learning techniques, including support-vector machines (SVM), decision trees with the GINI index and entropy, and the stochastic gradient descent classifier (SGDC), before and after applying PCA. The unsupervised K-means clustering algorithm was likewise evaluated before and after AE representation learning. The results suggest that DDR methods can substantially improve the efficiency of machine learning algorithms.
    URL: https://arxiv.org/abs/2211.09392v1
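    The paper itself does not include code; as a rough sketch of the PCA pre-processing step it evaluates, the projection onto the top principal components can be computed with NumPy's SVD (the data below is a random stand-in with MNIST-sized feature vectors, not the actual dataset):

```python
import numpy as np

def pca_reduce(X, n_components):
    # Center the data, then project onto the top principal components
    # obtained from the singular value decomposition.
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]      # shape: (n_components, n_features)
    return X_centered @ components.T    # shape: (n_samples, n_components)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 784))        # 784 features, as in flattened MNIST
X_reduced = pca_reduce(X, 50)
print(X_reduced.shape)                  # (1000, 50)
```

    Training a classifier on the 50-dimensional projection instead of the raw 784 features is what yields the reported speed-ups.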
  2. Title: Learning Scheduling Algorithms for Data Processing Clusters
    Authors: Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh
    Summary: This study describes Decima, a system that learns workload-specific scheduling algorithms for data processing clusters using reinforcement learning (RL) and neural networks. Without any human guidance beyond a high-level goal, such as minimizing the average job completion time, Decima can automatically design highly efficient scheduling policies. To handle continuous stochastic job arrivals, the authors created novel representations for job dependency graphs, scalable RL models, and new RL training techniques. A prototype integration of Decima with Spark on a 25-node cluster showed that Decima improves the average job completion time by at least 21% compared to hand-tuned scheduling algorithms, achieving up to a 2x improvement during periods of high cluster load. The findings suggest that highly effective scheduling policies for data processing clusters can be created using modern machine learning methods.
    URL: https://arxiv.org/abs/1810.01963v4
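    Decima's full system (graph neural networks over job DAGs, scalable RL training) is far beyond a snippet, but the underlying policy-gradient idea can be illustrated on a toy two-job scheduling problem. Everything below is an illustrative simplification, not the authors' code: a one-parameter policy learns via REINFORCE to run the shorter job first, which minimizes average completion time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w, lr, baseline = 0.0, 0.01, 0.0

for step in range(5000):
    d = rng.uniform(1.0, 10.0, size=2)       # durations of two pending jobs
    p0 = sigmoid(w * (d[1] - d[0]))          # prob. of running job 0 first
    a = 0 if rng.random() < p0 else 1
    first, second = (d[0], d[1]) if a == 0 else (d[1], d[0])
    reward = -(first + (first + second)) / 2.0   # negative avg completion time
    baseline += 0.01 * (reward - baseline)       # running reward baseline
    # Gradient of log-probability of the chosen action w.r.t. w
    grad = (1 - p0) * (d[1] - d[0]) if a == 0 else -p0 * (d[1] - d[0])
    w += lr * (reward - baseline) * grad         # REINFORCE update

# A positive weight means the learned policy prefers the shorter job first
print(w > 0)
```

    Decima applies the same reward-driven update, but with a graph neural network scoring arbitrary DAGs of jobs instead of a single scalar weight.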
  3. Title: Enhancement of Healthcare Data Transmission using the Levenberg-Marquardt Algorithm
    Authors: Angela An, James Jin Kang
    Summary: This study investigates how machine learning can improve the accuracy and efficiency of healthcare data transmission. The authors show how the trade-off between these metrics can be resolved by using machine learning to analyze complex health data indicators, such as accuracy and data transfer efficiency. The research applies the Levenberg-Marquardt algorithm to improve both metrics by reducing the number of samples transmitted while preserving accuracy. The algorithm was evaluated on a common heart rate dataset. The Levenberg-Marquardt algorithm performed best, achieving a 3.33x efficiency gain through reduced sample data size at an accuracy of 79.17%; across the 7 distinct sampling scenarios selected for testing, it maintained comparable accuracy while showing improved efficiency. The results imply that machine learning can improve healthcare data transmission metrics without favoring one parameter over another.
    URL: https://arxiv.org/abs/2206.04240v1
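    Neither this paper nor the next ships code; as an illustration of the kind of nonlinear fit involved, SciPy exposes a Levenberg-Marquardt solver via `least_squares(method="lm")`. The signal below is synthetic, not the heart rate dataset used in the study:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical example: fit a two-parameter exponential model to a
# downsampled heart-rate-style signal using Levenberg-Marquardt.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 25)                  # reduced number of samples
y = 70 + 10 * np.exp(-2.0 * t) + rng.normal(scale=0.1, size=t.size)

def residuals(p):
    base, amp = p
    return base + amp * np.exp(-2.0 * t) - y

fit = least_squares(residuals, x0=[60.0, 5.0], method="lm")
print(np.round(fit.x, 1))                  # recovers roughly [70, 10]
```

    The fit recovers the underlying parameters from far fewer samples than the original signal would require, which is the efficiency/accuracy trade-off the paper targets.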
  4. Title: Enhancement of Healthcare Data Performance Metrics using Neural Network Machine Learning Algorithms
    Authors: Qi An, Patryk Szewczyk, Michael N Johnstone, James Jin Kang
    Summary: This study examines how machine learning can be used to improve healthcare data transfer performance metrics. The authors show how the trade-off between accuracy and data transfer efficiency can be resolved by using machine learning to analyze complex health data indicators. By transmitting fewer samples, the study’s time series nonlinear autoregressive neural network methods improve both data metrics. A common heart rate dataset was used to assess the algorithms’ efficiency and accuracy. The Levenberg-Marquardt algorithm performed the best, with an efficiency gain of 3.33x and an accuracy of 79.17%, which is comparable in accuracy to the other algorithms but more efficient. The results show that machine learning can boost performance metrics for healthcare data without favoring one indicator over another.
    URL: https://arxiv.org/abs/2201.05962v1
  5. Title: LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
    Authors: Zhi-Lin Ke, Hsiang-Yun Cheng, Chia-Lin Yang
    Summary: This study proposes a lightweight implementation of random shuffling (LIRS) to improve the efficiency of machine learning algorithms on NVM-based storage. Rather than shuffling the data itself, LIRS randomly shuffles the indexes of the whole training dataset and reads the selected training instances directly from NVM-based storage. The authors show that LIRS can increase the final testing accuracy of a DNN by 1.01% and decrease the total training time of SVM and DNN by 49.9% and 43.5%, respectively. The findings show that a lightweight implementation of random shuffling on NVM-based storage can make machine learning algorithms considerably more efficient.
    URL: https://arxiv.org/abs/1810.04509v1
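    A minimal sketch of the core idea, shuffling a cheap index array each epoch instead of physically reshuffling the training data (`dataset` here is an in-memory stand-in for NVM-backed storage, not the authors' implementation):

```python
import numpy as np

# Permute only the index array; gather training examples by index so the
# large dataset itself never has to be rewritten between epochs.
rng = np.random.default_rng(42)
dataset = np.arange(100_000, dtype=np.float32).reshape(-1, 10)  # 10,000 rows

indices = np.arange(len(dataset))
for epoch in range(3):
    rng.shuffle(indices)                    # cheap: shuffles small integers
    for start in range(0, len(indices), 256):
        batch = dataset[indices[start:start + 256]]  # gather by index
        # ... train on batch ...

print(len(indices), batch.shape)            # 10000 (16, 10)
```

    On NVM, the index gather becomes a set of direct reads from storage, which is what makes the approach lightweight.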

Hypothesis Statement: The use of data dimension reduction methods, reinforcement learning for scheduling, and lightweight implementations of random shuffling on NVM-based storage may all improve the efficiency of machine learning algorithms for real-time data processing applications.

Organization Profile:
Organization: Google
Profile: Google is a global technology company focused on Internet-related services and products, including search, cloud computing, software, and hardware. Google invests heavily in machine learning and artificial intelligence and incorporates these technologies into many of its products and services, including search, advertising, and cloud computing.
Reason for Interest: Google would find the study results beneficial because making machine learning algorithms more efficient for real-time data processing applications would improve the performance of its products and services, reduce costs, and enhance the user experience.