Topic of Interest: Improving the efficiency of machine learning algorithms in real-time data processing

Research Question: How can the efficiency of machine learning algorithms be improved for real-time data processing applications?
Annotated Bibliography:

  1. Title: Data Dimension Reduction makes ML Algorithms efficient
    Authors: Wisal Khan, Muhammad Turab, Waqas Ahmad, Syed Hasnat Ahmad, Kelash Kumar, Bin Luo
    Summary: This study investigates how data dimension reduction (DDR) methods affect the efficiency of machine learning algorithms. The authors show that both supervised and unsupervised learning can benefit from pre-processing with DDR methods such as auto-encoders (AE) and principal component analysis (PCA), which can greatly speed up machine learning algorithms. Two well-known datasets, MNIST and FashionMNIST, were used in the experiments. After DDR pre-processing was applied, the results showed a significant increase in accuracy and a decrease in processing time. The authors compared the performance of several supervised learning techniques, including support-vector machines (SVM), decision trees with the GINI index and entropy, and the stochastic gradient descent classifier (SGDC), before and after applying PCA. The unsupervised K-means clustering algorithm was likewise evaluated before and after AE representation learning. The results suggest that DDR methods can substantially improve the efficiency of machine learning algorithms.
    URL: https://arxiv.org/abs/2211.09392v1
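    The paper itself does not include code; as a rough sketch of the PCA pre-processing step it evaluates, the projection onto the top principal components can be computed with NumPy's SVD (the data below is a random stand-in with MNIST-sized feature vectors, not the actual dataset):

```python
import numpy as np

def pca_reduce(X, n_components):
    # Center the data, then project onto the top principal components
    # obtained from the singular value decomposition.
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]      # shape: (n_components, n_features)
    return X_centered @ components.T    # shape: (n_samples, n_components)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 784))        # 784 features, as in flattened MNIST
X_reduced = pca_reduce(X, 50)
print(X_reduced.shape)                  # (1000, 50)
```

    Training a classifier on the 50-dimensional projection instead of the raw 784 features is what yields the reported speed-ups.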
  2. Title: Learning Scheduling Algorithms for Data Processing Clusters
    Authors: Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh
    Summary: This study describes Decima, a system that learns workload-specific scheduling algorithms for data processing clusters using reinforcement learning (RL) and neural networks. Without any human guidance beyond a high-level goal, such as minimizing the average job completion time, Decima can automatically design highly efficient scheduling policies. To handle continuous stochastic job arrivals, the authors created novel representations for job dependency graphs, scalable RL models, and new RL training techniques. A prototype integration of Decima with Spark on a 25-node cluster showed that Decima improves the average job completion time by at least 21% compared to hand-tuned scheduling algorithms, achieving up to a 2x improvement during periods of high cluster load. The findings suggest that highly effective scheduling policies for data processing clusters can be created using modern machine learning methods.
    URL: https://arxiv.org/abs/1810.01963v4
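    Decima's full system (graph neural networks over job DAGs, scalable RL training) is far beyond a snippet, but the underlying policy-gradient idea can be illustrated on a toy two-job scheduling problem. Everything below is an illustrative simplification, not the authors' code: a one-parameter policy learns via REINFORCE to run the shorter job first, which minimizes average completion time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w, lr, baseline = 0.0, 0.01, 0.0

for step in range(5000):
    d = rng.uniform(1.0, 10.0, size=2)       # durations of two pending jobs
    p0 = sigmoid(w * (d[1] - d[0]))          # prob. of running job 0 first
    a = 0 if rng.random() < p0 else 1
    first, second = (d[0], d[1]) if a == 0 else (d[1], d[0])
    reward = -(first + (first + second)) / 2.0   # negative avg completion time
    baseline += 0.01 * (reward - baseline)       # running reward baseline
    # Gradient of log-probability of the chosen action w.r.t. w
    grad = (1 - p0) * (d[1] - d[0]) if a == 0 else -p0 * (d[1] - d[0])
    w += lr * (reward - baseline) * grad         # REINFORCE update

# A positive weight means the learned policy prefers the shorter job first
print(w > 0)
```

    Decima applies the same reward-driven update, but with a graph neural network scoring arbitrary DAGs of jobs instead of a single scalar weight.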
  3. Title: Enhancement of Healthcare Data Transmission using the Levenberg-Marquardt Algorithm
    Authors: Angela An, James Jin Kang
    Summary: This study investigates how machine learning can improve the accuracy and efficiency of healthcare data transmission. The authors show how the trade-off between these metrics can be resolved by using machine learning to analyze complex health data indicators, such as accuracy and data transfer efficiency. The research applies the Levenberg-Marquardt algorithm to improve both metrics by reducing the number of samples transmitted while preserving accuracy. The algorithm was evaluated on a common heart rate dataset. The Levenberg-Marquardt algorithm performed best, achieving a 3.33x efficiency gain through reduced sample data size at an accuracy of 79.17%; across the 7 distinct sampling scenarios selected for testing, it maintained comparable accuracy while showing improved efficiency. The results imply that machine learning can improve healthcare data transmission metrics without favoring one parameter over another.
    URL: https://arxiv.org/abs/2206.04240v1
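    Neither this paper nor the next ships code; as an illustration of the kind of nonlinear fit involved, SciPy exposes a Levenberg-Marquardt solver via `least_squares(method="lm")`. The signal below is synthetic, not the heart rate dataset used in the study:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical example: fit a two-parameter exponential model to a
# downsampled heart-rate-style signal using Levenberg-Marquardt.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 25)                  # reduced number of samples
y = 70 + 10 * np.exp(-2.0 * t) + rng.normal(scale=0.1, size=t.size)

def residuals(p):
    base, amp = p
    return base + amp * np.exp(-2.0 * t) - y

fit = least_squares(residuals, x0=[60.0, 5.0], method="lm")
print(np.round(fit.x, 1))                  # recovers roughly [70, 10]
```

    The fit recovers the underlying parameters from far fewer samples than the original signal would require, which is the efficiency/accuracy trade-off the paper targets.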
  4. Title: Enhancement of Healthcare Data Performance Metrics using Neural Network Machine Learning Algorithms
    Authors: Qi An, Patryk Szewczyk, Michael N Johnstone, James Jin Kang
    Summary: This study examines how machine learning can be used to improve healthcare data transfer performance metrics. The authors show how the trade-off between accuracy and data transfer efficiency can be resolved by using machine learning to analyze complex health data indicators. By transmitting fewer samples, the study’s time series nonlinear autoregressive neural network methods improve both data metrics. A common heart rate dataset was used to assess the algorithms’ efficiency and accuracy. The Levenberg-Marquardt algorithm performed the best, with an efficiency gain of 3.33x and an accuracy of 79.17%, which is comparable in accuracy to the other algorithms but more efficient. The results show that machine learning can boost performance metrics for healthcare data without favoring one indicator over another.
    URL: https://arxiv.org/abs/2201.05962v1
  5. Title: LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
    Authors: Zhi-Lin Ke, Hsiang-Yun Cheng, Chia-Lin Yang
    Summary: This study proposes a lightweight implementation of random shuffling (LIRS) to improve the efficiency of machine learning algorithms on NVM-based storage. Rather than shuffling the data itself, LIRS randomly shuffles the indexes of the whole training dataset and reads the selected training instances directly from NVM-based storage. The authors show that LIRS can increase the final testing accuracy of a DNN by 1.01% and decrease the total training time of SVM and DNN by 49.9% and 43.5%, respectively. The findings show that a lightweight implementation of random shuffling on NVM-based storage can make machine learning algorithms considerably more efficient.
    URL: https://arxiv.org/abs/1810.04509v1
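    A minimal sketch of the core idea, shuffling a cheap index array each epoch instead of physically reshuffling the training data (`dataset` here is an in-memory stand-in for NVM-backed storage, not the authors' implementation):

```python
import numpy as np

# Permute only the index array; gather training examples by index so the
# large dataset itself never has to be rewritten between epochs.
rng = np.random.default_rng(42)
dataset = np.arange(100_000, dtype=np.float32).reshape(-1, 10)  # 10,000 rows

indices = np.arange(len(dataset))
for epoch in range(3):
    rng.shuffle(indices)                    # cheap: shuffles small integers
    for start in range(0, len(indices), 256):
        batch = dataset[indices[start:start + 256]]  # gather by index
        # ... train on batch ...

print(len(indices), batch.shape)            # 10000 (16, 10)
```

    On NVM, the index gather becomes a set of direct reads from storage, which is what makes the approach lightweight.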

Hypothesis Statement: The use of data dimension reduction methods, reinforcement learning for scheduling, and lightweight implementations of random shuffling on NVM-based storage may all improve the efficiency of machine learning algorithms for real-time data processing applications.

Organization Profile:
Organization: Google
Profile: Google is a global technology company focused on Internet-related services and products, including search, cloud computing, software, and hardware. Google invests heavily in machine learning and artificial intelligence and incorporates these technologies into many of its products and services, including search, advertising, and cloud computing.
Reason for Interest: Google would find the study results beneficial because making machine learning algorithms more efficient for real-time data processing applications would improve the performance of its products and services, reduce costs, and enhance the user experience.