VSB Power Line Fault Detection Kaggle Competition

Overview of the case study:

Step 1: Explanation of the problem, including details about the source, the problem statement, an explanation of relevant terms, and how the problem can be viewed as a business problem.

Step 2: Conversion of the business problem into a machine learning problem, with details of the existing performance metrics, previous solutions, approaches, and improvements.

Step 3: Complete exploratory data analysis.

Step 4: Feature engineering involving various techniques.

Step 5: Trying out different machine learning models, selecting the best model, and predicting on the test dataset.

Step 6: Future work that can be done to improve the performance metric.

Step 7: References.

1. Business/Real-world Problem:

1.1 Source:

This was posted as a Kaggle challenge by the ENET Centre, which researches and develops renewable energy resources with the goal of reducing or eliminating harmful environmental impacts.

Source: https://www.kaggle.com/c/vsb-power-line-fault-detection/overview

Data: ENET Centre, VSB - Technical University of Ostrava

1.2 What is Partial Discharge?

Here we deal with medium-voltage overhead power lines, which stretch over hundreds of miles and make manual fault detection almost impossible.

These lines are occasionally damaged, either by a tree branch or by a flaw in the insulator. Such damage does not cause an immediate outage. Instead it gives rise to a phenomenon called partial discharge, which degrades the line gradually and, if left unrepaired, eventually leads to a power outage.

The textbook definition of partial discharge is an electrical discharge that does not completely bridge the electrodes of an insulation system.

1.3 Problem Statement

The main objective of this case study is to detect partial discharge patterns in signals acquired from power lines with a new meter. Effective classifiers built on this data will make it possible to monitor power lines for faults continuously.

1.4 Real-world/Business objectives and constraints

1. Minimize binary-class error.
2. Probability estimates are needed.
3. There is no strict time limitation: partial discharge faults do damage over time rather than immediately, so the limit can be in hours.
4. Detecting partial discharge early can be helpful financially.

2. Machine Learning Problem

2.1. Data Overview

  • Source: https://www.kaggle.com/c/vsb-power-line-fault-detection/data

Four files are given in total: two correspond to the training data and two to the test data. Each pair consists of:
1. A file containing the signal data
2. A file containing the metadata

Each signal contains 800,000 points. In total, 8,712 signals are given for training and 20,337 signals for testing, all stored in Parquet format.

The metadata consists of the phase of the signal and the target label:
0 - partial discharge is not present
1 - partial discharge is present

2.2. Mapping the real-world problem to an ML problem

2.2.1. Type of Machine Learning Problem

Each signal must be assigned to one of 2 classes (partial discharge present or not present) => binary classification problem.

2.2.2. Performance Metric

Source: https://www.kaggle.com/c/vsb-power-line-fault-detection/overview/evaluation

Metrics:
* Matthews correlation coefficient (MCC)
* Confusion matrix
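To make the relationship between the two metrics concrete, here is a minimal pure-Python sketch that derives the MCC from the four confusion-matrix counts, using MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names `confusion_counts` and `mcc` and the toy labels are illustrative assumptions, not code from the original notebook. Returning 0 when the denominator is zero is a common convention, also assumed here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = partial discharge present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient computed from the confusion-matrix counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: a degenerate confusion matrix scores 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (hypothetical): 3 faulty signals out of 8.
y_true = [1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]

print(confusion_counts(y_true, y_pred))
print(mcc(y_true, y_pred))
```

Because the MCC is computed on hard 0/1 predictions, a model that outputs probabilities needs a decision threshold before it can be scored. In practice `sklearn.metrics.matthews_corrcoef` computes the same quantity.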

2.2.3. Machine Learning Objectives and Constraints

Objective: Predict the probability of each data point belonging to each of the 2 classes.

Constraints:

* Class probabilities are needed.
* Penalize classification errors => the metric is the Matthews correlation coefficient.
* Some latency constraints.

2.3.1. Existing approaches

Most of the notebooks at https://www.kaggle.com/c/vsb-power-line-fault-detection/notebooks used deep learning techniques.

In most of the approaches, each signal is divided into equal chunks of size 1000, which gives 800 chunks per signal. Statistical features are then extracted from each chunk, which results in a 3-dimensional array (signals x chunks x features). Because the data is sequential, most of the solutions used LSTMs. Some solutions added attention layers and some notebooks used transformers.
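The chunking step described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function name `chunk_features`, the choice of mean, standard deviation, minimum and maximum as the per-chunk statistics, and the random stand-in signals are all hypothetical, since the notebooks differ in which features they extract.

```python
import numpy as np


def chunk_features(signals, chunk_size=1000):
    """Split each signal into equal chunks and compute per-chunk statistics.

    signals: array of shape (n_signals, n_points)
    returns: array of shape (n_signals, n_chunks, n_features)
    """
    n_signals, n_points = signals.shape
    if n_points % chunk_size != 0:
        raise ValueError("signal length must be a multiple of chunk_size")
    n_chunks = n_points // chunk_size

    # Reshape so that the last axis indexes the points inside one chunk.
    chunks = signals.reshape(n_signals, n_chunks, chunk_size).astype(np.float64)

    # Illustrative statistics (an assumption, not the notebooks' exact feature set).
    stats = [
        chunks.mean(axis=2),
        chunks.std(axis=2),
        chunks.min(axis=2),
        chunks.max(axis=2),
    ]
    return np.stack(stats, axis=-1)


# Random stand-in for three signals of 800,000 points each.
rng = np.random.default_rng(0)
signals = rng.normal(size=(3, 800_000))

features = chunk_features(signals)
print(features.shape)  # (3, 800, 4)
```

The resulting array has the (samples, timesteps, features) layout that sequence models such as LSTMs typically expect as input.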

Some approaches used signal denoising techniques such as the DWT (discrete wavelet transform), while other solutions relied on finding peaks in the signal data. In both cases the models were then built using deep learning techniques.

2.3.2. Improvements

Since most of the solutions used deep learning techniques, I've used classical machine learning models such as boosting models instead. For feature engineering I've used new techniques like power spectral density and the Fourier transform, and found the top features using peak detection. I've also used peak detection in the spectra of the signal, which was already mentioned in a notebook. In the modeling part, I've used four different machine learning models along with techniques like RandomizedSearchCV and StratifiedKFold cross-validation.
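As a rough illustration of peak detection in a signal's spectrum, the sketch below estimates a periodogram with NumPy's FFT and keeps the strongest local maxima as features. It is not the author's implementation: the function name `spectral_peak_features`, the simple neighbour-comparison peak rule, the sample rate and the synthetic two-tone signal are all assumptions made for the example.

```python
import numpy as np


def spectral_peak_features(signal, sample_rate, n_peaks=3):
    """Return (frequencies, powers) of the n_peaks strongest periodogram peaks."""
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()  # remove the DC offset so it does not dominate the spectrum

    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)

    # Periodogram estimate of the power spectral density. One-sided doubling
    # is omitted because a constant factor does not change the peak ranking.
    psd = np.abs(spectrum) ** 2 / (sample_rate * x.size)

    # A bin is a peak if it is strictly greater than both of its neighbours.
    interior = psd[1:-1]
    is_peak = (interior > psd[:-2]) & (interior > psd[2:])
    peak_idx = np.flatnonzero(is_peak) + 1

    # Keep the n_peaks most powerful peaks, strongest first.
    order = np.argsort(psd[peak_idx])[::-1][:n_peaks]
    top = peak_idx[order]
    return freqs[top], psd[top]


# Synthetic test signal (hypothetical): a strong 50 Hz tone plus a weaker 120 Hz tone.
sample_rate = 1000.0
t = np.arange(2000) / sample_rate
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

peak_freqs, peak_powers = spectral_peak_features(signal, sample_rate, n_peaks=2)
print(peak_freqs)
```

On real, noisy signals a smoothed estimate such as Welch's method (`scipy.signal.welch`) and a more robust peak finder (`scipy.signal.find_peaks`) would likely be preferable to this bare periodogram.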

