Machine Learning with Highly Sensitive Data

Most machine learning models in academia are trained and evaluated on publicly available datasets. For companies working in the finance sector or in identity verification this isn’t always possible. Identity verification transactions are highly sensitive, because each transaction contains one or more snapshots of the client’s passport, national ID or other photo ID, along with a selfie. This constant stream of images provides a great opportunity to develop computer vision algorithms by training machine learning models in several domains like data extraction, biometrics or fraud detection. However, the opportunities provided by this data carry a cost in the need for accountability to both government and industry regulators.

Companies working with sensitive data have to comply with a number of regulations. PCI (Payment Card Industry) certification regulates environments where credit card data is processed. ISO 27001 determines best practices for managing sensitive data. Companies operating in the European region have to comply with GDPR (General Data Protection Regulation). So what are the consequences of being compliant and how is it possible to train models in such environments?

Data Access

Machine learning needs exist in tension with requirements coming from best practices of managing sensitive data. ML engineers and data scientists require maximum flexibility to carry out their work without compromising security and the privacy of customers. They need to access production images, to view them and to run experiments but in a strictly controlled manner. The environment they are working in has to enable running exploratory data analysis, quick prototyping and visually inspecting images while maintaining strong access control, audit logging and physical security standards.

Access controls. Only those people who need access to sensitive data can obtain it. As a rule of thumb, engineers working on machine learning infrastructure and tooling for ML have no access to data. Multi-factor authentication is mandatory for ML engineers/data scientists to access any of the production data/infrastructure.
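
As a minimal sketch of these access-control rules (hypothetical role names and helpers; in practice this would be delegated to an IAM system rather than application code):

```python
# Sketch of the access rules above: only data roles may touch production
# data, and only with a completed second factor. Role names are hypothetical.
from dataclasses import dataclass

# Roles allowed to touch production data at all; ML infrastructure/tooling
# engineers are deliberately absent from this set.
DATA_ROLES = {"ml_engineer", "data_scientist"}

@dataclass
class Session:
    user: str
    role: str
    mfa_verified: bool  # second factor completed for this session

def can_access_production_data(session: Session) -> bool:
    """Grant access only to data roles, and only when MFA has been completed."""
    if session.role not in DATA_ROLES:
        return False              # e.g. infra/tooling engineers: no data access
    return session.mfa_verified   # MFA mandatory for production data/infrastructure

# Infra engineers are refused even with MFA; data roles are refused without it.
assert not can_access_production_data(Session("alice", "infra_engineer", True))
assert not can_access_production_data(Session("bob", "data_scientist", False))
assert can_access_production_data(Session("carol", "ml_engineer", True))
```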

Audit logging. All access to images and metadata is logged, tracking who, when and from where has accessed a given file. Data must not leave the environment where access is controlled and logged (meaning no copies on laptops or other external storage media).
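
A minimal sketch of what such audit-logged access could look like, assuming images live on storage inside the controlled environment (the logger setup is illustrative; a real system would ship these records to an append-only audit store):

```python
# Every read of an image records who accessed it, when, and from where,
# before the bytes are returned from controlled storage.
import json
import logging
from datetime import datetime, timezone
from pathlib import Path

audit_log = logging.getLogger("audit")
logging.basicConfig(level=logging.INFO)

def read_image(path: str, user: str, source_ip: str) -> bytes:
    """Return image bytes from controlled storage, leaving an audit trace."""
    record = {
        "event": "image_read",
        "path": path,
        "user": user,
        "source_ip": source_ip,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.info(json.dumps(record))  # who, when, from where, which file
    return Path(path).read_bytes()      # data never leaves the environment
```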

Physical security. Computers connected to the network that permits direct access to images are in a special room with extra physical security. This network does not allow public internet access. Additional security measures include cameras and fingerprint readers.

Exploratory data analysis. ML engineers and data scientists require the ability to quickly infer basic statistical properties of the input data stream to understand data modality. This includes analysis of im…
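
A minimal sketch of this kind of exploratory analysis over an image stream, run inside the controlled environment (PIL and NumPy assumed available; the mount point is hypothetical):

```python
# Collect basic statistics of the incoming images: resolution and brightness.
from pathlib import Path
import numpy as np
from PIL import Image

widths, heights, brightness = [], [], []
for p in Path("/secure/production_images").glob("*.jpg"):  # hypothetical mount point
    with Image.open(p) as img:
        w, h = img.size
        widths.append(w)
        heights.append(h)
        brightness.append(np.asarray(img.convert("L")).mean())  # mean grayscale value

print(f"images: {len(widths)}")
print(f"median resolution: {np.median(widths):.0f} x {np.median(heights):.0f}")
print(f"mean brightness: {np.mean(brightness):.1f} (0-255 scale)")
```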
