Safety Considerations for Working with Data

Introduction

When working on data projects, scientists and analysts are often focused on so many different elements that security is easily forgotten. When that happens, it can backfire, big time. A security breach can have many undesirable consequences: loss of data, hijacked cloud resources, legal problems, clients leaving and much more.

This all sounds a bit dramatic, but luckily there are very simple steps you can take that already prevent a host of issues. Since I get a lot of questions about this, I have decided to bundle a few key points into a post.

Some Considerations

Always Use Two-Factor Authentication

Two-factor authentication (or 2FA) is a simple principle where your credentials are not limited to “something you know” (your password), but extended to “something you are” (biometric data) and/or “something you have” (phone, token, …).

2FA is usually very easy to set up and, given current technology (FaceID, smartwatches, …), it is rather hassle-free. Enabling 2FA greatly limits the risk of unauthorized access, since a stolen password is useless without the second factor. If you don’t already have 2FA in place on your most important accounts (email, cloud storage etc.), I would strongly advise you to enable it today.
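
To make the “something you have” factor a bit more concrete, here is a minimal sketch of how the six-digit codes in most authenticator apps are computed (TOTP, RFC 6238, built on HOTP from RFC 4226). The function names and the choice of HMAC-SHA1 with a 30-second step are assumptions for illustration; in a real project you would rely on your provider’s 2FA or a maintained library rather than code like this.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time window."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step))


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None) -> bool:
    """Check a submitted code, comparing in constant time."""
    return hmac.compare_digest(totp(secret, at).encode(), submitted.encode())
```

The shared secret lives on the phone or token, so a stolen password alone does not produce a valid code.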

Do Not Store Data On a Local Machine

I get it, storing data on a local machine is easy and fast. However, the owner of the data (client, business partner, …) will not be happy with you storing it locally. There is a saying in IT security: “Physical Access = Game Over.” Although reality is more nuanced, a stolen or lost laptop or drive can have serious consequences. For this reason, cloud storage with appropriate security measures is preferred.

Aside from major platforms like AWS and Azure, there are plenty of other options for storing your data in the cloud.

If you have no alternative but to store your data locally, you should always encrypt your drive in addition to locking your computer.

[Image] Physical access should be highly restricted. (Image source: pexels.com)

Use A Password Manager

I am often surprised by the number of people who are still sceptical of password managers, even though they are more often than not the safer choice. Password managers store thousands of complex passwords, autofill them for you and generate new ones whenever you need one.

Beyond my work as a data professional, using a password manager has also greatly improved my personal convenience. For example, nowadays you need a user account for almost any webshop. My password manager simply suggests a difficult and unique password for the site and fills it in every time I visit. If the website were hacked and my information leaked, the damage would be limited, since that password only gives access to that one site. Don’t forget to protect access to your password manager with 2FA.
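
As a rough illustration of what “a difficult and unique password” means, the sketch below generates one with Python’s `secrets` module, which is designed for security-sensitive randomness. The function name, the default length of 24 and the rule of requiring one character from each class are my own assumptions, not how any particular password manager works.

```python
import secrets
import string

# Candidate characters: letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password containing lower, upper, digit and punctuation."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate
```

Calling `generate_password()` once per site gives every account its own password, so a breach of one site does not expose the others.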

Apply IP Restrictions To Your DB or Notebooks

tHiS one sIMpLe Trick prEVENTS Data THEft, HAckeRS Hate HIm!

Jokes aside, when hosting a database such as MySQL, or Jupyter Notebooks, on a server, you can easily add IP restrictions. This means the service can only be accessed from certain locations, like your home, your office or a client site. It is not foolproof, but it is a very simple and highly effective way to prevent unwanted access.

Be careful, though: most non-professional internet plans have a dynamic IP address. Applying IP restrictions to a dynamic IP might result in locking yourself out.
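
In practice you would normally configure IP restrictions in a firewall or in the service itself, but the underlying check is simple. Here is a minimal sketch using Python’s `ipaddress` module; the networks listed are reserved documentation ranges used as placeholders, and `is_allowed` is a hypothetical helper name.

```python
import ipaddress

# Placeholder allowlist: an office range and a single home address (assumed values).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip parses and falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable input is rejected, never trusted
    return any(address in network for network in ALLOWED_NETWORKS)
```

Note the `/32` entry: it allows exactly one address, which is the kind of rule that locks you out when a dynamic IP changes.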

Keep Track

“To measure is to know”. Keeping track is an often disregarded aspect of data security, yet its importance cannot be discounted. By keeping track, I mean:

  • Who knows the password?
  • When were the passwords last changed?
  • Who has IP access?
  • Logging logins (including failed attempts)

Keeping track not only helps prevent a loss of data (for example, by revoking access from those who no longer need it), it may also help diagnose what went wrong in the event of data loss.
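
The last point in the list, logging logins and attempts, can be sketched with Python’s standard `logging` module. The logger name, the message format and the in-memory counter of consecutive failures are assumptions for illustration; a real setup would write to persistent, access-controlled storage.

```python
import logging
from collections import Counter

audit_log = logging.getLogger("audit")

# Consecutive failed attempts per (user, source IP); in-memory for the sketch only.
failed_attempts: Counter = Counter()


def record_login(user: str, source_ip: str, success: bool) -> None:
    """Log every login attempt and track consecutive failures."""
    key = (user, source_ip)
    if success:
        failed_attempts.pop(key, None)  # a successful login resets the streak
        audit_log.info("login ok user=%s ip=%s", user, source_ip)
    else:
        failed_attempts[key] += 1
        audit_log.warning(
            "login FAILED user=%s ip=%s consecutive_failures=%d",
            user, source_ip, failed_attempts[key],
        )
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup is enough to see these records on the console.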

Do Not Hardcode Passwords

Do not, I repeat, do not hardcode passwords. Not once, not briefly, not internally, never. The risk of hardcoding is limited if you have implemented the previous steps (i.e. 2FA, IP restrictions), but there is a big chance you haven’t, and then it will likely backfire. It often goes like this:

  1. Coder hardcodes database passwords because it’s easier. For now there is no risk, because it’s only for internal use and will be removed later.
  2. Coder forgets to remove it.
  3. The code ends up on a private GitHub repository or in some web script.
  4. A crawler picks it up.
  5. Chaos ensues…

I am lucky to have never experienced this myself, but I have heard plenty of stories. There is one good thing about hardcoding passwords, though: you will only do it once. Once a crawler picks it up and wreaks havoc on your database, cloud instance or web server, you are unlikely to repeat the mistake.
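
One common alternative to hardcoding is to read secrets from the environment at runtime. The sketch below shows the idea; the helper name `get_required_secret` and the variable name `DB_PASSWORD` are assumptions for illustration, and a dedicated secret store would work along the same lines.

```python
import os


def get_required_secret(name: str) -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Typical use (hypothetical variable name), instead of password = "hunter2":
# db_password = get_required_secret("DB_PASSWORD")
```

Because the password never appears in the source file, there is nothing for a crawler to pick up when the code ends up somewhere it shouldn’t.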

Understand The Risk

For the last part, I will be a bit less strict. When it comes to security, it’s all about understanding the risk. Ideally you would implement all these measures and many more, but that might not be worth the effort. You need to distinguish between a simple personal project and a script for a big client with confidential data. The measures should match the risk, and it is better to overestimate the risk than to underestimate it.

Lastly, you also need to understand the difference between data and insights. Most of the time (but not always!) raw data requires the highest degree of protection, whereas insights can be shared with fewer restrictions. I am emphasizing this difference because it is often difficult for non-technical clients to comply with your security standards. If the insights don’t carry the same risk as the data itself, you can judge for yourself whether you can simply email them or share them via other channels. Data often loses its confidentiality once rows are aggregated.
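
To show what aggregating rows can look like, here is a small sketch that turns individual records into per-group counts and averages, and drops groups that are too small to hide an individual. The field names and the minimum group size of 5 are assumptions; the right threshold depends on your data and is part of the “not always!” caveat above.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, group_key, value_key, min_group_size=5):
    """Summarise rows per group, suppressing groups below min_group_size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {"count": len(values), "mean": mean(values)}
        for group, values in groups.items()
        if len(values) >= min_group_size  # tiny groups could still identify a person
    }
```

The output contains no individual rows, which is usually what makes an insight safer to share than the data behind it.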

Conclusion

“Data is the new oil,” and as a data professional, you will often be responsible for storing and safekeeping a lot of this valuable oil. While carrying this responsibility might sound daunting, there are a lot of simple ways to reduce the risk. Implementing some or all of the steps described in this post will make it very difficult for unauthorized people to access your data. However, do keep in mind:

  • No measure is 100% guaranteed.
  • Your security is only as strong as its weakest point.

The list of considerations in this article is not exhaustive. There are plenty of other measures you could (and maybe should) implement, like encryption, protection against injection vulnerabilities, hashing and many more.
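
As a quick illustration of one of these, the sketch below shows the usual defence against SQL injection: passing user input as a query parameter instead of pasting it into the SQL string. It uses Python’s built-in `sqlite3` with an in-memory database; the table and function names are made up for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO students (name) VALUES (?)", ("Alice",))
    conn.commit()
    return conn


def find_student(conn: sqlite3.Connection, name: str) -> list:
    """Look up students by name; the ? placeholder keeps input out of the SQL text."""
    cursor = conn.execute("SELECT id, name FROM students WHERE name = ?", (name,))
    return cursor.fetchall()
```

With the placeholder, a malicious name is treated as a plain string to search for, not as SQL to execute.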

[Image] (Image source: https://xkcd.com/327/)

When you are dealing with data of truly great importance or confidentiality, I suggest you consult a certified IT auditor who can guide you through the steps you ought to implement.

About me: My name is Bruno and I work as a data scientist, currently specializing in Power BI. You can connect with me on GitHub or LinkedIn.

Translated from: https://towardsdatascience.com/safety-considerations-for-working-with-data-5442b23ce0af
