Data Operations for Cybersecurity Innovators: Five Best Practices to Use Today

Today’s enterprises face new attack surfaces such as deepfakes and ransomware, while still countering traditional threats like denial of service. It is widely recognized that countering these threats requires the ability to leverage vast amounts of diverse data and derive real-time intelligence from it. In addition, the cybersecurity industry increasingly recognizes the importance of “the human element”: technology alone isn’t a magic bullet for data-driven cybersecurity. This is particularly true for all the data operations required to build the latest cybersecurity innovations, whether in threat detection, endpoint security, network monitoring, or security analytics.

So what should you consider when scaling up your ability to leverage data for building and adopting cutting-edge cybersecurity technology? We share some best practices from our experience working with cybersecurity innovators and helping them stay ahead of the game. Read on to also learn about enabling human collaboration, as well as human-machine collaboration.

1. Leverage data from many sources

Advanced cybersecurity algorithms need to leverage data in many dimensions to enrich existing or in-house data. External data sources can provide a variety of data including IP addresses, whois records, user agents, DNS data, security advisories, threat lists, etc. Data providers include Pulsedive, Cymon, Virustotal, Fingerbank, UrlVoid, and hundreds more. Often there are multiple providers for similar types of data, each with varying coverage and quality. That makes it necessary to combine data from several sources. Once you have complete and accurate information, it can be used to enrich internal information such as event logs.

The Challenge: Each data provider has:

  • Its own feed delivery mechanism. Information could arrive in files, through APIs, or even by email.
  • A unique payload structure in how it names fields and organizes data.
  • Varying data quality. For example, if the input signal is an IP address, two providers may have different recency and quality of information on the same IP.

The Solution: Tools that allow you to easily connect and unify data from multiple sources, while making real-time decisions about which information to use from which provider.

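As an illustration, here is a minimal sketch of that decision logic, assuming two hypothetical providers that return already-normalized IP-reputation records; the provider names, fields, and timestamps are made up:

```python
from datetime import datetime

# Hypothetical, already-normalized lookup results from two providers.
# In practice each provider has its own delivery mechanism and payload.
provider_records = {
    "provider_a": {"ip": "203.0.113.7", "blacklisted": True,
                   "updated_at": "2020-06-01T10:00:00+00:00"},
    "provider_b": {"ip": "203.0.113.7", "blacklisted": False,
                   "updated_at": "2020-06-20T08:30:00+00:00"},
}

def merge_by_recency(records):
    """Prefer the provider whose record has the freshest timestamp."""
    return max(records.values(),
               key=lambda r: datetime.fromisoformat(r["updated_at"]))

def enrich_event(event, records):
    """Attach the merged reputation verdict to an internal event-log entry."""
    best = merge_by_recency(records)
    return {**event, "ip_blacklisted": best["blacklisted"]}

event = {"timestamp": "2020-06-21T12:00:00+00:00", "src_ip": "203.0.113.7"}
print(enrich_event(event, provider_records))  # uses provider_b's fresher verdict
```

Recency is only one possible rule; a production tool would also weigh provider quality and coverage, as discussed below.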

2. Collaborate! Empower analysts, scale with the human element

The Challenge: The promise of data operations is to deliver speed and reliability in data management. Achieving this is not easy, and leading cybersecurity technology companies dedicate some of their best engineers and product leaders to solving this mission-critical need. Unfortunately, this never-ending work can be overwhelming and even feel like grunt work to the best engineers.

The Solution: Empower and leverage the broader team. For example, threat analysts have all the know-how to:

  1. Expertly look at data from different providers
  2. Decide how to combine similar data from different sources
  3. Enrich data to create high-quality curated datasets specifically for your technology
  4. Set up validation rules and quality checks
  5. Translate data from the structure of an external provider to your internal canonical view (see the sketch after this list)
  6. Collaborate further: share datasets so that teams can leverage each other's work. Comments and descriptions are essential for enabling such collaboration.

What prevents threat analysts from doing this is a lack of tooling designed for business analysts. With the right tool, your threat analysts become collaborators and act as a force multiplier for your data engineering team.

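To make steps 4 and 5 concrete, here is a minimal, hypothetical sketch of a field mapping into a canonical view plus two validation rules; the provider field names are invented for illustration:

```python
import ipaddress

# Hypothetical mapping from one provider's field names to our canonical names.
FIELD_MAP = {"ipAddr": "ip", "threatScore": "risk_score", "lastSeen": "last_seen"}

def to_canonical(payload):
    """Rename provider-specific fields to the internal canonical view."""
    return {FIELD_MAP[k]: v for k, v in payload.items() if k in FIELD_MAP}

def validate(record):
    """Example validation rules an analyst might configure in a no-code tool."""
    errors = []
    try:
        ipaddress.ip_address(record.get("ip", ""))
    except ValueError:
        errors.append("ip is not a valid IP address")
    if not 0 <= record.get("risk_score", -1) <= 100:
        errors.append("risk_score out of expected range 0-100")
    return errors

raw = {"ipAddr": "198.51.100.9", "threatScore": 87, "lastSeen": "2020-06-20"}
record = to_canonical(raw)
print(record, validate(record))  # canonical record, empty error list
```

In a no-code tool, the mapping and the rules are configuration that an analyst edits rather than code, but the underlying operations are the same.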

3. Create a flexible approach to delivering data to your application

Every application has its optimal way of consuming data. A database may work best for your analytics tool, a file feed for your ML training sets, and data APIs for your applications. Sometimes the same data may be used in multiple applications, and therefore staying flexible is the key to success.

Let’s consider the case of an application that needs to get data via an API. This is classic service-oriented architecture, which allows applications to depend only on the API response and performance while keeping them independent of the system that delivers the API.

Challenge 1: Underlying data source is an API

  • Your application calls an API, which in turn calls the data provider's API. Let's say that data is for blacklisted IPs.
  • This data can change periodically, so you need real-time, fresh data.
  • Data has to be sourced from multiple API providers in order to ensure quality and coverage. Each provider may have its own authentication and payload structure.
  • While data needs to be fresh, you may not want to call a data provider's API too many times due to rate limits and usage-based costs. A little bit of intelligent caching can go a long way without negatively impacting your application.

The Solution: An API proxy acts as a transparent layer between your application making the API call and the API provider, giving you a unified interface to data from an API in the format you need, without having to write code. In any API proxy technology you should look for the following (a minimal caching sketch follows the checklist):

  • A self-service UI to set up any data provider API and translate its response payload.
  • No-code usability, so that threat analysts can act as collaborators.
  • Ease of adding new providers or removing old ones.
  • Management of authentication information, with notifications in case of issues.
  • Chaining of API calls if needed.
  • Handling and conversion across response formats, e.g. XML to JSON.
  • An API proxy mechanism to access data.
  • Caching of recently fetched data.
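A drastically simplified sketch of the caching behavior such a proxy provides is shown below; the provider URL and response fields are made up, and a real proxy would also handle authentication, rate limits, retries, and payload translation:

```python
import json
import time
import urllib.request

CACHE_TTL_SECONDS = 300   # serve cached answers for up to 5 minutes
_cache = {}               # ip -> (fetched_at, normalized_response)

# Hypothetical upstream endpoint; a real proxy would fan out to several.
PROVIDER_URL = "https://threat-intel.example.com/v1/ip/{ip}"

def lookup_ip(ip):
    """Return a normalized reputation record, caching to respect rate limits."""
    now = time.time()
    if ip in _cache and now - _cache[ip][0] < CACHE_TTL_SECONDS:
        return _cache[ip][1]          # fresh enough: skip the upstream call
    with urllib.request.urlopen(PROVIDER_URL.format(ip=ip)) as resp:
        raw = json.load(resp)
    # Translate the provider's payload into our canonical shape.
    normalized = {"ip": ip, "blacklisted": raw.get("is_blacklisted", False)}
    _cache[ip] = (now, normalized)
    return normalized
```

The application only ever sees the canonical shape, so swapping or adding providers never requires an application change.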

Challenge 2: Underlying data source is a database, file, email, or other static source

  • The API response needs to come from the underlying source data.
  • Performance management becomes even more critical.
  • Change management is also needed for data that is updated periodically via a file.
  • As a user, you need to design the API response format, such as JSON, to fit your needs, since the underlying data most likely has a flat structure.

The Solution: A few companies offer a unified solution for this; Nexla is one of them. An ideal solution should have the following (a toy sketch follows the list):

  • Automated generation of an API from any underlying data source
  • A no-code UI to design the API response
  • Change management to keep the data fresh, and caching to keep the API fast
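As a toy illustration of shaping flat source data into a nested API response (the columns and response shape are hypothetical):

```python
import csv
import io
import json

# Hypothetical flat source data, e.g. rows from a provider's CSV feed.
flat_csv = "ip,country,asn,blacklisted\n203.0.113.7,US,64500,true\n"

def row_to_response(row):
    """Nest the flat columns into the JSON shape the application expects."""
    return {
        "ip": row["ip"],
        "network": {"country": row["country"], "asn": int(row["asn"])},
        "reputation": {"blacklisted": row["blacklisted"] == "true"},
    }

rows = csv.DictReader(io.StringIO(flat_csv))
print(json.dumps([row_to_response(r) for r in rows], indent=2))
```

In a no-code tool, this nesting is designed in a UI rather than written by hand, and the generated API stays in sync as the file is refreshed.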

Additional Data Integrations: In addition to API access, applications need traditional ETL and ELT data integration mechanisms to support analytics, search, and machine learning use cases. We will cover this in greater detail in a follow-on post.

4. Monitor, manage, and stay on top of data quality

The Challenge: The quality of data providers can vary, especially when it comes to coverage across geographic regions or categories of data. It would be excellent if you could consistently rely on one provider for every dimension of a particular data set, but in reality a patchwork of providers will more likely be needed. Provider data quality can also vary over time, making it a best practice to monitor your data providers for quality changes, despite the added complexity.

The Solution: Modern data-driven tools will give you one consistent interface for merging data from multiple providers and will automatically monitor your data and its changing quality. In addition, they allow you to set your own rules and thresholds. These are all tasks that have traditionally needed engineering. Look for design-oriented tools where the process can be code-free, with instantaneous updates. The best-of-breed tools will let you extend the no-code functionality with plug-in code modules, once again enabling collaboration and reusability between engineers and the rest of the organization.

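For example, one such analyst-defined rule might be a minimum coverage threshold on an enrichment field; the records and the 90% threshold below are purely illustrative:

```python
# Hypothetical batch of enriched records from one provider.
records = [
    {"ip": "203.0.113.7", "country": "US"},
    {"ip": "203.0.113.8", "country": None},   # missing enrichment
    {"ip": "203.0.113.9", "country": "DE"},
]

MIN_COVERAGE = 0.90   # analyst-defined rule: >=90% of records must have a country

def field_coverage(batch, field):
    """Fraction of records where `field` is present and non-null."""
    filled = sum(1 for r in batch if r.get(field) is not None)
    return filled / len(batch) if batch else 0.0

coverage = field_coverage(records, "country")
if coverage < MIN_COVERAGE:
    print(f"ALERT: country coverage {coverage:.0%} is below {MIN_COVERAGE:.0%}")
```

Running the same rule on every delivery turns a one-off spot check into continuous monitoring, so threshold breaches can alert the provider owner instead of silently degrading downstream models.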

5. Don't get tied down to a specific data execution engine

Companies innovating cutting-edge cybersecurity technology need to leverage a wide variety of data of varying size, scale, and quality. That is why, when it comes to the underlying execution engine, they may need:

  • A real-time system for API data
  • A streaming system such as Kafka for events and logs
  • A batch system such as Spark or MapReduce for large-scale data

The Challenge: Determining the right execution engine is often not easy. Having access to multiple such execution engines is even harder, as it requires dedicated infrastructure and operations expertise. This is in addition to the customized engineering needed to plug data into each particular execution engine.

The Solution: Data teams need the ability to use the execution engine that is best suited for a given use case. Systems like Nexla encapsulate multiple execution engines. This approach has three benefits (a sketch of the routing idea follows the list).

  1. The entire execution process is taken care of automatically; the user doesn't need to deploy, manage, or monitor anything.
  2. The user doesn't have to worry about which execution engine to use, because the system can make that decision.
  3. Most importantly, the user's workflow and interface don't change because they are working with more data, a different format, or streaming instead of batch.
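One way to picture that encapsulation is a dispatcher that picks an engine from the characteristics of the data source, while the user-facing call never changes; the routing rules and engine names here are hypothetical:

```python
def choose_engine(source):
    """Pick an execution engine from source characteristics (illustrative rules)."""
    if source["kind"] == "api":
        return "realtime"     # low-latency request/response path
    if source["kind"] == "stream":
        return "streaming"    # e.g. a Kafka-based engine for events and logs
    return "batch"            # e.g. a Spark/MapReduce engine for large files

def run_pipeline(source, transform):
    """Same user-facing call regardless of which engine executes the work."""
    engine = choose_engine(source)
    print(f"dispatching pipeline to the {engine} engine")
    # ...engine-specific deployment, scaling, and monitoring happen here...

run_pipeline({"kind": "stream", "topic": "auth-logs"}, transform=lambda r: r)
```

Because the decision lives behind one interface, moving a workload from batch to streaming becomes a configuration change, not a rewrite.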

Managed data execution with full flexibility again empowers non-engineers. They can make decisions about what to do with data without needing expertise in the underlying data systems.

Summary: By adopting a well-thought-out strategy for managing data and its ongoing operations, companies innovating cybersecurity technology can accelerate their pace of innovation and stay nimble for future change. Agile data integration, automated operations, adaptive data access mechanisms, and collaborative ease of use as a team are all key elements of any solution.

The Nexla Workflow for Cybersecurity Innovators

At Nexla, we have built a platform that helps teams at the forefront of leveraging data for intelligent cybersecurity technology.

Integrate with any cybersecurity data source.

  • Analysts configure; Nexla handles all complexities and limitations, including authentication, rate limits, etc., to deliver the latest data.
  • Pre-built configurations provided by Nexla for popular data sources.

Self-serve tools for analysts. No data engineering necessary.

  • Analysts set up no-code transformations
  • Enrich data with internal or third-party information
  • Modify and normalize the data schema/model
  • Set validation rules and error conditions

Share, reuse, and collaborate on datasets, transformations, and error management. Create shared knowledge through comments, notes, and annotations.

Data integration flows including ETL, ELT, API Proxy, and Push API

Fully managed and monitored data streams and pipelines

Interested in learning more? Get an introductory demo of Nexla.

Source: https://medium.com/nexla/data-operations-for-cybersecurity-innovators-five-best-practices-to-use-today-57d910217571
