Critical Transcendence: .NET SDK and Apache Spark

cullen2012

Article tags: big data, programming languages, python, artificial intelligence, java

Original link: https://habr.com/en/post/508964/

When Alex Garland’s series Devs (on FX and Hulu) came out this year, it gave developers their own sexy Hollywood workup. Who knew that coders could get snarled in murder plots and love triangles just for designing machine learning programs? Or that their software would cause a philosophical crisis? Sure, the average developer’s day involves more code writing than murder, but what a thrill it is to author a powerful new program.

Machine learning, big data and AI advancements seem like a giant leap forward for both technology and human experience. In 2017, CEOs of major companies told MIT’s Sloan Management Review that AI, machine learning and big data would be the biggest disruptions (in a good way!) of the future.

Already the big three are revolutionizing industries. For example, MetLife uses machine learning to improve speech recognition so doctors can file their patient notes in real time. Medical offices can now transfer information faster to improve decision-making and care. B2C corporations use it to analyze audience engagement and to target marketing so they spend less time and money on intermittent customers. B2B corporations want to analyze the massive data they collect, so they hire developers to create programs that anticipate their clients’ needs before anyone sends an order. Imagine how that might have played out in the COVID-19 crisis if manufacturers had seen Google searches or subtle demand spikes for certain products. What if software had helped them "identify new local suppliers" so they could pivot production within hours instead of weeks?

So it may not sound sexy to say that every development towards openness and transcendence in SDKs is transformative, but it is. It’s why we should celebrate Microsoft’s development vision to ramp up their Azure SQL partnerships and then to integrate Apache Spark into their .NET offerings.

A Short History of Lakes, Factories, and Analytics

Late in 2019, Microsoft’s Azure SQL Data Warehouse got a snappy new name, Azure Synapse Analytics. Synapse integrated Azure Data Lake Storage, Azure Data Factory and the popular Apache Spark. Spark, which began in 2009, is the premier big data framework. It distributes the work of crunching enormous data sets across many computers through an API that eases the workload of developers. Developers love Spark because it provides native bindings for Java, Scala, Python and R. What was missing was a .NET SDK, and with it Microsoft’s participation in the world of big data processing. That is, until recently.

In November 2019, Microsoft released SQL Server 2019, which runs on Linux, a platform that open-source developers love and show no signs of abandoning. Working with the open-source community is always a step toward computing transcendence, but also something of a gamble. It offers growth and feedback from developers but also reduces ownership (and thus may affect profit). Yet when companies like Microsoft choose to transcend, everyone benefits. This time, the improvements in SQL Server 2019, which dovetailed with Azure Synapse Analytics, laid the foundation for opening up to .NET frameworks. For the time being, it supports .NET Core 3.1, but when .NET 5 is released later this year, Microsoft’s capacities will expand further. .NET 5 will be a unified platform with new technology enhancements.

Microsoft Moves In with Apache Spark

The 2019 integration of Azure SQL Data Warehouse (2015-2018) with other services, including data warehouse, data lake, machine learning, and data pipelines, allows these data building blocks to be bound together. Here’s how it works: Spark tables are queryable without any code that creates an external table. This works as soon as a Synapse cluster is provisioned. Azure Data Lake Storage (ADLS) now stores Spark SQL tables, and they can be queried alongside native ADLS tables. The engines powering these queries integrate with Apache Parquet as well. Furthermore, Azure Synapse accommodates development and execution in languages other than C#, such as Python, Scala and native Spark SQL. The integration improves Synapse’s ability to manage machine learning (it works with Spark MLlib) and makes Synapse’s studio competitive with AWS (Amazon Web Services).

Apache Spark and .NET

What the world needs is for every major coding language to marry Apache Spark to its own popular frameworks. Why? Because Spark eclipses other software for big data crunching and machine learning. Apache Spark has a reputation for speed compared to other software. It offers in-memory processing. It supports SQL along with real-time data and graph processing. If organizations need machine learning, Apache Spark enables it. It’s hard to name an industry that doesn’t employ Apache Spark: think financial institutions, gaming, telecoms, tech giants, and government agencies. Which brings us to the .NET news: Microsoft announced .NET for Apache Spark, with bindings for the C# and F# languages.

Considering that twenty years’ worth of .NET code could be unified with big data through this move, the walls around Microsoft’s once siloed systems are crumbling. This is the opposite of an empire crumbling. Rather, it’s a case study in how to build longevity and power so that one of the leading empires of software can remain powerful in the fast-changing software geography. What does this mean for .NET-based software systems? First, big data analysis, with its power to stream data and enhance machine learning, cannot be ignored. We live in a data-driven, data-science culture. Data science improves every enterprise. The integration of Apache Spark with .NET makes it pop. ZDNet reports that it “seems to be more than just a bundling of the open-source big data analytics framework.” It’s a “true” integration.

2020 Developments | Microsoft

In spring 2020, Microsoft added support for in-memory .NET DataFrames to Spark.NET. In-memory functions allow for faster management, retrieval, and analysis of big data sets. Spark.NET boasts new convenience APIs specifically for two kinds of user-defined functions (UDFs): vector and scalar. Spark works with the Apache Arrow format, a standardized, language-independent format for working with data in memory. The two new APIs should speed up serialization and make data transfers more efficient. Because of these APIs, Spark.NET eliminates the overhead of converting data into and out of other formats for processing. The vector and scalar APIs can also reduce the lines of code .NET developers have to write.
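
To make the scalar-versus-vector distinction concrete, here is a minimal sketch in plain Python. It does not use Spark or the Spark.NET API. The helper names (`apply_scalar_udf`, `apply_vector_udf`) and the temperature example are hypothetical, invented purely for illustration. The assumption is only that a scalar UDF is invoked once per row, while a vector UDF is invoked once per batch of rows, which is the idea behind passing columnar, Arrow-style batches.

```python
from typing import Callable, Dict, List

# Hypothetical helpers for illustration only; not Spark or Spark.NET APIs.
calls: Dict[str, int] = {"scalar": 0, "vector": 0}


def celsius_to_fahrenheit(value: float) -> float:
    """Scalar UDF: receives one value at a time."""
    calls["scalar"] += 1
    return value * 9 / 5 + 32


def celsius_to_fahrenheit_batch(batch: List[float]) -> List[float]:
    """Vector UDF: receives a whole batch of column values at once."""
    calls["vector"] += 1
    return [value * 9 / 5 + 32 for value in batch]


def apply_scalar_udf(udf: Callable[[float], float],
                     column: List[float]) -> List[float]:
    """Invoke the UDF once per row."""
    return [udf(value) for value in column]


def apply_vector_udf(udf: Callable[[List[float]], List[float]],
                     column: List[float],
                     batch_size: int) -> List[float]:
    """Invoke the UDF once per batch of rows."""
    result: List[float] = []
    for start in range(0, len(column), batch_size):
        result.extend(udf(column[start:start + batch_size]))
    return result


temps = [0.0, 25.0, 100.0, -40.0]
scalar_result = apply_scalar_udf(celsius_to_fahrenheit, temps)
vector_result = apply_vector_udf(celsius_to_fahrenheit_batch, temps, batch_size=2)

print(scalar_result, calls["scalar"])
print(vector_result, calls["vector"])
```

Both paths produce the same values, but the batch version crosses the function boundary fewer times. In a real engine, each crossing can involve serialization between runtimes, which is why batching matters there.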

In Microsoft’s blog, Brigit Murtaugh provides several examples of how the new APIs make for cleaner code and more efficient programs. But that’s not all that Microsoft has done to make Spark.NET accessible to coders. Andrew Brust, developer and writer for ZDNet, gives a solid rundown of the ways Microsoft makes it easy for developers to fire up Spark.NET. First, Microsoft provides robust onboarding guidance that takes developers from installing the framework to creating and running a sample application. It walks through the required dependencies, the configuration steps for the framework, and the installation of Spark.NET, then the creation and execution of the Spark sample application. This is a ten-minute process. Developers who prefer to work in Visual Studio can access Spark.NET as well.

What’s not to love? No one was murdered in the making of this union. I’m sure there’s healthy jealousy about which language and framework is best, but I cannot prove any love triangles have estranged actual humans. While .NET’s integration with Apache Spark may not solve the philosophical conundrum of determinism, it does advance functions and capacities that transform a multitude of industries. Thousands of existing .NET programs can now leverage the efficiency and power of big data to make transcendental changes to their industries.

Translated from: https://habr.com/en/post/508964/
