JavaScript for Data Engineers

DATA ENGINEERING

The latest Stack Overflow Developer Survey ranked JavaScript as the most popular technology, with SQL close behind in third place. Until a few years ago, JavaScript was considered a client-side, front-end scripting language; that changed when JavaScript-based servers such as Node.js gained widespread attention. Since then, JavaScript projects have been started in almost every major area under the umbrella of software development. Data engineering is one such field, where JavaScript is being used more than ever.

There are already a lot of visualisation libraries written in JavaScript, such as D3.js, C3.js, Chart.js and so on. Much has been written about them, but far less about hardcore data engineering tools for handling databases, data cleansing, ETL, data pipeline orchestration and so on. Let's take a look at some of the most popular and useful actively maintained JavaScript projects that data engineers could learn and use in their current work.

Knex.js

It is a query builder for PostgreSQL, MySQL, MariaDB, MSSQL, Oracle and Amazon Redshift. With over 300 contributors and about 150 releases to date, it is by far the most popular query builder available today. Fun fact: the author of Slonik wrote a piece about why using Knex.js is bad for dynamic query building, and it got a lot of flak. He makes some good general points in it, but NOT enough to convince me that Knex.js is bad!

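To make the query-builder idea concrete, here is a toy builder written from scratch in TypeScript. The MiniQuery class and its methods are hypothetical and are not the Knex.js API; the sketch only shows the general technique of composing SQL step by step while keeping values out of the SQL string as separate bindings.

```typescript
// Toy query builder (hypothetical, NOT the Knex.js API) showing the core idea:
// compose SQL step by step and keep values out of the SQL text as bindings.
type Value = string | number | boolean | null;

interface CompiledQuery {
  sql: string;       // SQL text with "?" placeholders
  bindings: Value[]; // values passed to the driver separately
}

type Operator = "=" | "<>" | "<" | ">" | "<=" | ">=";

class MiniQuery {
  private readonly table: string;
  private columns: string[] = ["*"];
  private conditions: string[] = [];
  private bindings: Value[] = [];
  private limitCount: number | undefined = undefined;

  // Table and column names are treated as trusted identifiers; only values are bound.
  constructor(table: string) {
    this.table = table;
  }

  select(...columns: string[]): this {
    if (columns.length > 0) this.columns = columns;
    return this;
  }

  // Each call adds an AND-ed condition; the value becomes a "?" placeholder.
  where(column: string, op: Operator, value: Value): this {
    this.conditions.push(`${column} ${op} ?`);
    this.bindings.push(value);
    return this;
  }

  limit(n: number): this {
    if (!Number.isInteger(n) || n < 0) throw new Error("limit must be a non-negative integer");
    this.limitCount = n;
    return this;
  }

  toSQL(): CompiledQuery {
    let sql = `select ${this.columns.join(", ")} from ${this.table}`;
    if (this.conditions.length > 0) sql += ` where ${this.conditions.join(" and ")}`;
    if (this.limitCount !== undefined) sql += ` limit ${this.limitCount}`;
    return { sql, bindings: [...this.bindings] };
  }
}

const compiled = new MiniQuery("orders")
  .select("id", "total")
  .where("status", "=", "shipped")
  .where("total", ">", 100)
  .limit(10)
  .toSQL();

console.log(compiled.sql);      // select id, total from orders where status = ? and total > ? limit 10
console.log(compiled.bindings); // [ 'shipped', 100 ]
```

Because values travel separately in bindings, a database driver can send them as bound parameters instead of splicing them into the SQL text, which keeps dynamically built queries safe against injection through values.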
AlaSQL

A JavaScript-based SQL database that works in browsers, mobile apps and Node.js applications. It's really good at handling CSV and Excel files. At the time of writing, the project has 5.2K stars on GitHub and 51K downloads per month from npm. It is well maintained; the last code push was a week ago.

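As a rough illustration of what a SQL aggregation over CSV data computes, the dependency-free TypeScript sketch below does the equivalent of SELECT region, SUM(amount) ... GROUP BY region by hand. It is not AlaSQL code; parseCsv and groupSum are hypothetical helpers, and the naive parser assumes a header row and no quoted fields or embedded commas.

```typescript
// Sketch of a SQL-style GROUP BY over CSV text, written by hand (not AlaSQL code).
type CsvRow = Record<string, string>;

// Parse CSV text whose first line is a header row into row objects.
// Assumption: no quoted fields, no embedded commas or newlines.
function parseCsv(text: string): CsvRow[] {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: CsvRow = {};
    headers.forEach((header, i) => {
      row[header] = (cells[i] ?? "").trim();
    });
    return row;
  });
}

// Equivalent of: SELECT <key>, SUM(<valueColumn>) FROM rows GROUP BY <key>
function groupSum(rows: CsvRow[], key: string, valueColumn: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const group = row[key];
    totals.set(group, (totals.get(group) ?? 0) + Number(row[valueColumn]));
  }
  return totals;
}

const csv = `region,amount
north,120
south,80
north,30`;

const salesByRegion = groupSum(parseCsv(csv), "region", "amount");
console.log(salesByRegion); // Map(2) { 'north' => 150, 'south' => 80 }
```

With AlaSQL, the same aggregation would be written as a SQL query rather than as hand-rolled loops, which is much of the library's appeal for CSV-heavy work.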
KoopJS

There are two projects worth looking at from the Koop project. The first one is, not surprisingly, called Koop: an ETL utility for geospatial data. The project is well maintained and is sponsored by ESRI, the company to talk about when it comes to location intelligence.

Calling it a complete ETL tool would be a mistake, though. It is meant only for geospatial data, because the transforms it supports are geospatial ones, for example transforming geospatial data on the fly into GeoJSON and Vector Tiles.

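The kind of on-the-fly transform described above can be sketched as converting flat rows with latitude/longitude columns into a GeoJSON FeatureCollection. This is a minimal TypeScript sketch, not Koop's implementation: the SiteRow input shape is an assumption, only Point geometries are handled, and the types follow the GeoJSON specification (RFC 7946), which puts longitude before latitude.

```typescript
// Minimal GeoJSON types (RFC 7946) covering point features only.
interface PointGeometry {
  type: "Point";
  coordinates: [number, number]; // GeoJSON order is [longitude, latitude]
}

interface PointFeature {
  type: "Feature";
  geometry: PointGeometry;
  properties: Record<string, unknown>;
}

interface FeatureCollection {
  type: "FeatureCollection";
  features: PointFeature[];
}

// Hypothetical input: flat rows from a table or CSV with lat/lon columns.
interface SiteRow {
  lat: number;
  lon: number;
  [column: string]: unknown;
}

// Transform flat rows into a GeoJSON FeatureCollection; every column other
// than lat/lon is carried over unchanged as a feature property.
function toGeoJson(rows: SiteRow[]): FeatureCollection {
  const features = rows.map(({ lat, lon, ...rest }): PointFeature => {
    if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
      throw new Error(`coordinates out of range: ${lat}, ${lon}`);
    }
    return {
      type: "Feature",
      geometry: { type: "Point", coordinates: [lon, lat] },
      properties: rest,
    };
  });
  return { type: "FeatureCollection", features };
}

const collection = toGeoJson([
  { id: "a1", name: "London depot", lat: 51.5, lon: -0.12 },
]);
console.log(JSON.stringify(collection.features[0].geometry));
// {"type":"Point","coordinates":[-0.12,51.5]}
```

Swapping the coordinate order is the most common mistake in this kind of transform, which is why the sketch names the fields explicitly instead of passing arrays through.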
NoFlo

A component-based programming environment that follows the principles of Flow-Based Programming, where the logic of a program is defined as a graph. Many people will relate this to Airflow or similar orchestrators. NoFlo can indeed be programmed to work somewhat like an orchestrator, but its scope is broader than that of the popular orchestrators. You can use noflo-nodejs to execute NoFlo code on Node.js.

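To show what "the logic of a program is defined in a graph" means in practice, here is a toy flow-based runtime in TypeScript. FlowGraph, addNode, connect and run are hypothetical names, not NoFlo's API, and the sketch deliberately omits named ports, asynchronous components and cycle handling.

```typescript
// A toy flow-based programming runtime (hypothetical, far simpler than NoFlo):
// the program is a graph of named components, and edges carry packets between them.
type Send = (packet: unknown) => void;
type ComponentFn = (packet: unknown, send: Send) => void;

class FlowGraph {
  private components = new Map<string, ComponentFn>();
  private edges = new Map<string, string[]>(); // node id -> downstream node ids

  addNode(id: string, fn: ComponentFn): this {
    this.components.set(id, fn);
    this.edges.set(id, []);
    return this;
  }

  connect(from: string, to: string): this {
    const targets = this.edges.get(from);
    if (targets === undefined || !this.components.has(to)) {
      throw new Error(`unknown node in edge ${from} -> ${to}`);
    }
    targets.push(to);
    return this;
  }

  // Inject one packet at `start`, then deliver packets in FIFO order until none are left.
  // No cycle detection: a cyclic graph that keeps sending would never terminate.
  run(start: string, packet: unknown): void {
    const queue: Array<[string, unknown]> = [[start, packet]];
    while (queue.length > 0) {
      const [nodeId, data] = queue.shift()!;
      const fn = this.components.get(nodeId);
      if (fn === undefined) throw new Error(`unknown node ${nodeId}`);
      // Whatever a component sends is queued for every node connected downstream.
      const send: Send = (out) => {
        for (const target of this.edges.get(nodeId) ?? []) queue.push([target, out]);
      };
      fn(data, send);
    }
  }
}

const results: number[] = [];

const pipeline = new FlowGraph()
  .addNode("split", (p, send) => (p as number[]).forEach((n) => send(n)))
  .addNode("double", (p, send) => send((p as number) * 2))
  .addNode("collect", (p) => {
    results.push(p as number);
  })
  .connect("split", "double")
  .connect("double", "collect");

pipeline.run("split", [1, 2, 3]);
console.log(results); // [ 2, 4, 6 ]
```

Each component only knows how to process one packet and send results onward; the wiring lives entirely in the graph, so the same components can be rearranged without touching their code.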
Empujar

This is TaskRabbit's contribution to the open-source data engineering community. Empujar is an ETL tool for moving data around, including creating and storing backups. Currently, Empujar supports MySQL, Amazon Redshift, Elasticsearch and S3, and custom connectors can easily be created for other databases or data sources. Although the tool has been around for a while, it is worth mentioning that the latest PR was not merged because the build was failing.

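The custom-connector idea can be sketched as a pair of small interfaces that every source and destination implements, so supporting a new system only means writing a new class. The Source and Destination interfaces, the in-memory classes and runEtl below are hypothetical and are not Empujar's actual API; real connectors would talk to systems such as MySQL, Redshift, Elasticsearch or S3 and would be asynchronous.

```typescript
// Hypothetical connector pattern (NOT Empujar's actual API): sources and
// destinations share small interfaces, so a new system only needs a new class.
type Row = Record<string, unknown>;

interface Source {
  extract(): Row[];
}

interface Destination {
  load(rows: Row[]): number; // returns the number of rows written
}

// In-memory stand-ins; real connectors would wrap databases or object stores
// and would be asynchronous. Kept synchronous here for brevity.
class ArraySource implements Source {
  private readonly rows: Row[];
  constructor(rows: Row[]) {
    this.rows = rows;
  }
  extract(): Row[] {
    // Shallow copies protect the source rows' top-level fields from transforms.
    return this.rows.map((r) => ({ ...r }));
  }
}

class ArrayDestination implements Destination {
  readonly stored: Row[] = [];
  load(rows: Row[]): number {
    this.stored.push(...rows);
    return rows.length;
  }
}

// Extract -> transform -> load; a transform returning null drops the row.
function runEtl(src: Source, transform: (row: Row) => Row | null, dest: Destination): number {
  const kept = src
    .extract()
    .map(transform)
    .filter((row): row is Row => row !== null);
  return dest.load(kept);
}

const sink = new ArrayDestination();
const written = runEtl(
  new ArraySource([
    { id: 1, email: "A@EXAMPLE.COM" },
    { id: 2, email: "" },
  ]),
  (row) => (row.email ? { ...row, email: String(row.email).toLowerCase() } : null),
  sink,
);
console.log(written, sink.stored); // 1 [ { id: 1, email: 'a@example.com' } ]
```

Because runEtl depends only on the two interfaces, adding a new database or data source does not change the pipeline code itself.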
Honorary Mentions

  • GruntJS — This is a simple task runner that helps you automate your grunt work like making sure the code is formatted, linted, minified and so on.

  • BookshelfJS — An ORM built on Knex.js with transaction support, eager relational loading and support for 1:1, 1:n and n:n relations.

  • ObjectionJS — An ORM for Node.js based on Knex.js which fully supports MySQL, MariaDB and PostgreSQL.

  • Slonik — A Node.js based SQL client for PostgreSQL that promotes writing raw SQL and discourages ad-hoc dynamic generation of SQL.

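Slonik's approach of writing raw SQL while still binding values safely is built around JavaScript tagged template literals. The sketch below illustrates that technique with a hypothetical, much-simplified sql tag; it is not Slonik's implementation, and it targets PostgreSQL's $1, $2 placeholder style.

```typescript
// Hypothetical, much-simplified `sql` tag illustrating the technique Slonik
// promotes: write raw SQL, and let every interpolated value become a bound parameter.
interface SqlQuery {
  text: string;      // SQL with PostgreSQL-style $1, $2, ... placeholders
  values: unknown[]; // parameters sent to the server separately from the text
}

function sql(strings: TemplateStringsArray, ...values: unknown[]): SqlQuery {
  // A tagged template always has exactly values.length + 1 literal chunks.
  let text = strings[0];
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1]}`;
  });
  return { text, values };
}

const minTotal = 100;
const untrusted = "shipped'; DROP TABLE orders; --";
const query = sql`SELECT id, total FROM orders WHERE status = ${untrusted} AND total > ${minTotal}`;

console.log(query.text); // SELECT id, total FROM orders WHERE status = $1 AND total > $2
// query.values holds the untrusted string and 100 unchanged; neither is spliced into the SQL text.
```

The injection attempt stays inert because it only ever travels as a parameter value. What Slonik discourages is generating the SQL text itself ad hoc, which is exactly the part this style keeps static.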
There are many other projects on GitHub, but I left most of them out because they are not up to date and haven't seen a code check-in in a long time.

To conclude, there are plenty of well-maintained, open-source JavaScript repositories to help with day-to-day data engineering work: generating SQL, interacting with databases, moving data from one place to another, integrating data and visualising it too. With JavaScript being one of the most promising languages of the future, learning it is a worthwhile investment, as it will be even more widely used further down the line.

JavaScript In Plain English

Did you know that we have three publications and a YouTube channel? Find links to everything at plainenglish.io!

Translated from: https://medium.com/javascript-in-plain-english/javascript-for-data-engineers-ccce214e9aff
