ETL定义

ETL是数据仓库建设的核心过程,包括数据抽取、转换和加载。它将来自不同源头的原始数据整合到单一存储库中,进行预处理以便分析。抽取阶段从多个源获取数据,如CRM或ERP系统、IoT设备等;转换阶段则对数据进行标准化、清洗和格式调整;加载阶段将处理后的数据安全地存储并供其他用户和部门使用,确保企业拥有统一、一致的数据来源。
摘要由CSDN通过智能技术生成

ETL的定义,转自snowflake

ETL stands for “extract, transform, load,” the three processes that, in combination, move data from one database, multiple databases, or other sources to a unified repository—typically a data warehouse. It enables data analysis to provide actionable business information, effectively preparing data for analysis and business intelligence processes.

As data engineers are experts at making data ready for consumption by working with multiple systems and tools, data engineering encompasses ETL. Data engineering involves ingesting, transforming, delivering, and sharing data for analysis. These fundamental tasks are completed via data pipelines that automate the process in a repeatable way. A data pipeline is a set of data-processing elements that move data from source to destination, and often from one format (raw) to another (analytics-ready).

PURPOSE
ETL allows businesses to consolidate data from multiple databases and other sources into a single repository with data that has been properly formatted and qualified in preparation for analysis. This unified data repository allows for simplified access for analysis and additional processing. It also provides a single source of truth, ensuring that all enterprise data is consistent and up-to-date.

PROCESSES
There are three unique processes in extract, transform, load. These are:

Extraction, in which raw data is pulled from a source or multiple sources. Data could come from transactional applications, such as customer relationship management (CRM) data from Salesforce or enterprise resource planning (ERP) data from SAP, or Internet of Things (IoT) sensors that gather readings from a production line or factory floor operation, for example. To create a data warehouse, extraction typically involves combining data from these various sources into a single data set and then validating the data with invalid data flagged or removed. Extracted data may be several formats, such as relational databases, XML, JSON, and others.

Transformation, in which data is updated to match the needs of an organization and the requirements of its data storage solution. Transformation can involve standardizing (converting all data types to the same format), cleansing (resolving inconsistencies and inaccuracies), mapping (combining data elements from two or more data models), augmenting (pulling in data from other sources), and others. During this process, rules and functions are applied, and data cleansed to prevent including bad or non-matching data to the destination repository. Rules that could be applied include loading only specific columns, deduplicating, and merging, among others.

Loading, in which data is delivered and secured for sharing, making business-ready data available to other users and departments, both within the organization and externally. This process may include overwriting the destination’s existing data.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值