What Is the Basic Architecture of a Data Warehouse?

A Data Warehouse is a component where your data is centralized, organized, and structured according to your organization's needs. It is used for data analysis and BI processes.

Data warehouses are not a new concept. The idea was developed in the late 1980s, but it has evolved over time.

The aim of this post is to explain the main concepts related to Data Warehouses and their use cases. Also, we’ll talk about Data Lakes and how these two components work together.

TL;DR: This post covers basic information about data lakes and data warehouses. If you are already familiar with these topics and their basic architecture, this post may not be for you. If not, please go ahead and enjoy the read.

Why do you need a Data Warehouse?

In the beginning, there was chaos. At least that was my impression when I arrived at an organization that was doing data analysis using old spreadsheets and a bunch of CSV files. No one knew where the files came from. They were just…there.

Inconsistent metrics, unreproducible processes, and a lot of manual copy/paste work were common at that time.

No one even knew the real value of the metrics they were tracking. For example, for a metric like Monthly Active Users (MAU), the answer would always depend on who you asked.
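
To make the MAU problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the event log, the user IDs, and the two definitions of "active" are hypothetical and do not come from the organization described above. It only shows how two reasonable-sounding definitions can yield different numbers from the same data.

```python
from datetime import date, timedelta

# Hypothetical event log: (user_id, event_date, event_type).
EVENTS = [
    ("u1", date(2024, 3, 1), "login"),
    ("u2", date(2024, 3, 15), "page_view"),
    ("u3", date(2024, 3, 18), "page_view"),
    ("u4", date(2024, 2, 25), "login"),
]


def mau_calendar_month(events, year, month):
    """Definition A (assumed): users with any event in the calendar month."""
    return len({user for user, day, _ in events
                if day.year == year and day.month == month})


def mau_rolling_logins(events, as_of, window_days=30):
    """Definition B (assumed): users who logged in during the
    window_days days ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    return len({user for user, day, kind in events
                if kind == "login" and start <= day <= as_of})


print(mau_calendar_month(EVENTS, 2024, 3))            # 3
print(mau_rolling_logins(EVENTS, date(2024, 3, 20)))  # 2
```

Both functions read the same four events, yet one analyst would report 3 active users and another would report 2. Agreeing on one definition and computing it in one place is what removes that ambiguity.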

If you are still with me and this rings a bell, you probably know how important it is to have a single source of truth, mainly because you don’t want a lot of business users making decisions based on inconsistent metrics.

Also, you don’t want your data engineers and analysts doing a bunch of manual work that can be automated. They can certainly do more interesting things than copying and pasting spreadsheets.

If this is a problem your organization faces on a daily basis, you may need a Data Warehouse.

So, let me now define what a Data Warehouse is…

A Data Warehouse is a component where your data is centralized, or
