Did you know that we generate enormous amounts of data every day, and that an estimated 40 trillion GB (roughly 40 zettabytes) of data has already been stored? Not all of this data looks the same: text and numbers, for example, are stored in different formats. That is one reason we have so many different types of data sources.
When you work with data, you need to know how to ingest it from different sources. In this article, we ingest data from various sources with the help of Python libraries.
We will cover the following data sources:
1. Relational databases (RDBMS)
2. XML file format
3. CSV file format
4. Apache Parquet file format
5. Microsoft Excel
Is there a single Python library that fetches data from all of these sources?
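Not really. Each data source has its own format or protocol for storing and transferring data, so the work is spread across several Python libraries. Even pandas, which offers reader functions for each of these sources (read_sql, read_xml, read_csv, read_parquet, read_excel), delegates to different underlying engines such as a database driver, an XML parser, pyarrow or openpyxl. Think of this article as a one-stop guide to these libraries.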
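To make the "one library per source" point concrete, here is a minimal sketch that reads the same two hypothetical rows from three of the sources listed above using only Python's standard library: `sqlite3` (an in-memory SQLite database standing in for an RDBMS), `xml.etree.ElementTree` and `csv`. The table, tag and column names are made up for illustration. Parquet and Excel are left out because the standard library has no reader for them; they need third-party packages such as `pyarrow` and `openpyxl`.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two rows expressed as CSV text and XML text.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"
XML_TEXT = "<people><person id='1'>Alice</person><person id='2'>Bob</person></people>"


def from_sqlite() -> list[tuple[int, str]]:
    """Create and query an in-memory SQLite table via the DB-API (sqlite3)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
        return conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
    finally:
        conn.close()  # the connection's context manager would not close it


def from_xml(text: str) -> list[tuple[int, str]]:
    """Walk the XML tree with ElementTree, reading an attribute and the element text."""
    root = ET.fromstring(text)
    return [(int(p.get("id")), p.text) for p in root.findall("person")]


def from_csv(text: str) -> list[tuple[int, str]]:
    """Parse delimited text with the csv module; every field arrives as a string."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["id"]), row["name"]) for row in reader]


if __name__ == "__main__":
    print(from_sqlite())
    print(from_xml(XML_TEXT))
    print(from_csv(CSV_TEXT))
```

Each reader has its own API and its own notion of types: SQLite returns the `id` column as an integer directly, while the XML and CSV parsers hand back strings that the sketch converts with `int()`.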
For each source, we explain why data is stored that way and how to retrieve it with a Python library.
Let’s start with our data fetching story.