[Python Basics] An Introductory Book on Python Data Analysis for Complete Beginners

Author: Lemon

Source: Python数据之道



Quite a few readers have been asking how to get started with Python and with Python data analysis. I previously shared two books for learning Python from scratch: Python Crash Course (《Python编程从入门到实践》) and Learn Python 3 the Hard Way (《笨办法学Python3》).

Today I'd like to share an introductory book on Python data analysis: Foundations for Analytics with Python (Chinese edition: 《Python数据分析基础》).

Who this book is for

According to the book's own introduction:

The book is aimed primarily at people who regularly process data in spreadsheet software but have never written a single line of code.

Lemon thinks this scope is already quite clear. That said, after finishing the book, Lemon feels the audience could reasonably be broadened a bit: people who work with spreadsheets often, who need to generate reports on a regular schedule, who have some data visualization needs, or who need to produce descriptive statistics on their data can all get something out of parts of this book.

Lemon has first-hand experience with this. A few years back, before learning Python, Lemon was managing a lot of projects and found handling the data in Excel increasingly unmanageable. After shopping around for a solution, Lemon ended up hacking together a small project management tool in Microsoft Access with SQL statements. Looking back, knowing Python at the time would have made things much easier.

The Python environment in the book

The code in the book is written for Python 3.5, and the author tested it on Windows. If the Python on your machine is version 3.5 or later, the code should run fine.

Since Python is cross-platform, Windows, Linux, and macOS should all work.

For installing Python, the book recommends Anaconda. Lemon usually just installs Anaconda as well, mainly because it saves a lot of trouble.

Setting up the Python environment is still something you need to do yourself. Different machines occasionally run into small issues, and those are mostly resolved by searching for the error you hit.

The author has published the book's code on GitHub, where anyone can grab it for free:

https://github.com/cbrownley/foundations-for-analytics-with-python

Working with data files

The book spends a fairly large share of its pages on handling data files, mainly CSV files, Excel files, and databases.

For someone just starting out in data analysis, the file types you run into most are probably Excel and CSV, so the book covers these two in the most detail.

It walks through working with these two file types using plain Python, built-in modules, and third-party libraries, touching on the built-in csv module, xlrd, xlwt, and others. When you need to read data, and especially when you need to write data, these tools are quite handy.
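To give a feel for what that looks like, here is a minimal sketch using the built-in csv module and xlrd. The file names, the 'Cost' column, and the filter threshold are made up for illustration and are not taken from the book:

```python
# Filter rows of a CSV file with the built-in csv module, then introspect an
# Excel workbook with xlrd. File and column names here are hypothetical.
import csv

import xlrd  # note: xlrd 2.x only reads the legacy .xls format

# Copy only the rows whose 'Cost' value exceeds a threshold into a new CSV file
with open('supplier_data.csv', 'r', newline='') as infile, \
        open('filtered.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    header = next(reader)
    writer.writerow(header)
    cost_index = header.index('Cost')
    for row in reader:
        if float(row[cost_index]) > 600.0:
            writer.writerow(row)

# Print each worksheet's name, size, and first cell
workbook = xlrd.open_workbook('sales.xls')
for sheet in workbook.sheets():
    print(sheet.name, sheet.nrows, sheet.ncols)
    if sheet.nrows and sheet.ncols:
        print('first cell:', sheet.cell_value(0, 0))
```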

If the goal is simply to read data and then do cleaning and similar processing, Lemon generally prefers pandas.
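For example, a minimal pandas version of that kind of read-and-clean step might look like this (the file name and column names are hypothetical):

```python
# Read a CSV with pandas and apply a few light cleaning steps.
# The file name and column names are hypothetical.
import pandas as pd

df = pd.read_csv('supplier_data.csv')

df = df.dropna(how='all')         # drop rows that are entirely empty
df = df.drop_duplicates()         # remove exact duplicate rows
df['Supplier Name'] = df['Supplier Name'].str.strip().str.title()

print(df.head())                  # quick peek at the cleaned data
print(df.describe())              # summary statistics for numeric columns

df.to_csv('supplier_data_clean.csv', index=False)
```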

As for databases, the book also covers the built-in sqlite3 module as well as the popular MySQL database.
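As a rough illustration of the sqlite3 side, here is a small self-contained sketch; the table, columns, and rows are invented for the example and are not the book's data:

```python
# Create a table with the built-in sqlite3 module, insert a few rows, and query them.
import sqlite3

con = sqlite3.connect(':memory:')  # in-memory database; pass a file path to persist it
cur = con.cursor()

cur.execute("""
    CREATE TABLE sales (
        customer TEXT,
        product TEXT,
        amount REAL,
        purchase_date TEXT
    )
""")

rows = [
    ('Alice', 'Bronze', 25.0, '2019-01-05'),
    ('Bob', 'Silver', 50.0, '2019-01-07'),
    ('Alice', 'Gold', 100.0, '2019-02-01'),
]
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
con.commit()

# Total amount spent per customer
for customer, total in cur.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer"):
    print(customer, total)

con.close()
```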

Data visualization

The visualization chapter covers four libraries: matplotlib, pandas, ggplot, and seaborn.

Apart from ggplot, the other three are libraries Lemon uses regularly.

(A plotting example bundled with the book, and the figure it produces, originally appeared here.)
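In its place, here is a minimal matplotlib sketch with made-up data, just to show the general shape of such a plotting script; it is not the book's own example:

```python
# A simple bar chart drawn with matplotlib; the data is invented for illustration.
import matplotlib.pyplot as plt

customers = ['Alice', 'Bob', 'Carol', 'Dave']
sales = [120, 85, 150, 95]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(customers, sales, color='steelblue')
ax.set_xlabel('Customer')
ax.set_ylabel('Sales')
ax.set_title('Sales by customer (illustrative data)')

plt.tight_layout()
plt.savefig('sales_by_customer.png', dpi=150)
plt.show()
```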

On the topic of matplotlib: not long ago Lemon put together a set of 100 examples for getting started with matplotlib; interested readers can head over and take a look:

《100个案例,Matplotlib从入门到大神》 (100 examples: matplotlib from beginner to master)

Scheduling scripts to run automatically

Lemon suspects there's one more topic many readers will find interesting: using Python to run scripts automatically on a schedule. In a business environment there's always something that has to be reported on a regular basis, such as daily, weekly, monthly, quarterly, or annual reports (extremely tedious!).

Not long ago, Alibaba scrapping its weekly reports made it onto the trending-topics list, which shows just how important these reports are, and how much people resent them.

If you use Python to automatically produce content whose format is highly consistent, you naturally save yourself a lot of effort. While everyone else is buried in work, you'll probably have time to slack off.
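As a rough sketch of what the script side of that might look like (the file names, columns, and report logic are placeholders, not the book's code):

```python
# report.py -- a minimal sketch of a script meant to run on a schedule.
# File names, column names, and the report logic are placeholders.
import datetime

import pandas as pd


def build_daily_report():
    today = datetime.date.today().isoformat()
    df = pd.read_csv('sales.csv')                    # raw data exported by some other system
    summary = df.groupby('product')['amount'].sum()  # total amount per product
    summary.to_csv('daily_report_{}.csv'.format(today))


if __name__ == '__main__':
    build_daily_report()

# A cron entry like the following would run it every day at 07:00 on Linux or macOS:
#   0 7 * * * /usr/bin/python3 /path/to/report.py
# On Windows, a scheduled task created in Task Scheduler does the same job,
# which is the approach the book itself walks through.
```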

Summary

On its cover, the book positions itself as teaching readers with zero prior experience to do data analysis with Python, one of the hottest languages around. If you're interested in doing data analysis with Python, this book works well as an entry-level reference.

If the book sounds like a good fit, consider buying a copy and reading it through carefully.


