Python for Social Scientists

cumei1658

Original post: https://www.pybloggers.com/2016/03/python-for-social-scientists/

This is a guest blog post by Nick Eubank, a Ph.D. candidate in Political Economy at the Stanford Graduate School of Business.

Python is an increasingly popular tool for data analysis among social scientists. Empowered by a number of libraries that have reached maturity, R and Stata users are increasingly moving to Python in order to take advantage of the beauty, flexibility, and performance of Python without sacrificing the functionality these older programs have accumulated over the years.

But while Python has much to offer, existing Python resources are not always well-suited to the needs of social scientists. With that in mind, I've recently created a new resource, www.pythonforsocialscientists.org (PSS), tailored specifically to the goals and desires of social scientist Python users.

The site is not a new set of tutorials, however — there are more than enough Python tutorials in the world. Rather, the aim of the site is to curate and annotate existing resources, and to provide users guidance on what topics to focus on and which to skip.

Why a Site for Social Scientists?

Social scientists – and indeed, most data scientists – spend most of their time trying to wrestle individual, idiosyncratic datasets into the shape needed to run statistical analyses. This makes the way most social scientists use Python fundamentally different from how it is used by most software developers. Social scientists are primarily interested in writing relatively simple programs (scripts) that execute a series of commands (recoding variables, merging datasets, parsing text documents, etc.) to wrangle their data into a form they can analyze. And because they are usually writing their scripts for a specific, idiosyncratic application and set of data, they are generally not focused on writing code with lots of abstractions.
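As an illustration of the kind of script described above, here is a minimal sketch that recodes a variable and merges two datasets with pandas. The two tables, their column names, and all values are hypothetical toy data, not drawn from any real study.

```python
import pandas as pd

# Hypothetical survey responses: one row per respondent.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "state": ["CA", "NY", "CA", "TX"],
    "party": ["D", "R", "D", "I"],
})

# Hypothetical state-level covariates: one row per state.
states = pd.DataFrame({
    "state": ["CA", "NY", "TX"],
    "median_income": [75_000, 72_000, 64_000],
})

# Recode a string variable into a 0/1 indicator.
survey["is_democrat"] = (survey["party"] == "D").astype(int)

# Attach state-level data to every respondent (many respondents per state).
# validate= makes pandas raise an error if `states` has duplicate keys.
merged = survey.merge(states, on="state", how="left", validate="many_to_one")

print(merged)
```

Each step is a single command applied to a specific dataset, with no classes or other abstractions involved.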

Social scientists, in other words, tend to be primarily interested in learning to use existing tools effectively, not develop new ones.

Because of this, social scientists learning Python tend to have different priorities in terms of skill development than software developers. Yet most tutorials online were written for developers or computer science students, so one of the aims of PSS is to provide social scientists with some guidance on the skills they should prioritize in their early training. In particular, PSS suggests:

Need immediately:

Data types: integers, floats, strings, booleans, lists, dictionaries, and sets (tuples are kinda optional)
Defining functions
Writing loops
Understanding mutable versus immutable data types
Methods for manipulating strings
Importing third party modules
Reading and interpreting errors
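Even a very small script uses several of these skills together. The sketch below uses only built-in Python and a list of made-up survey answers. It shows lists, dictionaries, and sets, a user-defined function, a loop, string methods, the difference between mutable and immutable objects, and how to read an error message. The function name `normalize` is just an illustrative choice.

```python
# Lists, dicts, and sets: three of the core built-in data types
responses = ["  Agree", "disagree ", "AGREE", "Neutral", "agree"]  # hypothetical answers
counts = {}   # dict: maps each cleaned answer to how often it appears
seen = set()  # set: keeps each distinct cleaned answer only once


def normalize(answer):
    """Return a free-text answer with surrounding whitespace removed, lowercased."""
    return answer.strip().lower()


# A loop applying the function to every response
for raw in responses:
    clean = normalize(raw)
    seen.add(clean)
    counts[clean] = counts.get(clean, 0) + 1

print(counts)        # {'agree': 3, 'disagree': 1, 'neutral': 1}
print(sorted(seen))  # ['agree', 'disagree', 'neutral']

# Mutable vs. immutable: lists can be changed in place, strings cannot
a = [1, 2, 3]
b = a          # b is another name for the SAME list object
b.append(4)
print(a)       # [1, 2, 3, 4]  (a changed too)

s = "abc"
t = s.upper()  # string methods return a NEW string; s itself is unchanged
print(s, t)    # abc ABC

# Reading errors: int("3.5") would stop the script with
#   ValueError: invalid literal for int() with base 10: '3.5'
# The last line of a traceback names the error type and the offending value.
```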

Things you’ll want to know at some point, but not necessary immediately:

Advanced debugging utilities (like pdb)
File input / output (most libraries you’ll use have tools to simplify this for you)
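The sketch below shows why file input/output can usually be left to libraries. It reads the same small CSV twice, once with the standard-library `csv` module and once with pandas. The CSV text is a hypothetical placeholder held in memory through `io.StringIO`, so the example runs without a file on disk.

```python
import csv
import io

import pandas as pd

# Hypothetical CSV contents; in real work this would be a file on disk.
raw = "unit,year,value\nA,2010,1.5\nB,2010,2.0\n"

# Standard library: you loop over rows yourself, and every field is a string.
rows = list(csv.DictReader(io.StringIO(raw)))
years_as_text = [row["year"] for row in rows]   # ['2010', '2010']
values = [float(row["value"]) for row in rows]  # types must be converted by hand

# pandas: one call builds a table and infers numeric column types.
df = pd.read_csv(io.StringIO(raw))
print(df.dtypes)
```

With a real file you would pass a path instead, for example `pd.read_csv("survey.csv")`, where the filename is hypothetical.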

Don’t need:

Defining or writing classes
Understanding Exceptions

Pandas

Today, most empirical social science remains organized around tabular data, meaning data that is presented with a different variable in each column and a different observation in each row. As a result, many social scientists using Python are a little confused when they don’t find a tabular data structure covered in their intro to Python tutorial. To address this confusion, PSS does its best to introduce users to the pandas library as fast as possible, providing links to tutorials and a few tips on gotchas to watch out for.

The pandas library replicates much of the functionality that social scientists are used to finding in Stata or R — data can be represented in a tabular format, column variables can be easily labeled, and columns of different types (like floats and strings) can be combined in the same dataset.
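Here is a minimal sketch of that tabular structure in pandas. It builds a small DataFrame with columns of several types, refers to variables by name, and runs a couple of operations that Stata users will recognize. The data are hypothetical, and the Stata comparisons in the comments are rough analogies only, not exact equivalents.

```python
import pandas as pd

# Hypothetical data: each column is a variable, each row an observation.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "treated": [True, False, True],
    "income": [52.5, 48.0, 61.25],
    "region": ["north", "south", "north"],
})

# Integer, boolean, float, and string columns live side by side in one table.
print(df.dtypes)

# Variables are referred to by name, much as in Stata or R.
df = df.rename(columns={"income": "income_k"})
print(df["income_k"].mean())

# A rough analogue of Stata's `tabulate region`.
print(df["region"].value_counts())

# Subsetting rows by a condition, similar in spirit to `keep if treated`
# (pandas returns a new table instead of modifying df in place).
treated_only = df[df["treated"]]
print(treated_only)
```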

pandas is also the gateway to many other tools social scientists are likely to use, like graphing libraries (seaborn and ggplot) and the statsmodels econometrics library.
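As an example of that hand-off, the sketch below fits an OLS regression with statsmodels' R-style formula interface, reading the variables directly from a pandas DataFrame. The data are simulated for illustration only, with a true intercept of 1 and a true slope of 2.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated (hypothetical) data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x": rng.normal(size=n)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(scale=0.5, size=n)

# R-style formula; statsmodels looks up `y` and `x` in the DataFrame
# and adds an intercept automatically.
model = smf.ols("y ~ x", data=df).fit()
print(model.summary())
print(model.params)  # estimates labelled "Intercept" and "x"
```

For heteroskedasticity-robust standard errors, `.fit(cov_type="HC1")` requests the HC1 variant, which is roughly what Stata's `, robust` option reports for `regress`.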

Other Libraries by Research Area

While all social scientists who wish to work with Python will need to understand the core language and most will want to be familiar with pandas, the Python eco-system is full of application-specific libraries that will only be of use to a subset of users. With that in mind, PSS provides an overview of libraries to help researchers working in different topic areas, along with links to materials on optimal use, and guidance on relevant considerations:

Network Analysis: igraph
Text Analysis: NLTK, and if needed coreNLP
Econometrics: statsmodels
Graphing: ggplot and seaborn
Big Data: dask and pyspark
Geo-Spatial Analysis: arcpy or geopandas
Making code faster: %prun in IPython (for profiling) and numba (for JIT compilation)
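`%prun` is an IPython magic that runs a statement under Python's profiler. Outside IPython, you can produce a similar report with the standard-library `cProfile` and `pstats` modules, as this sketch does. The function `slow_sum_of_squares` is a hypothetical stand-in for real analysis code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive pure-Python loop: a typical target for profiling."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    # Hypothetical "analysis" that calls the slow function several times.
    return [slow_sum_of_squares(100_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
results = analysis()
profiler.disable()

# Collect the report in a string and show the 5 entries with the
# largest cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Suppose profiling shows that a tight numeric loop like this one dominates the run time. That loop is exactly the kind of code numba's `@njit` decorator is designed to compile.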

Want to Get Involved?
