Crash Course in Python for Data Mining
This is not a comprehensive Python tutorial but instead is intended to highlight the parts of the language that will be most important to us (some of which are often not the focus of Python tutorials).
Basics of Python Coding for Data Scientists
1. Whitespace Formatting
Many languages use curly braces to delimit blocks of code. Python uses indentation:
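A minimal sketch of indentation-delimited blocks. The loop variables and the pairs list are made up for illustration; the list only records which block each statement ran in:

```python
pairs = []
for i in [1, 2, 3]:
    pairs.append(("outer", i))          # first line of the "for i" block
    for j in [1, 2]:
        pairs.append(("inner", i + j))  # only line of the "for j" block
    pairs.append(("after inner", i))    # back in the "for i" block
print("done looping")                   # outside both blocks
```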
This makes Python code very readable, but it also means that you have to be very careful with your formatting. Whitespace is ignored inside parentheses and brackets, which can be helpful for long-winded computations and for making code easier to read:
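The two uses might look like the sketch below. The variable names are illustrative assumptions, not part of the original text:

```python
# Parentheses let a single expression span several lines.
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 +
                           11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

# Brackets allow the same freedom, used here purely for readability.
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]
```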
You can also use a backslash to indicate that a statement continues onto the next line, although we’ll rarely do this:
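A small sketch of backslash continuation (the variable name is an assumption). Nothing, not even a space, may follow the backslash on its line:

```python
two_plus_three = 2 + \
                 3
print(two_plus_three)  # 5
```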
One consequence of whitespace formatting is that it can be hard to copy and paste code into the Python shell. For example, if you tried to paste a for loop that has a blank line between its header and its indented body into the ordinary Python shell, you would get an IndentationError (expected an indented block), because the interpreter thinks the blank line signals the end of the for loop’s block. IPython has a magic function, %paste, which correctly pastes whatever is on your clipboard, whitespace and all. This alone is a good reason to use IPython.
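For illustration, the snippet below is the kind of code that causes the problem. Run as a script it is valid Python; the trouble arises only when it is pasted line by line into the plain interactive shell. The printed list is an assumption added so the result can be inspected:

```python
printed = []
for i in [1, 2, 3, 4, 5]:

    # notice the blank line above this comment
    printed.append(i)
    print(i)
```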
2. Modules
Certain features of Python are not loaded by default. These include features that ship with the language as well as third-party features that you download yourself. In order to use these features, you’ll need to import the modules that contain them. One approach is to simply import the module itself:
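A minimal sketch of a plain module import. The pattern and the sample string are assumptions chosen for illustration:

```python
import re

# Compile a pattern that matches one or more digits; re.I ignores case.
my_regex = re.compile("[0-9]+", re.I)
print(my_regex.findall("Room 101, floor 3"))  # ['101', '3']
```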
Here re is the module containing functions and constants for working with regular expressions. After this type of import, you can access those functions only by prefixing them with the module name and a dot (re.).
If you already had a different re in your code you could use an alias:
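An aliased import could look like this sketch, where regex is an arbitrary alias picked for the example:

```python
import re as regex

# The module is now reachable only under the alias.
my_regex = regex.compile("[0-9]+", regex.I)
print(my_regex.search("abc123").group())  # 123
```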
You might also do this if your module has an unwieldy name or if you’re going to be typing it a lot. For example, when visualizing data with matplotlib, a standard convention is:
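The conventional alias is plt. The sketch below assumes matplotlib is installed; the try/except guard is an addition for this example so that the snippet still runs where it is not:

```python
try:
    import matplotlib.pyplot as plt   # third-party package, assumed installed
except ImportError:
    plt = None                        # fallback used only by this sketch
```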
If you need a few specific values from a module, you can import them explicitly and use them without qualification:
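As a sketch, here are two names imported explicitly from the standard collections module. The variable names and sample data are assumptions:

```python
from collections import defaultdict, Counter

lookup = defaultdict(int)              # missing keys default to 0
lookup["a"] += 1
my_counter = Counter(["x", "y", "x"])  # counts occurrences of each item
```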
You could also import the entire contents of a module into your namespace, but this might inadvertently overwrite variables you’ve already defined:
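A sketch of the overwrite problem, assuming the code runs at module level (a star import is not allowed inside a function). The variable match is a made-up name that happens to collide with a function exported by re:

```python
match = 10
from re import *    # re exports a function named match, among others
print(match)        # now the re.match function, no longer 10
```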
3. Functions
A function is a rule for taking zero or more inputs and returning a corresponding output. In Python, we typically define functions using def:
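A minimal example of a definition with def. The function double is an illustrative assumption:

```python
def double(x):
    """Optionally, a docstring explains what the function does.
    This one multiplies its input by 2."""
    return x * 2

print(double(3))  # 6
```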
Python functions are first-class, which means that we can assign them to variables and pass them into functions just like any other arguments:
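The sketch below shows a function being assigned to a variable and passed as an argument. Both double and apply_to_one are hypothetical helpers; double is repeated here so the snippet runs on its own:

```python
def double(x):
    return x * 2

def apply_to_one(f):
    """Call the function f with 1 as its argument."""
    return f(1)

my_double = double            # another name for the same function
x = apply_to_one(my_double)   # equals 2
```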
It is also easy to create short anonymous functions, or lambdas:
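A lambda passed directly as an argument might look like this, with apply_to_one again a hypothetical helper:

```python
def apply_to_one(f):
    return f(1)

y = apply_to_one(lambda x: x + 4)   # equals 5
```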
You can assign lambdas to variables, although most people will tell you that you should just use def instead:
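Side by side, the two styles look like this (both names are assumptions):

```python
another_double = lambda x: 2 * x   # works, but generally discouraged

def another_double_def(x):         # the usually preferred form
    return 2 * x
```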
Function parameters can also be given default arguments, which only need to be specified when you want a value other than the default:
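A sketch of a default argument. The function my_print is hypothetical, and it returns the message only so the result is easy to inspect:

```python
def my_print(message="my default message"):
    print(message)
    return message

my_print("hello")   # prints 'hello'
my_print()          # prints 'my default message'
```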
It is sometimes useful to specify arguments by name:
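For instance, with a hypothetical subtract function whose parameters both have defaults:

```python
def subtract(a=0, b=0):
    return a - b

first = subtract(10, 5)   # 5
second = subtract(0, 5)   # -5
third = subtract(b=5)     # same as the previous call: -5
```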
4. Strings
Strings can be delimited by single or double quotation marks (but the quotes have to match):
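Both quoting styles produce the same string, as this small sketch with made-up variable names shows:

```python
single_quoted_string = 'data science'
double_quoted_string = "data science"
print(single_quoted_string == double_quoted_string)  # True
```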
Python uses backslashes to encode special characters. For example:
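A sketch using the tab escape (the variable names are assumptions):

```python
tab_string = "\t"              # represents a single tab character
tab_length = len(tab_string)   # 1
```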
If you want backslashes as backslashes (which you might in Windows directory names or in regular expressions), you can create raw strings using r"":
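In a raw string the same two characters stay literal, as in this sketch with assumed variable names:

```python
not_tab_string = r"\t"                 # the characters '\' and 't'
not_tab_length = len(not_tab_string)   # 2
```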
You can create multiline strings using triple quotes (three double-quote characters):
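A short illustration, with made-up text; the line breaks inside the quotes become part of the string:

```python
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""

print(len(multi_line_string.splitlines()))  # 3
```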
5. Exceptions
When something goes wrong, Python raises an exception. Unhandled, these will cause your program to crash. You can handle them using try and except:
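A minimal sketch of handling one specific exception. The result variable is an assumption added so the outcome can be inspected:

```python
try:
    result = 0 / 0                   # raises ZeroDivisionError
except ZeroDivisionError:
    result = None
    print("cannot divide by zero")
```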
Although in many languages exceptions are considered bad, in Python there is no shame in using them to make your code cleaner, and we will occasionally do so.
6. Lists
Probably the most fundamental data structure in Python is the list. A list is simply an ordered collection. (It is similar to what in other languages might be called an array, but with some added functionality.)
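A few illustrative lists (all names are assumptions). Note that a list can mix element types and can contain other lists:

```python
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]

list_length = len(integer_list)   # equals 3
list_sum = sum(integer_list)      # equals 6
```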
You can get or set the nth element of a list with square brackets:
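A sketch of indexing, assuming a ten-element list x. Indexes start at 0, and negative indexes count from the end:

```python
x = list(range(10))   # the list [0, 1, ..., 9]
zero = x[0]           # equals 0; lists are 0-indexed
one = x[1]            # equals 1
nine = x[-1]          # equals 9, the last element
eight = x[-2]         # equals 8, the next-to-last element
x[0] = -1             # x is now [-1, 1, 2, 3, ..., 9]
```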
You can also use square brackets to “slice” lists:
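Some common slices, again assuming a ten-element list x. A slice includes its start index and excludes its end index:

```python
x = list(range(10))                # [0, 1, ..., 9]
first_three = x[:3]                # [0, 1, 2]
three_to_end = x[3:]               # [3, 4, ..., 9]
one_to_four = x[1:5]               # [1, 2, 3, 4]
last_three = x[-3:]                # [7, 8, 9]
without_first_and_last = x[1:-1]   # [1, 2, ..., 8]
copy_of_x = x[:]                   # a new list with the same elements
```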
Python has an in operator to check for list membership:
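For example (the variable names are assumptions):

```python
has_one = 1 in [1, 2, 3]    # True
has_zero = 0 in [1, 2, 3]   # False
```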
This check examines the elements of the list one at a time, so you probably shouldn’t use it unless you know your list is pretty small (or you don’t care how long the check takes).
It is easy to concatenate lists together. For example, extend modifies a list x in place:
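A sketch of in-place concatenation with the list method extend, using an assumed list x:

```python
x = [1, 2, 3]
x.extend([4, 5, 6])   # x is now [1, 2, 3, 4, 5, 6]
```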
If you don’t want to modify x, you can use list addition, which creates a new list:
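With list addition the original list is left untouched, as this sketch with assumed names x and y shows:

```python
x = [1, 2, 3]
y = x + [4, 5, 6]   # y is [1, 2, 3, 4, 5, 6]; x is unchanged
```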
More frequently we will append to lists one item at a time:
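Appending a single item, sketched with assumed names:

```python
x = [1, 2, 3]
x.append(0)   # x is now [1, 2, 3, 0]
y = x[-1]     # equals 0
z = len(x)    # equals 4
```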
It is often convenient to unpack lists if you know how many elements they contain, although you will get a ValueError if you don’t have the same number of elements on both sides. It’s common to use an underscore for a value you’re going to throw away:
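The sketch below covers plain unpacking, the underscore convention, and the mismatch error. All names, including unpack_error, are assumptions for illustration:

```python
x, y = [1, 2]          # now x is 1 and y is 2
_, y = [1, 2]          # y is 2; the first element is deliberately ignored

try:
    a, b = [1, 2, 3]   # three values but only two names
except ValueError as error:
    unpack_error = str(error)
    print(unpack_error)
```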
Translated from: https://medium.com/analytics-vidhya/crash-course-in-python-for-data-science-part-1-40290de40bd4