Python Study Notes (Part 1)

Author: cuijhon


Category: CodeNote | Tags: python

Article link: https://blog.csdn.net/cuijhon/article/details/54864361


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Learned from Liao Xuefeng's blog and the imooc Python courses:

# Liao Xuefeng's official site: http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000

# imooc course list: http://www.imooc.com/course/list?c=python

- Strings and encoding

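u'中文'  (a Unicode string literal; in Python 3 every str is already Unicode, so the u prefix is optional)

If a string containing Chinese triggers a UnicodeDecodeError in Python, the .py file was most likely saved in an encoding Python did not expect. Save the file as UTF-8 and declare the encoding in a comment on the first line (or the second line, after a #! line):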

# -*- coding: utf-8 -*-
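To see what the encoding actually does, here is a minimal Python 3 sketch that encodes a str to UTF-8 bytes and decodes it back; it assumes the file itself is saved as UTF-8, and the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Encode a Unicode str to UTF-8 bytes, then decode the bytes back to str.
text = '中文'
data = text.encode('utf-8')       # str -> bytes
print(data)                       # b'\xe4\xb8\xad\xe6\x96\x87'
print(len(text), len(data))       # 2 characters, 6 bytes
restored = data.decode('utf-8')   # bytes -> str
print(restored == text)           # True
```

Decoding bytes with the wrong codec, for example data.decode('ascii'), is one way to end up with a UnicodeDecodeError.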

- Format placeholders

Common placeholders:

%s	string
%d	integer
%f	floating-point number
%x	hexadecimal integer
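A short sketch of the % operator with each placeholder; the variables name, count and price are made up for the example.

```python
# %-style formatting: each placeholder is replaced by the matching value.
name = 'Michael'
count = 3
price = 3.14159

s1 = 'Hello, %s' % name                  # %s: string
s2 = 'You have %d new messages' % count  # %d: integer
s3 = 'Price: %.2f' % price               # %f, limited to 2 decimal places
s4 = '255 in hex is %x' % 255            # %x: hexadecimal integer
s5 = '%s has %d items' % (name, count)   # several values go in a tuple

for s in (s1, s2, s3, s4, s5):
    print(s)
```

To produce a literal percent sign, write %%: '%d%%' % 7 gives '7%'.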

- Using list, tuple, dict and set

- List ['value']

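1. list.append(x): adds an element to the end of the list

2. list.insert(index, item): inserts item at position index

3. pop() always removes the last element of the list and returns it, so if 'Paul' is the last element of L, L.pop() returns 'Paul'. To remove an element at a given position, pass its index: since 'Paul' is at index 2, L.pop(2) removes it.

4. Slicing: L[0:3:1] takes elements from index 0 up to, but not including, index 3, with a step of 1 (the third number is the step), i.e. indices 0, 1 and 2: exactly 3 elements. Slicing also works on str, so it can be used to extract substrings.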
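The four list operations above, combined in one sketch; the list contents are hypothetical and only serve as an example.

```python
# Hypothetical sample list used only for illustration.
L = ['Adam', 'Lisa', 'Bart']
L.append('Paul')        # ['Adam', 'Lisa', 'Bart', 'Paul']
L.insert(1, 'Tracy')    # ['Adam', 'Tracy', 'Lisa', 'Bart', 'Paul']
last = L.pop()          # removes and returns 'Paul'
second = L.pop(1)       # removes and returns 'Tracy' (index 1)

print(L)                # ['Adam', 'Lisa', 'Bart']
print(L[0:2])           # ['Adam', 'Lisa'] (index 2 is excluded)
print(L[::2])           # ['Adam', 'Bart'] (step 2)
print(L[-1])            # Bart (negative indices count from the end)
print('Python'[0:3])    # Pyt: slicing also works on str
```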

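- tuple ('value',): a list that cannot be modified once it is created (immutable). A one-element tuple needs a trailing comma; ('value') on its own is just the string 'value'.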
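A minimal sketch of what "cannot be modified" means in practice, plus the one-element pitfall; the names are illustrative.

```python
t = ('Adam', 'Lisa', 'Bart')
print(t[0])                     # Adam: indexing and slicing work like a list

try:
    t[0] = 'Paul'               # tuples are immutable
except TypeError as e:
    error = str(e)
    print('TypeError:', error)

single = ('value',)             # one-element tuple: the trailing comma is required
not_a_tuple = ('value')         # just a str in parentheses
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple str
```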

- dict {key: value, ...}

len() returns the size of any collection.

Updating a dict:

d['Paul'] = 72

Iterating over a dict's keys and values:

for key, value in d.items():
    print(key, value)
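A small sketch of the dict operations above; the names and scores are hypothetical sample data.

```python
d = {'Adam': 95, 'Lisa': 85, 'Bart': 59}
print(len(d))          # 3

d['Paul'] = 72         # new key: adds an entry
d['Adam'] = 90         # existing key: overwrites the value
print(len(d))          # 4

for key, value in d.items():
    print(key, value)  # Python 3.7+ keeps insertion order
```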

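- set: set(['value']) builds a set from a list; its elements are unique and unordered

Use case: a fixed collection that you test membership against, for example:

weekdays = set(['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'])

Updating a set:

add(element)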
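A sketch of typical set usage, reusing the weekdays example; the numeric set is only for illustration.

```python
weekdays = set(['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'])
print('MON' in weekdays)       # True: fast membership test
print('HOLIDAY' in weekdays)   # False

s = set([1, 2, 2, 3])          # duplicates are dropped
print(len(s))                  # 3
s.add(4)                       # add an element (no effect if already present)
s.remove(1)                    # remove an element; KeyError if it is missing
print(sorted(s))               # [2, 3, 4]: sets are unordered, so sort to display
```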

- 1. Slicing: L[0:3] takes elements from index 0 up to, but not including, index 3

- 2. Iteration

A for loop over a dict iterates over its keys:

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> for key in d:
...     print(key)
     
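Checking whether an object is iterable (since Python 3.10, Iterable must be imported from collections.abc; from collections import Iterable no longer works):

>>> from collections.abc import Iterable
>>> isinstance('abc', Iterable)  # is str iterable?
True
>>> isinstance([1, 2, 3], Iterable)  # is list iterable?
True
>>> isinstance(123, Iterable)  # is an integer iterable?
False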
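Besides keys, a dict can be iterated over its values or its (key, value) pairs. A minimal sketch with the same sample dict d (insertion order is guaranteed from Python 3.7 on):

```python
from collections.abc import Iterable

d = {'a': 1, 'b': 2, 'c': 3}
keys = [key for key in d]                  # iterating a dict yields its keys
values = [value for value in d.values()]   # values only
items = [(k, v) for k, v in d.items()]     # (key, value) pairs
chars = [ch for ch in 'ABC']               # strings are iterable too

print(keys, values, items, chars)
print(isinstance(d, Iterable), isinstance(123, Iterable))  # True False
```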

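- 3. List comprehensions

>>> [x * x for x in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

To write a list comprehension, put the expression that produces each element (x * x) first and follow it with the for loop; the list is then built for you. It is very handy, and after writing a few you will quickly get used to the syntax.

You can also add an if condition after the for loop, for example to keep only the squares of even numbers:

>>> [x * x for x in range(1, 11) if x % 2 == 0]
[4, 16, 36, 64, 100]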
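A few more list comprehensions, as a sketch: the filtered version, a double loop, and one over a dict's items (the sample data is made up).

```python
squares = [x * x for x in range(1, 11)]
even_squares = [x * x for x in range(1, 11) if x % 2 == 0]

# Two for clauses produce every combination, like a nested loop:
pairs = [m + n for m in 'ABC' for n in 'XYZ']

# A comprehension over a dict's items:
d = {'x': 'A', 'y': 'B', 'z': 'C'}
kv = [k + '=' + v for k, v in d.items()]

print(squares)       # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(even_squares)  # [4, 16, 36, 64, 100]
print(pairs)         # ['AX', 'AY', 'AZ', 'BX', 'BY', 'BZ', 'CX', 'CY', 'CZ']
print(kv)            # ['x=A', 'y=B', 'z=C']
```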

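Functions

filter(function, iterable): keeps only the elements for which the function returns a true value and drops the rest. In Python 2 it returns a new list; in Python 3 it returns an iterator, so wrap it in list() when you need a list.

sorted(): returns a new sorted list (the original list is left unchanged)

len(): returns the length, e.g. the number of characters in a string

list.append(x): appends an element to the end of the list

list.pop(i): removes and returns the element at index i

range(): range(5) produces the integers from 0 up to but not including 5, i.e. 0 to 4

list(): converts an iterable (such as a range or a str) into a list

set.add(key): adds an element to a set

set.remove(key): removes an element from a set (raises KeyError if it is not present)

isinstance(obj, type): checks whether a variable is of a given type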
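A quick sketch exercising several of the functions listed above on a made-up list of numbers:

```python
nums = [5, 2, 8, 1, 6]

evens = list(filter(lambda n: n % 2 == 0, nums))  # filter() is lazy in Python 3
print(evens)                        # [2, 8, 6]
print(sorted(nums))                 # [1, 2, 5, 6, 8]; nums itself is unchanged
print(sorted(nums, reverse=True))   # [8, 6, 5, 2, 1]
print(len('hello'))                 # 5
print(list(range(5)))               # [0, 1, 2, 3, 4]
print(list('abc'))                  # ['a', 'b', 'c']
print(isinstance(nums, list), isinstance(nums, dict))  # True False
```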

- get & set

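@property: a built-in decorator (not a keyword), used in this fixed form, that lets a method be accessed like an attribute.
@score.setter: "score" is the name of the method defined directly under @property, and setter is an attribute of the resulting property object. The pattern "@ + method name + . + setter" is a fixed form used together with @property to define the method that runs on assignment.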
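A minimal sketch of the @property / @score.setter pair; the Student class and the 0 to 100 validation rule are illustrative assumptions.

```python
class Student:
    def __init__(self, score=0):
        self.score = score          # goes through the setter below

    @property
    def score(self):                # getter: runs when reading s.score
        return self._score

    @score.setter
    def score(self, value):         # setter: runs on s.score = value
        if not isinstance(value, int):
            raise ValueError('score must be an integer')
        if not 0 <= value <= 100:
            raise ValueError('score must be between 0 and 100')
        self._score = value

s = Student()
s.score = 60                        # calls the setter
print(s.score)                      # 60, via the getter
try:
    s.score = 999
except ValueError as e:
    print(e)                        # score must be between 0 and 100
```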

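- Modules

1. datetime: Python's standard library module for handling dates and times. If you only write import datetime, you must refer to the class by its full name, datetime.datetime; after from datetime import datetime you can use datetime directly.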

from datetime import datetime

now = datetime.now()  # get the current datetime

dt = datetime(2015, 4, 19, 12, 20)  # create a datetime for a specific date and time

dt.timestamp()  # convert the datetime to a timestamp
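Note that timestamp() on a naive datetime depends on the machine's local time zone. A sketch that makes the result reproducible by attaching UTC with datetime.timezone (the choice of UTC is an assumption for the example):

```python
from datetime import datetime, timezone

# A naive datetime is interpreted in the machine's local time zone,
# so this value depends on where the code runs.
local_dt = datetime(2015, 4, 19, 12, 20)
print(local_dt.timestamp())

# Attaching tzinfo makes the result reproducible: 2015-04-19 12:20 UTC.
utc_dt = datetime(2015, 4, 19, 12, 20, tzinfo=timezone.utc)
ts = utc_dt.timestamp()
print(ts)                                            # 1429446000.0

# And back again: timestamp -> datetime
print(datetime.fromtimestamp(ts, tz=timezone.utc))   # 2015-04-19 12:20:00+00:00
```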

To convert a str into a datetime, use datetime.strptime(), passing the string and a format string that describes it (for example '%Y-%m-%d %H:%M:%S').

Adding and subtracting time with timedelta:

>>> from datetime import datetime, timedelta
>>> now = datetime.now()
>>> now
datetime.datetime(2015, 5, 18, 16, 57, 3, 540997)
>>> now + timedelta(hours=10)
datetime.datetime(2015, 5, 19, 2, 57, 3, 540997)
>>> now - timedelta(days=1)
datetime.datetime(2015, 5, 17, 16, 57, 3, 540997)
>>> now + timedelta(days=2, hours=12)
datetime.datetime(2015, 5, 21, 4, 57, 3, 540997)
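A sketch of the str to datetime conversion with strptime(), and of strftime() for the opposite direction; the sample string and formats are illustrative.

```python
from datetime import datetime

# Parse a string according to an explicit format.
cday = datetime.strptime('2015-6-1 18:19:59', '%Y-%m-%d %H:%M:%S')
print(cday)                               # 2015-06-01 18:19:59

# The reverse direction, datetime -> str, uses strftime():
print(cday.strftime('%Y/%m/%d %H:%M'))    # 2015/06/01 18:19
```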

2. collections

namedtuple

deque

defaultdict

OrderedDict

Counter
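One short example for each of the collections listed above, as a sketch; all names and data are made up.

```python
from collections import namedtuple, deque, defaultdict, OrderedDict, Counter

Point = namedtuple('Point', ['x', 'y'])   # a tuple with named fields
p = Point(1, 2)
print(p.x, p.y)                           # 1 2

q = deque(['a', 'b', 'c'])                # fast appends and pops at both ends
q.append('x')
q.appendleft('y')
print(list(q))                            # ['y', 'a', 'b', 'c', 'x']

dd = defaultdict(lambda: 'N/A')           # missing keys get a default value
dd['key1'] = 'abc'
print(dd['key1'], dd['key2'])             # abc N/A

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
od.move_to_end('a')                       # reorder: 'a' goes last
print(list(od))                           # ['b', 'c', 'a']

c = Counter('programming')                # counts occurrences of each character
print(c['g'], c['m'], c['r'])             # 2 2 2
```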

3. hashlib

Computing the MD5 digest of a string:

import hashlib

md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?'.encode('utf-8'))
print(md5.hexdigest())
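A sketch showing that the digest can also be computed incrementally with several update() calls, which is handy for large inputs, and that SHA-1 uses the same interface:

```python
import hashlib

text = 'how to use md5 in python hashlib?'

# One call...
one_shot = hashlib.md5(text.encode('utf-8')).hexdigest()

# ...gives the same digest as feeding the data in several update() calls,
# which is how large inputs (e.g. files read in chunks) are usually hashed.
md5 = hashlib.md5()
md5.update('how to use md5 '.encode('utf-8'))
md5.update('in python hashlib?'.encode('utf-8'))
print(md5.hexdigest() == one_shot)   # True
print(len(one_shot))                 # 32 hex characters (128 bits)

# SHA-1 works the same way and yields a 160-bit (40 hex character) digest.
sha1 = hashlib.sha1(text.encode('utf-8')).hexdigest()
print(len(sha1))                     # 40
```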

4. itertools
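A few commonly used itertools functions, as a minimal sketch (the selection is illustrative, not exhaustive):

```python
import itertools

naturals = itertools.count(1)                      # 1, 2, 3, ... (infinite)
first_five = list(itertools.takewhile(lambda n: n <= 5, naturals))
print(first_five)                                  # [1, 2, 3, 4, 5]

print(list(itertools.repeat('A', 3)))              # ['A', 'A', 'A']
print(list(itertools.islice(itertools.cycle('ABC'), 5)))  # ['A', 'B', 'C', 'A', 'B']
print(list(itertools.chain('ABC', 'XYZ')))         # ['A', 'B', 'C', 'X', 'Y', 'Z']

# groupby() groups consecutive equal elements:
for key, group in itertools.groupby('AAABBCAA'):
    print(key, list(group))
# A ['A', 'A', 'A']
# B ['B', 'B']
# C ['C']
# A ['A', 'A']
```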

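- Object-oriented programming

For example, Python's socketserver module provides the servers TCPServer, UDPServer, UnixStreamServer and UnixDatagramServer, and two server modes: multi-process (ForkingMixIn) and multi-threaded (ThreadingMixIn). Multiple inheritance lets you combine them; list the mixin first, because it overrides a method defined in the server class. (ForkingMixIn is only available on POSIX platforms.)

To create a multi-process TCPServer:

from socketserver import TCPServer, UDPServer, ForkingMixIn, ThreadingMixIn

class MyTCPServer(ForkingMixIn, TCPServer):
    pass

To create a multi-threaded UDPServer:

class MyUDPServer(ThreadingMixIn, UDPServer):
    pass

Without multiple inheritance, covering every combination above would take 4 x 2 = 8 subclasses.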
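Why the mixin goes first: Python looks up methods along the method resolution order (MRO), from left to right. The sketch below uses hypothetical classes that only imitate the socketserver pattern; no sockets or threads are involved.

```python
# Hypothetical stand-ins for a server class and a threading mixin.
class BaseServer:
    def process_request(self, request):
        return 'handled %s in the current thread' % request

class ThreadingLikeMixIn:
    def process_request(self, request):
        # The real ThreadingMixIn would start a new thread here.
        return 'handled %s in a new thread' % request

class GoodServer(ThreadingLikeMixIn, BaseServer):   # mixin first: its method wins
    pass

class BadServer(BaseServer, ThreadingLikeMixIn):    # mixin last: its method is shadowed
    pass

print(GoodServer().process_request('r1'))  # handled r1 in a new thread
print(BadServer().process_request('r1'))   # handled r1 in the current thread
print([cls.__name__ for cls in GoodServer.__mro__])
# ['GoodServer', 'ThreadingLikeMixIn', 'BaseServer', 'object']
```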

- Decorators

@property lets a method defined in a class be accessed as if it were an attribute, which gives callers a friendlier interface.
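@property is one built-in decorator; you can also write your own. A minimal sketch of a logging decorator (the names log and today are hypothetical), using functools.wraps so the wrapped function keeps its name:

```python
import functools

def log(func):
    """Print the function name before each call."""
    @functools.wraps(func)           # copy func.__name__ and __doc__ onto wrapper
    def wrapper(*args, **kw):
        print('call %s():' % func.__name__)
        return func(*args, **kw)
    return wrapper

@log                                 # same as: today = log(today)
def today():
    return '2015-03-25'

print(today())         # prints "call today():", then 2015-03-25
print(today.__name__)  # today (thanks to functools.wraps)
```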

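- Errors, debugging and testing

Assertions

Anywhere you would use print() to inspect values while debugging, you can use an assertion (assert) instead.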
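A minimal sketch of assert in place of a debugging print(); the function foo is hypothetical:

```python
def foo(s):
    n = int(s)
    # If the condition is false, AssertionError is raised with this message.
    assert n != 0, 'n is zero!'
    return 10 / n

print(foo('2'))    # 5.0
try:
    foo('0')
except AssertionError as e:
    print('AssertionError:', e)   # AssertionError: n is zero!
```

Running Python with the -O flag disables assert statements, so they suit debugging checks rather than input validation.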

Handling errors with try...except...finally:

try:
    print('try...')
    r = 10 / 0
    print('result:', r)
except ZeroDivisionError as e:
    print('except:', e)
finally:
    print('finally...')
print('END')

IndexError: list index out of range

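Cause: the list index is out of range.

  File "index.py", line 11, in <module>
    print d['Paul']
KeyError: 'Paul'

Cause: the key does not exist in the dict.

Fix: use the dict's own get() method, which returns None (or a default passed as the second argument) when the key does not exist; alternatively, check with 'Paul' in d before indexing.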
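A sketch of avoiding both errors above; the sample dict and list are hypothetical:

```python
d = {'Adam': 95, 'Lisa': 85}

print(d.get('Paul'))        # None instead of KeyError
print(d.get('Paul', 0))     # 0: the default passed as the second argument
print('Paul' in d)          # False: check before indexing

L = ['Adam', 'Lisa', 'Bart']
try:
    L[3]                    # valid indices are 0..2 (or -1..-3)
except IndexError as e:
    print('IndexError:', e) # IndexError: list index out of range
```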

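.py file -> compiled to bytecode (the syntax is checked at this step) -> executed

try...except: catch the exception as e, then continue (for example, with the next loop iteration)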

Random numbers

import random

number = random.randint(0, 100)  # random integer N with 0 <= N <= 100 (both ends included)
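A sketch of a few other functions from the random module; random.seed() makes the sequence repeatable, which helps in tests. The sample values are illustrative.

```python
import random

number = random.randint(0, 100)      # 0 <= number <= 100, both ends included
f = random.random()                  # float in [0.0, 1.0)
pick = random.choice(['rock', 'paper', 'scissors'])
cards = list(range(10))
random.shuffle(cards)                # shuffles the list in place

random.seed(42)                      # same seed -> same sequence
a = [random.randint(0, 100) for _ in range(3)]
random.seed(42)
b = [random.randint(0, 100) for _ in range(3)]
print(a == b)                        # True
```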

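- I/O programming

After reading a file you must call close(), ideally inside try ... finally so that the file is closed even if an error occurs. Writing that every time is tedious, so Python provides the with statement, which calls close() for us automatically:

with open('/path/to/file', 'r') as f:
    print(f.read())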
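A self-contained sketch that writes a small file to a temporary directory (standing in for '/path/to/file'), reads it back the manual way with try...finally, and then with with; the file name and contents are made up.

```python
import os
import tempfile

# A throwaway file in a new temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('Hello, world!\n第二行\n')

# What the with statement saves you from writing by hand:
f = open(path, 'r', encoding='utf-8')
try:
    content = f.read()
finally:
    f.close()                  # runs even if read() raises

# Reading line by line with with:
with open(path, 'r', encoding='utf-8') as f:
    lines = [line.rstrip('\n') for line in f]

print(f.closed)   # True: closed automatically when the with block ended
print(lines)      # ['Hello, world!', '第二行']
```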

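- Processes and threads (coroutines)

- Regular expressions

In a regular expression, a plain character matches itself exactly. \d matches one digit and \w matches one letter, digit or underscore, so:

'00\d' matches '007' but not '00A';
'\d\d\d' matches '010';
'\w\w\d' matches 'py3';

. matches any single character except a newline, so:

'py.' matches 'pyc', 'pyo', 'py!' and so on.

To match a variable number of characters: * means zero or more repetitions of the preceding element, + means at least one, ? means zero or one, {n} means exactly n, and {n,m} means between n and m.

Here is a more complex example: \d{3}\s+\d{3,8}.

Reading it from left to right:

\d{3} matches 3 digits, e.g. '010';
\s matches one whitespace character (a space, a tab, etc.), so \s+ matches at least one, e.g. ' ' or '   ';
\d{3,8} matches 3 to 8 digits, e.g. '1234567'.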
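A sketch checking the patterns above with re.fullmatch(), which succeeds only if the whole string matches; the test strings are illustrative.

```python
import re

print(bool(re.fullmatch(r'00\d', '007')))     # True
print(bool(re.fullmatch(r'00\d', '00A')))     # False
print(bool(re.fullmatch(r'\w\w\d', 'py3')))   # True
print(bool(re.fullmatch(r'py.', 'py!')))      # True

pattern = r'\d{3}\s+\d{3,8}'
print(bool(re.fullmatch(pattern, '010 12345')))       # True
print(bool(re.fullmatch(pattern, '010\t  1234567')))  # True: tab and spaces count as \s
print(bool(re.fullmatch(pattern, '01012345')))        # False: \s+ needs at least one
print(bool(re.fullmatch(pattern, '010 12')))          # False: only 2 digits after the space
```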

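Because the backslash is also the escape character in ordinary Python strings, a regular expression written in a normal string needs its backslashes doubled. We therefore strongly recommend Python's r (raw string) prefix, which removes that concern:

s = r'ABC\-001'  # Python string
# the corresponding regular expression string is unchanged:
# 'ABC\-001'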
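A minimal sketch showing that the raw string and the doubly escaped ordinary string are the same:

```python
import re

plain = 'ABC\\-001'   # ordinary string: the backslash itself must be escaped
raw = r'ABC\-001'     # raw string: written exactly as the regex should read
print(plain == raw)   # True
print(len(raw))       # 8 characters: A B C \ - 0 0 1

print(bool(re.match(raw, 'ABC-001')))   # True: \- matches a literal '-'
```

Writing 'ABC\-001' without the r prefix happens to work too, but recent Python versions warn about the invalid escape sequence \-, which is one more reason to prefer r''.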

First, let's see how to test whether a regular expression matches:

>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>

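match() tests whether the pattern matches at the beginning of the string: it returns a Match object on success and None otherwise. A common way to check the result is to use it directly as the condition of an if statement.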
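A sketch of the usual if/else check, plus extracting groups; the names PHONE and check are hypothetical.

```python
import re

PHONE = re.compile(r'^\d{3}\-\d{3,8}$')   # compile once, reuse many times

def check(test):
    # match() returns a Match object (truthy) or None (falsy)
    if PHONE.match(test):
        return 'ok'
    else:
        return 'failed'

print(check('010-12345'))   # ok
print(check('010 12345'))   # failed

# Parentheses define groups that can be extracted from the match:
m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
print(m.group(1), m.group(2))   # 010 12345
```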

- GUI programming

- JSON

- Network programming

- Accessing databases

- MySQL

- Frameworks

- BeautifulSoup

Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8.

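Parser	Usage	Advantages	Disadvantages
Python standard library	BeautifulSoup(markup, "html.parser")	Built into Python; moderate speed; good error tolerance	Poor error tolerance in versions before Python 2.7.3 / 3.2.2
lxml HTML parser	BeautifulSoup(markup, "lxml")	Fast; good error tolerance	Requires a C library
lxml XML parser	BeautifulSoup(markup, ["lxml", "xml"]) or BeautifulSoup(markup, "xml")	Fast; the only parser that supports XML	Requires a C library
html5lib	BeautifulSoup(markup, "html5lib")	Best error tolerance; parses documents the way a browser does; produces valid HTML5	Slow; requires an external Python package

soup = BeautifulSoup(html, 'html.parser')  # create a BeautifulSoup object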

The four kinds of objects:

Tag (simply put, an individual HTML tag)
NavigableString
BeautifulSoup
Comment

1. Tag:

print(soup.title)
# <title>The Dormouse's story</title>

2. NavigableString

To get the text inside a tag, simply use .string:

print(soup.p.string)
# The Dormouse's story

Navigating the document tree

.contents

A tag's .contents attribute returns its child nodes as a list, so you can pick out an element by list index:

print(soup.head.contents[0])
# <title>The Dormouse's story</title>

.children

.children gives a tag's direct child nodes as an iterator rather than a list; iterating over it prints each child in turn, for example:

The Dormouse's story

Once upon a time there were three little sisters; and their names were

<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,

<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and

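.string and .strings

If a tag contains no other tags, .string returns the text inside it. If it contains exactly one child tag, .string returns the text of that innermost tag. If the tag has several child nodes, it cannot decide which child's text to return, so .string is None; in that case iterate over .strings, which yields every string inside the tag.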

.stripped_strings

The strings yielded by .strings may contain many extra spaces or blank lines; .stripped_strings removes that surrounding whitespace (and skips strings that are only whitespace):

for string in soup.stripped_strings:
    print(repr(string))
# "The Dormouse's story"
# "The Dormouse's story"
# 'Once upon a time there were three little sisters; and their names were'
# 'Elsie'
# ','
# 'Lacie'
# 'and'
# 'Tillie'
# ';\nand they lived at the bottom of a well.'
# '...'

.parent (a bit like pwd: it gives the node that contains the current one)

p = soup.p
print(p.parent.name)
# body

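Searching the document tree

Method 1:

(1) find_all(name, attrs, recursive, text, **kwargs)

The name argument finds all tags with that name, e.g. soup.find_all('b') returns every <b> tag.

You can also pass a function. The one below checks the current element and returns True if it has a class attribute but no id attribute:

def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')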

Passing this function to find_all() returns every tag for which it returns True:

soup.find_all(has_class_but_no_id)
# [<p class="title"><b>The Dormouse's story</b></p>,
#  <p class="story">Once upon a time there were...</p>,
#  <p class="story">...</p>]

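Keyword arguments

Any keyword argument that find_all() does not recognize is treated as a filter on the tag attribute of that name:

soup.find_all(id='link2')
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]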

If you pass an href argument, Beautiful Soup filters on every tag's href attribute; the value can also be a regular expression:

import re
soup.find_all(href=re.compile("elsie"))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

Passing several keyword arguments filters on several attributes at once:

soup.find_all(href=re.compile("elsie"), id='link1')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

To filter by class you hit a snag: class is a Python keyword. The fix is to add a trailing underscore, class_:

soup.find_all("a", class_="sister")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

Some attributes cannot be used as keyword arguments, such as HTML5 data-* attributes:

data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression

Instead, pass a dict through find_all()'s attrs argument to search for tags with such attributes:

data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]

The limit argument caps the number of results returned:

soup.find_all("a", limit=2)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

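(2) find(name, attrs, recursive, text, **kwargs)

The only difference from find_all() is the return value: find_all() always returns a list (with limit=1, a list containing a single element), whereas find() returns the first matching result itself, or None if nothing matches.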

Method 2:

CSS selectors

soup.select() returns a list.

soup = BeautifulSoup(html, 'lxml')
print(type(soup.select('title')))
print(soup.select('title')[0].get_text())

for title in soup.select('title'):
    print(title.get_text())

The select() calls above all return lists: iterate over them and call get_text() on each element to get its text content.

- Miscellaneous

if-elif-else

Iteration:

enumerate(): turns each element into an (index, element) tuple as you iterate, so you get both the index and the element itself.
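A minimal enumerate() sketch over a made-up list of names:

```python
L = ['Adam', 'Lisa', 'Bart', 'Paul']

for index, name in enumerate(L):
    print(index, '-', name)    # 0 - Adam, 1 - Lisa, ...

# enumerate() can start counting from another number:
numbered = list(enumerate(L, start=1))
print(numbered)   # [(1, 'Adam'), (2, 'Lisa'), (3, 'Bart'), (4, 'Paul')]

# Equivalent to zipping a range with the list:
print(numbered == list(zip(range(1, len(L) + 1), L)))  # True
```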

[x * x for x in range(1, 11) if x % 2 == 0]

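Accepting **kw lets a function take any number of keyword arguments; here they are bound as attributes with setattr(), and kw.items() iterates over the dict kw (Python 2 used kw.iteritems(), which no longer exists in Python 3).
Reference code:
class Person(object):
    def __init__(self, name, gender, **kw):
        self.name = name
        self.gender = gender
        for k, v in kw.items():
            setattr(self, k, v)

p = Person('Bob', 'Male', age=18, course='Python')
print(p.age)
print(p.course)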
