Python Basics (Data Structures)

MusicDancing

Last modified 2023-12-04 17:03:45


First published 2021-01-07 15:42:25

Article link: https://blog.csdn.net/MusicDancing/article/details/112312908


1. Useful tools

1.1 The Python profiling tool Profile

Python profiling tool Profile - -零 - 博客园 (cnblogs)

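2. The Python script shebang line

#!/usr/bin/env python
# On Unix-like systems, the shebang line lets you run an executable script by typing its name, without invoking the interpreter explicitly.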

# Swap two variables
x, y = y, x

# Rebuild an object with eval() from the string returned by repr()
obj == eval(repr(obj))  # True for many built-in types, but not guaranteed for every object

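import sys

# print without a trailing newline
print("hello", end='')
# Redirect output (Python 3 uses the file= keyword instead of "print >>")
print('Fatal error: invalid input!', file=sys.stderr)
print('Fatal error: invalid input!', file=my_file)  # my_file: an open, writable file object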
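
As a rough illustration of the two points above, the sketch below round-trips a value through repr() and eval() and captures print() output with the file= keyword. The names `Opaque` and `buffer` are made up for this example, and `io.StringIO` merely stands in for a real open file.

```python
import io

# Round-trip through repr() and eval(); this works for simple built-in values.
original = {"name": "x", "values": [1, 2.5, None]}
rebuilt = eval(repr(original))
round_trip_ok = rebuilt == original

# Objects whose repr() is not a valid expression cannot be rebuilt this way.
class Opaque:  # hypothetical class that uses the default repr
    pass

try:
    eval(repr(Opaque()))
    opaque_ok = True
except SyntaxError:
    opaque_ok = False

# Redirect print() output with the file= keyword.
buffer = io.StringIO()  # stands in for an open, writable file
print("Fatal error: invalid input!", file=buffer)
print("hello", end="", file=buffer)  # no trailing newline
captured = buffer.getvalue()
```

Since eval() executes arbitrary code, this round trip is only reasonable for strings you produced yourself.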

3. Data structures

3.1 Sets

Set difference

# Same as set1 - set2
set1.difference(set2)
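
A small sketch with made-up values may help here; it compares the method form with the operator form and shows that neither modifies the original set.

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5}

diff_method = set1.difference(set2)   # elements in set1 that are not in set2
diff_operator = set1 - set2           # same result using the operator
sym_diff = set1 ^ set2                # elements that are in exactly one of the sets

# difference() accepts any iterable, while the - operator requires a set on both sides
diff_from_list = set1.difference([1, 2])
```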

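3.2 Dictionaries

3.2.1 Sorting dictionaries

# Sort by key (returns a sorted list of the keys)
sorted(mydict)
# Sort by value (returns a list of (key, value) tuples)
sorted(mydict.items(), key=lambda item: item[1])
# Update dict1 with dict2 (values of keys that already exist are overwritten)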
dict1.update(dict2)
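
The following sketch, using made-up dictionaries, shows what each of these calls returns. Note that sorted() always returns a list; wrapping the result in dict() is one way to get a dictionary back in sorted order.

```python
mydict = {"b": 3, "a": 1, "c": 2}

keys_sorted = sorted(mydict)                                   # list of keys
by_value = sorted(mydict.items(), key=lambda item: item[1])    # list of (key, value) tuples
by_value_desc = dict(
    sorted(mydict.items(), key=lambda item: item[1], reverse=True)
)                                                              # dict ordered by descending value

dict1 = {"x": 1, "y": 2}
dict2 = {"y": 20, "z": 30}
dict1.update(dict2)  # modifies dict1 in place; "y" is overwritten, "z" is added
```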

3.2.2 Counting word frequencies

Contents of aa.txt:

hello word

hello dog

hello cat

hi dog

(1) Using the collections module

from collections import Counter
with open('aa.txt') as f:
    counter = Counter([item for line in f for item in line.strip('\n').split(' ')])
    # Counter({'hello': 3, 'dog': 2, 'word': 1, 'cat': 1, 'hi': 1})
    # Keep only the words that appear at least twice
    counter_new = dict(filter(lambda x: x[1] >= 2, counter.items()))
    print(counter_new)
    # {'hello': 3, 'dog': 2}

    # print(counter.items())
    # dict_items([('hello', 3), ('word', 1), ('dog', 2), ('cat', 1), ('hi', 1)])
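
As a variation on the same idea, Counter also provides most_common(), and the filter can be written as a dict comprehension. This sketch uses an in-memory string with the same four lines as the sample text, so it runs without the file.

```python
from collections import Counter

# Same content as the sample text, held in a string for this example
text = "hello word\nhello dog\nhello cat\nhi dog\n"

counter = Counter(text.split())          # split() with no argument splits on any whitespace
top_two = counter.most_common(2)         # the two most frequent words with their counts
frequent = {word: n for word, n in counter.items() if n >= 2}
```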

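(2) Counting how often each value appears in one column of a file

import pandas as pd

# input_file: path to a CSV file that has a column named 'aa'
df = pd.read_csv(input_file, dtype=str)
temp_dict = {}
for item in df['aa']:
    temp_dict[item] = temp_dict.get(item, 0) + 1
print(temp_dict)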
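
If pandas is not available, the same count can be sketched with the standard library alone. The CSV content below is invented for illustration, and `io.StringIO` stands in for an open file; only the column name 'aa' comes from the snippet above.

```python
import csv
import io
from collections import Counter

# Made-up CSV data; in practice this would be open(input_file, newline="")
sample_csv = io.StringIO("aa,bb\nx,1\ny,2\nx,3\n")

reader = csv.DictReader(sample_csv)                    # maps each row to {column name: value}
column_counts = Counter(row["aa"] for row in reader)   # count values in column 'aa'
```

With pandas, `df['aa'].value_counts()` gives a comparable result as a Series.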

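3.3 Deep copy vs. shallow copy

import copy

# Deep copy
new_obj = copy.deepcopy(obj)
# Shallow copy
new_obj = copy.copy(obj)

For non-container types (numbers, strings, etc.) the distinction does not apply: these objects are immutable, so nothing needs to be copied.

Shallow copy: creates a new outer container whose elements are references to the same objects as in the original.

Deep copy: creates a new container and recursively copies the objects inside it, so the copy is fully independent of the original.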
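
The difference only becomes visible with nested, mutable data. Here is a minimal sketch with a made-up nested list; the last check describes CPython's behaviour for immutable values such as strings.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, shared inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)     # mutating an inner list is visible through the shallow copy
original.append([5, 6])    # changing the outer list affects neither copy

# In CPython, copying an immutable value just returns the same object
text = "abc"
same_object = copy.copy(text) is text and copy.deepcopy(text) is text
```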

3.4 Iterators

# Count the words in a file
# The file object is itself an iterator over lines, so there is no need to call f.readline()
with open('data.txt') as f:
    len([word for line in f for word in line.split()])

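3.5 Generators

# Generator expression
(expr for item in iter_obj if cond_expr)
# Similar to a list comprehension
[expr for item in iter_obj if cond_expr]

# Count the words in a file
# A generator has no len(), so count its items with sum() instead
with open('data.txt') as f:
    sum(1 for line in f for word in line.split())

Note: a list comprehension builds the complete list of words in memory before it is counted, whereas a generator expression yields one item at a time and therefore saves memory.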
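
Two properties of generators are worth seeing in action: they do not support len(), and they can be consumed only once. The sketch below uses `io.StringIO` with invented text in place of data.txt.

```python
import io

# Stand-in for open('data.txt'); the text is made up for this example
data = io.StringIO("hello word\nhello dog\nhello cat\nhi dog\n")

# Count words lazily, one line at a time, without building a list
word_count = sum(1 for line in data for word in line.split())

gen = (n * n for n in range(4))
try:
    len(gen)            # generators do not define __len__
    has_len = True
except TypeError:
    has_len = False

squares = list(gen)     # consumes the generator
exhausted = list(gen)   # a second pass yields nothing
```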

3.6 Strings

1. Built-in string constants

import string

# Digits
string.digits
  '0123456789'
# Uppercase letters
string.ascii_uppercase
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# Lowercase letters
string.ascii_lowercase
  'abcdefghijklmnopqrstuvwxyz'
# All ASCII letters
string.ascii_letters
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
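
One typical use of these constants is validating input against a character set. The helper `is_alnum_ascii` below is a hypothetical name chosen for this sketch, not something from the string module.

```python
import string

def is_alnum_ascii(text):
    """Return True if text is non-empty and contains only ASCII letters and digits."""
    allowed = set(string.ascii_letters + string.digits)
    return bool(text) and all(ch in allowed for ch in text)

checks = [is_alnum_ascii(s) for s in ("abc123", "abc 123", "", "héllo")]
```

Unlike str.isalnum(), which also accepts non-ASCII letters and digits, this check is restricted to the ASCII range.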

2. Reversing a string

# Reverse a string (the same slice also reverses a list)
aa = 'abcde'
aa_reverse = aa[::-1]

3. String concatenation and formatted output

# String concatenation
"hello" + " " + "world"
"%s %s" % ("hello", "world")
" ".join(["hello", "world"])
"{0}\t{1}".format("hello", "world")
aa, bb = "hello", "world"
print(f"{aa}\n{bb}")
# Formatted output
print("{one} + {two} = {three}".format(one='1', two='2', three='3'))
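
To make the results of these styles concrete, here is a short sketch with made-up values, including two common format specifiers (zero padding and a fixed number of decimals).

```python
name, count = "world", 3

joined = " ".join(["hello", name])            # join a list of strings with a separator
percent = "%s has %d items" % (name, count)   # printf-style formatting
formatted = "{0}\t{1}".format("hello", name)  # positional str.format()
fstring = f"{name!r} x {count:03d}"           # !r applies repr(); 03d zero-pads to width 3
ratio = f"{2 / 3:.2f}"                        # two decimal places
```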

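4. Substring checks

# Test whether one string occurs in another with in / not in (this expression is False)
'bc' not in 'abc'
# Check whether a value is a string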
isinstance('bc', str)

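5. String replacement

# replace() returns a new string; the original is left unchanged
s = "123\n456"  # avoid naming a variable str, which would shadow the built-in type
s_new = s.replace('\n', '-')
print(s_new)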
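
A brief sketch of two details that are easy to miss: replace() takes an optional count argument, and because strings are immutable the original value is never changed. The sample string is made up.

```python
s = "123\n456\n789"

replaced = s.replace("\n", "-")        # replaces every occurrence
first_only = s.replace("\n", "-", 1)   # the third argument limits the number of replacements
original_unchanged = s == "123\n456\n789"
```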
