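File operations and data formatting are basic skills for working with data in Python. The key points are summarized below, with examples:
**I. File Operations**
1. **Opening files**
Use the `open()` function with a `with` statement (recommended, because the file is closed automatically when the block exits):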
```python
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()  # read the whole file into one string
```
- **Modes**: `r` (read), `w` (write, truncating any existing content), `a` (append), and `b` (binary, combined with another mode as in `rb` or `wb`).
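To make the modes concrete, here is a minimal sketch that writes, appends to, and re-reads a scratch file. The file name `modes_demo.txt` and the temporary directory are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so no real file is touched (illustrative path).
demo_path = Path(tempfile.mkdtemp()) / "modes_demo.txt"

# 'w' creates the file, or truncates it if it already exists.
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("first\n")

# 'a' keeps the existing content and writes at the end.
with open(demo_path, "a", encoding="utf-8") as f:
    f.write("second\n")

# 'r' reads text and returns a str.
with open(demo_path, "r", encoding="utf-8") as f:
    text = f.read()

# 'rb' reads raw bytes; binary mode does not accept an encoding argument.
with open(demo_path, "rb") as f:
    raw = f.read()

print(repr(text))
print(type(raw).__name__)
```

Opening the same path with `w` a second time would discard both lines, which is why `a` is the safer choice for logs and other accumulating output.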
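2. **Reading files**
- `f.read()`: reads the entire content as one string.
- `f.readline()`: reads a single line.
- `f.readlines()`: returns a list of all lines.
- **Memory-efficient line-by-line reading**: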
```python
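with open('file.txt', 'r', encoding='utf-8') as f:
    for line in f:  # the file object yields one line at a time
        print(line.strip())  # strip() removes the trailing newline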
```
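The three reading methods share one file position, which is easy to overlook. The sketch below, which assumes a small illustrative file named `sample.txt` created in a temporary directory, shows what each call returns when they are used in sequence.

```python
import tempfile
from pathlib import Path

# Create a small sample file (name and content are illustrative).
sample_path = Path(tempfile.mkdtemp()) / "sample.txt"
sample_path.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

with open(sample_path, "r", encoding="utf-8") as f:
    first_line = f.readline()  # consumes the first line, newline included
    remaining = f.readlines()  # consumes every line that is left
    leftover = f.read()        # nothing is left, so this is an empty string

print(repr(first_line))
print(remaining)
print(repr(leftover))
```

Each call continues from where the previous one stopped, so mixing the methods on one handle never re-reads earlier lines.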
3. **Writing files**
- `f.write("text")`: writes a string.
- `f.writelines(list_of_strings)`: writes a list of strings; it does not add line breaks, so each string must already end with `\n`.
```python
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write("Hello, World!\n")
    f.writelines(["Line 1\n", "Line 2\n"])  # newlines supplied by the caller
```
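When the strings to write do not already end with a newline, one option is to add it on the fly. The following is a minimal sketch under that assumption; the `lines` list and the output path are illustrative.

```python
import tempfile
from pathlib import Path

lines = ["Line 1", "Line 2", "Line 3"]  # note: no trailing newlines
out_path = Path(tempfile.mkdtemp()) / "output.txt"  # illustrative path

with open(out_path, "w", encoding="utf-8") as f:
    # writelines() accepts any iterable of strings, so a generator that
    # appends "\n" to each item avoids building a second list.
    f.writelines(f"{line}\n" for line in lines)

written = out_path.read_text(encoding="utf-8")
print(repr(written))
```

An equivalent alternative is `f.write("\n".join(lines) + "\n")`, which builds the full string in memory first.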
4. **Exception handling**
Handle errors such as a missing file:
```python
try:
    with open('file.txt', 'r', encoding='utf-8') as f:
        print(f.read())
except FileNotFoundError:
    print("File not found!")
```
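A missing file is only one of several things that can go wrong when opening a file. As a rough sketch, the hypothetical helper `read_text_safely` below (not part of any library) also handles permission and decoding errors and returns `None` on failure; whether returning `None` suits a given program is a design assumption.

```python
import os
import tempfile
from typing import Optional


def read_text_safely(path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read (hypothetical helper)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"No permission to read: {path}")
    except UnicodeDecodeError:
        print(f"Not valid UTF-8: {path}")
    return None


# A path inside a fresh temporary directory is guaranteed not to exist yet.
missing_path = os.path.join(tempfile.mkdtemp(), "missing.txt")
result = read_text_safely(missing_path)
print(result)
```

Catching specific exception classes, rather than a bare `except:`, keeps unrelated bugs visible instead of silently swallowing them.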
---
**II. Data Formatting**
1. **String formatting**
- **f-strings (recommended)**:
```python
name = "Alice"
age = 30
print(f"{name} is {age} years old.")
```
- **Format specifiers**:
```python
price = 99.987
print(f"Price: {price:.2f}")  # Output: Price: 99.99
```
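Format specifiers cover more than decimal places. The sketch below combines alignment, field width, thousands separators, and percentages in one f-string; the variable names and values are illustrative.

```python
name = "Alice"
balance = 1234567.891
ratio = 0.4567

# <10     left-align in a field 10 characters wide
# >15,.2f right-align in 15 characters, thousands separators, 2 decimals
# .1%     multiply by 100 and show one decimal place with a percent sign
row = f"{name:<10}|{balance:>15,.2f}|{ratio:.1%}"
print(row)

# 03d pads an integer with leading zeros to a width of 3.
padded_id = f"{7:03d}"
print(padded_id)
```

Fixed-width fields like these are convenient for aligning columns in plain-text reports.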
2. **Working with JSON**
Use the `json` module:
```python
import json

# Read a JSON file
with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Modify the data and write it back
data['new_key'] = "value"
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=4)  # indent pretty-prints the output
```
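For data that contains non-ASCII text, the `ensure_ascii` parameter of `json.dumps` and `json.dump` controls whether characters are escaped. The sketch below works on in-memory strings with `dumps` and `loads`; the `record` dictionary is an illustrative assumption.

```python
import json

record = {"name": "Alice", "city": "北京", "scores": [90, 85]}

# By default, non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters readable in the output.
readable = json.dumps(record, ensure_ascii=False)

# Both forms parse back to the same Python object.
restored = json.loads(readable)

print(escaped)
print(readable)

# Malformed input raises json.JSONDecodeError.
try:
    json.loads("{not valid json}")
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True
```

When writing such output to a file with `ensure_ascii=False`, opening the file with `encoding='utf-8'` avoids platform-dependent encoding errors.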
3. **Working with CSV**
Use the `csv` module:
```python
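import csv

# Read a CSV file; newline='' lets the csv module handle line endings itself
with open('data.csv', 'r', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)  # each row is a dict keyed by the header row
    for row in reader:
        print(row['name'], row['age'])

# Write a CSV file
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])
    writer.writerow(["Alice", 30])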
```
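When rows are already dictionaries, `csv.DictWriter` mirrors `csv.DictReader` on the writing side. The round trip below is a minimal sketch; the `rows` data and the `people.csv` path are illustrative assumptions.

```python
import csv
import tempfile
from pathlib import Path

rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
csv_path = Path(tempfile.mkdtemp()) / "people.csv"  # illustrative path

# Write dictionaries; fieldnames fixes the column order.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerows(rows)

# Read them back; every value comes back as a string.
with open(csv_path, "r", newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

# Convert numeric columns explicitly after reading.
ages = [int(row["age"]) for row in loaded]
print(loaded)
print(ages)
```

The `csv` module does not infer types, so numeric columns need an explicit conversion such as `int()` or `float()` after reading.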
---
**III. Example Project**
**Counting word frequencies in a text file**:
```python
from collections import defaultdict
import csv
import re

word_count = defaultdict(int)
with open('text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        words = re.findall(r'\b\w+\b', line.lower())  # lowercase, then extract words
        for word in words:
            word_count[word] += 1

# Write the results to a CSV file
with open('word_counts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["Word", "Count"])
    for word, count in word_count.items():
        writer.writerow([word, count])
```
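As a variation on the example above, `collections.Counter` can replace the `defaultdict` and also provides sorting by frequency. This is a sketch on an in-memory string rather than a file; the sample `text` is an illustrative assumption.

```python
import re
from collections import Counter

text = "The cat saw the dog. The dog ran."

# Same tokenization as the file-based example: lowercase, then extract words.
words = re.findall(r"\b\w+\b", text.lower())

# Counter tallies the items of any iterable in one call.
counts = Counter(words)

# most_common(n) returns the n most frequent (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)
```

For a large file, calling `counts.update(words)` once per line keeps memory use low while still producing a single `Counter`.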
---
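**IV. Notes**
1. **Encoding**: always specify the file encoding explicitly (for example `encoding='utf-8'`), since the default can depend on the platform.
2. **Paths**: use `pathlib` or `os.path` to build paths that work across operating systems.
3. **Large files**: avoid reading everything at once; process the file line by line or in chunks.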
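The path and large-file notes can be sketched together: the example builds a path with `pathlib` and then reads the file in fixed-size chunks. The helper `count_bytes`, the `data/big.bin` path, and the 4096-byte chunk size are all assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# The / operator joins path parts with the right separator for the OS.
data_path = Path(tempfile.mkdtemp()) / "data" / "big.bin"  # illustrative path
data_path.parent.mkdir(parents=True, exist_ok=True)
data_path.write_bytes(b"x" * 10_000)  # stand-in for a large file


def count_bytes(path, chunk_size=4096):
    """Count the bytes in a file without loading it all at once (hypothetical helper)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size bytes per call
            if not chunk:               # an empty bytes object means end of file
                break
            total += len(chunk)
    return total


total_bytes = count_bytes(data_path)
print(total_bytes)
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than available RAM.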
With practice, these file operations and formatting techniques handle most everyday data tasks efficiently.