【Opening Files in Any Encoding】

xfDenny

Article link: https://blog.csdn.net/xfDenny/article/details/135982094

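This article shows how to read files in different encodings and convert them to UTF-8 in Python, first with the standard library and then with the third-party library chardet. It covers the functions `read_file_with_encoding` and `convert_encoding_to_utf8`, followed by `open_file` and `convert_to_utf8`, each with example usage.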

1. Using the standard library

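def read_file_with_encoding(filename):
    # Read the raw bytes first, then try each candidate encoding in turn.
    with open(filename, 'rb') as f:
        raw_data = f.read()
    # 'latin1' maps every byte to a character, so it never fails and
    # acts as the last-resort fallback.
    try_encodings = ['utf-8', 'gbk', 'big5', 'utf-16', 'latin1']
    for encoding in try_encodings:
        try:
            # Return on the first encoding that decodes without error.
            return raw_data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if 'latin1' is removed from the candidate list.
    raise ValueError(f"Could not decode {filename} with any candidate encoding")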

def convert_encoding_to_utf8(source_file, target_file):
    with open(source_file, 'rb') as src:
        raw_data = src.read()
    try_encodings = ['utf-8', 'gbk', 'big5', 'utf-16', 'latin1']
    for encoding in try_encodings:
        try:
            decoded_data = raw_data.decode(encoding)
        except UnicodeDecodeError:
            continue
        # Write the decoded text back out as UTF-8 bytes, then stop trying.
        with open(target_file, 'wb') as dst:
            dst.write(decoded_data.encode('utf-8'))
        break

# Example usage
content, encoding = read_file_with_encoding('test.txt')
print(content, encoding)

convert_encoding_to_utf8('test.txt', 'test_utf8.txt')
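
The trial-decoding approach above can be checked end to end without an existing `test.txt`. The following is a minimal sketch, not the author's code: the helper names `first_decodable` and `demo` and the temporary file name are assumptions made for illustration. It writes a small GBK-encoded file, reads it back as bytes, and reports the first candidate that decodes cleanly.

```python
import os
import tempfile

# Same candidate order as in the article; 'latin1' accepts any byte sequence.
CANDIDATES = ['utf-8', 'gbk', 'big5', 'utf-16', 'latin1']

def first_decodable(raw, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")

def demo():
    """Write a GBK file to a temporary directory and detect it by trial."""
    text = "你好，世界"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample_gbk.txt")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(text.encode("gbk"))
        with open(path, "rb") as f:
            raw = f.read()
    return first_decodable(raw)

if __name__ == "__main__":
    print(demo())
```

A successful decode only shows that the bytes are valid in that encoding, not that the guess is right. Many Big5 byte sequences, for instance, also decode under GBK without error, so the order of the candidate list decides the result in ambiguous cases.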

2. Using a third-party library

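import chardet

def open_file(file_path):
    with open(file_path, 'rb') as file:
        raw = file.read()
    # chardet.detect returns a dict; 'encoding' is None when detection fails,
    # so fall back to UTF-8 in that case.
    encoding = chardet.detect(raw)['encoding'] or 'utf-8'
    # Decode so the caller gets text rather than raw bytes.
    content = raw.decode(encoding, errors='replace')
    return content, encoding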

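def convert_to_utf8(file_path, new_file_path='utf8_file.txt'):
    with open(file_path, 'rb') as file:
        content = file.read()
    encoding = chardet.detect(content)['encoding']
    if encoding is None:
        raise ValueError(f"Could not detect the encoding of {file_path}")
    # Compare case-insensitively; pure ASCII is already valid UTF-8.
    if encoding.lower() in ('utf-8', 'ascii'):
        return "File is already in utf-8 encoding"
    # Decode before opening the target so a failed decode leaves no empty file.
    new_content = content.decode(encoding).encode('utf-8')
    with open(new_file_path, 'wb') as new_file:
        new_file.write(new_content)
    return new_file_path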

# Example
file_content, file_encoding = open_file('sample_file.txt')
print(file_content, file_encoding)

new_file_path = convert_to_utf8('another_file.txt')
print(new_file_path)
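
The result of `chardet.detect` also carries a `confidence` value alongside `encoding`, and a low-confidence guess may be worth overriding. The sketch below is one possible way to do that, not part of the original article: the function `decode_with_detector`, its `min_confidence` and `fallback` parameters, and the stand-in `fake_detect` are all assumptions introduced for illustration. The detector is passed in as a parameter so the example runs even where chardet is not installed.

```python
def decode_with_detector(raw, detect, min_confidence=0.5, fallback='utf-8'):
    """Decode raw bytes using a chardet-style detector, with a fallback.

    detect is any callable that takes bytes and returns a dict with
    'encoding' and 'confidence' keys, such as chardet.detect.
    """
    result = detect(raw)
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0
    # Distrust missing or low-confidence guesses and use the fallback instead.
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    return raw.decode(encoding, errors='replace'), encoding

# Hypothetical stand-in detector so the sketch runs without chardet.
def fake_detect(raw):
    return {'encoding': 'gbk', 'confidence': 0.99}

text, used = decode_with_detector("编码".encode('gbk'), fake_detect)
print(text, used)
```

With chardet installed, the call would be `decode_with_detector(raw, chardet.detect)`. The threshold of 0.5 is an arbitrary starting point and should be tuned against real files.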