【Python】开源包openpyxl.utils.cell的工具函数源代码欣赏

openpyxl是Python中比较优秀的,用于读写xlsx文件的第三方开源库,里面有个utils包封装了很多常用的单元格转换函数,写的简洁实用,便于学习参考,特粘贴如下。

以后我们在读写Excel表格文件时,就无需重复造轮子,直接就可以直接用下面的形式,使用这些函数:

import openpyxl.utils.cell
a,b = coordinate_to_tuple('B3')
print(a,b)
# 3 2

当然了,前提是你已经安装了第三方库 openpyxl。

使用优秀开源工具的函数,而不是自己手搓原始代码的好处就是:
开源工具的函数还对参数,如 A2:C3 等这些参数(不稳定因素)做了一次正则表达式校验,所以用起来省心又安全。

官方网址:官方页面地址

常用的Excel读写工具函数

openpyxl.utils.cell.absolute_coordinate(coord_string)
Convert a coordinate to an absolute coordinate string (B12 -> $B$12)
B12 -> $B$12 相对坐标转为固定坐标。

openpyxl.utils.cell.cols_from_range(range_string)
Get individual addresses for every cell in a range. Yields one row at a time.
获取单元格区域中的每个单元格的独立地址,每次生成一行。(应该是用于循环函数)

openpyxl.utils.cell.column_index_from_string(col)
Convert ASCII column name (base 26) to decimal with 1-based index
Characters represent descending multiples of powers of 26
“AFZ” == 26 * pow(26, 0) + 6 * pow(26, 1) + 1 * pow(26, 2)
AA列 转为 27,就是将用字母表示的Excel列名转为数字(从1开始计数)。

openpyxl.utils.cell.coordinate_from_string(coord_string)
Convert a coordinate string like ‘B12’ to a tuple (‘B’, 12)
字符串 ‘B12’ 转为 (‘B’,12) 元组

openpyxl.utils.cell.coordinate_to_tuple(coordinate)
Convert an Excel style coordinate to (row, column) tuple
Excel风格的 B4 转为 (4,2)

openpyxl.utils.cell.get_column_interval(start, end)
Given the start and end columns, return all the columns in the series.
The start and end columns can be either column letters or 1-based indexes.
给出起止列数,返回所有的列值。起止列可以用字母也可以用数字(从1开始计数)。

openpyxl.utils.cell.get_column_letter(col_idx)
Convert decimal column position to its ASCII (base 26) form.
Because column indices are 1-based, strides are actually pow(26, n) + 26 Hence, a correction is applied between pow(26, n) and pow(26, 2) + 26 to prevent and additional column letter being prepended
“A” == 1 == pow(26, 0) “Z” == 26 == pow(26, 0) + 26 // decimal equivalent 10 “AA” == 27 == pow(26, 1) + 1 “ZZ” == 702 == pow(26, 2) + 26 // decimal equivalent 100
数字列名转为字母列名,如 2 转为 B,28 转为AB。数字从1开始计数,即数字1转为A。
超过26的数字用26的幂反推,计算出字母表示。

openpyxl.utils.cell.quote_sheetname(sheetname)
Add quotes around sheetnames if they contain spaces.
工作表的名称如果有空格,给名称包裹上引号(可能是为了防止某些错误)。

openpyxl.utils.cell.range_boundaries(range_string)
Convert a range string into a tuple of boundaries: (min_col, min_row, max_col, max_row) Cell coordinates will be converted into a range with the cell at both end
将字符串 ’A2:D4’ 转为 (1,2,4,4)元组。(注意:结果是先列后行)
A -> 1
2 -> 2
D -> 4
4 -> 4

openpyxl.utils.cell.range_to_tuple(range_string)
Convert a worksheet range to the sheetname and maximum and minimum coordinate indices

openpyxl.utils.cell.rows_from_range(range_string)
Get individual addresses for every cell in a range. Yields one row at a time.
cols_from_range()函数类似。获取单元格区域的每个单元格地址,一次生成一行(应该是用于循环函数)。

源代码

# Copyright (c) 2010-2024 openpyxl

"""
Collection of utilities used within the package and also available for client code
"""
from functools import lru_cache
from itertools import chain, product
from string import ascii_uppercase, digits
import re

from .exceptions import CellCoordinatesException

# constants
COORD_RE = re.compile(r'^[$]?([A-Za-z]{1,3})[$]?(\d+)$')
COL_RANGE = """[A-Z]{1,3}:[A-Z]{1,3}:"""
ROW_RANGE = r"""\d+:\d+:"""
RANGE_EXPR = r"""
[$]?(?P<min_col>[A-Za-z]{1,3})?
[$]?(?P<min_row>\d+)?
(:[$]?(?P<max_col>[A-Za-z]{1,3})?
[$]?(?P<max_row>\d+)?)?
"""
ABSOLUTE_RE = re.compile('^' + RANGE_EXPR +'$', re.VERBOSE)
SHEET_TITLE = r"""
(('(?P<quoted>([^']|'')*)')|(?P<notquoted>[^'^ ^!]*))!"""
SHEETRANGE_RE = re.compile("""{0}(?P<cells>{1})(?=,?)""".format(
    SHEET_TITLE, RANGE_EXPR), re.VERBOSE)




def get_column_interval(start, end):
    """
    Given the start and end columns, return all the columns in the series.

    The start and end columns can be either column letters or 1-based
    indexes.
    """
    if isinstance(start, str):
        start = column_index_from_string(start)
    if isinstance(end, str):
        end = column_index_from_string(end)
    return [get_column_letter(x) for x in range(start, end + 1)]






def coordinate_from_string(coord_string):
    """Convert a coordinate string like 'B12' to a tuple ('B', 12)"""
    match = COORD_RE.match(coord_string)
    if not match:
        msg = f"Invalid cell coordinates ({coord_string})"
        raise CellCoordinatesException(msg)
    column, row = match.groups()
    row = int(row)
    if not row:
        msg = f"There is no row 0 ({coord_string})"
        raise CellCoordinatesException(msg)
    return column, row






def absolute_coordinate(coord_string):
    """Convert a coordinate to an absolute coordinate string (B12 -> $B$12)"""
    m = ABSOLUTE_RE.match(coord_string)
    if not m:
        raise ValueError(f"{coord_string} is not a valid coordinate range")

    d = m.groupdict('')
    for k, v in d.items():
        if v:
            d[k] = f"${v}"

    if d['max_col'] or d['max_row']:
        fmt = "{min_col}{min_row}:{max_col}{max_row}"
    else:
        fmt = "{min_col}{min_row}"
    return fmt.format(**d)




__decimal_to_alpha = [""] + list(ascii_uppercase)



@lru_cache(maxsize=None)
def get_column_letter(col_idx):
    """
    Convert decimal column position to its ASCII (base 26) form.

    Because column indices are 1-based, strides are actually pow(26, n) + 26
    Hence, a correction is applied between pow(26, n) and pow(26, 2) + 26 to
    prevent and additional column letter being prepended

    "A" == 1 == pow(26, 0)
    "Z" == 26 == pow(26, 0) + 26 // decimal equivalent 10
    "AA" == 27 == pow(26, 1) + 1
    "ZZ" == 702 == pow(26, 2) + 26 // decimal equivalent 100
    """

    if not 1 <= col_idx <= 18278:
        raise ValueError("Invalid column index {0}".format(col_idx))

    result = []

    if col_idx < 26:
        return __decimal_to_alpha[col_idx]

    while col_idx:
        col_idx, remainder = divmod(col_idx, 26)
        result.insert(0, __decimal_to_alpha[remainder])
        if not remainder:
            col_idx -= 1
            result.insert(0, "Z")

    return "".join(result)




__alpha_to_decimal = {letter:pos for pos, letter in enumerate(ascii_uppercase, 1)}
__powers = (1, 26, 676)



@lru_cache(maxsize=None)
def column_index_from_string(col):
    """
    Convert ASCII column name (base 26) to decimal with 1-based index

    Characters represent descending multiples of powers of 26

    "AFZ" == 26 * pow(26, 0) + 6 * pow(26, 1) + 1 * pow(26, 2)
    """
    error_msg = f"'{col}' is not a valid column name. Column names are from A to ZZZ"
    if len(col) > 3:
        raise ValueError(error_msg)
    idx = 0
    col = reversed(col.upper())
    for letter, power in zip(col, __powers):
        try:
            pos = __alpha_to_decimal[letter]
        except KeyError:
            raise ValueError(error_msg)
        idx += pos * power
    if not 0 < idx < 18279:
        raise ValueError(error_msg)
    return idx






def range_boundaries(range_string):
    """
    Convert a range string into a tuple of boundaries:
    (min_col, min_row, max_col, max_row)
    Cell coordinates will be converted into a range with the cell at both end
    """
    msg = "{0} is not a valid coordinate or range".format(range_string)
    m = ABSOLUTE_RE.match(range_string)
    if not m:
        raise ValueError(msg)

    min_col, min_row, sep, max_col, max_row = m.groups()

    if sep:
        cols = min_col, max_col
        rows = min_row, max_row

        if not (
            all(cols + rows) or
            all(cols) and not any(rows) or
            all(rows) and not any(cols)
        ):
            raise ValueError(msg)

    if min_col is not None:
        min_col = column_index_from_string(min_col)

    if min_row is not None:
        min_row = int(min_row)

    if max_col is not None:
        max_col = column_index_from_string(max_col)
    else:
        max_col = min_col

    if max_row is not None:
        max_row = int(max_row)
    else:
        max_row = min_row

    return min_col, min_row, max_col, max_row






def rows_from_range(range_string):
    """
    Get individual addresses for every cell in a range.
    Yields one row at a time.
    """
    min_col, min_row, max_col, max_row = range_boundaries(range_string)
    rows = range(min_row, max_row + 1)
    cols = [get_column_letter(col) for col in range(min_col, max_col + 1)]
    for row in rows:
        yield tuple('{0}{1}'.format(col, row) for col in cols)






def cols_from_range(range_string):
    """
    Get individual addresses for every cell in a range.
    Yields one row at a time.
    """
    min_col, min_row, max_col, max_row = range_boundaries(range_string)
    rows = range(min_row, max_row+1)
    cols = (get_column_letter(col) for col in range(min_col, max_col+1))
    for col in cols:
        yield tuple('{0}{1}'.format(col, row) for row in rows)






def coordinate_to_tuple(coordinate):
    """
    Convert an Excel style coordinate to (row, column) tuple
    """
    for idx, c in enumerate(coordinate):
        if c in digits:
            break
    col = coordinate[:idx]
    row = coordinate[idx:]
    return int(row), column_index_from_string(col)






def range_to_tuple(range_string):
    """
    Convert a worksheet range to the sheetname and maximum and minimum
    coordinate indices
    """
    m = SHEETRANGE_RE.match(range_string)
    if m is None:
        raise ValueError("Value must be of the form sheetname!A1:E4")
    sheetname = m.group("quoted") or m.group("notquoted")
    cells = m.group("cells")
    boundaries = range_boundaries(cells)
    return sheetname, boundaries






def quote_sheetname(sheetname):
    """
    Add quotes around sheetnames if they contain spaces.
    """
    if "'" in sheetname:
        sheetname = sheetname.replace("'", "''")

    sheetname = u"'{0}'".format(sheetname)
    return sheetname

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值