最新消息:20181230 VPS服务器已从Linode换到腾讯云香港,主题仍用朋友推荐的大前端D8

【已解决】Python中openpyxl处理excel去判断单元格是合并后的以及合并的范围

Excel crifan 1387浏览 0评论

用Python的openpyxl去处理excel文件,

遇到一个需求是:

用代码自动去判断,对于合并后的单元格中的内容:

(之前调试时发现的)自动检测出来后,对于后续的,单元格值是空:

但是属于同一系列的话:

那么后续单元格的值,在代码赋值时,也使用前面的值。

其他还有很多合并的cell:

所以问题就变成了:

看看openpyxl是否能检测出来excel的单元格,是否是被合并的,且知道,具体合并了多少行(多少列)

这样才有利于后续单元格值为空时,使用前面的同系列的值

openpyxl check merged cell

python – How to detect merged cells in excel with openpyxl – Stack Overflow

merged_cell_ranges就可以找到了合并了的cell了

Read merged cells in Excel with Python – Stack Overflow

自己写代码?不够方便

不过如果返回:

[‘A1:M1’, ‘B22:B27’]

自己去解析,貌似也不是很方便?

python – openpyxl: merged_cell_ranges and merged_cells are both empty – Stack Overflow

merged_cells

merged_cell_ranges

openpyxl / openpyxl / issues / #476 – Extract the value of merged cell — Bitbucket

-》

Simple usage — openpyxl 2.5.2 documentation

可以用:merge_cells

unmerge_cells

手动试试excel取消合并后,其他单元格是否有值:

很明显,只有第一个有值:

所以此处没法使用unmerge_cells

Working with styles — openpyxl 2.5.2 documentation

openpyxl / openpyxl / issues / #355 – It is no longer possible to tell if a cell is merged by without accessing private methods — Bitbucket

Working with Spreadsheets using Python (Part 2) – Hacker Noon

Simple usage — openpyxl 2.5.0 documentation

merged_cell_ranges openpyxl

如何在excep中使用openpyxl检测合并的单元格 Excel 中文网

openpyxl.worksheet.worksheet module — openpyxl 2.5.0 documentation

找到了文档:

“merged_cell_ranges

Return a copy of cell ranges

Note

Deprecated: Use ws.merged_cells.ranges”

让你用:

ws.merged_cells.ranges

merged_cells openpyxl

去试试

然后用代码:

<code>mergedCellsRangesOld = ws.merged_cell_ranges
logging.info("mergedCellsRangesOld=%s", mergedCellsRangesOld)
mergedCells = ws.merged_cells
logging.info("mergedCells=%s", mergedCells)
mergedCellsRangesNew = ws.merged_cells.ranges
logging.info("mergedCellsRangesNew=%s", mergedCellsRangesNew)
</code>

就可以去调试,找到对应的这些合并的cell了:

其中:

以及可以看到对应的行和列:

  • min_col

  • max_col

  • min_row

  • max_row

的值了。

接着就是去如何写代码了。

其中想到的是:

对于一个cell,想要判断其是否在一个CellRange范围内

去看看 openpyxl的CellRange是否自带这个功能

CellRange openpyxl

python – How to read range of cells using column=numbers instead of letters in OpenPyXL? – Stack Overflow

python – How to move cell range in openpyxl with its properties (hyperlinks, formatting etc.)? – Stack Overflow

python – How to specify a cell range with variables in openpyxl? – Stack Overflow

ws.iter_rows() or ws.iter_cols()

ws.iter_rows() or ws.rows or ws.columns

python – openpyxl: a better way to read a range of numbers to an array – Stack Overflow

excel – Python Openpyxl, copy and paste cell range – Stack Overflow

看来只能自己去写代码去检测了?

然后遇到个小问题:

【已解决】MongoDB的用户的密码中包含@如何写URI

【总结】

最后去写代码:

<code>from openpyxl import Workbook, load_workbook


def isInCellRange(cellToCheck, cellRange):
    """
    to check a cell whether in a cell range
    :param cellToCheck:
    :param cellRange:
    :return:
        True : if cell in range
        False: if cell not in range
    """
    # logging.debug("cellToCheck=[%d:%d]", cellToCheck.row, cellToCheck.col_idx)
    # logging.debug("cellRange: row=[%d:%d] col=[%d:%d]",
    #              cellRange.min_row, cellRange.max_row, cellRange.min_col, cellRange.max_col)
    if (cellToCheck.row &gt;= cellRange.min_row) and \
        (cellToCheck.row &lt;= cellRange.max_row) and \
        (cellToCheck.col_idx &gt;= cellRange.min_col) and \
        (cellToCheck.col_idx &lt;= cellRange.max_col):
        logging.info("cell[%d:%d] with in cell range: row=[%d:%d] col=[%d:%d]",
                     cellToCheck.row, cellToCheck.col_idx,
                     cellRange.min_row, cellRange.max_row, cellRange.min_col, cellRange.max_col)
        return True
    else:
        return False


def getCellRangeValue(cellRange):
    """
    get cell range value -&gt; the top left cell value
    :param cellRange:
    :return:
    """
    topLeftCell = ws.cell(row=cellRange.min_row, column=cellRange.min_col)
    topLeftCellValue = topLeftCell.value
    return topLeftCellValue

def getRealCellValue(ws, curCell):
    """
    for openpyxl, to get real value from row and column
    expecially for merged cell, will get its (same) value from top-left cell value

    :param row:
    :param column:
    :return:
    """

    realCellValue = curCell.value

    mergedCellsRangesList = ws.merged_cells.ranges
    # logging.info("mergedCellsRangesList=%s", mergedCellsRangesList)

    # Note:
    # to efficiency , we only check cell in range or not when its value is None
    # for all merged cell value is None
    if not realCellValue:
        for eachCellRange in mergedCellsRangesList:
            if isInCellRange(curCell, eachCellRange):
                cellRangeValue = getCellRangeValue(eachCellRange)
                realCellValue =  cellRangeValue
                break

    return realCellValue

# parse excel file
wb = load_workbook(ExcelFullFilename)
logging.info("wb=%s", wb)
# sheetNameList = wb.get_sheet_names()
# logging.info("sheetNameList=%s", sheetNameList)
ws = wb[StorybookSheetTitle]
logging.info("ws=%s", ws)

# process each row in excel
# for curRowNum in range(realContentRowStartNum, ws.max_row + 1):
for curRowNum, eachRow in enumerate(ws.iter_rows(min_row=realContentRowStartNum)):
    curRowNum += realContentRowStartNum
    logging.info("-"*30 + " row[%d] " + "-"*30, curRowNum)

    ...
    # authorColNumCellValue = ws.cell(row=curRowNum, column=AuthorColNum).value
    authorColNumCellValue = getRealCellValue(ws, eachRow[AuthorColNum - 1])
    logging.info("col[%d] authorColNumCellValue=%s", AuthorColNum, authorColNumCellValue)
    # contentAbstractColNumCellValue = ws.cell(row=curRowNum, column=ContentSimpleIntroColNum).value
    contentAbstractColNumCellValue = getRealCellValue(ws, eachRow[ContentSimpleIntroColNum - 1])
    logging.info("col[%d] contentAbstractColNumCellValue=%s", ContentSimpleIntroColNum, contentAbstractColNumCellValue)
    # publisherColNumCellValue = ws.cell(row=curRowNum, column=PublisherColNum).value
    publisherColNumCellValue = getRealCellValue(ws, eachRow[PublisherColNum - 1])
    logging.info("col[%d] publisherColNumCellValue=%s", PublisherColNum, publisherColNumCellValue)
    ...
</code>

可以检测到后续的,值是None的,处于被合并区域内的Cell单元格:

<code>2018/04/04 02:50:22 LINE 246  INFO    cell[147:14] with in cell range: row=[146:163] col=[14:14]
</code>

对应着147行,14列的:

然后就可以通过:

getCellRangeValue去得到合并区域的值==合并区域的最左上角(top left)的那个单元格的值:

了。

【后记】

后来已把相关 的函数整理到:

https://github.com/crifan/crifanLib/blob/master/python/crifanLib/crifanOpenpyxl.py

了,供参考。

转载请注明:在路上 » 【已解决】Python中openpyxl处理excel去判断单元格是合并后的以及合并的范围

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
62 queries in 0.065 seconds, using 9.47MB memory