用Python的openpyxl去处理excel文件,
遇到一个需求是:
用代码自动去判断,对于合并后的单元格中的内容:
(之前调试时发现的)自动检测出来后,对于后续的,单元格值是空:
但是属于同一系列的话:
那么后续单元格的值,在代码赋值时,也使用前面的值。
其他还有很多合并的cell:
所以问题就变成了:
看看openpyxl是否能检测出来excel的单元格,是否是被合并的,且知道,具体合并了多少行(多少列)
这样才有利于后续单元格值为空时,使用前面的同系列的值
openpyxl check merged cell
python – How to detect merged cells in excel with openpyxl – Stack Overflow
merged_cell_ranges就可以找到了合并了的cell了
Read merged cells in Excel with Python – Stack Overflow
自己写代码?不够方便
不过如果返回:
[‘A1:M1’, ‘B22:B27’]
自己去解析,貌似也不是很方便?
python – openpyxl: merged_cell_ranges and merged_cells are both empty – Stack Overflow
merged_cells
merged_cell_ranges
openpyxl / openpyxl / issues / #476 – Extract the value of merged cell — Bitbucket
-》
Simple usage — openpyxl 2.5.2 documentation
可以用:merge_cells
unmerge_cells
手动试试excel取消合并后,其他单元格是否有值:
很明显,只有第一个有值:
所以此处没法使用unmerge_cells
Working with styles — openpyxl 2.5.2 documentation
Working with Spreadsheets using Python (Part 2) – Hacker Noon
Simple usage — openpyxl 2.5.0 documentation
merged_cell_ranges openpyxl
如何在excep中使用openpyxl检测合并的单元格 Excel 中文网
openpyxl.worksheet.worksheet module — openpyxl 2.5.0 documentation
找到了文档:
“merged_cell_ranges
Return a copy of cell ranges
Note
Deprecated: Use ws.merged_cells.ranges”
让你用:
ws.merged_cells.ranges
merged_cells openpyxl
去试试
然后用代码:
<code>mergedCellsRangesOld = ws.merged_cell_ranges logging.info("mergedCellsRangesOld=%s", mergedCellsRangesOld) mergedCells = ws.merged_cells logging.info("mergedCells=%s", mergedCells) mergedCellsRangesNew = ws.merged_cells.ranges logging.info("mergedCellsRangesNew=%s", mergedCellsRangesNew) </code>
就可以去调试,找到对应的这些合并的cell了:
其中:
以及可以看到对应的行和列:
min_col
max_col
min_row
max_row
的值了。
接着就是去如何写代码了。
其中想到的是:
对于一个cell,想要判断其是否在一个CellRange范围内
去看看 openpyxl的CellRange是否自带这个功能
CellRange openpyxl
python – How to specify a cell range with variables in openpyxl? – Stack Overflow
ws.iter_rows() or ws.iter_cols()
ws.iter_rows() or ws.rows or ws.columns
python – openpyxl: a better way to read a range of numbers to an array – Stack Overflow
excel – Python Openpyxl, copy and paste cell range – Stack Overflow
看来只能自己去写代码去检测了?
然后遇到个小问题:
【总结】
最后去写代码:
<code>from openpyxl import Workbook, load_workbook def isInCellRange(cellToCheck, cellRange): """ to check a cell whether in a cell range :param cellToCheck: :param cellRange: :return: True : if cell in range False: if cell not in range """ # logging.debug("cellToCheck=[%d:%d]", cellToCheck.row, cellToCheck.col_idx) # logging.debug("cellRange: row=[%d:%d] col=[%d:%d]", # cellRange.min_row, cellRange.max_row, cellRange.min_col, cellRange.max_col) if (cellToCheck.row >= cellRange.min_row) and \ (cellToCheck.row <= cellRange.max_row) and \ (cellToCheck.col_idx >= cellRange.min_col) and \ (cellToCheck.col_idx <= cellRange.max_col): logging.info("cell[%d:%d] with in cell range: row=[%d:%d] col=[%d:%d]", cellToCheck.row, cellToCheck.col_idx, cellRange.min_row, cellRange.max_row, cellRange.min_col, cellRange.max_col) return True else: return False def getCellRangeValue(cellRange): """ get cell range value -> the top left cell value :param cellRange: :return: """ topLeftCell = ws.cell(row=cellRange.min_row, column=cellRange.min_col) topLeftCellValue = topLeftCell.value return topLeftCellValue def getRealCellValue(ws, curCell): """ for openpyxl, to get real value from row and column expecially for merged cell, will get its (same) value from top-left cell value :param row: :param column: :return: """ realCellValue = curCell.value mergedCellsRangesList = ws.merged_cells.ranges # logging.info("mergedCellsRangesList=%s", mergedCellsRangesList) # Note: # to efficiency , we only check cell in range or not when its value is None # for all merged cell value is None if not realCellValue: for eachCellRange in mergedCellsRangesList: if isInCellRange(curCell, eachCellRange): cellRangeValue = getCellRangeValue(eachCellRange) realCellValue = cellRangeValue break return realCellValue # parse excel file wb = load_workbook(ExcelFullFilename) logging.info("wb=%s", wb) # sheetNameList = wb.get_sheet_names() # logging.info("sheetNameList=%s", sheetNameList) ws = wb[StorybookSheetTitle] logging.info("ws=%s", ws) # process each row in excel # for curRowNum in range(realContentRowStartNum, ws.max_row + 1): for curRowNum, eachRow in enumerate(ws.iter_rows(min_row=realContentRowStartNum)): curRowNum += realContentRowStartNum logging.info("-"*30 + " row[%d] " + "-"*30, curRowNum) ... # authorColNumCellValue = ws.cell(row=curRowNum, column=AuthorColNum).value authorColNumCellValue = getRealCellValue(ws, eachRow[AuthorColNum - 1]) logging.info("col[%d] authorColNumCellValue=%s", AuthorColNum, authorColNumCellValue) # contentAbstractColNumCellValue = ws.cell(row=curRowNum, column=ContentSimpleIntroColNum).value contentAbstractColNumCellValue = getRealCellValue(ws, eachRow[ContentSimpleIntroColNum - 1]) logging.info("col[%d] contentAbstractColNumCellValue=%s", ContentSimpleIntroColNum, contentAbstractColNumCellValue) # publisherColNumCellValue = ws.cell(row=curRowNum, column=PublisherColNum).value publisherColNumCellValue = getRealCellValue(ws, eachRow[PublisherColNum - 1]) logging.info("col[%d] publisherColNumCellValue=%s", PublisherColNum, publisherColNumCellValue) ... </code>
可以检测到后续的,值是None的,处于被合并区域内的Cell单元格:
<code>2018/04/04 02:50:22 LINE 246 INFO cell[147:14] with in cell range: row=[146:163] col=[14:14] </code>
对应着147行,14列的:
然后就可以通过:
getCellRangeValue去得到合并区域的值==合并区域的最左上角(top left)的那个单元格的值:
了。
【后记】
后来已把相关 的函数整理到:
https://github.com/crifan/crifanLib/blob/master/python/crifanLib/crifanOpenpyxl.py
了,供参考。