折腾:
【已解决】把本地的音频字幕等数据存储到本地MongoDB数据库中
期间,需要去用python去读取和解析一个excel文件,并读取其中的内容,
包括包括多个sheet,以及每个sheet中行row和列column的单元格cell的值:
crifan python excel xlsx
python read excel xlsx
xlwings
openpyxl
pandas
win32com
xlsxwriter
只能写入
DataNitro
xlutils
此处已经可以基本确定用:openpyxl
但是为了想要顺带再去学学panda的话,去搜搜:
Creating Excel files with Python and XlsxWriter — XlsxWriter Documentation
Python操作excel的几种方式–xlrd、xlwt、openpyxl | 文强的个人空间
xlrd:用来读取很方便
openpyxl:虽然功能很强大,但是操作起来感觉没有xlwt方便
xls – Reading xlsx files using Python – Stack Overflow
openpyxl不错
xlrd也不赖
openpyxl – A Python library to read/write Excel 2010 xlsx/xlsm files — openpyxl 2.5.0 documentation
Python Excel Tutorial: The Definitive Guide (article) – DataCamp
看下来:
复杂点的:panda+excel解析器,比如panda+openpyxl
简单点的:openpyxl或xlrd
很久之前用过:xlrd
现在为了尝试新的,且不太麻烦,还是先直接用:openpyxl吧。
参考官网文档:
https://openpyxl.readthedocs.io/en/stable/
➜ 英语资源 pip install openpyxl
Collecting openpyxl
Downloading openpyxl-2.5.1.tar.gz (169kB)
100% |████████████████████████████████| 174kB 1.1MB/s
Collecting jdcal (from openpyxl)
Downloading jdcal-1.3.tar.gz
Collecting et_xmlfile (from openpyxl)
Downloading et_xmlfile-1.0.1.tar.gz
Building wheels for collected packages: openpyxl, jdcal, et-xmlfile
Running setup.py bdist_wheel for openpyxl … done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/98/5e/20/70cde417026f1e168acdac7babf47b204a7b752b1a8e6bb795
Running setup.py bdist_wheel for jdcal … done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/0f/63/92/19ac65ed64189de4d662f269d39dd08a887258842ad2f29549
Running setup.py bdist_wheel for et-xmlfile … done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/99/f6/53/5e18f3ff4ce36c990fa90ebdf2b80cd9b44dc461f750a1a77c
Successfully built openpyxl jdcal et-xmlfile
Installing collected packages: jdcal, et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.0.1 jdcal-1.3 openpyxl-2.5.1
Manipulating a workbook in memory — openpyxl 2.5.0 documentation
Simple usage — openpyxl 2.5.0 documentation
Manipulating a workbook in memory — openpyxl 2.5.0 documentation
想要去写代码,但是发现对于excel中的列的表示不清楚:
然后就可以去使用L13这种写法去读取cell的值了。
代码:
# parse excel file
excelFullfilename = "/Users/crifan/dev/dev_root/xxx/xxx018.3.28_forDebug.xlsx"
wb = load_workbook(excelFullfilename)
logging.info("wb=%s", wb)
# sheetNameList = wb.get_sheet_names()
# logging.info("sheetNameList=%s", sheetNameList)
ws = wb[u"绘本"]
logging.info("ws=%s", ws)
rows = ws.rows
columns = ws.columns
是可以正常执行的:
现在需要去搞清楚,一共有多少行,然后每行循环去处理。
去找找api文档:
https://openpyxl.readthedocs.io/en/stable/index.html
-》
openpyxl package — openpyxl 2.5.0 documentation
Index — openpyxl 2.5.0 documentation
openpyxl.cell.cell module — openpyxl 2.5.0 documentation
搜:
sheet
workbook
openpyxl.packaging.workbook module — openpyxl 2.5.0 documentation
Openpyxl get current sheet max row
row_count = sheet.max_row
column_count = sheet.max_column
-》
openpyxl.worksheet.worksheet module — openpyxl 2.5.0 documentation
-》
“max_column
The maximum column index containing data (1-based)
Type:
int
max_row
The maximum row index containing data (1-based)
Type:
int “
找到了。
然后PyCharm中再想要去查看源码,也是可以看到合适的匹配出的代码的:
然后就可以去写代码读取值了:
from openpyxl import Workbook, load_workbook
StorybookSheetTitle = u"绘本"
EnglishStorybookRootPath = "/Users/crifan/dev/dev_rootxxx"
ExcelFilename = "xxx资源2018.3.28_forDebug.xlsx"
ExcelFullFilename = os.path.join(EnglishStorybookRootPath, ExcelFilename)
AudioFilePathPrefix = EnglishStorybookRootPath
# parse excel file
wb = load_workbook(ExcelFullFilename)
logging.info("wb=%s", wb)
# sheetNameList = wb.get_sheet_names()
# logging.info("sheetNameList=%s", sheetNameList)
ws = wb[StorybookSheetTitle]
logging.info("ws=%s", ws)
# process each row in excel
for curRowNum in range(realContentRowStartNum, ws.max_row + 1):
logging.info("-"*30 + " row[%d] " + "-"*30, curRowNum)
hasAudioFileColNumCellValue = ws.cell(row=curRowNum, column=HasAudioFileColNum).value
logging.info("col[%d] hasAudioFileColNumCellValue=%s", HasAudioFileColNum, hasAudioFileColNumCellValue)
audioFilePathColNumCellValue = ws.cell(row=curRowNum, column=AudioFilePathColNum).value
logging.info("col[%d] audioFilePathColNumCellValue=%s", AudioFilePathColNum, audioFilePathColNumCellValue)
对于excel文件:
输出:
2018/03/30 02:55:02 LINE 104 INFO wb=<openpyxl.workbook.workbook.Workbook object at 0x107855c50>
2018/03/30 02:55:02 LINE 108 INFO ws=<Worksheet "\u7ed8\u672c">
2018/03/30 02:55:02 LINE 113 INFO mongoClient=MongoClient(host=[‘localhost:27017’], document_class=dict, tz_aware=False, connect=True)
2018/03/30 02:55:02 LINE 117 INFO gridfsDb=Database(MongoClient(host=[‘localhost:27017′], document_class=dict, tz_aware=False, connect=True), u’gridfs’)
2018/03/30 02:55:02 LINE 125 INFO fsCollection=<gridfs.GridFS object at 0x107864b50>
2018/03/30 02:55:02 LINE 129 INFO —————————— row[3] ——————————
2018/03/30 02:55:02 LINE 132 INFO col[12] hasAudioFileColNumCellValue=有
2018/03/30 02:55:02 LINE 134 INFO col[13] audioFilePathColNumCellValue=None
调试的效果: