最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【已解决】python解析excel文件并读取其中的sheet和row和column的值

Excel crifan 7310浏览 0评论

折腾:

【已解决】把本地的音频字幕等数据存储到本地MongoDB数据库中

期间,需要去用python去读取和解析一个excel文件,并读取其中的内容,

包括包括多个sheet,以及每个sheet中行row和列column的单元格cell的值:

crifan python excel xlsx

python read excel xlsx

Python-Excel 模块哪家强?

  • xlwings

  • openpyxl

  • pandas

  • win32com

  • xlsxwriter

    • 只能写入

  • DataNitro

  • xlutils

此处已经可以基本确定用:openpyxl

但是为了想要顺带再去学学panda的话,去搜搜:

Creating Excel files with Python and XlsxWriter — XlsxWriter Documentation

Python操作excel的几种方式–xlrd、xlwt、openpyxl | 文强的个人空间

  • xlrd:用来读取很方便

  • openpyxl:虽然功能很强大,但是操作起来感觉没有xlwt方便

xls – Reading xlsx files using Python – Stack Overflow

  • openpyxl不错

  • xlrd也不赖

openpyxl – A Python library to read/write Excel 2010 xlsx/xlsm files — openpyxl 2.5.0 documentation

Python Excel Tutorial: The Definitive Guide (article) – DataCamp

看下来:

复杂点的:panda+excel解析器,比如panda+openpyxl

简单点的:openpyxl或xlrd

很久之前用过:xlrd

现在为了尝试新的,且不太麻烦,还是先直接用:openpyxl吧。

参考官网文档:

https://openpyxl.readthedocs.io/en/stable/

➜  英语资源 pip install openpyxl

Collecting openpyxl

  Downloading openpyxl-2.5.1.tar.gz (169kB)

    100% |████████████████████████████████| 174kB 1.1MB/s

Collecting jdcal (from openpyxl)

  Downloading jdcal-1.3.tar.gz

Collecting et_xmlfile (from openpyxl)

  Downloading et_xmlfile-1.0.1.tar.gz

Building wheels for collected packages: openpyxl, jdcal, et-xmlfile

  Running setup.py bdist_wheel for openpyxl … done

  Stored in directory: /Users/crifan/Library/Caches/pip/wheels/98/5e/20/70cde417026f1e168acdac7babf47b204a7b752b1a8e6bb795

  Running setup.py bdist_wheel for jdcal … done

  Stored in directory: /Users/crifan/Library/Caches/pip/wheels/0f/63/92/19ac65ed64189de4d662f269d39dd08a887258842ad2f29549

  Running setup.py bdist_wheel for et-xmlfile … done

  Stored in directory: /Users/crifan/Library/Caches/pip/wheels/99/f6/53/5e18f3ff4ce36c990fa90ebdf2b80cd9b44dc461f750a1a77c

Successfully built openpyxl jdcal et-xmlfile

Installing collected packages: jdcal, et-xmlfile, openpyxl

Successfully installed et-xmlfile-1.0.1 jdcal-1.3 openpyxl-2.5.1

用 openpyxl 处理 xlsx 文件 – 李林克斯

OpenPyXL的使用教程(一) – 简书

Manipulating a workbook in memory — openpyxl 2.5.0 documentation

Simple usage — openpyxl 2.5.0 documentation

Manipulating a workbook in memory — openpyxl 2.5.0 documentation

想要去写代码,但是发现对于excel中的列的表示不清楚:

【已解决】Excel中把列名的表示方式从数字换成英文字母

然后就可以去使用L13这种写法去读取cell的值了。

代码:

# parse excel file

excelFullfilename = "/Users/crifan/dev/dev_root/xxx/xxx018.3.28_forDebug.xlsx"

wb = load_workbook(excelFullfilename)

logging.info("wb=%s", wb)

# sheetNameList = wb.get_sheet_names()

# logging.info("sheetNameList=%s", sheetNameList)

ws = wb[u"绘本"]

logging.info("ws=%s", ws)

rows = ws.rows

columns = ws.columns

是可以正常执行的:

现在需要去搞清楚,一共有多少行,然后每行循环去处理。

去找找api文档:

https://openpyxl.readthedocs.io/en/stable/index.html

-》

openpyxl package — openpyxl 2.5.0 documentation

Index — openpyxl 2.5.0 documentation

openpyxl.cell.cell module — openpyxl 2.5.0 documentation

搜:

sheet

workbook

openpyxl.packaging.workbook module — openpyxl 2.5.0 documentation

Openpyxl  get current sheet max row

python – Is it possible to get an Excel document’s row count without loading the entire document into memory? – Stack Overflow

row_count = sheet.max_row

column_count = sheet.max_column

-》

openpyxl.worksheet.worksheet module — openpyxl 2.5.0 documentation

-》

https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html#openpyxl.worksheet.worksheet.Worksheet.max_row

“max_column

The maximum column index containing data (1-based)

Type:

int

max_row

The maximum row index containing data (1-based)

Type:

int “

找到了。

然后PyCharm中再想要去查看源码,也是可以看到合适的匹配出的代码的:

然后就可以去写代码读取值了:

from openpyxl import Workbook, load_workbook

StorybookSheetTitle = u"绘本"

EnglishStorybookRootPath = "/Users/crifan/dev/dev_rootxxx"

ExcelFilename = "xxx资源2018.3.28_forDebug.xlsx"

ExcelFullFilename = os.path.join(EnglishStorybookRootPath, ExcelFilename)

AudioFilePathPrefix = EnglishStorybookRootPath

# parse excel file

wb = load_workbook(ExcelFullFilename)

logging.info("wb=%s", wb)

# sheetNameList = wb.get_sheet_names()

# logging.info("sheetNameList=%s", sheetNameList)

ws = wb[StorybookSheetTitle]

logging.info("ws=%s", ws)

# process each row in excel

for curRowNum in range(realContentRowStartNum, ws.max_row + 1):

    logging.info("-"*30 + " row[%d] " + "-"*30, curRowNum)

    hasAudioFileColNumCellValue = ws.cell(row=curRowNum, column=HasAudioFileColNum).value

    logging.info("col[%d] hasAudioFileColNumCellValue=%s", HasAudioFileColNum, hasAudioFileColNumCellValue)

    audioFilePathColNumCellValue = ws.cell(row=curRowNum, column=AudioFilePathColNum).value

    logging.info("col[%d] audioFilePathColNumCellValue=%s", AudioFilePathColNum, audioFilePathColNumCellValue)

对于excel文件:

输出:

2018/03/30 02:55:02 LINE 104  INFO    wb=<openpyxl.workbook.workbook.Workbook object at 0x107855c50>

2018/03/30 02:55:02 LINE 108  INFO    ws=<Worksheet "\u7ed8\u672c">

2018/03/30 02:55:02 LINE 113  INFO    mongoClient=MongoClient(host=[‘localhost:27017’], document_class=dict, tz_aware=False, connect=True)

2018/03/30 02:55:02 LINE 117  INFO    gridfsDb=Database(MongoClient(host=[‘localhost:27017′], document_class=dict, tz_aware=False, connect=True), u’gridfs’)

2018/03/30 02:55:02 LINE 125  INFO    fsCollection=<gridfs.GridFS object at 0x107864b50>

2018/03/30 02:55:02 LINE 129  INFO    —————————— row[3] ——————————

2018/03/30 02:55:02 LINE 132  INFO    col[12] hasAudioFileColNumCellValue=有

2018/03/30 02:55:02 LINE 134  INFO    col[13] audioFilePathColNumCellValue=None

调试的效果:

转载请注明:在路上 » 【已解决】python解析excel文件并读取其中的sheet和row和column的值

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
90 queries in 0.181 seconds, using 22.13MB memory