# 【Notes】Extracting text data from PDF files with Python

【Background】

【Process】

1. A quick search turned up:

pyPdf

http://pybrary.net/pyPdf/

“extracting document information (title, author, …),”

PyPDF2

http://knowah.github.io/PyPDF2/

PDFtk

2. I also found:

pyfpdf

3. Later I referred to:

Extracting relevant information from PDF and Word files with Python

PDFMiner

#### What’s It?

PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes than text analysis.

4.