Extracting PDF file data with Python

Editor: Python

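First, install the two libraries used below: pdfplumber (reads text and tables from PDF files) and openpyxl (writes Excel .xlsx files).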

pip install pdfplumber
pip install openpyxl

1. Set the file path

import pdfplumber

path = r"C:\Users\lenovo\Desktop\Thesis and interview\Customer focus.pdf"
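The `r` prefix makes this a raw string, so the backslashes in the Windows path are not treated as escape sequences. As an optional extra check that is not part of the original walkthrough, a small helper built on the standard-library `pathlib` can fail early with a clear error when the path is wrong. The name `resolve_pdf` is hypothetical.

```python
from pathlib import Path


def resolve_pdf(path_str):
    """Return a Path to an existing .pdf file, or raise a clear error early."""
    path = Path(path_str)
    # Check the extension first so a wrong file type is reported explicitly
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {path.name!r}")
    # is_file() is False both for missing paths and for directories
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    return path
```

Calling `resolve_pdf(path)` before `pdfplumber.open(path)` turns a typo in the path into an immediate, readable error.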

2. Open the PDF file

pdf_mt = pdfplumber.open(path)
pdf_mt
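`pdfplumber.open` keeps the file open until the returned `PDF` object is closed, either by calling `pdf_mt.close()` or by using the object as a context manager, which pdfplumber supports. The sketch below uses the context-manager form; the helper name `count_pages` is hypothetical, and `pdfplumber` is imported inside the function only so the sketch can be defined where the library is not installed.

```python
def count_pages(path):
    """Open a PDF with pdfplumber, return its page count, and close the file."""
    import pdfplumber  # imported here so defining this helper does not require the library

    # The with-block closes the underlying file even if an exception is raised
    with pdfplumber.open(path) as pdf:
        return len(pdf.pages)
```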

3. Get the list of pages (and the total page count)

# pdf_mt.pages is a list of Page objects: [page 1, page 2, ..., page n]
all_pages = pdf_mt.pages
all_pages
len(all_pages)  # total number of pages
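Each element of `pdf_mt.pages` is a pdfplumber `Page` object, which exposes attributes such as `page_number` (starting at 1), `width` and `height` (in PDF points). A short sketch that summarizes them; `summarize_pages` is a hypothetical helper that only reads those three attributes, so any object that has them will work.

```python
def summarize_pages(pages):
    """Return the page count and a (page_number, width, height) tuple per page."""
    sizes = [(page.page_number, page.width, page.height) for page in pages]
    return len(sizes), sizes
```

For example, `total, sizes = summarize_pages(all_pages)` gives the page count together with the size of every page, which helps spot landscape pages before extracting tables.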

4. Get the text of each page (here, the first 40 pages)

for pdf_pg in all_pages[0:40]:
    print(pdf_pg.extract_text())
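Printing is enough for a quick look; to keep the text, the same loop can collect it and write it to a file. This is a sketch rather than the author's code: `save_text` and the page separator format are hypothetical, and the `or ""` guard assumes `extract_text()` may return `None` for pages without a text layer, as some pdfplumber versions do.

```python
def save_text(pages, out_path, first_n=40):
    """Write the text of the first `first_n` pages to out_path; return pages written."""
    blocks = []
    for page in pages[:first_n]:
        # Guard against None for pages that have no extractable text
        text = page.extract_text() or ""
        blocks.append(f"--- page {page.page_number} ---\n{text}")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(blocks))
    return len(blocks)
```

With the objects from the earlier steps this would be called as `save_text(all_pages, "pages.txt")`.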

5. Get the table contents

for pdf_pg in all_pages[0:40]:
    print(pdf_pg.extract_tables())
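`extract_tables()` returns a list with one entry per table detected on the page; each table is a list of rows, and each row is a list of cell values. Cells may be `None` (for example where cells are merged), and text that wraps inside a cell usually comes back with `\n` in it. A small, hypothetical `clean_table` helper can normalize this before the rows are written anywhere.

```python
def clean_table(table):
    """Replace None cells with "" and collapse whitespace (including line breaks) in each cell."""
    cleaned = []
    for row in table:
        # " ".join(cell.split()) turns "Customer\nname" into "Customer name"
        cleaned.append(["" if cell is None else " ".join(cell.split()) for cell in row])
    return cleaned
```

In step 6 the inner loop could then iterate over `clean_table(pdf_tb)` instead of `pdf_tb`.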

6. Save the table data to Excel

from openpyxl import Workbook

# need_pages is not defined earlier in this walkthrough; assume the same first 40 pages as steps 4 and 5
need_pages = all_pages[0:40]

# Create a workbook object
wb = Workbook()
# Get the active (default) worksheet
ws = wb.active

for pdf_pg in need_pages:
    # extract_tables() returns the tables on this page; each table is
    # two-dimensional: a list of rows, each row a list of cells [[...], [...]]
    for pdf_tb in pdf_pg.extract_tables():
        # Write the table into the worksheet row by row
        for row in pdf_tb:
            ws.append(row)

wb.save("demo3.xlsx")
pdf_mt.close()  # release the PDF file handle
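The loop above appends every table to a single sheet, so rows from different tables run together. One alternative, sketched here, gives each table its own worksheet using openpyxl's `Workbook.create_sheet` and `Workbook.remove`. The function name `tables_to_sheets` and the sheet-naming scheme `p<page>_t<table>` are assumptions for illustration; the workbook is passed in, so the sketch itself does not import openpyxl.

```python
def tables_to_sheets(wb, tables_by_page):
    """Write each extracted table to its own worksheet; return the number of tables written.

    wb: an openpyxl Workbook (or any object with the same active/create_sheet/remove API).
    tables_by_page: iterable of (page_number, tables) pairs, where `tables` is what
    pdfplumber's Page.extract_tables() returned for that page.
    """
    default_sheet = wb.active
    written = 0
    for page_number, tables in tables_by_page:
        for index, table in enumerate(tables, start=1):
            # Hypothetical naming scheme: "p3_t2" = second table on page 3
            # (short enough for Excel's 31-character sheet-name limit)
            ws = wb.create_sheet(title=f"p{page_number}_t{index}")
            for row in table:
                ws.append(row)
            written += 1
    # Drop the empty default sheet only if something was written,
    # because a workbook needs at least one sheet to be saved
    if written:
        wb.remove(default_sheet)
    return written
```

For example: create `wb = Workbook()`, call `tables_to_sheets(wb, ((pg.page_number, pg.extract_tables()) for pg in all_pages[0:40]))`, then `wb.save("demo3.xlsx")`.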

Copyright © 程式師世界 All Rights Reserved