
Python automation series: working with PDF files using the PyPDF2 library

Editor: Python

PDF is short for Portable Document Format. It is a file format developed by Adobe Systems for exchanging documents in a way that is independent of application software, operating system, and hardware.

Python has several libraries for working with PDF files; the most common one is PyPDF2.

PyPDF is a module for manipulating PDF files, and the version most commonly used today is PyPDF2.
Note that this library is not well suited to extracting text content from PDF files.

Introduction to PyPDF2

PyPDF2 is a pure-Python PDF library. It can read document information (title, author, etc.), write, split, and merge PDF files, and it can also add watermarks to PDF documents and encrypt or decrypt them.

Installing PyPDF2

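Install PyPDF2 with the pip package manager (this installs the latest version):
pip install PyPDF2

The code in this article uses the PyPDF2 1.x API (PdfFileReader, PdfFileWriter, getPage(), addPage() and so on). These names were deprecated in later releases and no longer work in PyPDF2 3.0, which uses PdfReader, PdfWriter, reader.pages, add_page() and similar snake_case names instead. To run the code exactly as written, install an older release:
pip install "PyPDF2<3.0"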

I recommend VSCode: after starting it, you can open a terminal directly from the "Terminal" menu to install the library and run the program, which is very convenient.

Using PyPDF2

PyPDF2 provides two main classes:

  • PdfFileReader, for reading PDF files
  • PdfFileWriter, for writing PDF files

1. PdfFileReader can retrieve the basic information of a PDF document, and it can also fetch each page and load it as a PageObject object:

from PyPDF2 import PdfFileReader  # import the reader class
input_path = 'test.pdf'  # example path; replace with your own PDF file
pdf = PdfFileReader(input_path)  # create a reader object from the file path
information = pdf.getDocumentInfo()  # get the document information
number_of_pages = pdf.getNumPages()  # get the total number of pages

The complete example code is as follows:

from PyPDF2 import PdfFileReader  # import the reader class


def read(input_path):
    '''Read PDF data'''
    pdf = PdfFileReader(input_path)  # create a reader object from the file path
    # if pdf.isEncrypted:
    #     pdf.decrypt('password')  # decrypt an encrypted file; decrypt() returns a status code, not a reader
    information = pdf.getDocumentInfo()  # get the document information (None if the file has none)
    number_of_pages = pdf.getNumPages()  # get the total number of pages
    if information is None:
        print(f'{input_path}: no document information, {number_of_pages} pages')
        return
    txt = f'''{input_path} information:
Author: {information.author}
Creator: {information.creator}
Producer: {information.producer}
Subject: {information.subject}
Title: {information.title}
Number of pages: {number_of_pages}
'''
    print(txt)
    # Apart from the page count, any of the fields above may be missing (None) in a given file
    # This library is not well suited to reading the text content of a document
    for i in range(number_of_pages):
        pageObject = pdf.getPage(i)  # each page is loaded as a PageObject
        # print(pageObject.extractText())


# Usage (example path): read('test.pdf')

2. PdfFileWriter is used together with PdfFileReader, which supplies the pages to be written:

from PyPDF2 import PdfFileWriter, PdfFileReader
input_path = 'test.pdf'  # example path
pdfReader = PdfFileReader(input_path)
pdfWriter = PdfFileWriter()
# addPage appends a page to this PDF; the page usually comes from a PdfFileReader instance
pdfWriter.addPage(pdfReader.getPage(0))

See the comments in the following code for details:

def write(input_path, output_path='H:/test_w.pdf'):
    '''Write selected pages to a new PDF'''
    from PyPDF2 import PdfFileWriter, PdfFileReader
    pdfReader = PdfFileReader(input_path)
    pdfWriter = PdfFileWriter()
    # addPage appends a page to this PDF; the page usually comes from a PdfFileReader instance
    pdfWriter.addPage(pdfReader.getPage(0))
    # insertBlankPage(width=None, height=None, index=0) inserts a blank page and returns it as a PageObject;
    # by default it is inserted at the beginning
    pdfWriter.insertBlankPage(width=100, height=100)
    # addBlankPage(width=None, height=None) appends a blank page; if width/height are not given,
    # the size of the last page is used, and if there is no last page PageSizeNotDefinedError is raised
    pdfWriter.addBlankPage()
    # insertPage(page, index=0) inserts a PageObject, usually taken from a PdfFileReader instance;
    # index sets the position and defaults to the beginning (here the input needs at least 3 pages)
    pdfWriter.insertPage(pdfReader.getPage(2))
    # addAttachment(fname, fdata) embeds a file in the PDF
    # pdfWriter.addAttachment(fname='attachment1.txt', fdata=b'Hello world!')
    print(pdfWriter.getNumPages())
    # Encryption
    # pdfWriter.encrypt(user_pwd='password', owner_pwd='password')
    with open(output_path, 'wb') as f:  # the with block closes the file after writing
        pdfWriter.write(f)

3. An important concept: PageObject

After PdfFileReader loads a PDF document, every page you retrieve is a PageObject object, so operating on a PDF really means operating on PageObject objects.

The most commonly used PageObject methods (PyPDF2 1.x names):

  • mergePage(page2): merges the content of page2 onto this page; this is how a watermark effect is achieved
  • mergeRotatedPage(page2, rotation, expand=False): like mergePage, but rotates page2 first
  • mergeScaledPage(page2, scale, expand=False): like mergePage, but scales page2 first
  • mergeTranslatedPage(page2, tx, ty, expand=False): like mergePage, but translates (shifts) page2 first
  • mergeRotatedScaledPage(page2, rotation, scale, expand=False): like mergePage, but rotates and scales page2
  • mergeRotatedScaledTranslatedPage(page2, rotation, scale, tx, ty, expand=False): like mergePage, but rotates, scales and translates page2
  • mergeRotatedTranslatedPage(page2, rotation, tx, ty, expand=False): like mergePage, but rotates and translates page2
  • mergeScaledTranslatedPage(page2, scale, tx, ty, expand=False): like mergePage, but scales and translates page2
  • mergeTransformedPage(page2, ctm, expand=False): like mergePage, but applies the transformation matrix ctm to page2
  • rotateClockwise(angle): rotates the page clockwise; angle must be a multiple of 90 degrees
  • rotateCounterClockwise(angle): rotates the page counterclockwise; angle must be a multiple of 90 degrees
  • scale(sx, sy): scales the page by sx horizontally and sy vertically
  • scaleBy(factor): scales the page by the same factor on both axes
  • scaleTo(width, height): scales the page to the given width and height

For the merge methods, expand=True enlarges the current page so that the merged page2 fits.

Implementing a PDF merge feature:

See the comments in the code for details:


Copyright © 程式師世界 All Rights Reserved