
Python Office Automation: Batch-Converting Word Tables to an Excel File


Contents

Preface

Importing the Word file with Python

Parsing the required content by row and cell

Inner parsing loop

Preface

Hello everyone. Today a friend who works as a civil servant asked me for a favor. He has a Word document (the text shown here has been altered, because the actual contents are private).

It contains nearly 2,600 table entries in a similar format, and each entry holds the following information:

Date

Issuing unit

Number

Title

Bar sign

We need to extract the three items shown in bold and store them in an Excel sheet with a prescribed layout.

That is, the date of receipt, the document title and the number go into their designated columns, and the date must also be converted to a standard format. Copying the entries and fixing the dates entirely by hand, at 10 seconds per entry, means 6 entries per minute, so even at best it would take about 2600 / 6 ≈ 433 minutes, or more than 7 hours.

Structured, repetitive file processing like this is a good fit for Python, so let's bring Python in. The necessary explanations are given as comments in the code.

Importing the Word file with Python

# Import the required library (python-docx)
from docx import Document

# Path to the Word file
path = r'C:\Users\word.docx'

# Read the file
document = Document(path)

# Get all tables in the document
tables = document.tables

Next, break the problem down: first try to extract the three required fields from the first entry of the first table.

# Get the first table
table0 = tables[0]

Looking closely, each document entry occupies 3 rows, so when iterating over the rows of the table the loop step can be set to 3.

Parsing the required content by row and cell

Inspect the table and work out which row and cell hold each piece of required content.

# Counter used to fill in the serial number
n = 0
# Each entry occupies 3 rows, so step through the rows 3 at a time
for i in range(0, len(table0.rows), 3):
    # Date
    date = table0.cell(i, 1).text
    # Title
    title = table0.cell(i + 1, 1).text.strip()
    # Number
    dfn = table0.cell(i, 3).text.strip()
    n += 1
    print(n, date, title, dfn)
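To see the indexing pattern without a Word file at hand, the sketch below mimics a table as a plain list of rows. The `rows` data and the `parse_entries` helper are hypothetical stand-ins, not part of python-docx; the column positions (1 for the date and title, 3 for the number) simply mirror the cell(i, 1), cell(i + 1, 1) and cell(i, 3) calls used in this article.

```python
# Hypothetical stand-in for a Word table: each entry occupies 3 rows.
rows = [
    ["Date", "2/1", "Number", "No. 1"],
    ["Title", "First document", "", ""],
    ["Sign-off", "", "", ""],
    ["Date", "", "Number", "No. 2"],
    ["Title", "Second document", "", ""],
    ["Sign-off", "", "", ""],
]


def parse_entries(rows, step=3):
    """Yield (date, title, number) for every complete group of `step` rows."""
    # Stop before a trailing, incomplete group so rows[i + 1] always exists.
    for i in range(0, len(rows) - step + 1, step):
        date = rows[i][1]
        title = rows[i + 1][1].strip()
        number = rows[i][3].strip()
        yield date, title, number


entries = list(parse_entries(rows))
for n, entry in enumerate(entries, start=1):
    print(n, *entry)
```

Bounding the range by the number of complete groups avoids indexing past the last row, which is what a bound of len(rows) + 1 would do whenever the row count is an exact multiple of 3.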

The next problem is that the dates we obtained are in day/month form, such as 2/1. They need to be converted to YYYY-MM-DD format, which is done with the strptime and strftime functions of the datetime package:

strptime: parses a string into a time value

strftime: formats a time value as a string in the desired format
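A small worked example of the two calls, offered as a sketch: `to_iso_date` is a hypothetical helper name, and the default year 2020 is taken from the hard-coded '2020-%m-%d' pattern in this article's code. It differs from that code in one respect: the year is appended before parsing, because strptime falls back to the year 1900 when the string contains none, and 1900 is not a leap year, so '29/2' would otherwise be rejected.

```python
import datetime


def to_iso_date(text, year=2020):
    """Convert a 'day/month' string such as '2/1' to 'YYYY-MM-DD'."""
    # Parse day, month and the assumed year together, then reformat.
    parsed = datetime.datetime.strptime(f"{text.strip()}/{year}", "%d/%m/%Y")
    return parsed.strftime("%Y-%m-%d")


print(to_iso_date("2/1"))   # 2020-01-02
print(to_iso_date("29/2"))  # 2020-02-29
```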

import datetime

n = 0
for i in range(0, len(table0.rows), 3):
    # Date
    date = table0.cell(i, 1).text
    # Some date cells are empty; no elaborate validation is done here
    if '/' in date:
        date = datetime.datetime.strptime(date, '%d/%m').strftime('2020-%m-%d')
    else:
        date = '-'
    # Title
    title = table0.cell(i + 1, 1).text.strip()
    # Number
    dfn = table0.cell(i, 3).text.strip()
    n += 1
    print(n, date, title, dfn)

That completes the parsing of one table. Note that so far only tables[0], the first table, has been used. To go through all the tables, add an outer loop; you can also catch exceptions to make the program more robust.

n = 0
for j in range(len(tables)):
    for i in range(0, len(tables[j].rows), 3):
        try:
            # Date
            date = tables[j].cell(i, 1).text
            if '/' in date:
                date = datetime.datetime.strptime(date, '%d/%m').strftime('2020-%m-%d')
            else:
                date = '-'
            # Title
            title = tables[j].cell(i + 1, 1).text.strip()
            # Number
            dfn = tables[j].cell(i, 3).text.strip()
            n += 1
            print(n, date, title, dfn)
        except Exception as error:
            # Catch the exception; it could also be written to a log for easier review
            print(error)
            continue
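The comment in the except branch mentions writing to a log. A minimal sketch of that idea with the standard logging module follows; the logger name, the `safe_parse` helper and the sample data are assumptions made for illustration, and the messages go to the console here (passing a filename argument to logging.basicConfig would send them to a file instead).

```python
import logging

# Log to the console; add filename="..." to basicConfig to write to a file.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("word_to_excel")


def safe_parse(rows, i):
    """Return (date, title) for the entry starting at row i, or None on failure."""
    try:
        return rows[i][1], rows[i + 1][1].strip()
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("Failed to parse entry starting at row %d", i)
        return None


# Hypothetical stand-in for one 3-row table entry.
rows = [["Date", "2/1"], ["Title", "First document"], ["Sign-off", ""]]
print(safe_parse(rows, 0))  # ('2/1', 'First document')
print(safe_parse(rows, 3))  # None, and the error is logged
```

Compared with print(error), the logged traceback and row index make it easier to find which entry failed after a long run.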

Once the information has been parsed, it can be exported. The package used for this is openpyxl.

from openpyxl import Workbook

# Create the workbook
wb = Workbook()

# Get the active sheet
sheet = wb.active

# Set up the header row
header = ['Serial number', 'Time of receipt', 'Office document number', 'Document title', 'Number', 'Remarks']
sheet.append(header)

Inner parsing loop

Add the following at the end of the innermost parsing loop:

row = [n, date, ' ', title, dfn, ' ']
sheet.append(row)

Finally, remember to save the workbook:

wb.save(r'C:\Users\20200420.xlsx')

The run takes about 10 minutes; step away for a while and the program will have finished by the time you return.

Finally, here is the complete code. The code itself is simple; what matters most is getting the approach straight.

from docx import Document
import datetime
from openpyxl import Workbook

# Create the workbook and write the header row
wb = Workbook()
sheet = wb.active
header = ['Serial number', 'Time of receipt', 'Office document number', 'Document title', 'Number', 'Remarks']
sheet.append(header)

# Read the Word file and collect all of its tables
path = r'C:\Users\word.docx'
document = Document(path)
tables = document.tables

n = 0
for j in range(len(tables)):
    # Each entry occupies 3 rows
    for i in range(0, len(tables[j].rows), 3):
        try:
            # Date
            date = tables[j].cell(i, 1).text
            if '/' in date:
                date = datetime.datetime.strptime(date, '%d/%m').strftime('2020-%m-%d')
            else:
                date = '-'
            # Title
            title = tables[j].cell(i + 1, 1).text.strip()
            # Number
            dfn = tables[j].cell(i, 3).text.strip()
            n += 1
            print(n, date, title, dfn)
            row = [n, date, ' ', title, dfn, ' ']
            sheet.append(row)
        except Exception as error:
            # Catch the exception; it could also be written to a log for easier review
            print(error)
            continue

wb.save(r'C:\Users\20200420.xlsx')
