
Python file data analysis: managing and extracting data from files

Editor: Python

Contents

Background

Requirements

Approach

Code

Results

Analysis

 1) Read files

 2) Read data

 3) Organize data

 4) Regular expression matching plus data deduplication

 6) Data export and save


Background

Python 2 could not read file paths containing Chinese characters directly, so a separate function had to be written for it. In 2018, Python 3 could not read them directly either.

Using it now, I found that Python 3 can read Chinese paths directly.

You need to provide or create a few .txt files yourself. Ideally put some data in them (name, mobile phone number, address).
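If you have no test files at hand, the sketch below writes a few sample .txt files into the `Classroom training` directory that the code below reads from. Several details are assumptions for illustration only:

* the one-record-per-line layout `name,phone,address`;
* the helper name `make_sample_files`;
* every name, number and address.

One number repeats across files and one line has no number, so the deduplication step and the no-match case are both exercised.

```python
import os

# Hypothetical sample records in the assumed "name,phone,address" layout.
# Every name, number and address here is invented for testing only.
SAMPLE_FILES = {
    "class1.txt": [
        "Zhang San,13800000001,Beijing",
        "Li Si,13800000002,Shanghai",
    ],
    "class2.txt": [
        "Wang Wu,13800000003,Guangzhou",
        "Li Si,13800000002,Shanghai",         # same number as in class1.txt
        "Zhao Liu,no phone listed,Shenzhen",  # line without any number
    ],
}


def make_sample_files(directory="Classroom training"):
    """Create `directory` if needed and write the sample .txt files into it.

    Returns the list of file paths that were written (UTF-8 encoded).
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for name, lines in SAMPLE_FILES.items():
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

Run `make_sample_files()` once from the folder where the script runs. It creates `Classroom training/class1.txt` and `Classroom training/class2.txt`.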

 

Requirements

When writing code, it is best to set yourself some requirements first so that the goal is clear.

  1. Read all the relevant files in the given directory path.
  2. Read the records of each .txt file line by line.
  3. Use a regular expression to get the phone number on each line.
  4. Store the mobile numbers in an Excel file.

Approach

        1) Read files

        2) Read data

        3) Organize data

        4) Regular expression matching

        5) Data deduplication

        6) Data export and save

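Code

import glob
import re
import xlwt

data = []
phone = []

# 1) Locate every .txt file in the "Classroom training" directory
filelocation = glob.glob(r'Classroom training/*.txt')
print(filelocation)

# 2) Read each file line by line; data gets one list of lines per file.
#    Assumes the files are UTF-8; use e.g. encoding='gbk' if they are not.
for path in filelocation:
    with open(path, encoding='utf-8') as file:
        file_data = file.readlines()
    data.append(file_data)
print(data)

# 3) Merge the per-file lists into a single list of lines
combine_data = sum(data, [])
print(combine_data)

# 4) Take the first 11-digit number on each line, skipping lines without one
for a in combine_data:
    data1 = re.search(r'[0-9]{11}', a)
    if data1:
        phone.append(data1[0])

# 5) Remove duplicates (set() does not keep the original order)
phone = list(set(phone))
print(phone)
print(len(phone))

# 6) Save to Excel: one number per row in column 0
f = xlwt.Workbook(encoding='utf-8')
sheet1 = f.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(phone)):
    sheet1.write(i, 0, phone[i])
f.save('phonenumber.xls')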

Results

An Excel file, phonenumber.xls, is generated.

 

 

Analysis

import glob
import re
import xlwt
glob is used to locate files, re provides regular expressions, and xlwt is used to write Excel (.xls) files.

1) Read files

filelocation=glob.glob(r'Classroom training/*.txt')
This gets the paths of all .txt files in the specified directory, as a list.
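glob.glob returns an empty list, not an error, when nothing matches. The order of its results depends on the file system. The sketch below assumes you want a reproducible processing order, so it sorts the result. It also escapes the directory name, so characters such as `[` in a folder name are not treated as wildcards. The helper name `find_txt_files` is hypothetical.

```python
import glob
import os


def find_txt_files(directory):
    """Return the .txt files directly inside `directory`, sorted by path.

    glob.escape() stops special characters in the directory name from being
    read as wildcards; only the "*.txt" part is a pattern. An empty list
    means that nothing matched.
    """
    pattern = os.path.join(glob.escape(directory), "*.txt")
    return sorted(glob.glob(pattern))
```

For example, `find_txt_files("Classroom training")` returns the same paths as the glob call above, but in a stable order.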

2) Read data

for path in filelocation:
    with open(path, encoding='utf-8') as file:
        file_data = file.readlines()
    data.append(file_data)
print(data)

This step works as follows:

* It loops over the .txt files found in the directory, one at a time.
* In each iteration it opens the current file. The with block closes the file afterwards, and UTF-8 encoding is assumed.
* readlines() reads that file's data line by line.
* append() adds each file's list of lines to the data list.
* The final print shows the data of all the .txt files in one list, with one sub-list per file.
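Two details matter when reading these files:

* readlines() keeps the trailing newline on every line.
* open() without an encoding argument uses the locale's preferred encoding. On Chinese-language Windows this is typically cp936 (GBK), so reading UTF-8 files can then raise UnicodeDecodeError.

The sketch below assumes UTF-8 input and returns one list of lines per file, mirroring the data list above. The function name `read_all` is hypothetical.

```python
def read_all(paths, encoding="utf-8"):
    """Read each file and return a list of per-file line lists.

    Lines keep their trailing newline, exactly as readlines() returns them.
    The with block closes each file even if reading raises an error.
    """
    data = []
    for path in paths:
        with open(path, encoding=encoding) as f:
            data.append(f.readlines())
    return data
```

Pass `encoding="gbk"` instead if your files were saved in GBK.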

3) Organize data

combine_data=sum(data,[])
sum(data, []) starts from an empty list and adds (concatenates) each per-file list to it. The result is a single list of lines.
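sum(data, []) works because + concatenates lists. However, it creates a new list for every addition, so the lines already collected are copied again at each step, which becomes slow with many files. itertools.chain.from_iterable produces the same flat list in a single pass. This is a sketch; the name `flatten` is hypothetical.

```python
from itertools import chain


def flatten(list_of_lists):
    """Concatenate the per-file line lists into one list, keeping their order."""
    return list(chain.from_iterable(list_of_lists))


data = [["a", "b"], ["c"], []]
print(flatten(data))  # ['a', 'b', 'c']
```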

4) Regular expression matching plus data deduplication

print(combine_data)
for a in combine_data:
    data1 = re.search(r'[0-9]{11}', a)
    if data1:  # skip lines that contain no 11-digit number
        phone.append(data1[0])
phone = list(set(phone))
print(phone)
print(len(phone))

set(): removes duplicates but does not preserve order. It creates an unordered collection of unique elements.
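Two refinements are worth considering at this step:

* re.search returns None when a line has no 11-digit run, so data1[0] would raise TypeError on such a line. The match has to be checked before it is used.
* [0-9]{11} also matches the first 11 digits of a longer digit string, such as an 18-digit ID number.

The sketch below makes these assumptions:

* Numbers are mainland-China mobile numbers: 11 digits starting with 1.
* Lookarounds reject a match that sits inside a longer run of digits.
* dict.fromkeys removes duplicates while keeping first-seen order, unlike set().

The pattern and the name `extract_phones` are assumptions, not the code above.

```python
import re

# Assumption: a mobile number is 11 digits starting with "1" and is not part
# of a longer run of digits (the lookarounds reject adjacent digits).
PHONE_RE = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def extract_phones(lines):
    """Return the first phone number found on each line, deduplicated.

    Lines without a match are skipped instead of raising TypeError.
    dict.fromkeys keeps the first occurrence of each number, in order.
    """
    found = []
    for line in lines:
        match = PHONE_RE.search(line)
        if match:  # re.search returns None when nothing matches
            found.append(match.group(0))
    return list(dict.fromkeys(found))


lines = [
    "Zhang San,13800000001,Beijing",
    "Li Si,13800000002,Shanghai",
    "Li Si,13800000002,Shanghai",
    "Zhao Liu,no phone,Shenzhen",
    "ID 110101199001011234,no phone",
]
print(extract_phones(lines))  # ['13800000001', '13800000002']
```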

6) Data export and save

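# Save to Excel
f = xlwt.Workbook(encoding='utf-8')
sheet1 = f.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(phone)):
    sheet1.write(i, 0, phone[i])
f.save('phonenumber.xls')

Workbook(encoding='utf-8'): sets the encoding of the workbook. The encoding must be passed as a keyword argument. Workbook('encoding=utf-8') would treat the whole string as the encoding name.

add_sheet('sheet1', cell_overwrite_ok=True): creates a worksheet named sheet1. cell_overwrite_ok=True allows the same cell to be written more than once without raising an exception.

write(x, y, z): the arguments are the row index, the column index (both counted from 0) and the value to write.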
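xlwt only writes the legacy .xls format, which allows at most 65,536 rows per sheet. If you would rather not depend on a third-party package, the standard csv module can produce a file that Excel can open. The sketch below is such a swapped-in alternative, not the xlwt approach above. The function name `save_csv` and the file name `phonenumber.csv` are hypothetical.

```python
import csv


def save_csv(phones, path="phonenumber.csv"):
    """Write one phone number per row in the first column.

    This mirrors sheet1.write(i, 0, ...). newline="" is what the csv module
    documentation asks for. "utf-8-sig" writes a BOM so that Excel detects
    UTF-8 if non-ASCII text is added later.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        for number in phones:
            writer.writerow([number])
```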


Copyright © 程式師世界 All Rights Reserved