
Counting the lines of large files with Python: methods and performance comparison

Editor: Python

How can you count the total number of lines in a large file quickly and efficiently with Python? Below are several implementations and a performance comparison.

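1. readlines: read all the lines

Use the readlines method to read every line into a list and take its length:

def readline_count(file_name):
    # Loads the whole file into memory as a list of lines.
    with open(file_name) as f:
        return len(f.readlines())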

2. Read each line in turn

Iterate over the file line by line and count:

def simple_count(file_name):
    lines = 0
    with open(file_name) as f:
        for _ in f:
            lines += 1
    return lines

3. sum count

Use the sum function over a generator that yields 1 for each line:

def sum_count(file_name):
    with open(file_name) as f:
        return sum(1 for _ in f)

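4. enumerate count

Use enumerate to number the lines and keep the last index:

def enumerate_count(file_name):
    count = 0  # stays 0 for an empty file, where the loop body never runs
    with open(file_name) as f:
        for count, _ in enumerate(f, 1):
            pass
    return count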

5. buff count

Read a fixed-size chunk each time and count the newline bytes in it:

def buff_count(file_name):
    with open(file_name, 'rb') as f:
        count = 0
        buf_size = 1024 * 1024  # 1 MiB per read
        buf = f.read(buf_size)
        while buf:
            count += buf.count(b'\n')
            buf = f.read(buf_size)
        return count

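6. wc count

Call the wc command to count the lines (requires a system that provides wc):

def wc_count(file_name):
    import subprocess
    # Pass the arguments as a list so file names containing spaces or
    # shell metacharacters are not interpreted by a shell.
    out = subprocess.check_output(["wc", "-l", file_name], universal_newlines=True)
    return int(out.split()[0])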

7. partial count

Building on buff_count, use functools.partial:

def partial_count(file_name):
    from functools import partial
    buffer = 1024 * 1024
    with open(file_name) as f:
        # iter() calls f.read(buffer) repeatedly until it returns ''.
        return sum(x.count('\n') for x in iter(partial(f.read, buffer), ''))
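The two-argument form of `iter` used here calls the given function repeatedly and stops as soon as it returns the sentinel value, which is the empty string at end of file. A small sketch of the idiom on an in-memory stream, assuming an illustrative chunk size of 4 characters and the hypothetical helper name `read_chunks`:

```python
import io
from functools import partial


def read_chunks(stream, size):
    """Return the list of chunks produced by iter(callable, sentinel)."""
    # partial(stream.read, size) is a zero-argument callable;
    # iter() keeps calling it until it returns '' (end of stream).
    return list(iter(partial(stream.read, size), ""))


chunks = read_chunks(io.StringIO("ab\ncd\nef\n"), 4)
print(chunks)                                       # ['ab\nc', 'd\nef', '\n']
print(sum(chunk.count("\n") for chunk in chunks))   # 3
```

A line can be split across two chunks, as "cd" is here, but each newline character still lands in exactly one chunk, so the total is unaffected.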

8. iter count

Building on buff_count, use the itertools module:

def iter_count(file_name):
    from itertools import takewhile, repeat
    buffer = 1024 * 1024
    with open(file_name) as f:
        # Keep reading chunks until read() returns an empty string.
        buf_gen = takewhile(lambda x: x, (f.read(buffer) for _ in repeat(None)))
        return sum(buf.count('\n') for buf in buf_gen)

The methods were tested on my machine (4 cores, 8 GB RAM, Python 3.6) with files of 100 MB, 500 MB, 1 GB, and 10 GB; running times are measured in seconds.
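Timings of this kind can be collected with a small harness. The following is a minimal sketch, not the author's benchmark script: `time_call`, `make_test_file`, and the stand-in `count_lines` are hypothetical names, and the 100,000-line test file is an arbitrary small size chosen so the example runs quickly.

```python
import os
import tempfile
import time


def count_lines(file_name):
    """Stand-in counter; any of the counting functions could be passed instead."""
    with open(file_name, "rb") as f:
        return sum(1 for _ in f)


def time_call(func, file_name, repeat=3):
    """Return (result, best wall-clock seconds) over `repeat` runs."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(file_name)
        best = min(best, time.perf_counter() - start)
    return result, best


def make_test_file(num_lines):
    """Create a temporary file with `num_lines` short lines; return its path."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"some sample text\n" * num_lines)
        return tmp.name


if __name__ == "__main__":
    path = make_test_file(100_000)
    try:
        lines, seconds = time_call(count_lines, path)
        print(f"{count_lines.__name__}: {lines} lines in {seconds:.4f} s")
    finally:
        os.remove(path)
```

Repeated runs on the same file are usually served from the operating system's file cache, so a first run on a cold cache may be noticeably slower than the best-of-three figure this sketch reports.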

