
Counting the lines of a large file in Python: methods and performance comparison


How can you count the total number of lines in a large file quickly and efficiently in Python? Here are several implementations and a comparison of their performance.

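1. readlines: read all the lines

Use the readlines method to read every line into a list and take its length. The whole file is held in memory at once, so this scales poorly to very large files:

def readline_count(file_name):
    # readlines() loads the entire file into a list of lines
    with open(file_name) as f:
        return len(f.readlines())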

2. Read each line in turn

Iterate over the file line by line and count:

def simple_count(file_name):
    lines = 0
    with open(file_name) as f:
        for _ in f:
            lines += 1
    return lines

3. sum count

Use the sum function over a generator that yields 1 for each line:

def sum_count(file_name):
    with open(file_name) as f:
        return sum(1 for _ in f)

4. enumerate count

Use enumerate to number the lines, starting from 1:

def enumerate_count(file_name):
    count = 0  # stays 0 if the file is empty
    with open(file_name) as f:
        for count, _ in enumerate(f, 1):
            pass
    return count

5. buff count

Read a fixed-size chunk each time and count the newline bytes in it:

def buff_count(file_name):
    with open(file_name, 'rb') as f:
        count = 0
        buf_size = 1024 * 1024  # read 1 MiB at a time
        buf = f.read(buf_size)
        while buf:
            count += buf.count(b'\n')
            buf = f.read(buf_size)
        return count
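One caveat of counting newline bytes: if the last line has no trailing newline, the result is one lower than what line-by-line iteration reports. The sketch below is one possible variant, not part of the original comparison. The name buff_count_with_tail is hypothetical. It remembers the last byte read and adds one when the file is non-empty and does not end in a newline.

```python
def buff_count_with_tail(file_name, buf_size=1024 * 1024):
    """Count lines chunk by chunk, including a final line with no newline."""
    count = 0
    last_byte = b""
    with open(file_name, "rb") as f:
        buf = f.read(buf_size)
        while buf:
            count += buf.count(b"\n")
            last_byte = buf[-1:]  # slicing keeps this as bytes, not an int
            buf = f.read(buf_size)
    # A non-empty file whose last byte is not a newline has one more line.
    if last_byte and last_byte != b"\n":
        count += 1
    return count
```

For files that always end with a newline, this returns the same value as buff_count.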

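6. wc count

Call the wc command (available on Unix-like systems) to count the lines:

def wc_count(file_name):
    import subprocess
    # pass the arguments as a list so file names containing spaces
    # or shell metacharacters are handled safely
    out = subprocess.check_output(["wc", "-l", file_name])
    return int(out.split()[0])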

7. partial count

Building on buff_count, introduce functools.partial:

def partial_count(file_name):
    from functools import partial
    buffer = 1024 * 1024
    with open(file_name) as f:
        # iter(callable, sentinel) calls f.read(buffer) until it returns ''
        return sum(x.count('\n') for x in iter(partial(f.read, buffer), ''))
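The two-argument form iter(callable, sentinel) is what drives this version: it calls the callable repeatedly and stops as soon as the return value equals the sentinel. A minimal illustration on an in-memory stream follows. Here io.StringIO stands in for a real file, and the helper name read_chunks is hypothetical.

```python
import io
from functools import partial

def read_chunks(stream, size):
    """Return an iterator over chunks of at most `size` characters."""
    # stream.read(size) returns '' at end of file, which matches the sentinel.
    return iter(partial(stream.read, size), "")

stream = io.StringIO("a\nbb\nccc\n")
chunks = list(read_chunks(stream, 4))
line_count = sum(chunk.count("\n") for chunk in chunks)
print(chunks, line_count)
```

The sentinel has to match the mode the file was opened in: '' for text mode and b'' for binary mode. With the wrong sentinel the iterator never stops.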

8. iter count

Building on buff_count, introduce the itertools module:

def iter_count(file_name):
    from itertools import (takewhile, repeat)
    buffer = 1024 * 1024
    with open(file_name) as f:
        buf_gen = takewhile(lambda x: x, (f.read(buffer) for _ in repeat(None)))
        return sum(buf.count('\n') for buf in buf_gen)

I tested each method on my machine (4 cores, 8 GB RAM, Python 3.6) with files of 100 MB, 500 MB, 1 GB and 10 GB, measuring the running time in seconds.
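A comparison of this kind can be reproduced with a small harness along these lines. This is a sketch, not the author's original benchmark script: the helper names make_test_file and time_counter are hypothetical, the test file is generated rather than a real 100 MB to 10 GB file, and only two of the counters are redefined here to keep the block self-contained.

```python
import os
import tempfile
import time

def simple_count(file_name):
    """Count lines by iterating over the file."""
    lines = 0
    with open(file_name) as f:
        for _ in f:
            lines += 1
    return lines

def buff_count(file_name, buf_size=1024 * 1024):
    """Count newline bytes in fixed-size binary chunks."""
    count = 0
    with open(file_name, "rb") as f:
        buf = f.read(buf_size)
        while buf:
            count += buf.count(b"\n")
            buf = f.read(buf_size)
    return count

def make_test_file(num_lines):
    """Write num_lines short lines to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(num_lines):
            f.write("line %d\n" % i)
    return path

def time_counter(func, file_name):
    """Return (line count, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(file_name)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    path = make_test_file(200000)
    try:
        for func in (simple_count, buff_count):
            n, secs = time_counter(func, path)
            print("%-14s %8d lines  %.4f s" % (func.__name__, n, secs))
    finally:
        os.remove(path)
```

A single timed call is noisy. Repeating each measurement and clearing or warming the operating system's file cache consistently gives more comparable numbers.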
