I'm trying to fill a preallocated NumPy array of 8-byte blocks using the following code:
import numpy
import random

# preallocate a block array of 8-byte unsigned ints
dt = numpy.dtype('u8')
in_memory_blocks = numpy.zeros(_AVAIL_IN_MEMORY_BLOCKS, dt)
...
# write all the blocks out, flushing only as desired
blocks_per_flush_xrange = xrange(0, blocks_per_flush)
for _ in xrange(0, num_flushes):
    # fill the buffer one element at a time with Python-level calls
    for block_index in blocks_per_flush_xrange:
        in_memory_blocks[block_index] = random.randint(0, _BLOCK_MAX)

    print('flushing bytes stored in memory...')
    # commented out for SO; exists in actual code
    # removing this doesn't make an order-of-magnitude difference in time
    # m.update(in_memory_blocks[:blocks_per_flush])
    in_memory_blocks[:blocks_per_flush].tofile(f)
Some points:

- num_flushes is low, at around 4 - 10
- blocks_per_flush is a large number, on the order of millions
- in_memory_blocks can be a fairly large buffer (I've set it as low as 1MB and as high as 100MB), but the timing is very consistent either way
- _BLOCK_MAX is the max for an 8-byte unsigned int
- m is a hashlib.md5()
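For anyone who wants to run the snippet, here is a minimal setup sketch with placeholder values drawn from the ranges above; the actual values, file handling, and hash setup live in the elided part of my code, so everything below is illustrative only:

import hashlib

# placeholder values for illustration; picked from the ranges above
num_flushes = 4                       # low, around 4 - 10
blocks_per_flush = 4 * 1024 * 1024    # on the order of millions (32MB buffer)
_AVAIL_IN_MEMORY_BLOCKS = blocks_per_flush
_BLOCK_MAX = 2 ** 64 - 1              # max for an 8-byte unsigned int

m = hashlib.md5()
f = open('blocks.bin', 'wb')          # hypothetical output path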
Generating 1MB using the above code takes ~1s; 500MB takes ~376s. At 8 bytes per block, the 500MB case is over 60 million calls to random.randint(). By comparison, my simple C program that uses rand() can create a 500MB file in 8s.
How can I improve the performance of the above loop? I'm pretty sure I'm ignoring something obvious that's causing this massive difference in runtime.
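One direction I've been wondering about, in case it helps frame answers: replacing the Python-level inner loop with a single vectorized call. An untested sketch, assuming the goal is uniform values over the full 0.._BLOCK_MAX range (random bytes reinterpreted as u8 integers should be exactly that):

# one C-level call for the whole buffer instead of millions of
# Python-level random.randint() calls; random bytes reinterpreted
# as 8-byte unsigned ints are uniform over 0.._BLOCK_MAX
raw = numpy.random.bytes(blocks_per_flush * 8)
in_memory_blocks[:blocks_per_flush] = numpy.frombuffer(raw, dtype='u8')

If that's the right idea, it would remove the inner loop entirely, leaving one RNG call plus the tofile() write per flush. Is that the kind of fix I'm missing, or is something else going on?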