M4 fast scalar - markov2/perl5-Mail-Box GitHub Wiki
Mail::Box::FastScalar
Reading
MailBox started off using IO::Scalar, which was later replaced by its own Mail::Box::FastScalar. These objects wrap a string (which contains a whole message) in a file-handle interface. This way, the same code can parse incoming emails from different sources, one of which is real files.
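To illustrate the idea of one parser serving both sources, here is a minimal sketch. `My::StringHandle` and `count_header_lines` are hypothetical names invented for this example; they are not part of IO::Scalar or Mail::Box::FastScalar, and only `getline()` is implemented, whereas the real wrappers also support writing and seeking.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

{
    # Hypothetical minimal wrapper: offers getline() on top of a string.
    package My::StringHandle;

    sub new {
        my ($class, $ref) = @_;
        return bless { data => $ref, pos => 0 }, $class;
    }

    sub getline {
        my $self = shift;
        my $data = $self->{data};
        my $pos  = $self->{pos};
        return if $pos >= length($$data);          # end of "file"

        my $nl  = index($$data, "\n", $pos);       # find the next line ending
        my $end = $nl < 0 ? length($$data) : $nl + 1;
        $self->{pos} = $end;
        return substr($$data, $pos, $end - $pos);  # line including its "\n"
    }
}

# The parser only relies on the getline() method, not on the object type.
sub count_header_lines {
    my $in    = shift;
    my $count = 0;
    while (defined(my $line = $in->getline)) {
        last if $line =~ /^\r?\n$/;                # blank line ends the header
        $count++;
    }
    return $count;
}

my $message = "From: a\@example.com\nSubject: hi\n\nbody\n";

# Source 1: a real file on disk.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} $message;
close $tmp or die "close: $!";
my $file = IO::File->new($path, 'r') or die "open: $!";
my $from_file = count_header_lines($file);

# Source 2: the same message held in a string.
my $from_string = count_header_lines(My::StringHandle->new(\$message));

print "file: $from_file, string: $from_string\n";
```

Both calls go through the same `count_header_lines` code path, which is the property the wrappers exist to provide.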
The options to create a wrapper compared here are
- IO::Scalar, part of the IO-Stringy distribution, available since 1997
- Mail::Box::FastScalar, part of Mail-Message since around 2005
Read in 0.288s, 62 MB
IO::Scalar read lines in 57.271s, 8597500 lines
Mail::Box::FastScalar read lines in 4.356s, 8597500 lines
The in-house implementation (a fully file-compatible wrapper: reading, writing, seeking, etc.) is about 13x faster in this test (57.271s versus 4.356s).
Experiment with split
FastScalar keeps the input as a single scalar and uses index() to find line endings. The lines are extracted with substr(). Can we speed this up by using split() on the input instead? Besides, we do not need a real (expensively tied) file-handle in MailBox, only an object which provides the same methods.
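The two strategies can be sketched side by side. This is an illustration under assumptions, not the actual FastScalar or FastLines code: the class names `Sketch::IndexReader` and `Sketch::SplitReader` and the helper `read_all` are hypothetical, and only `getline()` is shown.

```perl
use strict;
use warnings;

{
    # Strategy 1: keep one scalar, locate line endings with index(),
    # and cut each line out with substr().
    package Sketch::IndexReader;

    sub new {
        my ($class, $ref) = @_;
        return bless { ref => $ref, pos => 0 }, $class;
    }

    sub getline {
        my $self = shift;
        my $ref  = $self->{ref};
        my $pos  = $self->{pos};
        return if $pos >= length($$ref);

        my $nl  = index($$ref, "\n", $pos);
        my $end = $nl < 0 ? length($$ref) : $nl + 1;
        $self->{pos} = $end;
        return substr($$ref, $pos, $end - $pos);
    }
}

{
    # Strategy 2: split the whole input into lines once, up front,
    # then hand out array elements one at a time.
    package Sketch::SplitReader;

    sub new {
        my ($class, $ref) = @_;
        # split /^/m breaks at the start of each line and keeps the "\n".
        my @lines = split /^/m, $$ref;
        return bless { lines => \@lines, next => 0 }, $class;
    }

    sub getline {
        my $self = shift;
        return if $self->{next} >= @{ $self->{lines} };
        return $self->{lines}[ $self->{next}++ ];
    }
}

# Drain any object that offers getline().
sub read_all {
    my $in = shift;
    my @lines;
    while (defined(my $line = $in->getline)) {
        push @lines, $line;
    }
    return @lines;
}

my $text = "line one\nline two\n\nlast line without newline";

my @by_index = read_all(Sketch::IndexReader->new(\$text));
my @by_split = read_all(Sketch::SplitReader->new(\$text));

print "readers agree on ", scalar(@by_index), " lines\n"
    if "@by_index" eq "@by_split";
```

One trade-off of the split approach in this sketch is memory: all lines exist as separate scalars next to the original string, while the index/substr version only creates one line at a time.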
Measurement of user and sys times, average of 10 runs:
FastScalar: user=4.36s sys=0.02s
FastLines: user=3.16s sys=0.06s 38% more throughput
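For reference, a sketch of how such averages and the throughput figure can be obtained with Perl's built-in `times`. The helper names `avg_cpu_times` and `gain_percent` are hypothetical, and the workload shown is a stand-in, not the benchmark used for the numbers above.

```perl
use strict;
use warnings;

# Average user and system CPU time of $code over $runs runs.
sub avg_cpu_times {
    my ($runs, $code) = @_;
    my ($user0, $sys0) = times;
    $code->() for 1 .. $runs;
    my ($user1, $sys1) = times;
    return (($user1 - $user0) / $runs, ($sys1 - $sys0) / $runs);
}

# Throughput gain in percent when the time drops from $old to $new:
# the same work in less time means old/new times the throughput.
sub gain_percent {
    my ($old, $new) = @_;
    return 100 * ($old / $new - 1);
}

# Stand-in workload: split a small string into lines many times.
my $sample = "a line of text\n" x 1000;
my ($user, $sys) = avg_cpu_times(10, sub {
    my @lines = split /^/m, $sample for 1 .. 100;
});
printf "user=%.2fs sys=%.2fs\n", $user, $sys;

# Applied to the user times measured above: 4.36s versus 3.16s.
my $gain = gain_percent(4.36, 3.16);
printf "%.0f%% more throughput\n", $gain;
```

With the user times from the measurement, 4.36 / 3.16 is about 1.38, which is where the 38% figure comes from.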
Conclusion: we can gain (a bit of) performance using split.