M4 fast scalar - markov2/perl5-Mail-Box GitHub Wiki

Mail::Box::FastScalar

Reading

MailBox started off using IO::Scalar, which was later replaced by its own Mail::Box::FastScalar. These objects wrap a string (containing a whole message) with a file-handle interface. This way, the same code can parse incoming emails from different sources, real files being only one of them.
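A minimal sketch of that idea: the parsing routine below only reads lines from whatever handle it gets, so it works unchanged on a real file and on a string. The helper count_header_lines and the sample message are hypothetical. The string side uses Perl's built-in in-memory file handle (open on a scalar reference) rather than IO::Scalar or Mail::Box::FastScalar, so the sketch runs with core Perl only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical parsing routine: it only needs something that behaves like
# a file handle, so it works the same for a real file and for a string.
sub count_header_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        last if $line =~ /^\r?\n\z/;    # an empty line ends the header
        $count++;
    }
    return $count;
}

my $message = "From: alice\@example.com\nSubject: test\n\nBody text\n";

# Source 1: a real file (a temporary file, removed at exit).
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} $message;
close $tmp or die "close: $!";
open my $file, '<', $filename or die "open $filename: $!";
print "from file:   ", count_header_lines($file), "\n";
close $file;

# Source 2: the same text held in a scalar, via Perl's built-in
# in-memory file handle (not Mail::Box::FastScalar).
open my $mem, '<', \$message or die "open scalar: $!";
print "from string: ", count_header_lines($mem), "\n";
close $mem;
```

Both calls print 2: the parser cannot tell where its lines come from.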

The three options to create a wrapper are:

  • IO::Scalar, part of the IO-Stringy distribution, available since 1997
  • Mail::Box::FastScalar, part of Mail-Message since around 2005

Read in                           0.288s   62 MB
IO::Scalar read lines            57.271s   8597500 lines
Mail::Box::FastScalar read lines  4.356s   8597500 lines

MailBox's own implementation (a fully file-compatible wrapper: reading, writing, seeking, etc.) reads lines about 13 times faster (4.356s versus 57.271s).

Experiment with split

FastScalar keeps the input as a single scalar and uses index() to find line endings; the lines are extracted with substr(). Can we speed this up by using split() on the input instead? Besides, MailBox does not need a real (expensively tied) file handle, only an object which provides the same methods.
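The two strategies can be sketched as follows, assuming a reader only needs a getline() method. IndexLines and SplitLines are hypothetical classes written for this illustration, not the actual Mail::Box::FastScalar or FastLines code: the first walks the single scalar with index() and substr(), the second splits the whole text once and hands out the resulting lines.

```perl
use strict;
use warnings;

# Hypothetical reader in the style described for FastScalar: keep the whole
# text in one scalar, find line ends with index(), cut lines with substr().
package IndexLines;

sub new {
    my ($class, $text) = @_;
    return bless { text => $text, pos => 0 }, $class;
}

sub getline {
    my $self = shift;
    my $len  = length $self->{text};
    return undef if $self->{pos} >= $len;            # end of input
    my $end = index $self->{text}, "\n", $self->{pos};
    $end = $end < 0 ? $len : $end + 1;               # keep "\n"; last line may lack one
    my $line = substr $self->{text}, $self->{pos}, $end - $self->{pos};
    $self->{pos} = $end;
    return $line;
}

# Hypothetical split()-based variant: split once up front, then shift lines off.
package SplitLines;

sub new {
    my ($class, $text) = @_;
    # The zero-width lookbehind keeps each "\n" attached to its line.
    my @lines = split /(?<=\n)/, $text;
    return bless { lines => \@lines }, $class;
}

sub getline { shift @{ $_[0]{lines} } }              # undef when exhausted

package main;

my $text = "line 1\nline 2\nlast line without newline";
for my $class (qw(IndexLines SplitLines)) {
    my $reader = $class->new($text);
    my $count  = 0;
    while (defined(my $line = $reader->getline)) { $count++ }
    print "$class: $count lines\n";                  # 3 lines for both
}
```

The split() variant does all its scanning up front and afterwards only shifts array elements, which is a plausible source of the gain. The price is that every line is held as a separate scalar at the same time, and a byte position for seek() and tell() no longer comes for free.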

Measurement of user and sys times, average of 10 runs (FastLines is the split()-based experiment):

FastScalar:    user=4.36s   sys=0.02s
FastLines:     user=3.16s   sys=0.06s   38% more throughput
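User and sys figures like these can be collected in Perl with the built-in times(), which returns the CPU seconds spent in user and system mode. The sketch below averages both over several runs; average_cpu, throughput_gain and the workload are hypothetical stand-ins, not the benchmark that produced the numbers above. throughput_gain also shows where the 38% comes from: the same work in 3.16s instead of 4.36s gives 4.36 / 3.16 - 1, roughly 38% more throughput.

```perl
use strict;
use warnings;

# Average user and sys CPU seconds of $code over $runs runs, using times().
sub average_cpu {
    my ($runs, $code) = @_;
    my ($user, $sys) = (0, 0);
    for (1 .. $runs) {
        my ($u0, $s0) = times;
        $code->();
        my ($u1, $s1) = times;
        $user += $u1 - $u0;
        $sys  += $s1 - $s0;
    }
    return ($user / $runs, $sys / $runs);
}

# Percentage throughput gain when the same work takes $new instead of $old seconds.
sub throughput_gain {
    my ($old, $new) = @_;
    return ($old / $new - 1) * 100;
}

# Stand-in workload: split a generated text into lines.
my $text = "Subject: a header line\n" x 200_000;
my ($user, $sys) = average_cpu(10, sub {
    my @lines = split /(?<=\n)/, $text;
});
printf "user=%.2fs   sys=%.2fs\n", $user, $sys;

printf "%.0f%% more throughput\n", throughput_gain(4.36, 3.16);   # 38% more throughput
```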

Conclusion: we can gain (a bit of) performance using split.