Polonius Reader - rail5/polonius GitHub Wiki

% polonius-reader(1) Version 1.0 | Manual for the Polonius Reader

NAME

polonius-reader - outputs a selected portion of the contents of a file

SYNOPSIS

polonius-reader ./file

polonius-reader ./file --start 0 --length 10

polonius-reader ./file --search "hello world" --output-pos

OPTIONS

OVERVIEW

-i / --input

Specify file to read

The file can also be specified without the '-i' option

-s / --start

Specify start position (character number in the file)

Defaults to 0 if not specified

-l / --length

Specify how many bytes to read

Defaults to full length of the file if not specified

-b / --block-size

Specify the maximum amount of data we can load from the file into memory at any given time

Defaults to 10 kilobytes if not specified

-f / --find / --search

Search for a string in the file

Returns nothing (blank) if no matches were found

If a start position is given with -s / --start, the program will only search for matches after that start position

If a length is given with -l / --length, the program will only search for matches within that range from the start position

-p / --output-pos

Output the start and end positions, rather than the actual text

Will output in the (space-delimited) format: start end

For example: 0 5

If used with searches, this will output the start and end position of the found match. If used outside of searches, this will output the start and end position of the file read

-c / --special-chars

Parse escaped character sequences in search queries (`\n`, `\t`, `\\`, and `\x00` through `\xFF`)

-e / --regex

Interpret search query as a regular expression

-V / --version

Print version number

-h / --help

Display help message

START POSITION

Specifying the "start position" tells Polonius to skip over the beginning of the file and only start reading at the position you specify (specified in characters)

The start position can be specified with the -s option. For example: polonius-reader ./file -s 50

If the start position is not specified, it defaults to position 0 (the beginning of the file)

READ LENGTH

Specifying the "read length" tells Polonius to read only a given number of characters, counted from the start position

The read length can be specified with the -l option. For example: polonius-reader ./file -s 50 -l 10

If the read length is not specified, Polonius will read until the end of the file
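The interaction between start position, read length, and block size can be sketched in Python. This is an illustration only, not Polonius's actual implementation: the function name `read_range` is hypothetical, positions are treated as byte offsets, and "10 kilobytes" is assumed to mean 10 * 1024 bytes.

```python
import io

DEFAULT_BLOCK_SIZE = 10 * 1024  # assumption: "10 kilobytes" means 10 * 1024 bytes


def read_range(f, start=0, length=None, block_size=DEFAULT_BLOCK_SIZE):
    """Yield chunks covering [start, start + length) of a binary file object.

    At most block_size bytes are read at a time. length=None means
    "read until the end of the file".
    """
    f.seek(start)
    remaining = length
    while remaining is None or remaining > 0:
        want = block_size if remaining is None else min(block_size, remaining)
        chunk = f.read(want)
        if not chunk:  # reached the end of the file
            break
        if remaining is not None:
            remaining -= len(chunk)
        yield chunk


# Rough analogue of: polonius-reader ./file -s 50 -l 10
data = bytes(range(256))
selected = b"".join(read_range(io.BytesIO(data), start=50, length=10, block_size=4))
print(selected == data[50:60])  # True
```

Because the data is produced chunk by chunk, memory use stays bounded by the block size no matter how long the requested range is.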

SEARCH

Polonius can search a file for a specific string

A search can be done with the -f option. For example: polonius-reader ./file -f "hello world"

If this is combined with Start Positions or Read Lengths, the search will happen only within those boundaries. For example, polonius-reader ./file -f "hello" -s 500 -l 200 will search for the string "hello" only between character #500 and character #700

This can also be combined with the "Output Positions" (-p) option. If the "Output Positions" flag is set with -p, Polonius will tell you where it found a match, in the space-delimited format startposition endposition (for example: 510 515). By default, without the "Output Positions" flag, Polonius will output the match itself.

The search function will output either:

  1. The found match

  2. The position of the found match (if -p is specified)

  3. Nothing (blank) if no match was found

The search function is also fast on large files. Here is an informal comparison that was run on my laptop:

  • A 2.5GB file was created using randomtext

  • The string "hello world" was inserted approximately 2.4GB in (right near the end of the file)

  • The following commands were run through the Bash time utility:

    1. polonius-reader ./big-file -f "hello world"

    2. grep -o "hello world" ./big-file

  • Here was the result of the Polonius command:

    hello world

    real	0m1.874s
    user	0m0.862s
    sys	0m0.980s

  • Here was the result of the grep command:

    grep: memory exhausted

    real	0m9.696s
    user	0m3.112s
    sys	0m4.500s
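One way a bounded-memory search like this could work is sketched below in Python. This is not taken from the Polonius source: the function name `find_first` is hypothetical, positions are treated as byte offsets, and the end position is assumed to be the start position plus the match length (consistent with the `510 515` example for a five-character match). The sketch reads one block at a time and carries over the last `len(needle) - 1` bytes, so a match that straddles two blocks is still found.

```python
import io


def find_first(f, needle, start=0, length=None, block_size=10 * 1024):
    """Return (start, end) of the first occurrence of needle, or None.

    Only the range [start, start + length) of the binary file object f
    is examined, and at most block_size new bytes are read per step.
    """
    if not needle:
        return None
    f.seek(start)
    limit = None if length is None else start + length
    overlap = len(needle) - 1
    pos = start    # absolute offset of the first byte held in `window`
    window = b""
    while True:
        read_pos = pos + len(window)
        want = block_size if limit is None else min(block_size, limit - read_pos)
        if want <= 0:
            break
        chunk = f.read(want)
        if not chunk:
            break
        window += chunk
        idx = window.find(needle)
        if idx != -1:
            return (pos + idx, pos + idx + len(needle))
        # Keep only the tail that could still begin a match.
        keep = window[-overlap:] if overlap else b""
        pos += len(window) - len(keep)
        window = keep
    return None


# Rough analogue of: polonius-reader ./file -f "hello" -s 500 -l 200 -p
data = b"x" * 510 + b"hello" + b"y" * 300
print(find_first(io.BytesIO(data), b"hello", start=500, length=200, block_size=7))
# (510, 515)
```

A result of `None` corresponds to the blank output described above for searches with no match.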

REGEX SEARCH

A normal search can be made into a regex search by passing the -e option. For example: polonius-reader ./file -f "[a-z]+[0-9]{2}" -e

Everything above about normal searches also applies to regex searches. Regex searches, however, are significantly slower than normal searches.

Polonius is not capable of finding regex matches which are larger than the block size (default 10KB if unspecified).

BLOCK SIZE

Specifying the "Block Size" tells Polonius how much data from the file we're willing to load into memory at once.

The default value (if unspecified) is 10 kilobytes

The block size can be specified with the -b option, in the formats:

1. `-b 15` (This would set the block size to 15 bytes)

2. `-b 16K` (This would set the block size to 16 kilobytes)

3. `-b 17M` (This would set the block size to 17 megabytes)

And of course, the example numbers '15', '16', and '17' can be swapped for any arbitrary number

This option is common to both polonius-reader and polonius-editor
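The three block size formats can be illustrated with a small Python parser. This is a sketch under stated assumptions, not the parser Polonius uses: the name `parse_block_size` is hypothetical, a kilobyte is assumed to be 1024 bytes and a megabyte 1024 * 1024 bytes, and only the uppercase suffixes shown above are accepted.

```python
def parse_block_size(text):
    """Convert a block size such as '15', '16K', or '17M' to a byte count."""
    # Assumption: K and M are binary multiples (1024-based).
    units = {"K": 1024, "M": 1024 * 1024}
    text = text.strip()
    if text and text[-1] in units:
        number, multiplier = text[:-1], units[text[-1]]
    else:
        number, multiplier = text, 1
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"invalid block size: {text!r}")
    return int(number) * multiplier


print(parse_block_size("15"))   # 15
print(parse_block_size("16K"))  # 16384
print(parse_block_size("17M"))  # 17825792
```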

OUTPUT POSITIONS

Setting the "Output Positions" flag tells Polonius not to output the actual content of the file, but instead to report the start and end positions of the content that it would otherwise have output.

The flag can be set with the -p option. Polonius will output the positions in the space-delimited format startposition endposition, for example: 10 15

This is mainly useful in two scenarios:

1. Searches

  When searching for a string, often we don't just want to know *whether* a match was found, but also *where* it was found

2. Determining the length of a file

  If polonius-reader is run with **no extra arguments given**, it will output the entire contents of a file.

  In this case, if you set the *-p* flag, it will output something like `0 700`, where *700* is the number of characters in the file
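A script that consumes this output might parse it as in the Python sketch below. The helper names `parse_positions` and `span_length` are hypothetical, and the sketch assumes that the length of the reported span is the end position minus the start position, as the `0 700` example suggests.

```python
def parse_positions(line):
    """Parse 'start end' output into a tuple of ints; blank output gives None."""
    fields = line.split()
    if not fields:  # blank output, e.g. a search with no match
        return None
    start_text, end_text = fields
    return int(start_text), int(end_text)


def span_length(line):
    """Length of the reported span, assuming length = end - start."""
    positions = parse_positions(line)
    if positions is None:
        return None
    start, end = positions
    return end - start


print(span_length("0 700"))    # 700
print(span_length("510 515"))  # 5
```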

SPECIAL CHARACTERS

Setting the "special characters" flag tells Polonius to parse escaped character sequences in search queries. Polonius will parse \n, \t, \\, and \x00 through \xFF.

The special characters flag can be set with the -c option.
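The escape sequences listed above can be illustrated with a short Python sketch. It is not Polonius's own parser: the name `parse_special_chars` is hypothetical, and two behaviours are assumptions made here, namely that hex digits are accepted in either case and that unrecognised sequences (such as `\q`) are left unchanged.

```python
import re

# Matches \n, \t, \\, and \x followed by exactly two hex digits.
_ESCAPE = re.compile(rb"\\(n|t|\\|x[0-9A-Fa-f]{2})")


def parse_special_chars(query):
    """Replace escaped sequences in a bytes search query with the bytes they name."""
    def replace(match):
        token = match.group(1)
        if token == b"n":
            return b"\n"
        if token == b"t":
            return b"\t"
        if token == b"\\":
            return b"\\"
        return bytes([int(token[1:], 16)])  # \xHH -> a single byte

    return _ESCAPE.sub(replace, query)


print(parse_special_chars(rb"line one\nline two"))  # b'line one\nline two'
print(parse_special_chars(rb"\x41\x42\tC"))         # b'AB\tC'
```

Scanning left to right means an escaped backslash is consumed as a unit, so the four-character query `\\n` becomes a literal backslash followed by `n` rather than a newline.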