Splitting the Logs With csplit

csplit is a POSIX command that splits a file into pieces at lines you choose, either by line number or by a pattern that matches the line:


csplit [options] <file-name> <pattern>

Patterns

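The pattern can be a line number (split just before that line) or a regular expression (split at the next line that matches it). A line number on its own splits only once; to split every so many lines it has to be followed by a repeat count.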
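As a rough sketch of the numeric form, the commands below split a ten-line file once at line 4 and then every 4 lines. The file name numbers.txt and the prefixes once and every are made up for the illustration; -s (silence the byte counts) and -f (set the output prefix) are standard csplit options.

```shell
# Work in a scratch directory so the output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample: a ten-line file holding the numbers 1 to 10.
seq 1 10 > numbers.txt

# A bare number splits once, just before that line:
# once00 gets lines 1-3 and once01 gets lines 4-10.
csplit -s -f once numbers.txt 4

# Adding a repeat count of 1 repeats the split one more time, 4 lines on:
# every00 gets lines 1-3, every01 gets lines 4-7, every02 gets lines 8-10.
csplit -s -f every numbers.txt 4 '{1}'

# Show how many lines landed in each piece.
wc -l once* every*
```

The first piece is one line shorter than the number given because the split happens before the numbered line, not after it.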

Example: Split the ape-tools log into separate tests

Each test starts with a recording of the contents of the Parameters named-tuple. It can be matched with the pattern:


INFO.*Running\ Parameters

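By default csplit stops after the first match, so this command only splits the log in two, separating the lines that come before the first test from everything else (the pattern is quoted so the shell passes it through unchanged, which also removes the need to escape the space):


csplit apetools.log '/INFO.*Running Parameters/'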

The forward-slashes enclose the pattern and tell csplit to save everything up to, but not including, the matched line in one file and the rest in the next. If you use percent-signs instead, csplit skips the text before the match without saving it, so the output starts at the matched line:


csplit apetools.log '%INFO.*Running Parameters%'
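To see the difference between the two delimiters, here is a small sketch that runs both forms against a made-up four-line log. The file sample.log, its contents, and the prefixes slash and percent are stand-ins invented for the illustration, not the real apetools.log.

```shell
# Work in a scratch directory so the output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for apetools.log: two preamble lines, then one test.
printf '%s\n' \
  'DEBUG setup' \
  'DEBUG more setup' \
  'INFO Running Parameters(test=1)' \
  'INFO result 1' > sample.log

# /pattern/ keeps both sides: slash00 holds the two DEBUG lines,
# slash01 starts at the matched line.
csplit -s -f slash sample.log '/INFO.*Running Parameters/'

# %pattern% discards the preamble: percent00 starts at the matched line
# and no second file is created.
csplit -s -f percent sample.log '%INFO.*Running Parameters%'

ls
```

The percent form is the one to reach for when the lines before the first test are of no interest.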

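So far the example isn't doing anything hugely useful, because all the tests still end up in the same file. To make csplit split at more than the first match, follow the pattern with a repeat count in braces, {<count>}, which repeats the previous pattern that many more times. Since you want all the tests, GNU csplit lets you pass * instead of an exact number to keep repeating until the input runs out (it is quoted so the shell does not treat it as a glob):


csplit apetools.log '/INFO.*Running Parameters/' '{*}'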
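A minimal sketch of the repeat form, again on an invented sample.log with three tests rather than the real log. It assumes GNU csplit, since the {*} repeat is a GNU extension.

```shell
# Work in a scratch directory so the xx* output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for apetools.log: one preamble line and three tests.
printf '%s\n' \
  'DEBUG setup' \
  'INFO Running Parameters(test=1)' \
  'INFO result 1' \
  'INFO Running Parameters(test=2)' \
  'INFO result 2' \
  'INFO Running Parameters(test=3)' \
  'INFO result 3' > sample.log

# Split at every matching line: xx00 holds the preamble,
# xx01 to xx03 hold one test each.
csplit -s sample.log '/INFO.*Running Parameters/' '{*}'

# Print the first line of each piece to show where the splits fell.
for piece in xx*; do
  printf '%s: %s\n' "$piece" "$(head -n 1 "$piece")"
done
```

Note that the first file always holds whatever came before the first match, so the tests themselves start at xx01.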

File-Names

csplit builds each output file name from two parts: a prefix and a suffix. The default prefix is xx and the default suffix is %02d (the printf format for an integer zero-padded to at least 2 digits), so the previous command would produce the files xx00, xx01, and so on. If you want to make them a little more memorable you can change the prefix and the suffix format. The long options used here are GNU ones; POSIX only defines -f for the prefix and -n for the number of digits:


csplit --prefix apetest --suffix-format '%03d.log' apetools.log '/INFO.*Running Parameters/' '{*}'

The output for this would be a series of files:


apetest000.log, apetest001.log, ...
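The naming can be tried out on a small invented log as well. This sketch assumes GNU csplit (for --prefix, --suffix-format and {*}); sample.log and its contents are made up for the illustration.

```shell
# Work in a scratch directory so the output files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for apetools.log: one preamble line and two tests.
printf '%s\n' \
  'DEBUG setup' \
  'INFO Running Parameters(test=1)' \
  'INFO result 1' \
  'INFO Running Parameters(test=2)' \
  'INFO result 2' > sample.log

# Name the pieces apetest000.log, apetest001.log, apetest002.log.
csplit -s --prefix=apetest --suffix-format='%03d.log' \
  sample.log '/INFO.*Running Parameters/' '{*}'

# List each file next to its first line to see which test it holds.
for piece in apetest*.log; do
  printf '%s: %s\n' "$piece" "$(head -n 1 "$piece")"
done
```

Here apetest000.log holds the preamble rather than a test, because the first file always receives the lines before the first match.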