Feature FileOps Advanced - Gadreel/divconq GitHub Wiki

Selecting Files (FUTURE)

Selection can be simple, with one or more files linked directly:

<SelectFiles Name="MyFiles" In="$FileStore">
	<File Path="relative to In" />
	<File Path="relative to In" />
</SelectFiles>

Or it can be more complex.

File selection is relative to a folder or store. For example, assume that a Zip file contains a folder named lists, and that the files in that folder are named like List_2014-09-03.txt. To select all of the List_YYYY-MM-DD.txt files for August and September, use NameFilter patterns (regular expressions):

<ZipFileStore Name="ZipDeposits" RootPath="/deposits.zip" />

<Folder Name="MyLists" Path="/lists" In="$ZipDeposits" />

<SelectFiles Name="MySeptLists" In="$MyLists">
	<NameFilter Pattern="List_2014-08-\d{2}\.txt" />
	<NameFilter Pattern="List_2014-09-\d{2}\.txt" />
</SelectFiles>

<LocalFolder Name="FolderX" Path="/x"  />

<FileOps>
	<Copy Source="$MySeptLists" Dest="$FolderX"  />
</FileOps>

This copies the August and September list files to /x, for example:

/x/List_2014-09-03.txt
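To make the matching rules concrete, here is a minimal Python sketch (not DivConq code) of what the two NameFilter elements do. It assumes that multiple NameFilter elements are alternatives (a name passes if any one matches, as the August/September example implies) and that a pattern must match the whole file name; both are assumptions. The helper select_by_name is hypothetical. It escapes the dot before txt, since an unescaped . matches any character.

```python
import re

# Patterns equivalent to the two NameFilter elements in the example.
# The dot before "txt" is escaped so it only matches a literal ".".
NAME_FILTERS = [r"List_2014-08-\d{2}\.txt", r"List_2014-09-\d{2}\.txt"]


def select_by_name(names, patterns):
    """Keep names that fully match at least one pattern (assumed OR semantics)."""
    compiled = [re.compile(p) for p in patterns]
    return [name for name in names if any(c.fullmatch(name) for c in compiled)]


names = [
    "List_2014-07-31.txt",
    "List_2014-08-15.txt",
    "List_2014-09-03.txt",
    "List_2014-09-03.txt.bak",  # rejected: the whole name must match
]
print(select_by_name(names, NAME_FILTERS))
# ['List_2014-08-15.txt', 'List_2014-09-03.txt']
```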

There are filters for other properties too:

<SelectFiles Name="MySeptLists" In="$MyLists">
	<NameFilter Pattern="List_2014-08-\d{2}\.txt" />
	<SizeFilter GreaterThanOrEqual="4096" LessThan="65536" />
</SelectFiles>

ModifiedFilter and PathFilter are also available. SizeFilter and ModifiedFilter support the attributes Equal, LessThan, GreaterThan, LessThanOrEqual, GreaterThanOrEqual and Not. PathFilter supports Pattern (relative to the file store) and Not.
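The comparison attributes can be pictured with this Python sketch. It assumes that several attributes on one filter must all hold (so GreaterThanOrEqual="4096" LessThan="65536" means 4096 <= size < 65536), and it models Not as a flag that inverts the result. The source does not say how Not is written or combined, so treat that part as a guess. passes is a hypothetical helper.

```python
import operator

# SizeFilter/ModifiedFilter attribute names mapped to comparison operators.
COMPARISONS = {
    "Equal": operator.eq,
    "LessThan": operator.lt,
    "GreaterThan": operator.gt,
    "LessThanOrEqual": operator.le,
    "GreaterThanOrEqual": operator.ge,
}


def passes(value, invert=False, **conditions):
    """True if value meets every condition (assumed AND); invert models Not."""
    result = all(COMPARISONS[name](value, limit) for name, limit in conditions.items())
    return result != invert


# The SizeFilter from the example: 4096 <= size < 65536
print(passes(4096, GreaterThanOrEqual=4096, LessThan=65536))   # True
print(passes(65536, GreaterThanOrEqual=4096, LessThan=65536))  # False
```

The same comparisons work on datetime values, which is how a ModifiedFilter could be modelled.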

You can also emulate wildcard-style filters with RegEx. Note that \w matches only letters, digits and the underscore, and [\w\s] adds whitespace. Neither matches punctuation such as a hyphen or a dot, so use .* where a wildcard must span those characters:

  • for *, use \w* (zero or more characters) or \w+ (at least one)
  • for a broader *, use [\w\s]* or [\w\s]+
  • for ?, use \w; for exactly n characters, use \w{n}
  • for a broader ?, use [\w\s]; for exactly n characters, use [\w\s]{n}
  • in the example below we expect digits, so \d would suffice; \w is the more general matcher

<SelectFiles Name="MyAugSeptLists" In="$MyLists">
	<NameFilter Pattern="List_2014-08-\w+\.txt" />
	<NameFilter Pattern="List_2014-09-\w{2}\.txt" />
</SelectFiles>
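The bullet rules can be applied mechanically. The Python sketch below uses wildcard_to_regex, a hypothetical helper that translates * and ? as described and escapes every other character. The usage shows the caveat that \w does not match a hyphen. For comparison, Python's own fnmatch.translate maps * to .* instead, which matches any character.

```python
import re


def wildcard_to_regex(wildcard, broad=False):
    """Translate * and ? into regex character classes; escape all other characters."""
    any_char = r"[\w\s]" if broad else r"\w"
    parts = []
    for ch in wildcard:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = wildcard_to_regex("List_2014-08-??.txt")
print(bool(re.fullmatch(pattern, "List_2014-08-03.txt")))  # True
# \w does not match "-", so this wildcard cannot span the date:
print(bool(re.fullmatch(wildcard_to_regex("List_*.txt"), "List_2014-08-03.txt")))  # False
```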

Sorting

Occasionally the order in which files are selected matters. When it does, set the Sort attribute, or SortDesc for descending order, to one of Name, Path, Modified or Size:

<SelectFiles Name="MySeptLists" In="$MyLists" Sort="Name|Path|Modified|Size">
	<NameFilter Pattern="List_2014-08-\d{2}\.txt" />
</SelectFiles>

You can also sort on a pattern match. For example, suppose the List files above are not always zero-padded in the month or day, so names might be List_2014-8-5.txt or List_2014-08-05.txt. To sort on the day, match it with \d{1,2} and wrap it in parentheses, (\d{1,2}), to mark the part used for sorting. Because all pattern matches are strings, 5 would then sort after 12, so also set SortAs to the data type:

<SelectFiles Name="MySeptLists" In="$MyLists" Sort="Match" SortAs="Number">
	<NameFilter Pattern="List_2014-0?8-(\d{1,2})\.txt" />
</SelectFiles>
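The string-versus-number problem is easy to reproduce. In this Python sketch, sort_by_match is hypothetical and is not the DivConq implementation. It uses the captured day as the sort key, first as text and then converted with int, which is what SortAs="Number" is described as doing.

```python
import re

# Same pattern as the NameFilter above: optional month padding, day captured.
PATTERN = re.compile(r"List_2014-0?8-(\d{1,2})\.txt")


def sort_by_match(names, as_number=False):
    """Sort names that match PATTERN by capture group 1, as text or as a number."""
    keyed = []
    for name in names:
        match = PATTERN.fullmatch(name)
        if match:
            day = match.group(1)
            keyed.append((int(day) if as_number else day, name))
    return [name for _, name in sorted(keyed)]


names = ["List_2014-8-5.txt", "List_2014-08-12.txt", "List_2014-8-1.txt"]
print(sort_by_match(names))                  # string order: 1, 12, 5
print(sort_by_match(names, as_number=True))  # numeric order: 1, 5, 12
```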

Or sort based on a constructed value:

<SelectFiles Name="MySeptLists" In="$MyLists" Sort="Value" SortValue="{$var1}_%Modified%_{$var2}">
	<NameFilter Pattern="List_2014-08-\d{2}\.txt" />
</SelectFiles>

Placeholders for the constructed value:

  • %Name%
  • %Path% (relative to the SelectFiles In)
  • %ParentName%
  • %ParentPath% (relative to the SelectFiles In)
  • %Modified%
  • %Modified:DateTime-Format%
  • %Size%
  • %Mime%

TODO explain
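One way to picture how a SortValue template becomes a sort key is this Python sketch. expand_sort_value and its arguments are hypothetical. It replaces {$name} variables and %Property% placeholders in a single pass, so substituted text is never re-scanned. It also renders datetimes as ISO 8601 text so they sort chronologically as strings. It does not handle the %Modified:DateTime-Format% form.

```python
import re
from datetime import datetime

# Group 1 captures a {$name} variable; group 2 captures a %Property% placeholder.
TOKEN = re.compile(r"\{\$(\w+)\}|%(\w+)%")


def expand_sort_value(template, props, variables):
    """Build a sort key from a SortValue-style template in one substitution pass."""
    def replace(match):
        if match.group(1) is not None:
            return str(variables[match.group(1)])
        value = props[match.group(2)]
        # ISO 8601 text sorts in chronological order.
        if isinstance(value, datetime):
            return value.isoformat()
        return str(value)

    return TOKEN.sub(replace, template)


key = expand_sort_value(
    "{$var1}_%Modified%_{$var2}",
    {"Name": "List_2014-08-05.txt", "Modified": datetime(2014, 8, 5, 9, 30)},
    {"var1": "deposits", "var2": "v1"},
)
print(key)  # deposits_2014-08-05T09:30:00_v1
```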

Tar Files

Tar files are unsuitable for use as file stores because they are not indexed, but they are still useful for many things. Tar operations are FileOps: there are operations to create and expand tarballs.

Tar

Select a folder and store its contents in a Tar:

<ZipFileStore Name="ZipDeposits" RootPath="/deposits.zip" />
<Folder Name="MyLists" Path="/lists" In="$ZipDeposits" />

<LocalFile Name="MyListsTar" Path="/x/Lists.tar"  />

<FileOps>
	<Tar Source="$MyLists" Dest="$MyListsTar"  />
</FileOps>

Select some files and store them in a Tar:

<Folder Name="MyLists" Path="/lists" In="$ZipDeposits" />

<SelectFiles Name="MyAugSeptLists" In="$MyLists">
	<NameFilter Pattern="List_2014-08-\d{2}\.txt" />
	<NameFilter Pattern="List_2014-09-\d{2}\.txt" />
</SelectFiles>

<LocalFile Name="MySeptTar" Path="/x/List_2014-09.tar"  />

<FileOps>
	<Tar Source="$MyAugSeptLists" Dest="$MySeptTar"  />
</FileOps>
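For comparison, the select-then-tar step can be sketched with Python's standard tarfile module. tar_selected is a hypothetical helper. It assumes that a pattern must match the whole file name and that a file passes if any pattern matches. It also stores entries relative to the selected folder, which is an assumption about how the Tar op handles paths.

```python
import re
import tarfile
from pathlib import Path


def tar_selected(folder, patterns, dest):
    """Write files in folder whose names fully match any pattern into a tar at dest."""
    compiled = [re.compile(p) for p in patterns]
    selected = sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and any(c.fullmatch(path.name) for c in compiled)
    )
    with tarfile.open(dest, "w") as tar:
        for path in selected:
            # arcname stores each entry relative to the selected folder.
            tar.add(str(path), arcname=path.name)
    return [path.name for path in selected]
```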

Tar attributes:

  • Source: file or stream to read
  • Dest: file to write (ignored if Name is set)
  • Name: stream output (see streaming)
  • NameHint: the name to use for the collection of files, if Dest is a folder

Untar

Select a Tar and expand it into a folder:

<LocalFile Name="MyListsTar" Path="/x/List_2014-09.tar"  />
<LocalFolder Name="MyLists" Path="/lists" />

<FileOps>
	<Untar Source="$MyListsTar" Dest="$MyLists"  />
</FileOps>
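Expanding a tar can be sketched with Python's tarfile module. untar is a hypothetical helper. On Python versions that provide extraction filters (3.12+, plus security backports), the "data" filter rejects members with absolute paths or paths that escape the destination. Older versions extract without that check, so use them only with trusted archives.

```python
import tarfile
from pathlib import Path


def untar(source, dest):
    """Extract every member of the tar at source into the folder dest."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(source) as tar:
        if hasattr(tarfile, "data_filter"):
            # Reject absolute paths, ".." escapes and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            # Older Pythons: no filter support, so the archive must be trusted.
            tar.extractall(dest)
        return sorted(tar.getnames())
```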

GZ Files

Gzip

<FileOps>
	<Gzip Source="[a single file]" Dest="[a single file/stream]"  />
</FileOps>

Ungzip

<FileOps>
	<Ungzip Source="[a single file]" Dest="[a single file/stream]"  />
</FileOps>
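Both operations map onto Python's gzip module. This sketch uses the hypothetical names gzip_file and ungzip_file. It streams one file to another with shutil.copyfileobj, so large files are never loaded into memory at once.

```python
import gzip
import shutil


def gzip_file(source, dest):
    """Compress a single file into a gzip file."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)


def ungzip_file(source, dest):
    """Decompress a single gzip file."""
    with gzip.open(source, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
```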

Wget (FUTURE)

Wget

<FileOps>
	<Wget Source="[a single url]" Dest="[a single file/stream]"  />
</FileOps>

Options:

  • Tries
  • Check (vs Get)
  • ConnectTimeout
  • DownloadTimeout
  • User
  • Password

Split and Join

Splitting and Joining are very similar to Tar operations.

Join

Join combines multiple files into a single file by appending each file in turn. Because the files are appended directly one after the other, their order matters, so we can reuse the sorted selection from the Sorting example:

<Folder Name="MyLists" Path="/lists" In="$ZipDeposits" />

<SelectFiles Name="MySeptLists" In="$MyLists" Sort="Match" SortAs="Number">
	<NameFilter Pattern="List_2014-0?9-(\d{1,2})\.txt" />
</SelectFiles>

<LocalFile Name="MySeptTxt" Path="/x/List_2014-09.txt"  />

<FileOps>
	<Join Source="$MySeptLists" Dest="$MySeptTxt"  />
</FileOps>
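Because Join simply appends files, its effect can be sketched in a few lines of Python. join_files is hypothetical. The caller supplies the files already sorted, and passing them in a different order changes the result, which is why the example sorts numerically on the captured day.

```python
import shutil


def join_files(sources, dest):
    """Append each source file to dest, in the order given, byte for byte."""
    with open(dest, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Since nothing is inserted between files, a text file that does not end with a newline runs straight into the first line of the next file.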

Split

Split takes a large file and makes smaller files from it:

<LocalFile Name="MyListsTar" Path="/x/Lists.tar"  />
<TempFolder Name="TempDest" />

<FileOps>
	<Split Source="$MyListsTar" Dest="$TempDest" 
		Size="512MB" Template="List_%seq%.tar" StartAt="1" />
</FileOps>

This splits /x/Lists.tar into 512MB chunks in the temp folder, named:

  • List_1.tar
  • List_2.tar
  • List_3.tar

Encryption

PGP is currently the only supported encryption.

Keyring

TODO: we can define a variable that points to a keyring. DivConq may use more than one keyring to keep each keyring a manageable size.

Encrypt

<FileOps>
	<PGPEncrypt Keyring="$TheirKeyring" Source="[a single file]" Armour="true|false"
		Dest="[a single file]" Recipient="[name in keyring]" />
</FileOps>

Decrypt (FUTURE)

<FileOps>
	<PGPDecrypt Keyring="$MyKeyring" Source="[a single file]" Dest="[a single file]" />
</FileOps>

Sign (FUTURE)

<FileOps>
	<PGPSign Keyring="$MyKeyring" Source="[a single file]" Armour="true|false"
		Dest="[a single file]" Signer="[name in keyring]" />
</FileOps>

PGPSign attributes:

  • Source: file or stream
  • Armour: save in ASCII armour format instead of binary
  • Keyring: which keyring to use
  • Signer: who will sign (must be in the keyring)
  • Dest: save the signature to a target file (or see Name)
  • Name: save the signature to a HeapFile object (see streaming)
  • StreamTo: pass the stream on, untouched, to the next step (see streaming)

Verify (FUTURE)

If verification fails, PGPVerify sets the internal dcScript error flag.

<FileOps>
	<PGPVerify Keyring="$TheirKeyring" Source="[a single file]" SignatureSource="[a single file]" />
</FileOps>

Armor (FUTURE)

Enarmor

Convert binary data to wrapped base64, with a title and optional headers:

<FileOps>
	<Enarmor Source="[a single file]" Dest="[a single file]" Title="PGP MESSAGE">
		<Header Name="Version" Value="GnuPG v1.4.6 (GNU/Linux)" />
	</Enarmor>
</FileOps>

Example output:

-----BEGIN PGP MESSAGE-----
Version: GnuPG v1.4.6 (GNU/Linux)

hQIOA+9JbyriNorZEAf/UuCyC0T80XffXVkmewfrRSvtsYbNSGZFvSr+32jJT2fs
...
...
=YJ4D
-----END PGP MESSAGE-----
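The format shown is OpenPGP ASCII armor (RFC 4880, section 6). The Python below sketches what Enarmor produces; enarmor and crc24 are hypothetical helpers, not DivConq's. It base64-encodes the data, wraps it at 64 characters as GnuPG does, and appends the RFC's CRC-24 checksum as the =XXXX line. The source does not say whether Enarmor emits that checksum for titles other than PGP ones.

```python
import base64


def crc24(data: bytes) -> int:
    """OpenPGP CRC-24 checksum (RFC 4880, section 6.1)."""
    crc = 0xB704CE
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


def enarmor(data: bytes, title: str, headers=(), width: int = 64) -> str:
    """Wrap binary data as ASCII armor: BEGIN line, headers, base64 body, checksum, END."""
    body = base64.b64encode(data).decode("ascii")
    lines = [f"-----BEGIN {title}-----"]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append("")  # a blank line separates the headers from the body
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    checksum = base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")
    lines.append("=" + checksum)
    lines.append(f"-----END {title}-----")
    return "\n".join(lines) + "\n"
```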

Dearmor

Convert wrapped base64 (with a title and optional headers) back into binary:

<FileOps>
	<Dearmor Source="[a single file]" Dest="[a single file]" />
</FileOps>
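Going the other way, here is a minimal Python sketch; dearmor is a hypothetical helper. It drops the BEGIN/END lines and any headers, skips the trailing =XXXX checksum line without verifying it, and base64-decodes the rest. A production version should also verify the CRC-24 checksum defined in RFC 4880.

```python
import base64


def dearmor(text: str) -> bytes:
    """Decode ASCII-armored text to binary; the checksum line is skipped, not verified."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not (lines and lines[0].startswith("-----BEGIN ")
            and lines[-1].startswith("-----END ")):
        raise ValueError("missing BEGIN/END armor lines")
    inner = lines[1:-1]
    # Headers run until the first blank line; the base64 body follows it.
    body = inner[inner.index("") + 1:] if "" in inner else inner
    # A final line starting with "=" is the CRC-24 checksum, not data.
    if body and body[-1].startswith("="):
        body = body[:-1]
    return base64.b64decode("".join(body))
```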

Text Reader (FUTURE)

See streaming...

Create a line-by-line reading stream:

TODO

Read all lines in the source (whether a collection or a single file):

<TextReader Name="TextRdr" Source="$DecryptedLists" />

Filter lines by the value in column 5 (awk-style columns):

<TextReader Name="TextRdr" Source="$DecryptedLists" AwkRecord="[override record separator]" AwkField="[override field separator]">
	<AwkFilter Column="5" Equal="$someValue" />
</TextReader>
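Awk numbers columns from 1 and, by default, splits fields on runs of whitespace. This Python sketch keeps the records whose given column equals a value. awk_filter and its parameters are hypothetical stand-ins for AwkRecord, AwkField and AwkFilter. It handles only literal separators, not awk's regular-expression field separators.

```python
def awk_filter(text, column, equal, record_sep="\n", field_sep=None):
    """Return records whose 1-based column equals the given value."""
    matches = []
    for record in text.split(record_sep):
        # field_sep=None splits on runs of whitespace, like awk's default FS.
        fields = record.split(field_sep)
        if len(fields) >= column and fields[column - 1] == equal:
            matches.append(record)
    return matches


text = "a b c d x\n1 2 3 4 y\nshort line\n p q r s x\n"
print(awk_filter(text, 5, "x"))  # ['a b c d x', ' p q r s x']
```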

[TODO or column matches pattern]

Filter lines by pattern:

<TextReader Name="TextRdr" Source="$DecryptedLists">
	<RegExFilter Contains="copyright \d{4}" CaseInsensitive="True" />
</TextReader>
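As its Contains attribute suggests, RegExFilter appears to test for a match anywhere in the line rather than a whole-line match. In Python that corresponds to re.search, optionally with re.IGNORECASE for CaseInsensitive="True". regex_filter is a hypothetical helper:

```python
import re


def regex_filter(lines, contains, case_insensitive=False):
    """Keep lines that contain a match for the pattern anywhere in the line."""
    flags = re.IGNORECASE if case_insensitive else 0
    pattern = re.compile(contains, flags)
    return [line for line in lines if pattern.search(line)]


lines = ["Copyright 2014 Example Corp", "no notice here", "COPYRIGHT 1999", "copyright 14"]
print(regex_filter(lines, r"copyright \d{4}", case_insensitive=True))
# ['Copyright 2014 Example Corp', 'COPYRIGHT 1999']
```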