Split - ram-jayapalan/filesplit GitHub Wiki

Create an instance

from filesplit.split import Split

split = Split(inputfile: str, outputdir: str)

inputfile (str, Required) - Path to the original file.

outputdir (str, Required) - Output directory path to write the file splits.

Once the instance is created, the following methods are available on it:

bysize(size: int, newline: Optional[bool] = False, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Splits file by size.

Args:

size (int, Required): Maximum size, in bytes, allowed for each split.

newline (bool, Optional): Setting this to True ensures that no split contains an incomplete line. Defaults to False.

includeheader (bool, Optional): Setting this to True will include header in each split. The first line is treated as a header. Defaults to False.

callback (Callable, Optional): Callback function to invoke after each split. The callback should accept two arguments [func (str, int)]: the full path to the split file and the split file size in bytes. Defaults to None.

Returns:

None
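As a minimal sketch of how bysize and its callback fit together, the snippet below collects each split's path and size in a list. The helper names record_split and split_by_size, the placeholder paths, and the 1,000,000-byte limit in the usage comment are illustrative assumptions; only Split, bysize, and their parameters come from the documentation above.

```python
from typing import List, Tuple

# Collected (path, size) pairs, filled in by the callback below.
splits_seen: List[Tuple[str, int]] = []


def record_split(path: str, size: int) -> None:
    """Callback with the documented (str, int) shape: split path, size in bytes."""
    splits_seen.append((path, size))


def split_by_size(inputfile: str, outputdir: str, size: int) -> List[Tuple[str, int]]:
    """Hypothetical wrapper: split `inputfile` into pieces of at most `size` bytes."""
    # Imported lazily so this sketch can be loaded without filesplit installed.
    from filesplit.split import Split

    split = Split(inputfile, outputdir)
    # newline=True avoids incomplete lines; includeheader=True repeats the first line.
    split.bysize(size, newline=True, includeheader=True, callback=record_split)
    return splits_seen


# Example call (placeholder paths; the output directory is assumed to exist):
# split_by_size("data.csv", "out", 1_000_000)
```

Because the callback runs after each split, it is also a convenient place for progress logging or for handing finished pieces to another process.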

bylinecount(linecount: int, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Splits file by line count.

Args:

linecount (int, Required): Maximum number of lines allowed in each split.

includeheader (bool, Optional): Setting this to True will include header in each split. The first line is treated as a header. Defaults to False.

callback (Callable, Optional): Callback function to invoke after each split. The callback should accept two arguments [func (str, int)]: the full path to the split file and the split file size in bytes. Defaults to None.

Returns:

None
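To illustrate what includeheader means for a line-count split, here is an in-memory sketch that uses only the standard library. It is not the library's implementation: the function name chunk_by_linecount is hypothetical, and the sketch assumes the repeated header counts toward linecount, which this page does not state.

```python
from typing import List


def chunk_by_linecount(lines: List[str], linecount: int,
                       includeheader: bool = False) -> List[List[str]]:
    """Group `lines` into chunks of at most `linecount` lines each."""
    if not lines:
        return []
    if includeheader:
        # Assumption: the header occupies one of the `linecount` slots.
        if linecount < 2:
            raise ValueError("linecount must be at least 2 when includeheader is True")
        header, body = [lines[0]], lines[1:]
        step = linecount - 1
    else:
        if linecount < 1:
            raise ValueError("linecount must be at least 1")
        header, body = [], lines
        step = linecount
    chunks = [header + body[i:i + step] for i in range(0, len(body), step)]
    # A file that contains only a header still yields one chunk.
    return chunks or [header]
```

For example, a header plus three data rows with linecount=3 and includeheader=True gives two chunks under this assumption: the header with the first two rows, then the header with the last row.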

The file splits are named as follows: [original_filename]_1.ext, [original_filename]_2.ext, .., [original_filename]_n.ext.
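The naming pattern can be sketched with a small helper. The function split_filename is hypothetical and only mirrors the pattern described above; in particular, treating everything after the last dot as the extension is an assumption, and the library may handle names such as archive.tar.gz differently.

```python
import os


def split_filename(original: str, index: int, delimiter: str = "_") -> str:
    """Build the name of the `index`-th split for `original` (illustrative only)."""
    # Assumption: only the final suffix counts as the extension.
    stem, ext = os.path.splitext(os.path.basename(original))
    return f"{stem}{delimiter}{index}{ext}"
```

With the default underscore delimiter, split_filename("data.csv", 1) produces data_1.csv.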

A manifest file is also created in the output directory to keep track of the file splits. This manifest file is required for the merge operation.

Additionally:
  • The delimiter used in the generated split names can be changed by setting the splitdelimiter property, e.g. split.splitdelimiter='$'. The default is _ (underscore).
  • The manifest file name can be changed by setting the manfilename property, e.g. split.manfilename='man'. The default is manifest.
  • To terminate the process forcefully but safely, set the terminate property to True while the process is running.
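Since terminate has to be set while a split is in progress, one possible pattern is to run the split in a worker thread and set the flag from the main thread. The sketch below is an assumption about usage, not something this page prescribes: run_with_timeout is a hypothetical helper, and it only relies on the object having a bylinecount method and a terminate attribute, as a Split instance does.

```python
import threading


def run_with_timeout(split, linecount: int, timeout: float) -> bool:
    """Run split.bylinecount in a thread; request termination after `timeout` seconds.

    Returns True if the split finished on its own, False if terminate was set.
    """
    worker = threading.Thread(target=split.bylinecount, args=(linecount,))
    worker.start()
    worker.join(timeout)          # Wait up to `timeout` seconds.
    if worker.is_alive():
        split.terminate = True    # Ask the running split to stop.
        worker.join()             # Wait for it to wind down.
        return False
    return True


# Example call (placeholder paths; requires filesplit to be installed):
# from filesplit.split import Split
# split = Split("data.csv", "out")
# split.splitdelimiter = "$"
# split.manfilename = "man"
# run_with_timeout(split, linecount=10_000, timeout=30.0)
```

The same approach would apply to bysize; how quickly the split stops after the flag is set depends on the library.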