predicates.py - haltosan/RA-python-tools GitHub Wiki

Overview

This module is meant to be used with file_analysis.py. It is a collection of predicate functions that help identify different kinds of text.

predicates.py defines the following global variables at the top of the file:

| Variable | Description |
|---|---|
| SHORT_LEN | The length threshold that decides what counts as "short" for short() and long(). |
| MOSTLY_CAPS_THRESHOLD | The minimum percentage of capital letters that mostlyCaps() requires to return True. |
| REGULAR_EXPRESSION | The default regular expression used by regex(), which returns True if the input text matches it. |
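To illustrate how these globals feed the predicates, here is a minimal sketch. The values, the default pattern, and the exact comparisons (whether short() uses `<` or `<=`, whether mostlyCaps() counts only letters and treats the threshold as a fraction, whether regex() matches anywhere in the text) are all assumptions, not taken from predicates.py.

```python
import re

# Assumed example values; the real defaults in predicates.py may differ.
SHORT_LEN = 5
MOSTLY_CAPS_THRESHOLD = 0.7  # assumed to be a fraction from 0.0 to 1.0
REGULAR_EXPRESSION = r"^\d+$"  # hypothetical default pattern

def short(text: str) -> bool:
    # Assumed: "short" means strictly fewer than SHORT_LEN characters
    return len(text) < SHORT_LEN

def long(text: str) -> bool:
    # Assumed complement of short(): SHORT_LEN characters or more
    return len(text) >= SHORT_LEN

def mostlyCaps(text: str) -> bool:
    # Share of alphabetic characters that are uppercase; no letters means False
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    caps = sum(1 for c in letters if c.isupper())
    return caps / len(letters) >= MOSTLY_CAPS_THRESHOLD

def regex(text: str) -> bool:
    # True if REGULAR_EXPRESSION matches somewhere in the text
    return re.search(REGULAR_EXPRESSION, text) is not None
```

Because each function reads its global at call time, assigning a new value to the module attribute (for example `predicates.SHORT_LEN = 10` after `import predicates`) would change the behavior of every predicate that uses it, assuming the real functions read the globals the same way.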

What predicate functions are

A predicate function is any function that takes a single string argument and returns True or False. Cleaning functions use predicates to flag strings for removal. The functions here are named descriptively and should be simple enough to understand without extensive documentation.
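As a rough illustration of how a cleaning function can use a predicate to flag strings for removal, here is a sketch built on a hypothetical `remove_flagged` helper. It is not the actual cleanFile() from file_analysis.py, and the `blank` shown here is an assumed implementation of the listed blank predicate.

```python
from typing import Callable, Iterable, List

def remove_flagged(lines: Iterable[str], predicate: Callable[[str], bool]) -> List[str]:
    # Hypothetical cleaner: keep only the lines the predicate does NOT flag
    return [line for line in lines if not predicate(line)]

def blank(text: str) -> bool:
    # Assumed behavior: flag empty or whitespace-only strings
    return text.strip() == ""

lines = ["Smith, John", "   ", "", "Page 4"]
print(remove_flagged(lines, blank))  # ['Smith, John', 'Page 4']
```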

When writing a predicate function, make sure to follow these two guidelines:

  • takes exactly one string argument
  • returns only True or False

In other words, it should match this signature: def f(arg: str) -> bool
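A minimal sketch of that shape in code, assuming you want a static checker such as mypy to enforce it: the `Predicate` alias and `hasDigit` are hypothetical names, and `alpha` is a guess at the listed function's behavior, not the code in predicates.py. `hasDigit` shows the second guideline in practice: re.search returns a match object or None, so the predicate converts that result to a bool.

```python
import re
from typing import Callable, List

# Hypothetical alias for the predicate shape: one str in, one bool out
Predicate = Callable[[str], bool]

def alpha(text: str) -> bool:
    # Sketch: True when the text is non-empty and made up only of letters
    return text.isalpha()

def hasDigit(text: str) -> bool:
    # Hypothetical predicate. re.search returns a Match object or None,
    # so compare against None to return a real bool rather than the match
    return re.search(r"\d", text) is not None

# Both functions fit the alias, so a type checker accepts this list
checks: List[Predicate] = [alpha, hasDigit]
```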

If a predicate needs more than one parameter, store the extra value in a global variable, as short(), long(), and mostlyCaps() do. Alternatively, give the extra parameter a default value and/or use partial function application when passing the predicate. For example:

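```python
# A predicate with an extra parameter that has a default value
def a(text: str, size: int = 3) -> bool:
    ...

# Wrap it in a lambda so cleanFile receives a one-argument predicate
cleanFile(file, lambda txt: a(txt, size=5))
```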
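The same thing can be done with functools.partial from the standard library, which fixes `size` without writing a lambda. Since the example above leaves the body of `a` as `...`, this sketch gives it a hypothetical body (flag text shorter than `size` characters). cleanFile is only mentioned in a comment so the sketch runs without file_analysis.py.

```python
from functools import partial

def a(text: str, size: int = 3) -> bool:
    # Hypothetical body: flag text shorter than `size` characters
    return len(text) < size

# Fix size=5, leaving a one-argument predicate that fits def f(arg: str) -> bool
shorter_than_5 = partial(a, size=5)

print(shorter_than_5("abc"))     # True
print(shorter_than_5("abcdef"))  # False
# It can then be passed like any other predicate, e.g. cleanFile(file, shorter_than_5)
```

A lambda and partial behave the same here; partial's repr names the wrapped function, which can make debugging a little easier.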

List of functions

  • lower
  • upper
  • blank
  • alpha
  • notAlpha
  • alphaSpace
  • alphaSpaceChar
  • upperWord
  • hasPage
  • name
  • nameChar
  • address
  • addressChar
  • short
  • long
  • mostlyCaps
  • indented
  • regex
  • printableChar
  • printableWord
  • space
  • repeatedLetters
  • number
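For a sense of how small these functions typically are, here are sketches of a few of them. Only the names come from the list; the behavior is inferred from each name and is an assumption, so check predicates.py before relying on any detail (for example, the run length of 3 in repeatedLetters is a guess).

```python
import re

# Hypothetical sketches: only the names come from the list above; the
# behavior is guessed from the names and may differ from predicates.py.

def lower(text: str) -> bool:
    # True if the text has at least one cased letter and none are uppercase
    return text.islower()

def upper(text: str) -> bool:
    # True if the text has at least one cased letter and none are lowercase
    return text.isupper()

def number(text: str) -> bool:
    # True if the text, ignoring surrounding whitespace, is all digits, e.g. "1923"
    return text.strip().isdigit()

def indented(text: str) -> bool:
    # True if the line starts with a space or a tab
    return text[:1] in (" ", "\t")

def repeatedLetters(text: str) -> bool:
    # True if any letter appears 3 or more times in a row (assumed run length)
    return re.search(r"([A-Za-z])\1\1", text) is not None
```

Each sketch takes a single string and returns a bool, so any of them can be passed directly to a cleaning function.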