predicates.py - haltosan/RA-python-tools GitHub Wiki
Overview
This module is meant to be used with file_analysis.py. It is a collection of predicate functions that help identify particular kinds of text.
The following global variables are defined at the top of the file:
variable | description |
---|---|
SHORT_LEN | The length threshold that defines "short" for short() and long() |
MOSTLY_CAPS_THRESHOLD | The minimum percentage of capital letters that mostlyCaps() requires to return True |
REGULAR_EXPRESSION | The default regular expression used by regex(); regex() returns True if the input text matches it |
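The sketch below shows one way these globals could parameterize the predicates. The threshold values, the default pattern, and the exact comparisons (for example, whether short() uses `<` or `<=`, and whether mostlyCaps() counts only letters) are assumptions for illustration, not the definitions in predicates.py.

```python
import re

# Hypothetical values; the real settings in predicates.py may differ.
SHORT_LEN = 10
MOSTLY_CAPS_THRESHOLD = 0.5  # assumed here to be a fraction in [0, 1]
REGULAR_EXPRESSION = r"^\d+$"  # hypothetical default: digits only


def short(text: str) -> bool:
    # Assumed: "short" means strictly fewer than SHORT_LEN characters
    return len(text) < SHORT_LEN


def long(text: str) -> bool:
    # Assumed: the complement of short()
    return len(text) >= SHORT_LEN


def mostlyCaps(text: str) -> bool:
    # Assumed: share of uppercase letters among all letters in the text
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    caps = sum(1 for c in letters if c.isupper())
    return caps / len(letters) >= MOSTLY_CAPS_THRESHOLD


def regex(text: str) -> bool:
    # True if REGULAR_EXPRESSION matches somewhere in the text (re.search)
    return re.search(REGULAR_EXPRESSION, text) is not None
```

Because each function reads its global at call time, changing `SHORT_LEN` (or the other globals) changes the behavior of every later call without changing the one-argument signature.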
What predicate functions are
A predicate function is any function that takes a single string argument and returns True or False. Cleaning functions use predicates to flag strings for removal. The predicates here are named descriptively and should be easy to understand without extensive documentation.
When writing a predicate function, follow these two guidelines:
- take exactly one string argument
- return only True or False
In other words, it should match the signature `def f(arg: str) -> bool:`.
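As an illustration, here is a predicate that follows both guidelines, together with a minimal cleaning loop that drops every line the predicate flags. The `Predicate` alias and the `remove_flagged` helper are hypothetical and are not the cleanFile() function from file_analysis.py; the behavior of `blank` (empty or whitespace-only) is inferred from its name.

```python
from typing import Callable, Iterable, List

# A predicate takes one string and returns a bool
Predicate = Callable[[str], bool]


def blank(text: str) -> bool:
    # Assumed semantics: True for empty or whitespace-only strings
    return text.strip() == ""


def remove_flagged(lines: Iterable[str], pred: Predicate) -> List[str]:
    # Hypothetical helper: keep only the lines the predicate does NOT flag
    return [line for line in lines if not pred(line)]


print(remove_flagged(["Title", "   ", "", "Body text"], blank))
# ['Title', 'Body text']
```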
If a function needs more than one parameter, store the extra value in a global variable, as short(), long(), and mostlyCaps() do. Alternatively, give the extra parameter a default value and/or use partial function application when passing the predicate. For example:
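```python
def a(text: str, size: int = 3) -> bool:
    ...  # predicate body goes here

# The lambda fixes size=5, so cleanFile() receives a one-argument predicate
cleanFile(file, lambda txt: a(txt, size=5))
```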
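The lambda is one way to bind the extra parameter. The standard library's `functools.partial` does the same thing. In this sketch the predicate `a` and its `size` parameter come from the example above, but the body (flag text shorter than `size` characters) is assumed for illustration.

```python
from functools import partial


def a(text: str, size: int = 3) -> bool:
    # Hypothetical body: flag text shorter than `size` characters
    return len(text) < size


# Bind size=5, leaving a one-argument predicate
a5 = partial(a, size=5)

print(a5("abcd"))    # True  (4 < 5)
print(a5("abcdef"))  # False (6 >= 5)
```

`a5` can then be passed wherever a one-argument predicate is expected, e.g. `cleanFile(file, a5)`.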
List of functions
- lower
- upper
- blank
- alpha
- notAlpha
- alphaSpace
- alphaSpaceChar
- upperWord
- hasPage
- name
- nameChar
- address
- addressChar
- short
- long
- mostlyCaps
- indented
- regex
- printableChar
- printableWord
- space
- repeatedLetters
- number
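The list gives only names. Going by those names, a few of the simpler predicates could plausibly be written as below. These bodies are guesses based on the names alone, and the actual definitions in predicates.py may treat edge cases (empty strings, punctuation, non-ASCII letters) differently.

```python
def lower(text: str) -> bool:
    # Guess: all cased characters are lowercase (str.islower needs at least one)
    return text.islower()


def upper(text: str) -> bool:
    # Guess: all cased characters are uppercase (str.isupper needs at least one)
    return text.isupper()


def alpha(text: str) -> bool:
    # Guess: non-empty and made up only of letters
    return text.isalpha()


def notAlpha(text: str) -> bool:
    # Guess: contains no letters at all
    return not any(c.isalpha() for c in text)


def alphaSpace(text: str) -> bool:
    # Guess: at least one letter, and only letters and spaces
    return any(c.isalpha() for c in text) and all(
        c.isalpha() or c == " " for c in text
    )


def number(text: str) -> bool:
    # Guess: after trimming whitespace, the text is a run of digits
    return text.strip().isdigit()
```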