ReplaceMetadata - JimmXinu/FanFicFare GitHub Wiki
## Use regular expressions to find and replace (or remove) metadata.
## For example, you could change Sci-Fi=>SF, remove *-Centered tags,
## etc. See http://docs.python.org/library/re.html (look for re.sub)
## for regexp details.
## Make sure to keep at least one space at the start of each line and
## to escape % to %%, if used.
## Two, three or five part lines. Two-part lines affect everything.
## Three-part lines affect only the listed key(s).
## *Five*-part lines take effect only when the trailing conditional key=>regexp matches.
## metakey[,metakey]=>pattern=>replacement[&&conditionalkey=>regexp]
## Note that if metakey == conditionalkey the conditional is ignored.
## You can use \s in the replacement to add explicit spaces. (The config parser
## tends to discard trailing spaces.)
## replace_metadata <entry>_LIST options: FanFicFare replace_metadata lines
## operate on individual list items for list entries. But if you
## want to do a replacement on the joined string for the whole list,
## you can by using <entry>_LIST. Example, if you added
## calibre_author: calibre_author_LIST=>^(.{,100}).*$=>\1
replace_metadata:
 genre,category=>Sci-Fi=>SF
 Puella Magi Madoka Magica.*=>Madoka
 Comedy=>Humor
 Crossover: (.*)=>\1
 title=>(.*)Great(.*)=>\1Moderate\2
 .*-Centered=>
 characters=>Sam W\.=>Sam Witwicky&&category=>Transformers
 characters=>Sam W\.=>Sam Winchester&&category=>Supernatural
replace_metadata lines can take one of three different forms.
The first and simplest is: pattern=>replacement
All metadata items that match the regexp pattern will be replaced using the standard Python regexp library, like so: value = re.sub(pattern,replacement,value)
So, for example, if you are offended by the word Furbie and never want to see it anywhere in your metadata, you do:
Furbie=>F*rbie
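As a rough illustration of what that two-part line does, here is a minimal Python sketch that splits the line on => and feeds the parts to re.sub. The metadata values are made up for the example, and this is not FanFicFare's actual code.

```python
import re

# The wiki's own example line, split into its two parts.
pattern, replacement = "Furbie=>F*rbie".split("=>")

# Hypothetical metadata values, invented for this illustration.
values = ["My Furbie Story", "Furbie", "Unrelated"]

# Same call the wiki quotes: value = re.sub(pattern, replacement, value)
cleaned = [re.sub(pattern, replacement, value) for value in values]
print(cleaned)  # ['My F*rbie Story', 'F*rbie', 'Unrelated']
```

Note that * is only special in the pattern; in the replacement it is an ordinary character.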
The second form is: metakey[,metakey]=>pattern=>replacement
The only difference is that you limit which metadata items the line applies to by including one or more metakeys.
A metakey is one of the metadata items defined by FanFicFare (category, genre, characters, etc.) or added by extra_valid_entries.
So, for example, if you want Humor converted to Comedy in genre and category:
genre,category=>Humor=>Comedy
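The difference between the two-part and three-part forms can be sketched in Python as follows. The apply_line helper and the story dictionary are hypothetical names invented for this example; the sketch assumes the pattern itself contains no =>, and it is not how FanFicFare parses lines internally.

```python
import re

def apply_line(metadata, line):
    """Apply one two- or three-part line to a dict of metadata lists (sketch only)."""
    parts = line.split("=>")
    if len(parts) == 3:
        # Three-part form: only the listed metakeys are touched.
        keys = parts[0].split(",")
        pattern, replacement = parts[1], parts[2]
    else:
        # Two-part form: every metadata item is touched.
        keys = None
        pattern, replacement = parts
    result = {}
    for key, items in metadata.items():
        if keys is None or key in keys:
            items = [re.sub(pattern, replacement, item) for item in items]
        result[key] = items
    return result

# Hypothetical story metadata.
story = {"genre": ["Humor", "Drama"], "category": ["Humor"], "title": ["Humor Me"]}

limited = apply_line(story, "genre,category=>Humor=>Comedy")
everywhere = apply_line(story, "Humor=>Comedy")
print(limited["title"])     # ['Humor Me']
print(everywhere["title"])  # ['Comedy Me']
```

The title is left alone by the three-part line, which is usually what you want when a word is common outside the tags you are cleaning up.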
The third form is:
metakey[,metakey]=>pattern=>replacement&&conditionalkey=>condregexp
This essentially says, "For metadata items metakey, if metadata item conditionalkey matches condregexp, replace pattern with replacement."
Now there are three conditions that must be true before the replacement is done:
- It must be a metadata item metakey,
- the value must match pattern, and
- the value of metadata item conditionalkey must match condregexp.
characters=>Sam W\.=>Sam Witwicky&&category=>Transformers
characters=>Sam W\.=>Sam Winchester&&category=>Supernatural
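The two Sam W. lines above can be walked through with a small Python sketch. The apply_conditional helper and the story dictionary are hypothetical, and the sketch assumes the conditional passes when re.search finds condregexp in any item of the conditional key; FanFicFare's real matching may differ in detail.

```python
import re

def apply_conditional(metadata, line):
    """Apply one metakey=>pattern=>replacement&&condkey=>condregexp line (sketch only)."""
    rule, condition = line.split("&&")
    keys, pattern, replacement = rule.split("=>")
    cond_key, cond_regexp = condition.split("=>")
    # Assumption: the conditional holds if any item of cond_key matches.
    cond_ok = any(re.search(cond_regexp, item) for item in metadata.get(cond_key, []))
    result = dict(metadata)
    if cond_ok:
        for key in keys.split(","):
            if key in metadata:
                result[key] = [re.sub(pattern, replacement, item) for item in metadata[key]]
    return result

# Hypothetical story metadata.
story = {"characters": ["Sam W.", "Dean W."], "category": ["Supernatural"]}

lines = [
    r"characters=>Sam W\.=>Sam Witwicky&&category=>Transformers",
    r"characters=>Sam W\.=>Sam Winchester&&category=>Supernatural",
]
for line in lines:
    story = apply_conditional(story, line)
print(story["characters"])  # ['Sam Winchester', 'Dean W.']
```

The Transformers line is skipped because its conditional fails, so only the Supernatural line rewrites the character name.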
Starting 2019Jan, replace_metadata conditionals can use ==, =!, != and !~ instead of => (which is equivalent to =~).
Also as of 2019Jan, conditional checks are done against each item in metadata lists. Before that, they checked the string made from the list. So &&category==Transformers will now match a story with category = The Lord of the Rings, Transformers whereas before it would not. If you still want to use the whole-string method, you can use <entry>_LIST, e.g., category_LIST to get The Lord of the Rings, Transformers as a single string.
You can use conditionals_use_lists:false to get the old behavior.
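To see why the per-item check changes the outcome, here is a short Python comparison of the two approaches. It assumes the joined string uses ", " as the separator, as in the example above, and it treats == as plain string equality; it is an illustration, not FanFicFare's code.

```python
# Category list from the example above.
categories = ["The Lord of the Rings", "Transformers"]

# Current behavior (as described): the conditional is checked against each item.
per_item = any(item == "Transformers" for item in categories)

# Old behavior, or category_LIST: the conditional sees one joined string.
joined = ", ".join(categories)
whole_string = joined == "Transformers"

print(per_item)      # True
print(joined)        # The Lord of the Rings, Transformers
print(whole_string)  # False
```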
Replacements are applied in order--so plan accordingly.
Another tip: If you want to be able to test your patterns without hitting your favorite stories again and again, you can use the fake test site and the extracharacters, extracategories, etc parameters. test1.com URLs will generate stories, but not go out to the network.
URL: http://test1.com?sid=12345
[test1.com]
extracharacters:Reginald Smythe-Smythe,Mokona,Harry P.
replace_metadata:
 characters=>Harry P\.=>Harry Potter
Do you understand the regular expression you're trying to use? This is a good regex quick start guide.
Using a regex tester such as regex101 can also help with figuring out how to match specific patterns. Note that FanFicFare uses Python as its regexp engine -- while the basics are generally the same, other languages have some subtle differences that may affect things.
A couple of common 'gotcha's:
If regex special characters appear in the string you're trying to match, you need to escape them by putting a \ before them. Those characters are: [\^$.|?*+()
So for example, if you want to match 'Harry P.' but not 'Harry Pa', your regex should be Harry P\., not just Harry P. because the . is special and matches any character.
Another is '&' (and less commonly '<' and '>'). Because FanFicFare keeps things internally as (X)HTML valid strings, while it looks like that string is just 'This & That', to match it you need the pattern This &amp; That.
Similarly, use &lt; and &gt; instead of '<' and '>'.
For the same reason (FanFicFare keeps things internally as (X)HTML valid strings), when you want & in a replacement you should write &amp; as well, or there may be subtle problems with your files down the road. Example:
replace_metadata:
 category=>Starsky and Hutch=>Starsky &amp; Hutch
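Both gotchas are easy to check in a Python shell before editing your ini file. The sketch below uses only the standard re and html modules; html.escape stands in for however FanFicFare produces its (X)HTML-valid strings internally, which is an assumption made for illustration.

```python
import html
import re

# Gotcha 1: an unescaped '.' matches any character.
unescaped = re.compile(r"Harry P.")
escaped = re.compile(r"Harry P\.")
print(bool(unescaped.fullmatch("Harry Pa")))  # True  (probably not what you want)
print(bool(escaped.fullmatch("Harry Pa")))    # False
print(bool(escaped.fullmatch("Harry P.")))    # True

# Gotcha 2: the stored value is (X)HTML-escaped, so '&' is really '&amp;'.
stored = html.escape("This & That")
print(stored)                                      # This &amp; That
print(bool(re.search(r"This & That", stored)))     # False
print(bool(re.search(r"This &amp; That", stored))) # True
```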
Regular expressions are case-sensitive. There are two ways around this. One is to put an [Aa] wherever you may expect varying capitalization:
## Matches 'mass effect big bang' and 'Mass Effect Big Bang'
collections=>[Mm]ass [Ee]ffect [Bb]ig [Bb]ang=>Mass Effect Big Bang
The other is to add (?i) (for insensitive) to the beginning of your match. This is a regex flag that ignores casing.
## Matches 'mass effect big bang' and 'Mass Effect Big Bang' but also 'MASS EFFECT BIG BANG'
collections=>(?i)Mass Effect Big Bang=>Mass Effect Big Bang
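The two approaches can be compared side by side in Python. The list of names is made up for the example; the patterns are the ones from the lines above.

```python
import re

# Hypothetical collection names with varying capitalization.
names = ["mass effect big bang", "Mass Effect Big Bang", "MASS EFFECT BIG BANG"]

bracketed = r"[Mm]ass [Ee]ffect [Bb]ig [Bb]ang"
flagged = r"(?i)Mass Effect Big Bang"
target = "Mass Effect Big Bang"

with_brackets = [re.sub(bracketed, target, name) for name in names]
with_flag = [re.sub(flagged, target, name) for name in names]

print(with_brackets)
# ['Mass Effect Big Bang', 'Mass Effect Big Bang', 'MASS EFFECT BIG BANG']
print(with_flag)
# ['Mass Effect Big Bang', 'Mass Effect Big Bang', 'Mass Effect Big Bang']
```

The bracketed pattern only covers the capitalizations you spelled out, so the all-caps name slips through; the (?i) flag catches every variant.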
Note that entry names are always case-sensitive. Your replacement will fail if you use Category=> instead of category=>.
Another wrinkle is that you can split one list item into multiple list entries by using \, in the replacement string. Or add new items depending on what's already there. Example:
replace_metadata:
 category=>Bitextual=>M/M\,F/M
If category previously contained ['A', 'Bitextual', 'Z'] it now contains ['A', 'M/M', 'F/M', 'Z']. (Remember lists will usually be de-duped and alphabetically ordered automatically.)
Also note that each split item has the replacements run on it, too. Example:
replace_metadata:
 category=>Bitextual=>M/M\,F/M
 category=>^M/M$=>Gay
 category=>^F/M$=>Hetro
The category list ['A', 'Bitextual', 'Z'] now contains ['A', 'Gay', 'Hetro', 'Z'].
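The combination of in-order replacement and \, splitting can be sketched in Python like this. The apply_rules helper and the SPLIT marker are hypothetical, invented for this illustration; the sketch assumes that a split happens wherever \, appears in the replacement and that the final list is de-duplicated and sorted. FanFicFare's own implementation may work differently.

```python
import re

# Private marker standing in for "\," (an assumption of this sketch).
SPLIT = "\x1f"

def apply_rules(items, rules):
    """Apply (pattern, replacement) rules in order, splitting items on '\\,'."""
    for pattern, replacement in rules:
        template = replacement.replace(r"\,", SPLIT)
        next_items = []
        for item in items:
            # Each piece produced by a split is seen by all later rules.
            next_items.extend(re.sub(pattern, template, item).split(SPLIT))
        items = next_items
    # Assumption: lists end up de-duplicated and alphabetically ordered.
    return sorted(set(items))

rules = [
    (r"Bitextual", r"M/M\,F/M"),
    (r"^M/M$", "Gay"),
    (r"^F/M$", "Hetro"),
]
result = apply_rules(["A", "Bitextual", "Z"], rules)
print(result)  # ['A', 'Gay', 'Hetro', 'Z']
```

Swapping the order of the rules changes the result: if the two ^...$ rules came first, they would find nothing to match and the list would keep M/M and F/M.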