RegexSearch - robmcmullen/peppy GitHub Wiki
Only very simple regular expression matching is possible in the StyledTextCtrl (STC). For full Python-style regular expressions, the text must be copied out of the STC and matched as a Python string. For example, the initial proof-of-concept implementation converted everything from the starting position to the end of the STC into a Python string and matched the regex against that:
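```python
# Copy everything from the search start to the end of the document.
text = self.stc.GetTextRange(start, self.stc.GetTextLength())
index = 0
match = self.regex.search(text, index)
if match:
    # STC positions count bytes of the UTF-8 text, but match offsets count
    # characters, so convert them through the encoded length.
    pos = start + len(text[:match.start(0)].encode('utf-8'))
    count = len(text[match.start(0):match.end(0)].encode('utf-8'))
    self.highlightSelection(pos, count)
```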
This is slow, however, because the STC contents have to be converted to a Python string on every match attempt, and with a very long file the slowdown is noticeable. A more efficient approach is needed.
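The proof-of-concept lookup can be tried outside wxPython with a small sketch. Here a UTF-8 bytes buffer stands in for the STC (an assumption for illustration; the real code reads the STC through GetTextRange), and the hypothetical function find_from decodes everything from start on every call, which is exactly the cost described above.

```python
import re


def find_from(buf, start, regex):
    """Hypothetical stand-alone version of the proof-of-concept search.

    buf is UTF-8 bytes standing in for the STC, and start is a byte offset
    that must fall on a character boundary. Returns (byte_pos, byte_count)
    for the first match, or None.
    """
    # The expensive part: decode everything from start on *every* call.
    text = buf[start:].decode('utf-8')
    match = regex.search(text, 0)
    if not match:
        return None
    # Match offsets count characters; STC positions count bytes.
    pos = start + len(text[:match.start(0)].encode('utf-8'))
    count = len(text[match.start(0):match.end(0)].encode('utf-8'))
    return pos, count


buf = "héllo wörld".encode('utf-8')
print(find_from(buf, 0, re.compile("w.rld")))   # (7, 6)
print(find_from(buf, 10, re.compile("w.rld")))  # None
```

The character offset of "wörld" is 6, but its byte offset is 7 because "é" takes two bytes in UTF-8; likewise the five-character match is six bytes long.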
Instead, a shadow copy of the STC text can be kept. doFindNext can't manage this copy on its own, because it has no way of knowing whether to fetch a fresh copy or reuse the existing one. Every time setFlags is called, however, the user is guaranteed to have changed either the text or the search string. A change to the search string alone doesn't matter here, but a change to the text does, so that is the event to hook into.
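The caching idea can be sketched independently of peppy. ShadowCache and text_changed are hypothetical names: text_changed plays the role of the hook the page proposes (setFlags in peppy), and the conversions counter exists only to show how often the expensive decode actually happens.

```python
import re


class ShadowCache:
    """Hypothetical helper: keeps a decoded copy of a UTF-8 buffer and only
    rebuilds it after being told that the text changed."""

    def __init__(self, get_bytes):
        self.get_bytes = get_bytes  # callable returning the current buffer
        self.shadow = None
        self.conversions = 0        # counts the expensive decodes

    def text_changed(self):
        # The hook: drop the copy so the next search rebuilds it.
        self.shadow = None

    def get(self):
        if self.shadow is None:
            self.shadow = self.get_bytes().decode('utf-8')
            self.conversions += 1
        return self.shadow


doc = {"text": "abc abc abc".encode('utf-8')}
cache = ShadowCache(lambda: doc["text"])
regex = re.compile("abc")
pos = 0
for attempt in range(3):
    # Three "find next" steps reuse the same shadow copy.
    match = regex.search(cache.get(), pos)
    pos = match.end()
print(cache.conversions)  # 1
doc["text"] = "xyz".encode('utf-8')
cache.text_changed()
print(cache.get(), cache.conversions)  # xyz 2
```

Searching the search string changes nothing in the cache; only an explicit text_changed call forces a new conversion.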
The shadow copy keeps a pristine version of the text as it was before any replacements were made. Once replacements are made, however, the length of the text in the STC changes relative to the shadow copy. So we need to maintain a pair of equivalent indexes that relate the current position in the shadow to the current position in the STC. Here's the shadow index compared to the STC index after replacing the first "yy" with "ZZZZ":
```
shadow:       xxxxxxxxxxxyyxxxxxxyyxxxxyyxxxxyyxxxxyyxxxxyyxxxx
shadow index:              ^
stc:          xxxxxxxxxxxZZZZxxxxxxyyxxxxyyxxxxyyxxxxyyxxxxyyxxxx
stc index:                   ^
```
After several replacements:
```
shadow:       xxxxxxxxxxxyyxxxxxxyyxxxxyyxxxxyyxxxxyyxxxxyyxxxx
shadow index:                                  ^
stc:          xxxxxxxxxxxZZZZxxxxxxZZZZxxxxZZZZxxxxZZZZxxxxyyxxxxyyxxxx
stc index:                                             ^
```
This is further complicated by the fact that Unicode characters can occupy more than one byte: Python strings count each character as one position, while the STC counts positions in bytes.
So after every replacement the STC length changes, and the equivalent positions in both the shadow copy and the STC must be updated.
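The bookkeeping can be checked with a small standalone sketch. The function replace_n is hypothetical, a bytearray stands in for the STC buffer, and the replacement is inserted literally (no regex backreferences). It replaces matches one at a time and reports the final shadow index (in characters) and STC index (in bytes).

```python
import re


def replace_n(shadow, pattern, replacement, n):
    """Hypothetical helper: replace the first n matches one at a time and
    return (shadow index, STC index, resulting STC text).

    shadow is the pristine text; a bytearray stands in for the STC buffer,
    which is edited in place and changes length as replacements are made.
    """
    regex = re.compile(pattern)
    stc = bytearray(shadow.encode('utf-8'))
    shadow_pos = 0  # position in the shadow, in characters
    stc_pos = 0     # equivalent position in the STC, in bytes
    new = replacement.encode('utf-8')
    for _ in range(n):
        match = regex.search(shadow, shadow_pos)
        if match is None:
            break
        # Text between the previous match and this one is unchanged in the
        # STC, so advance over it in bytes.
        start = stc_pos + len(shadow[shadow_pos:match.start()].encode('utf-8'))
        old_len = len(match.group().encode('utf-8'))
        stc[start:start + old_len] = new
        shadow_pos = match.end()
        stc_pos = start + len(new)
    return shadow_pos, stc_pos, stc.decode('utf-8')


shadow = "xxxxxxxxxxxyyxxxxxxyyxxxxyyxxxxyyxxxxyyxxxxyyxxxx"
print(replace_n(shadow, "yy", "ZZZZ", 1)[:2])    # (13, 15)
print(replace_n(shadow, "yy", "ZZZZ", 4)[:2])    # (33, 41)
print(replace_n("éyyéyy", "yy", "ZZZZ", 2)[:2])  # (6, 12)
```

With the strings from the diagrams, one replacement leaves the shadow index at 13 and the STC index at 15, and four replacements leave them at 33 and 41: each "yy" to "ZZZZ" swap widens the gap by two. In the last call the gap also includes one extra byte for each "é".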
Here's the relevant code from FindRegexService in find_replace.py:
```python
import re

# Methods of FindRegexService. _, dprint and ReplacementError are defined
# elsewhere in peppy.

def verifyShadow(self, start=-1, incremental=False):
    # Rebuild the shadow copy if there isn't one yet, or if the caller
    # supplies an explicit starting position.
    if self.shadow is None or start >= 0:
        if start < 0:
            # Incremental search starts at the beginning of the current
            # selection; a normal find-next starts at its end.
            sel = self.stc.GetSelection()
            if incremental:
                start = min(sel)
            else:
                start = max(sel)
        self.shadow = self.stc.GetTextRange(start, self.stc.GetTextLength())
        # Index 0 of the shadow corresponds to byte offset start in the stc.
        self.shadow_equiv_pos = 0
        self.stc_equiv_start = start
        self.stc_equiv_pos = start

def doFindNext(self, start=-1, incremental=False):
    if not self.settings.find:
        return None, None
    self.getFlags()
    if self.regex is None:
        return _("Incomplete regex"), None
    self.verifyShadow(start, incremental)
    match = self.regex.search(self.shadow, self.shadow_equiv_pos)
    if match:
        # Because unicode characters are stored as utf-8 in the stc and the
        # positions in the stc correspond to the raw bytes, not the number
        # of unicode characters, we have to find out the offset to the
        # unicode chars in terms of raw bytes.
        pos = self.stc_equiv_pos + len(self.shadow[self.shadow_equiv_pos:match.start(0)].encode('utf-8'))
        count = len(self.shadow[match.start(0):match.end(0)].encode('utf-8'))
        self.stc_equiv_start = pos
        self.stc_equiv_pos = pos + count
        self.shadow_equiv_pos = match.end(0)
        dprint("match=%s shadow: (%d-%d) equiv=%d, stc: (%d-%d) equiv=%d" %
               (match.group(0), match.start(0), match.end(0),
                self.shadow_equiv_pos, pos, pos + count, self.stc_equiv_pos))
        self.stc.SetSelection(self.stc_equiv_start, self.stc_equiv_pos)
    else:
        pos = -1
    return pos, start

def doReplace(self):
    self.verifyShadow()
    # We assume that doFindNext has been called, setting up the equivalent
    # start and end positions.
    replacing = self.stc.GetTextRange(self.stc_equiv_start, self.stc_equiv_pos)
    try:
        replacement = self.regex.sub(self.settings.replace, replacing)
    except re.error as e:
        raise ReplacementError("Regex error: %s" % e)
    self.stc.SetTargetStart(self.stc_equiv_start)
    self.stc.SetTargetEnd(self.stc_equiv_pos)
    # The stc equivalent position must be adjusted for the difference in
    # numbers of bytes, not numbers of characters.
    self.stc_equiv_pos += len(replacement.encode('utf-8')) - len(replacing.encode('utf-8'))
    self.stc.ReplaceTarget(replacement)
    self.stc.SetSelection(self.stc_equiv_start, self.stc_equiv_pos)
```
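To exercise the three methods end to end without wxPython, here is a simplified, hypothetical re-creation. FakeSTC mimics only the StyledTextCtrl methods used above, storing the text as UTF-8 bytes so that positions are byte offsets. RegexSearcher drops the settings, flags, incremental mode and error handling of the real FindRegexService. It is a sketch for checking the index arithmetic, not peppy's implementation.

```python
import re


class FakeSTC:
    """Hypothetical stand-in for wx.stc.StyledTextCtrl. The text is kept as
    UTF-8 bytes, so every position is a byte offset, as in the real STC."""

    def __init__(self, text):
        self.buf = bytearray(text.encode('utf-8'))
        self.sel = (0, 0)
        self.target = (0, 0)

    def GetText(self):
        return self.buf.decode('utf-8')

    def GetTextLength(self):
        return len(self.buf)

    def GetTextRange(self, start, end):
        return self.buf[start:end].decode('utf-8')

    def GetSelection(self):
        return self.sel

    def SetSelection(self, start, end):
        self.sel = (start, end)

    def SetTargetStart(self, pos):
        self.target = (pos, self.target[1])

    def SetTargetEnd(self, pos):
        self.target = (self.target[0], pos)

    def ReplaceTarget(self, text):
        start, end = self.target
        self.buf[start:end] = text.encode('utf-8')


class RegexSearcher:
    """Simplified sketch of the shadow-copy logic: no settings, flags,
    incremental mode or error handling."""

    def __init__(self, stc, pattern, replace):
        self.stc = stc
        self.regex = re.compile(pattern)
        self.replace = replace
        self.shadow = None

    def verify_shadow(self, start=-1):
        if self.shadow is None or start >= 0:
            if start < 0:
                start = max(self.stc.GetSelection())
            self.shadow = self.stc.GetTextRange(start, self.stc.GetTextLength())
            self.shadow_equiv_pos = 0      # characters into the shadow
            self.stc_equiv_start = start   # bytes into the STC
            self.stc_equiv_pos = start

    def find_next(self, start=-1):
        self.verify_shadow(start)
        match = self.regex.search(self.shadow, self.shadow_equiv_pos)
        if not match:
            return -1
        # Skip the unchanged text before the match, measured in bytes.
        skipped = self.shadow[self.shadow_equiv_pos:match.start(0)]
        pos = self.stc_equiv_pos + len(skipped.encode('utf-8'))
        count = len(match.group(0).encode('utf-8'))
        self.stc_equiv_start = pos
        self.stc_equiv_pos = pos + count
        self.shadow_equiv_pos = match.end(0)
        self.stc.SetSelection(self.stc_equiv_start, self.stc_equiv_pos)
        return pos

    def replace_current(self):
        self.verify_shadow()
        replacing = self.stc.GetTextRange(self.stc_equiv_start, self.stc_equiv_pos)
        replacement = self.regex.sub(self.replace, replacing)
        self.stc.SetTargetStart(self.stc_equiv_start)
        self.stc.SetTargetEnd(self.stc_equiv_pos)
        # Move the STC position by the change in byte length.
        self.stc_equiv_pos += (len(replacement.encode('utf-8'))
                               - len(replacing.encode('utf-8')))
        self.stc.ReplaceTarget(replacement)
        self.stc.SetSelection(self.stc_equiv_start, self.stc_equiv_pos)


stc = FakeSTC("café crème, café noir")
searcher = RegexSearcher(stc, "café", "coffee")
positions = []
pos = searcher.find_next()
while pos >= 0:
    positions.append(pos)
    searcher.replace_current()
    pos = searcher.find_next()
print(positions)      # [0, 15]
print(stc.GetText())  # coffee crème, coffee noir
```

The second match is reported at byte 15 rather than byte 14, where it started in the unmodified buffer, because replacing "café" (5 bytes) with "coffee" (6 bytes) shifted everything after it by one byte. The shadow copy itself is never re-read during the loop.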