Smart string usage - abelcheung/types-lxml GitHub Wiki
Smart string intro
A smart string is a private str subclass documented among the return types of XPath evaluation results. Quoting directly from the lxml documentation:
XPath string results are 'smart' in that they provide a getparent() method that knows their origin:
- for attribute values, result.getparent() returns the Element that carries them. An example is //foo/@attribute, where the parent would be a foo Element.
- for the text() function (as in //text()), it returns the Element that contains the text or tail that was returned.
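The quoted behaviour can be checked with a small sketch. The sample document and the variable names are made up for illustration; is_tail is one of the boolean properties lxml documents on smart strings.

```python
from lxml import etree

# Hypothetical sample document: <foo> carries an attribute, some leading
# text, and a child <bar> that is followed by tail text.
root = etree.fromstring('<root><foo attribute="value">head<bar/>tail</foo></root>')

# Attribute value: the parent is the element carrying the attribute.
attr = root.xpath("//foo/@attribute")[0]
attr_parent_tag = attr.getparent().tag

# text() results: leading text belongs to <foo>, while tail text
# belongs to the element it follows (<bar>).
texts = root.xpath("//foo/text()")
text_origins = [(str(t), t.getparent().tag, t.is_tail) for t in texts]

print(attr_parent_tag)
print(text_origins)
```

Note that for tail text the reported parent is the element the tail is attached to, not the enclosing element.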
The actual class is named _ElementUnicodeResult in the source code. On Python 2.x and PyPy smart strings are backed by other concrete classes, but those can be ignored as far as type checking is concerned.
Important notice
The following breaking changes apply to releases after 2023.02.11.
Class rename
Historically the class was named SmartStr in the annotation package, which was more user friendly but needed to be imported manually for typing. Since it was underused, it was decided to break compatibility and revert to the concrete class name (_ElementUnicodeResult) instead.
Class specialization
Because the getparent() method needs to know the type of the originating element, the smart string is now a Generic class that takes the element type as its subscript, as in _ElementUnicodeResult[_Element].
Version | Usage |
---|---|
2023.02.11 or earlier | SmartStr |
Afterwards | _ElementUnicodeResult[_Element] |
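As a minimal sketch of the newer spelling, the subscripted class can be used in an annotation like this. The helper parent_tag and the sample element are hypothetical. The sketch assumes the subscript exists only in the stubs, so from __future__ import annotations is used to keep it from being evaluated at runtime, and the isinstance check assumes CPython, where the runtime class is _ElementUnicodeResult.

```python
from __future__ import annotations  # annotations below are not evaluated at runtime

from lxml import etree
from lxml.etree import _Element, _ElementUnicodeResult


def parent_tag(s: _ElementUnicodeResult[_Element]) -> str | None:
    """Return the tag of the element a smart string came from, if any."""
    # The generic parameter lets a type checker infer that getparent()
    # returns an optional _Element.
    parent = s.getparent()
    return None if parent is None else parent.tag


# Hypothetical sample element with a single attribute.
root = etree.fromstring('<a href="x">link</a>')
tags = [
    parent_tag(value)
    for value in root.xpath("@href")
    if isinstance(value, _ElementUnicodeResult)  # assumes CPython's concrete class
]
print(tags)
```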
How to use
There are 2 occasions where this class is primarily useful. See further down for examples of both types of usage.
- XPath selection result
- HtmlElement.text_content() result (which uses XPath internally)
However, this class is almost never used directly in type annotations, since an XPath result is too versatile to be annotated precisely (str, float, bool, lists of them, as well as lists of _Element and namespace tuples).
Users are therefore expected to narrow down the XPath selection result themselves. The first example below shows how to handle smart strings in a selection result.
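To illustrate how varied XPath results are, the following sketch evaluates a few expressions against a made-up document and records the runtime type of each result. The document and variable names are illustrative only.

```python
from lxml import etree

# Hypothetical sample document with two <a> elements.
root = etree.fromstring("<r><a>1</a><a>2</a></r>")

count = root.xpath("count(//a)")       # XPath number -> float
exists = root.xpath("boolean(//a)")    # XPath boolean -> bool
first = root.xpath("string(//a)")      # XPath string -> str (or a subclass)
elements = root.xpath("//a")           # node set -> list of elements
texts = root.xpath("//a/text()")       # node set -> list of smart strings

for value in (count, exists, first, elements, texts):
    print(type(value).__name__, value)
```

Since a single xpath() call can return any of these, narrowing with isinstance() or a type guard is left to the caller.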
Examples
XPath selection result
from __future__ import annotations  # avoids evaluating the stub-only generic subscript at runtime

from lxml.etree import parse, _ElementUnicodeResult, _Element
from typing import TypeIs  # (or from typing_extensions)

def is_smart_str(s: str) -> TypeIs[_ElementUnicodeResult[_Element]]:
    # Smart strings are the only str results that carry getparent()
    return hasattr(s, 'getparent')

tree = parse(<...some html file...>)
for result in tree.xpath('//div/span/text()'):
    if is_smart_str(result):
        # At this point,
        # result -> _ElementUnicodeResult[_Element],
        # parent -> Optional[_Element]
        parent = result.getparent()
        if parent is not None:
            print(parent.tag)  # 'span'
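A self-contained variant of the same narrowing is sketched below, with an in-memory document in place of a file. The sample markup is hypothetical, and the typing-only imports are placed under TYPE_CHECKING on the assumption that neither typing_extensions nor the private lxml classes should be required at runtime.

```python
from __future__ import annotations

from io import BytesIO
from typing import TYPE_CHECKING

from lxml.etree import parse

if TYPE_CHECKING:
    # Only the type checker needs these names; nothing here runs at runtime.
    from lxml.etree import _Element, _ElementUnicodeResult
    from typing_extensions import TypeIs


def is_smart_str(s: object) -> TypeIs[_ElementUnicodeResult[_Element]]:
    # A smart string is a str that additionally offers getparent().
    return isinstance(s, str) and hasattr(s, "getparent")


# Hypothetical sample document standing in for a real HTML file.
SAMPLE = b"<html><body><div><span>one</span><span>two</span></div></body></html>"
tree = parse(BytesIO(SAMPLE))

parent_tags = []
for result in tree.xpath("//div/span/text()"):
    if is_smart_str(result):
        parent = result.getparent()
        if parent is not None:
            parent_tags.append(parent.tag)

print(parent_tags)
```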