Smart string usage - abelcheung/types-lxml GitHub Wiki

Smart string intro

A smart string is a private str subclass that appears in the documented return types of XPath evaluation results. Quoting the lxml documentation directly:

XPath string results are 'smart' in that they provide a getparent() method that knows their origin:

for attribute values, result.getparent() returns the Element that carries them. An example is //foo/@attribute, where the parent would be a foo Element.

for the text() function (as in //text()), it returns the Element that contains the text or tail that was returned.
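The quoted behaviour can be illustrated with a small sketch. The XML snippet and variable names here are made up for illustration; `is_text` and `is_tail` are the boolean properties lxml provides on smart strings to tell the two text origins apart.

```python
from lxml import etree

# Hypothetical document: <foo> carries an attribute, some text, and a child with a tail.
root = etree.fromstring('<root><foo attribute="v">head<bar/>tail</foo></root>')

# Attribute value: the parent is the element that carries the attribute.
attr = root.xpath('//foo/@attribute')[0]
attr_parent_tag = attr.getparent().tag

# text(): each result is either the .text or the .tail of some element,
# and getparent() returns that element.
texts = root.xpath('//foo/text()')
summary = [(str(t), t.getparent().tag, t.is_text, t.is_tail) for t in texts]

print(attr_parent_tag)
print(summary)
```

Note that the tail string reports `bar` as its parent, since it is stored as the tail of `<bar/>` even though it sits inside `<foo>`.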

The actual class is named _ElementUnicodeResult in the lxml source code. On Python 2.x and PyPy the smart string is implemented by other concrete classes, but these can be ignored as far as type checking is concerned.

Important notice

The following breaking changes apply to releases after 2023.02.11.

Class rename

Historically, the class was named SmartStr in the annotation package. That name is more user friendly, but it had to be imported manually for typing. Because it was rarely used, compatibility was broken and the name reverted to that of the concrete class (_ElementUnicodeResult).

Class specialization

Because the getparent() method needs to know the original element type, the smart string is now a generic class that takes the element type as its subscript, as in _ElementUnicodeResult[_Element].

Version	Usage
`2023.02.11` or earlier	`SmartStr`
Afterwards	`_ElementUnicodeResult[_Element]`
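As a minimal sketch of the newer spelling in an annotation: `origin_tag` and the XML snippet are hypothetical names chosen for illustration, and the `__future__` import is used on the assumption that the subscripted form is meant for the type checker and may not be evaluable at runtime.

```python
from __future__ import annotations  # annotations are not evaluated at runtime

from lxml import etree
from lxml.etree import _Element, _ElementUnicodeResult


def origin_tag(s: _ElementUnicodeResult[_Element]) -> str | None:
    """Return the tag of the element a smart string came from, if any."""
    parent = s.getparent()  # a type checker infers Optional[_Element] here
    return None if parent is None else parent.tag


# Hypothetical input; any attribute or text() XPath result would do.
root = etree.fromstring('<root><foo attribute="v"/></root>')
value = root.xpath('//foo/@attribute')[0]
tag = origin_tag(value)
print(tag)
```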

How to use

There are two situations where this class is primarily useful. Examples of both appear further down.

XPath selection result
HtmlElement.text_content() result (which uses XPath internally)

However, this class is almost never used directly in type annotations, because an XPath result can take too many forms to annotate precisely (str, float, bool, lists of these, as well as lists of _Element and namespace tuples).

Users are therefore expected to narrow down the XPath selection result themselves. The first example below shows how to handle smart strings in a selection result.
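To make the variety of result types concrete, here is a rough sketch that classifies an XPath result by its runtime type. The `describe` helper and the sample document are hypothetical and only cover the common cases (namespace tuples, for instance, fall through to the generic branch).

```python
from lxml import etree


def describe(result: object) -> str:
    """Classify one XPath evaluation result by its runtime type."""
    if isinstance(result, bool):
        return "bool"
    if isinstance(result, float):
        return "float"
    if isinstance(result, str):
        # Smart strings are str subclasses that carry getparent().
        return "smart str" if hasattr(result, "getparent") else "str"
    if isinstance(result, etree._Element):
        return "element"
    if isinstance(result, list):
        if not result:
            return "empty list"
        kinds = sorted({describe(item) for item in result})
        return "list of " + ", ".join(kinds)
    # Anything not handled above, e.g. namespace tuples.
    return type(result).__name__


root = etree.fromstring("<a><b>x</b><b>y</b></a>")
for expr in ("count(//b)", "boolean(//b)", "//b", "//b/text()"):
    print(expr, "->", describe(root.xpath(expr)))
```

Runtime checks like these are what lets a type checker narrow the broad XPath return type down to something usable.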

Examples

XPath selection result

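# Postpone annotation evaluation: the subscripted class is meant for the
# type checker and may not be subscriptable at runtime.
from __future__ import annotations

from lxml.etree import parse, _ElementUnicodeResult, _Element
from typing import TypeIs  # Python 3.13+; otherwise import from typing_extensions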

def is_smart_str(s: str) -> TypeIs[_ElementUnicodeResult[_Element]]:
    return hasattr(s, 'getparent')

tree = parse(<...some html file...>)

for result in tree.xpath('//div/span/text()'):
    if is_smart_str(result):
        # At this point,
        # result -> _ElementUnicodeResult[_Element],
        # parent -> Optional[_Element]
        parent = result.getparent()
        if parent is not None:
            print(parent.tag)  # 'span'