Vulnerabilities - usnistgov/xslt-blender GitHub Wiki

This discussion addresses both actual and presumed or supposed security vulnerabilities (considered broadly as threats to Confidentiality, Integrity or Availability) related to XSLT Blender, or to XML- and XSLT-based technologies in general.

If you know of a vulnerability in this technology stack that should be discussed here but isn't, or that is discussed inadequately or incorrectly, please let us know.

Vulnerabilities and context

While vulnerabilities in general are a "real thing", they can be difficult to define, describe and sometimes even characterize in the abstract. All vulnerabilities must be understood in context.

A hypothetical vulnerability illustrates this simply: you have given out so many copies of your house keys that you no longer know which friends and relations have them. But you also don't lock your house. The "missing keys" problem can be mitigated by replacing your locks and not sharing copies of the new keys. But if you still don't lock your house, you are not actually reducing the risk of people entering uninvited.

XSLT-based systems are similar. Defending the inside is more difficult if the perimeter is not defended, and unnecessary if nothing requires defending. XSLT-based systems, both the transformations and the applications that orchestrate them, are most easily defended by not doing things that get you into trouble. If you avoid the bad things, the system itself is designed to provide a high measure of safety by default. In this respect, XSLT 1.0 is something like the child safety seat of Internet technology: use it properly and you won't do things by accident that you really shouldn't be doing. This applies not only to data security but often to efficiency and maintainability as well. In 2022, leaving aside the self-fulfilling complaint that talent is hard to find, regarding XML and XSLT as especially "risky", from a data security or any other point of view, is both paradoxical and regrettable. It perhaps reflects scary reputations more than real experience, since the actual problems observed are caused by bad software, not by any particular technology stack.

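This is the flip side of the fact that XSLT, as a Turing-complete language, may indeed be risky to deploy, depending on the purposes, uses and applications of the particular data transformation tasks being performed. A server-side processor configured to let stylesheets call extension functions in a Java runtime, for example, presents a different exposure altogether. Running XSLT in a browser to generate page displays dynamically offers no opportunity to open holes into Java runtimes.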

To avoid missing the forest for the trees, an assessment should proceed from the outside in (see Assessment): start with the user, who has the data and the problem; then look at the host system (the browser and the platform on which it runs) and the means of delivery (software distribution, whether served or cached, etc.); and only then consider XSLT Blender and, finally, the particular application. Where problems are found, the actual symptoms and effects of design or implementation flaws can be examined and understood in terms of both risk likelihood and severity, including potential downside costs in the event of failure.

XSLT Blender vulnerabilities

Operational context of XSLT Blender applications

Two modes: (1) experimental/evaluative/analytical vs (2) actual use case

Use case (1) includes reverse engineers but also system assessors. See Assessment. If you are in this category, by definition, any bugs or vulnerabilities you can find constitute information, not problems. Please report any findings to the Issues board. Success!

Use case (2) usually describes a smaller subset of users, although for some XSLT Blender applications it could be the majority. Assessment should arguably focus on these users, since to the extent that vulnerabilities are realized (actually exploited), these are the people and systems likely to be adversely affected.

A simple test for which of these categories applies is to ask:

Does it matter if the browser crashes when I'm using XSLT Blender?

If it doesn't (or doesn't much), then potential bugs or design flaws in the applications do not constitute real vulnerabilities for you, since a browser crash or hang is the worst thing that can happen in this architecture.

XSLT Blender TypeScript/JavaScript

First, all this code is open for assessment. If you have doubts about what is running in the page, inspect it. Your browser's developer features, including tools for inspecting the DOM at runtime, are especially useful.

Second, it is limited in capability by design:

parsing and acquiring data (subject to CORS requirements) asynchronously as appropriate
configuring and running XSLT engines via the DOM API
splicing results back into the runtime DOM

Consequently the TypeScript and (transpiled) JavaScript libraries are small and easily audited.
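The three responsibilities listed above can be sketched with standard browser APIs (`fetch`, `DOMParser`, `XSLTProcessor`). This is a minimal illustration, assuming an XSLT 1.0 engine exposed through `XSLTProcessor`; it is not XSLT Blender's own code, and the function names `loadXml` and `blendInto` are hypothetical.

```typescript
/// <reference lib="dom" />
// Minimal sketch of the acquire / transform / splice cycle with standard
// browser APIs. Not XSLT Blender's own code: `loadXml` and `blendInto`
// are hypothetical names used for illustration.

// Acquire and parse an XML resource. fetch() is subject to CORS, so a
// cross-origin request fails unless the serving site permits it.
async function loadXml(url: string): Promise<Document> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Could not load ${url}: HTTP ${response.status}`);
  }
  const text = await response.text();
  const doc = new DOMParser().parseFromString(text, "application/xml");
  // DOMParser signals a well-formedness error by returning a document that
  // contains a <parsererror> element, rather than by throwing.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error(`${url} is not well-formed XML`);
  }
  return doc;
}

// Configure and run the XSLT engine through the DOM API, then splice the
// result into the page in place of whatever `target` contained.
async function blendInto(target: Element, sourceUrl: string, stylesheetUrl: string): Promise<void> {
  const [source, stylesheet] = await Promise.all([loadXml(sourceUrl), loadXml(stylesheetUrl)]);
  const processor = new XSLTProcessor();
  processor.importStylesheet(stylesheet);
  // The result is a DocumentFragment owned by the page's document.
  const fragment = processor.transformToFragment(source, document);
  target.replaceChildren(fragment);
}
```

In a page, a call such as `blendInto(output, "data.xml", "display.xsl")`, with hypothetical file names and `output` an element already on the page, would fetch both files (same-origin or CORS-enabled), run the transformation and display the result. Everything this layer does fits in a few dozen lines, which is what makes it practical to audit.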

Applications also commonly have a little of their own JavaScript to support user interaction, which can be observed and tested on the page.

Strengths of declarative, layered architecture

Because an XSLT processor's job, the requirement it addresses, is to produce for a given input a result that is deterministic and conformant to a known rule set (the language specification), it is isolated from any system interaction. Often, an XSLT application has only two points of exposure for any given run, in addition to the XSLT itself: a single nominal source and a single nominal result, whose handling is managed by a single calling application (in the case of XSLT Blender, the page JavaScript). This isolation makes it both more robust and more traceable. Unlike JavaScript, for example, XSLT cannot use HTTP POST or PUT, and has no interface with browser events. Moreover, by default (that is, until modified to do otherwise), an XSLT transformation does nothing with data but 'dump' it, outputting the text content of the source without its markup, and the application does nothing with a dump but display it. For a display application this is a safe default. For applications that work on unsafe inputs (such as HTML or SVG containing raw, executable script), it enforces sanitization by design: nothing in the input, script included, is reproduced as markup unless a template explicitly copies it.
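The 'dump' behavior comes from the built-in template rules of XSLT 1.0. The TypeScript below is a minimal model of those rules over a hypothetical node type (`XNode`, not the DOM), written only to show what happens to script content in an input when the stylesheet has no templates of its own that intervene.

```typescript
// A deliberately small model of XSLT 1.0's built-in template rules.
// XNode is a hypothetical node type, far simpler than the DOM.
type XNode =
  | { kind: "element"; name: string; attributes: Record<string, string>; children: XNode[] }
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string };

// What a stylesheet falls back on where it has no template of its own:
//   match="*|/"        -> process the child nodes (attributes are not children)
//   match="text()|@*"  -> output the string value
//   comment(), processing-instruction() -> output nothing
// (processing instructions are not modeled here)
function builtInRules(node: XNode): string {
  if (node.kind === "text") return node.value;
  if (node.kind === "comment") return "";
  // Element names and attributes are not reproduced, only the results
  // of processing the element's children.
  return node.children.map(builtInRules).join("");
}

// An input carrying an inline event handler and an embedded script.
const page: XNode = {
  kind: "element",
  name: "p",
  attributes: { onclick: "steal()" },
  children: [
    { kind: "text", value: "Hello " },
    {
      kind: "element",
      name: "script",
      attributes: {},
      children: [{ kind: "text", value: "alert(1)" }],
    },
    { kind: "comment", value: "internal note" },
  ],
};

const output = builtInRules(page);
console.log(JSON.stringify(output)); // "Hello alert(1)"
```

The event-handler attribute disappears, and the script's content survives only as characters. Spliced into a page as a text node, it would be displayed, not executed.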

How to read an XSLT stylesheet

Understanding the processing model.

XSLT follows a processing model that relies, by default, on a recursive, step-wise, depth-first, top-down traversal of a nominal input as a tree of nodes (elements, attributes, text and so on), representing (according to the rules and conventions of XML) a "document object" and typically presenting documentary information (with arbitrary mixed content) to a downstream application.
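The traversal order can be seen in a short TypeScript model, again over a hypothetical node type rather than the DOM. Visiting a node and then each of its children in turn gives a depth-first, top-down walk in document order, which is the order in which default template processing (`xsl:apply-templates` with no `select` and no sorting) reaches the nodes.

```typescript
// Hypothetical minimal tree type used only for this illustration.
interface TreeNode {
  name: string; // element name, or "#text" for a text node
  children: TreeNode[];
}

// Visit a node, then recurse into its children in order: a depth-first,
// top-down (pre-order) walk, mirroring default template processing.
function documentOrder(node: TreeNode, visited: string[] = []): string[] {
  visited.push(node.name);
  for (const child of node.children) {
    documentOrder(child, visited);
  }
  return visited;
}

const doc: TreeNode = {
  name: "article",
  children: [
    { name: "title", children: [{ name: "#text", children: [] }] },
    {
      name: "section",
      children: [
        { name: "p", children: [] },
        { name: "p", children: [] },
      ],
    },
  ],
};

console.log(documentOrder(doc).join(" > "));
// article > title > #text > section > p > p
```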

The result of this is a similar tree of information describing a "result document" that has been systematically derived from the source document (the tree representing input). In an XSLT Blender application both source and result trees can be handled as DOM Document objects within your browser's execution environment.

A typical application works by accepting inputs in the form of a DOM, applying template rules to the elements represented in that DOM, and then binding the results of this transformation into the DOM maintained by the browser for display. The input DOM is commonly produced by parsing an XML document (provided by the application or the user). But since DOM trees and tree fragments can also be built and modified by more general scripting, XSLT transformations can be put to work on arbitrary transformations, delivering well-defined outputs for given inputs, and not every input needs to be produced by parsing static XML: anything that is, or can be made into, a DOM will serve.

Order (mostly) reflects convention, not any order of operations - templates are applied together
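Why order is (mostly) irrelevant can be sketched in TypeScript: for each node, the processor considers every template rule whose pattern matches and picks the one with the highest priority, using the default priorities of the XSLT 1.0 Recommendation (section 5.5) when none is given. The sketch supports only four pattern forms and ignores explicit `priority` attributes, import precedence and modes; `ElementInfo`, `matches`, `defaultPriority` and `selectTemplate` are hypothetical names, not any processor's API.

```typescript
// Hypothetical, much-simplified element: just what the patterns below need.
interface ElementInfo {
  name: string;
  parentName?: string;
  attributes: string[];
}

// Supported pattern forms (a tiny subset of XSLT 1.0 patterns):
//   "*"          any element
//   "p"          element named p
//   "div/p"      p whose parent is div
//   "p[@class]"  p with a class attribute
function matches(pattern: string, el: ElementInfo): boolean {
  if (pattern === "*") return true;
  const predicate = pattern.match(/^([\w-]+)\[@([\w-]+)\]$/);
  if (predicate) return el.name === predicate[1] && el.attributes.includes(predicate[2]);
  const steps = pattern.split("/");
  if (steps.length === 2) return el.parentName === steps[0] && el.name === steps[1];
  return el.name === pattern;
}

// Default priorities (XSLT 1.0, section 5.5) for the forms above.
function defaultPriority(pattern: string): number {
  if (pattern === "*") return -0.5; // bare wildcard
  if (/^[\w-]+$/.test(pattern)) return 0; // single name test
  return 0.5; // anything more specific
}

// Highest priority wins; stylesheet position only breaks ties, where
// ">=" keeps the later rule (one recovery XSLT 1.0 permits).
function selectTemplate(patterns: string[], el: ElementInfo): string | undefined {
  let best: string | undefined;
  let bestPriority = -Infinity;
  for (const pattern of patterns) {
    if (!matches(pattern, el)) continue;
    const priority = defaultPriority(pattern);
    if (priority >= bestPriority) {
      best = pattern;
      bestPriority = priority;
    }
  }
  return best;
}

const p: ElementInfo = { name: "p", parentName: "div", attributes: ["class"] };
console.log(selectTemplate(["div/p", "*", "p"], p)); // div/p
console.log(selectTemplate(["p", "*", "div/p"], p)); // div/p (order made no difference)
console.log(selectTemplate(["div/p", "p[@class]"], p)); // p[@class] (tie: last wins)
console.log(selectTemplate(["p[@class]", "div/p"], p)); // div/p
```

Position in the stylesheet matters only when two matching rules tie, which is the "(mostly)": XSLT 1.0 lets a processor either report that as an error or recover by choosing the last matching rule.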

Top-level variables, parameters, key definitions
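Top-level declarations are best read as the stylesheet's setup. `xsl:key` declares a lookup over the source, and `xsl:param` declares an input that the calling application may set (in a browser, through `XSLTProcessor.setParameter`), with the declared default applying otherwise. Besides the source document, parameters are a route by which the calling page passes values into the transformation, so they are worth noting in an assessment. The TypeScript below models both with hypothetical names (`buildKey`, `key`, `resolveParam`) and simplifies keys to one value per node; it describes what the declarations mean, not how any processor implements them.

```typescript
// Hypothetical record standing in for the <item> elements a key would match.
interface Item {
  id: string;
  category: string;
  label: string;
}

// Models <xsl:key name="by-category" match="item" use="@category"/>:
// conceptually, a lookup from each "use" value to the nodes that have it.
function buildKey<T>(nodes: T[], use: (node: T) => string): Map<string, T[]> {
  const index = new Map<string, T[]>();
  for (const node of nodes) {
    const value = use(node);
    const bucket = index.get(value);
    if (bucket) {
      bucket.push(node);
    } else {
      index.set(value, [node]);
    }
  }
  return index;
}

// Models key('by-category', $value): an unknown value yields an empty node-set.
function key<T>(index: Map<string, T[]>, value: string): T[] {
  return index.get(value) ?? [];
}

// Models a top-level <xsl:param name="category" select="'tools'"/>: the calling
// application may supply a value; otherwise the declared default applies.
function resolveParam(supplied: Record<string, string>, name: string, fallback: string): string {
  return supplied[name] ?? fallback;
}

const items: Item[] = [
  { id: "a1", category: "tools", label: "Hammer" },
  { id: "b2", category: "food", label: "Bread" },
  { id: "c3", category: "tools", label: "Saw" },
];
const byCategory = buildKey(items, (item) => item.category);
const category = resolveParam({}, "category", "tools"); // nothing supplied: default
console.log(key(byCategory, category).map((item) => item.label).join(", ")); // Hammer, Saw
```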

Template matches

XSLT 1.0 soft spots and defenses

[list hypothetical or purported soft spots and mitigations or defenses]

Bad practices and trouble signs

(any others besides these)

Careless use of extension functions and libraries

Writing tags with DOE

disable-output-escaping to write literal code (ha)

XSLT that (ab)uses doe to perform 'tag writing' - demo how it doesn't work?
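A small TypeScript model shows why "tag writing" is fragile. Text that merely looks like markup is still a text node in the result tree; disable-output-escaping only asks the serializer not to escape it. Under the XSLT 1.0 Recommendation (section 16.4), a processor is not required to support DOE, and it can only take effect when the processor itself serializes the result, so behavior differs between processors and environments. `ResultNode`, `serialize` and `countElements` are hypothetical names for this illustration.

```typescript
// Hypothetical result-tree node type for this illustration.
type ResultNode =
  | { kind: "element"; name: string; children: ResultNode[] }
  | { kind: "text"; value: string; disableEscaping?: boolean };

// Escaping a serializer normally applies to text content.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// A serializer that honors the disable-output-escaping flag. Real processors
// may not: support is optional, and there is no serializer at all when the
// result is handed over as a tree.
function serialize(node: ResultNode): string {
  if (node.kind === "text") {
    return node.disableEscaping ? node.value : escapeText(node.value);
  }
  const inner = node.children.map(serialize).join("");
  return `<${node.name}>${inner}</${node.name}>`;
}

// Count actual element nodes: what a consumer of the tree (rather than of
// serialized text) would see.
function countElements(node: ResultNode): number {
  if (node.kind === "text") return 0;
  return 1 + node.children.reduce((sum, child) => sum + countElements(child), 0);
}

// "Tag writing": markup characters placed in a text node, with and without DOE.
const written: ResultNode = {
  kind: "element",
  name: "div",
  children: [{ kind: "text", value: "<b>bold?</b>", disableEscaping: true }],
};
const escaped: ResultNode = {
  kind: "element",
  name: "div",
  children: [{ kind: "text", value: "<b>bold?</b>" }],
};

console.log(serialize(written)); // <div><b>bold?</b></div>
console.log(serialize(escaped)); // <div>&lt;b&gt;bold?&lt;/b&gt;</div>
console.log(countElements(written)); // 1
```

Either way, the tree itself contains one element, not two: any `<b>` that appears depends on a serializer honoring the flag, which is exactly the behavior that cannot be relied on.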

Indiscriminate node copying of unknown/untrusted inputs

xsl:copy matching script, etc.

What happens to the result? Is it executed?
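One defense against indiscriminate copying is the usual XSLT pattern of an identity transform plus more specific templates that output nothing for what should not pass through. The TypeScript sketch below models that pattern over a hypothetical node type (`SNode`, with a hypothetical `safeCopy` function): copy everything except a few dangerous elements and any attribute whose name starts with "on". The deny-list is illustrative only; an allow-list of known-safe elements and attributes is generally the more robust design. Filtering at this stage also means not having to depend on how the host treats copied script once it is spliced into the page.

```typescript
// Hypothetical minimal node type for this illustration.
type SNode =
  | { kind: "element"; name: string; attributes: Record<string, string>; children: SNode[] }
  | { kind: "text"; value: string };

// Illustrative deny-list only; production sanitizers generally prefer
// an allow-list of known-safe elements and attributes.
const droppedElements = new Set(["script", "iframe", "object", "embed"]);

// Identity copy with exceptions, in the spirit of an XSLT identity template
// (match="@*|node()") overridden by more specific templates that output
// nothing for unwanted nodes.
function safeCopy(node: SNode): SNode[] {
  if (node.kind === "text") return [{ kind: "text", value: node.value }];
  // Drop the element together with everything inside it.
  if (droppedElements.has(node.name.toLowerCase())) return [];
  const attributes: Record<string, string> = {};
  for (const [name, value] of Object.entries(node.attributes)) {
    // Crude rule: drop any attribute named "on...", which covers inline
    // event handlers such as onclick and onload.
    if (name.toLowerCase().startsWith("on")) continue;
    attributes[name] = value;
  }
  return [{ kind: "element", name: node.name, attributes, children: node.children.flatMap(safeCopy) }];
}

const input: SNode = {
  kind: "element",
  name: "p",
  attributes: { class: "note", onclick: "steal()" },
  children: [
    { kind: "text", value: "Hi" },
    { kind: "element", name: "script", attributes: {}, children: [{ kind: "text", value: "alert(1)" }] },
  ],
};

const [cleaned] = safeCopy(input);
console.log(JSON.stringify(cleaned));
// {"kind":"element","name":"p","attributes":{"class":"note"},"children":[{"kind":"text","value":"Hi"}]}
```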

XML soft spots and defenses

[list hypothetical or purported soft spots and mitigations or defenses]

"Billion laughs" (exponential entity expansion) attack. Illustrate with example?
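The attack nests entity declarations so that each level refers to the one below it several times; the expanded text grows exponentially with the number of levels while the document itself stays tiny. The arithmetic can be sketched in TypeScript without building a malicious document. `expandedLength` and `exceedsLimit` are hypothetical helpers, and the limit used is an arbitrary example, not any parser's actual setting; how a given browser's parser responds is what a test of the underlying processors would establish.

```typescript
// Length of the text produced by expanding the top entity of a nested chain:
// level 0 is a literal of `baseLength` characters, and every higher level
// consists of `fanOut` references to the level below it.
function expandedLength(baseLength: number, fanOut: number, levels: number): number {
  return baseLength * fanOut ** levels;
}

// A hypothetical expansion bound of the kind robust parsers enforce.
function exceedsLimit(baseLength: number, fanOut: number, levels: number, limit: number): boolean {
  return expandedLength(baseLength, fanOut, levels) > limit;
}

// The classic shape: "lol" (3 characters), ten references per level,
// nine levels above the literal.
const classic = expandedLength(3, 10, 9);
console.log(classic); // 3000000000 (a billion "lol"s)

for (let levels = 1; levels <= 9; levels++) {
  // Each added entity declaration multiplies the expanded size by ten.
  console.log(levels, expandedLength(3, 10, levels));
}

console.log(exceedsLimit(3, 10, 9, 10000000)); // true
```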

[Build tests in XSLT Blender of underlying processors wrt malicious inputs?]