Builders - xokomola/origami GitHub Wiki

Builders

Builders are the workhorses of Origami.

Mu documents introduced the Mu data structure. Builders are to Mu what XSLT is to XML.

A builder object has one or more of the following tasks:

  • Extract nodes from another document
  • Attach handlers to nodes
  • Handle conversion tasks such as converting XML names to strings and strings to XML names.

You can construct builders that do one or more of these tasks. Some Origami functions take a builder object as an argument and then use it to perform these tasks on a document.

Extracting nodes

To introduce builders, let's look at the first task: extracting nodes. You need this when building templates, but it is also useful when building a scraper, for example. A scraper takes a document and extracts the parts we are interested in, so that we can process them further or display the content differently.

declare variable $ex:html :=
  <html>
    <head>
        <title>My little web shop</title>
        <link rel="stylesheet" type="text/css" href="main.css"></link>
    </head>
    <body>
        <div id="header">My shop</div>
        <div id="content">
            <table>
                <tr>
                    <td>
                        <div class="article">
                            <img src="article1.jpg"/>
                            <div class="desc">Article 1</div>
                            <div class="price">9.99</div>
                        </div>
                    </td>
                    <td>
                        <div class="article">
                            <img src="article2.jpg"/>
                            <div class="desc">Article 2</div>
                            <div class="price">10.99</div>
                        </div>
                    </td>
                </tr>
                <tr>
                    <td>
                        <div class="article">
                            <img src="article3.jpg"/>
                            <div class="desc">Article 3</div>
                            <div class="price">12.99</div>
                        </div>
                    </td>
                    <td>
                        <div class="article">
                            <img src="article4.jpg"/>
                            <div class="desc">Article 4</div>
                            <div class="price">4.99</div>
                        </div>
                    </td>
                </tr>
            </table>
        </div>
        <div id="footer">More info</div>
    </body>
  </html>;

When constructing a builder that extracts, or scrapes, data from this page you can use regular XPath selectors. The builder function takes a sequence of arrays, each of which has an XPath selector as its first item. The remaining items of the array determine whether the matched nodes are copied or deleted.

o:builder(array(*)+)

Each array in the sequence of arrays defines the starting point for extracting nodes. Nested arrays will operate only within the context of the parent selector.

To extract the whole HTML document:

o:builder(['html'])

To remove the CSS link from the head, use the empty sequence as the second item in the array that holds the link XPath selector:

o:builder(
    ['html',
        ['head',
            ['link[@rel="stylesheet"]', ()]
        ]
    ]
)

To extract only the article divs while removing the images:

declare variable $ex:extractor :=
    o:builder(
        ['div[@class="article"]',
            ['img', ()],
            ['div[@class="desc"]'],
            ['div[@class="price"]']
        ]
    );

We'll create a function that returns the article divs.

declare function ex:extract-articles()
{
  o:doc($ex:html, $ex:extractor)
};
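For comparison, here is a rough plain-XQuery sketch of what this extractor selects, written without Origami. It assumes that the builder's selectors match anywhere in the document, which is an assumption about Origami rather than something stated here. The `$html` variable is a trimmed, illustrative copy of the shop page.

```xquery
xquery version "3.1";

(: Trimmed copy of the shop page with two articles (illustrative data). :)
declare variable $html :=
  <html>
    <body>
      <div id="content">
        <div class="article">
          <img src="article1.jpg"/>
          <div class="desc">Article 1</div>
          <div class="price">9.99</div>
        </div>
        <div class="article">
          <img src="article2.jpg"/>
          <div class="desc">Article 2</div>
          <div class="price">10.99</div>
        </div>
      </div>
    </body>
  </html>;

(: Rebuild each article div, keeping desc and price and dropping img. :)
for $article in $html//div[@class = 'article']
return
  <div class="article">{
    $article/div[@class = ('desc', 'price')]
  }</div>
```

This query returns plain XML nodes, while the builder version produces a Mu document that can also carry handlers.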

In the next few sections we'll use this data and insert it into a template.

Attaching handlers to nodes

Before reading this part it may be a good idea to read about Handlers first.

Say you have some HTML stored on disk or in a database. You would like to use this HTML as a template for generating web pages.

Assume the following HTML document is available as the variable $ex:template.

declare variable $ex:template :=
  <html>
    <head>
        <title>Base Template</title>
        <link rel="stylesheet" type="text/css" href="main.css"></link>
    </head>
    <body>
        <div id="header">
            The base header
        </div>
        <div id="main">
            The base body
        </div>
        <div id="footer">
            The base footer
        </div>
    </body>
  </html>;

Let's define a builder that attaches handlers to each of the three divs.

declare function ex:build-template()
{
  let $builder :=
    ['html',
      ['div[@id="header"]', o:insert('Header text')],
      ['div[@id="main"]', function($n,$d) { $n => o:insert($d) }],
      ['div[@id="footer"]', o:insert('Footer text')]
    ]
  return
    o:doc($ex:template, $builder)
};

The handler for the main div is an anonymous function that picks up the template data as $d and inserts it into the div node. The other two handlers can remain simple node transformers: each o:insert call returns a handler function that modifies the template div node.

Finally we take the scraped data from the first example and insert it as data into the built template.

(: The built template gets its own name; $ex:template already holds the raw HTML. :)
declare variable $ex:built-template := ex:build-template();

let $data := ex:extract-articles()
return
  o:xml(o:apply($ex:built-template, $data))

This is the final result:

    <html>
      <head>
        <title>Base Template</title>
        <link href="main.css" rel="stylesheet" type="text/css"/>
      </head>
      <body>
        <div id="header">Header text</div>
        <div id="main">
          <div class="article">
            <div class="desc">Article 1</div>
            <div class="price">9.99</div>
          </div>
          <div class="article">
            <div class="desc">Article 2</div>
            <div class="price">10.99</div>
          </div>
          <div class="article">
            <div class="desc">Article 3</div>
            <div class="price">12.99</div>
          </div>
          <div class="article">
            <div class="desc">Article 4</div>
            <div class="price">4.99</div>
          </div>
        </div>
        <div id="footer">Footer text</div>
      </body>
    </html>

Simpler builders

The builders shown so far are quite powerful, but for simple jobs you may want a simpler builder that just attaches a few handlers to a few nodes or attributes.

These builders are better suited to attaching handlers to nodes than to extracting nodes from nested structures.

o:builder(
    map {
      'p': o:copy(),
      'div@class': (),
      'hr': ()
    }
)

Handling name conversions

Builders are also the objects used when translating XML names to strings and vice versa. This process can be customized, but here I explain only the relatively straightforward conversion of namespaced XML names to colon-prefixed strings and back.
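To make the idea of the round trip concrete, here is a minimal sketch in plain XQuery 3.1. It uses only standard functions and does not show Origami's own conversion API. The `ex` prefix and its namespace URI are hypothetical.

```xquery
xquery version "3.1";

(: Hypothetical namespace, used only for this illustration. :)
declare namespace ex = "http://example.com/ns";

let $element := <ex:item/>
(: XML name to colon-prefixed string :)
let $name := name($element)
(: String back to an XML name, resolving the prefix against the
   in-scope namespaces of $element :)
let $qname := resolve-QName($name, $element)
return ($name, $qname eq node-name($element))
```

The second item of the result compares the reconstructed name with the original one by namespace URI and local name.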

TBD

Performance

Constructing a builder is not what matters most for performance. Most of the work is done when a document is built with the builder. A builder is just a reusable object: once constructed, it can be reused on many documents.

The XSLT-like builders shown first, in particular, need to do quite a lot of work and thus tend to be quite slow. One reason is that, surprise, they perform an XSLT transform under the hood each time a document is created with such a builder.

With templating solutions you often need to prepare only a handful of templates. After being prepared by the builder they can be stored in a variable that can be cached. The hard work is then done only once. Applying data to such a prepared template to render the final output is pretty fast (an order of magnitude faster, if not more).

Therefore, the primary pattern is to do as much of the work as possible up front and give the XQuery engine a chance to cache the resulting Mu data structures with their embedded handlers.
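A minimal sketch of this pattern, continuing the template example from earlier on this page. Like the other snippets here, it is a fragment that omits the namespace declarations and the Origami import. The names `$ex:prepared` and `ex:render` are hypothetical, and whether the prepared value is actually cached depends on the XQuery engine.

```xquery
(: Prepared once: the expensive build step runs when this variable is evaluated. :)
declare variable $ex:prepared := ex:build-template();

(: Called for each rendering: only the apply and serialize steps run here. :)
declare function ex:render($data)
{
  o:xml(o:apply($ex:prepared, $data))
};
```

Each call such as `ex:render(ex:extract-articles())` then reuses the prepared template instead of running the builder again.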

What's next

There's a lot more to explain about builders and node handlers. But for now this is hopefully sufficient to get started.
