POST source file api - Tizra/Tizra-Customer-Tracker GitHub Wiki

POST: Updating Source File Metadata

Each file in a publishing source is either an attachment (an HTTP resource, stored on the server with a content-type, just like any static file on a web server), or it's a source file for publication (the PdfSource) which is processed to produce the pages delivered for a document.

The management URL for a source consists of the management URL for the document, followed by a path component naming the source, e.g. /objects/id/PdfSource. The URL for a file consists of the management URL for the source, followed by the desired path for the file (the same path that is used for a GET). For example: objects/id/PdfSource/filename.pdf.
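The URL composition described above can be sketched in a few lines of Python. The function names are hypothetical, and percent-escaping of path components is an assumption rather than documented behavior; only the ordering of document URL, source name, and file path comes from the text.

```python
from urllib.parse import quote


def source_url(document_url: str, source_name: str) -> str:
    """Management URL for a source: the document URL plus the source name."""
    return f"{document_url.rstrip('/')}/{quote(source_name, safe='')}"


def file_url(document_url: str, source_name: str, file_path: str) -> str:
    """Management URL for a file: the source URL plus the file's path."""
    # '/' is left unescaped so attachments can keep multi-component paths.
    path = quote(file_path.lstrip("/"), safe="/")
    return f"{source_url(document_url, source_name)}/{path}"


# "/objects/id" stands in for a real document management URL.
print(file_url("/objects/id", "PdfSource", "filename.pdf"))
```

Keeping the slash unescaped in the file path is a deliberate choice here, so that an attachment stored under a multi-component path keeps its directory structure in the URL.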

Publication sources should simply have a filename, but attachments may have multi-component path names, allowing the transparent storage and use of static HTML resources on a Tizra site.

The PUT operation updates the content of an attachment or publication file. POST operations to a file's management URL change the metadata for that file (filename path, content-type, whether it should be delivered as an attachment, etc.). Publication sources support the same metadata, but additional options control the publishing behavior of a publication resource (logical page numbering, table of contents organization, automatic creation of excerpts, etc.).

update operation and JSON records

The update operation below would take a URL with a form like objects/id/PdfSource/filename.pdf.

A JSON update record for a file looks like the descriptive record returned for a file, with an operation field. The following values are allowed for operation:

  • update This value signifies that the information about the file should be updated.
  • contents This value signifies that the JSON metadata record for this file should be returned as the body of the response.

An example JSON record including values for all fields that can currently be set would look like this:

{
        "operation": "update",
        "name": "newname.pdf",
        "props": {
                "DisplayName": "This is a URL", 
                "isDownload": false, 
                "isUrlName": true, 
                "isVisible": true, 
                "rel": "lightbox"
        },
        "content-type": "content-test/weird",
        "pagination-info": {
                "logical-pages": ["I", "ii", "iii", "iv", "1", "7", "8", "9"],
                "toc-entries": [{
                        "level": 0,
                        "page-number": 1,
                        "props": {},
                        "proc-props": {},
                        "logical-page": "I",
                        "title": "Dumb bookmark 1",
                        "is-displayed": true
                }, {
                        "level": 1,
                        "page-number": 7,
                        "props": {},
                        "proc-props": {},
                        "logical-page": "8",
                        "title": "Stupid bookmark 2",
                        "is-displayed": true
                }]
        }
}

The operation field is mandatory.

All other fields and subfields are optional.

  • name The filename of this file. If this name is changed the URL for subsequent updates will have to reflect the new name. The name will not be changed on any error return.
  • content-type sets the content type associated with the file, which will be used when delivering it via HTTP.
  • sort-field sets a value to be used in sorting the list of files. Files without a value for this field appear earlier in the list, and all files with a value for this field are sorted by the Unicode value of the string. Sort values may be discarded once the files have been reordered, as administrative users can manually rearrange file order -- however, even if the sort value is discarded, the resulting file order will be preserved unless overridden.
  • purge-excerpts if true, existing excerpts for this document will be deleted before the bookmarks are processed. This removes licenses attached to the earlier excerpts, so it should be done with some caution, but it does provide a way to ensure that no excerpts remain with the document that do not correspond to table of contents entries.
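As a rough illustration of the update record described above, the following Python sketch assembles a POST request without sending it. The host name, the JSON Content-Type header, and the absence of authentication are assumptions, since none of them are specified here; the helper name is hypothetical.

```python
import json
import urllib.request


def build_update_request(file_url: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a JSON update record."""
    # "operation" is the only mandatory field; everything else is optional.
    record = {**fields, "operation": "update"}
    return urllib.request.Request(
        file_url,
        data=json.dumps(record).encode("utf-8"),
        method="POST",
        # Assumption: the API accepts a JSON request body with this header.
        headers={"Content-Type": "application/json"},
    )


req = build_update_request(
    "https://example.invalid/objects/id/PdfSource/filename.pdf",  # placeholder URL
    {"name": "newname.pdf", "props": {"isVisible": True}},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it; after a successful rename,
# later updates would need to target .../PdfSource/newname.pdf instead.
```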

props values for source files

There are several properties defined on source files that affect system behavior:

  • isDownload the value "true" specifies that the file should be delivered to a browser with the Content-Disposition: attachment header set. This requests that the file be downloaded, rather than displayed in the browser directly. A value of false will not specify the attachment option on Content-Disposition:, but will still set the filename attribute on the Content-Disposition: header, which ensures that a meaningful name is available to a browser performing a "save as". Note that this cannot affect the delivery of isUrlName meta-files as described below.
  • DisplayName the value of this is a string name that should be used in displaying links to this attachment in the Tizra reader interface.
  • isVisible if "true" this file will be displayed in reader download lists; if "false", it will not.
  • isUrlName if true, the filename of this file will be rendered as a URL in the reader. With proper URL %-escaping, this URL can be an absolute URL for another site. While it is currently possible to store file content in a "URL file", this may not be supported in the future.
    • Sometimes it's desirable to use a URL for an external resource that is relative to a location that can be changed. To provide for that, it is possible to create a design property that will prefix a string to all URL resources in a given Tizra Source. The name of that design property is ${sourceName?lower_case}-url-prefix where sourceName is the source in question. If defined, the value of this property will be prepended to any isUrlName meta-files in the given source.
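The naming rule for the prefix design property can be illustrated with a small Python sketch. The function names and the dictionary of design properties are hypothetical, and plain string concatenation is an assumption about how the prefix is applied.

```python
def url_prefix_property(source_name: str) -> str:
    """Name of the design property, following ${sourceName?lower_case}-url-prefix."""
    return f"{source_name.lower()}-url-prefix"


def rendered_url(file_name: str, source_name: str, design_props: dict) -> str:
    """Prefix an isUrlName file name with the source's URL prefix, if one is defined."""
    # Assumption: the prefix is joined to the name by simple concatenation.
    prefix = design_props.get(url_prefix_property(source_name), "")
    return prefix + file_name


props = {"pdfsource-url-prefix": "https://cdn.example.com/"}  # hypothetical value
print(url_prefix_property("PdfSource"))
print(rendered_url("docs/a.html", "PdfSource", props))
```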

File metadata updates can re-order files within a source (thus changing the rendering of the document from its publication source files). The default behavior stores files in chronological order of creation. Using PUT to update the contents of a file does not affect file ordering once a file has been created.

Pagination-info

This sub-record contains information that only pertains to publication resources, relating to pagination and tables of contents. This information is initially determined automatically based on the contents of a PDF file, but can be explicitly managed after a file is created.

Since this information is critical to the publication of a document, and it can be automatically extracted from source files, the effects of updating pagination info are not guaranteed to be preserved after the source file is updated. The metadata for each file is independent of other files, however, even in the same source.

The logical-pages array

This array of strings contains the logical page identifier for each page in the document. The logical pages specified in a PDF file will be used to initialize this array on initial file upload. Updates can be made, and will be used in all subsequent Tizra operations. The logical page numbers on table of contents entries are in fact redundant with this array, and they will probably be phased out over time.

Note that in some cases the logical pages array may not have entries for every page. In that case, the logical page number used for that page will be the same as the physical page number (rendered as a decimal string).
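The fallback rule can be sketched as a small lookup. This assumes the array is indexed by physical position, that physical pages are 1-based, and that a "missing" entry means either a short array or an empty value; none of these details are spelled out above.

```python
def logical_page(logical_pages: list, physical_page: int) -> str:
    """Return the logical page label for a 1-based physical page number."""
    index = physical_page - 1
    if 0 <= index < len(logical_pages) and logical_pages[index]:
        return logical_pages[index]
    # No entry for this page: fall back to the physical number as a decimal string.
    return str(physical_page)


pages = ["I", "ii", "iii", "iv", "1", "7", "8", "9"]
print(logical_page(pages, 1))   # first page uses its logical label
print(logical_page(pages, 12))  # beyond the array, so the physical number is used
```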

The toc-entries array

This array lists the table of contents entries for pages in this file. The fields of a toc record are as follows:

  • level Level of this toc entry within the toc hierarchy. 0-based. The system does not require that levels be contiguous (you can have a level 2 without a level 1), but this may lead to anomalies in automatic generation of Excerpts or premature cut-off of levels in short table of contents blocks.
  • page-number The 1-based page number of this toc-entry in the book. It is not required that the order of pages match the order of ToC entries -- this can be useful to create ToC subsections that index illustrations and the like. However, automatic Excerpt generation for such ToC entries may produce unusual results.
  • logical-page The logical page identifier for the page this indexes. Redundant with the logical-pages array, and perhaps to be phased out over time.
  • title The text for this ToC entry. May be any UTF-8 text. This can include HTML, and is rendered without HTML/XML escaping in page templates, so characters like & must be properly escaped, if present.
  • is-displayed Whether this table of contents entry will be displayed to users, or is just to be tracked by the system.
  • props Properties for this table of contents entry. Properties are used as general metadata.
  • proc-props Processing Properties for this table of contents entry. Processing properties are used to determine how a bookmark is processed. Often this takes the form of declaring a persistent relationship to another object in the system, like an Excerpt created based on the data in this bookmark.
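A client preparing toc-entries might build records and check them before upload, roughly as in the Python sketch below. Both helpers are hypothetical. Escaping plain-text titles with html.escape is one way to satisfy the escaping note for title, and the gap check reflects one reading of "contiguous" (a level more than one deeper than the preceding entry).

```python
import html


def make_toc_entry(title, page_number, level=0, logical_page=None, displayed=True):
    """Build a toc-entry record from a plain-text title."""
    entry = {
        "level": level,              # 0-based depth in the hierarchy
        "page-number": page_number,  # 1-based physical page
        # Titles are rendered unescaped, so escape &, < and > in plain text.
        "title": html.escape(title, quote=False),
        "is-displayed": displayed,
        "props": {},
        "proc-props": {},
    }
    if logical_page is not None:
        entry["logical-page"] = logical_page
    return entry


def level_gaps(entries):
    """Indexes of entries whose level jumps more than one past the previous entry."""
    gaps = []
    previous = -1
    for i, entry in enumerate(entries):
        if entry["level"] > previous + 1:
            gaps.append(i)
        previous = entry["level"]
    return gaps


toc = [make_toc_entry("Q & A", 3), make_toc_entry("Figures", 9, level=2)]
print(toc[0]["title"])
print(level_gaps(toc))
```

Such a check is only advisory, since the system accepts non-contiguous levels; it simply flags entries that may trigger the anomalies mentioned for level.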

proc-props for toc entries

The proc-props hash specifies aspects of how a bookmark is processed by the system. In particular, these properties can be used to control the creation and properties of Excerpts, based on metadata associated with the bookmark. Linkage of a toc-entry to an Excerpt is by means of an identifying property which must have an identical value on the Excerpt and the bookmark. Matching against an Excerpt property, or setting a value on an Excerpt will have no effect unless the property has been defined for the Excerpt metatype in the administrative UI.

The following values can be set in the proc-props map:

  • match-prop This string-valued property gives the name of the property that should be used to match this bookmark against Excerpts in the system. Most sites will have one property that they will use in all cases (holding some form of chapter ID). It is possible for different toc entries to match against Excerpts on different properties, but the need for this is likely to be uncommon.

    The match is made against any Excerpts for the containing document, and is an exact string match. Tizra generally trims leading and trailing spaces, so these should not be used in match values. The value matched is determined by the props value for that name. For instance, a match-prop=foo for a bookmark with props={"foo": "bar"} would match any Excerpt on the document whose foo property has the value bar.

    If more than one matching Excerpt is found for a toc-entry being processed, the request will fail. If there is no matching entry (the typical case on a first-time upload of toc-entries), a new Excerpt will be created, and the match property will be added to that Excerpt.

  • copy-props This boolean value determines if the props fields in the bookmark will be copied to the matched or newly created Excerpt. This need not be set to true even if a new Excerpt is being created: the relevant match-prop will be set on the Excerpt even if other properties are not to be copied in.

  • range-expression The contents of this string will be set as the range-expression of the matched or created Excerpt. The syntax for range expressions is a comma-separated sequence of page ranges, where a page range can be a single page number, or two page numbers separated by a hyphen, representing an inclusive range. For instance, the value "1,3-4" would identify 3 pages of a document: 1, 3, and 4. While the administration interface can automatically attempt to deduce the pages covered by a toc-entry, direct setting is used in the API so that Excerpts associated with a ToC entry can include additional frontmatter and backmatter if necessary.

  • auto-publish If this boolean property is true, the created Excerpt will have its "AutoPublishExcerpt" metadata property set to true, which will ensure that the Excerpt will be published whenever the document is published.
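To make the range-expression syntax and the match-prop rule concrete, here is a client-side Python sketch of both. It is not Tizra's implementation: the dictionary representation of Excerpts is hypothetical, and tolerating whitespace around range parts is an assumption.

```python
from typing import Optional


def parse_range_expression(expression: str) -> list:
    """Expand a range expression such as "1,3-4" into a list of page numbers."""
    pages = []
    for part in expression.split(","):
        part = part.strip()  # assumption: surrounding whitespace is tolerated
        if not part:
            continue
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
            if last < first:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(first, last + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages


def find_matching_excerpt(toc_entry: dict, excerpts: list) -> Optional[dict]:
    """Return the single Excerpt whose match property equals the bookmark's value."""
    name = toc_entry["proc-props"]["match-prop"]
    value = toc_entry["props"][name]
    matches = [e for e in excerpts if e.get("props", {}).get(name) == value]
    if len(matches) > 1:
        # Mirrors the documented rule that multiple matches make the request fail.
        raise ValueError(f"more than one Excerpt has {name}={value!r}")
    return matches[0] if matches else None  # None means a new Excerpt would be created


entry = {"props": {"foo": "bar"}, "proc-props": {"match-prop": "foo"}}
excerpts = [{"props": {"foo": "bar"}}, {"props": {"foo": "baz"}}]
print(parse_range_expression("1,3-4"))
print(find_matching_excerpt(entry, excerpts))
```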