Spec: RSS 0.91 (Netscape)

Archivist's Note: This is the RSS 0.91 specification published by Netscape on July 10, 1999. The current version of the RSS 2.0 specification is available at this link and other revisions have been archived. Netscape transferred this specification to the RSS Advisory Board on Jan. 22, 2008.

RSS 0.91 Spec, revision 3
Netscape Communications
Primary Author: Dan Libby
July 10, 1999

Notes

Files must be 100% valid XML. We're trying to move towards a more standard format, and to this end we have included several tags from the popular <scriptingNews> format. We have also ensured that this version is 100% valid XML. We did this by requiring that a DOCTYPE tag be included, and validating each RSS document against that DTD. This means that it is not enough for an RSS document to be "well-formed". It must also be "valid" with respect to its DTD.

No mixed content tags. We are specifically not including any tags that contain mixed content in RSS 0.91. This means that each tag either contains sub-tags only, or text only, not a combination. This is both because we want to keep the format simple, and because our current validation system is not able to handle this type of tag. We also are not allowing any HTML markup beyond the commonly used entities such as &quot;. A full list of these is defined in the RSS 0.91 DTD.
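
The "no mixed content" rule can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name has_mixed_content is hypothetical and is not part of Netscape's validator.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """Return True if any element holds both child elements and non-whitespace text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue  # text-only (or empty) elements are always fine
        # Text directly inside a container sits before the first child
        # (node.text) or after any child (child.tail).
        texts = [node.text] + [child.tail for child in children]
        if any(t and t.strip() for t in texts):
            return True
    return False

ok = ET.fromstring("<item><title>stuff</title><link>http://bar</link></item>")
bad = ET.fromstring("<item>see <title>stuff</title></item>")
print(has_mixed_content(ok))   # False
print(has_mixed_content(bad))  # True
```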

New tags for syndication community. Our validator will now allow several new tags through the system, though most of them will not actually be used by Netcenter. However, these may work when syndicating content to other sites. These tags are noted explicitly in the spec as "ignored."

RDF references removed. RSS was originally conceived as a metadata format providing a summary of a website. Two things have become clear: the first is that providers want more of a syndication format than a metadata format. The structure of an RDF file is very precise and must conform to the RDF data model in order to be valid. This is not easily human-understandable and can make it difficult to create useful RDF files. The second is that few tools are available for RDF generation, validation and processing. For these reasons, we have decided to go with a standard XML approach.

Specification

Tags in alphabetical order.

<channel>

Description

information about a particular channel. Everything pertaining to an individual channel is contained within this tag.

Netcenter Usage

Currently displayed on "My Netscape". May use in other locations in the future.

Attributes

none

Sub-elements:

  • required:
    • <description>
    • <language>
    • <link>
    • <title>
  • optional:
    • <copyright>
    • <docs>
    • <image>
    • <item>
    • <lastBuildDate>
    • <managingEditor>
    • <pubDate>
    • <rating>
    • <skipDays>
    • <skipHours>
    • <textinput>
    • <webMaster>

<copyright>

Description

copyright string

Netcenter Usage

ignored

Attributes

none

Sub-elements:

none

<day>

Description

The day of the week, spelled out in English.

Netcenter Usage

ignored

Attributes

none

Sub-elements:

none

<description>

Description

a plain text description of an item, channel, image, or textinput.

Netcenter Usage

displayed as appropriate depending on context.

Attributes

none

Sub-elements:

none

<docs>

Description

This tag should contain a URL that references a description of the channel.

Netcenter Usage

ignored

Attributes

none

Sub-elements:

none

<!DOCTYPE>

Description

Document Type Identifier. This is an XML tag that identifies where to find the definition for this format. It should follow the xml tag. The full DTD appears in the DTD section of this document.

Netcenter Usage

required to ensure document validity

Attributes

One of these two formats is required:

rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" ""
rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd"

Sub-elements:

none

<height>

Description

Specifies the height of an image. Should be an integer value.

Netcenter Usage

The value must be between 1 and 400. If omitted, the default value is 31.

Attributes

none

Sub-elements:

none

<hour>

Description

Specifies an hour of the day. Should be an integer value between 0 and 23. See <skipHours>.

Netcenter Usage

ignored

Attributes

none

Sub-elements:

none

<image>

Description

Specifies an image associated with a <channel>.

Netcenter Usage

Optionally (user preference) display an image along with the channel content.

Attributes

none

Sub-elements:

  • required:
    • <url>
    • <link>
    • <title>
  • optional:
    • <description>
    • <width>
    • <height>

<item>

Description

An item that is associated with a <channel>. The item should represent a web-page, or subsection within a web page. It should have a unique URL associated with it. Each item must contain a title and a link. A description is optional.

Netcenter Usage

generates a list of links. The description, if supplied, may optionally be viewed by the user as plain text beneath the link. Also, a maximum of 15 items per channel is enforced at this time.

Attributes

none

Sub-elements:

  • required:
    • <title>
    • <link>
  • optional:
    • <description>
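
The <item> requirements above (one title, one link, at most 15 items per channel) could be checked roughly as follows. This is a sketch only, assuming Python's standard xml.etree.ElementTree; check_items and MAX_ITEMS are hypothetical names.

```python
import xml.etree.ElementTree as ET

MAX_ITEMS = 15  # the per-channel limit Netcenter enforces, as stated above

def check_items(channel):
    """Return a list of problems found in a <channel>'s <item> elements."""
    problems = []
    items = channel.findall("item")
    if len(items) > MAX_ITEMS:
        problems.append(f"{len(items)} items exceeds the limit of {MAX_ITEMS}")
    for index, item in enumerate(items, start=1):
        for required in ("title", "link"):
            if len(item.findall(required)) != 1:
                problems.append(f"item {index}: expected exactly one <{required}>")
    return problems

channel = ET.fromstring(
    "<channel>"
    "<item><title>stuff</title><link>http://bar</link></item>"
    "<item><title>no link here</title></item>"
    "</channel>"
)
print(check_items(channel))  # ['item 2: expected exactly one <link>']
```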

<language>

Description

Specifies the language of a <channel>. See supported language codes.

Netcenter Usage

used to assist user with determining correct page encoding

Attributes

none

Sub-elements:

none

<lastBuildDate>

Description

The last time the channel was modified.

Netcenter Usage

ignored

Attributes

none

Sub-elements

none

<link>

Description

This is a url that a user is expected to click on, as opposed to a <url> that is for loading a resource, such as an image.

Netcenter Usage

must start with either http:// or ftp://. All other urls are considered invalid.

Attributes

none

Sub-elements

none
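
A minimal sketch of the Netcenter prefix rule for <link> (and <url>), assuming Python's re module. The pattern and the 1 to 500 character bounds are taken from the validation schema later in this document; the function name is_acceptable_url is hypothetical.

```python
import re

# Pattern given for <link> and <url> in the validation schema of this spec.
URL_PREFIX = re.compile(r"^(http://|^ftp://)")

def is_acceptable_url(value):
    """True if value is 1 to 500 characters long and starts with http:// or ftp://."""
    return 1 <= len(value) <= 500 and URL_PREFIX.match(value) is not None

print(is_acceptable_url("http://www.scripting.com/"))   # True
print(is_acceptable_url("https://www.scripting.com/"))  # False: only http:// and ftp:// pass
```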

<managingEditor>

Description

The email address of the managing editor of the site, the person to contact for editorial inquiries.

Netcenter Usage

ignored

Attributes

none

Sub-elements

none

<name>

Description

The name of an object, corresponding to the name attribute of an HTML <INPUT> element. Currently, this only applies to <textinput>.

Netcenter Usage

generates name attribute in html form

Attributes

none

Sub-elements

none

<pubDate>

Description

Date when channel was published.

Netcenter Usage

ignored

Attributes

none

Sub-elements

none

<rating>

Description

  • Recommended links rating agencies:
  • User actions:
    • Obtain a rating for your site from a well-known rating agency (e.g., RSACi, SafeSurf)
    • Copy rating data into RSS file. Include only the data within the content attribute.
  • Expected format:
    • starts with (PICS-1.1

Netcenter Usage

ignored. May use in the future to dynamically decide page rating.

Attributes

none

Sub-elements

none

<rss>

Description

Identifies begin and end of rss content.

Netcenter Usage

identifies content type

Attributes

  • required:
    • version (must be 0.91)

Sub-elements

  • required:
    • <channel>

<skipDays>

Description

A list of <day>s of the week, in English, indicating the days of the week when your channel will not be updated. As with <skipHours>, list a day here if you know your channel will never be updated on it, for example Saturday or Sunday.

Netcenter Usage

ignored

Attributes

none

Sub-elements

  • required:
    • <day>

<skipHours>

Description

A list of <hour>s indicating the hours in the day, GMT, when the channel is unlikely to be updated. If this sub-item is omitted, the channel is assumed to be updated hourly.

Netcenter Usage

ignored

Attributes

none

Sub-elements

  • required:
    • <hour>
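
Netcenter ignores <skipHours> and <skipDays>, but another aggregator might use them to decide when polling is pointless. The following is a sketch under that assumption, using Python's standard library; may_skip and DAY_NAMES are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# English day names indexed by datetime.weekday() (Monday is 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def may_skip(channel, when):
    """True if `when` (an aware datetime) falls in a listed hour (GMT) or day."""
    gmt = when.astimezone(timezone.utc)
    hours = {int(hour.text) for hour in channel.findall("skipHours/hour")}
    days = {day.text.strip() for day in channel.findall("skipDays/day")}
    return gmt.hour in hours or DAY_NAMES[gmt.weekday()] in days

channel = ET.fromstring(
    "<channel>"
    "<skipHours><hour>6</hour><hour>7</hour><hour>8</hour></skipHours>"
    "<skipDays><day>Sunday</day></skipDays>"
    "</channel>"
)
print(may_skip(channel, datetime(1999, 7, 8, 7, 0, tzinfo=timezone.utc)))    # True: hour 7 is listed
print(may_skip(channel, datetime(1999, 7, 8, 16, 20, tzinfo=timezone.utc)))  # False: Thursday, hour 16
```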

<textinput>

Description

An input field for the purpose of allowing users to submit queries back to the publisher's site. This element should have a title, a link (to a cgi or other processor), a description containing some instructions, and a name, to be used as the name in the HTML tag <input type=text name="[name]">.

Netcenter Usage

Displays form for submission back to publisher.

Attributes

none

Sub-elements

  • required:
    • <title>
    • <link>
    • <description>
    • <name>
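
How a consumer might turn a <textinput> into an HTML form can be sketched as below. The spec does not say what markup Netcenter generates, so the layout and the GET method are assumptions, and textinput_to_form is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from html import escape

def textinput_to_form(textinput):
    """Render a <textinput> element as an HTML form string (assumed layout)."""
    title = textinput.findtext("title", default="")
    description = textinput.findtext("description", default="")
    name = textinput.findtext("name", default="")
    link = textinput.findtext("link", default="")
    # escape() also quotes double quotes, so values are safe inside attributes.
    return (
        f'<form action="{escape(link)}" method="get">'
        f"<b>{escape(title)}</b> {escape(description)} "
        f'<input type="text" name="{escape(name)}">'
        '<input type="submit">'
        "</form>"
    )

textinput = ET.fromstring(
    "<textinput><title>Search Now!</title>"
    "<description>Enter your search terms</description>"
    "<name>find</name>"
    "<link>http://my.site.com/search.cgi</link></textinput>"
)
form = textinput_to_form(textinput)
print(form)
```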

<title>

Description

An identifying string for a resource. When used in an <item>, this is the name of the item's <link>. When used in an <image>, this is the Alt text for the image. When used in a <channel>, this is the channel's title. When used in a <textinput>, this is the textinput's title.

Netcenter Usage

displayed as appropriate depending on context.

Attributes

none

Sub-elements

none

<url>

Description

Location to load a resource from. Note that this is slightly different from the <link> tag, which specifies where a user should be re-directed to if a resource is selected.

Netcenter Usage

must start with either http:// or ftp://. All other urls are considered invalid.

Attributes

none

Sub-elements

none

<webMaster>

Description

The email address of the webmaster for the site, the person to contact if there are technical problems with the channel.

Netcenter Usage

ignored

Attributes

none

Sub-elements

none

<width>

Description

Specifies the width of an <image>. Should be an integer value.

Netcenter Usage

The value must be between 1 and 144. If omitted, the default value is 88.

Attributes

none

Sub-elements

none
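
The <width> and <height> rules (1 to 144 with default 88, and 1 to 400 with default 31) can be applied together when reading an <image>. This is a minimal sketch using Python's standard library; image_size and LIMITS are hypothetical names.

```python
import xml.etree.ElementTree as ET

# tag: (minimum, maximum, default), as given under Netcenter Usage for each tag
LIMITS = {"width": (1, 144, 88), "height": (1, 400, 31)}

def image_size(image):
    """Return (width, height) for an <image>, applying defaults and range checks."""
    size = {}
    for tag, (low, high, default) in LIMITS.items():
        text = image.findtext(tag)
        value = default if text is None else int(text)
        if not low <= value <= high:
            raise ValueError(f"<{tag}> must be between {low} and {high}, got {value}")
        size[tag] = value
    return size["width"], size["height"]

print(image_size(ET.fromstring("<image><height>40</height><width>78</width></image>")))  # (78, 40)
print(image_size(ET.fromstring("<image/>")))  # (88, 31)
```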

<?xml?>

Description

Identifies this as an XML document and specifies encoding. See W3C. Note that this must be on the first line of the document.

Netcenter Usage

required for XML compliance.

Attributes

  • version: must be "1.0"
  • encoding: see list of supported encodings

Sub-elements

none

Examples

Example 1 - Simple

<?xml version="1.0"?>
<!DOCTYPE rss
  SYSTEM 'http://my.netscape.com/publish/formats/rss-0.91.dtd'>
<rss version="0.91">
    <channel>
        <language>en</language>
        <description>News and commentary from the cross-platform scripting community.</description>
        <link>http://www.scripting.com/</link>
        <title>Scripting News</title>
        <image>
            <link>http://www.scripting.com/</link>
            <title>Scripting News</title>
            <url>http://www.scripting.com/gifs/tinyScriptingNews.gif</url>
        </image>
    </channel>
</rss>
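
As a rough illustration of consuming Example 1, the sketch below parses it with Python's standard xml.etree.ElementTree. That parser only checks well-formedness; it does not fetch the DTD or validate against it, so it is no substitute for the validation this spec requires. The names FEED and summary are hypothetical.

```python
import xml.etree.ElementTree as ET

FEED = """\
<?xml version="1.0"?>
<!DOCTYPE rss
  SYSTEM 'http://my.netscape.com/publish/formats/rss-0.91.dtd'>
<rss version="0.91">
    <channel>
        <language>en</language>
        <description>News and commentary from the cross-platform scripting community.</description>
        <link>http://www.scripting.com/</link>
        <title>Scripting News</title>
        <image>
            <link>http://www.scripting.com/</link>
            <title>Scripting News</title>
            <url>http://www.scripting.com/gifs/tinyScriptingNews.gif</url>
        </image>
    </channel>
</rss>
"""

root = ET.fromstring(FEED)
channel = root.find("channel")
# Collect the four sub-elements that every <channel> must carry.
summary = {tag: channel.findtext(tag) for tag in ("title", "link", "description", "language")}
print(root.get("version"), summary["title"], summary["language"])  # 0.91 Scripting News en
```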

Example 2 - Complete

<?xml version="1.0"?>
<!DOCTYPE rss
  SYSTEM 'http://my.netscape.com/publish/formats/rss-0.91.dtd'>
<rss version="0.91">
    <channel>
        <copyright>Copyright 1997-1999 UserLand Software, Inc.</copyright>
        <pubDate>Thu, 08 Jul 1999 07:00:00 GMT</pubDate>
        <lastBuildDate>Thu, 08 Jul 1999 16:20:26 GMT</lastBuildDate>
        <docs>http://my.userland.com/stories/storyReader$11</docs>
        <description>News and commentary from the cross-platform scripting community.</description>
        <link>http://www.scripting.com/</link>
        <title>Scripting News</title>
        <image>
            <link>http://www.scripting.com/</link>
            <title>Scripting News</title>
            <url>http://www.scripting.com/gifs/tinyScriptingNews.gif</url>
            <height>40</height>
            <width>78</width>
            <description>What is this used for?</description>
        </image>
        <managingEditor>[email protected] (Dave Winer)</managingEditor>
        <webMaster>[email protected] (Dave Winer)</webMaster>
        <language>en-us</language>
        <skipHours>
            <hour>6</hour>
            <hour>7</hour>
            <hour>8</hour>
            <hour>9</hour>
            <hour>10</hour>
            <hour>11</hour>
        </skipHours>
        <skipDays>
            <day>Sunday</day>
        </skipDays>
        <rating>(PICS-1.1 &quot;http://www.rsac.org/ratingsv01.html&quot; l gen true comment &quot;RSACi North America Server&quot; for &quot;http://www.rsac.org&quot; on &quot;1996.04.16T08:15-0500&quot; r (n 0 s 0 v 0 l 0))</rating>
        <item>
            <title>stuff</title>
            <link>http://bar</link>
            <description>This is an article about some stuff</description>
        </item>
        <textinput>
            <title>Search Now!</title>
            <description>Enter your search terms</description>
            <name>find</name>
            <link>http://my.site.com/search.cgi</link>
        </textinput>
    </channel>
</rss>

Example 3 - International

<?xml version="1.0" encoding="EUC-JP"?>
<!DOCTYPE rss
  SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">  
<rss version="0.91">
    <channel>
        <title>... </title>
        <link>http://www.mozilla.org</link>
        <description>... </description>
        <language>ja</language>
        <item>
            <title>... </title>
            <link>http://www.mozilla.org/status/</link>
            <description>This is an item description...</description>
        </item>
        <item>
            <title>... </title>
            <link>http://www.mozilla.org/status/</link>
            <description>This is an item description...</description>
        </item>
        <item>
            <title>... </title>
            <link>http://www.mozilla.org/status/</link>
            <description>This is an item description...</description>
        </item>
        <item>
            <title>... </title>
            <link>http://www.mozilla.org/status/</link>
            <description>This is an item description...</description>
        </item>
    </channel>
</rss>

Supported languages

Why these?

These are the language codes that are accepted by Netcenter. Other language codes may be available as specified by the W3C, but these are guaranteed to work with most browsers. Netcenter will currently reject other language codes; however, other sites may accept them.

Codes

Code Language
af Afrikaans
sq Albanian
eu Basque
be Belarusian
bg Bulgarian
ca Catalan
zh-cn Chinese (Simplified)
zh-tw Chinese (Traditional)
hr Croatian
cs Czech
da Danish
nl Dutch
nl-be Dutch (Belgium)
nl-nl Dutch (Netherlands)
en English
en-au English (Australia)
en-bz English (Belize)
en-ca English (Canada)
en-ie English (Ireland)
en-jm English (Jamaica)
en-nz English (New Zealand)
en-ph English (Philippines)
en-za English (South Africa)
en-tt English (Trinidad)
en-gb English (United Kingdom)
en-us English (United States)
en-zw English (Zimbabwe)
fo Faeroese
fi Finnish
fr French
fr-be French (Belgium)
fr-ca French (Canada)
fr-fr French (France)
fr-lu French (Luxembourg)
fr-mc French (Monaco)
fr-ch French (Switzerland)
gl Galician
gd Gaelic
de German
de-at German (Austria)
de-de German (Germany)
de-li German (Liechtenstein)
de-lu German (Luxembourg)
de-ch German (Switzerland)
el Greek
hu Hungarian
is Icelandic
id Indonesian
ga Irish
it Italian
it-it Italian (Italy)
it-ch Italian (Switzerland)
ja Japanese
ko Korean
mk Macedonian
no Norwegian
pl Polish
pt Portuguese
pt-br Portuguese (Brazil)
pt-pt Portuguese (Portugal)
ro Romanian
ro-mo Romanian (Moldova)
ro-ro Romanian (Romania)
ru Russian
ru-mo Russian (Moldova)
ru-ru Russian (Russia)
sr Serbian
sk Slovak
sl Slovenian
es Spanish
es-ar Spanish (Argentina)
es-bo Spanish (Bolivia)
es-cl Spanish (Chile)
es-co Spanish (Colombia)
es-cr Spanish (Costa Rica)
es-do Spanish (Dominican Republic)
es-ec Spanish (Ecuador)
es-sv Spanish (El Salvador)
es-gt Spanish (Guatemala)
es-hn Spanish (Honduras)
es-mx Spanish (Mexico)
es-ni Spanish (Nicaragua)
es-pa Spanish (Panama)
es-py Spanish (Paraguay)
es-pe Spanish (Peru)
es-pr Spanish (Puerto Rico)
es-es Spanish (Spain)
es-uy Spanish (Uruguay)
es-ve Spanish (Venezuela)
sv Swedish
sv-fi Swedish (Finland)
sv-se Swedish (Sweden)
tr Turkish
uk Ukrainian

Supported encodings

NOTE: These are not case sensitive.

IANA standard name MIME preferred name (if different from IANA)
ANSI_X3.4-1968 US-ASCII
ISO_8859-1:1987 ISO-8859-1
ISO_8859-2:1987 ISO-8859-2
ISO_8859-5:1988 ISO-8859-5
ISO_8859-7:1987 ISO-8859-7
ISO_8859-9:1989 ISO-8859-9
Shift_JIS
Extended_UNIX_Code_Packed_Format_for_Japanese EUC-JP
GB2312
EUC-KR
Big5
windows-1250
windows-1251
UTF-8
x-mac-roman
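
Because the names are not case sensitive, a lookup would normally fold case before comparing. The following is a minimal sketch that accepts either the IANA name or the MIME name from the table above; is_supported_encoding and SUPPORTED_ENCODINGS are hypothetical names.

```python
# Every name from the table above, folded to lower case for comparison.
SUPPORTED_ENCODINGS = {
    name.lower()
    for name in (
        "ANSI_X3.4-1968", "US-ASCII",
        "ISO_8859-1:1987", "ISO-8859-1",
        "ISO_8859-2:1987", "ISO-8859-2",
        "ISO_8859-5:1988", "ISO-8859-5",
        "ISO_8859-7:1987", "ISO-8859-7",
        "ISO_8859-9:1989", "ISO-8859-9",
        "Shift_JIS",
        "Extended_UNIX_Code_Packed_Format_for_Japanese", "EUC-JP",
        "GB2312", "EUC-KR", "Big5",
        "windows-1250", "windows-1251",
        "UTF-8", "x-mac-roman",
    )
}

def is_supported_encoding(name):
    """Case-insensitive membership test against the supported list."""
    return name.strip().lower() in SUPPORTED_ENCODINGS

print(is_supported_encoding("euc-jp"))  # True
print(is_supported_encoding("UTF-16"))  # False
```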

DTD

Location

Public ID:

-//Netscape Communications//DTD RSS 0.91//EN

System ID:

http://my.netscape.com/publish/formats/rss-0.91.dtd

The DTD itself

<!--  
Rich Site Summary (RSS) 0.91 official DTD, proposed.  
RSS is an XML vocabulary for describing  
metadata about websites, and enabling the display of  
"channels" on the "My Netscape" website.  
RSS Info can be found at http://my.netscape.com/publish/  
XML Info can be found at http://www.w3.org/XML/  
copyright Netscape Communications, 1999  
Dan Libby - [email protected]  
Based on RSS DTD originally created by  
Lars Marius Garshol - [email protected].  
$Id: rss-spec-0.91.html,v 1.1.2.2 2001/11/09 08:10:07 dprusak Exp $  
-->  
<!ELEMENT rss (channel)>  
<!ATTLIST rss  
version CDATA #REQUIRED> <!-- must be "0.91"> -->  
<!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>  
<!ELEMENT title (#PCDATA)>  
<!ELEMENT description (#PCDATA)>  
<!ELEMENT link (#PCDATA)>  
<!ELEMENT image (title | url | link | width? | height? | description?)*>  
<!ELEMENT url (#PCDATA)>  
<!ELEMENT item (title | link | description)*>  
<!ELEMENT textinput (title | description | name | link)*>  
<!ELEMENT name (#PCDATA)>  
<!ELEMENT rating (#PCDATA)>  
<!ELEMENT language (#PCDATA)>  
<!ELEMENT width (#PCDATA)>  
<!ELEMENT height (#PCDATA)>  
<!ELEMENT copyright (#PCDATA)>  
<!ELEMENT pubDate (#PCDATA)>  
<!ELEMENT lastBuildDate (#PCDATA)>  
<!ELEMENT docs (#PCDATA)>  
<!ELEMENT managingEditor (#PCDATA)>  
<!ELEMENT webMaster (#PCDATA)>  
<!ELEMENT hour (#PCDATA)>  
<!ELEMENT day (#PCDATA)>  
<!ELEMENT skipHours (hour+)>  
<!ELEMENT skipDays (day+)>  
<!--  
Copied from HTML 3.2 DTD, with modifications (removed CDATA)  
http://www.w3.org/TR/REC-html32.html#dtd  
=============== BEGIN ===================  
-->  
<!--  
Character Entities for ISO Latin-1  
(C) International Organization for Standardization 1986  
Permission to copy in any form is granted for use with  
conforming SGML systems and applications as defined in  
ISO 8879, provided this notice is included in all copies.  
This has been extended for use with HTML to cover the full  
set of codes in the range 160-255 decimal.  
-->  
<!-- Character entity set. Typical invocation:  
<!ENTITY % ISOlat1 PUBLIC  
"ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">  
%ISOlat1;  
-->  
<!ENTITY nbsp "&#160;"> <!-- no-break space -->  
<!ENTITY iexcl "¡"> <!-- inverted exclamation mark -->  
<!ENTITY cent "¢"> <!-- cent sign -->  
<!ENTITY pound "£"> <!-- pound sterling sign -->  
<!ENTITY curren "¤"> <!-- general currency sign -->  
<!ENTITY yen "¥"> <!-- yen sign -->  
<!ENTITY brvbar "¦"> <!-- broken (vertical) bar -->  
<!ENTITY sect "§"> <!-- section sign -->  
<!ENTITY uml "¨"> <!-- umlaut (dieresis) -->  
<!ENTITY copy "©"> <!-- copyright sign -->  
<!ENTITY ordf "ª"> <!-- ordinal indicator, feminine -->  
<!ENTITY laquo "«"> <!-- angle quotation mark, left -->  
<!ENTITY not "¬"> <!-- not sign -->  
<!ENTITY shy "&#173;"> <!-- soft hyphen -->  
<!ENTITY reg "®"> <!-- registered sign -->  
<!ENTITY macr "¯"> <!-- macron -->  
<!ENTITY deg "°"> <!-- degree sign -->  
<!ENTITY plusmn "±"> <!-- plus-or-minus sign -->  
<!ENTITY sup2 "²"> <!-- superscript two -->  
<!ENTITY sup3 "³"> <!-- superscript three -->  
<!ENTITY acute "´"> <!-- acute accent -->  
<!ENTITY micro "µ"> <!-- micro sign -->  
<!ENTITY para "¶"> <!-- pilcrow (paragraph sign) -->  
<!ENTITY middot "·"> <!-- middle dot -->  
<!ENTITY cedil "¸"> <!-- cedilla -->  
<!ENTITY sup1 "¹"> <!-- superscript one -->  
<!ENTITY ordm "º"> <!-- ordinal indicator, masculine -->  
<!ENTITY raquo "»"> <!-- angle quotation mark, right -->  
<!ENTITY frac14 "¼"> <!-- fraction one-quarter -->  
<!ENTITY frac12 "½"> <!-- fraction one-half -->  
<!ENTITY frac34 "¾"> <!-- fraction three-quarters -->  
<!ENTITY iquest "¿"> <!-- inverted question mark -->  
<!ENTITY Agrave "À"> <!-- capital A, grave accent -->  
<!ENTITY Aacute "Á"> <!-- capital A, acute accent -->  
<!ENTITY Acirc "Â"> <!-- capital A, circumflex accent -->  
<!ENTITY Atilde "Ã"> <!-- capital A, tilde -->  
<!ENTITY Auml "Ä"> <!-- capital A, dieresis or umlaut mark -->  
<!ENTITY Aring "Å"> <!-- capital A, ring -->  
<!ENTITY AElig "Æ"> <!-- capital AE diphthong (ligature) -->  
<!ENTITY Ccedil "Ç"> <!-- capital C, cedilla -->  
<!ENTITY Egrave "È"> <!-- capital E, grave accent -->  
<!ENTITY Eacute "É"> <!-- capital E, acute accent -->  
<!ENTITY Ecirc "Ê"> <!-- capital E, circumflex accent -->  
<!ENTITY Euml "Ë"> <!-- capital E, dieresis or umlaut mark -->  
<!ENTITY Igrave "Ì"> <!-- capital I, grave accent -->  
<!ENTITY Iacute "Í"> <!-- capital I, acute accent -->  
<!ENTITY Icirc "Î"> <!-- capital I, circumflex accent -->  
<!ENTITY Iuml "Ï"> <!-- capital I, dieresis or umlaut mark -->  
<!ENTITY ETH "Ð"> <!-- capital Eth, Icelandic -->  
<!ENTITY Ntilde "Ñ"> <!-- capital N, tilde -->  
<!ENTITY Ograve "Ò"> <!-- capital O, grave accent -->  
<!ENTITY Oacute "Ó"> <!-- capital O, acute accent -->  
<!ENTITY Ocirc "Ô"> <!-- capital O, circumflex accent -->  
<!ENTITY Otilde "Õ"> <!-- capital O, tilde -->  
<!ENTITY Ouml "Ö"> <!-- capital O, dieresis or umlaut mark -->  
<!ENTITY times "×"> <!-- multiply sign -->  
<!ENTITY Oslash "Ø"> <!-- capital O, slash -->  
<!ENTITY Ugrave "Ù"> <!-- capital U, grave accent -->  
<!ENTITY Uacute "Ú"> <!-- capital U, acute accent -->  
<!ENTITY Ucirc "Û"> <!-- capital U, circumflex accent -->  
<!ENTITY Uuml "Ü"> <!-- capital U, dieresis or umlaut mark -->  
<!ENTITY Yacute "Ý"> <!-- capital Y, acute accent -->  
<!ENTITY THORN "Þ"> <!-- capital THORN, Icelandic -->  
<!ENTITY szlig "ß"> <!-- small sharp s, German (sz ligature) -->  
<!ENTITY agrave "à"> <!-- small a, grave accent -->  
<!ENTITY aacute "á"> <!-- small a, acute accent -->  
<!ENTITY acirc "â"> <!-- small a, circumflex accent -->  
<!ENTITY atilde "ã"> <!-- small a, tilde -->  
<!ENTITY auml "ä"> <!-- small a, dieresis or umlaut mark -->  
<!ENTITY aring "å"> <!-- small a, ring -->  
<!ENTITY aelig "æ"> <!-- small ae diphthong (ligature) -->  
<!ENTITY ccedil "ç"> <!-- small c, cedilla -->  
<!ENTITY egrave "è"> <!-- small e, grave accent -->  
<!ENTITY eacute "é"> <!-- small e, acute accent -->  
<!ENTITY ecirc "ê"> <!-- small e, circumflex accent -->  
<!ENTITY euml "ë"> <!-- small e, dieresis or umlaut mark -->  
<!ENTITY igrave "ì"> <!-- small i, grave accent -->  
<!ENTITY iacute "í"> <!-- small i, acute accent -->  
<!ENTITY icirc "î"> <!-- small i, circumflex accent -->  
<!ENTITY iuml "ï"> <!-- small i, dieresis or umlaut mark -->  
<!ENTITY eth "ð"> <!-- small eth, Icelandic -->  
<!ENTITY ntilde "ñ"> <!-- small n, tilde -->  
<!ENTITY ograve "ò"> <!-- small o, grave accent -->  
<!ENTITY oacute "ó"> <!-- small o, acute accent -->  
<!ENTITY ocirc "ô"> <!-- small o, circumflex accent -->  
<!ENTITY otilde "õ"> <!-- small o, tilde -->  
<!ENTITY ouml "ö"> <!-- small o, dieresis or umlaut mark -->  
<!ENTITY divide "÷"> <!-- divide sign -->  
<!ENTITY oslash "ø"> <!-- small o, slash -->  
<!ENTITY ugrave "ù"> <!-- small u, grave accent -->  
<!ENTITY uacute "ú"> <!-- small u, acute accent -->  
<!ENTITY ucirc "û"> <!-- small u, circumflex accent -->  
<!ENTITY uuml "ü"> <!-- small u, dieresis or umlaut mark -->  
<!ENTITY yacute "ý"> <!-- small y, acute accent -->  
<!ENTITY thorn "þ"> <!-- small thorn, Icelandic -->  
<!ENTITY yuml "ÿ"> <!-- small y, dieresis or umlaut mark -->  
<!--  
Copied from HTML 3.2 DTD, with modifications (removed CDATA)  
http://www.w3.org/TR/REC-html32.html#dtd  
================= END ===================  
-->

Proprietary Schema (Validation Rules)

Explanation

XML currently provides a limited amount of validation via DTDs. However, DTDs do not provide any support for common validation requirements, such as data types, length of strings, number of sub-elements, or pattern matching.

A standard has been proposed to solve this problem. XML Schemas looks like it will do all of this and more. Unfortunately, there are few, if any, parsers available today that understand them.

As a proprietary, interim-only solution, we have developed a very simplistic schema format that performs a second level of validation after the parser has read the XML document into memory. We are listing the schema used to validate RSS 0.91 files, so that there will be no ambiguity when validation fails.

Here are the basic rules:

  • Each XML element must be defined by an <Element> tag.
    • Each Element definition must have a unique id attribute and a type attribute.
    • Each Attribute of an Element must be referenced by an <Attrib> tag.
    • Each sub-Element of an Element of type container must be referenced by a <Contains> tag.
    • Each Element may have a type associated with it. Currently supported types are:
      • container: this Element contains other Elements only.
      • string: this Element contains text data.
      • int: this Element contains an integer.
    • Each string or int Element may contain a matching rule, specified via <Matches>.
    • Each string or int Element may specify a minimum and maximum number of characters (or value if type int) via min, max, and exactly.
  • Each XML attribute must be defined by an <Attribute> tag.
    • Each Attribute definition must have a unique id attribute and a type attribute.
    • Each Attribute may be of type string or int.
    • Each Attribute may contain a matching rule, specified via <Matches>.
    • Each Attribute may specify a minimum and maximum number of characters (or value if type int) via min, max, and exactly.
  • Each <Contains> and <Attrib> definition must contain a ref attribute that refers to a uniquely defined Element or Attribute with the value of ref as its id.
  • Each <Contains> and <Attrib> definition may contain min, max, or exactly attributes to define the number of Elements or Attributes required.
  • Each <Matches> must contain a valid regular expression, against which the corresponding Element or Attribute will be evaluated.
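
The type, min, max, and <Matches> rules for string and int Elements can be sketched as a single check. This is an illustration only, not Netscape's validator: check_value and LANGUAGE_EXCERPT are hypothetical names, the exactly attribute is left out, and the use of re.VERBOSE is an assumption based on the whitespace and # comments inside the language pattern of the schema.

```python
import re

def check_value(text, type_, minimum=None, maximum=None, matches=None):
    """Apply the rules: min/max bound the length of a string or the value of an int."""
    if type_ == "int":
        try:
            measured = int(text)
        except ValueError:
            return False  # not an integer at all
    else:
        measured = len(text)
    if minimum is not None and measured < minimum:
        return False
    if maximum is not None and measured > maximum:
        return False
    # Assumption: patterns are read in "extended" mode, so whitespace and
    # "# comment" text inside them is ignored.
    if matches is not None and re.search(matches, text, re.VERBOSE) is None:
        return False
    return True

# A two-code excerpt in the style of the language pattern, not the full list.
LANGUAGE_EXCERPT = "^(en | # English\nen-us # English (United States)\n)$"

print(check_value("78", "int", 1, 144))                                      # True
print(check_value("http://bar", "string", 1, 500, r"^(http://|^ftp://)"))  # True
print(check_value("en-us", "string", 2, 5, LANGUAGE_EXCERPT))              # True
```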

Schema

Here is the schema for RSS 0.91.

<?xml version="1.0"?>
<!DOCTYPE Schema
  PUBLIC '-//Netscape Communications//DTD Schema 1.0//EN'
  'http://my.netscape.com/publish/formats/schema-1.0.dtd'>
<Schema name="RSS 0.91" root="rss" version="DKHXVF 1.0">
    <Element id="rss" type="container">
        <Contains exactly="1" ref="channel"/>
        <Attrib exactly="1" ref="version"/>
    </Element>
    <Attribute id="version" type="string">
        <Matches>0.91</Matches>
    </Attribute>
    <Element id="channel" type="container">
        <Contains exactly="1" ref="description"/>
        <Contains max="1" min="0" ref="image"/>
        <Contains max="15" min="0" ref="item"/>
        <Contains exactly="1" ref="language"/>
        <Contains exactly="1" ref="link"/>
        <Contains max="1" min="0" ref="rating"/>
        <Contains max="1" min="0" ref="textinput"/>
        <Contains exactly="1" ref="title"/>
        <Contains max="1" min="0" ref="copyright"/>
        <Contains max="1" min="0" ref="pubDate"/>
        <Contains max="1" min="0" ref="lastBuildDate"/>
        <Contains max="1" min="0" ref="docs"/>
        <Contains max="1" min="0" ref="managingEditor"/>
        <Contains max="1" min="0" ref="webMaster"/>
        <Contains max="1" min="0" ref="skipHours"/>
        <Contains max="1" min="0" ref="skipDays"/>
    </Element>
    <Element id="copyright" max="100" type="string"/>
    <Element id="pubDate" max="100" type="string"/>
    <Element id="lastBuildDate" max="100" type="string"/>
    <Element id="docs" max="500" type="string"/>
    <Element id="managingEditor" max="100" type="string"/>
    <Element id="webMaster" max="100" type="string"/>
    <Element id="skipHours" type="container">
        <Contains max="24" min="0" ref="hour"/>
    </Element>
    <Element id="skipDays" type="container">
        <Contains max="7" min="0" ref="day"/>
    </Element>
    <Element id="hour" max="24" min="0" type="int"/>
    <Element id="day" max="10" min="0" type="string"/>
    <Element id="item" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="link"/>
        <Contains max="1" min="0" ref="description"/>
    </Element>
    <Element id="image" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains max="1" min="0" ref="link"/>
        <Contains exactly="1" ref="url"/>
        <Contains max="1" min="0" ref="width"/>
        <Contains max="1" min="0" ref="height"/>
        <Contains max="1" min="0" ref="description"/>
    </Element>
    <Element id="textinput" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="link"/>
        <Contains exactly="1" ref="description"/>
        <Contains exactly="1" ref="name"/>
    </Element>
    <Element id="title" max="100" min="1" type="string"/>
    <Element id="description" max="500" min="1" type="string"/>
    <Element id="url" max="500" min="1" type="string">
        <Matches>^(http://|^ftp://)</Matches>
    </Element>
    <Element id="link" max="500" min="1" type="string">
        <Matches>^(http://|^ftp://)</Matches>
    </Element>
    <Element id="language" max="5" min="2" type="string">
        <Matches>^(af | # Afrikaans  
sq | # Albanian  
eu | # Basque  
be | # Belarusian  
bg | # Bulgarian  
ca | # Catalan  
zh-cn | # Chinese (Simplified)  
zh-tw | # Chinese (Traditional)  
hr | # Croatian  
cs | # Czech  
da | # Danish  
nl | # Dutch  
nl-be | # Dutch (Belgium)  
nl-nl | # Dutch (Netherlands)  
en | # English  
en-au | # English (Australia)  
en-bz | # English (Belize)  
en-ca | # English (Canada)  
en-ie | # English (Ireland)  
en-jm | # English (Jamaica)  
en-nz | # English (New Zealand)  
en-ph | # English (Philippines)  
en-za | # English (South Africa)  
en-tt | # English (Trinidad)  
en-gb | # English (United Kingdom)  
en-us | # English (United States)  
en-zw | # English (Zimbabwe)  
fo | # Faeroese  
fi | # Finnish  
fr | # French  
fr-be | # French (Belgium)  
fr-ca | # French (Canada)  
fr-fr | # French (France)  
fr-lu | # French (Luxembourg)  
fr-mc | # French (Monaco)  
fr-ch | # French (Switzerland)  
gl | # Galician  
gd | # Gaelic  
de | # German  
de-at | # German (Austria)  
de-de | # German (Germany)  
de-li | # German (Liechtenstein)  
de-lu | # German (Luxembourg)  
de-ch | # German (Switzerland)  
el | # Greek  
hu | # Hungarian  
is | # Icelandic  
id | # Indonesian  
ga | # Irish  
it | # Italian  
it-it | # Italian (Italy)  
it-ch | # Italian (Switzerland)  
ja | # Japanese  
ko | # Korean  
mk | # Macedonian  
no | # Norwegian  
pl | # Polish  
pt | # Portuguese  
pt-br | # Portuguese (Brazil)  
pt-pt | # Portuguese (Portugal)  
ro | # Romanian  
ro-mo | # Romanian (Moldova)  
ro-ro | # Romanian (Romania)  
ru | # Russian  
ru-mo | # Russian (Moldova)  
ru-ru | # Russian (Russia)  
sr | # Serbian  
sk | # Slovak  
sl | # Slovenian  
es | # Spanish  
es-ar | # Spanish (Argentina)  
es-bo | # Spanish (Bolivia)  
es-cl | # Spanish (Chile)  
es-co | # Spanish (Colombia)  
es-cr | # Spanish (Costa Rica)  
es-do | # Spanish (Dominican Republic)  
es-ec | # Spanish (Ecuador)  
es-sv | # Spanish (El Salvador)  
es-gt | # Spanish (Guatemala)  
es-hn | # Spanish (Honduras)  
es-mx | # Spanish (Mexico)  
es-ni | # Spanish (Nicaragua)  
es-pa | # Spanish (Panama)  
es-py | # Spanish (Paraguay)  
es-pe | # Spanish (Peru)  
es-pr | # Spanish (Puerto Rico)  
es-es | # Spanish (Spain)  
es-uy | # Spanish (Uruguay)  
es-ve | # Spanish (Venezuela)  
sv | # Swedish  
sv-fi | # Swedish (Finland)  
sv-se | # Swedish (Sweden)  
tr | # Turkish  
uk # Ukrainian  
)$  
</Matches>
    </Element>
    <Element id="rating" max="500" min="20" type="string">
        <Matches>^(PICS-1.1</Matches>
    </Element>
    <Element id="width" max="144" min="1" type="int"/>
    <Element id="height" max="400" min="1" type="int"/>
    <Element id="name" max="20" min="1" type="string"/>
</Schema>
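
The rules above combine commented regular expressions with `min`/`max` bounds. As a rough illustration only, the sketch below shows how such rules might be applied in Python. It assumes Python's `re.VERBOSE` mode as a stand-in for the validator's regex engine (the original validation software is not shown in this spec), uses a shortened copy of the language pattern, and the helper names `matches` and `int_in_range` are hypothetical.

```python
import re

# Shortened stand-in for the language pattern above (only a few codes kept).
LANGUAGE_PATTERN = """
^(
en-us | # English (United States)
fr    | # French
fr-ca | # French (Canada)
uk      # Ukrainian
)$
"""


def matches(pattern: str, value: str) -> bool:
    """Apply a Matches-style pattern.

    re.VERBOSE ignores whitespace and treats '#' as the start of a comment,
    which mirrors the layout used in the schema.
    """
    return re.search(pattern, value, re.VERBOSE) is not None


def int_in_range(text: str, minimum: int, maximum: int) -> bool:
    """Check an int-typed element such as width (1 to 144) or height (1 to 400)."""
    try:
        number = int(text.strip())
    except ValueError:
        return False
    return minimum <= number <= maximum
```

For example, `matches(LANGUAGE_PATTERN, "fr-ca")` accepts a listed code, while `int_in_range("145", 1, 144)` rejects a width above the schema's maximum.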

Schema DTD

Here is the DTD for the schema format.

<!--  
A DTD for Dan's Kinda Hacky XML Validation Format (DKHXVF)  
Basically, this format allows us to enforce some additional rules  
that DTD's do not. Specifically, we can:  
- specify min and max for number of each child element  
- specify a regular expression that text elements and attributes must match  
- specify type of text elements and attributes (int, float, string, timestamp)  
- specify min and max for any type. (length compare for strings, numeric otherwise)  
The hope is that this will allow the rapid creation of new formats, and modification  
of existing formats (adding/removing tags, attributes etc), without requiring  
code changes in the validation software.  
This is not in any way intended to be an alternative to XML schemas. In the  
absence of code supporting XML schemas, I created this, but it is meant as  
a transitional work only.  
For more on XML schemas, see:  
http://www.w3.org/1999/05/06-xmlschema-1/ and  
http://www.w3.org/1999/05/06-xmlschema-2/  
This is also not meant to replace DTDs. There are many things that you can do  
with DTDs that you cannot do with this format. For example, you cannot declare  
entities with this format. You must do that in the DTD. If you want your  
parser to interpret them correctly, you must use a validating parser.  
It is possible to use these schemas without DTD validation, however you may run  
into problems with entity expansion and other things.  
Dan Libby  
$Log: rss-spec-0.91.html,v $  
Revision 1.1.2.2 2001/11/09 08:10:07 dprusak  
Merged for 6.2  
  
Revision 1.1.2.1 2001/10/17 22:25:28 dprusak  
NewMyNetscape  
Revision 1.1.2.1 2001/05/03 00:44:50 hoangtv  
adding DTD definition  
Revision 1.4 1999/09/10 03:01:44 jquach  
removed comments  
Revision 1.3 1999/09/10 03:01:24 jquach  
pulled ref to internal file  
Revision 1.2 1999/08/07 04:53:02 danda  
'cleaning' (removing useful info) for public release  
Revision 1.3 1999/08/07 04:52:12 danda  
'cleaning' (removing useful info) for public release  
Revision 1.2 1999/07/22 07:09:41 danda  
fixing examples, RDF Site Summary -> Rich Site Summary  
Revision 1.1 1999/06/09 07:01:29 danda  
adding schema and dtd for rss 0.9 and 1.0  
-->  
<!--  
Tag: Schema  
Description: Document wrapper.  
Sub tags: Element & Attribute  
Attributes: version, root, name  
Notes:  
version must be "DKHXVF 1.0"  
root is the document root.  
-->  
<!ELEMENT Schema (Element | Attribute)*>  
<!ATTLIST Schema  
    version CDATA #FIXED "DKHXVF 1.0"  
    root CDATA #REQUIRED  
    name CDATA #REQUIRED>  
<!--  
Tag: Element  
Description: Definition of an allowed element (tag)  
Sub tags: Contains, Attrib, Matches  
Attributes: id, type, min, max, exactly  
Notes: exactly="1" is equivalent to min="1" max="1"  
-->  
<!ELEMENT Element ((Contains | Attrib)* | Matches?)>  
<!ATTLIST Element  
    id CDATA #REQUIRED  
    type (int | float | container | string | timestamp) #REQUIRED  
    min CDATA #IMPLIED  
    max CDATA #IMPLIED  
    exactly CDATA #IMPLIED>  
<!--  
Tag: Contains  
Description: Defines rules for a sub-element.  
Sub tags: None, this tag must be empty.  
Attributes: ref, min, max, exactly  
Notes: ref must refer to the 'id' of an element defined elsewhere or the schema  
is invalid.  
-->  
<!ELEMENT Contains EMPTY>  
<!ATTLIST Contains  
    ref CDATA #REQUIRED  
    min CDATA #IMPLIED  
    max CDATA #IMPLIED  
    exactly CDATA #IMPLIED>  
<!--  
Tag: Attrib  
Description: Defines rules for an element attribute.  
Sub tags: None, this tag must be empty  
Attributes: ref, min, max, exactly  
Notes: ref must refer to the 'id' of an Attribute defined elsewhere or the schema  
is invalid.  
-->  
<!ELEMENT Attrib EMPTY>  
<!ATTLIST Attrib  
    ref CDATA #REQUIRED  
    min CDATA #IMPLIED  
    max CDATA #IMPLIED  
    exactly CDATA #IMPLIED>  
<!--  
Tag: Attribute  
Description: Definition of an allowed attribute  
Sub tags: Matches  
Attributes: id, type, min, max, exactly  
Notes: none  
-->  
<!ELEMENT Attribute (Matches?)>  
<!ATTLIST Attribute  
    id CDATA #REQUIRED  
    type (int | float | string | timestamp) #REQUIRED  
    min CDATA #IMPLIED  
    max CDATA #IMPLIED  
    exactly CDATA #IMPLIED>  
<!--  
Tag: Matches  
Description: A regular expression that values will be compared against  
Sub tags: None  
Attributes: None  
Notes: Matches may be used for elements of any type but container, and for attributes.  
An example of a useful matching pattern is:  
<Matches>^(foo|bar|foobar)$</Matches>  
This will allow any values that exactly match "foo", "bar", or "foobar".  
Whitespace is allowed in the regex and '#' is used for comments. The following  
is valid:  
<Matches>  
&amp;\# # Start of a numeric entity reference, xml escaped &  
(?P&lt;char&gt; # xml escaped <, >  
[0-9]+[^0-9] # Decimal form  
| 0[0-7]+[^0-7] # Octal form  
| x[0-9a-fA-F]+[^0-9a-fA-F] # Hexadecimal form  
)  
</Matches>  
which is equivalent to: <Matches>&amp;\#(?P&lt;char&gt;[0-9]+[^0-9]|0[0-7]+[^0-7]|x[0-9a-fA-F]+[^0-9a-fA-F])</Matches>  
For help on regular expressions, see:  
http://www.python.org/doc/howto/regex/regex.html or  
http://www.ciser.cornell.edu/info/regex.html  
-->  
<!ELEMENT Matches (#PCDATA)>  
<!--  
Example of a DKHXVF 1.0 file:  
<?xml version="1.0"?>
<!DOCTYPE Schema
  PUBLIC '-//Netscape Communications//DTD Schema 1.0//EN'
  'http://my.netscape.com/publish/formats/schema-1.0.dtd'>
<Schema name="RSS 0.9" root="rdf:RDF" version="DKHXVF 1.0">
    <Element id="rdf:RDF" type="container">
        <Contains exactly="1" ref="channel"/>
        <Contains max="1" min="0" ref="image"/>
        <Contains max="15" min="1" ref="item"/>
        <Contains max="1" min="0" ref="textinput"/>
        <Attrib exactly="1" ref="xmlns"/>
        <Attrib exactly="1" ref="xmlns:rdf"/>
    </Element>
    <Attribute id="xmlns" type="string">
        <Matches>http://my.netscape.com/rdf/simple/0.9/</Matches>
    </Attribute>
    <Attribute id="xmlns:rdf" type="string">
        <Matches>http://www.w3.org/1999/02/22-rdf-syntax-ns#</Matches>
    </Attribute>
    <Element id="channel" type="container">
        <Contains exactly="1" ref="link"/>
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="description"/>
    </Element>
    <Element id="item" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="link"/>
    </Element>
    <Element id="image" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="link"/>
        <Contains exactly="1" ref="url"/>
    </Element>
    <Element id="textinput" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="description"/>
        <Contains exactly="1" ref="link"/>
        <Contains exactly="1" ref="name"/>
    </Element>
    <Element id="title" max="100" min="1" type="string"/>
    <Element id="description" max="500" min="1" type="string"/>
    <Element id="url" max="500" min="1" type="string">
        <Matches>^(http://|^ftp://)</Matches>
    </Element>
    <Element id="link" max="500" min="1" type="string">
        <Matches>^(http://|^ftp://)</Matches>
    </Element>
    <Element id="name" max="20" min="1" type="string"/>
</Schema>
-->
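
To make the semantics of `Contains`, `exactly`, `min`, `max` and `Matches` more concrete, here is a minimal sketch of a checker for this schema format. It is not the Netscape validator: the function names (`bounds`, `out_of_range`, `check`, `validate`) and the trimmed `SCHEMA_XML` are illustrative assumptions, `Attrib` rules and the `timestamp` type are skipped, and namespace-prefixed names such as `rdf:RDF` are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed, illustrative schema based on the "image" rules in the example above.
SCHEMA_XML = """<Schema name="demo" root="image" version="DKHXVF 1.0">
    <Element id="image" type="container">
        <Contains exactly="1" ref="title"/>
        <Contains exactly="1" ref="link"/>
        <Contains exactly="1" ref="url"/>
    </Element>
    <Element id="title" max="100" min="1" type="string"/>
    <Element id="url" max="500" min="1" type="string">
        <Matches>^(http://|^ftp://)</Matches>
    </Element>
    <Element id="link" max="500" min="1" type="string">
        <Matches>^(http://|^ftp://)</Matches>
    </Element>
</Schema>"""


def bounds(rule):
    """Return (min, max); exactly="n" is shorthand for min="n" max="n"."""
    exactly = rule.get("exactly")
    if exactly is not None:
        return int(exactly), int(exactly)
    lo, hi = rule.get("min"), rule.get("max")
    return (int(lo) if lo is not None else None,
            int(hi) if hi is not None else None)


def out_of_range(value, lo, hi):
    return (lo is not None and value < lo) or (hi is not None and value > hi)


def check(node, rules, errors):
    rule = rules.get(node.tag)
    if rule is None:
        errors.append(f"<{node.tag}> is not defined in the schema")
        return
    kind = rule.get("type")
    if kind == "container":
        allowed = set()
        for contains in rule.findall("Contains"):
            ref = contains.get("ref")
            allowed.add(ref)
            count = len(node.findall(ref))
            if out_of_range(count, *bounds(contains)):
                errors.append(f"<{node.tag}> has {count} <{ref}> children")
        for child in node:
            if child.tag in allowed:
                check(child, rules, errors)
            else:
                errors.append(f"<{child.tag}> is not allowed in <{node.tag}>")
        return
    text = (node.text or "").strip()
    measure = None  # timestamp values are not range-checked in this sketch
    if kind == "string":
        measure = len(text)  # length compare for strings
    elif kind in ("int", "float"):
        try:
            measure = int(text) if kind == "int" else float(text)
        except ValueError:
            errors.append(f"<{node.tag}> is not a valid {kind}")
            return
    if measure is not None and out_of_range(measure, *bounds(rule)):
        errors.append(f"<{node.tag}> is outside its min/max bounds")
    pattern = rule.findtext("Matches")
    if pattern and not re.search(pattern, text, re.VERBOSE):
        errors.append(f"<{node.tag}> does not match {pattern.strip()!r}")


def validate(schema_xml, document_xml):
    """Return a list of error strings; an empty list means no rule was violated."""
    schema = ET.fromstring(schema_xml)
    rules = {element.get("id"): element for element in schema.findall("Element")}
    root = ET.fromstring(document_xml)
    if root.tag != schema.get("root"):
        return [f"root is <{root.tag}>, expected <{schema.get('root')}>"]
    errors = []
    check(root, rules, errors)
    return errors
```

Calling `validate(SCHEMA_XML, document)` on an `<image>` document with a `title`, `link` and `url` yields an empty list. Dropping `link` or using a `gopher://` URL adds one error string per violated rule.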