Canonical XML Schema


This document describes the conventions used in creating CASRAI standards. These standards are created using XML and XML Schema technologies. The rules in this guide cover the naming and design conventions that enable business rules to be encoded in XML Schema. Because XML Schema provides a rich array of design choices, this guide is intended to show best practices for consistency and flexibility. This specification provides a means to identify, capture and maximize the re-use of business information expressed as XML Schema components within CASRAI and CASRAI extensions in order to support information interoperability across integrated environments.

Scope and Focus

Support for International Standards

Where possible and practical, CASRAI will make use of existing international standards.  This will maximize interoperability as well as shorten the development of CASRAI standards by leveraging already-defined components.

Examples of existing specifications that could be utilized include ISO date formats and UN/CEFACT core components.  Selective evaluation of these and others will occur as CASRAI standards are produced.

Acknowledgements

The concepts and patterns in this document are based upon industry common practices across many domains. A portion of the designs were developed by the Open Applications Group (OAGi) and adopted by other standards organizations and individual enterprises, including HR-XML (Human Resources Xml), CIDX (Chemical Industry Data eXchange), STAR (Standards for Technology in Automotive Retail), and others.  In this case, the designs are derived work based on the OAGIS® Release 9.3 specification.  That work is protected by the following copyright.  Copyright © Open Applications Group (1997-2009). All Rights Reserved.

In all cases, the designs are open and freely adoptable and CASRAI wishes to acknowledge these original works.

Terminology and Notation

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in Internet Engineering Task Force (IETF) Request For Comments (RFC) 2119. Wherever the prefix xsd: appears, it refers to constructs from the W3C XML Schema specification. The following notation convention is used throughout this document:

[CASRAI R #] – Identifies a specific rule with a numeric reference number “#”.  These numbers can be used later for quality control during review of CASRAI standards.

Conformance

Text in-progress.

CASRAI XML Constructs

Relationship to other standards

1. All XML Schema design rules MUST be based on the W3C XML Schema Recommendations: XML Schema Part 1: Structures Second Edition and XML Schema 1.1 Part 2: Datatypes.

Naming and Modeling Rules

2. Xml element, attribute and type names MUST be composed of words in the English language, using the primary English spellings provided in the Oxford English Dictionary.
3. LowerCamelCase (LCC) MUST be used for naming attributes.
4. UpperCamelCase (UCC) MUST be used for naming elements and types.

CASRAI will utilize the best practice of “camel casing” the names of elements, attributes, and data types.  Camel case is defined as the use of English words merged together (white space removed) with each individual word containing either an initial upper or lower case letter followed by all lower case letters.  Upper Camel Case will consist of an initial upper case letter and Lower Camel Case will consist of an initial lower case letter.

Examples:

<PersonName languageCode="en-ca">
  <GivenName>James</GivenName>
  <FamilyName>Lunnix</FamilyName>
  <SuffixName type="generation">III</SuffixName>
</PersonName>

<Address type="mailing" languageCode="en-ca">
  <AddressLine>350 Sparks St.</AddressLine>
  <AddressLine>Suite 1200</AddressLine>
  <CityName>Ottawa</CityName>
  <CountrySubDivisionCode>ON</CountrySubDivisionCode>
  <CountryCode listAgencyName="ISO"
               listAgencyID="3166 Country Codes">CA</CountryCode>
  <PostalCode listAgencyName="Canada Post">K1R 7S8</PostalCode>
</Address>

<xsd:simpleType name="EffectiveDateTimeType">

<xsd:complexType name="PersonNameType">
5. Element, attribute and type names MUST be in singular form unless the concept itself is plural.

For example, when referring to a code of some kind, the singular element name <Code> is used.  Where multiple codes are grouped together, then a parent of this element could be called <Codes>.

<Codes>
  <Code></Code>
  <Code></Code>
</Codes>
6. Element, attribute and type names MUST be drawn from the following character set: a-z and A-Z.

The use of non-alphabetic characters in element, type, or attribute names is prohibited.  This includes punctuation or separators such as underscores and dashes as well as numbers.

Examples of prohibited character usage:

<Grant_Amount>
<Grant-Approval>
<AddressLine1>  <AddressLine2> <AddressLine3>
7. XML element, attribute and type names MUST NOT use acronyms, abbreviations, or other word truncations except those included in a controlled vocabulary or listed in this document.  Acronyms and abbreviations SHOULD be commonly accepted business terminology and widely recognized.
8. The acronyms and abbreviations listed in this document MUST always be used for consistency.
9. Acronyms and abbreviations at the beginning of an attribute declaration MUST appear in all lower case. All other acronyms and abbreviations in an attribute declaration MUST appear in upper case.
10. Acronyms MUST appear in all upper case for all element declarations and type definitions.

Examples of acronym usage as an element and an attribute.  In this case, the term is Value Added Tax or VAT:

<xsd:complexType name="PersonNameType">

<xsd:element name="PersonName" type="PersonNameType">
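
The two declarations above do not themselves contain an acronym. As a hypothetical illustration of rules 9 and 10 applied to VAT, the following fragment is a minimal sketch. The names VATAmount, VATAmountType, vatCode and originVATCode are assumptions made for this example and are not CASRAI components. The fragment is assumed to sit inside an xsd:schema element.

```xml
<!-- Hypothetical names, for illustration only -->
<!-- Rule 10: the acronym is upper case in element and type names -->
<xsd:complexType name="VATAmountType">
  <xsd:simpleContent>
    <xsd:extension base="xsd:decimal">
      <!-- Rule 9: acronym at the start of an attribute name is lower case -->
      <xsd:attribute name="vatCode" type="xsd:string"/>
      <!-- Rule 9: acronym elsewhere in an attribute name is upper case -->
      <xsd:attribute name="originVATCode" type="xsd:string"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<xsd:element name="VATAmount" type="VATAmountType"/>
```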

Namespace Strategy

Background

Both XML instances and XML Schemas need a namespace strategy. XML Namespaces, defined in the W3C Namespaces in XML Recommendation and relied upon by XML Schema, enable components to be clearly delineated and identified from the perspective of the parser.

In the example below, a Person XML instance is said to be in the namespace “http://www.example.com”:

<Person xmlns="http://www.example.com">Jules Verne</Person>

This is differentiated from a Person defined in an accounting context. Below, a Person XML instance is said to be in the namespace “http://www.accountingexample.com”:

<Person xmlns="http://www.accountingexample.com">123456789</Person>

Because of the namespaces, an XML parser is able to distinguish between the two and validate their respective data types. When namespaces are used in this fashion, one consortium can leverage the work of another without any name conflicts. In addition, user extensions can be made with confidence that no parser problems will ensue.

XML Namespaces can also be written in two ways: with no prefix (said to be “default”) or with a specified prefix. Both of the following examples mean exactly the same thing to an XML parser. The former relies on the default namespace, whereas the latter uses an explicit prefix.

<Person xmlns="http://www.example.com">Jules Verne</Person>

<ex:Person xmlns:ex="http://www.example.com">Jules Verne</ex:Person>

Namespaces are used both in XML instances and in XML Schemas. In the following example, an XML Schema element is declared. The prefix “xsd” is bound by the attribute “xmlns:xsd” to indicate that the declaration belongs to the W3C XML Schema specification namespace.

<xsd:element name="PostalAddress" type="PostalAddressType"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>

Best practice

The best practice is for an XML Namespace to reflect two pieces of metadata. The first is the ownership of the element or type in question: the value of the xmlns attribute is written to be a unique identifier for the owning agency.

The second is the major version of those components. Take for example the namespace of XML Schema:

xmlns:xsd="http://www.w3.org/2001/XMLSchema"

This namespace reflects the owning organization (W3.org) as well as the major version (2001) of the specification. CASRAI namespaces follow the same pattern.

12. CASRAI SHALL use an XML Namespace consisting of the unique identifier “http://www.casrai.org/schema/” and its major version number.

For example, the first major release version of CASRAI will have this namespace:

xmlns:ca="http://www.casrai.org/schemas/1"
13. The default Xml namespace in an Xml Schema will match the targetNamespace attribute.

It is also best practice for the default namespace in an XML Schema to match the targetNamespace. This causes the namespace in the instance and in the schema to match, which makes both easier to understand.

Xml Schema root element:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.casrai.org/schemas/1"
            targetNamespace="http://www.casrai.org/schemas/1">

Xml instance:

<PersonName xmlns="http://www.casrai.org/schemas/1">Jules Verne</PersonName>
14. CASRAI namespaces will use a Uniform Resource Locator (URL).

Although an XML namespace is essentially a string, there is a common practice for creating one. A unique identifier and version number can be expressed as a Uniform Resource Name (URN). However, namespaces are commonly restricted to Uniform Resource Locators (URLs). CASRAI will follow this pattern.

Versioning

CASRAI standards will need to change over time as new parts of data models are added to the transactions defined. As these changes take place, version control must be managed proactively.

Major version

It is considered best practice to use the XML Namespace to indicate the major version of a specification. One reason for this design is to allow multiple major versions of a specification to exist in the same software tool simultaneously without any chance of conflict. For example, a common vendor tool implementation scenario is to support the latest version of a specification as well as the most recent major one at the same time. This allows customers to migrate to a new version based on market conditions and as their own business needs require. One way to facilitate this is through namespace management. A Curriculum Vitae (CV) can be versioned as:

<CV xmlns="http://www.casrai.org/schemas/1">  <!-- version 1.0 -->

<CV xmlns="http://www.casrai.org/schemas/2">  <!-- version 2.0 -->

The same element name is used; however, they can exist in the same cache of files without conflict.

Rule [CASRAI R 12] requires the CASRAI namespace to combine the unique identifier “http://www.casrai.org/schema/” with the major version number. This enables the namespace pattern described here.

Minor Version

Minor version changes are those that are backward compatible. More specifically, the schemas in a minor version update will also validate XML instance documents from the previous version.

15. Minor version updates in an XML Schema will be backward compatible with the immediately preceding version. XML instances from that previous version will validate against the XML Schema(s) of the updated minor version.

Examples:

<CV xmlns="http://www.casrai.org/schemas/1">  <!-- version 1.0 -->
<CV xmlns="http://www.casrai.org/schemas/1">  <!-- version 1.1 -->

<CV xmlns="http://www.casrai.org/schemas/2">  <!-- version 2.0 -->
<CV xmlns="http://www.casrai.org/schemas/2">  <!-- version 2.1 -->
<CV xmlns="http://www.casrai.org/schemas/2">  <!-- version 2.3 -->
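
A backward-compatible change is typically additive and optional. As a hedged sketch, a minor release might add an optional child to a type that already existed in the preceding release. CVType is a hypothetical type name, and the PersonName and Competencies elements are assumed to be declared globally elsewhere in the schema.

```xml
<!-- Hypothetical CVType as it might look in release 1.1 -->
<xsd:complexType name="CVType">
  <xsd:sequence>
    <!-- Present since release 1.0 -->
    <xsd:element ref="PersonName"/>
    <!-- Added in release 1.1; optional, so 1.0 instances remain valid -->
    <xsd:element ref="Competencies" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>
```

Because the new element is optional and appended after the existing content, an instance written against release 1.0 still validates against this definition.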

To distinguish between minor versions, an attribute on the root element will indicate which specific version a given XML instance uses. This attribute, named “releaseID”, will contain the release number.

16. Xml Schemas MUST have a “releaseID” attribute on the root element of a transactional schema.  It MUST be required.
<xsd:attribute name="releaseID" type="xsd:string" use="required"/>

A string should be used as the data type because the full list of version numbers cannot be enumerated when the first version is designed.
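
As a minimal sketch of rule 16, the attribute declaration could sit on the type of a transactional root element as follows. CVType and its single PersonName child are assumptions for illustration, and the fragment is assumed to appear inside an xsd:schema element that declares PersonName globally.

```xml
<!-- Schema fragment; CVType is a hypothetical root type -->
<xsd:complexType name="CVType">
  <xsd:sequence>
    <xsd:element ref="PersonName"/>
    <!-- further children omitted from this sketch -->
  </xsd:sequence>
  <!-- Rule 16: required attribute carrying the release number -->
  <xsd:attribute name="releaseID" type="xsd:string" use="required"/>
</xsd:complexType>

<xsd:element name="CV" type="CVType"/>
```

An instance of release 1.1 would then open with <CV xmlns="http://www.casrai.org/schemas/1" releaseID="1.1">.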

XML Schema Practices

XML Schema has a rich toolset of features and functionality. In fact, there is often more than one way to design a content model. This makes it imperative that best practices and a consistent approach be employed when creating XML Schema components. This guide intends to follow practices that are common in other XML consortia.

A survey of the Schema design practices of XML consortia was published in the article “Profiling Xml Schema”. This guide makes use of this and other works that describe the most commonly used Schema design practices.

Schema Element

The root element in all CASRAI Schemas will have exactly the same syntax. This reflects best practice for namespaces and form defaults.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.casrai.org/schemas/1"
            targetNamespace="http://www.casrai.org/schemas/1"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
17. The elementFormDefault MUST be set to “qualified”.
18. The attributeFormDefault MUST be set to “unqualified”.

The form default settings for elements and attributes control how explicitly XML Namespaces appear on each. The best practice for the element form is “qualified”, which means that namespaces are clear, present, and unambiguous. Attributes, however, are always intimately connected to the element to which they belong, which makes an explicit namespace unnecessary. It is therefore a best practice to set the attribute form to “unqualified”.
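
The effect of the two settings is easiest to see on locally declared components. The schema below is a sketch under stated assumptions: Note, NoteType and the local Text element are hypothetical names, and the xsd:language type on languageCode is also an assumption.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.casrai.org/schemas/1"
            targetNamespace="http://www.casrai.org/schemas/1"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <!-- Note, NoteType and Text are hypothetical names -->
  <xsd:complexType name="NoteType">
    <xsd:sequence>
      <!-- Local element: "qualified" places it in the target namespace -->
      <xsd:element name="Text" type="xsd:string"/>
    </xsd:sequence>
    <!-- Local attribute: "unqualified" leaves it in no namespace -->
    <xsd:attribute name="languageCode" type="xsd:language"/>
  </xsd:complexType>
  <xsd:element name="Note" type="NoteType"/>
</xsd:schema>
```

With these settings, an instance such as <Note xmlns="http://www.casrai.org/schemas/1" languageCode="en"><Text>Example</Text></Note> validates: the Text child is covered by the default namespace, and languageCode carries no prefix.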

Attribute and Element Declarations

Elements and attributes will follow the XML Schema defined method for declaration, reuse, and derivation as stated in the specification. However, for semantic clarity and predictability, the following rules are added.

19. Xml Schema elements and attributes MUST have explicit data types.
<xsd:element name="CV"/> 

This element is declared but given no data type. This is legal XML Schema; however, the implicit “anyType” is unnecessarily ambiguous and confusing to implementers. The data type must therefore be stated explicitly.
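
A conforming counterpart to the declaration above might look like the following fragment. CVType is a hypothetical type name assumed to be defined elsewhere in the schema, and the xsd:language type on languageCode is likewise an assumption.

```xml
<!-- Conforms to rule 19: the data type is stated explicitly -->
<xsd:element name="CV" type="CVType"/>

<!-- Attributes name their type in the same way -->
<xsd:attribute name="languageCode" type="xsd:language"/>
```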

Type Definitions

As stated, the type definitions in CASRAI standards will follow upper camel case naming and have a “Type” keyword suffix.

20. All type definitions MUST be named.

Anonymous data types are not reusable. To maximize reuse, data types should be defined globally and named. There may be places where locally scoped (or anonymous) types are necessary, but the default behavior should be to name all data types.

Example of anonymous (unnamed) data type:

<xsd:element name="PersonName">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="FamilyName"/>
      <xsd:element ref="GivenName"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Example of named (and thus reusable) data type:

<xsd:complexType name="PersonNameType">
  <xsd:sequence>
    <xsd:element ref="FamilyName"/>
    <xsd:element ref="GivenName"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="PersonName" type="PersonNameType"/>
<xsd:element name="GranteeName" type="PersonNameType"/>
<xsd:element name="SubmitterName" type="PersonNameType"/>
21. All data types developed SHOULD be named (globally scoped).

Compositors

Xml Schema provides three methods of defining the arrangement of a complex content model.  The first method is for child elements to occur in an ordered sequence using the xsd:sequence compositor.  Additionally, the content model may be a choice between the occurrence of several elements, which is indicated with the xsd:choice element.  Finally, the existence of all child elements in no particular order is illustrated with the xsd:all compositor.

In order to maximize the predictability of the content model, the xsd:all element will not be used. This element can lead to what XML Schema refers to as “ambiguous content models”, which cause validation issues and hence interoperability problems.

22. The xsd:all element MUST NOT be used.

Occurrence constraints must be placed on elements and attributes, not on groups or compositors.
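
The two permitted compositors can be sketched as follows, with occurrence constraints written on the element particles rather than on the compositors. ContactType, TelephoneNumber and EmailAddress are hypothetical names, and PersonName is assumed to be declared globally.

```xml
<xsd:complexType name="ContactType">
  <!-- xsd:sequence: children appear in the declared order -->
  <xsd:sequence>
    <xsd:element ref="PersonName"/>
    <!-- Occurrence constraints sit on the element, not the compositor -->
    <xsd:element name="AddressLine" type="xsd:string"
                 minOccurs="0" maxOccurs="unbounded"/>
    <!-- xsd:choice: exactly one of the alternatives appears -->
    <xsd:choice>
      <xsd:element name="TelephoneNumber" type="xsd:string"/>
      <xsd:element name="EmailAddress" type="xsd:string"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
```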

Limitations on Extension and Restriction

XML Schema has a wide array of features that enable the creation of complex data models. For these features to be implementable, they must be widely supported and available in tools. Two features of XML Schema that are not well supported in tools are xsd:redefine and complexType restriction. The former is a way to take a schema definition and redefine it to match a particular content model. Fortunately, redefining a content model is not needed for building a robust library of reusable data structures, so it can be avoided without a negative impact on the library.

23. The xsd:redefine element MUST NOT be used.

The second feature of XML Schema that is not widely used or supported is complexType restriction. This is where a larger data model that contains child elements is restricted down to a subset of that model when reused. The loss of complexType restriction is also not an impediment to the creation of CASRAI standards, because the CASRAI library is built from smaller, reusable components that are aggregated into larger and more complex models. It is a “bottom up” or additive structure. This is in contrast to complexType restriction, which is a “top down” or restrictive approach. Thus this feature of XML Schema can be safely avoided.

24. xsd:complexType restriction MUST NOT be used.
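
The additive, “bottom up” approach can be sketched as a larger model that simply references smaller global components. CVType is a hypothetical type name. The PersonName, PostalAddress and Competencies elements are assumed to be declared globally, for example in included schema modules.

```xml
<!-- Bottom up: a larger model aggregated from reusable components -->
<xsd:complexType name="CVType">
  <xsd:sequence>
    <xsd:element ref="PersonName"/>
    <xsd:element ref="PostalAddress" minOccurs="0"/>
    <xsd:element ref="Competencies" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>
```

No component is derived by restriction here; a context that needs less simply assembles a smaller type from the same building blocks.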

Schema Modules

The following is a file structure for the CASRAI library of schema modules. The top level or root is simply a folder for containing all versions of the standards. The first sublevel is a folder which indicates a version of the library as a whole. In the case here, version “1_0” is shown. If there were a version 1_1 or 2_0, then corresponding folders would exist as siblings and contain those libraries.

Within a version of the library, there are four basic pieces, named Documents, Instances, Packages, and Profiles.  Each has a corresponding folder.

The Documents folder contains any supplementary documents associated with the standard library; it holds no schemas or data models. The Instances folder contains XML documents that conform to the profiles, packages or other schemas in the library. The Packages folder contains a schema that governs how a package is constructed and validated within CASRAI. Finally, the Profiles folder contains XML Schemas governing various projects and data models in the Consortium.

Architecture Considerations

File Organization

CASRAI files will be organized so as to facilitate ongoing standards development. In any case, the file names should be intuitive and unambiguous. One way to achieve this is to use the same naming conventions as in the XML itself. This consistency and predictability make the schemas easier to consume for new implementers.

25. Schema files SHOULD be named according to upper camel casing rules.

In addition, the consumption of schemas can be made even more intuitive by naming the file according to its root element.  For example, the file named “PersonName.xsd” would have an assumed root element of <PersonName>.

26. A schema file SHOULD be named after its root element, if it has a single or central root element.

This is not a firm rule, however, as it will become necessary to group numerous general use components into a single, reusable schema file.  This file can be simply named “Components.xsd”.

In addition, enumerated lists can be managed in one single place. This file, named “Lists.xsd”, contains all standard picklists for the various standards in CASRAI. Each profile will make use of Lists.xsd through an xsd:include.
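
An entry in Lists.xsd might look like the sketch below. AddressTypeCodeType and its values are hypothetical, loosely modelled on the type="mailing" attribute in the Address example earlier in this guide; they are not an actual CASRAI picklist.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.casrai.org/schemas/1"
            targetNamespace="http://www.casrai.org/schemas/1"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <!-- Hypothetical picklist; the values are illustrative only -->
  <xsd:simpleType name="AddressTypeCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="mailing"/>
      <xsd:enumeration value="residential"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

A profile would then pull in the list with <xsd:include schemaLocation="Lists.xsd"/>. The restriction used here applies to a simple type, which is distinct from the complexType restriction that this guide prohibits.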

Component Re-Use

The physical working schema files will be organized in a modular fashion to facilitate reusability. A major component can be put into an XML Schema file by itself as a cohesive object. This in turn can be reused throughout CASRAI standards via the “xsd:include” statement. For example, a CV schema may reuse the schema modules PersonName and PostalAddress.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.casrai.org/schemas/1"
            targetNamespace="http://www.casrai.org/schemas/1"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <xsd:include schemaLocation="PersonName.xsd"/>
  <xsd:include schemaLocation="PostalAddress.xsd"/>
  <xsd:include schemaLocation="CommonComponents.xsd"/>
  ...
</xsd:schema>

Stand Alone Files

Occasionally, XML tools have problems managing numerous nested Schema files (files connected via xsd:include). The reuse achieved through xsd:include statements can therefore be difficult to support in tooling. For this reason, a set of “stand alone” or runtime schemas will be automatically generated. Each resulting Schema has the same content model and namespace as its original, but the files included via “xsd:include” statements are merged into a single file. The runtime version of the schemas can be used for validation and works effectively in production tools.

Extensibility

Extension of the standard is an inevitable need in real-world implementations. All possible requirements for all contexts cannot be known at design time. For this reason, a consistent element will be used both to enable extensions and to clearly delineate between the standard and its extensions.

27. Extensions to standard document models will be accommodated by the use of a <UserArea> element. This element MUST accept arbitrary content by means of the xsd:any wildcard.
28. The UserArea element SHOULD occur as the last child on major complex structures.

Example:

<CV xmlns="http://www.casrai.org/schemas/1">
  <PersonName> ... </PersonName>
  <PostalAddress> ... </PostalAddress>
  <Competencies> ... </Competencies>
  <UserArea>

  </UserArea>
</CV>
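
One way the UserArea element might be declared is sketched below. In XML Schema, xsd:any is a wildcard particle rather than a data type, so the sketch wraps it in a named type. UserAreaType and the chosen namespace and processContents settings are assumptions and are not prescribed by this guide.

```xml
<!-- Hypothetical named type holding the extension wildcard -->
<xsd:complexType name="UserAreaType">
  <xsd:sequence>
    <!-- Any number of elements from any namespace; "lax" validates
         content only where a matching declaration is available -->
    <xsd:any namespace="##any" processContents="lax"
             minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:element name="UserArea" type="UserAreaType"/>
```

Placing <xsd:element ref="UserArea" minOccurs="0"/> as the last particle of a major complex type would then satisfy rule 28.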