XML Fundamentals:
- XML markup describes and provides structure to the content of an XML document or data packet.
- Unlike HTML, XML is case-sensitive; this applies to element tags, attribute names and attribute values.
- XML uses most of the characters defined in the Unicode character set, including characters beyond the 16-bit range.
- Two Unicode encodings form the basis of XML character handling: UTF-8 and UTF-16.
- The 3 permitted control characters are (name, hex code):
Horizontal Tab (HT) 09
Line Feed (LF) 0A
Carriage Return (CR) 0D
- The 5 special markup characters are: < > & " ' . These characters have alternate representations in the form of entity references: &lt; &gt; &amp; &quot; &apos;
- In a legal XML name, the first character must be a letter, an underscore or a colon. The remaining characters may be letters, digits, underscores, colons, hyphens or periods.
- Colon char should not be used except as a namespace delimiter
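As a rough illustration of the naming rules above, the following Python sketch checks a candidate name. The function name `is_legal_xml_name` is hypothetical, and `str.isalpha()` / `str.isalnum()` only approximate the character classes of the XML specification, so treat this as a simplification rather than a conforming validator.

```python
def is_legal_xml_name(name: str) -> bool:
    """Approximate check of the XML name rules (hypothetical helper)."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # First character: a letter, an underscore or a colon.
    if not (first.isalpha() or first in "_:"):
        return False
    # Remaining characters: letters, digits, underscore, colon, hyphen, period.
    return all(ch.isalnum() or ch in "_:-." for ch in rest)


for candidate in ["ElementName", "_private", "my-name.v2", "2ndName", "-dash", "has space"]:
    print(candidate, is_legal_xml_name(candidate))
```

Names that start with a digit, a hyphen or a period, or that contain whitespace, are rejected by this check.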
- Elements are the basic building blocks of XML markup. Tags consist of element type names.
- Everything between the start-tag and the end-tag of an element is contained within that element.
- Examples of XML names in tags: < ElementName> is not allowed and < /Name> is not allowed, because whitespace may not follow < or </.
- Empty-element tags may have associated attributes.
- XML documents have three parts: prolog (optional), body (required) and epilog (optional).
- The document root (document entity) is the invisible root of the XML document tree. It contains the body as a subtree, and the root element of that subtree is called the document element or root element.
- The prolog may contain: XML declaration, comments, processing instructions (PIs), DOCTYPE declaration.
- The epilog may contain: PIs or comments.
- XML data is in the form of a simple hierarchical tree.
- All elements must be properly nested, no overlapping of tags is allowed.
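The nesting and case-sensitivity rules can be checked with any conforming parser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>text</b></a>"))   # properly nested
print(is_well_formed("<a><b>text</a></b>"))   # overlapping tags
print(is_well_formed("<Note>text</note>"))    # tag names differ in case
```

Only the first document parses; the overlapping tags and the case mismatch are both reported as parse errors.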
- String literals are used for the values of attributes, internal entities and external identifiers.
- All string literals are enclosed by apostrophes (') or quotation marks (").
- Attributes consist of name-value pairs.
- Permissible values for attributes are text characters, entity references and character references. Forbidden characters in attribute values: < and & (except where & starts a reference), plus the quote character used to delimit the value. Use the entity references instead. Only one instance of an attribute name is allowed within a given tag.
- All whitespace characters in element content are preserved; whitespace within element tags and attribute values may be normalized or removed.
- The 3 end-of-line character combinations are CR-LF, CR only and LF only. All of them are converted to a single LF character.
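To see the end-of-line conversion and whitespace preservation in practice, here is a small sketch with Python's `xml.etree.ElementTree` (which uses the Expat parser). The helper name `parsed_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parsed_text(document: str):
    """Return the character data of the root element as the parser reports it."""
    return ET.fromstring(document).text


# CR-LF, a lone CR and a lone LF all reach the application as LF.
print(repr(parsed_text("<line>first\r\nsecond\rthird\nfourth</line>")))
# → 'first\nsecond\nthird\nfourth'

# Spaces inside element content are passed through unchanged.
print(repr(parsed_text("<p>  keep   me  </p>")))
# → '  keep   me  '
```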
- Except for the 5 built-in entity references, all entities must be defined prior to their use.
- Comments in XML must follow these rules –
- Cannot have double hyphen within the string
- Cannot be nested
- Cannot be put in the start or end tag
- Extra hyphen at the end is illegal
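The comment rules above can be tried out against a real parser. This is a sketch using Python's `xml.parsers.expat`; the helper `collect_comments` is a name invented for this example, and it returns `None` when the parser rejects the document.

```python
import xml.parsers.expat


def collect_comments(document: str):
    """Return the comment strings found, or None if the document is rejected."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = found.append  # called once per comment
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return None
    return found


print(collect_comments("<!-- before --><a><!-- inside --></a><!-- after -->"))
print(collect_comments("<a><!-- double -- hyphen --></a>"))    # double hyphen
print(collect_comments("<a><!-- outer <!-- inner --> --></a>"))  # nested
print(collect_comments("<a <!-- in tag -->></a>"))             # inside a start tag
print(collect_comments("<a><!-- extra hyphen ---></a>"))       # extra hyphen at end
```

Only the first call returns a list of comments; each of the other four documents breaks one of the rules and is rejected.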
- CDATA Section in XML must follow these rules –
- May be empty (<![CDATA[]]> is well-formed)
- Cannot be nested
- Text in the CDATA section cannot contain "]]>"
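A CDATA section passes markup characters through as plain character data. The sketch below, using Python's `xml.etree.ElementTree`, also shows a common workaround for text that contains "]]>": split it across two CDATA sections. The helper name `wrap_in_cdata` is hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# < and & need no escaping inside a CDATA section.
root = ET.fromstring("<code>" + wrap_in_cdata("if (a < b && c) { run(); }") + "</code>")
print(root.text)  # → if (a < b && c) { run(); }

# Text containing "]]>" survives because the terminator is split.
tricky = ET.fromstring("<code>" + wrap_in_cdata("a ]]> b") + "</code>")
print(tricky.text)  # → a ]]> b
```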
- XML declaration must follow these rules –
- Order of attributes is fixed: version, encoding, standalone.
- The version attribute is required; encoding and standalone are optional.
- Default value for standalone is "no"
- If the encoding is other than UTF-8 or UTF-16, it must be specified.
- Encoding values are not case-sensitive
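The declaration rules can be observed through Expat's `XmlDeclHandler` callback in Python's `xml.parsers.expat`. This is only a sketch: `read_xml_declaration` is a hypothetical helper, and the `standalone` value it reports follows Expat's convention (1 for "yes", 0 for "no", -1 when the attribute is omitted).

```python
import xml.parsers.expat


def read_xml_declaration(data: bytes) -> dict:
    """Parse data and return what the XML declaration stated."""
    seen = {}

    def on_declaration(version, encoding, standalone):
        seen.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return seen


print(read_xml_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><r/>'))

# Attributes in the wrong order are rejected by the parser.
try:
    read_xml_declaration(b'<?xml encoding="UTF-8" version="1.0"?><r/>')
except xml.parsers.expat.ExpatError as error:
    print("rejected:", error)
```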
- Special-meaning attributes: xml:lang and xml:space (which can have the value preserve or default).
- An XML document has a logical and a physical structure. Physical: the document is made up of storage units called entities. Logical: the document is composed of declarations, elements, comments, character references and PIs.
- The Document Type Declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as the Document Type Definition (DTD).
- DOCTYPE declaration must appear before the first element in the document
- No attribute name may appear more than once in the same start tag or empty element tag.
- Attribute values cannot contain direct or indirect entity references to external entities.
- Markup takes the form of start-tags, end-tags, empty-element-tags, entity references, character references, comments, CDATA sections, DOCTYPE declaration, PIs. All text that is not markup is the character data of the document
- Each XML document has one entity, the document entity, that serves as the starting point for the XML processor.
- The < and & characters may appear literally only in comments, PIs or CDATA sections; elsewhere they must be replaced by their entity references (&lt; and &amp;).
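In practice, the substitution of special characters is usually left to a library. A minimal sketch with Python's `xml.sax.saxutils` follows; `escape_all` is a hypothetical wrapper that also covers the two quote characters, which `escape` leaves alone by default.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_all(text: str) -> str:
    """Replace all five special markup characters with entity references."""
    return escape(text, QUOTES)


print(escape("AT&T < IBM"))            # → AT&amp;T &lt; IBM
print(escape_all('say "hi" & \'bye\''))  # → say &quot;hi&quot; &amp; &apos;bye&apos;

# The parser turns the references back into the original characters.
original = "1 < 2 & 3 > 2"
element = ET.fromstring("<t>" + escape(original) + "</t>")
print(element.text == original)        # → True
```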
- ID, IDREF, IDREFS and ENTITY names must all be legal XML names. NMTOKEN, NMTOKENS and enumerated values must be legal name tokens (Nmtokens).