XML Fundamentals

XML markup describes and provides structure to the content of an XML document or data packet.
Unlike HTML, XML is case-sensitive: element names, attribute names and attribute values are all compared exactly as written.
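A minimal sketch of what case sensitivity means in practice, assuming Python's standard-library xml.etree.ElementTree parser; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Start-tag and end-tag names must match exactly, including case.
print(is_well_formed("<Name>Ada</Name>"))
print(is_well_formed("<Name>Ada</name>"))

# Attribute names that differ only in case are two distinct attributes.
elem = ET.fromstring('<item id="1" ID="2"/>')
print(elem.attrib)
```

The second call fails because Name and name are different element type names to an XML parser.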
XML uses most of the characters defined in Unicode (ISO/IEC 10646), including characters beyond the 16-bit range.
Two Unicode encodings must be supported by every XML processor: UTF-8 and UTF-16.
Only three control characters are allowed:

XML control characters (hexadecimal code)

Horizontal Tab (HT)     09
Line Feed (LF)          0A
Carriage Return (CR)    0D
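The character rule can be sketched as a small check. The ranges below follow the Char production of XML 1.0; the function names is_xml_char and strip_illegal_chars are hypothetical helpers, not part of any library.

```python
def is_xml_char(ch):
    """True if the single character ch may appear in an XML 1.0 document."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)          # the three allowed controls: HT, LF, CR
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def strip_illegal_chars(text):
    """Drop every character that XML 1.0 does not allow."""
    return "".join(ch for ch in text if is_xml_char(ch))

print(repr(strip_illegal_chars("a\x00b\tc\x1f")))
```

A filter like this is one way to clean text from an untrusted source before placing it in a document; whether dropping or replacing the characters is appropriate depends on the application.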
5 special markup characters are: < > & " '. These characters have alternate representations in the form of predefined entity references: &lt; &gt; &amp; &quot; &apos;.
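As an illustration, the five characters can be converted to and from their entity references with Python's xml.sax.saxutils module. This is only a sketch: escape and unescape handle <, > and & by default, so the two quote characters are passed in through the dictionaries QUOTES and UNQUOTES, which are names chosen for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quotes are added explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = 'if a < b && name == "O\'Neil"'
escaped = escape(raw, QUOTES)
restored = unescape(escaped, UNQUOTES)

print(escaped)
print(restored == raw)
```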
In a legal XML name, the first character must be a Unicode letter, an underscore or a colon. The remaining characters may be Unicode letters, Unicode digits, underscores, colons, hyphens or periods.
The colon character should not be used except as a namespace delimiter
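The naming rule can be approximated with a regular expression. This is a rough sketch, not the exact Name production of the XML specification: it relies on Python's Unicode-aware \w class, which does not match the specification's character tables exactly, and NAME_RE and looks_like_xml_name are names invented here.

```python
import re

# First character: a letter, underscore or colon.
# Remaining characters: letters, digits, underscore, period, colon or hyphen.
NAME_RE = re.compile(r"(?:[^\W\d]|:)[\w.:\-]*")

def looks_like_xml_name(candidate):
    """Approximate check of the XML naming rule."""
    return NAME_RE.fullmatch(candidate) is not None

for candidate in ["title", "_id", "xs:element", "item-2.b",
                  "2ndItem", "-dash", "first name", ""]:
    print(repr(candidate), looks_like_xml_name(candidate))
```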
Elements are the basic building blocks of XML markup. Each tag carries the type name of its element.
Everything between the start-tag and the end-tag of an element is contained within that element.
Examples of legal XML names

Tag                 Status
< ElementName>      Not Allowed
(missing)           Allowed
< /Name>            Not Allowed
(missing)           Not Allowed
(missing)           Allowed
(missing)           Not Allowed
(missing)           Allowed
(missing)           Not Allowed
Empty element tags may have associated attributes
XML documents have three parts: prolog (optional), body (required) and epilog (optional).
The document root (the document entity) is the invisible top of the XML document's tree; it is not itself an element. It contains the body as a subtree, and the root element of that subtree is called the document element or root element.
The prolog may contain: XML declaration, comments, processing instructions (PIs), DOCTYPE declaration.
The epilog may contain: PIs or comments.
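The three parts can be made visible with a DOM parser. The sketch below assumes Python's xml.dom.minidom; the sample document, the PI target app-hint and the LABELS mapping are invented for this illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- prolog comment -->
<?app-hint prolog PI?>
<library><book/></library>
<!-- epilog comment -->
"""

LABELS = {
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
    Node.ELEMENT_NODE: "element",
}

doc = minidom.parseString(SOURCE)

# The document node is the invisible root; its children are the prolog
# items, the document element and the epilog items, in document order.
kinds = [LABELS.get(node.nodeType, "other") for node in doc.childNodes]
print(kinds)
print(doc.nodeType == Node.DOCUMENT_NODE)
print(doc.documentElement.tagName)
```

The XML declaration does not show up as a child node; minidom records it as properties of the document object instead.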
XML data is in the form of a simple hierarchical tree.
All elements must be properly nested, no overlapping of tags is allowed.
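A short sketch of both points, assuming Python's xml.etree.ElementTree: a well-formed document can be walked as a tree, while overlapping tags are rejected by the parser. The function name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0):
    """Return (depth, tag) pairs for elem and all of its descendants."""
    rows = [(depth, elem.tag)]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring("<a><b><c/></b><d/></a>")
for depth, tag in outline(root):
    print("  " * depth + tag)

# <c> is opened inside <b> but closed after it, so the tags overlap.
try:
    ET.fromstring("<a><b><c></b></c></a>")
    overlapping_ok = True
except ET.ParseError:
    overlapping_ok = False
print(overlapping_ok)
```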
String literals are used for the values of attributes, internal entities and external identifiers.
All string literals are enclosed in apostrophes (') or quotation marks (").
Attributes are name-value pairs.
Permissible content for attribute values: text characters, entity references and character references. Forbidden characters in attribute values: < and a bare &, plus the quote character that delimits the value. Use the entity references instead. Only one instance of an attribute name is allowed within a given tag.
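The attribute rules can be checked against a parser. This is a sketch using Python's xml.etree.ElementTree; parse_error is a helper written for this example and the img element is made up.

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)

# Either quote style may delimit a value; entity references are expanded.
ok = ET.fromstring('<img alt="a &lt; b" src=\'pic.png\'/>')
print(ok.get("alt"), ok.get("src"))

# A literal < inside a value is rejected.
print(parse_error('<img alt="a < b"/>'))

# So is a repeated attribute name in the same tag.
print(parse_error('<img alt="x" alt="y"/>'))
```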
All whitespace characters in element content are preserved and passed to the application; whitespace within tags is not significant, and whitespace within attribute values is normalized and may be removed.
Three character combinations mark an end of line: CR-LF, CR only and LF only. The parser converts each of them to a single LF character.
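The whitespace and end-of-line handling can be observed directly. The following sketch assumes Python's xml.etree.ElementTree (which is built on the expat parser); the sample elements are invented.

```python
import xml.etree.ElementTree as ET

# CR-LF and a lone CR in content both reach the application as LF.
lines = ET.fromstring(b"<a>one\r\ntwo\rthree\nfour</a>")
print(repr(lines.text))

# Leading and trailing whitespace in element content is kept.
padded = ET.fromstring("<a>  padded  </a>")
print(repr(padded.text))

# A tab or line break typed literally in an attribute value becomes a space.
attr = ET.fromstring('<a note="x\ty\nz"/>')
print(repr(attr.get("note")))
```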
Except for the 5 predefined entity references, all entities must be declared before they are used.
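A small sketch of this rule, assuming Python's xml.etree.ElementTree: an entity declared in the internal DTD subset is expanded, a built-in one needs no declaration, and an undeclared one is an error. The entity name company and its value are made up.

```python
import xml.etree.ElementTree as ET

DECLARED = """<!DOCTYPE note [
  <!ENTITY company "Example Corp">
]>
<note>&company; &amp; partners</note>"""

# &company; is declared above; &amp; is one of the five built-in entities.
declared_text = ET.fromstring(DECLARED).text
print(declared_text)

# Without a declaration the same reference is not well-formed.
try:
    ET.fromstring("<note>&company;</note>")
    undeclared_error = None
except ET.ParseError as exc:
    undeclared_error = str(exc)
print(undeclared_error)
```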
Comments in XML must follow these rules –
1. Cannot contain a double hyphen (--) within the comment text
2. Cannot be nested
3. Cannot be placed inside a start-tag or end-tag
4. Cannot end with an extra hyphen (--->)
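The comment rules above can be tried out against a parser. A sketch with Python's xml.etree.ElementTree; the CASES dictionary and the helper is_well_formed exist only in this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

CASES = {
    "plain comment": "<a><!-- fine --></a>",
    "double hyphen inside": "<a><!-- not -- fine --></a>",
    "nested comment": "<a><!-- outer <!-- inner --> --></a>",
    "inside a start-tag": "<a <!-- no --> ></a>",
    "extra trailing hyphen": "<a><!-- not fine ---></a>",
}

results = {label: is_well_formed(text) for label, text in CASES.items()}
for label, accepted in results.items():
    print(label, "->", "accepted" if accepted else "rejected")
```

Only the plain comment is accepted; each of the other four cases breaks one of the numbered rules.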
CDATA Section in XML must follow these rules –
1. May be empty: <![CDATA[]]> is well-formed
2. Cannot be nested
3. Text in the CDATA section cannot contain "]]>"
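Because the text of a CDATA section cannot contain "]]>", a common workaround is to split that sequence across two adjacent sections. The sketch below assumes Python's xml.etree.ElementTree for parsing; to_cdata is a hypothetical helper and the script element is made up.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

code = "if (a[b[0]]>c && d<e) {}"

# Inside CDATA, < and & are plain text and need no entity references.
parsed = ET.fromstring("<script>" + to_cdata(code) + "</script>")
print(parsed.text == code)

# Wrapping the same text naively ends the section early at the first ']]>'.
try:
    ET.fromstring("<script><![CDATA[" + code + "]]></script>")
    naive_ok = True
except ET.ParseError:
    naive_ok = False
print(naive_ok)
```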
The XML declaration must follow these rules –
1. Order of attributes: version, encoding, standalone is fixed.
2. Version attribute is required, encoding and standalone are optional.
3. Default value for standalone is “no”
4. If encoding is other than UTF-8 or UTF-16, it must be specified.
5. Encoding values are not case-sensitive
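The declaration rules can be exercised with a parser. This is a sketch assuming Python's xml.etree.ElementTree; declaration_ok is a helper invented here, and each declaration is tested in front of a minimal <root/> element.

```python
import xml.etree.ElementTree as ET

def declaration_ok(declaration):
    """Parse a minimal document that starts with the given XML declaration."""
    try:
        ET.fromstring(declaration + b"<root/>")
        return True
    except ET.ParseError:
        return False

full = declaration_ok(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>')
version_only = declaration_ok(b'<?xml version="1.0"?>')
wrong_order = declaration_ok(b'<?xml encoding="UTF-8" version="1.0"?>')
no_version = declaration_ok(b'<?xml encoding="UTF-8"?>')
lower_case = declaration_ok(b'<?xml version="1.0" encoding="utf-8"?>')

print(full, version_only, wrong_order, no_version, lower_case)
```

The two rejected forms are the ones that put encoding before version or leave version out.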
Attributes with special meaning: xml:lang and xml:space (which can have the value preserve or default).
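A sketch of reading these two attributes, assuming Python's xml.etree.ElementTree, which reports the reserved xml prefix through its namespace URI. The poem element is made up, and XML_NS is simply a constant defined for this example.

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace; it needs no declaration.
XML_NS = "http://www.w3.org/XML/1998/namespace"

elem = ET.fromstring(
    '<poem xml:lang="en" xml:space="preserve">  roses\n  are red</poem>'
)

lang = elem.get(f"{{{XML_NS}}}lang")
space = elem.get(f"{{{XML_NS}}}space")
print(lang, space)
```

The parser itself passes the whitespace through either way; xml:space only signals to the application how that whitespace is meant to be treated.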
An XML document has a logical and a physical structure. Physically, the document is made up of storage units called entities. Logically, it is composed of declarations, elements, comments, character references and PIs.
The Document Type Declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as the Document Type Definition (DTD).
The DOCTYPE declaration must appear before the first element in the document
No attribute name may appear more than once in the same start tag or empty element tag.
Attribute values cannot contain direct or indirect entity references to external entities.
Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, DOCTYPE declarations and PIs. All text that is not markup is the character data of the document.
Each XML document has one entity, the document entity, that serves as the starting point for the XML processor.
The < and & characters may appear literally only as markup delimiters or inside comments, PIs and CDATA sections; everywhere else they must be written as the entity references &lt; and &amp;.
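In practice a serializer usually takes care of this substitution. A minimal sketch, assuming Python's xml.etree.ElementTree; the expr element and its note attribute are invented.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("expr")
elem.text = "a < b && c > d"
elem.set("note", 'say "hi" & <go>')

# tostring() writes the special characters as entity references.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the result gives back the original, unescaped strings.
round_trip = ET.fromstring(serialized)
print(round_trip.text == elem.text)
```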
ID, IDREF, IDREFS and ENTITY names must all be legal XML names. NMTOKEN, NMTOKENS and enumerated values should be legal name tokens (Nmtokens).
