
    XML: Extensible Markup Language

    The Extensible Markup Language (XML), a subset of SGML, is a W3C-recommended general-purpose markup language for creating special-purpose markup languages. It is designed to facilitate the sharing of structured text and information across the Internet. XML tags are not predefined; you define your own. The structure of an XML document can be described with a Document Type Definition (DTD) or an XML Schema, and XML combined with a DTD or XML Schema is designed to be self-descriptive.

    XML is not a replacement for HTML; the two are complementary. They were designed with different goals:

    • XML was designed to describe data/information and to focus on what data/information is.
    • HTML was designed to display data/information and to focus on how data/information looks.
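To illustrate the difference, here is a minimal sketch in Python using the standard-library `xml.etree.ElementTree` module. The `<note>` document and all of its tag names are hypothetical. The point is that each tag names what a piece of data is, and the program retrieves the data by those names. Nothing in the markup says how the note should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag name (<note>, <to>, <from>, <body>) is
# invented by the author and says what the data *is*, not how it looks.
NOTE_XML = """<?xml version="1.0"?>
<note priority="high">
  <to>Alice</to>
  <from>Bob</from>
  <body>Meeting moved to 3 pm.</body>
</note>"""


def read_note(text: str) -> dict:
    """Parse the note and return its fields as a plain dictionary."""
    root = ET.fromstring(text)
    return {
        "priority": root.get("priority"),  # attribute on the root element
        "to": root.findtext("to"),         # text of the first <to> child
        "from": root.findtext("from"),
        "body": root.findtext("body"),
    }


if __name__ == "__main__":
    print(read_note(NOTE_XML))
```

Displaying this note would be a separate concern, for example handled by HTML or a stylesheet.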

    The features of XML that make it particularly appropriate for data transfer are:

    • simultaneously human- and machine-readable format
    • support for Unicode, which covers all current and many historical writing systems
    • the ability to represent the most general computer science data structures (records, lists and trees)
    • the format is self-documenting in that it describes the structure and field names as well as specific values
    • strict syntax makes the necessary parsing algorithms simple, fast and efficient.
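Several of these features can be sketched with Python's `xml.etree.ElementTree`. The catalog below is a hypothetical example, and the function names `load_catalog` and `is_well_formed` are illustrative. The catalog is a list of records, and each record is a small tree. It also carries non-ASCII text directly. A document with overlapping tags is rejected outright rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a list of <book> records, each a small tree,
# including non-ASCII text carried directly thanks to Unicode.
CATALOG = """<catalog>
  <book id="b1">
    <title>Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author>
    <tags><tag>fiction</tag><tag>classic</tag></tags>
  </book>
  <book id="b2">
    <title>三体</title>
    <author>刘慈欣</author>
    <tags><tag>science fiction</tag></tags>
  </book>
</catalog>"""


def load_catalog(text: str) -> list[dict]:
    """Turn the <book> elements into a list of records (dicts)."""
    root = ET.fromstring(text)
    books = []
    for book in root.iter("book"):  # list: repeated sibling elements
        books.append({
            "id": book.get("id"),                        # attribute
            "title": book.findtext("title"),             # record field
            "author": book.findtext("author"),
            "tags": [t.text for t in book.iter("tag")],  # nested list
        })
    return books


def is_well_formed(text: str) -> bool:
    """Strict syntax: any well-formedness error rejects the whole document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

For example, `load_catalog(CATALOG)[1]["author"]` yields `"刘慈欣"`, while `is_well_formed("<a><b></a></b>")` returns `False` because the tags overlap.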

    XML is also heavily used as the format for document storage and processing, both online and offline, and offers several benefits:

    • robust, logically verifiable format based on international standards
    • hierarchical structure suitable for most (but not all) types of document
    • plain text files, unencumbered by licenses or restrictions
    • platform-independent, and so relatively immune to changes in technology
    • has already been in use (as SGML) for well over a decade and is very popular in its own right, so extensive experience and software are available.

    For certain applications, the XML format also has the following weaknesses:

    • XML syntax is fairly verbose and partially redundant.
    • XML syntax contains a number of obscure features due to its legacy of SGML compatibility.
    • XML often still requires further parsing to extract individual values, and there are no facilities for randomly accessing or updating only portions of a document.
    • Modelling overlapping (non-hierarchical) data structures requires extra effort.
    • Mapping XML to the relational or object-oriented paradigms is often cumbersome.
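The last two points can be illustrated with a small, hypothetical example: loading a nested order document into relational tables, here an in-memory SQLite database via Python's standard `sqlite3` module. The nesting has to be rewritten as a foreign key, and every value arrives as text that must be converted by hand. The document, the table layout and the function name `order_to_tables` are illustrative assumptions, not a standard mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document: one order holding a nested list of items.
ORDER_XML = """<order id="1001" customer="ACME">
  <item sku="A-1" qty="2" price="9.50"/>
  <item sku="B-7" qty="1" price="20.00"/>
</order>"""


def order_to_tables(text: str) -> sqlite3.Connection:
    """Flatten the XML tree into two relational tables (orders, items)."""
    root = ET.fromstring(text)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    db.execute(
        "CREATE TABLE items (order_id INTEGER REFERENCES orders(id),"
        " sku TEXT, qty INTEGER, price REAL)"
    )
    order_id = int(root.get("id"))  # every XML value is text; convert by hand
    db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, root.get("customer")))
    for item in root.findall("item"):
        # The parent-child nesting is rewritten as a foreign key column.
        db.execute(
            "INSERT INTO items VALUES (?, ?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty")), float(item.get("price"))),
        )
    db.commit()
    return db
```

With the sample order, `db.execute("SELECT SUM(qty * price) FROM items").fetchone()[0]` returns `39.0`. A real mapping would also have to handle optional elements, mixed content and deeper nesting, which is where the effort grows.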

    The Extensible Stylesheet Language (XSL) is an adjunct to XML that lets users describe the visual presentation and transformation of XML data without embedding those instructions in the data itself. The resulting document can then be displayed by a browser, much as an HTML document is rendered using CSS.

    The APIs most widely used to process XML data from programming languages are SAX and DOM. SAX is used for serial (streaming) processing, whereas DOM is used for random-access processing of a document tree held in memory. Another form of XML processing API is data binding, in which XML data is made available as a strongly typed programming-language data structure, in contrast to the generic node tree of the DOM.
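The three styles can be compared in a short Python sketch. `xml.sax` and `xml.dom.minidom` are the standard library's SAX and DOM implementations. The sample data, the `Person` class and the helper functions are hypothetical. The binding is written by hand and stands in for the classes a data-binding tool would normally generate from a schema.

```python
import xml.dom.minidom
import xml.sax
from dataclasses import dataclass

# Hypothetical input shared by all three approaches.
DATA = """<people>
  <person name="Ada" born="1815"/>
  <person name="Alan" born="1912"/>
</people>"""


# 1. SAX: the parser pushes events to a handler in document order;
#    nothing is kept in memory unless the handler stores it.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "person":
            self.names.append(attrs.getValue("name"))


def names_via_sax(text: str) -> list[str]:
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


# 2. DOM: the whole document is loaded into a tree that can be
#    visited in any order (and modified in place).
def last_birth_year_via_dom(text: str) -> str:
    doc = xml.dom.minidom.parseString(text)
    people = doc.getElementsByTagName("person")
    return people[-1].getAttribute("born")  # random access by index


# 3. Data binding (hand-rolled here): elements become typed objects.
@dataclass
class Person:
    name: str
    born: int


def bind_people(text: str) -> list[Person]:
    doc = xml.dom.minidom.parseString(text)
    return [
        Person(p.getAttribute("name"), int(p.getAttribute("born")))
        for p in doc.getElementsByTagName("person")
    ]
```

SAX suits large or streamed input because the parser itself does not keep the document in memory. DOM trades memory for the ability to revisit and modify any node. Data binding adds type conversion (here `born` becomes an `int`) on top of either.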

    The latest version of XML is 1.1, published on February 4, 2004; at that time the original XML 1.0 was in its third edition. XML 1.0 and XML 1.1 differ in which characters may be used in element names, attribute names and the like. XML 1.0 only allows characters defined in Unicode 2.0, which covers most of the world's scripts but excludes scripts added in later Unicode versions, such as those for Mongolian, Cambodian, Amharic and Burmese. XML 1.1 disallows only certain control characters, so any other character can be used, including characters added in future versions of Unicode. (The fifth edition of XML 1.0, published in 2008, later relaxed its name rules along similar lines.) Many markup languages, such as WML and RSS, are based on XML.

    Related Terms: HTML, WML, RDF, RSS, MathML, XSIL and SVG