Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is a markup language much like HTML, but with a completely different goal: HTML was designed to display data, with a focus on how the data looks, whereas XML is about transporting and storing data, with a focus on what the data is. Unlike HTML, XML has no predefined tags and is designed to be self-descriptive. Since users define their own tags, a document can optionally reference a Document Type Definition (DTD) that describes which elements and attributes are allowed. The DTD is declared near the top of the file, and a document that is merely well-formed can be parsed without one.
Some common constructs that appear in XML:
XML Declaration : <?xml version="1.0" encoding="UTF-8"?> is the XML declaration. It is optional in XML 1.0, but it identifies the document as XML and indicates the XML version and the character encoding.
Character : An XML document is a string of characters, and almost every legal Unicode character can appear in it. All XML processors must be able to read entities in both the UTF-8 and UTF-16 encodings.
Markup and Content : The characters in an XML document are divided into markup and content, which are distinguished by simple syntactic rules. Strings that constitute markup either begin with the character < and end with >, or begin with & and end with ; (a semicolon). Strings of characters that are not markup are content.
Tags : A tag is a markup construct that begins with < and ends with >. It can be a start-tag (<block>), an end-tag (</block>) or an empty-element tag (<line-break />). An element is the document component that either starts with a start-tag and ends with the matching end-tag, or consists of a single empty-element tag. The characters between the start-tag and the end-tag are the element's content, which may contain child elements as well.
Attributes : An attribute is another markup construct, consisting of a name/value pair that appears inside a start-tag or an empty-element tag, for example xid="1" in <section xid="1">.
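The constructs listed above can be seen by loading a tiny document with Python's standard xml.etree.ElementTree module. This is only a sketch: the document string and its element and attribute names (block, line-break, note, id, lang) are made up for illustration and do not come from any real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
# It contains a start-tag with an attribute, an entity reference (&amp;),
# an empty-element tag, and a child element with its own attribute.
DOC = '<block id="b1">Tom &amp; Jerry<line-break /><note lang="en">hi</note></block>'

root = ET.fromstring(DOC)


def describe(element):
    """Return (tag name, attributes, leading text, child tag names)."""
    return (
        element.tag,
        dict(element.attrib),
        element.text,
        [child.tag for child in element],
    )


summary = describe(root)
print(summary)
# The empty-element tag becomes an element with no text and no children.
print(root[0].text, len(root[0]))
```

Note that the entity reference &amp; is markup in the source, but the parser hands it back as the plain character & in the element's text.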
The processor analyzes the markup and passes the structured information to an application; this processor is often called an XML parser. Many word processing programs use XML as their native document format, for example our very own AbiWord (.abw documents are XML).
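To make the parser's role a little more concrete, here is a minimal sketch using Python's standard library, where the parser reports each start-tag and end-tag to the calling code as it reads the input. The sample bytes and the helper names collect_events and is_well_formed are assumptions made up for this example; this is not how AbiWord itself reads files.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this example.
DOC = b'<?xml version="1.0" encoding="UTF-8"?><doc><p>Hello</p><p>World</p></doc>'


def collect_events(xml_bytes):
    """Record the (event, tag) pairs the parser reports while reading."""
    events = []
    for event, element in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        events.append((event, element.tag))
    return events


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the input, False on a syntax error."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


events = collect_events(DOC)
print(events)
print(is_well_formed(DOC))
print(is_well_formed(b"<doc><p>unclosed</doc>"))
```

The application never looks at angle brackets itself; it only sees the events (or the finished tree), and input that breaks the syntax rules is rejected by the parser before the application gets anything.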
XML in AbiWord : AbiWord uses a straightforward XML document format in which appearance and layout are specified in CSS-like attributes, but only as a starting point. The entire XML source of a document (sample.abw) created in AbiWord that contains the text “AbiWord Rocks !” looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE abiword PUBLIC "-//ABISOURCE//DTD AWML 1.0 Strict//EN" "http://www.abisource.com/awml.dtd">
<abiword template="false" xmlns:ct="http://www.abisource.com/changetracking.dtd" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:math="http://www.w3.org/1998/Math/MathML" xid-max="2" xmlns:dc="http://purl.org/dc/elements/1.1/" styles="unlocked" fileformat="1.0" xmlns:svg="http://www.w3.org/2000/svg" xmlns:awml="http://www.abisource.com/awml.dtd" xmlns="http://www.abisource.com/awml.dtd" xmlns:xlink="http://www.w3.org/1999/xlink" version="0.99.2" xml:space="preserve" props="dom-dir:ltr; document-footnote-restart-section:0; document-endnote-type:numeric; document-endnote-place-enddoc:1; document-endnote-initial:1; lang:en-US; document-endnote-restart-section:0; document-footnote-restart-page:0; document-footnote-type:numeric; document-footnote-initial:1; document-endnote-place-endsection:0">
<!-- ======================================================================== -->
<!-- This file is an AbiWord document. -->
<!-- AbiWord is a free, Open Source word processor. -->
<!-- More information about AbiWord is available at http://www.abisource.com/ -->
<!-- You should not edit this file by hand. -->
<!-- ======================================================================== -->
<metadata>
<m key="abiword.generator">AbiWord</m>
<m key="dc.creator">Prashant</m>
<m key="dc.format">application/x-abiword</m>
</metadata>
<rdf>
</rdf>
<history version="1" edit-time="14" last-saved="1338761497" uid="0e329dea-adc9-11e1-9005-9b3eee35aa57">
<version id="1" started="1338761497" uid="16910e40-adc9-11e1-9005-9b3eee35aa57" auto="0" top-xid="2"/>
</history>
<styles>
<s type="P" name="Normal" followedby="Current Settings" props="font-family:Times New Roman; margin-top:0pt; color:000000; margin-left:0pt; text-position:normal; widows:2; font-style:normal; text-indent:0in; font-variant:normal; font-weight:normal; margin-right:0pt; font-size:12pt; text-decoration:none; margin-bottom:0pt; line-height:1.0; bgcolor:transparent; text-align:left; font-stretch:normal"/>
</styles>
<pagesize pagetype="Letter" orientation="portrait" width="8.500000" height="11.000000" units="in" page-scale="1.000000"/>
<section xid="1" props="page-margin-footer:0.5in; page-margin-header:0.5in">
<p style="Normal" xid="2"><c>AbiWord Rocks !</c></p>
</section>
</abiword>
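As a rough sketch of how a program might read such a file, the snippet below uses Python's standard xml.etree.ElementTree on a trimmed copy of the document above (the DOCTYPE, metadata, history and styles are left out to keep it short). The helper names parse_props and paragraph_texts are made up for this example, and the assumption that props is always a list of name:value pairs separated by semicolons is based only on what the sample shows.

```python
import xml.etree.ElementTree as ET

# The default namespace declared on the <abiword> root element.
AWML_NS = {"awml": "http://www.abisource.com/awml.dtd"}

# Trimmed copy of the sample document above.
SAMPLE = """<abiword xmlns="http://www.abisource.com/awml.dtd" version="0.99.2">
<section xid="1" props="page-margin-footer:0.5in; page-margin-header:0.5in">
<p style="Normal" xid="2"><c>AbiWord Rocks !</c></p>
</section>
</abiword>"""


def parse_props(props):
    """Split a CSS-like 'name:value; name:value' string into a dict."""
    result = {}
    for item in props.split(";"):
        if ":" in item:
            name, value = item.split(":", 1)
            result[name.strip()] = value.strip()
    return result


def paragraph_texts(root):
    """Return the text of every <p> element, joining its nested runs."""
    return ["".join(p.itertext()) for p in root.iterfind(".//awml:p", AWML_NS)]


root = ET.fromstring(SAMPLE)
section = root.find("awml:section", AWML_NS)
margins = parse_props(section.get("props"))
texts = paragraph_texts(root)
print(margins)
print(texts)
```

Because the root element declares a default namespace, every element has to be looked up with that namespace, which is why the paths use the awml: prefix. Attributes such as props and xid carry no prefix and can be read by their plain names.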
The inherent readability of XML makes document interchange and format specification considerably easier. Apart from AbiWord's format, other formats such as the Open Document Format, mentioned in the previous post, are XML-based, and Microsoft Office now uses OOXML (a zipped, XML-based file format) as its default format.