Saturday, 19 May 2012

iTeX vs LaTeX

LaTeX is a widely used document markup language and document preparation system for the TeX typesetting program, whereas iTeX can be seen as a pared-down LaTeX. The differences arise from the gap between how we write a research paper (long and technical) and how we put stuff on the web (short and snappy). Essentially, iTeX is a pure converter, whereas LaTeX is a mixture of a converter and a renderer (technically, LaTeX is the set of rules that converts the input to TeX, which TeX then renders).

iTeX is very similar to standard LaTeX, with a few differences that follow from the fact that iTeX produces MathML.

There are quite a few differences between iTeX and TeX:

1. In iTeX, $abc$ is a single token, which is converted to the MathML <mi>abc</mi>

However, $a b c$ is three tokens, which are converted to the MathML <mi>a</mi><mi>b</mi><mi>c</mi>

It is important to note that TeX considers the two inputs above to be the same.

2. Numbers: $10^20$ will be 10^(20) in iTeX, whereas it will be 10^(2)0 in LaTeX. Hence it is always safer to use curly brackets, as in $10^{20}$, to get the same result in both.

3. Whitespace: $a \textrm{ and } b$ will be "a and b" in LaTeX, whereas in iTeX it will be "aandb". The reason is that mtext elements in MathML don't keep leading and trailing whitespace.

4. Since MathML doesn't know the difference between unary operators and binary relations, it is inconvenient for iTeX to make that distinction either.

5. iTeX doesn't parse math that includes non-ASCII characters.

6. It is possible to insert MathML markup inside iTeX equations, which makes "<" and ">" significant characters. \lt and \gt are used to get less-than and greater-than signs.
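To make points 1 and 2 a bit more concrete, here is a toy Python sketch. It is neither itex2MML nor TeX: the function names (toy_itex_tokens, toy_tex_tokens, identifiers_to_mathml, superscript_of) are made up for this illustration, and the two tokenizers only mimic the grouping behaviour described above (runs of letters or digits as one token versus one character per token). Braces and backslash commands are deliberately not handled.

```python
import re


def toy_itex_tokens(math):
    # Assumption: a run of letters or a run of digits forms ONE token,
    # which is the iTeX behaviour described in points 1 and 2.
    return re.findall(r"[A-Za-z]+|[0-9]+|\S", math)


def toy_tex_tokens(math):
    # Assumption: every non-space character is its own token,
    # and spaces between them are ignored (TeX-like math mode).
    return re.findall(r"\S", math)


def identifiers_to_mathml(math, tokenizer):
    # Wrap each token in <mi>; only meaningful for letter-only input.
    return "".join(f"<mi>{tok}</mi>" for tok in tokenizer(math))


def superscript_of(math, tokenizer):
    # The superscript is whatever single token follows "^".
    tokens = tokenizer(math)
    return tokens[tokens.index("^") + 1]


if __name__ == "__main__":
    print(identifiers_to_mathml("abc", toy_itex_tokens))
    print(identifiers_to_mathml("a b c", toy_itex_tokens))
    print(superscript_of("10^20", toy_itex_tokens))
    print(superscript_of("10^20", toy_tex_tokens))
```

With the iTeX-like tokenizer, "abc" and "a b c" produce different markup and the exponent of 10^20 is "20"; with the TeX-like one, both inputs tokenize identically and the exponent is just "2".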

A more detailed look at LaTeX will follow in the next post.

References:

1. http://www.latex-project.org/guides/

2. http://golem.ph.utexas.edu/~distler/blog/itex2MML.html

Wednesday, 9 May 2012

A look into MathML


Mathematical Markup Language (MathML) is an application of Extensible Markup Language (XML) for describing mathematical notation and capturing both its structure and content. The main aim of MathML is to integrate math with the World Wide Web. Essentially, MathML is for math what HTML is for text.

As mentioned before, MathML deals with both the structure and the content of mathematical notation. The structure part is called Presentation MathML and, as the name suggests, it deals with the display of the notation, equation or formula. The content part is called Content MathML and it focuses on the semantics.
An example of Presentation MathML for the quadratic formula x = (-b ± √(b^2 - 4ac)) / (2a) is:

<math>
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mrow>
          <mo>-</mo>
          <mi>b</mi>
        </mrow>
        <mo>
          &#xB1;<!--PLUS-MINUS SIGN-->
        </mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>-</mo>
            <mrow>
              <mn>4</mn>
              <mo>
                &#x2062;<!--INVISIBLE TIMES-->
              </mo>
              <mi>a</mi>
              <mo>
                &#x2062;<!--INVISIBLE TIMES-->
              </mo>
              <mi>c</mi>
            </mrow>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mo>
          &#x2062;<!--INVISIBLE TIMES-->
        </mo>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>

As seen above, every valid MathML expression is wrapped in outer <math> tags, which mark each instance of MathML markup within a document.

The presentation elements fall into 2 classes: Token Elements (symbols, numbers, names etc.) and Layout Schemata (which build expressions out of parts and have only elements as their content). Here we are using token elements such as mi (identifier), mo (operator) and mn (number), and general layout schemata such as mrow (groups any number of sub-expressions horizontally), mfrac (fraction of 2 sub-expressions), msqrt (square root) and msup (superscript).
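The split into token elements and layout schemata can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree on a small fragment (just the b^2 - 4ac part of the formula). The classify function and the two sets are names I made up for this example, and the sets list only the elements mentioned in this post, not the full MathML vocabulary.

```python
import xml.etree.ElementTree as ET

# Only the elements discussed in this post; real MathML has many more.
TOKEN_ELEMENTS = {"mi", "mn", "mo", "mtext"}
LAYOUT_ELEMENTS = {"mrow", "mfrac", "msqrt", "msup"}

SAMPLE = """
<math>
  <mrow>
    <msup><mi>b</mi><mn>2</mn></msup>
    <mo>-</mo>
    <mn>4</mn>
    <mo>&#x2062;</mo>
    <mi>a</mi>
    <mo>&#x2062;</mo>
    <mi>c</mi>
  </mrow>
</math>
"""


def classify(xml_text):
    """Walk the tree in document order and sort elements into the two classes."""
    tokens, layouts = [], []
    for element in ET.fromstring(xml_text).iter():
        if element.tag in TOKEN_ELEMENTS:
            # Token elements carry character data.
            tokens.append((element.tag, (element.text or "").strip()))
        elif element.tag in LAYOUT_ELEMENTS:
            # Layout schemata only contain other elements.
            layouts.append(element.tag)
    return tokens, layouts


if __name__ == "__main__":
    tokens, layouts = classify(SAMPLE)
    print("layout schemata:", layouts)
    print("token elements:", [tag for tag, _ in tokens])
```

Note that the multiplication in 4ac shows up only as mo tokens holding the invisible-times character (U+2062); nothing in the tree says "multiply" in a way a program could evaluate.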

We can also see above that b^2 is written as a superscript and that letters placed side by side stand for variables multiplied together. The presentation markup only records this layout, so we need content markup to put meaning into the formula.

Content MathML for the same formula (taking the plus branch of the ± sign) would be:

<math>
  <apply>
    <eq/>
    <ci>x</ci>
    <apply>
      <divide/>
      <apply>
        <plus/>
        <apply>
          <minus/>
          <ci>b</ci>
        </apply>
        <apply>
          <root/>
          <apply>
            <minus/>
            <apply>
              <power/>
              <ci>b</ci>
              <cn>2</cn>
            </apply>
            <apply>
              <times/>
              <cn>4</cn>
              <ci>a</ci>
              <ci>c</ci>
            </apply>
          </apply>
        </apply>
      </apply>
      <apply>
        <times/>
        <cn>2</cn>
        <ci>a</ci>
      </apply>
    </apply>
  </apply>
</math>

Content MathML represents mathematical objects as expression trees (i.e. applying an operator to sub-objects). Hence, the terminal nodes represent basic math objects such as numbers and variables, and the internal nodes represent mathematical constructions or function applications.

The token elements ci (variables) and cn (numbers) and the predefined function elements divide, minus, plus, times, power and root are used here. As we can see above, the apply element groups a function with its arguments syntactically.

About thirty-eight of the MathML tags describe abstract notational structures, while about another one hundred and seventy provide a way of unambiguously specifying the intended meaning of an expression.
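Because Content MathML is an expression tree, it can be evaluated by a short recursive walk. The following Python sketch does this for the right-hand side of the formula above, written more compactly. The evaluate function and the env dictionary are hypothetical names for this example, and only the six operators used in this post are handled. Like the listing above, it uses plus where the formula has ±, so it yields just one of the two roots.

```python
import math
import xml.etree.ElementTree as ET

CONTENT = """
<math><apply><eq/><ci>x</ci>
  <apply><divide/>
    <apply><plus/>
      <apply><minus/><ci>b</ci></apply>
      <apply><root/>
        <apply><minus/>
          <apply><power/><ci>b</ci><cn>2</cn></apply>
          <apply><times/><cn>4</cn><ci>a</ci><ci>c</ci></apply>
        </apply>
      </apply>
    </apply>
    <apply><times/><cn>2</cn><ci>a</ci></apply>
  </apply>
</apply></math>
"""


def evaluate(node, env):
    """Evaluate a Content MathML node; env maps variable names to numbers."""
    if node.tag == "cn":                      # terminal node: number
        return float(node.text)
    if node.tag == "ci":                      # terminal node: variable
        return env[node.text.strip()]
    if node.tag == "apply":                   # internal node: operator + args
        operator, *arguments = list(node)
        values = [evaluate(arg, env) for arg in arguments]
        if operator.tag == "plus":
            return sum(values)
        if operator.tag == "minus":           # unary or binary minus
            return -values[0] if len(values) == 1 else values[0] - values[1]
        if operator.tag == "times":
            return math.prod(values)
        if operator.tag == "divide":
            return values[0] / values[1]
        if operator.tag == "power":
            return values[0] ** values[1]
        if operator.tag == "root":            # no degree given: square root
            return math.sqrt(values[0])
    raise ValueError(f"unsupported element: {node.tag}")


# <math> -> outer <apply> -> children are <eq/>, <ci>x</ci>, right-hand side
rhs = ET.fromstring(CONTENT)[0][2]

if __name__ == "__main__":
    # x^2 - 3x + 2 = 0 has roots 1 and 2; the plus branch gives 2.
    print(evaluate(rhs, {"a": 1.0, "b": -3.0, "c": 2.0}))  # prints 2.0
```

The same walk over the Presentation MathML version would not work, since mrow, mfrac and msqrt say how things are drawn, not what operation is meant.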


Monday, 23 April 2012

and here it comes... GSoC 2012 - Accepted \m/\m/\m/

After a long wait and many anxious moments, here it comes... I got selected for GSoC 2012. I'll be working for AbiWord, the supercool cross-platform open source word processor, under the mentorship of Jean Brefort. My project is to "Implement and Improve the import and export of math from/to odt, doc & docx formats".

A total of 6 students were selected for AbiWord this year.



After I get done with my end semester exams by 2nd of May, I plan to get into the action with full energy, and not only complete my project but also contribute as much as possible and, in the process, learn as much as I can.

Looking forward to a Summer full of learning, fun, excitement and a lot of code :)

Friday, 13 April 2012

Let the Fun begin !

This blog is aimed at keeping track of my open source ventures. I've always been awed by the concept of FOSS (free and open source software), but never actually got my own hands dirty. But now, with the summer holidays and my ever growing passion for programming, I'm in and I'm in for good.

I'm starting out by working for an awesome open source cross-platform word processor, AbiWord. I've been aware of its existence, and I've actually seen people use it on many low-config PCs (those which couldn't handle the heavy requirements of MS Office), but I never really contributed to it. In the process of applying for GSoC 2012, I'm looking into it quite deeply. I've worked on bugs and created a few patches, essentially getting a flavor of the code.

Truth be told, I'm totally impressed by how things work in the AbiWord community, with so many people from different continents working in collaboration. Such is their dedication that even with full time jobs and families they spend a lot of time hacking on AbiWord, and all of it voluntarily. That, I think, is the beauty of open source. I'm loving it, and I think this is something I'm going to sink right into.

Judgement Day is 23rd April, the day the GSoC results come out (eagerly waiting for it :)). I've applied for the project of improving the math import/export in AbiWord, with the center of attraction being the MathML to itex converter, as AbiWord uses itex as its math composition language. For instance, AbiWord can currently import the MathML from an odt file, but we can't edit it inside AbiWord.

Even though getting selected will be a great honor and responsibility, I plan on dropping all my other internship options (foreign interns - I'm sorry !) and doing this no matter what, using this blog to keep track of my work.