
How to Parse and Print XML File in Tree Form using libxml2 in C Programming?

Here we’ll see how to write a C program that prints an XML file on the screen. XML is widely used to store and transport data over the internet, so parsing an XML file and using its data is a basic programming requirement.

Format of XML file

Before jumping into the code, it is important to understand the basic format of an XML file. Here is a sample XML file.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.</description>
   </book>
</catalog>

XML is a markup language like HTML, but its tags are not a predefined set. Any name can be used as a tag in XML, which is why it is called eXtensible Markup Language. Here are a few important constructs we need to know about in an XML file.
Tag: A tag is the basic markup construct of an XML file. It begins with < and ends with >. In the example XML file, <catalog> and <book> are tags. There are three types: 1) start tags such as <catalog>, 2) end tags such as </catalog> and 3) empty element tags such as <catalog />. The example XML file does not contain an empty element tag.
Element: An element is a logical document component and the main building block of an XML file. It generally starts with a start tag and ends with an end tag, or it can consist of a single empty element tag. The characters between the start tag and the end tag, if any, are called the content of the element. The content can contain markup, including other elements, which are called children. Our example file contains one big element, catalog, which has a few book elements. We can imagine an XML file as a hierarchical tree of elements.
Attribute: An attribute is also a markup construct; it is basically a name-value pair. It exists inside a start tag or an empty element tag. In our XML file, id is an example of an attribute in the <book id="bk101"> start tag.

C Program to Parse and Print XML file

The C program below can read any XML file and print it in a tree structure. We’ll use the above XML file as the input of the program. The file name is hard-coded in the program. One important thing to note is that the standard C library does not include functionality to parse XML files. For that I used libxml2. To install the libxml2 development package on Red Hat based Linux, use this command.

yum install libxml2-devel

For Debian based Linux, use this command.

apt-get install libxml2-dev

Here is the complete C program.

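#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* Build: gcc test.c `xml2-config --cflags --libs` */

/* Returns 1 if the node has no child elements, 0 otherwise. */
int is_leaf(xmlNode * node)
{
  xmlNode * child = node->children;
  while(child)
  {
    if(child->type == XML_ELEMENT_NODE) return 0;

    child = child->next;
  }

  return 1;
}

/* Prints the node, its following siblings and all of their descendants. */
void print_xml(xmlNode * node, int indent_len)
{
    while(node)
    {
        if(node->type == XML_ELEMENT_NODE)
        {
          /* Leaf: text content. Non-leaf: value of the "id" attribute. */
          xmlChar * value = is_leaf(node) ? xmlNodeGetContent(node)
                                          : xmlGetProp(node, (const xmlChar *)"id");

          printf("%*c%s:%s\n", indent_len*2, '-', (const char *)node->name,
                 value ? (const char *)value : "(null)");

          /* Both functions return a copy that the caller must free. */
          if(value) xmlFree(value);
        }
        print_xml(node->children, indent_len + 1);
        node = node->next;
    }
}

int main(void)
{
  xmlDoc *doc = NULL;
  xmlNode *root_element = NULL;

  doc = xmlReadFile("dummy.xml", NULL, 0);

  if (doc == NULL) {
    fprintf(stderr, "Could not parse the XML file\n");
    xmlCleanupParser();
    return 1; /* no document tree to walk */
  }

  root_element = xmlDocGetRootElement(doc);

  print_xml(root_element, 1);

  xmlFreeDoc(doc);

  xmlCleanupParser();

  return 0;
}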

In the main() function above, xmlReadFile() loads and parses the XML file (dummy.xml) and returns the document tree, or NULL if the file cannot be parsed. We get the root element of the XML from the document tree using the xmlDocGetRootElement() libxml2 function.

The root node (element) of the XML tree is passed to the print_xml() function to print the whole XML content in hierarchical form. This function traverses the input node and all of its following siblings. If a node is of type ELEMENT, it prints some information about the node. libxml2 also keeps other types of nodes, such as the text between tags, as siblings of the ELEMENT nodes. That’s why we skip every node that is not of type ELEMENT. The tag name is always printed. If the node is a leaf node, we print the content of the node; otherwise, we print the value of the "id" attribute. The catalog element has no id attribute, so (null) is shown for it. We do not print the content of non-leaf nodes because libxml2 returns the content of all nested children as the content of the node, so the output would be lengthy and repeated. Apart from printing the information of the node, we also call print_xml() recursively for the children of the current node. This way all nodes are printed.

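The above program can be compiled with this command. The library flags come after the source file so that linkers which resolve symbols in command-line order can find the libxml2 functions.

gcc test.c `xml2-config --cflags --libs`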

Output of the Program

 -catalog:(null)
   -book:bk101
     -author:Gambardella, Matthew
     -title:XML Developer's Guide
     -genre:Computer
     -price:44.95
     -publish_date:2000-10-01
     -description:An in-depth look at creating applications
      with XML.
   -book:bk102
     -author:Ralls, Kim
     -title:Midnight Rain
     -genre:Fantasy
     -price:5.95
     -publish_date:2000-12-16
     -description:A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.
   -book:bk103
     -author:Corets, Eva
     -title:Maeve Ascendant
     -genre:Fantasy
     -price:5.95
     -publish_date:2000-11-17
     -description:After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.
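
The indentation comes from the "%*c" conversion in the program's printf() call. The sketch below isolates that format string in a hypothetical helper named format_line(), which is not part of the original program, so the effect of the depth value can be seen on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that reuses the program's format string to build
   one output line in buf. Returns the number of leading spaces. */
size_t format_line(char *buf, size_t len, int indent_len,
                   const char *name, const char *value)
{
    assert(buf != NULL && len > 0 && indent_len > 0);

    /* "%*c" takes its field width from the argument list: '-' is
       right-aligned in a field indent_len*2 characters wide, so it is
       preceded by indent_len*2 - 1 spaces. */
    snprintf(buf, len, "%*c%s:%s", indent_len * 2, '-', name, value);

    return strspn(buf, " ");
}
```

With depths 1, 2 and 3 the dash is preceded by 1, 3 and 5 spaces, so each level of the tree is shifted two columns to the right of its parent.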

The post How to Parse and Print XML File in Tree Form using libxml2 in C Programming? appeared first on QnA Plus.

