Over the past 4 years I have worked with many XML providers (interconnecting B2B applications), and some of these providers distribute big XML files, some of them larger than 2GB. As you may already suspect, we should not load 2GB of information into memory: PHP will raise a fatal out-of-memory error and your application will not be able to recover by itself. In this post I will explain how to solve this problem, using…
To format XML files I usually use option 3 described in How to pretty print XML on GNU/Linux, but recently I've needed to work with XML files which, in addition to being obfuscated, use HTML entities in one part.
Tool 1: a web browser, for example Mozilla Firefox or Google Chrome. The disadvantage of using a browser is that you cannot edit the XML file; it is only useful if you want to view it and nothing more.
Tool 2: xml_pp, which is part of the XML-Twig suite. To install it on Debian GNU/Linux, type:
$ sudo apt-get install xml-twig-tools
You can then format the XML file in place (the -i option overwrites the original file) by typing:
$ xml_pp -i mifichero.xml
If you want to save the formatted content in another file, type:
$ xml_pp mifichero.xml > mifichero_pp.xml
If you want to edit the formatted file, you can use any text editor.
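When several files need the same treatment, the redirect form above can be wrapped in a small shell function. The following is only a minimal sketch: the function name pp_copy and the PP_CMD variable are hypothetical names introduced here for illustration, and the `_pp.xml` suffix simply follows the naming used above. It assumes xml_pp is installed and exits with a non-zero status when it cannot parse a file.

```shell
#!/bin/sh
# Hypothetical helper: write a formatted copy of every XML file passed as an
# argument, leaving the originals untouched.
# PP_CMD is an assumed override hook (not an xml_pp feature); it defaults to xml_pp.
pp_copy() {
    for src in "$@"; do
        # mifichero.xml -> mifichero_pp.xml
        dest="${src%.xml}_pp.xml"
        # Write to a temporary file first so that a failed run never leaves
        # a truncated *_pp.xml behind.
        if "${PP_CMD:-xml_pp}" "$src" > "$dest.tmp"; then
            mv "$dest.tmp" "$dest"
        else
            rm -f "$dest.tmp"
            echo "could not format $src" >&2
        fi
    done
}
```

For example, `pp_copy mifichero.xml otro.xml` (file names are placeholders) would leave `mifichero_pp.xml` and `otro_pp.xml` next to the originals.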
Tool 3: the Geany editor with the prettyprinter plugin. To install both on Debian GNU/Linux, type:
$ sudo apt-get install geany geany-plugin-prettyprinter
After running the above command, perform the following steps to activate the plugin:
1. Run Geany
Menu -> Development -> Geany
From the console, you can also type:
$ geany &
2. Open the menu
Tools -> Plugin Manager
3. Activate the plugin XML PrettyPrinter
Once the plugin is activated, the option to format XML appears in the menu:
Tools -> PrettyPrinter XML