Archive for February, 2009

.xlsx format (MS Excel 2007)

February 6, 2009

Today I’ve sorted out parsing of the new MS Excel 2007 .xlsx format. It turned out to be just a .zip archive with a bunch of .xml files inside:

\_rels\.rels
\docProps\core.xml
\docProps\app.xml
\xl\_rels\workbook.xml.rels
\xl\externalLinks\_rels\externalLink1.xml.rels
\xl\externalLinks\externalLink1.xml
\xl\printerSettings\printerSettings1.bin
\xl\theme\theme1.xml
\xl\worksheets\_rels
\xl\worksheets\sheet1.xml
\xl\worksheets\_rels\sheet1.xml.rels
\xl\calcChain.xml
\xl\workbook.xml
\xl\sharedStrings.xml
\xl\styles.xml
[Content_Types].xml
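
Since the file is an ordinary zip archive, a listing like the one above can be produced with nothing but java.util.zip. This is only a minimal sketch: the class name XlsxEntries and the method listEntries are made up for illustration, and the path to the workbook is taken from the command line. Note that inside the archive the entry names use forward slashes (for example xl/worksheets/sheet1.xml), not the backslashes shown by the Windows zip tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library: lists what is inside an .xlsx.
class XlsxEntries {

    /** Returns the names of all entries in the zip stream, in archive order. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to an .xlsx file
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```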

As expected, the data I need is in the \xl\worksheets\sheet1.xml file, which is an ordinary XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  ………
  <cols>
    <col width="20.7109375" />
    …….
  </cols>
  <sheetData>
    <row r="2" spans="2:12" s="9" customFormat="1" ht="23.25">
      <c r="B2" s="22" t="s">
        <v>16</v>
      </c>

So I think parsing this in Java will be easier than parsing the old .xls format with Apache POI.
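
To give an idea of what that parsing could look like, here is a rough sketch that pulls the cell references and raw values out of a worksheet using only the JDK (java.util.zip plus the StAX reader from javax.xml.stream). The names SheetCells and readCells are my own placeholders, and the sketch assumes the sheet lives at xl/worksheets/sheet1.xml as in the listing. It only collects the text of each <v> element keyed by the r attribute of the enclosing <c>; it does not resolve anything further.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library: reads raw cell values from one sheet.
class SheetCells {

    /** Maps cell references (e.g. "B2") to the raw text of their <v> element. */
    static Map<String, String> readCells(InputStream xlsx, String sheetPath)
            throws IOException, XMLStreamException {
        Map<String, String> cells = new LinkedHashMap<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The sheet XML needs no DTD, so switch DTD support off
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(sheetPath)) {
                    continue; // skip every entry except the requested sheet
                }
                // The zip stream reports end-of-stream at the end of the current entry
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                String ref = null; // r attribute of the <c> element we are inside, if any
                while (xml.hasNext()) {
                    int event = xml.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = xml.getLocalName();
                        if (name.equals("c")) {
                            ref = xml.getAttributeValue(null, "r");
                        } else if (name.equals("v") && ref != null) {
                            cells.put(ref, xml.getElementText());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && xml.getLocalName().equals("c")) {
                        ref = null;
                    }
                }
                break; // the sheet has been read, no need to scan further entries
            }
        }
        return cells;
    }
}
```

One caveat: in SpreadsheetML a cell with t="s" stores an index into \xl\sharedStrings.xml rather than the text itself, so for a cell like B2 above the sketch would hand back the string "16", which still has to be looked up in the shared strings table to get the displayed text.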