Today I figured out how to parse the new MS Excel 2007 .xlsx format. It turns out to be just a .zip archive with a bunch of .xml files inside:
\_rels\.rels
\docProps\core.xml
\docProps\app.xml
\xl\_rels\workbook.xml.rels
\xl\externalLinks\_rels\externalLink1.xml.rels
\xl\externalLinks\externalLink1.xml
\xl\printerSettings\printerSettings1.bin
\xl\theme\theme1.xml
\xl\worksheets\_rels
\xl\worksheets\sheet1.xml
\xl\worksheets\_rels\sheet1.xml.rels
\xl\calcChain.xml
\xl\workbook.xml
\xl\sharedStrings.xml
\xl\styles.xml
[Content_Types].xml
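A listing like the one above can be produced with nothing but java.util.zip from the JDK. This is only a minimal sketch: the class name XlsxEntries and the method listEntries are made up for illustration. Note that inside the archive the entry names use forward slashes and have no leading slash (for example xl/worksheets/sheet1.xml), so the backslashes above are just how my unzip tool displayed them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class for illustration, not part of any library.
class XlsxEntries {

    // Returns the names of all entries in the archive, in the order they are stored.
    static List<String> listEntries(InputStream xlsx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsxEntries <file.xlsx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```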
As expected, the data I need is in the \xl\worksheets\sheet1.xml file, which is an ordinary XML file:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  ………
  <cols>
    <col width="20.7109375" />
    …….
  </cols>
  <sheetData>
    <row r="2" spans="2:12" s="9" customFormat="1" ht="23.25">
      <c r="B2" s="22" t="s">
        <v>16</v>
      </c>
So I think parsing this in Java will be easier than parsing the old .xls format using Apache POI.
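Here is a rough sketch of what that parsing could look like using only the JDK (java.util.zip plus the StAX reader from javax.xml.stream). The names SheetReader, Cell and readCells are my own inventions for this example. It only collects the r and t attributes of each <c> element and the text of its <v> child, and ignores styles, formulas and inline strings. One thing to keep in mind: a cell with t="s" (like B2 above) does not hold the text itself. Its <v> value is an index into \xl\sharedStrings.xml, so the 16 above means "shared string number 16", and the sketch leaves that lookup out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class for illustration, not part of any library.
class SheetReader {

    // One <c> element: its reference (e.g. "B2"), its t attribute (may be null)
    // and the raw text of its <v> child.
    static class Cell {
        final String ref;
        final String type;
        final String rawValue;

        Cell(String ref, String type, String rawValue) {
            this.ref = ref;
            this.type = type;
            this.rawValue = rawValue;
        }
    }

    // Scans the archive for the given entry (e.g. "xl/worksheets/sheet1.xml")
    // and returns every cell that has a <v> value, in document order.
    static List<Cell> readCells(InputStream xlsx, String sheetEntry)
            throws IOException, XMLStreamException {
        List<Cell> cells = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The sheet XML needs no DTD, so switch DTD support off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(sheetEntry)) {
                    continue;
                }
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                String ref = null;
                String type = null;
                while (xml.hasNext()) {
                    if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = xml.getLocalName();
                    if ("c".equals(name)) {
                        ref = xml.getAttributeValue(null, "r");
                        type = xml.getAttributeValue(null, "t");
                    } else if ("v".equals(name) && ref != null) {
                        cells.add(new Cell(ref, type, xml.getElementText()));
                    }
                }
                xml.close();
                break;
            }
        }
        return cells;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SheetReader <file.xlsx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (Cell c : readCells(in, "xl/worksheets/sheet1.xml")) {
                // t="s" means rawValue is an index into xl/sharedStrings.xml.
                String kind = "s".equals(c.type) ? "shared string index" : "value";
                System.out.println(c.ref + " " + kind + " " + c.rawValue);
            }
        }
    }
}
```

I went with the streaming StAX reader rather than loading a DOM because worksheets can get large and only the <c> and <v> elements matter here. Resolving the shared string indexes would need a second pass over sharedStrings.xml.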