Monday, September 13, 2010

Oracle XML PL/SQL Gotcha

Recently I've been working on a project importing and exporting XML from an Oracle DB.  The fastest way to do this and the approach recommended by Oracle is to use the XML libraries to read and write as describe by the Oracle document:

Oracle DB XML Guide

There are multiple ways of getting XML in and out of the database, however the fastest by far is use of the PL/SQL libraries that provide a direct mapping to C database kernel calls.  These are the XML creation library for extracting data from SQL into XML:

DBMS_XMLGEN

And the import library for taking XML and loading it into the database:

DBMS_XMLSTORE

These libraries are extremely fast, especially compared to what you may be used to with slow Java parsing of XML documents.  For instance, loading a portal was reduced from taking 6 minutes to several seconds.  Generating an XML extract of a portal took a fraction of a second.  These PL/SQL calls are lightning speed.

However not is all as it seems, and there are a few caveats.  For one, you must extract each table one at a time, with one call per table; there are no multiple-table calls supported.  This also requires that you manually build an XML document if multiple tables are involved, such as generating a portal snapshot.  Second, you must likewise import one table at a time, there is no way to load multiple tables in a single call.  This has the consequence that you must parse and break apart an XML document into per-table chunks when importing.

The most serious caveat I've found is with the handling of certain quotable XML characters such as the apostrophe (otherwise known as the single quote, "'", or ').  Simply put, DBMS_XMLSTORE will not import any column with an apostrophe in the data field.  There is no easy way around this: various forms of escaping will not work, using a varchar or CLOB will not work, no obvious solution I've devised fixes the problem.  Web forums seem to indicate this is a known problem that will be fixed in the future, however as of Oracle 10RC2, the problem still exists.  Gotcha!

So what can you do?  There are several workarounds.  One way is to use the old Java-based XML import library:

DBMS_XMLSAVE

This library gives you the exact same interface as DBMS_XMLSTORE, so when Oracle fixes the bug you can switch libraries without changing any of your code besides the package name.  It also allows you to import without any special workarounds with characters escaping or manual update statements.  However it can be substantially slower than the C-based kernel libraries, and may be impossible to install if you have Oracle XE or if you do not control the database as the package must be manually loaded and installed.

Another way is to escape the XML with non-XML quoted characters before importing (such as changing the apostrophe to a backslash apostrophe sequence, or an out-of-band ASCII special character sequence), then updating the affected columns after import.  This has the advantage of still allowing full XML import, but the disadvantage of requiring per-table updates after every import.

A third option is to manually insert the columns for the tables that will have free-form text which may have an apostrophe.  This approach works well if your data mostly consists of numeric, currency, identity, non-free-from data, as it avoids duplicate imports.  However it has the disadvantage of requiring manual parsing and can be cumbersome if much of your data can contain apostrophes.  Because in our case only a small portion of the data contained free-form text, this is the approach we used.  The correct approach for your case will depend on the specifics of your data.

It is important to note that this gotcha only applies to DBMS_XMLSTORE for XML input; output of XML does not have any of these problems.  We can all hope that Oracle will fix this bug in subsequent releases as these workarounds can then be dispensed with.

1 comment:

  1. Ukrainian software developers have a strong skill set, they are tech savvy, up to date with recent innovations and have an independent thinking. Moreover, Ukraine offers intermediate prices of $25-$49/hour for its software developers in comparison to the engineers from other eastern European countries. Learn more about software development companies in Ukraine.

    ReplyDelete