Friday, September 17, 2010

Oracle Outage at JPMorgan and Formal Proof of DB Validity

Andrew Wiles, Prover of Fermat's Last Theorem
An interesting post over at DBMS2 about the JPMorgan Oracle DB Outage.  Interestingly enough I was on a similar enterprise-level massive Oracle DB running on Solaris and we had a hard outage which required a manual restart.  I mean it took the whole server down.  And we had corruption which required a restore from backup.  Not completely sure if it's related, but these huge databases are not as reliable as they are made out to be.

A code error can always cause some corruption, because a monolithic DB kernel running on a monolithic Unix kernel could always have problems.  However, there has recently come to light a way out of this snafu, a mathematical light at the end of the tunnel.  Something used for microprocessors but just starting to be used for operating systems.  Formal mathematical verification.

In normal programming tasks, formal verification is too restrictive and unwieldy for regular use.  However for infrequently changed but extremely critical software, formal verification is becoming more than just a toy.  An entire microkernel, seL4 of the L4 family, has recently been formally proven correct.  The increasing speed of computers and sophistication of automated proof checkers is making this possible.  One can picture this being only the groundwork for formal proofs of additional basic kernel libraries, i/o, filesystems, network stacks.  And eventually, applications, at least the critical parts of them.

And this brings us to the DB.  On top of a formally verified microkernel, and a verified filesystem, I want a verified database microkernel.  I'm not talking about verifying that all my SQL statements are 100% accurately implemented, no, what I'm seeking here is formal verification of ACID semantics.  So that certain operations are guaranteed to occur, and file corruption is guaranteed never to occur.

We're not just talking about Oracle / DB2 / Sybase / SQLServer here, you want the same consistency guarantees on NoSQL solutions as well.  I may be able to tolerate my data being updated at different times in different places, but I don't want my account balance to be randomly altered due to some corrupted file.

This formal verification approach at the DB level would not only remove a huge chunk of potential bugs, it would bring us certainty on the basics, so we can focus more on the larger tasks, like distributed horizontal key/value stores such as BigTable, efficient spatial functions, social graphs solvers.  And it might give DBAs more sleep at night.

Formal verification of an operating system security kernel (Computer science)

Wednesday, September 15, 2010

Substitution Variable XML Gotcha

You might have been using XML in your PL/SQL script and tried to include an XML special character such as < or ' as described at the W3C:
XML 1.0 Specification 
When testing your script in sqlplus or the Oracle SQL Developer you may have been confused by an input question or popup window asking for the "lt" or "apos" variable definition.  What is actually happening here is Oracle thinks that your XML character is actually a subsitution variable:

Oracle Substitution Variables

The substitution variable will be matched for anything beginning with an ampersand "&" character and ending in a termination character.  Because of the XML specification, you sometimes need to use the literal special character format and there is no way around this. However, you can suppress the substitution variable matching in Oracle by using the command:

   set define off


As described in the substitution variable usage guide:

Substitution Variable Usage

This will then let your script run correctly.  You can also use substitution variables in conjunction with an escape character, as described in the link above, although it is a little messier and only necessary if you are actually using substitution variables in your script.

Monday, September 13, 2010

Oracle XML PL/SQL Gotcha

Recently I've been working on a project importing and exporting XML from an Oracle DB.  The fastest way to do this and the approach recommended by Oracle is to use the XML libraries to read and write as describe by the Oracle document:

Oracle DB XML Guide

There are multiple ways of getting XML in and out of the database, however the fastest by far is use of the PL/SQL libraries that provide a direct mapping to C database kernel calls.  These are the XML creation library for extracting data from SQL into XML:

DBMS_XMLGEN

And the import library for taking XML and loading it into the database:

DBMS_XMLSTORE

These libraries are extremely fast, especially compared to what you may be used to with slow Java parsing of XML documents.  For instance, loading a portal was reduced from taking 6 minutes to several seconds.  Generating an XML extract of a portal took a fraction of a second.  These PL/SQL calls are lightning speed.

However not is all as it seems, and there are a few caveats.  For one, you must extract each table one at a time, with one call per table; there are no multiple-table calls supported.  This also requires that you manually build an XML document if multiple tables are involved, such as generating a portal snapshot.  Second, you must likewise import one table at a time, there is no way to load multiple tables in a single call.  This has the consequence that you must parse and break apart an XML document into per-table chunks when importing.

The most serious caveat I've found is with the handling of certain quotable XML characters such as the apostrophe (otherwise known as the single quote, "'", or ').  Simply put, DBMS_XMLSTORE will not import any column with an apostrophe in the data field.  There is no easy way around this: various forms of escaping will not work, using a varchar or CLOB will not work, no obvious solution I've devised fixes the problem.  Web forums seem to indicate this is a known problem that will be fixed in the future, however as of Oracle 10RC2, the problem still exists.  Gotcha!

So what can you do?  There are several workarounds.  One way is to use the old Java-based XML import library:

DBMS_XMLSAVE

This library gives you the exact same interface as DBMS_XMLSTORE, so when Oracle fixes the bug you can switch libraries without changing any of your code besides the package name.  It also allows you to import without any special workarounds with characters escaping or manual update statements.  However it can be substantially slower than the C-based kernel libraries, and may be impossible to install if you have Oracle XE or if you do not control the database as the package must be manually loaded and installed.

Another way is to escape the XML with non-XML quoted characters before importing (such as changing the apostrophe to a backslash apostrophe sequence, or an out-of-band ASCII special character sequence), then updating the affected columns after import.  This has the advantage of still allowing full XML import, but the disadvantage of requiring per-table updates after every import.

A third option is to manually insert the columns for the tables that will have free-form text which may have an apostrophe.  This approach works well if your data mostly consists of numeric, currency, identity, non-free-from data, as it avoids duplicate imports.  However it has the disadvantage of requiring manual parsing and can be cumbersome if much of your data can contain apostrophes.  Because in our case only a small portion of the data contained free-form text, this is the approach we used.  The correct approach for your case will depend on the specifics of your data.

It is important to note that this gotcha only applies to DBMS_XMLSTORE for XML input; output of XML does not have any of these problems.  We can all hope that Oracle will fix this bug in subsequent releases as these workarounds can then be dispensed with.