Part of the consultancy work my employer does is expert witness work, where I express opinions in connection with legal cases. So far I've been involved in civil cases, including intellectual property cases, reviews of procedure and data mining for contractual disputes. Having spent some time this week attempting to delve into databases to provide supporting material for a case, it struck me how much easier the job is when the system used was open source, and how, for the party being sued, that can be a drawback.

I have three databases containing information which might be pertinent. The first is a MySQL database for an open source system. This has been the easiest of the three: MySQL is well documented, and the open source software which used the database is completely documented and diagrammed, enabling me to see the data model and produce queries for the data quickly and efficiently (well, as efficiently as you can when you don't know the RDBMS that well).

Last week I took hold of unknown database backup files and was asked to work out what they were and what data they contained. It turns out they're Microsoft SQL Server backup files (you can tell, it's written in the file headers). As I know SQL Server quite well (I say quite to head off comments from a certain SQL Server MVP I know is reading), restoring the database was easy, and the diagram tools in SQL Server Management Studio made visualising the schema reasonably simple. Unfortunately the software using the database appears to be in house, with no documentation and strange table names like as1, as, as3 etc., all of which intermingle via foreign keys. This has made it a lot harder to retrieve some of the data related to the main tables; however, as it is in house software, I can request documentation, which must then be produced.
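For anyone faced with a similar mystery file, the "read the headers" step can be sketched in a few lines of Python. This is only a rough illustration, not the tool I actually used: the function names, the 4096 byte header size and the six character minimum are my own assumptions, and it does nothing more than pull printable ASCII and UTF-16LE strings out of the start of a file, much as the Unix `strings` utility would.

```python
import re


def header_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return printable ASCII and UTF-16LE strings found in a block of bytes.

    min_len is an arbitrary threshold (an assumption) to filter out noise.
    """
    # Runs of printable single-byte characters.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable characters each followed by a zero byte (UTF-16LE text).
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return found


def read_header(path: str, size: int = 4096) -> bytes:
    """Read the first `size` bytes of a file (size is an assumed default)."""
    with open(path, "rb") as fh:
        return fh.read(size)
```

Calling `header_strings(read_header("unknown.bak"))`, where `unknown.bak` is a hypothetical file name, gives a list of strings to eyeball for a product name. What turns up depends entirely on the file format, so treat it as a first hint rather than proof.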

This leads me to database number 3. This is a commercial system; the underlying database is Oracle on a Unix box and the documentation is the intellectual property of the vendor. The initial problem is restoring the Oracle database onto a PC instance of Oracle. Not knowing Oracle, it's been a steep learning curve, discovering that the restore procedure will attempt to recreate "namespaces" using the original filenames, which are in *nix form, and so it dies on a PC. Strange that such an expensive database won't easily restore cross architecture. Having finally managed the restore, I want to view the schema. The lack of tools for Oracle is astounding to me; however, I managed to download Toad and get a list of tables. Lo, we're into silly table names again: pr001, pr002, s_pr001, and so on. The schema is impossible for me to grasp, so I approach the commercial company. "We'd like to help, but we need permission from the customer who owns the software license". Hardly likely to be granted, considering they are the ones being sued. After approximately four weeks of wrangling, begging and eventually getting the lawyers involved, I have a schema, a starting point for the extract, which I can see taking another full week.

So why do I view open source as a drawback? After all, it made my life easier. True, but look at it from the viewpoint of the company being sued. It has taken four weeks so far to get hold of the commercial company's schema, and it will probably take another expensive, billable week to extract the data. My hourly rate on expert work is not cheap. I can easily see this type of search and data mining being written off as too expensive in some circumstances. The open source product took me two days, and everything I needed was on-line. Now, which is more "cover your ass" friendly here? I am sure that someone, somewhere is attempting to market their system as closed, and as such it'll be difficult to use the data should you be sued. I don't think we'll see that on Slashdot soon.

(The table names I have used have been changed from the actual table names; however, the names I have used illustrate the difficulty of the naming schemes actually used.)