Migrating data from legacy systems

From OpenPetra Wiki
Revision as of 09:47, 26 September 2011 by Pokorra (talk | contribs) (make sure that we use UTF8 for writing the SQL file)
Jump to navigation Jump to search

Introduction

We are working on a specific migration path for Petra 2.x databases to OpenPetra. But the tools and ideas can be helpful for any other migration as well.

The goal is to dump the data from the old database. It is recommended to write a tool that in the end produces an .sql.gz file similar to the one that is created by a openpetraorg-server.sh backup call. This sql file will be specific to the database system you are using.

The data needs to be collected from the existing table structure into the OpenPetra table structure. Sometimes data must be cleaned up to fulfill the constraints defined in the OpenPetra database.

To collect the data, the best idea would be to use .Net programs to access the old database. But this is not always possible, perhaps the SQL/ODBC connection is far too slow, so you might need to dump first with the legacy system, and then clean up the data and reformat.

The code to collect the data can be auto-generated, if the table structure of the old system is quite similar to the OpenPetra database structure.

Examples

Migration from Petra 2.x

The code for the migration from Petra 2.x is in csharp/ICT/PetraTools/DumpPetra2xToOpenPetra.

Preparations

Since the old database system is quite outdated, and does not support access to the SQL engine over ODBC very well, we have to generate a program in the 4GL language to dump the data to text files.

To generate the 4GL dump program, you have to run Ict.Tools.DataDumpPetra2.exe with the xml database definition file of Petra 2.x and the xml file of OpenPetra. This will generate fulldumpOpenPetraCSV.p, if fulldumpOpenPetraCSV.r does not exist in the current directory yet. You need to compile this program with the Progress IDE, but make sure you compile against a local standalone Petra 2.3 database, compiling against a network database will take forever.

Then copy the resulting fulldumpOpenPetraCSV.r file to the same directory where your Ict.Tools.DataDumpPetra2.exe is.

Create a dump of the data ready for restore to OpenPetra

You must set the environment variables for your Petra 2.x database, so the best is to run first:

. /usr/local/petra/db23/df/etc/petraenv.sh

You also need to stop the Petra 2.x database first, eg. sysadm petra23 stop.

Now you call Ict.Tools.DataDumpPetra2.exe this way:

mono Ict.Tools.DataDumpPetra2.exe | iconv --to-code=UTF-8 | gzip > mydump`date +%Y%m%d`.sql.gz

This will assume the files petra23.xml in the current directory with the database structure of Petra 2.3, and petra.xml with the database structure of OpenPetra in the same directory as well. Note: the old petra23.xml file is not included in the public bazaar, since Petra 2.x was not Open Source and the file would not benefit anyone else. If you have the petra23.xml file, you need to also rename the datastructure.dtd to datastructure23.dtd and change that reference at the top of the file petra23.xml as well.

You must have write access to that directory, since the tool will create a subdirectory called fulldump, where we will store a .d text file with the dump from the database.

Restore file to OpenPetra

To load the resulting .sql.gz file, please use

openpetraorg-server.sh restore mydump`date +%Y%m%d`.sql.gz

Debugging

If you use the parameter -table:<table name>, eg. mono Ict.Tools.DataDumpPetra2.exe -table:a_account, only that single table will be dumped.

The parameter -debuglevel:1 will show more details, and will not delete the .d file from the fulldump directory, after the file has been processed. If a .d file already exists in the fulldump directory, it will be reused and not fetched from the Petra 2.x database again. -debuglevel:10 prints every value, so this is only recommended when you are testing a small file with a specific error.

If you get errors from the CSV parser of a .d file, please note: Since there are line breaks in the CSV values, the line counter does not fit exactly. The Linux editor vi does have different count, since it does not count single newline characters as a line break, but only carriage returns. If you open the .d file in Windows in Notepad++, you get the same line count as the output of DumpPetra2xToOpenPetra.

Text operations on such huge files take a lot of memory. Despite heavy use of StringBuilder, quite a lot of memory is still being used. Because of that, a conversion of the CSV text files to standardized line breaks has been deactivated in the code.