Migrating data from legacy systems
Introduction
We are working on a specific migration path for Petra 2.x databases to OpenPetra. But the tools and ideas can be helpful for any other migration as well.
The goal is to dump the data from the old database. It is recommended to write a tool that in the end produces an .sql.gz file similar to the one that is created by a openpetraorg-server.sh backup
call. This sql file will be specific to the database system you are using.
The data needs to be collected from the existing table structure into the OpenPetra table structure. Sometimes data must be cleaned up to fulfill the constraints defined in the OpenPetra database.
To collect the data, the best idea would be to use .Net programs to access the old database. But this is not always possible, perhaps the SQL/ODBC connection is far too slow, so you might need to dump first with the legacy system, and then clean up the data and reformat.
The code to collect the data can be auto-generated, if the table structure of the old system is quite similar to the OpenPetra database structure.
Examples
Migration from Petra 2.x
The code for the migration from Petra 2.x is in csharp/ICT/PetraTools/DataDumpPetra2
.
Preparations
Since the old database system is quite outdated, and does not support access to the SQL engine over ODBC very well, we have to generate a program in the 4GL language to dump the data to text files.
To generate the 4GL dump program, you have to run Ict.Tools.DataDumpPetra2.exe -operation:createProgressCode
with the xml database definition file of Petra 2.x. This will generate fulldump23.p
. You need to compile this program with the Progress IDE, but make sure you compile against a local standalone Petra 2.3 database, compiling against a network database will take forever.
You will need the resulting fulldump23.r
file to later to dump the data from the database into gzipped text files.
Step 1: dump data from Petra 2.x database
This is what you need to do on the CentOS server, where the Petra 2.x database is running:
You must set the environment variables for your Petra 2.x database, so the best is to run first:
. /usr/local/petra/db23/df/etc/petraenv.sh
Then go into a directory, where you have enough space for the dumped gzipped text files to be placed in a subdirectory called fulldump:
$DLC/bin/pro -b -pf $PETRA_HOME/etc/batch.pf -p ./fulldump23.r | cat
You can now tar the fulldump directory, and take the data to a machine where the latest Mono or .Net runtime environment is installed.
Step 2: convert/upgrade the data to OpenPetra format
This can happen on the developers machine, or the machine where OpenPetra is running:
- unpack the fulldump data again into a fulldump subdirectory.
- you might want to remove the fulldump23.r file, to avoid the program to attempt to dump the data from the database again.
- then run Ict.Tools.DataDumpPetra2.exe from the parent directory that contains the fulldump directory
This will assume the files petra23.xml in the current directory with the database structure of Petra 2.3, and petra.xml with the database structure of OpenPetra in the same directory as well. Note: the old petra23.xml file is not included in the public bazaar, since Petra 2.x was not Open Source and the file would not benefit anyone else. If you have the petra23.xml file, you need to also rename the datastructure.dtd to datastructure23.dtd and change that reference at the top of the file petra23.xml as well.
You must have write access to that directory, since the tool will create a subdirectory called fulldump
, where we will store a the .sql.gz text file with the SQL load statements for importing into an OpenPetra PostgreSQL database.
Step 3: Restore file to OpenPetra
To load the resulting .sql.gz file, please use
on a Linux server: openpetraorg-server.sh restore fulldump/_load.sql.gz
or
on a developer's environment: nant loadDatabase -D:file=_load.sql.gz
Debugging
If you use the parameter -table:<table name>
, eg. mono Ict.Tools.DataDumpPetra2.exe -table:a_account
, only that single table will be dumped.
The parameter -debuglevel:1
will show more details, and will not delete the .d.gz file from the fulldump directory, after the file has been processed. If a .d.gz file already exists in the fulldump directory, it will be reused and not fetched from the Petra 2.x database again. -debuglevel:10
prints every value, so this is only recommended when you are testing a small file with a specific error.
If you get errors from the CSV parser of a .d file, please note: Since there are line breaks in the CSV values, the line counter does not fit exactly. The Linux editor vi does have different count, since it does not count single newline characters as a line break, but only carriage returns. If you open the .d file in Windows in Notepad++, you get the same line count as the output of DumpPetra2xToOpenPetra.