Migrating data from legacy systems

From OpenPetra Wiki
Jump to navigation Jump to search

Introduction

We are working on a specific migration path for Petra 2.x databases to OpenPetra. But the tools and ideas can be helpful for any other migration as well.

The goal is to dump the data from the old database. It is recommended to write a tool that in the end produces an .sql.gz file similar to the one that is created by a openpetraorg-server.sh backup call. This sql file will be specific to the database system you are using.

The data needs to be collected from the existing table structure into the OpenPetra table structure. Sometimes data must be cleaned up to fulfill the constraints defined in the OpenPetra database.

To collect the data, the best idea would be to use .Net programs to access the old database. But this is not always possible, perhaps the SQL/ODBC connection is far too slow, so you might need to dump first with the legacy system, and then clean up the data and reformat.

The code to collect the data can be auto-generated, if the table structure of the old system is quite similar to the OpenPetra database structure.

Examples

Migration from Petra 2.x

The code for the migration from Petra 2.x is in csharp/ICT/PetraTools/DataDumpPetra2.

Preparations

Since the old database system is quite outdated, and does not support access to the SQL engine over ODBC very well, we have to generate a program in the 4GL language to dump the data to text files.

To generate the 4GL dump program, you have to run Ict.Tools.DataDumpPetra2.exe -operation:createProgressCode with the xml database definition file of Petra 2.x. This will generate fulldump23.p. You need to compile this program with the Progress IDE, but make sure you compile against a local standalone Petra 2.3 database, compiling against a network database will take forever.

You will need the resulting fulldump23.r file to later to dump the data from the database into gzipped text files.

Note: fulldump22.p is not a file generated with Ict.Tools.DataDumpPetra2.exe, but it produces the same output as fulldump23.p, because it was intended to write data that can be loaded straight into Petra 2.3.

Step 1: dump data from Petra 2.x database

This is what you need to do on the CentOS server, where the Petra 2.x database is running:

You must set the environment variables for your Petra 2.x database, so the best is to run first:

. /usr/local/petra/db23/df/etc/petraenv.sh

or for Petra 2.2:

. /usr/local/petra/petra22env.sh

Then go into a directory, where you have enough space for the dumped gzipped text files to be placed in a subdirectory called fulldump:

$DLC/bin/pro -b -pf $PETRA_HOME/etc/batch.pf -yy 1980 -yr4def -p ./fulldump23.r | cat

or for Petra 2.2:

$DLC/bin/pro -b -pf $PETRA_HOME/batch22.pf -yy 1980 -yr4def -p ./fulldump22.r | cat

You can now tar the fulldump directory, and take the data to a machine where the latest Mono or .Net runtime environment is installed.

Step 2: convert/upgrade the data to OpenPetra format

This can happen on the developers machine, or the machine where OpenPetra is running:

  • unpack the fulldump data again into a fulldump subdirectory.
  • you might want to remove the fulldump23.r file, to avoid the program to attempt to dump the data from the database again.
  • then run Ict.Tools.DataDumpPetra2.exe from the parent directory that contains the fulldump directory
  • use the parameter -CodePage:1252 for Ict.Tools.DataDumpPetra2.exe depending on the code page that Progress used to dump the files.

This will assume the files petra23.xml in the current directory with the database structure of Petra 2.3, and petra.xml with the database structure of OpenPetra in the same directory as well. Note: the old petra23.xml file is not included in the public bazaar, since Petra 2.x was not Open Source and the file would not benefit anyone else. If you have the petra23.xml file, you need to also rename the datastructure.dtd to datastructure23.dtd and change that reference at the top of the file petra23.xml as well.

You must have write access to that directory, since the tool will create a subdirectory called fulldump, where we will store a _loadTest.sql.gz file with the SQL load statements for testing for load errors and a _load.sql.gz text file with the SQL load statements for importing into an OpenPetra PostgreSQL database.

Step 3: Test the data for errors

To test for errors, please use:

nant loadTestDatabase -D:file=delivery\bin\fulldump\_loadTest.sql.gz

It is useful to write the output to a text file. If there are errors in the data they will be detailed in the output. All errors must be fixed before the data can be loaded into OpenPetra.

To fix an error:

  • unpack _load.sql.gz
  • find the row containing the error
  • manually fix the error.

Step 4: Restore file to OpenPetra

To load the data to OpenPetra, please use

on a Linux server:
openpetraorg-server.sh restore fulldump/_load.sql.gz

or

on a developer's environment:
nant loadDatabase -D:file=delivery\bin\fulldump\_load.sql
(or if you did not need to fix any errors: nant loadDatabase -D:file=delivery\bin\fulldump\_load.sql.gz)

The build will fail if there are any errors in the data. These need to be fixed in Step 3.

It is useful to write the output to a text file. If there are any constraint errors in the database they will be detailed in the output. These will need to be fixed manually.

The output will warn you if any rows are missing from the database after the load has taken place.

Since all user passwords are unusable from Progress4gl, you need to reset the password like this in PostgreSQL

UPDATE s_user SET s_retired_l=false, s_password_hash_c='1887C0796A313074DDB464409F8B92BE6C128674', s_password_salt_c='HI0+ddjS8/gR2zJUJ74QjmdXjCadx5fdJ8ZvqlJSmxE=' where s_user_id_c='SYSADMIN';

This will reset the password to: CHANGEME, and you can login and change the password to something more reasonable.

Important: From trunk Rev. 3237 this will no longer work due to a change in the password hashing algorithm; please refer to a SQL file that christiank sent to all developers to reset the sysadmin password.

Debugging

If you use the parameter -table:<table name>, eg. mono Ict.Tools.DataDumpPetra2.exe -table:a_account, only that single table will be dumped.

The parameter -debuglevel:1 will show more details, and will not delete the .d.gz file from the fulldump directory, after the file has been processed. If a .d.gz file already exists in the fulldump directory, it will be reused and not fetched from the Petra 2.x database again. -debuglevel:10 prints every value, so this is only recommended when you are testing a small file with a specific error.

If you get errors from the CSV parser of a .d file, please note: Since there are line breaks in the CSV values, the line counter does not fit exactly. The Linux editor vi does have different count, since it does not count single newline characters as a line break, but only carriage returns. If you open the .d file in Windows in Notepad++, you get the same line count as the output of DumpPetra2xToOpenPetra.