Overview
Goal: creating sample data for the database.
The sample data should contain:
- many donors
- many recipients
- many donations
Data creation and import is a two-step process:
- The third-party data creation tool benerator creates the raw data (names, addresses, phone numbers, ...)
- The OpenPetra SampleDataConstructor tool, built specifically for this purpose, takes this data, assembles it and imports it into OpenPetra
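Related Bug Tracking Issues
- Create sample data (issue #29: https://sourceforge.net/apps/mantisbt/openpetraorg/view.php?id=29)
- Import this data (issue #220: https://sourceforge.net/apps/mantisbt/openpetraorg/view.php?id=220)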
Running Sample Data Constructor
Creating the raw sample data and assembling it into the database are two separate steps, each of which can be run with nant:
Creating Raw Sample Data
The following assumes that Databene Benerator is installed and working (see below for how to install it). If it is not installed, you can skip this step as long as you already have working sample data CSV files.
Creating Raw Sample Data with Benerator
To start the generation of data, run:
nant generateDemodata
This should create CSV files containing sample data in the directory demodata\generated: "People.csv", "Addresses.csv" and "Organisations.csv".
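Before moving on to the import, it can help to confirm that all three raw data files are in place. The following is a minimal Python sketch, not part of OpenPetra; it assumes it is run from the OpenPetra source root, and the helper name `missing_raw_data` is hypothetical.

```python
from pathlib import Path
from typing import List

# File names as listed on this page. Assumption: the script is run from the
# OpenPetra source root, so demodata/generated is a relative path.
EXPECTED_FILES = ("People.csv", "Addresses.csv", "Organisations.csv")


def missing_raw_data(generated_dir: Path) -> List[str]:
    """Return the expected raw sample data files that do not exist."""
    return [name for name in EXPECTED_FILES
            if not (generated_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_data(Path("demodata") / "generated")
    if missing:
        print("Missing raw sample data:", ", ".join(missing))
    else:
        print("All raw sample data files found.")
```

If any file is reported missing, rerun nant generateDemodata (or copy in existing CSV files) before running the Sample Data Constructor.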
Running Sample Data Constructor
The Sample Data Constructor assembles the raw data and saves it to the Petra Server. It requires the raw sample data files ("People.csv", "Addresses.csv" and "Organisations.csv") to be in the directory demodata\generated.
To run the Sample Data Constructor, use:
nant resetDatabase
nant importDemodata
Note: It is necessary to reset the database content with nant resetDatabase before running nant importDemodata; otherwise the RDBMS's unique key constraints will throw duplicate key exceptions when Units are created!
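If you script these two commands, keep them in this order and stop as soon as one fails. A minimal Python sketch, assuming `nant` is on your PATH; `run_targets` is a hypothetical helper, not part of the OpenPetra build.

```python
import subprocess

# nant targets from this page, in the required order: the database is reset
# first so that importDemodata does not run into duplicate key exceptions.
TARGETS = ("resetDatabase", "importDemodata")


def run_targets(targets=TARGETS, runner=subprocess.run):
    """Run each nant target in order and return the ones that completed.

    `runner` is injectable (an illustrative design choice) so the ordering
    can be tested without nant installed.
    """
    completed = []
    for target in targets:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later targets are skipped after a failure.
        runner(["nant", target], check=True)
        completed.append(target)
    return completed


if __name__ == "__main__":
    run_targets()
```

If Benerator is installed, "generateDemodata" could be prepended to the target list to regenerate the raw data in the same run.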
After having run the sample data constructor, you can access the created data from the OpenPetra Client.
Current State
- creating names and addresses and putting these into the OpenPetra database is completed
- creating donations is not done yet
Installing third party software
Currently the only external software used is Databene Benerator, which creates the raw data; it in turn requires the Java Runtime Environment.
Java
- Required: Java Runtime Environment only
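Abridged Instructions
- Download the current Java Runtime Environment (http://www.java.com/de/download/) and install it
- You might need to add the bin directory to your PATH:
PATH = ....; "C:\Program Files (x86)\Java\jre6\bin"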
- Give it a quick test by opening a new command prompt / terminal and typing:
java -version
Example output:
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
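To check the Java version from a script rather than by eye, the first line of this output can be parsed. A minimal Python sketch; the helper names are hypothetical, and the regular expression assumes the `<vendor> version "<x>"` first-line format shown above (OpenJDK builds print `openjdk version` instead of `java version`).

```python
import re
import subprocess
from typing import Optional

# Matches a first line such as: java version "1.6.0_25"
# (OpenJDK builds print "openjdk version ..." instead, hence \S+).
VERSION_LINE = re.compile(r'^\S+ version "([^"]+)"', re.MULTILINE)


def parse_java_version(output: str) -> Optional[str]:
    """Return the quoted version string, or None if none is found."""
    match = VERSION_LINE.search(output)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Run `java -version` and return the reported version, if any."""
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return None  # java is not on the PATH
    return parse_java_version(result.stderr)


if __name__ == "__main__":
    print(installed_java_version() or "Java not found")
```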
Databene Benerator
- Project Website: https://www.benerator.de/
- Note: The project is a Java project
- Used version: 0.7.1 (as of 2011-November)
- Other requirements: Java Runtime Environment
- Download the Benerator distribution from SourceForge
- Unzip it into your Program Files directory (e.g. C:\Program Files\databene-benerator-0.7.1)
- nant should pick up that version automatically; otherwise, add the following to your OpenPetra.build.config:
<property name="external.Benerator" value="${OP::GetFileInProgramDirectory('/databene-benerator-0.7.1/bin/benerator.bat')}"/>
See also
- 3rd-party Test Data Generators (list of possible third-party test data generators)