Sample data creation and import
Definition
Goal: creating sample data for the database.
The sample data should have
- many donors
- many recipients
- many donations
Current State
- names and addresses are created, and assembled
- putting these into the OpenPetra database is in progress (hopefully nearly finished)
- creating donations is not done yet
Steps:
- create raw data as CSV files (sufficiently complete!) -- done
- import and put together raw data in a temporary program (for creating relationships) -- done
- connect to the Petra server from the temporary program, assemble data as OpenPetra TDS, save -- in progress
Additionally:
- create nant job to import data
Overview
Data creation and import is done in a two-step process:
- The third-party data creation tool Databene Benerator creates raw data (names, addresses, phone numbers...)
- The OpenPetra SampleDataConstructor tool is built specifically to take this data, assemble it, and import it into OpenPetra
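The second step, putting the raw CSV data together into related records, can be illustrated with a small Python sketch. This is not the actual OpenPetra code; the column names, the sample rows, and the function names `read_rows` and `assemble_families` are assumptions made up for illustration.

```python
import csv
import io
import random


def read_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def assemble_families(people, addresses, seed=0):
    """Attach one randomly chosen address to each person record."""
    rng = random.Random(seed)  # fixed seed keeps the sample data reproducible
    families = []
    for person in people:
        address = rng.choice(addresses)
        families.append({**person, **address})
    return families


# Hypothetical raw data standing in for the generated CSV files.
PEOPLE_CSV = "FirstName,FamilyName\nAnna,Miller\nBen,Smith\n"
ADDRESSES_CSV = "Street,City,PostCode\n1 High Street,Exampletown,AB1 2CD\n"

families = assemble_families(read_rows(PEOPLE_CSV), read_rows(ADDRESSES_CSV))
for family in families:
    print(family)
```

The real tool would read the generated files from disk and save the assembled records through the server instead of printing them.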
Related Bugtracking-Issues
- create sample data (issue #29)
- import this data (issue #220)
Installing third party software
Currently the only external software used is Databene Benerator, which creates the raw data. It in turn requires the Java Runtime Environment.
Java
- Required: Java Runtime Environment only
Abridged Instructions
- Download and install the Java Runtime Environment
- You might need to add the bin directory to your PATH:
PATH = ....; "C:\Program Files (x86)\Java\jre6\bin"
- Give it a quick test by opening a new command prompt / terminal and typing:
java -version
Example output:
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
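If the check should be scripted rather than typed by hand, something along the following lines might work. It is only a sketch: the function name `java_version_banner` is made up here, and it relies on the fact that `java -version` writes its banner to stderr.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr.strip() or result.stdout.strip()


banner = java_version_banner()
print(banner if banner else "java not found on PATH")
```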
Databene Benerator
- Project Website: http://databene.org/databene-benerator.html
- Note: The project is a Java project
- Used version: 0.6.6 (as of July 2011; it may be advisable to use the most recent version, since we only use simple features anyway)
- Other requirements: Java Runtime Environment
These instructions are largely copied from the Benerator website and slightly abridged. The original instructions are more detailed and cover installing Benerator on Windows, Mac, and Linux systems.
Abridged Instructions
- Download the Benerator distribution from SourceForge
- Unzip into a directory of your choice (e.g. C:\Program Files\Development\databene-benerator-0.6.6)
- Create an environment variable BENERATOR_HOME that points to the directory where you extracted Benerator (example located below).
- Append BENERATOR_HOME/bin to your environment variable PATH (examples located below)
- Give benerator a quick test by just opening a new command prompt / terminal and typing
benerator -v
Example Output:
Local classpath: .; ...
Benerator 0.6.6 build 1255
Java version 1.6.0_25
JVM Java HotSpot(TM) Client VM 20.0-b11 (Sun Microsystems Inc.)
OS Windows 2003 5.2 (x86)
Installed JSR 223 Script Engines:
- Mozilla Rhino[js, rhino, JavaScript, javascript, ECMAScript, ecmascript]
Examples for Environment Variables
BENERATOR_HOME = C:\Program Files\Development\databene-benerator-0.6.6
PATH = C:\WINDOWS; ....; %BENERATOR_HOME%\bin
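A quick way to verify these two variables is a small script like the one below. It is a sketch only; the function name `benerator_env_problems` is hypothetical, and the check assumes that PATH holds the already expanded directory rather than the literal `%BENERATOR_HOME%` text.

```python
import os


def benerator_env_problems(env):
    """Return a list of problems with the Benerator-related variables in env."""
    problems = []
    home = env.get("BENERATOR_HOME")
    if not home:
        problems.append("BENERATOR_HOME is not set")
        return problems
    # Normalise both sides so that case and separator differences do not matter.
    expected_bin = os.path.normcase(os.path.normpath(os.path.join(home, "bin")))
    path_entries = [
        os.path.normcase(os.path.normpath(entry.strip('"')))
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry
    ]
    if expected_bin not in path_entries:
        problems.append("PATH does not contain " + expected_bin)
    return problems


print(benerator_env_problems(dict(os.environ)) or "environment looks fine")
```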
Test data generators / Sample data (SPLIT this chapter out)
There are a number of good test data generators out there, so building our own would not have been worthwhile when the aim was just to get something working quickly. I looked at a number of them, with an emphasis on tools that are recommended by other people and are open source.
Decision was: look at benerator and generatedata.com.
I looked at benerator and decided to stick with that for now, if it works out.
Idea to be checked: use data from generatedata / geo-database / briandunning together with benerator to compile data into a common format, which is then imported as shown below.
generatedata.com
- creates name, address, email ... looks very nice!
- creates data for Australia, Belgium, Canada, Netherlands, United States, United Kingdom
- but: e.g. the UK postal codes don't seem to be real UK codes. So not only are the codes themselves incorrect, the combination of code and address is wrong as well. Perhaps this is different for the US. In any case, this would not be a show-stopper, as they look close enough and we just want lots of data anyway.
But:
- Have not looked at the code yet.
- Have not thought about how we can integrate this with OpenPetra
benerator
- Has generators for all sorts of information, and can create xml files
- is not actually plain GPL: it has a "GPL v2 with exceptions" license, and what the exceptions mean is unclear. Should chat with the author.
Lists of test data generators
- http://www.webresourcesdepot.com/test-sample-data-generators/
- http://databene.org/databene-benerator/similar-products.html
The majority of the software listed below was taken from the former page. Criteria for judging: actively maintained + fits the job + documentation (less important)
| Interest? | Program | Creates | Area | Output | App type | License |
|---|---|---|---|---|---|---|
| * | benerator | creates data / transforms given data to test data | | various databases, xml, csv, excel | Framework | GPL / commercial (WARNING! GPL "with exceptions") |
| * | generatedata.com | Addresses / Cities / Countries | Netherlands, Canada, UK, US | XML, Excel, HTML, CSV, SQL | Webapp (JS,PHP,MySQL) | GPL v2 |
| * | Geographical Places Database | geographical locations (schools, universities, whitehouse, eiffel tower...) | | tab delimited | website, download, libraries (various languages), webservice | creative commons attribution |
| | http://www.briandunning.com/sample-data/ | Website with real address and company data (US and Canada) but with fake names. This could be useful for testing map services as well, since there are real geographic locations. | US, Canada | | | free |
| (still want to briefly check) | DBMonster | generates test data | | SQL | Command-Line (Java) | Apache License |
| | CSV Data generator | | | CSV? | (Ruby) | |
| | Datagenerator | | | | library / GUI | GPL |
| | dqMaster | | | text,xml,db | GUI (extensible) | |
| | Spawner Data Generator | random proper names, terms and connectors | | delimited text / SQL | apptype | license |
| | Test Dictionary | | | | java interface | |
| data at most | Fresh Trash Generator | Random Website, Email, Family and First Names, Phone Number, Company, Birthday (at least some of the resource data might be interesting) | Greek Names and Companies, German Streets | | java utility package | |
| nn | google api toolkit | nn | | | Web API | |
| - | Data Science Toolkit | convert address to coordinates and vice versa, ip to coordinates etc | | | Web API / VM | |
| - | fakenamegenerator.com | Names, addresses from many countries | | | Website / Web API | proprietary for API (free of charge, but attribution) |
| - | .net Fabricator | (no addresses, so not suitable, but seems a nice framework) | | | Framework using .net | MIT |
| - (com) | GEDIS Studio for Test Data | "Realistic Test Data" (not viewed) | | CSV, XML, SQL, or HTML | Windows / Scripting | community edition free of charge / commercial |
| - (com) | Excel random data generator | Generates sample data, somewhat acclaimed | | | MS Excel Plugin | commercial |
| - (com) | SQL Data Generator | Generates complex sample data (addresses, companies, interaction), a business person liked it on stackoverflow. Would probably be the right thing except it is SQL Server and commercial. | | | Application for MS SQL Server | commercial |
| - (com) | Microsoft Visual Studio Database Edition | Generates sample data, and several people pointed to it on stackoverflow. | | | Part of Visual Studio | commercial |
| - (com) | Advanced Data Generator | | | | Windows Application | commercial |
| - (com) | SQL Manager | | | | Windows Application | commercial |
List of others (not checked): date of last change + project (checked April 2011)
- 2011-02 sf dagen
- 2007-05 sf pharaon
- 2011-03 sf encapet
- 2010-08 sf adag
- 2009-11 sf jrando
- 2010-05 sf bbf-data-genera
Coding
Some coding has been done already: see csharp\ICT\PetraTools\GenerateSampleData for transforming sample data into family records etc.
Also see the partner import module, which processes CSV and YAML files: csharp\ICT\Petra\Client\lib\MPartner\gui\PartnerImport.ManualCode.cs