Sample data create and import: Difference between revisions

From OpenPetra Wiki
(Extracted list of test data generators into a separate page)
</code>
</code>


== Test data generators / Sample data (SPLIT this chapter out) ==


There are a number of good test data generators out there; building our own would have been less useful than quickly finding an existing tool that works. I looked at a number of them, with an emphasis on tools that are ''recommended by other people'' and ''open source''.


The decision was to look at '''benerator''' and '''generatedata.com'''.
 
''I looked at benerator and decided to stick with it for now, as long as it works out.''
 
Idea to be checked: use data from generatedata, the geographical database and briandunning together with benerator to compile the data into a common format, which is then imported as shown below.
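A minimal sketch of the "common format" idea, in Python. The column names in COMMON_FIELDS and the per-source column mappings are hypothetical (they are not taken from generatedata, briandunning or the OpenPetra import); the point is only that each source gets its own mapping onto one shared CSV layout.

```python
import csv
import io

# Hypothetical common format: these column names are an assumption for this
# sketch, not the format the OpenPetra import actually expects.
COMMON_FIELDS = ["first_name", "last_name", "street", "city", "postcode", "country"]

# Hypothetical column mappings per source (source column -> common column).
SOURCE_MAPPINGS = {
    "generatedata": {"First": "first_name", "Last": "last_name", "Street": "street",
                     "City": "city", "Zip": "postcode", "Country": "country"},
    "briandunning": {"first_name": "first_name", "last_name": "last_name",
                     "address": "street", "city": "city", "zip": "postcode"},
}


def normalise(source, rows, default_country=""):
    """Yield rows (dicts) from one source as dicts keyed by COMMON_FIELDS."""
    mapping = SOURCE_MAPPINGS[source]
    for row in rows:
        record = dict.fromkeys(COMMON_FIELDS, "")
        for src_col, common_col in mapping.items():
            if row.get(src_col):
                record[common_col] = row[src_col].strip()
        # Sources without a country column get the default, e.g. "US".
        if not record["country"]:
            record["country"] = default_country
        yield record


def write_common_csv(records, out):
    """Write normalised records as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=COMMON_FIELDS)
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    raw = io.StringIO("first_name,last_name,address,city,zip\n"
                      "Jane,Doe,1 Main St,Springfield,12345\n")
    out = io.StringIO()
    write_common_csv(normalise("briandunning", csv.DictReader(raw), "US"), out)
    print(out.getvalue())
```

Adding another source then only means adding another entry to the mapping table.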
 
=== generatedata.com ===
 
* creates names, addresses, email addresses ... and looks very nice!
* creates data for Australia, Belgium, Canada, Netherlands, United States, United Kingdom
* but: the UK postal codes, for example, do not seem to be real UK codes. Not only are the codes themselves wrong, the combinations of postal code and address are wrong too. This may be different for the US. In any case, this would not be a show-stopper: the codes look close enough, and we mainly want lots of data anyway.
 
But:
* Have not looked at the code yet.
* Have not thought about how we can integrate this with OpenPetra
 
=== benerator ===
 
* has generators for all sorts of information, and can create XML files
* is not plain GPL: it uses "GPL v2 ''with exceptions''". We should ask the author what the exceptions mean.
 
=== Lists of test data generators ===
 
* http://www.webresourcesdepot.com/test-sample-data-generators/ 
* http://databene.org/databene-benerator/similar-products.html
 
Most of the software listed below was taken from the former page. Criteria for judging: actively maintained, fits the job, documentation (less important).
 
{| border="1" cellspacing="0"
! interest?
! Program
! creates
! area
! Output
! App-Type
! License
|-
| *
| [http://databene.org/databene-benerator.html benerator]
| creates data / transforms given data to test data
|
| various databases, xml, csv, excel
| Framework
| GPL / commercial (WARNING! GPL ''"with exceptions"'')
|-
| *
| [http://www.generatedata.com/ generatedata.com]
| Addresses / Cities / Countries
| Netherlands, Canada, UK, US
| XML, Excel, HTML, CSV, SQL
| Webapp (JS,PHP,MySQL)
| GPL v2
|-
| *
| [http://www.webresourcesdepot.com/free-geographical-database-of-all-countries-over-8-million-places-geonames/ Geographical Places Database]
| geographical locations (schools, universities, whitehouse, eiffel tower...)
|
| tab delimited
| website, download, libraries (various languages), webservice
| creative commons attribution
|-
|
| http://www.briandunning.com/sample-data/
| Website with real address and company data (US and Canada) but with fake names. This could be useful with testing map services as well since there are real geographic locations.
| US, Canada
|
|
| free
|-
| (still want to briefly check)
| [https://sourceforge.net/projects/dbmonster/ DBMonster ]
| generates test data
|
| SQL
| Command-Line (Java)
| Apache License
|-
|
| [http://rubyforge.org/projects/datagen CSV Data generator]
|
|
| CSV?
| (Ruby)
|
|-
|
| [http://sourceforge.net/projects/datagenerator/ Datagenerator]
|
|
|
| library / GUI
| GPL
|-
|
| [http://sourceforge.net/projects/dgmaster/ dgMaster]
|
|
| text,xml,db
| GUI (extensible)
|
|-
|
| [http://sourceforge.net/projects/spawner/ Spawner Data Generator]
| random proper names, terms and connectors
|
| delimited text / SQL
| apptype
| license
|-
|
| [http://sourceforge.net/projects/test-dictionary/ Test Dictionary]
|
|
|
| java interface
|
|-
| data at most
| [http://sourceforge.net/projects/freshtrash/ Fresh Trash Generator]
| Random Website, Email, Family and First Names, Phone Number, Company, Birthday (at least some of the resource data might be interesting)
| Greek Names and Companies, German Streets
|
| java utility package
|
|-
| nn
| [http://code.google.com/apis/ajax/playground/ google api toolkit ]
| nn
|
|
| Web API
|
|-
| -
| [http://www.webresourcesdepot.com/data-science-toolkit-a-set-of-useful-datasets-with-a-unified-api/ Data Science Toolkit]
| converts addresses to coordinates and vice versa, IP addresses to coordinates, etc.
|
|
| Web API / VM
|
|-
| -
| [http://de.fakenamegenerator.com/ fakenamegenerator.com]
| Names, addresses from many countries
|
|
| Website / Web API
| proprietary for API (free of charge, but attribution required)
|-
| -
| [http://fabricator.codeplex.com/ .net Fabricator]
| (no addresses, so not suitable, but seems to be a nice framework)
|
|
| Framework using .NET
| MIT
|-
| - (com)
| [http://www.gedis-studio.com/ GEDIS Studio for Test Data]
| "Realistic Test Data" (not viewed)
|
| CSV, XML, SQL, or HTML
| Windows / Scripting
| community edition free of charge / commercial
|-
| - (com)
| [http://www.codeforexcelandoutlook.com/blog/2009/02/random-data-generator-add-in-for-excel/ Excel random data generator]
| Generates sample data, somewhat acclaimed [http://www.codeforexcelandoutlook.com/blog/2009/02/random-data-generator-add-in-for-excel/ here]
|
|
| MS Excel Plugin
| commercial
|-
| - (com)
| [http://www.red-gate.com/products/sql-development/sql-data-generator/ SQL Data Generator]
| Generates complex sample data (addresses, companies, interactions); a business user recommended it on Stack Overflow. It would probably be the right tool, except that it is for SQL Server and commercial.
|
|
| Application for MS SQL Server
| commercial
|-
| - (com)
| [http://msdn.microsoft.com/en-us/library/dd193262%28v=vs.90%29.aspx Microsoft Visual Studio Database Edition]
| Generates sample data; several people pointed to it on Stack Overflow.
|
|
| Part of Visual Studio
| commercial
|-
| - (com)
| [http://www.upscene.com/products.adg.moreinfo.php Advanced Data Generator]
|
|
|
| Windows Application
| commercial
|-
| - (com)
| [http://www.sqlmanager.net/en/products/datagenerator SQL Manager]
|
|
|
| Windows Application
| commercial
|-
|}
 
 
List of others (not checked), with date of last change and project (checked April 2011):
 
* 2011-02 [https://sourceforge.net/projects/dagen/ sf dagen]
* 2007-05 [https://sourceforge.net/projects/pharaon/ sf pharaon]
* 2011-03 [https://sourceforge.net/projects/encapet/ sf encapet]
* 2010-08 [https://sourceforge.net/projects/adag/ sf adag]
* 2009-11 [https://sourceforge.net/projects/jrando/ sf jrando]
* 2010-05 [https://sourceforge.net/projects/bbf-data-genera/ sf bbf-data-genera]
 
=== Coding ===
Some coding has been done already:
See csharp\ICT\PetraTools\GenerateSampleData for transforming sample data into family records etc.
 
Also see the partner import module, which processes CSV and YAML files:
csharp\ICT\Petra\Client\lib\MPartner\gui\PartnerImport.ManualCode.cs

Revision as of 17:18, 2 August 2011

Goal: creating sample data for the database.

The sample data should have

  • many donors
  • many recipients
  • many donations

Current State

  • names and addresses are created and assembled
  • putting these into the OpenPetra database is in progress (hopefully nearly finished)
  • creating donations is not done yet
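Since creating donations is still open, here is a minimal sketch of how random donations could link donors to recipients. The Donation record, the partner keys and the amount range are hypothetical and do not reflect OpenPetra's gift tables; a fixed seed keeps the generated sample data reproducible between runs.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Donation:
    # Hypothetical record layout; OpenPetra's real gift tables differ.
    donor_key: int
    recipient_key: int
    amount_cents: int  # integer cents avoid floating-point rounding
    gift_date: date


def create_donations(donor_keys, recipient_keys, count, seed=0,
                     start=date(2011, 1, 1), days=365):
    """Create `count` random donations from donors to recipients.

    A fixed seed makes the generated data identical between runs.
    """
    rng = random.Random(seed)
    return [
        Donation(
            donor_key=rng.choice(donor_keys),
            recipient_key=rng.choice(recipient_keys),
            amount_cents=rng.randint(500, 50_000),  # 5.00 to 500.00
            gift_date=start + timedelta(days=rng.randrange(days)),
        )
        for _ in range(count)
    ]
```

Using a private random.Random instance, rather than the module-level functions, keeps the donation data reproducible even if other code also draws random numbers.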

Steps:

  1. create raw data as CSV files (sufficiently complete!) -- done
  2. import the raw data into a temporary program and put it together (for creating relationships) -- done
  3. connect to the Petra server from the temporary program, assemble the data as an OpenPetra TDS, and save it -- in progress
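Step 2 can be sketched as follows, assuming two hypothetical CSV files from benerator: one with first_name,last_name columns and one with street,city,postcode columns. The Family record and the starting partner key are illustrative assumptions, not the actual GenerateSampleData code or the OpenPetra TDS.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Family:
    # Hypothetical, simplified family record; not the OpenPetra TDS layout.
    partner_key: int
    family_name: str
    first_name: str
    street: str
    city: str
    postcode: str


def assemble_families(names_csv, addresses_csv, first_key=1000):
    """Pair each generated name with a generated address.

    zip() stops at the shorter file, so surplus names or addresses are ignored.
    """
    names = csv.DictReader(names_csv)
    addresses = csv.DictReader(addresses_csv)
    return [
        Family(first_key + i, name["last_name"], name["first_name"],
               addr["street"], addr["city"], addr["postcode"])
        for i, (name, addr) in enumerate(zip(names, addresses))
    ]


if __name__ == "__main__":
    names = io.StringIO("first_name,last_name\nJane,Doe\nJohn,Roe\n")
    addresses = io.StringIO("street,city,postcode\n1 Main St,Springfield,12345\n")
    for family in assemble_families(names, addresses):
        print(family)
```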

Additionally:

  1. create a NAnt job to import the data

Overview

Data creation and import is done in a two step process:

  1. The third-party data creation tool Databene Benerator creates raw data (names, addresses, phone numbers...)
  2. The OpenPetra SampleDataConstructor tool, built specifically for this purpose, takes this data, assembles it, and imports it into OpenPetra

Diagram: data flow of the generated sample data from Databene Benerator into the SampleDataConstructor and then into the OpenPetra server

Related Bugtracking-Issues

Installing third party software

Currently the only external software used is Databene Benerator, for creating the raw data; it in turn requires the Java Runtime Environment.

Java

  • Required: Java Runtime Environment only

Abridged Instructions

  • Append the Java bin directory to your PATH environment variable, for example:

PATH = ....; "C:\Program Files (x86)\Java\jre6\bin"

  • Give it a quick test by opening a new command prompt / terminal and typing:

java -version

Example output:

java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
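If the Java check should be scripted (for example as part of an import job), the version can be parsed from output like the example above. This is a minimal sketch; the function names are hypothetical. Note that java -version writes its banner to stderr, not stdout.

```python
import re
import subprocess

# Matches e.g. 'java version "1.6.0_25"', 'openjdk version "17.0.2"' or '"21"'.
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:_(\d+))?')


def parse_java_version(output):
    """Return (major, minor, micro, update) from `java -version` output, or None."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    # Missing parts (e.g. no "_25" update number) become 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def installed_java_version():
    """Run `java -version`; raises FileNotFoundError if java is not on PATH."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The version banner is printed on stderr.
    return parse_java_version(result.stderr)
```

For the example output above, parse_java_version returns (1, 6, 0, 25).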

Databene Benerator

  • Project Website: http://databene.org/databene-benerator.html
  • Note: benerator is a Java project
  • Used version: 0.6.6 (as of July 2011; it may be advisable to use the most recent version, since we only use simple features anyway)
  • Other requirements: Java Runtime Environment

These instructions are largely copied from the benerator website and slightly abridged. The original instructions are more detailed and cover installing benerator on Windows, Mac and Linux systems.

Abridged Instructions

  • Download the Benerator distribution from SourceForge
  • Unzip into a directory of your choice (e.g. C:\Program Files\Development\databene-benerator-0.6.6)
  • Create an environment variable BENERATOR_HOME that points to the directory where you extracted benerator (see the example below).
  • Append BENERATOR_HOME/bin to your PATH environment variable (see the examples below)
  • Give benerator a quick test by just opening a new command prompt / terminal and typing

benerator -v

Example Output:

Local classpath: .; ...
Benerator 0.6.6 build 1255
Java version 1.6.0_25
JVM Java HotSpot(TM) Client VM 20.0-b11 (Sun Microsystems Inc.)
OS Windows 2003 5.2 (x86)
Installed JSR 223 Script Engines:
- Mozilla Rhino[js, rhino, JavaScript, javascript, ECMAScript, ecmascript]

Examples for Environment Variables

BENERATOR_HOME = C:\Program Files\Development\databene-benerator-0.6.6

PATH = C:\WINDOWS; ....; %BENERATOR_HOME%\bin
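These settings can also be checked from a script. The helper below is a hypothetical sketch, not part of benerator. It assumes the PATH seen by the process already has %BENERATOR_HOME% expanded; an unexpanded entry would be reported as missing.

```python
import os


def check_benerator_env(env=None):
    """Return a list of problems with BENERATOR_HOME / PATH (empty if OK)."""
    env = os.environ if env is None else env
    home = env.get("BENERATOR_HOME")
    if not home:
        return ["BENERATOR_HOME is not set"]

    def norm(path):
        # Strip quotes as in "C:\Program Files ..." and compare
        # case-insensitively on Windows (normcase is a no-op elsewhere).
        return os.path.normcase(os.path.normpath(path.strip('"')))

    bin_dir = norm(os.path.join(home, "bin"))
    entries = [norm(p) for p in env.get("PATH", "").split(os.pathsep) if p.strip()]
    if bin_dir not in entries:
        return [f"{bin_dir} is not on PATH"]
    return []


if __name__ == "__main__":
    for problem in check_benerator_env() or ["benerator environment looks OK"]:
        print(problem)
```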


See also

  • List of possible 3rd party [Test-Data-Generators]