Working with multiple DB Connections: Difference between revisions
No edit summary |
|||
Line 62: | Line 62: | ||
===Changing (Library) Methods so That They Can Work With the Default DB Connection or a Separate DB Connection=== | ===Changing (Library) Methods so That They Can Work With the Default DB Connection or a Separate DB Connection=== | ||
====When Such a Method Has Got a <code>TDBTransaction</code> instance available==== | ====When Such a Method Has Got a <code>TDBTransaction</code> instance available==== | ||
''' | Replace any occurrence of <code>DBAccess.GDBAccessObj</code> with a call to <code>DBAccess.GetDBAccessObj(ATransaction)</code>, passing the reference of the <code>TDBTransaction</code> instance that the Method has got available. | ||
The overloaded <code>DBAccess.GetDBAccessObj(...)</code> Method acts as a 'set of points' (using figurative railway language here) and will yield the 'globally available' <code>DBAccess.GDBAccessObj</code> in case a null value for the <code>ATransaction</code> gets passed, otherwise it will yield the <code>TDataBase</code> instance which started the DB Transaction that got passed in in its <code>TDBTransaction ATransaction</code> Argument. | |||
====When Such a Method Has NOT Got a <code>TDBTransaction</code> instance available==== | ====When Such a Method Has NOT Got a <code>TDBTransaction</code> instance available==== | ||
''' | * Either... | ||
** add a new optional Argument that allow the passing-in of a <code>TDataBase</code> instance; | |||
*** you are encouraged to use '<code>TDataBase ADataBase = null</code>' at the very end of the Methods' argument list for the sake of consistency; | |||
** or add a new optional Argument that allow the passing-in of a <code>TDBTransaction</code> instance; | |||
*** you are encouraged to use '<code>TDBTransaction ATransaction = null</code>' at the very end of the Methods' argument list for the sake of consistency. | |||
* Then replace any occurrence of <code>DBAccess.GDBAccessObj</code> with a call to either <code>DBAccess.GetDBAccessObj(ADataBase)</code> or <code>DBAccess.GetDBAccessObj(ATransaction)</code>. | |||
The overloaded <code>DBAccess.GetDBAccessObj(...)</code> Method acts as a 'set of points' (using figurative railway language here) and will yield the 'globally available' <code>DBAccess.GDBAccessObj</code> in case a null value for the <code>ADataBase</code> or <code>ATransaction</code> gets passed, otherwise it will either yield the <code>TDataBase</code> instance passed in in its <code>TDataBase ADataBase</code> Argument, or the <code>TDataBase</code> instance which started the DB Transaction that got passed in in its <code>TDBTransaction ATransaction</code> Argument (depending on which overload of the Method is used). | |||
===Standardised XML Comments=== | |||
For the sake of consistency you are encouraged to use the following XML Comments when adding an Argument as described above to existing Methods: | |||
/// <param name="ADataBase">An instantiated <see cref="TDataBase" /> object, or null (default = null). If null | |||
/// gets passed then the Method executes DB commands with the 'globally available' | |||
/// <see cref="DBAccess.GDBAccessObj" /> instance, otherwise with the instance that gets passed in with this | |||
/// Argument!</param> | |||
and | |||
/// <param name="ATransaction">An instantiated <see cref="TDBTransaction" /> object, or null (default = null). | |||
/// If null gets passed then the Method executes DB commands with the 'globally available' | |||
/// <see cref="DBAccess.GDBAccessObj" /> instance, otherwise with the the <see cref="TDataBase" /> instance | |||
/// which started the DB Transaction that gets passed in with this Argument!</param> | |||
=== TODO === | === TODO === |
Revision as of 09:34, 21 September 2015
THIS PAGE IS WORK IN PROGRESS AND DESCRIBES FEATURES THAT ARE NOT YET IN TRUNK !!!
What is described here is subject to change without notice until this is in trunk !!!
Working with multiple DB Connections: Overview
The Limitations That it Overcomes
- Multiple Threads can run arbitrary DB commands against a single DB Connection in a safe way due to the introduction of the Co-ordinated DB Access (Thread-safe DB Access) feature, but there are limitations to this:
- Each Thread must start a DB Transaction with the GetNewOrExistingTransaction or GetNewOrExistingAutoTransaction Methods as it usually not known which of the Threads will start the DB Transaction first;
- Two Threads cannot take out/piggy-back on a DB Transaction when the IsolationLevels that are required don't match (as the Threads are essentially sharing the same DB Transaction in a thread-safe way);
- True 'parallel' DB access is not achievable (as the Threads are sharing the same DB Transaction in a safe way by running DB commands one-after-the-other instead of truly parallel);
- Handling the outcome of a Thread 'losing out' on obtaining Co-ordinated DB Access when a time-out occurs is either done automatically, which results in 'stock messages' being shown to the user (asking the user to retry the action that [s]he wanted to take a bit later), or - to avoid this - needs to be done intentionally by the software engineers. As the timing in which such situations happen cannot easily be foreseen and not easily be 'forced to happen' during development this can be a cumbersome aspect of the Co-ordinated DB Access approach;
- Historically, only a single DB Connection per connected Client was possible. (This 'Default DB Connection' gets established automatically when a Client connects and gets closed automatically when a Client disconnects or crashes.)
- While an OpenPetra software engineer could make an attempt to open a separate DB Connection, start a DB Transaction and try to run DB commands against it this failed with a whole variety of Exceptions due to several bugs in the 'OpenPetra DB Access and Abstraction Layer' (in the
TDataBase
Class and related Classes).
- While an OpenPetra software engineer could make an attempt to open a separate DB Connection, start a DB Transaction and try to run DB commands against it this failed with a whole variety of Exceptions due to several bugs in the 'OpenPetra DB Access and Abstraction Layer' (in the
From trunk commit Rev. XXXX (September xx, 2015) onwards it has become possible for OpenPetra software engineers to...
- reliably open (a) DB Connection(s) that are truly separate from the 'Default DB Connection' which exists for each Client connection (that is even possible in parallel in multiple Threads!);
- reliably start a new DB Transaction with the BeginTransaction / BeginAutoTransaction commands (or GetNewOrExistingTransaction/GetNewOrExistingAutoTransaction commands) on such a separate DB Connection, with any IsolationLevel;
- reliably run an arbitrary number of arbitrary SQL Commands that are enlisted in such a DB Transaction;
- Commit or Rollback such a DB Transaction completely independent from DB Transactions on other DB Connections (also from the 'Default DB Connection');
- Close such a separate DB Connection completely independent from other DB Connections (also from the 'Default DB Connection').
True parallel DB Access from multiple Threads where it is desired/needed has become possible now!
The Solution
The primary solution was to find bugs in the TDataBase
Class (OpenPetra's Database Access and Abstraction Layer) and related Classes that prevented reliable independent DB Connections, DB Transactions and DB Commands, and to fix them.
This was done by extending our existing NUnit Tests for the TDataBase
Class with multi-threaded NUnit Tests (found in the csharp/ICT/Testing/lib/Common/DB/tests.Multithreading.cs class file) and through those new NUnit Tests probing what ought to work, finding out through those what didn't work, and then addressing the bugs that prevented what ought to have worked. The feature Bug for this solution is Bug [https://tracker.openpetra.org/view.php?id=4364%7C 4364).
Relation to Co-ordinated DB Access (Thread-safe DB Access)
The ability to work with multiple parallel DB Connections does not away with the need for the Co-ordinated DB Access (Thread-safe DB Access) approach. The latter is needed everywhere where we don't deliberately start a new Thread for parallel DB Access (that is, most cases in OpenPetra)! Rather, the ability to work with multiple parallel DB Connections opens up the way of true parallel DB command execution, and not just pseudoparallel execution, and for independent DB Transactions (both independent in their IsolationLevels and the timings of starting, committing and rolling back).
Details of the Implementation
Requirements For Safe Multi-threaded DB Access
Introduction
In every 'deliberate multi-threading scenario' (that is, a scenario that we create on purpose and which we control) in which some form of DB Access is required, the creation of a private, independent TDataBase
instance and working solely with that instance (and with no other TDataBase
instance!) throughout the Threads' lifetime is required.
The reason for this is that ADO.NET providers (and specifically the PostgreSQL ADO.NET Provider 'npgsql') are not thread-safe.
To be able to find violations of those rules early, our TDataBase
Class blocks attempts to take out / work with parallel DB Transactions and raises specific Typed Exceptions. Whenever it raises such an Exception various information is logged in the server log file, too. (See Verbose Logging for influencing the verbosity of the logging information.) Would such Exceptions not be raised, a whole variety of Exceptions from various sources would be thrown when parallel DB access through the same TDataBase
instance would be attempted (see Bug 4364).
Rules That Need to be Adhered to
- Server-side code that runs in a Thread that got started in a 'deliberate multi-threading scenario' (that is, a scenario that we create on purpose and which we control) must perform any DB access only through an instance of
TDataBase
that got created in that Thread (see next paragraph)! - Server-side code that doesn't run in a deliberately started Thread:
- If the server-side code got invoked from the client side it DOES get run in a separate Thread which .NET Remoting creates for us 'under the hood'! When that server-side code access the DB it usually uses the 'globally available'
DBAccess.GDBAccessObj
TDataBase
instance.- Co-ordinated DB Access (Thread-safe DB Access) will be at work in this case and will provide Thread-safe DB access to that
TDataBase
instance.
- Co-ordinated DB Access (Thread-safe DB Access) will be at work in this case and will provide Thread-safe DB access to that
- If the server-side code got invoked from the client side it DOES get run in a separate Thread which .NET Remoting creates for us 'under the hood'! When that server-side code access the DB it usually uses the 'globally available'
Server-side code that runs in a deliberately started Thread
The creation of a private, independent TDataBase
instance and solely working with that from within that Thread is required.
In any deliberately started Thread, ...
- Create a private, independent
TDataBase
instance and store a reference to it in a Variable. If you store a reference to it in a Field you must make sure it isn't accessed from anywhere outside of that Thread, or if that should be needed, it must be accessed in completely thread-safe manner. - Open a private, independent DB Connection for this Thread by calling the
EstablishDBConnection
Method on that Threads'TDataBase
instance.- Supply a special name for the DB Connection with the optional '
AConnectionName
' Argument of theEstablishDBConnection
Method as this can be a big help in debugging (see Verbose Logging)! - Example: \csharp\ICT\Petra\Server\lib\MCommon\Main.cs, Method 'EstablishDBConnection'.
- Supply a special name for the DB Connection with the optional '
- Start a private, independent DB Transaction on that Threads'
TDataBase
instance. You can use any of our usual methods for that and you can choose any IsolationLevel!- Supply a special name for the DB Transaction with the optional '
ATransactionName
' Argument of the Method you are using for this as this can be a big help in debugging (see Verbose Logging)!
- Supply a special name for the DB Transaction with the optional '
- Run arbitrary DB Commands on the private, independent DB Transaction on that Threads'
TDataBase
instance. - Commit or Rollback the private, independent DB Transaction on that Threads'
TDataBase
instance. - Close the private, independent DB Connection on that Threads'
TDataBase
instance.
That way what that specific Tread that needs to execute commands against the DB is completely independent of what any other running Thread(s) might be doing in any (of their) DB Transaction(s) (with whatever IsolationLevel) against the same DB (using the commong DBAccess.GDBAccessObj
or its/their own separate TDatabase
instance) at roughly the same time - except from any DB Table Locks, of course!
Changing (Library) Methods so That They Can Work With the Default DB Connection or a Separate DB Connection
When Such a Method Has Got a TDBTransaction
instance available
Replace any occurrence of DBAccess.GDBAccessObj
with a call to DBAccess.GetDBAccessObj(ATransaction)
, passing the reference of the TDBTransaction
instance that the Method has got available.
The overloaded DBAccess.GetDBAccessObj(...)
Method acts as a 'set of points' (using figurative railway language here) and will yield the 'globally available' DBAccess.GDBAccessObj
in case a null value for the ATransaction
gets passed, otherwise it will yield the TDataBase
instance which started the DB Transaction that got passed in in its TDBTransaction ATransaction
Argument.
When Such a Method Has NOT Got a TDBTransaction
instance available
- Either...
- add a new optional Argument that allow the passing-in of a
TDataBase
instance;- you are encouraged to use '
TDataBase ADataBase = null
' at the very end of the Methods' argument list for the sake of consistency;
- you are encouraged to use '
- or add a new optional Argument that allow the passing-in of a
TDBTransaction
instance;- you are encouraged to use '
TDBTransaction ATransaction = null
' at the very end of the Methods' argument list for the sake of consistency.
- you are encouraged to use '
- add a new optional Argument that allow the passing-in of a
- Then replace any occurrence of
DBAccess.GDBAccessObj
with a call to eitherDBAccess.GetDBAccessObj(ADataBase)
orDBAccess.GetDBAccessObj(ATransaction)
.
The overloaded DBAccess.GetDBAccessObj(...)
Method acts as a 'set of points' (using figurative railway language here) and will yield the 'globally available' DBAccess.GDBAccessObj
in case a null value for the ADataBase
or ATransaction
gets passed, otherwise it will either yield the TDataBase
instance passed in in its TDataBase ADataBase
Argument, or the TDataBase
instance which started the DB Transaction that got passed in in its TDBTransaction ATransaction
Argument (depending on which overload of the Method is used).
Standardised XML Comments
For the sake of consistency you are encouraged to use the following XML Comments when adding an Argument as described above to existing Methods:
/// <param name="ADataBase">An instantiated <see cref="TDataBase" /> object, or null (default = null). If null /// gets passed then the Method executes DB commands with the 'globally available' /// <see cref="DBAccess.GDBAccessObj" /> instance, otherwise with the instance that gets passed in with this /// Argument!</param>
and
/// <param name="ATransaction">An instantiated <see cref="TDBTransaction" /> object, or null (default = null). /// If null gets passed then the Method executes DB commands with the 'globally available' /// <see cref="DBAccess.GDBAccessObj" /> instance, otherwise with the the <see cref="TDataBase" /> instance /// which started the DB Transaction that gets passed in with this Argument!</param>
TODO
TODO
Writing new (Library) Methods That Can Work With the Default DB Connection or a Separate DB Connection
Refer to the section Changing (Library) Methods so That They Can Work With the Default DB Connection or a Separate DB Connection on how to achieve this and use the overloaded DBAccess.GetDBAccessObj(...)
Method to work with either the Default DB Connection or a separate DB Connection right from the start.
Extra Things that You Need to Know
Automatic (and Fully Transparent): Multi-threading and multi-connection 'Sanity Checks' in the TDataBase
Class
TODO
Locking of DB Tables
TODO
Debugging
Verbose Logging
An option for very verbose logging got introduced that is helpful especially for pinning-down problems in multi-threading and multi-connection situations. To switch it on, turn the logging level ('DebugLevel') of the Server to 11 in the servers' config file.
Extra information you will get in the log files when it is turned on:
- Unique DB Connection Identifiers (GUIDs) for each DB Connection;
- Unique DB Transaction Identifiers (GUIDs) for each DB Transaction and the DB Connection Identifier of the DB Connection which the DB Transaction got started with;
- For
BeginTransaction
/BeginAutoTransaction
:- IsolationLevel of the new DB Transaction (if it was started with a specific IsolationLevel);
- DB Transaction Identifier (GUID) of the new DB Transaction;
- DB Connection Identifier (GUID) of the DB Connection that started the DB Transaction;
- StackTrace at the point of DB Transaction start and the Thread and AppDomain in which the DB Transaction got started;
- In case of
GetNewOrExistingTransaction
/GetNewOrExistingAutoTransaction
commands if 'piggy-backing' on an existing DB connection occurs:- the DB Transaction Identifier (GUID) of the DB Transaction it is 'piggy-backing' on;
- IsolationLevel of the DB Transaction it is 'piggy-backing' on;
- Thread and AppDomain in which the DB Transaction that it is 'piggy-backing' on got started;
- DB Connection Identifier (GUID) of the DB Connection that originally started the DB Transaction;
- Thread and AppDomain in which the
GetNewOrExistingTransaction
/GetNewOrExistingAutoTransaction
command got called.
- For
CommitTransaction
orRollbackTransaction
the following gets logged after the command got executed, but with information that was gathered before the command executed:- the DB Transaction Identifier (GUID) of the DB Transaction;
- whether the DB Transaction was still valid (i.e. not already Committed / Rolled back);
- IsolationLevel of the DB Transaction;
- whether the DB Transaction was reused;
- Thread and AppDomain in which the DB Transaction got started;
- Thread and AppDomain in which the
CommitTransaction
orRollbackTransaction
command got called.
- For any DB Command that gets executed:
- Thread and AppDomain in which the DB Command gets executed;
- the DB Transaction Identifier (GUID) of the DB Transaction that the DB Command is enlisted in;
- DB Connection Identifier (GUID) of the DB Connection that originally started the DB Transaction that the DB Command is enlisted in.
Tips:
- If the (new) options to name DB Connections and DB Transactions are used then the name of each named DB Connection and named DB Transaction is included in the logging, too!
- This very verbose logging is extremely helpful when one wants to ascertain that a certain function in OpenPetra, e.g. a Report, Extract, etc. (including any DB access that any library/helper functions it might call!) really perform every DB access it does on the separate DB Connection in a separate Thread that you purposefully created for it:
- Turn the very verbose logging on before executing the function in OpenPetra;
- Scrutinise the log output after the execution of the function has finished. Check whether all DB Commands that it ran were indeed run only on the separate DB Connection and on the separate Thread.
- This is quite some work but the only proof that things are indeed happening as they ought to!
Caveat: This logging is indeed very verbose and hence will slow down execution of the program. It is therefore conceivable that multiple threads will run with different timings when logging is turned up that much than if logging wouldn't be turned up that much...
Finding Out Which Queries Are Run (in Parallel) on Which RDBMS Connection (with PostgreSQL)
It can be quite helpful to see from the RDBSMs' end what queries are running against which DB Connection. For PostgreSQL this is possible by issuing the following SQL query:
SELECT datname, usename, application_name, pid, client_addr, client_port, waiting, query_start, query FROM pg_stat_activity WHERE datname='demo' AND usename='petraserver' AND state LIKE '%in transaction%' ORDER BY pid;
Replace 'demo' with the name of the DB that the PetraServerConsole instance is connected to and run this query in pgAmind III repeatedly (press <F5> for that) while the function of OpenPetra that you want to monitor for its DB connection use is executing.
How to read the output of that query: PostgreSQL starts a separate DB Server process for each DB connection. Each DB Server process has got its own 'pid' (Process Identification). This means that if several DB commands are executing in parallel on several DB connections you will get a list of all the processes that are currently running DB queries - one result line for each process/DB Connection. If you intended the function of OpenPetra to run solely on a specific DB connection and you are seeing multiple lines at times it isn't doing this. In that case refer to the section Verbose Logging.
Multi-threading: General Advice
General caution/advice: Programming with multiple threads is always much, much harder than programming with a single thread. When multiple threads access the same program data (Fields, static Fields, structs, etc.) one can easily get intermittent problems which can be very hard to pin-point and to resolve. Extreme diligence needs to be employed when multiple threads should access some shared program data in a writing fashion! Debugging a multi-threaded program is also considerably more difficult than debugging a single-threaded program. Whole books are devoted to the topic of multi-threaded programming as multi-threading is a quite difficult discipline that is hard to master...
FAQ's
Could calling GetNewOrExistingXXXTransaction
every give my Thread a running DB Transaction from another Thread?
No. When a Method in 'Thread A' calls GetNewOrExistingXXXTransaction
it must do that on the TDataBase
instance that got instantiated on this Thread otherwise a EDBAttemptingToWorkWithTransactionThatGotStartedOnDifferentThreadException
gets thrown. That way 'Thread A' will never (be able to) get a DB Transaction that belongs to a TDataBase
instance of a different Thread ('Thread B') using any of the GetNewOrExistingXXXTransaction
Methods!
Conversely, 'Thread B' (a 'worker thread') will never (be able to) get a DB Transaction with GetNewOrExistingXXXTransaction
that belongs to a TDataBase
instance of a different Thread ('Thread A', which might e.g. use the 'globally available' DBAccess.GDBAccess
instance) because it must not call GetNewOrExistingXXXTransaction
on the instance of 'Thread A' - otherwise a EDBAttemptingToWorkWithTransactionThatGotStartedOnDifferentThreadException
gets thrown!
Could calling BeginXXXTransaction
every give my Thread a new DB Transaction from another Thread?
Yes, and you must make sure that this isn't happening by only ever calling BeginXXXTransaction
on the separate instance of TDataBase
that gets purposefully created for 'your' Thread!
Reason why TDataBase
needs to allow this and cannot not throw an Exception should that happen: If server-side code got invoked from the client side it DOES get run in a separate Thread which .NET Remoting creates for us 'under the hood'! When that server-side code accesses the DB it usually uses the 'globally available' DBAccess.GDBAccessObj TDataBase
instance. That instance got created when the user logged in to the OpenPetra server, and it got created on a different Thread than the auto-created Thread that .NET Remoting manages.
Questions?
In case you have questions regarding working with multiple DB Connections please contact ChristianK, the developer that fixed the issues that prevented it from working in the past, introduced the advanced multi-threading and multi-connection 'sanity checks' in the TDataBase Class and who wrote this documentation.