Gadfly Recovery
In the event of a software glitch or crash, Gadfly may terminate without having stored committed updates.
A recovery strategy attempts to make sure
that any unapplied committed updates are applied when the database restarts.
It is always assumed that there is only one primary (server) process controlling the database (possibly with
multiple clients).
Gadfly uses a simple log-with-deferred-updates recovery mechanism. Recovery should be possible in the
presence of non-disk failures (server crash, system crash). Recovery after a disk crash is not available
for Gadfly as yet, sorry.
Due to portability problems Gadfly does not prevent multiple processes from "controlling" the database at
once. For read only access multiple instances are not a problem, but for access with modification, the processes
may collide and corrupt the database. For a read-write database, make sure only one (server) process controls
the database at any given time.
The only concurrency control mechanism that provides serializability for Gadfly as yet is the trivial one --
the server serves all clients serially. This will likely change for some variant of the system at some point.
This section explains the basic recovery mechanism.
Normal operation
Precommit
During normal operation any active tables are held in memory in the server process.
Uncommitted updates for a transaction are kept in "shadow tables" until the transaction commits using
connection.commit()
The shadow tables remember the mutations that have been applied to them. The permanent table copies
are only modified after commit time. A commit commits all updates for all cursors for the connection.
Unless the autocheckpoint feature is disabled (see below) a
commit normally triggers a checkpoint too.
A rollback
connection.rollback()
explicitly discards all uncommitted updates and restores the connection to the previously
committed state.
There is a third level of shadowing for statement sequences executed by a cursor.
In particular the design attempts to make sure that if
cursor.execute(statement)
fails with an error, then the shadow database will contain no updates from
the partially executed statement (which may be a sequence of statements)
but will reflect other completed updates that may have not been committed.
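The shadow-table behavior described above can be sketched as a toy model. This is illustrative only, not Gadfly's actual implementation; the class and method names here are invented:

```python
# Illustrative sketch (invented names, not Gadfly internals): shadow tables
# buffer uncommitted mutations; commit applies them to the permanent copy,
# rollback discards them and restores the committed state.

class ShadowTable:
    def __init__(self, rows=None):
        self.permanent = list(rows or [])   # committed rows
        self.shadow = list(self.permanent)  # working copy seen by the transaction
        self.mutations = []                 # remembered mutations, in order

    def insert(self, row):
        self.shadow.append(row)
        self.mutations.append(("insert", row))

    def commit(self):
        # the permanent copy is modified only at commit time
        self.permanent = list(self.shadow)
        self.mutations = []

    def rollback(self):
        # discard uncommitted updates; revert to the committed state
        self.shadow = list(self.permanent)
        self.mutations = []

t = ShadowTable([("a", 1)])
t.insert(("b", 2))
t.rollback()          # shadow reverts to [("a", 1)]
t.insert(("c", 3))
t.commit()            # permanent becomes [("a", 1), ("c", 3)]
```

A real implementation would also remember deletes and updates in the mutation list; only inserts are shown to keep the sketch short.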
Commit
At commit, operations applied to shadow tables are written
out in order of application to a log file before being permanently
applied to the active database. Finally a commit record is written to
the log and the log is flushed. At this point the transaction is considered
committed and recoverable, and a new transaction begins.
The values of the shadow tables then replace
the values of the permanent tables in the active database
(but not in the database disk files until checkpoint, if autocheckpoint
is disabled).
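The ordering at commit (log first, flush, then apply) can be sketched like this. The record format and function name are hypothetical, not Gadfly's own:

```python
# Illustrative sketch (hypothetical record format): at commit, mutations are
# appended to the log in order of application, a commit record is written and
# the log is flushed to disk; only then are the shadow values applied to the
# active database.
import json
import os

def commit(log_path, transaction_ops, active_db, shadow_db):
    with open(log_path, "a") as log:
        for op in transaction_ops:                      # 1. log each operation
            log.write(json.dumps(op) + "\n")
        log.write(json.dumps({"commit": True}) + "\n")  # 2. commit record
        log.flush()
        os.fsync(log.fileno())                          # 3. force the log to disk
    # 4. transaction is now recoverable; apply shadows to the active tables
    active_db.update(shadow_db)
```

The key design point is step 3: once the commit record is durably on disk, a crash at any later moment can be repaired by replaying the log.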
Checkpoint
A checkpoint operation brings the persistent copies of the tables on
disk in sync with the in-memory copies in the active database. Checkpoints
occur at server shut down or periodically during server operation.
The checkpoint operation runs in isolation (with no database access
allowed during checkpoint).
Note: database connections normally run a checkpoint
after every commit, unless you set
connection.autocheckpoint = 0
which asks that checkpoints be done explicitly by the program using
connection.commit() # if appropriate
connection.checkpoint()
Explicit checkpoints should make the database perform better,
since the disk files are written less frequently. However,
to prevent unneeded (and possibly time-consuming)
recovery operations after a database
is shut down and restarted, it is important to always execute an explicit
checkpoint at server shutdown, and periodically during long server
runs.
Note that if any outstanding operations are uncommitted
at the time of a checkpoint (when autocheckpoint is disabled) those
updates will be lost (i.e., the checkpoint is equivalent to a rollback).
At checkpoint the old persistent value of each table that has been updated since
the last checkpoint is copied to a backup file, and the currently active value is
written to the permanent table file. Finally, if the data definitions have changed,
the old definitions are stored to a backup file and the new definitions are written
to the permanent data definition file. To signal a successful checkpoint the
log file is then deleted.
At this point (after log deletion) the database is considered
quiescent (no recovery required). Finally, all backup table files are deleted.
[Note, it might be good to keep old logs around... Comments?]
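The checkpoint file sequence above can be sketched roughly as follows. The paths and helper name are hypothetical; Gadfly's real checkpoint also handles data-definition files, which are omitted here:

```python
# Illustrative sketch (hypothetical names) of the checkpoint sequence:
# back up each updated table file, write the new value, delete the log
# to signal success, and finally delete the backups.
import os
import shutil

def checkpoint(updated_tables, log_path):
    backups = []
    for path, new_value in updated_tables:
        backup = path + ".bak"
        if os.path.exists(path):
            shutil.copy(path, backup)   # preserve the old persistent value
            backups.append(backup)
        with open(path, "wb") as f:
            f.write(new_value)
    if os.path.exists(log_path):
        os.remove(log_path)             # log deletion signals a quiescent database
    for backup in backups:              # finally, remove the backup files
        os.remove(backup)
```

If a crash happens before the log is deleted, the backups and the log together still allow recovery; if it happens after, no recovery is needed.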
Each table file representation is annotated with a checksum,
so the recovery system can check that the file was stored correctly.
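The checksum idea can be illustrated with a small sketch; the 4-byte CRC header used here is an assumption for illustration, not Gadfly's actual file format:

```python
# Illustrative sketch (assumed layout, not Gadfly's real file format):
# each table file carries a checksum so recovery can detect a file that
# was not stored correctly.
import zlib

def store_table(path, payload):
    checksum = zlib.crc32(payload)
    with open(path, "wb") as f:
        f.write(checksum.to_bytes(4, "big") + payload)

def load_table(path):
    with open(path, "rb") as f:
        data = f.read()
    stored, payload = int.from_bytes(data[:4], "big"), data[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("table file corrupt: checksum mismatch")
    return payload
```

During recovery, a reader like this lets the system prefer an intact backup file over a corrupt permanent file, as described below.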
Recovery
When a database restarts it automatically determines whether
the last active instance shut down normally and whether recovery
is required. Gadfly discovers the need for recovery by detecting
a non-empty current log file.
To recover the system Gadfly first scans the log file to determine committed transactions.
Then Gadfly rescans the log file, applying the operations of committed
transactions to the in-memory table values in the order recorded.
When reading in table values for the purpose of recovery Gadfly looks
for a backup file for the table first. If the backup is not corrupt,
its value is used, otherwise the permanent table file is used.
After recovery Gadfly runs a normal checkpoint before resuming
normal operation.
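The two-pass recovery scan can be sketched roughly as follows; the log record format here is invented for illustration:

```python
# Illustrative sketch of two-pass log recovery (invented record format):
# pass 1 finds committed transactions, pass 2 replays their operations
# against the in-memory tables in the order recorded.  Records are
# (kind, transaction_id, payload) tuples.

def recover(log_records, tables):
    # pass 1: determine which transactions committed
    committed = {tid for kind, tid, _ in log_records if kind == "commit"}
    # pass 2: replay operations of committed transactions, in log order
    for kind, tid, op in log_records:
        if kind == "op" and tid in committed:
            table, row = op
            tables.setdefault(table, []).append(row)
    return tables
```

Operations of transactions with no commit record are skipped, which is what makes the deferred-update scheme safe: an interrupted transaction simply never reaches the tables.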
Please note: Although I have attempted to provide a robust
implementation
for this software I do not guarantee its correctness. I hope
it will work well for you but I do not assume any legal
responsibility for problems anyone may have during use
of these programs.