Ben Point

 

The Anthropology of Software

Ben Point RSS feed   

Southwestern Uganda
Flying over northern Arizona
Santa Barbara

Software Related

Joel on Software

JoS: General

JoS: Business

JoS: Design

The Old New Thing

Sorting It All Out

Dare Obasanjo

ongoing

firstobject news

cmarkup

xmljungle

 

Startup/µISV

MicroISV

CodeSnipers

Planet MicroISV

My Micro-ISV

Keith Casey (KC)

Gavin Bowman

Ben/BRKStudio

Paul Dix

Ian Landsman

Bobby Strickland

Neville Franks

Phil

Mike

Ben McGaughey

Serge Wautier

Bob Walsh

Carmen Ferrara

Christopher Hawkins

Nick Bradbury

Dharmesh Shah

Philipp Schumann

NGEDIT guy (J)

Gurock Brothers

 

Me at CodeSnipers.com

Glossary of Text Encoding

Splitting Surrogate Pairs

The Enigma of Encoding Versions

How I Invented Base64

EBCDIC to ASCII (and SBCS) Conversion

That Ol' OEM Code Page

Phantom Currency Signs in Japan and Korea

The Euro Sign Predicament

Oh No! Mojibake!

How to Determine Text File Encoding

The secret family split in Windows code page functions

Whether Double-Byte Is ANSI

Strange case of two system locale ANSI charsets

 

 

« Tropical island life for a software developer

 | Main | 

The Market Software Development Paradigm »

Rowboat: A Database Design

If your database consists of a boatload of rows, you might call it a rowboat.

In this one short article I cover all the essentials of a complete multi-session database design. I haven't studied any database theory since college (1991), and I'm not even sure I did then -- although I do have a Computer Science degree. So I am writing this entirely off the cuff based on experiences with databases and my computer science knowledge.

The rowboat database is a single file with a small peer database engine and client application. I use only established techniques (I am not inventing anything here), but I think my design is unique in its simplicity power speed and versatility.

Challenges

In its simplest form, a database is just a group of tables consisting of rows (records). The term database can be used broadly to mean any data container, but I am talking about the common relational archetype. There are 3 primary challenges in implementing a database:

  1. I/O: allowing rows to be created and deleted quickly and efficiently
  2. Transactions: allowing grouped "all or nothing" actions and recovery on failure
  3. Multithreading: ensuring the integrity of the data model despite concurrent activity

All of these need to be designed together. As rows are created they must be:

  1. added without moving around other data on disk or wasting too much space
  2. marked so that they can be rolled back if the transaction does not complete
  3. marked so they are not included in another thread's query begun beforehand

Similar things need to be considered for update and delete of rows. Those are the primary challenges in a nutshell.

File Blocks

Blocks are the universal underpinning of every computer storage technique. The basic idea is that you set a block size and implement a linked list of blocks. When a block is added, it can be allocated anywhere and logically inserted in between two other blocks without having to move any blocks in the storage medium. When a block is removed it can be added to a linked list of deleted blocks in order to be re-used later.

The file system happens to use blocks too, but we manage our own version of blocking or "paging" within a single file system file. I/O reads and writes always occur in terms of whole blocks, that makes I/O simple. In practice, it does not take longer to write one byte or one block because the primary factor is seek time.

A particular block only holds data from one table (a re-used block can of course be assigned to a different table), because when you query the data of a table, you want to involve as little of the disk as possible.

Column Widths Are For Wimps

Based on assumptions rooted in ancient mainframe database technology, most databases like to be able to know column widths when the table is defined (except for large objects). Heck, FORTRAN originally had a rigid input layout with column-based restrictions. But the column width requirement in databases is one of the chief aspects of inflexibility that should be done away with.

Column widths are where most databases miss the boat. Character widths are even more meaningless with the onset of cases where text fields pass through multiple encodings (that can use different numbers of bytes) like when the client and server text encodings differ.

Although relational in attitude, my design does not involve column widths. Rows (or records) are the basis of relational functionality. Columns are a prism through which we select and modify a particular row, but all activities occur in terms of rows. So you've got a bunch of rows and they're all in the same boat, so we'll call it a rowboat.

For the best efficiency, instead of choosing column widths you should choose a base row width. This row block size is the smallest amount of memory that will be used for the row, but the number of these blocks for a particular row can grow arbitrarily.

Row Blocks

In addition to file blocks, there is another level needed to handle the variable width rows. The row block size can be 128 bytes by default and tailored for every table as long as it divides evenly into the file block size. If a particular row is larger than a row block it occupies multiple linked row blocks. Each file block keeps track of how many row blocks are free in it so that the file block can be deleted if it becomes empty. So:

  • A logical table is a linked list of logical rows.
  • A physical table is a linked list of file blocks.
  • A physical row is a linked list of row blocks.

Multi-user Model

There are two ways to handle multiple processes manipulating a database at the same time. One is to use a lock file that each process uses to synchronize their actions. The second is to have a server through which all actions occur.

This is over simplified, but the lock file mechanism depends on the file system's file open command where if two processes simultaneously try to obtain exclusive access to the lock file, one of them is denied or held up until access is granted. The nice part is that the lock file can support multi-user access between processes running on different machines on a network with visibility to a shared file system.

The server mechanism strikes me as easier to conceive and implement since it depends on direct inter-process communication where a single application instance can coordinate all client requests. This is the model I pursue here. A pool of threads can service the queue of data access requests arriving via TCP/IP.

Locking

You can't just add a plain row into a table because until it is successfully committed it is only visible to the session that is inserting it. Similarly when you update a row you cannot just replace the row in the table because it must retain its old value for other sessions until it is committed. But still, a new row is added to the end of the linked list of rows. An updated row is inserted into the linked list of rows adjacent to the row it is replacing.

Different sessions need ways of accessing rows that allow them to determine the state of each row as it pertains to them. To satisfy this requirement, rows are marked as mid-transaction insertion or deletion. In the case of an update the replaced row is marked for deletion. A mid-transaction insertion row is ignored by every session other than the one that obtained the lock, while a mid-transaction deletion row is treated as if it exists by other sessions.

Table locking means that any number of sessions can be reading a table simultaneously, only one can be writing at a time. Table locks are maintained in memory by the server process.

Recovery

For the sake of crash recovery, we need to write any modified block to disk twice. Once in a transaction recovery block, and then overwriting the actual block of the table. A linked list of recovery blocks is maintained in the file. When the commit is successful, the recovery blocks are released back to the pool of deleted blocks.

If the failure occurs while parts of the transaction are being written to the recovery block, the transaction will be discarded. If the failure occurs while parts are being written to the data file, the transaction will be completed during recovery to restore integrity to the data file. Whenever the data file is opened, recovery is performed automatically.

Data Types

Notice that I've said nothing about data types. Each row value is treated by the core database as an opaque string of bytes.

A row (i.e. a record) is generally a structured group of fields. In most tables this will be a simple list of columns implemented for efficiency with either a prefix table of integer offsets or a persisted parsed XML fragment. Data types and row implementations are not integral to the core database design, but are useful in providing the constructs such as indexes, queries and stored procedures.

Indexes are a means of speeding up access by maintaining an ordered or mapped list of keys for a table. An index is an internally maintained table containing links into the indexed table. The database doesn't need to know how to compute the key from a row, but it does need to provide the infrastructure for maintaining the index on table modifications.

Plan of Action

My goal would be to implement this complete database technology in 3 classes in this order:

  1. A sockets class to manage thread pools, queue requests, and negotiate peer to peer relations
  2. A data core class to provide all the linked lists, file I/O, locking, transactions and recovery
  3. An access class to provide the row schemas, data access language and SQL subset

The sockets class would not understand the content of requests but it would send and receive them and delegate threads to usher received requests through the access class. The sockets class could be tested for scalability and throughput by generating requests from multiple peers to each other and performing some random file I/O to represent database processing. The classes would be developed inside a test harness dialog application used to interactively generate requests.

The data core and access classes would actually go hand in hand to some extent. The access class would act as a front end to the data core class but in the beginning would only provide some stubbed out simple methods to service requests from the sockets class. All of the database functionality could be fleshed out while maintaining only rudimentary CRUD methods in the access class. The data core class would not "know" about the access class but would provide hooks for the access class to generate index keys and service triggers.

The access class would also have some client methods used to package requests to be sent, in addition to the server methods that call the data core methods. I would leave both client and server aspects of the access class together because they involve the same knowledge, plus the architecture is peer to peer so a database can be on both sides. Finally, the access class would be fleshed out with data types, row schemas, and full blown language functionality.

I Repeat, No Column Widths

I watched all the excitement about "XML databases" and "Object Oriented Databases" come and go. The hype went away with a resounding thud, although few seemed to notice. We're left with artifacts of the craze such as shallow "object" features in Oracle schemas. The relational model still underpins all the bulk of database programming, while extreme XML databases are more akin to file systems with supporting methods.

But the real question people should have asked is: why do I have to specify a column width at all? We use CLOB fields to store XML documents. Why the complexity? It is just text! Why does my database schema restrict my username length when that should be a business logic or UI decision? If my average username is 8 characters but I reserve 32 for every one of them, does anyone else see the inefficiency here?

I'm trying to rock the boat! "No column widths" is the key to the next real revolution in database programming.

 

see discussion on column widths at Joel's forum:
http://discuss.joelonsoftware.com/default.asp?design.4.372016

note by original poster Ben Bryant, 05 Aug 2006 09:22:00 -0500

Post a comment

« Tropical island life for a software developer

 | Main | 

The Market Software Development Paradigm »

 

Ben Bryant
Software Developer
Anthropologist

View Ben Bryant's profile on LinkedIn
 

The Market Software Development Paradigm

Death March in an Information Technology Project

Building the Machine That Will Build the Machine

I See Markup

Anthropologists In Software Design