Yesterday Arin Sime and I saw Jean Bauer’s presentation on developing the Early American Foreign Service Database that is part of her disseration attempting to identify why the Foreign Service was so effective when it was intrinsicaly a very chaotic unorganized period for what we now call the Foreign Service.
Jean gave a great brief on some of the challenges the Foreign Service faced in the late 1790’s, and walked us through her database for modeling the social network of the Foreign Service. Source code available at http://projectquincy.rubyforge.org. She modeled both traditional relational data such as individuals and where they are posted, letters they wrote to who, as well as more GIS oriented data, such as what states belong to what countries and what countries were part of what empires (ie British Empire, Ottoman Empire).
One of the things that struck me is that while the schema was very clean, all done up to 3rd normal form standards with good use of primary and foreign keys, what I didn’t like was that many of the relationship tables such as the assignments table for various folks had begin and end dates. And it struck me that many of the queries that you would want to issue are all based around time spans. I want all individuals posted to Barbadoes between 1770 and 1780. Which can easily be done with a select statement that looks something like:
SELECT * from INDIVIDUALS i, POSTINGS p, LOCATION l where l.name = 'barbados' and l.id = p.location_id and p.individual_id = i.id and p.start_date > 1770 and p.end_date < 1781
However it strikes me that this extra section at the end “
p.start_date > 1770 and p.end_date < 1781” is really the key part of the relationships, potentially on the level of the primary keys and foreign keys for identifying the data. How would I model a single posting to a country that was interrupted for a small time period? I guess the traditional answer is to have another table called something like “Residency” that would track actual time periods as well but that seems icky and roundabout adding extra layers of complexity to my data model.
So part of me wonders, why don’t the Primary Keys and Foreign Keys contain the time axis of data as well as the relationship between individual tables? Why don’t we have a Time Based Database for storing data whose primary relationship is time? Document Based Databases have joined our pantheon of data stores, so what would a Time Based Database look like? I think it would deal with various calendars very simply.. Tell me when Rome was founded in Gregorian or Julian calendar. Or, what year did Rosa Park’s stage her protest according to the Jewish calendar?
I can imagine that a TBDB would need to understand the degree of confidence/accuracy in dates. If I selected when all the worlds cities were founded, it would natively let me know that Rome was founded middle of the 8th century BC, but that New York was founded in 1624 AD.
And lastly, it would naturally handle time zones. No need to tediously convert all dates into UTC, just use whatever and know the TBDB would handle the selects.
I can see this being very useful for data such as flight schedules, and figuring out the best route between cities. How do I get from Charlottesville Virginia to Brisbane Australia with multiple stops on the way? A TBDB would handle all that icky date handling so that a stop in Dubai wouldn’t overlap with the departing flight from Dubai.
While there don’t seem to be any candidates yet, feel free to comment on what you would want in your TBDB!