Persistency Interfaces

This is a direct copy of Appendix A of the Astro-WISE Architectural Design document and is used to give a more detailed explanation of how the Astro-WISE system works at its lower levels.

Introduction

This chapter describes the specification and Python implementation of persistent objects on top of a relational database back end. The aim of this implementation is twofold:

  1. Provide a transparent mapping from a definition of a persistent class to a table in a relational database, preserving inheritance relationships, and allowing attributes to refer to other persistent objects.
  2. Provide a native Python syntax to express queries, and leverage the advantages of the relational model (SQL) when using persistent objects.

In this chapter we will first introduce a number of concepts from Object Oriented Programming (OOP) and Relational Database Management Systems (RDBMS), in order to clarify the problem we wish to solve. We will then provide the specification of the database interface provided by the Astro-WISE prototype. Finally, we will clarify some of the implementation issues addressed by the current prototype.

Background

Object Oriented Programming

It is difficult to give a meaningful definition of “object”. However, the following “definition” introduces some intimately related terms that will be used throughout this document:

object
An object is something that comprises type, identity and state. The type of an object specifies what kind of object it is, specifically what kind of behavior the object is capable of. The identity is what distinguishes one object from another. The state of an object specifies the values of the properties of the object.

In Object Oriented Programming (OOP) we have an operational definition of objects:

object
An object is an instance of a class, and encapsulates both data and behavior.

The class defines what operations (methods) can be performed on its instances, and what attributes those instances will have. In general ‘class’ and ‘type’ are synonymous, as are ‘instance’ and ‘object’. That is, when we talk about the type of an object we mean the class of which it is an instance.

It is important to note that the values of the attributes of an object will themselves be objects, although most programming languages distinguish between (instances of) primitive data types (integers, strings, etc.) and instances of classes.

Inheritance is the mechanism by which one can use the definition of existing classes to build new classes. A child (derived) class will inherit behavior from its parent (base) class. In defining the child class the programmer has the opportunity to extend the child class with new methods and attributes, and/or modify the implementation of methods defined in the parent class. However, the child class is expected to conform to the interface (specification) of the parent class, to the extent that instances of the child class can behave as if they are instances of the parent class. In particular, it is expected that procedures taking an object of a base type as argument should also work when given an object of a derived type as argument. This key property of objects is called polymorphism.

Persistency

An object is said to be persistent if it is able to ‘remember’ its state across program boundaries. This concept should not be confused with the concept of a program saving and restoring its data (or state). Rather, persistency implies that object identity is meaningful across program boundaries, and can be used to recover object state.

Persistency is usually implemented by an explicit mapping from (user-defined) object identities to object states and by then saving and restoring this mapping. However, this implementation assumes that the identity of the object one is interested in can be independently and easily obtained. For many applications this is not the case. On the contrary, one usually has a (partial) specification of the state, and is interested in the objects that satisfy this specification. That is, many interesting applications depend on a mapping of a partially specified object state to object identity (and then to object). This is the domain of the relational database.

Relational Databases

A relational database management system (RDBMS) stores, updates and retrieves data, and manages the relations between different data. An RDBMS has no concept of objects, inheritance and polymorphism, and it is therefore not a priori obvious that one would like to use such a database to implement object persistence. However, using the following mapping

type \(\longleftrightarrow\) table
identity \(\longleftrightarrow\) row index
state \(\longleftrightarrow\) row value

it is (hopefully) obvious that one might, at least in principle, implement object persistency using a relational database. That is, given a type and object identity, one can store and retrieve state from the specified row in the corresponding table.
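As a minimal sketch of this mapping, assume an in-memory SQLite database as a stand-in for the back end. The table layout and the helper functions save_state and load_state are hypothetical illustrations and are not part of the Astro-WISE interface.

```python
import sqlite3

# In-memory SQLite stands in for the relational back end (an assumption).
conn = sqlite3.connect(":memory:")

# type <-> table: one table per persistent type.
conn.execute(
    "CREATE TABLE Address (object_id INTEGER PRIMARY KEY, street TEXT, number INTEGER)"
)

def save_state(object_id, street, number):
    """state <-> row value: write the state into the row named by the identity."""
    conn.execute(
        "INSERT OR REPLACE INTO Address (object_id, street, number) VALUES (?, ?, ?)",
        (object_id, street, number),
    )

def load_state(object_id):
    """identity <-> row index: look up the row by its identity."""
    row = conn.execute(
        "SELECT street, number FROM Address WHERE object_id = ?", (object_id,)
    ).fetchone()
    return None if row is None else {"street": row[0], "number": row[1]}

save_state(1, "Main Street", 12)
state = load_state(1)
```

Given only the type (the table) and the identity (the primary key), the state can be written and read back, which is all that basic persistency requires.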

Relational databases provide a powerful tool to view and represent their content using structured queries. It would be extremely useful if we were able to leverage this power to efficiently search for objects whose state matches certain criteria. Special consideration has to be given to inheritance in this case.

Assume, for example, that we define a persistent type DomeFlatImage, derived from a more general type FlatfieldImage. A query for all R-band flatfield images should result in a set including all R-band domeflat images. This behavior of queries is what inheritance means in a relational database context. Hence, a query for objects of a certain type maps to queries (returning row indices/object identities) on the tables corresponding to that type and all of its subtypes. The results of these queries are then combined into a single set of all objects, of that type or one of its subtypes, that satisfy the selection.
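The following sketch illustrates this fan-out over subtypes. It assumes one SQLite table per class with a single band column; the helpers with_subtypes and select_by_band are hypothetical, and the prototype may combine the per-table queries differently.

```python
import sqlite3

class FlatfieldImage:
    pass

class DomeFlatImage(FlatfieldImage):
    pass

def with_subtypes(cls):
    """Return cls followed by all of its direct and indirect subclasses."""
    found = [cls]
    for sub in cls.__subclasses__():
        found.extend(with_subtypes(sub))
    return found

conn = sqlite3.connect(":memory:")
for cls in with_subtypes(FlatfieldImage):
    conn.execute(
        f"CREATE TABLE {cls.__name__} (object_id INTEGER PRIMARY KEY, band TEXT)"
    )
conn.execute("INSERT INTO FlatfieldImage VALUES (1, 'R')")
conn.execute("INSERT INTO DomeFlatImage VALUES (2, 'R')")
conn.execute("INSERT INTO DomeFlatImage VALUES (3, 'V')")

def select_by_band(cls, band):
    """Query the table of cls and of every subtype, then combine the results."""
    result = set()
    for sub in with_subtypes(cls):
        rows = conn.execute(
            f"SELECT object_id FROM {sub.__name__} WHERE band = ?", (band,)
        )
        result.update((sub.__name__, object_id) for (object_id,) in rows)
    return result

r_band_flats = select_by_band(FlatfieldImage, "R")
```

Here r_band_flats contains both the plain flatfield and the R-band domeflat, while the same query on DomeFlatImage returns only the domeflat.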

Problem specification

The implementation of the interface (should) address(es) the following issues:

defining a persistent class
Defining a persistent class (type) will give its instances the property of being persistent. The class definition should provide sufficient information about the attributes (possible state) of the objects to build the corresponding database table. This table should be present in the database when the first object of the class is instantiated. Presently, this is achieved by dynamically creating the table (if it doesn’t yet exist) when processing the class definition [1].
retrieving state of persistent object
Instantiating a persistent object with an existing object identity should result in retrieval of state from the database.
saving state of persistent objects
Persistent objects, whose state has been modified, should save their state to the database before they cease to exist.
references
Persistent objects will contain references to (read: instances of) other persistent objects. Care has to be taken that instantiation of a persistent object does not recursively instantiate all objects it refers to. Only when the attribute corresponding to the reference is accessed should the corresponding object be instantiated.
expressing selections

It should be possible to express selections of the form

\[\{x | x \in X \wedge (x.attr1 \in A \wedge x.attr2 \in B \vee x.attr3 \in C ...)\}\]

i.e. the set of all objects of type \(X\) whose attributes have certain properties. This set should be translated into an SQL query to the database, and result in an iterable sequence of objects satisfying the selection.
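The requirement on references listed above (instantiate a referred object only when the attribute is first accessed) can be sketched with a Python descriptor. The LazyLink class, the load_filter function and the Image class are hypothetical names chosen for illustration; this is one possible technique, not necessarily the one used in the prototype.

```python
class LazyLink:
    """Descriptor that stores only an object_id and builds the target on first access."""

    def __init__(self, loader):
        self.loader = loader  # callable: object_id -> object

    def __set_name__(self, owner, name):
        self.id_slot = "_" + name + "_id"
        self.obj_slot = "_" + name + "_obj"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.obj_slot not in instance.__dict__:
            # First access: only now is the referred object instantiated.
            object_id = instance.__dict__[self.id_slot]
            instance.__dict__[self.obj_slot] = self.loader(object_id)
        return instance.__dict__[self.obj_slot]

loaded = []  # records which identities were actually instantiated

def load_filter(object_id):
    loaded.append(object_id)
    return {"object_id": object_id, "band": "R"}  # stand-in for a database read

class Image:
    filter = LazyLink(load_filter)

    def __init__(self, filter_id):
        self._filter_id = filter_id  # restoring state keeps the identity only

img = Image(filter_id=7)
loaded_before_access = list(loaded)  # nothing has been loaded yet
band = img.filter["band"]            # triggers the single load
img.filter                           # second access reuses the cached object
```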

In addition, the following issues need to be addressed, though not necessarily by the interface to persistent objects.

managing database connections
The interface does not specify how or when the database connection is established.
transactions
The interface doesn’t specify if and how transactions are implemented.
efficiency
No effort has yet been made to maximize performance and/or scalability. Initial efforts have focused on a demonstration of technology and simplicity of implementation.

Interface Specification

In this section we describe how to implement and use persistent objects, using the interface defined in the Astro-WISE prototype. This section includes Python source code fragments. For those not familiar with Python we advise that they have a look at the main web site at https://www.python.org/.

Persistent classes

Persistent objects are instances of persistent classes, which specify explicitly which attributes (properties) are saved in the database. We call these attributes persistent properties. Executing a program defining

Defining persistent classes

A new persistent class is defined by deriving from an existing persistent class, or by deriving from the root persistent class DBObject. E.g.:

#example1.py
from common.database.DBMain import DBObject
class A(DBObject):
    pass
class B(A):
    pass

specifies two persistent classes (A and B). Neither of them extends their parent classes, so instances of A and B will behave exactly like instances of DBObject.

Defining persistent properties

A persistent property is defined by using the following expression in the class definition:

prop_name = persistent(prop_docs, prop_type, prop_default),

where prop_name is the name of the persistent property, and persistent is constructed using three arguments: the property documentation, the type of the property, and the default value for the property, respectively. For example:

#example2.py
from common.database.DBMain import DBObject, persistent
class Address(DBObject):
    street = persistent('The street', str, '')
    number = persistent('The house number', int, 0)

This program defines a persistent class ‘Address’, with two persistent properties, ‘street’ and ‘number’, of type str(ing) and int(eger) respectively.
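To illustrate how processing such a class definition could create the corresponding table, here is a minimal sketch. It assumes SQLite as the back end and uses simplified stand-ins for DBObject and persistent; the SQL_TYPES mapping and the use of `__init_subclass__` are assumptions made for this example, and the sketch only handles properties defined directly on the class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}  # assumed type mapping

class persistent:
    """Hypothetical stand-in: records documentation, type and default."""

    def __init__(self, doc, type_, default):
        self.doc, self.type, self.default = doc, type_, default

class DBObject:
    """Hypothetical stand-in for the root persistent class."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        columns = ["object_id INTEGER PRIMARY KEY"]
        for name, value in vars(cls).items():
            if isinstance(value, persistent):
                columns.append(f"{name} {SQL_TYPES[value.type]}")
        # Create the table (if it does not exist yet) while the class
        # definition is being processed.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {cls.__name__} ({', '.join(columns)})"
        )

class Address(DBObject):
    street = persistent('The street', str, '')
    number = persistent('The house number', int, 0)

# Column names of the table created for Address.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(Address)")]
```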

We distinguish between five different types of persistent properties, based on the signature of the arguments to persistent():

descriptors
If the type of the persistent property is a basic (built-in) type, then we call the persistent property a descriptor. Valid types are: integers (int), floating point numbers (float), date-time objects (datetime), and strings (str).
descriptor lists
Persistent properties can also be homogeneous variable-length arrays of basic built-in types, called descriptor lists. Valid types are the same as those for descriptors. Descriptor lists are distinguished from descriptors by the property default. If the default is a Python list, then the property is a descriptor list; otherwise it is a simple descriptor.
links
Persistent objects can refer to other persistent objects. The corresponding properties are called links. If the type of the persistent property is a subclass of DBObject, then the property is a link.
link lists
Persistent properties can also refer to arrays of persistent objects, in which case they are called link lists. Link lists are distinguished from links by the property default. If the default is a Python list, then the property is a link list.
self-links
A special case of links are links to other objects of the same type. These are called self-links. If no type and default are specified in the call to persistent, then the property is a self-link.
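The rules above can be summarized as a small decision function. This is a sketch only: the classify function, the stand-in DBObject and the example class Filter are hypothetical, and the prototype may perform this dispatch differently.

```python
from datetime import datetime

class DBObject:
    """Hypothetical stand-in for the root persistent class."""

class Filter(DBObject):
    """Hypothetical persistent class used as a link target."""

BASIC_TYPES = (int, float, str, datetime)

def classify(prop_type=None, prop_default=None):
    """Name the kind of persistent property implied by the type and default."""
    if prop_type is None and prop_default is None:
        return "self-link"
    is_list = isinstance(prop_default, list)
    if isinstance(prop_type, type) and issubclass(prop_type, DBObject):
        return "link list" if is_list else "link"
    if prop_type in BASIC_TYPES:
        return "descriptor list" if is_list else "descriptor"
    raise TypeError(f"unsupported property type: {prop_type!r}")

kinds = [
    classify(str, ''),       # basic type, scalar default
    classify(float, []),     # basic type, list default
    classify(Filter, None),  # persistent type, scalar default
    classify(Filter, []),    # persistent type, list default
    classify(),              # neither type nor default
]
```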

Keys

It is possible to use persistent properties as alternative object identifiers for the default object identifier (object_id). Only descriptors can be used as keys. Keys are always unique and indexed.

The special attribute keys contains a list of attributes and tuples of attributes, each specifying one key. For example:

#example3.py
from datetime import datetime
from common.database.DBMain import DBObject, persistent
class Employee(DBObject):
    ssi = persistent('Social Security Number', str, '')
    name = persistent('Name', str, '')
    birth = persistent('Birth date', datetime, None)
    keys = [('ssi',), ('name', 'birth')]

In this example ssi is a key. The pair of attributes (‘name’, ‘birth’) is also a key.

Indices

Databases use indices to optimize queries. It is possible to specify which persistent properties should be used as indices. Only descriptors can be used as indices.

The special attribute indices contains a list of attributes which should be indexed. E.g.:

#example4.py
from common.database.DBMain import DBObject, persistent
class Example(DBObject):
    attr = persistent('A measurement', float, 0.0)
    indices = ['attr']

Persistent Objects

Having specified persistent classes, we can now use these classes to instantiate and manipulate persistent objects. In most respects these objects behave just like instances of ordinary classes. There are two exceptions: special rules for instantiation, and special rules for assigning values to persistent properties.

Object instantiation

We can distinguish between three different modes of instantiating a persistent object.

New
We are creating a new persistent object, for which the object_id needs to be generated. This can be accomplished by instantiating an object without specifying object_id.
Existing
We are using an existing object. If the object has already been instantiated in this application we want a reference to that instance; otherwise we want an instance whose state has been retrieved from the database. This can be accomplished by instantiating the object with an existing object_id.
Transient
It may be useful to build an object of a persistent type that is not itself persistent (whose state will not be saved to the database). This can be accomplished by instantiating the object with an object_id equal to 0 (zero).

or, in code:

a = MyObject()               # A new instance of MyObject
b = MyObject(object_id=1000) # An existing instance of MyObject
c = MyObject(object_id=0)    # A transient instance of MyObject

In practice, objects are rarely instantiated with an explicit object_id, because we will generally not know the object_id of the objects we are interested in. Rather, objects are instantiated using keys or as the result of a query (see below).
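One way to obtain these three modes is to intercept instantiation and keep a cache of live instances keyed by identity. The sketch below assumes this technique; the Persistent base class, its _cache dictionary and the counter used to generate identities are hypothetical stand-ins, and retrieval of state from the database is only indicated by a comment.

```python
import itertools

class Persistent:
    """Hypothetical base class illustrating the three instantiation modes."""

    _cache = {}                    # (class, object_id) -> live instance
    _next_id = itertools.count(1)  # stand-in for database-generated identities

    def __new__(cls, object_id=None):
        if object_id == 0:
            # Transient: never cached, never saved.
            obj = super().__new__(cls)
            obj.object_id = 0
            return obj
        if object_id is None:
            # New: generate a fresh identity.
            object_id = next(cls._next_id)
        key = (cls, object_id)
        if key in cls._cache:
            # Existing and already instantiated: hand out the same instance.
            return cls._cache[key]
        obj = super().__new__(cls)
        obj.object_id = object_id
        # Existing but not yet instantiated: state would be retrieved here.
        cls._cache[key] = obj
        return obj

class MyObject(Persistent):
    pass

a = MyObject()               # new
b1 = MyObject(object_id=42)  # existing, first instantiation
b2 = MyObject(object_id=42)  # existing, already instantiated
c = MyObject(object_id=0)    # transient
```

In this sketch b1 and b2 are the same Python object, whereas every transient instantiation yields a distinct object.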

Instantiating an object using a key will result in a restored object (if an object with that key already exists) or a new object. In code:

class Filter(DBObject):
    band = persistent('the band name', str, '')
    keys = ['band']

f = Filter(band='V')       # The V-band filter

Assigning values to properties

Python is a dynamically typed language. This means that there is no such thing as the type of a variable. However, since database values (e.g. columns) are statically typed, the interface performs type checks when binding values to object attributes. The type is specified in the property definition, as outlined earlier.
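Such a type check on assignment can be sketched with a data descriptor. The persistent class below is a simplified, hypothetical stand-in written for this example (it does not touch a database), and the exact error raised by the prototype is not specified in this chapter.

```python
class persistent:
    """Hypothetical stand-in for a descriptor-type persistent property."""

    def __init__(self, doc, type_, default):
        self.__doc__ = doc
        self.type = type_
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        # Reject values whose type does not match the declared property type.
        if not isinstance(value, self.type):
            raise TypeError(
                f"{self.name} expects {self.type.__name__}, "
                f"got {type(value).__name__}"
            )
        instance.__dict__[self.name] = value

class Address:
    street = persistent('The street', str, '')
    number = persistent('The house number', int, 0)

addr = Address()
addr.number = 12            # accepted: an int for an int property
try:
    addr.number = 'twelve'  # rejected: a str for an int property
except TypeError as exc:
    error = str(exc)
```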

Queries

In order to represent selections in native Python code, we have defined a notation that is based on the idea that a class is in some sense equivalent to the set of all its instances. To illustrate the concept, let us give a few examples.

Given a persistent class X with persistent property y, then the expression

X.y == 5

represents the set of all instances x of X, or subclasses of X, for which x.y==5 is true. To obtain these objects the expression needs to be evaluated, which can be done by passing it to the select function, which returns a list of objects satisfying the selection.

Given a class X with a descriptor desc, a descriptor list dsc_lst, and a link lnk, then

select(X.desc > 2.0 and X.dsc_lst[2]=='abc' and X.lnk.attr == 5)

will return a list of instances x of X, or subclasses of X, for which

x.desc > 2.0 and x.dsc_lst[2]=='abc' and x.lnk.attr == 5

is true.
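A notation of this kind can be built with operator overloading: comparing a class-level attribute with a value yields an object describing a condition instead of a boolean. The sketch below assumes that approach; Column and Condition are hypothetical names, and it combines conditions with `&` and `|` because Python does not allow the `and` and `or` keywords to be overloaded. How the prototype combines conditions internally is not described here.

```python
class Condition:
    """A fragment of an SQL WHERE clause plus its bound parameters."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params

    def __and__(self, other):
        return Condition(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other):
        return Condition(f"({self.sql} OR {other.sql})", self.params + other.params)

class Column:
    """Class-level attribute whose comparisons produce Condition objects."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition(f"{self.name} = ?", [value])

    def __gt__(self, value):
        return Condition(f"{self.name} > ?", [value])

class X:
    y = Column("y")
    z = Column("z")

# Parentheses are required because & binds more tightly than == and >.
cond = (X.y == 5) & (X.z > 2.0)
where_clause = cond.sql
parameters = cond.params
```

A select function could then place where_clause after `SELECT ... WHERE` and pass parameters as bound values, repeating the query for each subtype of X.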

Functionality not addressed by the interface

New persistent objects may have an owner. The owner can be defined as the user running the process in which the persistent object is created, or it can be defined as an attribute of the persistent object. In either case, it is the responsibility of the implementation of the interface for a certain database to handle ownership of persistent objects.

[1] This implementation neatly avoids the problem of having to maintain both the class hierarchy and the corresponding database schema.