Syntax for a Relation class

How do we get to B from A?

  • x= R.first( A )
  • x= R.second( A )
  • I don’t think this is useful
  • Other

0 voters


I suggested a Relation class to enable two-way lookups on mapping objects, that if not removing redundancy, at least encapsulates it; and possibly tons of other searches and optimizations. However, I can’t even get off the ground.

I want to know what your intuition says about the syntax. If we declare our relation and one member in it like this:

R = Relation( )
R( A, B )

Then how do we get to B from A?

x= R.first( A ) 


x= R.second( A )

? In the first case, we are saying, ‘where the first field is A’; but in the second, we are saying, ‘get the second of A’. In both cases, the corresponding lookup on B to get to A would be the reverse.

Well, this had been inspired by some experiences in Panda.


Can anyone recommend a better venue for pursuing this?

(Also, nudge.)

So you have to keep R around to know whats the relation?
Thats sound bad. Why not just keep 2 dicts?

R is this case would “be two dicts”. It would want to do many other things, so I need to think of some and give them a syntax.

Why would you keep two dicts around when you could just keep R?

i would do it this way then:

class dictmap(dict):
    def __setitem__(self,a,b):

r = dictmap()

r["A"] = "B"
r["C"] = "D"

print r["A"]
print r["B"]
print r["D"]

That is the correct structure, but it requires a lot of repeated code, which it not obvious can be generated by a program.

The subject of this post is approximately, “How do we design the getitem method?”, assuming we have agreed on your structure so far.

For example, the Relation class would permit a many-to-one member.

r[ A ] = B
r[ C ] = B

print r[ B ]

For the many-to-many case, we would want values in both dicts to be sets, populated with one element when the key is added first, and removed when the last mapping for the key is removed.

For the ternary relation case, we would generally map each index to a tuple corresponding to each recordset, then return the tuple or a slice when queried. This gives us a more general form.

R = Relation( )
R( A, B, C )

These will return the entire recordset, that is, participant in the relation.

x= R[ 0 ][ A ]
x= R[ 1 ][ B ]

They are equivalent to selecting ‘WHERE column0 = A’ and ‘WHERE column1 = B’; and they return their entire tuple. To get C, we would use

x= R[ 0 ][ A ][ 2 ]

or ‘SELECT column2 FROM r WHERE column0 is A’. This gets us a list with one element, so if we know we only have one return, we can declare that fact about the second field when we declare R, or just use the ‘.one’ descriptor or the ‘[ 0 ]’ index.

x= R[ 0 ][ A ][ 2 ].one
x= R[ 0 ][ A ][ 2 ][ 0 ]

My idea here is that we want to be able to use Python equality. I also want rich comparison, which means that some of the mapping types must be sequences, maxing-out at log-N search and keeping sorted order. Ideally, we would keep a hash and a tree for every field, attempting hash lookups first on equality, defaulting to b-tree lookups for failed hash tests and non-equality comparisons, as well as distinguish between equality lookups and identity selections and lookups. For completeness, we could include the unsorted unhashed collection too: linear. However, if we are testing anything but identity, we can’t use getitem syntax; we’ll have to call a method. I also want to fulfill the native mapping interface and additionally the sequence interface for an additional field option.

We have not addressed the issue of return type. Where it’s known that one field is necessarily unique, we want to return an item, not a sequence; and a sequence otherwise.

Let’s look at a generic parent-children structure. We’re not assuming there’s only one graph; it could be disjoint while still participating in R. We want to define elements like this:

class Something:
    def parent( self ):
        return self.R.where child is self

We could build the relation into the class, as with a subclass,

class SomethingSpecific( Something ):
    R = Relation( )

so ‘self.R’ would succeed (even though R is in the child class!), or use an initializer.

class Something:
    def __init__( self, R ).

My favorite slick idea is to use a string to execute our where clauses, and compile it using the runtime language service. It has to be a valid Python string though we get to do anything we want with identifiers. Or we could eval it in a custom context. Basically we’re freed from the And( Or( ), Or( ) ) construct; we get ‘( a or b ) and ( c or d )’.

For an example of a ternary relation, we can look at the parent-children tree, adding order is important among children.

Other examples are an ordered dict (3-way with sequence), a thread-to-lock association (many-many), xml parent-children, and Panda’s scene graph (3-way, with path IDs).

The big problem I keep hitting, is that we’ll be storing mutable values; unless the only additional types we want to add are tuples and frozensets.

In the bisect module, values can be mutable but a list can be maintained in sorted order, relying on merely the discretion of the programmer for structural integrity; so our project isn’t without precedent. In ‘bisect’, if an item’s value changes, you ought to remove and re-insert it. Is this sufficient for a relational class, or ought we to provide an ‘is_mutating’ operation?

In another issue, I discovered that fields will need to know their index in the recordset (tuple), if they are to be “managed”, such as in an ordered membership. Further, if the entire table isn’t ordered together, the field will need subsets, determined by other fields, which starts to sound most precarious. Lastly, an ordered membership (roughly the odict in newer Python), can have either constant-time read or constant-time write, but not both; corresponding to an array and a linked-list respectively.

However, the syntax does appear to turn out to be about as simple as I wanted:

    # declare your own relation
    TreeRelation= Relation(
        ( 'parent', HashedField( ) ),
        ( 'child', HashedField( ) ),
        ( 'position', MultiSequenceField( operator.itemgetter( 0, 1 ) ) )

    # declare a participant with descriptor access
    class TreeNode: #directed graph
        def parent( self ):
            #return self.R.where child= self
            a= self.R.fields[ 1 ][ 1 ].hashed.get( self, None )
            if a is not None:
                a= a[ 0 ][ 0 ]
            return a

    class TreeOne( TreeNode ): #*is* a relation, or *has* a relation?
        R= TreeRelation # will need a factory for >1

    nodes= [ TreeOne( ) ] # first and top entry

    for _ in range( 100 ): # random additions
        parent= random.choice( nodes )
        child= TreeOne( )
        TreeRelation( parent, child, -1 )
        nodes.append( child )

    for x in nodes: # print!
        print x.parent

Basically, you get to leave your cyclic references in your locker. Our local ‘nodes’ collection may be superfluous as well.

The problem is the “about”.

Where we run into the queries itself, is:

    class TreeNode: #directed graph
        def parent( self ):
            #return self.R.where child= self
            a= self.R.fields[ 1 ][ 1 ].hashed.get( self, None )
            if a is not None:
                a= a[ 0 ][ 0 ]
            return a
        def children( self ):
            #return self.R.where parent= self
            a= self.R.fields[ 0 ][ 1 ].hashed.get( self, [ ] )
            a= [ x[ 1 ] for x in a ]
            return a

Here, as you can see, ‘TreeNode’'s methods are relying on the details of the implementation of Relation, which aren’t shown but can be. Of course they shouldn’t rely on them, but what is the proper encapsulation?