Dominic Williams

Occasionally useful posts about RIAs, Web scale computing & miscellanea

Locking and transactions over Cassandra using Cages

Introduction

Anyone following my occasional posts will know that my team and I are working on a new kids game / social network called http://www.FightMyMonster.com. We are trying to break new ground with this project in many ways, and to support the data-intensive nature of what we are trying to do, we eventually selected the Cassandra database after working with several others.

This post is about a library we are using with Cassandra called Cages. Using Cages, you can perform much more advanced data manipulation and storage operations over a Cassandra database. This post explains why that matters and gives some more information.

You can find Cages here http://cages.googlecode.com.

Brief Background

For those that aren’t already familiar with Cassandra (skip this if you are), it can be described as the best representative of a new breed of fast, easily scalable databases. Write operations are evenly spread across a cluster of machines, removing the bottleneck found in traditional SQL database clusters, and it can continue operating even when some nodes are lost or partitioned. The cluster is symmetric in the sense there is no master node, nodes communicate with each other using a P2P protocol and can be easily added and removed by an administrator.

In order to deliver these characteristics, which are particularly valuable to Web 2.0 enterprises but will likely prove useful in other industries too, Cassandra offers what is known as a NoSQL model. This model is significantly different to a traditional SQL model, and many coming from more traditional database backgrounds will more easily understand Cassandra as a highly scalable, highly resilient distributed structured storage engine. While NoSQL offers some unique advantages to developers when compared to SQL, it is also the case that whereas in SQL a complex operation can be specified in a single statement that either executes completely or not at all (i.e. it has ACID properties), in Cassandra a complex operation must usually be composed of several different operations, which can only be made reliable individually.

What is Cages for?

In many cases, websites and systems can be built against Cassandra without regard to ACID issues. Data storage and manipulation can be limited to operations against single rows (and for those that don’t know, rows in NoSQL models are really like multi-level hash tables which can contain hierarchical “ready-joined” data, and generally offer many more possibilities than SQL rows). Where a mutation of these rows must be reliable, or immediately seen by other clients of the database, Cassandra allows the developer to choose from a range of consistency levels that specify the tradeoff between performance, safety of storage and the timeliness with which data becomes visible to all.
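
For illustration, here is a minimal sketch of choosing a consistency level per operation against the raw Cassandra Thrift interface of the 0.6 era (the keyspace, column family and key names are invented, and a local node is assumed to be listening on the default Thrift port):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ConsistencyLevelSketch {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Illustrative column family and key names
            ColumnPath balance = new ColumnPath("Accounts");
            balance.setColumn("balance".getBytes("UTF-8"));

            // ANY: return as soon as any node (even via hinted handoff) has taken the write
            client.insert("Bank", "account-42", balance, "100".getBytes("UTF-8"),
                    System.currentTimeMillis(), ConsistencyLevel.ANY);

            // QUORUM: wait for a majority of replicas, so the read reflects the
            // latest quorum-committed value
            client.get("Bank", "account-42", balance, ConsistencyLevel.QUORUM);

            transport.close();
        }
    }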

This system is undeniably very effective, but when the systems you are building involve complex data structures and manipulation, you can still quickly reach situations where your logical operations necessarily involve several individual Cassandra read and write operations across multiple rows. Cassandra does not get involved in managing the safety and reliability of operations at the higher logical level, which means guaranteeing the logical consistency of your data can require some extra work. Some people, particularly those wedded to SQL databases, advocate storing some parts of your data in traditional SQL databases. For us though, it is most definitely preferable to develop and use Cages!

What is Cages?

Cages is a new Java library that provides distributed synchronization functionality, and soon additional functionality for things like transactions, by using the services of a ZooKeeper server or cluster. ZooKeeper is a very active project and the system is currently widely used. It started life as a Yahoo Research project, see here http://research.yahoo.com/project/1849 and is now an important Apache project, see http://hadoop.apache.org/zookeeper/. Cages has wide application, but its development will be very much driven by needs in relation to Cassandra.

Using Cages for locking

Cages offers three locking classes, ZkReadLock, ZkWriteLock and ZkMultiLock.

Single path locking

The simplest application of Cages can be to enforce correct updates on data values inside Cassandra (or some other NoSQL database). For example, you may have issues with that old chestnut, the Lost Update Problem. This happens where you read the data with one operation, modify the data and then write it back with a second operation. Problems occur when another client performs the same operation simultaneously, such that the last client to write back the modified value will overwrite the modifications made by the other.

In its simplest form, two clients wish to donate some money to a bank balance. Both simultaneously read the same bank balance value B1. The first client adds donation D1, and writes back (B1 + D1). The second client adds donation D2, and writes back (B1 + D2). Unfortunately the final balance is B1 + D2, and donation D1 has been lost.

Cages provides an easy fix:

    void depositMoney(int amount) {
        ZkWriteLock lock = new ZkWriteLock("/accounts/" + accountId + "/balance");
        lock.acquire();
        try {
            // 1. Read the balance
            // 2. Update the balance
            // 3. Write the balance back
        } finally {
            lock.release();
        }
    }

Note that the paths passed to the lock classes can represent actual data paths within a NoSQL model, or can simply represent logical control over parts of a wider data model (so long as your application faithfully adheres to the rules you set).
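
For example, here is a minimal sketch of a purely logical, coarse-grained lock path in the same style as the example above (the “/Users/” prefix is an invented convention rather than a real Cassandra path):

    void updateUserProfileAndStats(String userId) {
        // "/Users/<id>" exists nowhere in Cassandra; it is simply a path that all of
        // our application code agrees to lock before touching any of the rows that
        // together make up this user's data.
        ZkWriteLock lock = new ZkWriteLock("/Users/" + userId);
        lock.acquire();
        try {
            // ... read and mutate the user's profile row, stats row, etc. ...
        } finally {
            lock.release();
        }
    }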

Multi path locking

The Lost Update Problem is the simplest locking scenario where Cages can be applied. In our case, while many parts of our system use Cassandra without locking at all, often with low consistency levels for maximum performance, there are several areas where we necessarily perform complex operations over contended data that involve numerous individual read and write operations. To begin with, we decided to handle these cases by nesting the ZkReadLock and ZkWriteLock single path locking primitives. However, there is a problem doing this in a distributed environment.

It is a simple fact that in a distributed environment, many situations where you acquire single path locks in a nested manner can result in deadlock. For example, if one operation sequentially tries to acquire R(P1) then W(P2), and a second operation simultaneously tries to acquire R(P2) then W(P1), deadlock will likely result: the first operation will acquire R(P1) and the second operation will acquire R(P2), but then the first operation will block waiting to acquire W(P2) and the second operation will block waiting to acquire W(P1).
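
Following the pseudocode style of the earlier example, here is a sketch of the nesting pattern just described (the paths P1 and P2 are placeholders, and the two methods are assumed to run on different clients at the same time):

    // Deadlock-prone nesting, shown only to illustrate the hazard: each method can
    // acquire its outer read lock and then block forever waiting for the other's
    // write lock.
    void operationA() {
        ZkReadLock outer = new ZkReadLock("/P1");
        outer.acquire();
        try {
            ZkWriteLock inner = new ZkWriteLock("/P2");
            inner.acquire();    // blocks while another client holds R(P2)
            try {
                // ... read P1, mutate P2 ...
            } finally {
                inner.release();
            }
        } finally {
            outer.release();
        }
    }

    void operationB() {
        ZkReadLock outer = new ZkReadLock("/P2");
        outer.acquire();
        try {
            ZkWriteLock inner = new ZkWriteLock("/P1");
            inner.acquire();    // blocks while another client holds R(P1)
            try {
                // ... read P2, mutate P1 ...
            } finally {
                inner.release();
            }
        } finally {
            outer.release();
        }
    }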

Avoiding these problems with single path locks is no simple matter. For a start, the logical detection of closed wait graphs (deadlock) in the distributed environment is difficult and expensive to perform. The simplest approach to solving the problem is to try to acquire locks with a timeout, such that if you get into a deadlock situation, your acquire() calls throw an exception and you abandon your attempt. The problem here though is that your code has to handle the exception, and possibly roll back parts of the operation performed earlier under the protection of the outer lock.

For all these reasons, when an operation needs to acquire locks over multiple paths in the distributed environment, ZkMultiLock is the class to use.

ZkMultiLock allows you to specify any number of read and write locks over paths, which may then all be acquired “simultaneously”. If your operation can acquire all the locks it needs together at the outset using ZkMultiLock, this avoids any possibility of deadlock. This provides slightly worse performance where multiple paths are specified and locks on the paths are highly contended. But in practice, locks are rarely that highly contended, and you just need to guard against the disaster of simultaneously running operations interfering with each other and corrupting data. Because of the dangers of deadlock, in the Fight My Monster project we have mandated that only ZkMultiLock can be used unless there are very special reasons, a situation we have not yet encountered.

    void executeTrade(long lotId, String sellerId, String buyerId) {
        // In the following we need to hold write locks over both the seller and buyer's account balances
        // so they can be checked and updated correctly. We also want a lock over the lot, since the value
        // of lots owned might be used in conjunction with the bank balance by code considering the
        // total worth of the owner. Acquiring the required locks simultaneously using ZkMultiLock avoids
        // the possibility of accidental deadlock occurring between this client code and other client code
        // contending access to the same data / lock paths.
        ZkMultiLock mlock = new ZkMultiLock();
        mlock.addWriteLock("/Bank/accounts/" + sellerId);
        mlock.addWriteLock("/Bank/accounts/" + buyerId);
        mlock.addWriteLock("/Warehouse/" + lotId);
        mlock.acquire();
        try {
            // 1. check buyer has sufficient funds
            // 2. debit buyer's account
            // 3. credit seller's account
            // 4. change ownership of goods
        } finally {
             mlock.release();
        }
    }

Transactions for Cassandra

Transactions are a planned feature at the time of writing (May 12, 2010). It shouldn’t be too long before they make it into the library, so I will explain a bit about them here.

Locking allows you to synchronize sequences of read and write (mutation) operations across rows stored on your Cassandra cluster. However, the locking classes do not solve the problem that occurs when part way through a complex operation your client machine expires, leaving the data inside the distributed database in a logically inconsistent state. For many applications the likelihood of this occurring is low enough for the locking classes alone to be sufficient. But there may be a small number of operations within applications for which data simply must be logically consistent, and even a very rare failure is unacceptable. This is where transactions come in.

For those that are interested, the following explains how they will work.

A new ZkTransaction class will provide the functionality, and it will need to be used in conjunction with the ZkMultiLock class. ZkTransaction will provide a simplified version of the Cassandra Thrift API that allows a series of data mutation operations to be specified. Client operations will proceed by first specifying the necessary locks that must be held, and then specifying the set of data mutations that must be performed by the transaction. When the transaction has been specified, its commit() method must be called passing the ZkMultiLock instance as a parameter.

At this point, internally Cages will add a reference to a transaction node created on ZooKeeper from each single path lock node held. The ZkTransaction instance reads from Cassandra the current values of the data it is required to modify, and writes them into the transaction node as a “before” state. Once this is done, it sets about applying the data mutations specified in the necessary sequence of individual Cassandra read and write (mutate) operations. Once all operations are performed, the references to the transaction node from within the locks are removed, and then finally the transaction node itself is deleted – the transaction has now been committed, and the developer can release() the ZkMultiLock.

ZkTransaction can provide a guarantee of consistency for Cages clients because if, during the execution of the sequence of individual Cassandra mutation operations, the client machine suddenly dies, Cages will immediately revoke the locks the client holds. From this point any instances of ZkReadLock, ZkWriteLock or ZkMultiLock wishing to acquire the released paths must first roll back the transaction by returning the relevant data to its original “before” state. The key point is that any processes that need to see the data in a logically consistent state, and therefore always acquire locks referencing the data in question before accessing it, will always see it as such. This provides a form of ACID for complex operations against a Cassandra database.

    void executeTrade(long lotId, String sellerId, String buyerId) {
        ZkMultiLock mlock = new ZkMultiLock();
        mlock.addWriteLock("/Bank/accounts/" + sellerId);
        mlock.addWriteLock("/Bank/accounts/" + buyerId);
        mlock.addWriteLock("/Warehouse/" + lotId);
        mlock.acquire();
        try {
            // 1. check that buyer has sufficient funds
            // ....

            // 2. perform mutations using transaction object
            ZkTransaction transaction = new ZkTransaction(NoSQL.Cassandra);
            transaction.begin(mlock);
            try {
                // 2. debit buyer's account
                transaction.insert(buyerId, "accounts", bytes("balance"), bytes(newBalance));
                // 3. credit seller's account
                // ...
                // 4. change ownership of goods
                // ...
            } finally {
                transaction.commit();
            }
        } finally {
             mlock.release();
        }
    }

Scalability and hashing ZooKeeper clusters

It is worth saying first off that a three node ZooKeeper cluster using powerful machines should be able to handle a considerable workload, and that where locking and transactions are used only on an as-needed basis, such a setup will be able to provide for the needs of many Internet scale applications. However, it is easy to conceive of Cassandra being applied more widely outside of typical Web 2.0 norms where usage of locking and transactions is much heavier, and therefore the scalability of ZooKeeper must be examined.

The main issue is that for the purposes described it is not desirable to scale ZooKeeper clusters beyond three nodes. The reason for this is that while adding nodes scales up read performance, write performance actually starts degrading because of the need to synchronize write operations across all members, and therefore clustering really offers availability rather than performance. A good overview of the actual performance parameters can be found here http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperOver.html. The question, then, is what to do when ZooKeeper becomes a bottleneck.

The solution we suggest is simply to run more than one ZooKeeper cluster for the purposes of locking and transactions, and to hash locks and transactions onto particular clusters. This will be the final feature added to Cages.
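
Since this feature has not been written yet, the following is only a sketch, under assumptions of my own, of how lock paths (or the concatenated paths of a transaction) might be hashed onto a fixed set of ZooKeeper connection strings; the class and method names are invented:

    import java.util.List;

    // Hypothetical sketch only: not part of the current Cages API.
    public class ZkClusterSelector {
        private final List<String> connectStrings;   // e.g. "zk1a:2181,zk1b:2181,zk1c:2181"

        public ZkClusterSelector(List<String> connectStrings) {
            this.connectStrings = connectStrings;
        }

        /** Deterministically map a lock path to one of the ZooKeeper clusters. */
        public String clusterFor(String lockPath) {
            // Mask the sign bit so the bucket index is always non-negative
            int bucket = (lockPath.hashCode() & 0x7fffffff) % connectStrings.size();
            return connectStrings.get(bucket);
        }
    }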

Note: since I wrote the above Eric Hauser kindly drew my attention to the new “Observers” feature in ZooKeeper 3.3. This may greatly raise the limit at which hashing to separate 3 node clusters becomes necessary. I am hoping to collate performance information and tests in the near future so people have more of an idea what to expect. See http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html

That’s it. Hope it was interesting. Please bear with me as Cages develops further over the coming weeks and feel free to test and report.

Final note

Check out the comments too because there are already several useful clarifications and expansions there.

Written by dominicwilliams

May 12, 2010 at 10:10 pm

43 Responses

  1. The zookeeper docs you point to clearly indicate that 5 node ZK clusters offer the best performance at high read-rate – once you hit ~75% read rate, the 5 node cluster substantially outperforms the 3 node cluster.

    I fear in the end ZK performance will lower your TPS and increase the per-TX time to an unacceptable level. Using the same graph, it seems to indicate you will top out at about 45k ops/sec at a 50% read load. Also I believe that the individual TX execution time would be limited to the fastest a ZK operation can complete (I think about 2-10ms) * # of operations (realistically minimum 2).

    We’d all love to hear how performant your implementation is in production when you get it there!

    Good luck!

    ryan

    May 12, 2010 at 11:27 pm

    • Hi Ryan,

      Thanks for the post. Will fill you in on our thinking in relation to performance.

      First of all we definitely don’t advocate using Cages for everything. In fact, we think one of the main lessons of NoSQL is that you can get too obsessive about locking and transactions. So in our current project wherever possible we try to design things so that locking isn’t necessary. We also use the lowest levels of consistency possible to maximize performance, and often use ZERO and ANY.

      I think that it’s worth noting that in the past people have often overestimated the real commercial cost associated with the risk of a client machine failing halfway through a complex data manipulation. Yes, this is bad, but it only happens very rarely, and if you put transactions on everything, and your systems regularly slow to a crawl upsetting customers, and you have to spend money on serious hardware, that’s a much bigger cost. As I understand it, eBay dropped transactions from their SQL stuff many years ago for that reason.

      That being said, I reckon most complex projects will have a few areas where locking really is necessary. In our case, we absolutely need locking to avoid regular problems in some of the more complex stuff we do. Our attitude towards transactions is that we don’t think client machines (application layer servers) will be failing regularly enough to make them a commercial priority, but I want to get them added to Cages anyway for completeness.

      Going back to performance, it is of course possible that some very large scale systems would hit the limits of their 3 node ZooKeeper cluster if they are not as conservative as they should be in their use of locking (and later transactions).

      The solution however is reasonably simple – you add new ZooKeeper clusters, and hash lock paths onto them. With this you can have horizontally scalable locking and transactions.

      Cages transaction system will only be slightly more complex, in that references to transaction nodes from within lock nodes must reference the particular cluster on which the transaction resides.

      Quick note about performance: we need to get some proper benchmarks done but we are working to the assumption that a single ZooKeeper cluster on standard hardware should be able to do about 15K ops per second. Doesn’t sound too good, right? But remember that locking will only be applied to a small minority of our most important database operations. In our case if we are doing 15K of those a second, we really will be doing very well (so well in fact we could even hire guys to work full time on ZooKeeper!)

      dominicwilliams

      May 13, 2010 at 11:25 am

  2. Sounds good to me, except I’m unclear on one thing with regards to hashing a transaction to a particular zookeeper cluster. What would you hash to select the zk cluster? If you hash the individual row keys, they could end up on different clusters. If you hash the txn id, then the individual rows could get involved in multiple transactions at the same time on different zk clusters.

    Or maybe i haven’t thought that through enough.

    Matt Corgan

    May 14, 2010 at 1:02 am

    • Hi Matt, I’ve just added some pseudo code to show more clearly how the transactions will work.

      The first point is that a ZkTransaction protects logical consistency for clients synchronizing data access using any of the paths held by the outer ZkMultiLock. The paths that the multi-lock acquires together are abstract – that is, they may correspond to actual paths inside Cassandra, and often will, but they can just as well correspond to abstractions within your application. For example, an application might decide that it wants to work in a very coarse grained way, and that code wanting to synchronize any logical operation over the data pertaining to a user should acquire a path like “/Users/DomW”.

      When you acquire the outer multi-lock, it is in fact internally acquiring single path locks for each path i.e. ZkReadLock or ZkWriteLock, albeit in a special way. These single path locks decide on which ZooKeeper cluster their nodes belong by hashing their paths. So for example, a ZKReadLock over “/Users/DomW” might be hashed to cluster A, while a ZkReadLock over “/Users/MattC” might be hashed to cluster B.

      Once you have acquired your multi-lock, you can then create the transaction. Just like the single path locks held by the multi-lock, ZkTransaction also needs to create a node to hold the “before” state of the data, and it simply uses a hash of a concatenation of the paths in the multi-lock to find its cluster. We can do this because we never need to “lookup” a transaction, since a transaction places references to itself inside the lock nodes of the single path locks.

      If for some reason the transaction does not complete, the first future lock over any of the paths held by the original multi-lock the transaction was bound to will in fact take responsibility for rolling back the transaction.

      dominicwilliams

      May 14, 2010 at 10:07 am

  3. Have you seen the observers feature in 3.3?

    http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html

    It would allow you to scale beyond 3 machines before going to multiple ensembles.

    Eric Hauser

    May 14, 2010 at 5:04 pm

  4. Is the source for Cages available?

    Jonathan Ellis

    May 15, 2010 at 2:54 pm

  5. … Ah, it’s linked in the sidebar there. Might want to include it in the article text for clarity.

    Jonathan Ellis

    May 15, 2010 at 2:58 pm

  6. Is Cages open source? If so, where can I find it? Was thinking of implementing something like this for a few months now to manage updates across a UI cluster and API cluster which hit the same store.

    HP

    May 16, 2010 at 8:19 pm

  7. Why not just use a hybrid system: a database for transactional stuff and Cassandra for non-transactional stuff?

    Mike perham

    May 21, 2010 at 4:09 am

    • Hi some people have suggested and are even doing just that, but there are a number of issues e.g. (i) you now need to maintain multiple database systems (ii) your data access code is going to be a mixture of SQL and NoSQL (iii) you need to coordinate all this, and (iv) perhaps most importantly of all, if the operations you want to synchronize involve high write volume, you may still hit the SQL writes bottleneck. I believe that properly implemented distributed synchronization for NoSQL will allow for horizontally scalable synchronized writes and transactions, although for the moment this remains to be proven. My current project involves quite a lot of synchronization and I’m hoping it will go some way to showing how it can work in practice.

      dominicwilliams

      May 21, 2010 at 9:58 am

  8. mm very interesting; and a pleasant read too. Thanks!

    DrTune

    June 8, 2010 at 10:39 am

  9. Hi Dominic, very interesting post.
    One thing that is not clear to me is the transactional aspect.
    When you mention transactions (TX) and rolling back a TX, does that include rolling back the data in Cassandra?

    Your words were: …”From this point any instances of ZkReadLock, ZkWriteLock or ZkMultiLock wishing to acquire the released paths must first rollback the transaction node by returning the relevant data to its original “before” state specified.”….

    What is not clear to me is the following:
    In a TX that involves inserting data in multiple keys, it might happen that only a few rows were updated and some others were not. Thus, the TX failed.
    Later, another client goes and reads those rows, and it turns out that their states correspond to an incomplete previous TX.

    Would you mind elaborating a bit more how TX works?

    Thanks a lot. The quality of the post is excellent.

    Patricio E.

    June 13, 2010 at 10:06 pm

    • Hi, the point here is that unlike in a traditional SQL database, locking, transactions and their observance are optional and performed in the application layer. An application can use Cages to define some locks, and create transactions associated with them which store a “before” state. Other application code that observes the locks according to the scheme described will only ever “see” the database in a consistent state i.e. either where the whole transaction is applied, or not at all. It is important to realize that your Cassandra clients have to agree to use Cages locks to access the data concerned, or it does not work. Otherwise there could even be negative interference – for example, you could lock some data using Cages, begin a transaction, and then part way through fail such that a future waiter on one of your locks will start rolling back. But if in the meantime another client not following the scheme is mutating the data in question without first acquiring the lock your scheme requires, then that data might be overwritten and lost as the transaction is rolled back. Hope that makes sense.

      dominicwilliams

      June 14, 2010 at 12:37 pm

  10. I’ve been doing some work with ZK and Cassandra and the scenario that worries me (related to needing write locks ) is a sort of split brain scenario.

    Specifically: you often see a pattern like: get lock, do stuff, release lock, etc. (of which you show a variant above)…

    If you take a long time to “Do stuff” after the test, you expose yourself to some scenarios where you’ve actually lost the lock, someone else got it, and now you have two writers.

    Rare, I know.

    What I have done is added a bit of backoff time to my lock protocol implementation. The data on the ZkNode has an “Owner/bit/string/etc”. If the owner dies while that is set, then the next owner will insert a bit of sleep before proceeding and setting its own bit/etc on the zkNode. If the owner cleans up successfully before releasing, then the bit is cleared, and the next owner can proceed without delay. Size your backoff according to your paranoia.

    anonymous

    July 5, 2010 at 11:07 pm

  11. I have started to use Cages in our system, and I have run into a problem. When I invoke lock(), it creates a path that persists in ZooKeeper; after I invoke unlock(), the path is not removed. The next time I lock the same path, a zNode-already-exists exception occurs on the ZooKeeper server. I think Cages does not check whether the zNode exists. Is there any way to avoid this?

    caroline

    July 9, 2010 at 7:49 am

    • Hi Caroline, it is currently a feature of ZooKeeper that you cannot have transient parent nodes i.e. nodes that are automatically deleted when the last child node is deleted. This is not necessarily a big issue – periodically the “dead” lock nodes could be deleted during downtime if so many accumulated that your disk was running short. It is a little ugly and we have been in discussions on the ZooKeeper list about changing this, although other pressures have delayed implementation.
      Nonetheless, you should always be able to acquire and release the same lock path! The code should be handling the case where a node already exists etc… I suspect there may be some other problem at play here, but anyway I’ve asked a developer to check it out (I’m currently on holiday). Maybe something went wrong. Some news soon.
      Thanks, Dominic

      dominicwilliams

      July 12, 2010 at 12:44 pm

      • Hi, I’m on holiday but apparently it all works fine according to tests – in our Fight My Monster project we regularly lock/unlock paths so something would have showed (and a developer also just double checked), so it sounds like something else may have gone awry. Hope this helps.

        dominicwilliams

        July 12, 2010 at 1:02 pm

  12. How does cages (or zookeeper) handle the scenario where you grab a lock, the process dies or hangs, and now the lock is orphaned. does cages or zookeeper have a configurable lock timeout? thx!

    btoddb

    August 2, 2010 at 6:41 pm

    • Hi this comes out of the box thanks to the design of ZooKeeper – if your process dies it will lose its ZooKeeper session, which will mean the “temporary” sub-nodes of all locks created by Cages will disappear, allowing other processes to acquire those locks
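
      To illustrate the underlying mechanism, here is a minimal sketch against the raw ZooKeeper API (the node paths are invented and this is not Cages’ actual internal node layout; the parent path is assumed to already exist):

          import org.apache.zookeeper.CreateMode;
          import org.apache.zookeeper.ZooDefs.Ids;
          import org.apache.zookeeper.ZooKeeper;

          public class EphemeralLockNodeDemo {
              public static void main(String[] args) throws Exception {
                  ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> { });   // no-op watcher
                  // An ephemeral, sequential child marks this client's place in a lock queue.
                  // If the client process dies, its session expires and ZooKeeper deletes the
                  // node automatically, letting other waiters go on to acquire the lock.
                  String node = zk.create("/locks/demo/write-", new byte[0],
                          Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
                  System.out.println("Created " + node);   // e.g. /locks/demo/write-0000000007
                  zk.close();                              // session ends: the node disappears
              }
          }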

      dominicwilliams

      August 2, 2010 at 7:36 pm

  13. Hi Dominic,
    Could Cages be used like a Barrier for temporary locking during failover? We have a Leader/Follower architecture where the Leader partitions the data sets all nodes will process. When a node is added or removed, we need to temporarily halt processing on our cluster while the partitioning occurs. Read locks on followers with a write lock on the leader seem like they would work at first glance. Is this the case?

    Todd Nine

    August 26, 2010 at 9:51 pm

  14. I’ve been playing with the cages code, mainly for write locks, and had a similar question to one asked above.

    If a client has lost a lock due to session timeout, I would expect the lock.release() to fail in some fashion. It doesn’t, so I’m wondering why this might be OK. I guess the release point might be too late to revert whatever activities the client was performing, but an exception might allow logging information about potential lost updates.

    aebaugh

    October 26, 2010 at 10:42 pm

    • Hi, first of all, a general request: please make sure you are using 0.7.0-SNAPSHOT from github. This includes a very important bug fix to ZkMultiPathLock. Also we would recommend that you make programmers use ZkMultiPathLock for everything, because it prevents distributed deadlocks. We now have cages in production on http://www.FightMyMonster.com, where it has been getting a real thrashing at peak times i.e. weekends and in the hours after school (UK time, as only launched in UK so far) and is working fine.

      Back to your question.. yes I can see the advantages of throwing from release(). My only worry is that release() might be happening before other stuff inside a finally clause, which would then be disrupted… maybe the ZkBaseLock could detect failure in release() and log the stack trace to SLF4J, but not throw.. what do you think?

      dominicwilliams

      November 18, 2010 at 10:22 am

  15. We needed locking behavior similar to the ZkWriteLock, but different enough that we created a new ZkSyncPrimitive subclass.
    In this case we didn’t want to queue on acquiring an async lock, we wanted the lock path to be ephemeral, and wanted to avoid the call to create the base/persistent path when the lock object was created.
    Wouldn’t mind sharing it back if this sounds useful to others.

    aebaugh

    November 17, 2010 at 10:25 pm

  16. I have just started using Cages and need to use ZkTransaction. I found that the current source on github does not include “transaction” support. Could you please let me know where I can get the updated version of Cages?

    Baskar Duraikannu

    April 3, 2011 at 12:13 am

    • Hi, the current version of Cages doesn’t support commit/rollback transactions yet. This post explains how that functionality can be added, but for the moment it is a distributed locking system. In our case (FightMyMonster.com) we can do without transactions. We write to the Cassandra database using Pelops, which has a very low failure rate because operations are retried, and the database itself can lose nodes but continue operation. Since the “nuggets” we write are virtual, we can take this risk – which anyway is very low and partial operation completion is a problem we have never detected. For a “real” bank though transactions would be needed, and I hope someone in the community can justify adding them to Cages.

      dominicwilliams

      April 4, 2011 at 8:24 am

  17. Hi Dominic, do you guys have any Base class support to be able to run ZooKeeper in memory for testing purposes in a similar way to what cassandraDaemon does?

    Thanks

    Patricio

    April 6, 2011 at 8:39 pm

    • Hi, I’m not linked to the ZooKeeper team, we just develop Cages which uses it. That said, it would be great to add some unit tests to Cages that use an in-memory version of ZooKeeper. This is what we do with Pelops (our Cassandra client, which loads a copy of the Cassandra database in memory for unit tests). This is open source of course, so if you can find a way to do this with ZooKeeper, any patches very welcome 😉

      dominicwilliams

      April 7, 2011 at 10:37 am

  18. Hi, yeah it’s clear you are not in the ZK team.

    I got it working last night. I’ll polish it a bit and will send you a pull request to make it part of cages for testing environment.

    Thanks

    Patricio Echague

    April 7, 2011 at 5:53 pm

  19. Please ignore my previous post. It was my mistake. I was acquiring twice a read lock, and of course it succeeded.

    Patricio

    April 24, 2011 at 10:00 pm

  20. Hi, William.
    I have two thoughts on Cages.
    Firstly, in order to prevent deadlock in concurrency control you take the MultiLock strategy of acquiring all the locks at once or none at all. However, I found the performance of this strategy to be poor (worse than MySQL NDB Cluster in my TPC-W benchmark), because clients repeatedly fail to acquire the multi-lock when many clients request the same lock at the same time. The exponential back-off algorithm is not suitable in this situation. Compared with that, I think maintaining the lock wait-for graph on the ZooKeeper side would be better, although it could still become a bottleneck in the system; that could be addressed by making the graph distributed.
    Secondly, in the implementation of transactions, rather than storing the snapshot of locked data in ZooKeeper, it could be better to store it in a dedicated keyspace in Cassandra for reasons of speed and reliability.

    wish you a happy life!

    kiddlee

    May 11, 2011 at 8:48 am

    • Hi, yup, the implementation of ZkMultiLock we have released uses the paranoid algorithm of trying to acquire all locks, and if any acquisition fails, releasing them all and retrying after BEB (binary exponential backoff). This will not work well if you have very high contention for specific locks – generally we don’t, and that’s why we have been using this. However, you will notice that ZkMultiLock sorts all its lock paths, and acquires them in sorted order. So long as all processes in your system use ZkMultiLock to acquire locks, it is perfectly safe for ZkMultiLock to ***wait*** to acquire each lock path i.e. you will never get distributed deadlock and you will not get performance issues when there is high contention for specific locks. What is possible is that the next release of Cages will make ZkReadLock and ZkWriteLock private classes, and ZkMultiLock will be updated to wait on paths. Since locks could then only be acquired via ZkMultiLock, deadlock will be impossible, and the issue you mention will be removed.

      dominicwilliams

      May 11, 2011 at 10:35 am

  21. Thanks for this post! Great job! I have a similar use case and am trying to use Cages. The API is quite simple, but do you plan to publish some more code samples (with ZooKeeper connection, some tests, …) and/or documentation? Do you plan to put Cages in a Maven repository?

    Jérémy

    May 11, 2011 at 2:02 pm

  22. So do you guys have an updated version of cages? Or have you really not had to touch the code since throwing it out on google code? Are you still using it?

    zanson

    September 22, 2011 at 6:33 pm

    • Hi Cages went through a few iterations before being uploaded. There are some improvements we’d like to make, but it works and it is what we are using in production right now

      dominicwilliams

      November 11, 2011 at 12:20 pm

  23. Hi, is the whole concept based on the assumption that access to Cassandra happens only via Cages? And through Cages you lock/unlock resources outside Cassandra (i.e. ZooKeeper resources) to indirectly lock Cassandra records?

    Thanks.

    A

    November 29, 2011 at 5:58 pm

    • Sometimes operations on your data sets need to be serialized, sometimes they don’t. Serialization/locking creates an additional performance overhead, so you should only use it if required by the data modification operation in question.

      dominicwilliams

      November 29, 2011 at 6:15 pm

  24. Thank you for this post and the simple explanation. We are using Cassandra and the bank update issue you mentioned is exactly the kind of problem we have. So with this I am very hopeful that the problem will be resolved.
    I want to add that we are very early users of Cassandra and we are very grateful to you for open-sourcing Cages and also for taking the time and effort to explain it, which helps early users fight these problems much faster than they usually could. Regards.

    Ramesh

    February 17, 2012 at 4:42 pm

