Experimental Cassandra Lock
·Intro
Cassandra is my favourite column oriented data store, I’ve decided to create a simple Cassandra lock system to be able to do synchronised updates on Cassandra, I hear you say there are several DLMs like Zookeeper or Memcached can be implemented as a simple locking system, but I like a solution purely based on Cassandra itself. After lots of surfing on the net I came up with nothing! absolutely nothing! well, I created one (better to say experimented).
Disclaimer
Disclaimer normally comes at the end of article but, I put it here because, this is PURELY EXPERIMENTAL IMPLEMENTATION AND THIS SOLUTION WAS NOT USED IN PRODUCTION, THIS IS JUST A CRAZY IDEA INTO ACTION. Actually I’m not sure it’ll work at all :-) - sigh - If you are looking for something that works please stop reading here and go back to Google otherwise you are welcome to test it and improve it.
Idea
The original idea is not mine and you can find it on Apache Cassandra web site and it uses Lamport’s Bakery locking algorithm. You can find more information about that in Wikipedia.
Lamport envisioned a bakery with a numbering machine at its entrance so each customer is given a unique number. Numbers increase by one as customers enter the store. A global counter displays the number of the customer that is currently being served. All other customers must wait in a queue until the baker finishes serving the current customer and the next number is displayed. When the customer is done shopping and has disposed of his or her number, the clerk increments the number, allowing the next customer to be served. That customer must draw another number from the numbering machine in order to shop again. – Wikipedia
Definition
I use two Column Families
, Numbers
and Choosing
. Whenever a process wants to enter a critical section
- first it must get a new number, in order to do that it must look into
Choosing
CF
fortrue
value regardless of column names to make sure that nobody is withdrawing new number at the moment. - Then informs other processes by puting its
PID
as column name (in actual code PID is optional and any otherunique ID
can be used) intoChoosing
CF
with value oftrue
, meaning that it’s going to get new number fromNumbers
CF
. - Then the process withdraws new number from
Numbers
CF
by fetching the last number which is registered inNumbers
CF
, and incrementing it by one. - Then puts that new number into
Numbers
CF
along with itsPID
as column name. - Releases the
Choosing
lock by changing thetrue
tofalse
then in its own column (identified byPID
). - At this point the process waits for all other processes to get their number again by looking into
Choosing
CF
. - When all process got their numbers it waits for process with lower numbers to leave critical section and remove their PID from Numbers CF.
- As we can not guarantee any two processes don’t get the same number, we use both chosen number and PID (unique id) to find the higher priority process to enter critical section.
- When the process finds that there is no one with higher priority in the list then it will enter the critical section.
- Lastly when a process wants to leave critical section simply removes its PID from both
Numbers
andChoosing
CF
s.
Talk is cheap, show me the code – Torvalds
Implementation
I used PHP
to implement this concept and PHPCassa library is used to communicate with Cassandra. The column families in Cassandra (ie. Numbers
and Choosing
) must be created with consistency level of QUORUM. Both column families have row key and column name as integers and column values as boolean defined. The PHP
code is available on GitHub.
Final note
Well, that’s all for now, thank you for reading this up to here :-) and please leave comments if you have any interesting idea about this. All patch, fixes, improvements are welcomed only on GitHub.