Cassandra: Up and running quickly in Java using Pelops
Pelops
In Greek mythology, Cassandra is captured by the triumphant king Agamemnon after the fall of Troy, and bears him two sons, Pelops and Teledamus. This Java client library takes its name from Pelops, "Cassandra's beautiful son", because it offers a beautiful way to code against the Cassandra database. This is a quick introduction to the library.
You can find the open source code here http://pelops.googlecode.com/
Objectives
Pelops was born to improve the quality of Cassandra code across a complex commercial project that makes extensive use of the database. The main objectives of the library are:
- To faithfully expose Cassandra’s API in a manner that is immediately understandable to anyone: simple, but beautiful
- To completely separate low-level concerns such as connection pooling from data processing code
- To eliminate “dressing code”, so that the semantics of data processing stand clear and obvious
- To accelerate development through intellisense, function overloading and powerful high-level methods
- To implement strategies like load balancing based upon the per node running operation count
- To include robust error handling and recovery that does not mask application-level logic problems
- To track the latest Cassandra releases and features without causing breaking changes
- To define a long-lasting paradigm for those writing client code
Up and running in 5 minutes
To start working with Pelops and Cassandra, you need to know three things:
- How to create a connection pool, typically once at startup
- How to write data using the Mutator class
- How to read data using the Selector class
It’s that easy!
Creating a connection pool
To work with a Cassandra cluster, you need to start off by defining a connection pool. This is typically done once in the startup code of your application. Sometimes you will define more than one connection pool. For example, in our project, we use two Cassandra database clusters, one which uses random partitioning for data storage, and one which uses order preserving partitioning for indexes. You can create as many connection pools as you need.
To create a pool, you need to specify a name, a list of known contact nodes (the library can automatically detect further nodes in the cluster, but see notes at the end), the network port that the nodes are listening on, and a policy which controls things like the number of connections in your pool.
Here a pool is created with default policies:
Pelops.addPool( "Main", new String[] { "cass1.database.com", "cass2.database.com", "cass3.database.com"}, 9160, new Policy());
Using a Mutator
The Mutator class is used to make mutations to a keyspace (which in SQL speak translates as making changes to a database). You ask Pelops for a new mutator, and then specify the mutations you wish to make. These are sent to Cassandra in a single batch when you call its execute method.
To create a mutator, you must specify the name of the connection pool you will use and the name of the keyspace you wish to mutate. Note that the pool determines what database cluster you are talking to.
Mutator mutator = Pelops.createMutator("Main", "SupportTickets");
Once you have the mutator, you start specifying changes.
/**
 * Write multiple sub-column values to a super column...
 * @param rowKey The key of the row to modify
 * @param colFamily The name of the super column family to operate on
 * @param colName The name of the super column
 * @param subColumns A list of the sub-columns to write
 */
mutator.writeSubColumns(
    userId,
    "L1Tickets",
    UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by time
    mutator.newColumnList(
        mutator.newColumn("category", "videoPhone"),
        mutator.newColumn("reportType", "POOR_PICTURE"),
        mutator.newColumn("createdDate", NumberHelper.toBytes(System.currentTimeMillis())),
        mutator.newColumn("capture", jpegBytes),
        mutator.newColumn("comment")));

/**
 * Delete a list of columns or super columns...
 * @param rowKey The key of the row to modify
 * @param colFamily The name of the column family to operate on
 * @param colNames The column and/or super column names to delete
 */
mutator.deleteColumns(
    userId,
    "L1Tickets",
    resolvedList);
After specifying the changes, you send them to Cassandra in a single batch by calling execute. This takes the Cassandra consistency level as a parameter.
mutator.execute(ConsistencyLevel.ONE);
Note that if you need to know that a particular mutation has completed successfully before initiating some subsequent operation, then you should not batch those mutations together. Since a mutator cannot be re-used after it has been executed, you should create two or more mutators and execute them in sequence, with at least QUORUM consistency.
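As a sketch, the two-mutator pattern might look like this. The column family and column names here are hypothetical, and the writeColumn calls follow the Mutator usage shown above; treat exact signatures as assumptions to verify against the Mutator class.

```java
// First mutation: must be known durable before the dependent write begins
Mutator first = Pelops.createMutator("Main", "SupportTickets");
first.writeColumn(userId, "TicketStatus",
        first.newColumn("status", "ESCALATED"));
first.execute(ConsistencyLevel.QUORUM); // returns only once a quorum has acknowledged

// A mutator cannot be re-used after execute, so create a fresh one
Mutator second = Pelops.createMutator("Main", "SupportTickets");
second.writeColumn(userId, "TicketAudit",
        second.newColumn("lastTransition", "ESCALATED"));
second.execute(ConsistencyLevel.QUORUM);
```

Executing both at QUORUM means the second write only starts after a majority of replicas have acknowledged the first, which is what gives you the ordering guarantee.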
Browse the Mutator class here to see the methods and overloads that are available.
Using a Selector
The Selector class is used to read data from a keyspace. You ask Pelops for a new selector, and then read data by calling its methods.
Selector selector = Pelops.createSelector("Main", "SupportTickets");
Once you have a selector instance, you can start reading data using its many overloads.
/**
 * Retrieve a super column from a row...
 * @param rowKey The key of the row
 * @param columnFamily The name of the column family containing the super column
 * @param superColName The name of the super column to retrieve
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return The requested SuperColumn
 */
SuperColumn ticket = selector.getSuperColumnFromRow(
    userId,
    "L1Tickets",
    ticketId,
    ConsistencyLevel.ONE);

assert ticketId.equals(ticket.name);

// enumerate sub-columns
for (Column data : ticket.columns) {
    String name = data.name;
    byte[] value = data.value;
}

/**
 * Retrieve super columns from a row...
 * @param rowKey The key of the row
 * @param columnFamily The name of the column family containing the super columns
 * @param colPredicate The super column selector predicate
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return A list of matching columns
 */
List<SuperColumn> allTickets = selector.getSuperColumnsFromRow(
    userId,
    "L1Tickets",
    Selector.newColumnsPredicateAll(true, 10000),
    ConsistencyLevel.ONE);

/**
 * Retrieve super columns from a set of rows...
 * @param rowKeys The keys of the rows
 * @param columnFamily The name of the column family containing the super columns
 * @param colPredicate The super column selector predicate
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return A map from row keys to the matching lists of super columns
 */
Map<String, List<SuperColumn>> allTicketsForFriends = selector.getSuperColumnsFromRows(
    Arrays.asList(new String[] { "matt", "james", "dom" }), // the friends
    "L1Tickets",
    Selector.newColumnsPredicateAll(true, 10000),
    ConsistencyLevel.ONE);

/**
 * Retrieve a page of super columns composed from a segment of the sequence of super columns in a row...
 * @param rowKey The key of the row
 * @param columnFamily The name of the column family containing the super columns
 * @param startBeyondName The sequence of super columns must begin with the smallest super column name greater than this value. Pass null to start at the beginning of the sequence.
 * @param orderType The scheme used to determine how the column names are ordered
 * @param reversed Whether the scan should proceed in descending super column name order
 * @param count The maximum number of super columns that can be retrieved by the scan
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return A page of super columns
 */
List<SuperColumn> pageTickets = selector.getPageOfSuperColumnsFromRow(
    userId,
    "L1Tickets",
    lastIdOfPrevPage, // null for the first page
    Selector.OrderType.TimeUUIDType, // ordering defined in this super column family
    true, // reversed, i.e. newest first (blog order)
    10,   // count shown per page
    ConsistencyLevel.ONE);
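Building on the paging call above, a complete pagination loop might be sketched like this. It uses the same Selector API shown above; feeding the last super column's name back in as startBeyondName is an assumption about the intended usage, so verify it against the paginator methods.

```java
byte[] startBeyond = null; // null requests the first page
List<SuperColumn> page;
do {
    page = selector.getPageOfSuperColumnsFromRow(
        userId,
        "L1Tickets",
        startBeyond,
        Selector.OrderType.TimeUUIDType,
        true, // newest first
        10,
        ConsistencyLevel.ONE);

    for (SuperColumn ticket : page) {
        // render the ticket...
    }

    if (!page.isEmpty())
        startBeyond = page.get(page.size() - 1).name; // resume beyond the last item shown
} while (page.size() == 10); // a short page means we reached the end
```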
There are a huge number of selector methods and overloads that expose the full power of Cassandra, and others, like the paginator methods, that make otherwise complex tasks simple. Browse the Selector class here to see what is available.
Other stuff
All the main things you need to start using Pelops have been covered, and with your current knowledge you can easily feel your way around Pelops inside your IDE using intellisense. Some final points will be useful to keep in mind when working with Pelops:
- If you need to perform deletions at the row key level, use an instance of the KeyDeletor class (call Pelops.createKeyDeletor).
- If you need metrics from a Cassandra cluster, use an instance of the Metrics class (call Pelops.createMetrics).
- To work with Time UUIDs, which are globally unique identifiers that can be sorted by time, and which you will find very useful throughout your Cassandra code, use the UuidHelper class.
- To work with numbers stored as binary values, use the NumberHelper class.
- To work with strings stored as binary values, use the StringHelper class.
- Methods in the Pelops library that cause interaction with Cassandra throw the standard Cassandra exceptions defined here.
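The reason helpers like NumberHelper matter is that Cassandra compares raw bytes, so values must be encoded such that byte order matches logical order. The sketch below is not part of Pelops; it is a minimal, self-contained illustration of why a big-endian encoding (which a helper like NumberHelper.toBytes presumably produces) keeps non-negative numbers sortable as bytes.

```java
import java.nio.ByteBuffer;

public class ByteOrderDemo {

    // Big-endian encoding of a long; an assumption about what a helper
    // like NumberHelper.toBytes produces, not Pelops source.
    public static byte[] toBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array(); // ByteBuffer defaults to big-endian
    }

    // Unsigned lexicographic comparison: the order Cassandra applies to raw bytes
    public static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long earlier = System.currentTimeMillis();
        long later = earlier + 1000;
        // For non-negative values, big-endian byte order matches numeric order,
        // so range scans over such keys come back in chronological order.
        System.out.println(compareBytes(toBytes(earlier), toBytes(later)) < 0); // prints true
    }
}
```

The same idea underlies Time UUIDs: their byte layout is chosen so that sorting the raw bytes sorts the identifiers by creation time.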
The Pelops design secret
One of the key design decisions that, at the time of writing, distinguishes Pelops is that the data processing code written by developers does not involve connection pooling or management. Instead, classes like Mutator and Selector borrow connections to Cassandra from a Pelops pool for just the periods that they need to read from and write to the underlying Thrift API. This has two advantages.
First, and most obviously, code becomes cleaner and developers are freed from connection management concerns. More subtly, it enables the Pelops library to manage connection pooling entirely itself, and, for example, to keep track of how many outstanding operations are currently running against each cluster node.
This enables Pelops to perform more effective client load balancing, by directing each new operation to the node with the fewest outstanding operations. Because of this architectural choice, it will even be possible to offer strategies in the future where, for example, nodes are actually queried to determine their load.
To see how the library abstracts connection pooling away from the semantics of data processing, take a look at the execute method of Mutator and the tryOperation method of Operand. This is the foundation upon which Pelops greatly improves over existing libraries that have modelled connection management on pre-existing SQL database client libraries.
That’s all. I hope you get the same benefits from Pelops that we did.