Dominic Williams

Occasionally useful posts about RIAs, Web scale computing & miscellanea

Cassandra: Up and running quickly in Java using Pelops

with 52 comments

Pelops

In Greek mythology, Cassandra is captured by the triumphant king Agamemnon after the fall of Troy and bears him two sons, Pelops and Teledamus. This Java client library is Pelops’ namesake, nicknamed “Cassandra’s beautiful son” because it offers a beautiful way to code against the Cassandra database. This is a quick introduction to the library.

You can find the open source code here: http://pelops.googlecode.com/

Objectives

Pelops was born to improve the quality of Cassandra code across a complex commercial project that makes extensive use of the database. The main objectives of the library are:

  • To faithfully expose Cassandra’s API in a manner that is immediately understandable to anyone:
    simple, but beautiful
  • To completely separate low-level concerns such as connection pooling from data processing code
  • To eliminate “dressing code”, so that the semantics of data processing stand clear and obvious
  • To accelerate development through intellisense, function overloading and powerful high-level methods
  • To implement strategies like load balancing based upon the per node running operation count
  • To include robust error handling and recovery that does not mask application-level logic problems
  • To track the latest Cassandra releases and features without causing breaking changes
  • To define a long-lasting paradigm for those writing client code

Up and running in 5 minutes

To start working with Pelops and Cassandra, you need to know three things:

  1. How to create a connection pool, typically once at startup
  2. How to write data using the Mutator class
  3. How to read data using the Selector class.

It’s that easy!

Creating a connection pool

To work with a Cassandra cluster, you need to start off by defining a connection pool. This is typically done once in the startup code of your application. Sometimes you will define more than one connection pool. For example, in our project, we use two Cassandra database clusters, one which uses random partitioning for data storage, and one which uses order preserving partitioning for indexes. You can create as many connection pools as you need.

To create a pool, you need to specify a name, a list of known contact nodes (the library can automatically detect further nodes in the cluster, but see notes at the end), the network port that the nodes are listening on, and a policy which controls things like the number of connections in your pool.

Here a pool is created with default policies:

Pelops.addPool(
    "Main",
    new String[] { "cass1.database.com", "cass2.database.com", "cass3.database.com"},
    9160,
    new Policy());

Using a Mutator

The Mutator class is used to make mutations to a keyspace (which in SQL speak translates as making changes to a database). You ask Pelops for a new mutator, and then specify the mutations you wish to make. These are sent to Cassandra in a single batch when you call its execute method.

To create a mutator, you must specify the name of the connection pool you will use and the name of the keyspace you wish to mutate. Note that the pool determines what database cluster you are talking to.

Mutator mutator = Pelops.createMutator("Main", "SupportTickets");

Once you have the mutator, you start specifying changes.

/**
 * Write multiple sub-column values to a super column...
 * @param rowKey                    The key of the row to modify
 * @param colFamily                 The name of the super column family to operate on
 * @param colName                   The name of the super column
 * @param subColumns                A list of the sub-columns to write
 */
mutator.writeSubColumns(
    userId,
    "L1Tickets",
    UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by time
    mutator.newColumnList(
        mutator.newColumn("category", "videoPhone"),
        mutator.newColumn("reportType", "POOR_PICTURE"),
        mutator.newColumn("createdDate", NumberHelper.toBytes(System.currentTimeMillis())),
        mutator.newColumn("capture", jpegBytes),
        mutator.newColumn("comment") ));

/**
 * Delete a list of columns or super columns...
 * @param rowKey                    The key of the row to modify
 * @param colFamily                 The name of the column family to operate on
 * @param colNames                  The column and/or super column names to delete
 */
mutator.deleteColumns(
    userId,
    "L1Tickets",
    resolvedList);

After specifying the changes, you send them to Cassandra in a single batch by calling execute. This takes the Cassandra consistency level as a parameter.

mutator.execute(ConsistencyLevel.ONE);

Note that if you need to know that a particular mutation operation has completed successfully before initiating some subsequent operation, then you should not batch those mutations together. Since you cannot re-use a mutator after it has been executed, you should create two or more mutators and execute them with at least a QUORUM consistency level.
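
For example, here is a minimal sketch of splitting two dependent writes across separate mutators, reusing the userId variable and column names from the example above (the second write's comment value is made up purely for illustration):

// First mutation: create the ticket. QUORUM means a majority of replicas
// must acknowledge the write before execute returns successfully.
Mutator createTicket = Pelops.createMutator("Main", "SupportTickets");
byte[] ticketId = UuidHelper.newTimeUuidBytes();
createTicket.writeSubColumns(
    userId,
    "L1Tickets",
    ticketId,
    createTicket.newColumnList(
        createTicket.newColumn("category", "videoPhone")));
createTicket.execute(ConsistencyLevel.QUORUM);

// Second, dependent mutation: issued only after the first has succeeded.
// A mutator cannot be re-used after execute, so a new one is created.
Mutator addComment = Pelops.createMutator("Main", "SupportTickets");
addComment.writeSubColumns(
    userId,
    "L1Tickets",
    ticketId,
    addComment.newColumnList(
        addComment.newColumn("comment", "escalated after review")));
addComment.execute(ConsistencyLevel.QUORUM);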

Browse the Mutator class to see the methods and overloads that are available here.

Using a Selector

The Selector class is used to read data from a keyspace. You ask Pelops for a new selector, and then read data by calling its methods.

Selector selector = Pelops.createSelector("Main", "SupportTickets");

Once you have a selector instance, you can start reading data using its many overloads.

/**
 * Retrieve a super column from a row...
 * @param rowKey                        The key of the row
 * @param columnFamily                  The name of the column family containing the super column
 * @param superColName                  The name of the super column to retrieve
 * @param cLevel                        The Cassandra consistency level with which to perform the operation
 * @return                              The requested SuperColumn
 */
SuperColumn ticket = selector.getSuperColumnFromRow(
    userId,
    "L1Tickets",
    ticketId,
    ConsistencyLevel.ONE);

assert ticketId.equals(ticket.name);

// enumerate sub-columns
for (Column data : ticket.columns) {
    String name = data.name;
    byte[] value = data.value;
}

/**
 * Retrieve super columns from a row
 * @param rowKey                        The key of the row
 * @param columnFamily                  The name of the column family containing the super columns
 * @param colPredicate                  The super column selector predicate
 * @param cLevel                        The Cassandra consistency level with which to perform the operation
 * @return                              A list of matching columns
 */
List<SuperColumn> allTickets = selector.getSuperColumnsFromRow(
    userId,
    "L1Tickets",
    Selector.newColumnsPredicateAll(true, 10000),
    ConsistencyLevel.ONE);

/**
 * Retrieve super columns from a set of rows.
 * @param rowKeys                        The keys of the rows
 * @param columnFamily                   The name of the column family containing the super columns
 * @param colPredicate                   The super column selector predicate
 * @param cLevel                         The Cassandra consistency level with which to perform the operation
 * @return                               A map from row keys to the matching lists of super columns
 */
Map<String, List<SuperColumn>> allTicketsForFriends = selector.getSuperColumnsFromRows(
    Arrays.asList(new String[] { "matt", "james", "dom" }), // the friends
    "L1Tickets",
    Selector.newColumnsPredicateAll(true, 10000),
    ConsistencyLevel.ONE);

/**
 * Retrieve a page of super columns composed from a segment of the sequence of super columns in a row.
 * @param rowKey                        The key of the row
 * @param columnFamily                  The name of the column family containing the super columns
 * @param startBeyondName               The sequence of super columns must begin with the smallest super column name greater than this value. Pass null to start at the beginning of the sequence.
 * @param orderType                     The scheme used to determine how the column names are ordered
 * @param reversed                      Whether the scan should proceed in descending super column name order
 * @param count                         The maximum number of super columns that can be retrieved by the scan
 * @param cLevel                        The Cassandra consistency level with which to perform the operation
 * @return                              A page of super columns
 */
List<SuperColumn> pageTickets = selector.getPageOfSuperColumnsFromRow(
    userId,
    "L1Tickets",
    lastIdOfPrevPage, // null for first page
    Selector.OrderType.TimeUUIDType, // ordering defined in this super column family
    true, // reversed, i.e. blog order (newest first)
    10, // count shown per page
    ConsistencyLevel.ONE);

There are a huge number of selector methods and overloads that expose the full power of Cassandra, and others, like the paginator methods, that make otherwise complex tasks simple. Browse the Selector class to see what is available here.
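
For instance, here is a hedged sketch of walking all of a row's tickets page by page with the method above. The variable names are illustrative, and the byte[] overload for the start-beyond parameter is an assumption, so check the available overloads in your IDE:

byte[] startBeyond = null; // null starts at the beginning of the sequence
final int PAGE_SIZE = 10;
while (true) {
    List<SuperColumn> page = selector.getPageOfSuperColumnsFromRow(
        userId,
        "L1Tickets",
        startBeyond,
        Selector.OrderType.TimeUUIDType,
        true,       // reversed, newest tickets first
        PAGE_SIZE,
        ConsistencyLevel.ONE);

    for (SuperColumn ticket : page) {
        // process each ticket...
    }

    if (page.size() < PAGE_SIZE)
        break; // a short page means the row is exhausted
    startBeyond = page.get(page.size() - 1).name; // continue beyond the last super column seen
}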

Other stuff

All the main things you need to start using Pelops have been covered, and with your current knowledge you can easily feel your way around Pelops inside your IDE using intellisense. Some final points it will be useful to keep in mind if you want to work with Pelops:

  • If you need to perform deletions at the row key level, use an instance of the KeyDeletor class (call Pelops.createKeyDeletor); a short sketch using it and the helper classes appears after this list.
  • If you need metrics from a Cassandra cluster, use an instance of the Metrics class (call Pelops.createMetrics).
  • To work with Time UUIDs, which are globally unique identifiers that can be sorted by time – which you will find to be very useful throughout your Cassandra code – use the UuidHelper class.
  • To work with numbers stored as binary values, use the NumberHelper class.
  • To work with strings stored as binary values, use the StringHelper class.
  • Methods in the Pelops library that cause interaction with Cassandra throw the standard
    Cassandra exceptions defined here.
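
Here is that sketch: a minimal, hedged illustration only. The createKeyDeletor arguments and the deleteRow parameter order are assumptions based on the patterns shown above, so verify the actual overloads in your IDE:

// Assumption: createKeyDeletor takes the same (pool, keyspace) arguments as createMutator
KeyDeletor keyDeletor = Pelops.createKeyDeletor("Main", "SupportTickets");

// Time UUIDs sort chronologically, which is why they suit super column names like the ticket ids above
byte[] ticketId = UuidHelper.newTimeUuidBytes();

// Numbers are stored as raw bytes; NumberHelper handles the conversion
byte[] createdDate = NumberHelper.toBytes(System.currentTimeMillis());

// Assumption: parameter order is row key, column family, consistency level
keyDeletor.deleteRow(userId, "L1Tickets", ConsistencyLevel.QUORUM);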

The Pelops design secret

One of the key design decisions that, at the time of writing, distinguishes Pelops is that data processing code written by developers does not involve connection pooling or management. Instead, classes like Mutator and Selector borrow connections to Cassandra from a Pelops pool for just the periods that they need to read and write to the underlying Thrift API. This has two advantages.

Firstly, and most obviously, code becomes cleaner and developers are freed from connection management concerns. More subtly, it also enables the Pelops library to manage connection pooling entirely by itself, for example by keeping track of how many outstanding operations are currently running against each cluster node.

This, for example, enables Pelops to perform more effective client-side load balancing by directing new operations to the node with the fewest outstanding operations currently running. Because of this architectural choice, it will even be possible to offer strategies in the future where, for example, nodes are actually queried to determine their load.
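
To make that strategy concrete, here is an illustrative sketch of least-outstanding-operations selection. It is not the actual Pelops implementation, and the NodeStats type is invented purely for the illustration:

import java.util.List;

// Illustrative only: prefer the node with the fewest operations currently in flight
class LeastLoadedPicker {

    static class NodeStats {
        String host;          // node address
        int outstandingOps;   // operations currently running against this node
    }

    static NodeStats pick(List<NodeStats> nodes) {
        NodeStats best = null;
        for (NodeStats node : nodes) {
            if (best == null || node.outstandingOps < best.outstandingOps) {
                best = node;
            }
        }
        return best;
    }
}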

To see how the library abstracts connection pooling away from the semantics of data processing, take a look at the execute method of Mutator and the tryOperation method of Operand. This is the foundation upon which Pelops greatly improves over existing libraries that have modelled connection management on pre-existing SQL database client libraries.


That’s all. I hope you get the same benefits from Pelops that we did.

Written by dominicwilliams

June 11, 2010 at 12:31 pm

52 Responses


  1. nice work, i’m trying it now on a test env for a big project; do you use maven for your builds?

    i can provide the pom and project structure moved to maven if you need.

    have a nice day

    Enrico

    June 15, 2010 at 9:41 am

    • Hi we are copying Pelops directly out of our main project tree into a separate folder, then uploading to Googlecode. We are creating the jar from inside Eclipse using Export…->Java->JAR file.

      Would be great to have a maven build other people could use. Please send to dwilliams at fightmymonster.com

      dominicwilliams

      June 15, 2010 at 1:00 pm

      • I’ve forked pelops and mavenized it over on github:
        http://github.com/danwashusen/pelops

        Dominic, let me know if you want to pull it back into the main Google Code repo.

        Cheers,
        Dan

        Dan

        June 27, 2010 at 4:29 am

      • Check out the issues in google code. I attached a build file that uploads dependencies that aren’t found in maven automatically.

        Todd Nine

        June 30, 2010 at 11:15 pm

  2. Great blog post! I’ve adopted pelops for my Datanucleus plugin. I’ll be working with Pedro to update the plugin on github

    Todd Nine

    June 16, 2010 at 1:11 am

  3. I really like your simple API. It is much better than the generated thrift one.

    I’m confused however by your KeyDeletor. It has two main methods: deleteRow(key,cl) and deleteColumnFamily(key,cf,cl). What does the first one do? Remove the key from all column families? And how do I remove multiple keys/rows in a batch?

    Hugo

    June 17, 2010 at 2:54 pm

    • UPDATE:

      Hugo picks up on an important point here in relation to the API of KeyDeletor, which has now been changed (thanks Hugo). Our original discussion has been removed to avoid confusing people new to Pelops.

      All you need to know now is that the KeyDeletor class has a function called deleteRow, which deletes a row from a specified column family.

      Additional note:-

      In the accepted Cassandra model, each keyspace (the nearest thing NoSQL has to a SQL database) contains any number of column families (the nearest thing NoSQL has to a table), each of which may contain any number of rows (which unlike SQL rows can contain unlimited columns, which can contain sub-columns if they are super columns, but you get the idea). Once you understand this model, it’s good practice to keep in mind that the row key determines which Cassandra cluster nodes store the copies of your rows, and that rows from different column families are actually stored on the same nodes if they share the same key.

      The foregoing is the reason why methods in the Pelops API list the rowKey parameter before, for example, the columnFamily parameter. In principle, this means that batch read and write operations that work across multiple column families should be more efficient where a shared key is involved. In practice, because of current limitations with Cassandra’s Thrift API, which Pelops uses, only the Mutator class is able to offer this benefit: so for example, if you use Mutator to specify multiple mutations to rows in different column families which share a row key (e.g. think a user id or some other top-level identifier in your application) and then execute the mutation operation with ConsistencyLevel.ONE, internally Cassandra only needs to contact one cluster node for the operation to “succeed”. I believe in the future the Thrift API will hopefully be updated to support batch read, and therefore the same efficiency can be had there, at which point we will update Selector.
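
      For illustration, here is a minimal sketch of that pattern using only the API shown in the post. “L1Tickets” is the super column family from the post; “L2Tickets” is a hypothetical second super column family sharing the same row key:

      // Both mutations share the row key userId, so with ConsistencyLevel.ONE
      // Cassandra only needs to contact one cluster node for the batch to succeed
      Mutator mutator = Pelops.createMutator("Main", "SupportTickets");
      mutator.writeSubColumns(userId, "L1Tickets", UuidHelper.newTimeUuidBytes(),
          mutator.newColumnList(mutator.newColumn("category", "videoPhone")));
      mutator.writeSubColumns(userId, "L2Tickets", UuidHelper.newTimeUuidBytes(),
          mutator.newColumnList(mutator.newColumn("status", "escalated")));
      mutator.execute(ConsistencyLevel.ONE);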

      dominicwilliams

      June 17, 2010 at 4:38 pm

  4. WLS SAS Catalog or Command Files and Bit Arrays…

    I found your entry interesting thus I’ve added a Trackback to it on my weblog :)…

  5. Do you have any examples using this client? I am having problems trying to use the mutator to insert records. thanks.

    Chris

    June 21, 2010 at 7:45 pm

  6. I really like the simple API and the documentation on methods is great.
    I am having the same problem as Chris getting the methods up and running; if you have some complete code files that exercise the methods, that would be great.

    Gavan

    June 26, 2010 at 1:21 pm

  7. I like the API, and the pooling looks solid. However, if my org wants to use a load balancer, then I’d assume my Pelops pool has only one host. Is that reasonable to do with Pelops if I set my Policy.killNodeConnsOnException to false?

    On another note; Would it also make sense to put the pool API methods behind an interface and to use a factory in case I want to do my own pooling?

    thanks,
    –Michael

    Michael Moores

    June 30, 2010 at 7:25 pm

    • Hi Michael, I’ve not thought about using a load balancer (and haven’t heard whether it is possible), but in principle yes, you would proceed as described.

      The pool functionality could be put behind an interface at some point, but for the moment there are some optimizations we’ll probably have to prioritize first. But again in principle, yes.

      dominicwilliams

      June 30, 2010 at 11:26 pm

  8. After playing around for a while I was able to figure out how to insert records using the mutator. Part of my problem was not understanding the Cassandra data model. Here is the code that worked for me:

    
    Mutator mutator = Pelops.createMutator(CassandraConnection.getInstance().getPoolName(),
            CassandraConnection.getInstance().getKeyspace());

    List<Column> cols = new ArrayList<Column>();
    Set<Entry<String, String>> entrySet = fieldMap.entrySet();
    for (Entry<String, String> entry : entrySet) {
        String colName = entry.getKey();
        String colValue = entry.getValue();
        Column col = mutator.newColumn(colName, colValue);
        cols.add(col);
    }

    Column[] columns = cols.toArray(new Column[cols.size()]);

    //String epoch = String.valueOf(new Date().getTime());
    String id = devId + "." + sensorType;

    mutator.writeSubColumns(id, COL_FAMILY, UuidHelper
            .newTimeUuidBytes(), // using a UUID value that sorts by time
            mutator.newColumnList(columns));

    log.debug("set " + CassandraConnection.getInstance().getKeyspace() + "." + COL_FAMILY + "['" + id + "']");

    mutator.execute(ConsistencyLevel.ONE);
    
    

    Chris

    July 1, 2010 at 5:14 pm

  9. Pelops looks like a simple API, very easy to understand. Does this library support load balancing and failover like in Hector?

    Sam

    July 6, 2010 at 11:12 pm

    • Yes we do. One of the reasons for this project was to implement a better load balancing and failover system.

      dominicwilliams

      July 7, 2010 at 9:40 am

  10. Hi, we are currently trying your client for a relatively big project, but we are concerned about authentication and the ability to set up “known” hosts. From what I have seen in the source code, I didn’t see where I could set this up, although I see how I could do it by extending the Selector and Mutator classes. But maybe I just missed the point where you already implemented this.

    Thanks.

    Andriy Kopachevsky

    July 7, 2010 at 8:57 am

  11. Have you thought about setting up an email discussion list for your project? I have switched from Hector to Pelops and would like to have an easy mechanism for discussing issues/ideas.

    Also, is the codebase and issue tracking staying at google code, or as Dan mentions above, going to github?

    thx for the good work

    btoddb

    July 8, 2010 at 10:19 pm

    • Hi, I’m currently project leader for Pelops (and also Cages) and am on holiday, which is why things are going a bit slowly.

      I’m ok for all these things, and will also happily review any code that is submitted and commit too.

      The general situation of the Fight My Monster team is that we are on a real tight schedule, which will prevent us looking at some options.

      We are using Subversion inside Eclipse, and have quite a few SVN repositories. My plan on return is to try and get up to speed with Maven, because it is becoming a kind of de facto Apache standard, and bring that into Googlecode.

      I’ve heard great things about git but it would be quite a big move to make at the moment.

      dominicwilliams

      July 12, 2010 at 12:32 pm

  12. Is there any plan to pull back the maven stuff from github ?

    Norman Maurer

    July 10, 2010 at 4:44 pm

    • Hi Norman, yes there is. Please check out my comment above for a more complete explanation. Many thanks.

      dominicwilliams

      July 12, 2010 at 12:33 pm

  13. Pelops looks great! Any plans for switching from Thrift to Avro? Also it would be interesting to see how it compares to Hector, performance-wise.

    Alex

    July 14, 2010 at 12:51 pm

  14. I am testing Cassandra using Pelops. Is there any option in Pelops to list all rows with selected columns in a column family?

    Sam

    July 22, 2010 at 4:09 pm

    • Pelops wraps the Cassandra thrift API so you should be able to do anything that the thrift API can do.

      http://wiki.apache.org/cassandra/API

      I believe the Pelops method you are looking for is:

      selector.getColumnsFromRows(keyRange, columnFamily, colPredicate, cLevel);

      Use a keyRange with “” for both start_key and end_key and count >= number of rows. In practice with more than a few thousand rows you will need to paginate.

      Matt

      July 22, 2010 at 7:19 pm

      • Hi Matt,
        Thanks for the reply. This method works for me.

        I am trying to delete a row using KeyDeletor, as suggested in “Other stuff” above.

        It deletes all the columns in that particular row, but I can still see the row without any columns after I delete it. How can I delete the row entry as well?

        Sam

        July 22, 2010 at 8:18 pm

  15. It looks like if I delete a Row and try to insert a new Row with the same Key, I can’t. Is this a known issue? Am I doing something wrong?

    Right now I’m using keyDeletor.deleteRow. Not sure if I’m supposed to use mutator.deleteColumns.

    Jose

    July 23, 2010 at 11:55 pm

    • I don’t see any issue with keyDeletor.deleteRow; it’s working well for me. The only issue I see is that it deletes all the columns for that row, but it doesn’t delete the rowID/rowKey. Check how you are inserting it again after you delete it; are you using any exists clause?

      Sam

      July 26, 2010 at 7:49 pm

      • I’ve experienced Sam’s problem with keyDeletor.deleteRow clearing the columns but leaving the rowKey.
        It makes rows undeletable. Is this planned to be solved in future releases, or is there some kind of Pelops workaround to delete the entire row?

        Thanks

        Jimmy

        August 4, 2010 at 9:31 am

      • the only Thrift method that throws NotFoundException on a read is “get”. all other read methods return the row (key), but an empty column list, regardless of tombstones (markers left to mark column deletions until Cassandra does cleanup). so it depends on how pelops is retrieving the data. don’t have code handy, so i leave it to the reader to determine.

        btoddb

        August 5, 2010 at 5:07 am

  16. If you turn on dynamicNodeDiscovery and specify a list of canonical host names, then your connection pool will double in number of nodes, because the dynamic discovery uses IP addresses.

    I just added node level metrics to ThriftPool, tracking the connection cache/load, connections created, exceptions, etc. I can see the poolRefiller is doubling the size of the pool.

    Michael Moores

    July 27, 2010 at 9:19 pm

  17. TIME_WAIT sockets: when I run stress testing using the Pelops client, I can see a lot of TIME_WAIT sockets, and the client fails within minutes because no free sockets are available. It’s not happening with Hector. Is there any fix for it?

    Sam

    July 29, 2010 at 9:56 pm

  18. Thanks for sharing. Was of good help.

    Parag Arora

    August 4, 2010 at 7:43 am

  19. I was thinking about using pelops, because I heard many good things about it, and its ease of use looked promising. I tried my first project, and received thrift connection pool errors.
    When I checked the Issues on the project page I saw that other people had the same errors and even provided patches to solve them months ago. But nothing happened on your side.

    that’s sad.

    stefan

    August 6, 2010 at 8:07 am

  20. Did the latest check-in deprecate old classes? Pelops.addPool seems to be deprecated.

    Vipul Sharma

    August 11, 2010 at 10:54 pm

    • Hi Vipul,
      Please make sure you are looking at the new home of Pelops: http://github.com/s7.
      Various changes have been made, and indeed the new 0.7 Pelops branch (for Cassandra 0.7) makes a number of breaking changes. Hopefully this will be the last major refactoring.
      If it is any consolation, my company had to make literally hundreds of changes to accommodate this version, but it has been worth it.

      dominicwilliams

      August 18, 2010 at 9:17 am

      • Do you have any examples using the new API? I am trying to convert things over but not sure how to set it up since creating a selector, mutator, and Pool is now different.

        Chris

        August 27, 2010 at 10:17 pm

  21. looks like this is the new home: http://github.com/s7/scale7-pelops

    It would be nice to have a google group or something too.

    Andres

    August 17, 2010 at 5:32 pm

  22. Hi, is there a branch/tag of Pelops that supports the current stable release of Cassandra (0.6.4)? In the log all I see are references to 0.6.3 and 0.7.

    David Erickson

    August 19, 2010 at 5:27 pm

    • Hi, there are no breaking changes between those versions. Personally I would advise going to 0.7 though. Feedback I have received implies that it is in fact equally stable, and it contains dramatic feature changes over previous versions – so it’s better to start developing against that. The version of Pelops for 0.7 is better too (there are fixes we need to roll back to the earlier versions) and contains breaking interface changes made in light of the changes to Cassandra. If it is any consolation, we recently migrated and had to change hundreds of lines of code, but it has been worth it. Go for 0.7.

      dominicwilliams

      August 19, 2010 at 10:37 pm

  23. Hi dominic. Do you by chance have any migration utilities you’ve created? I’m the author of this plugin.

    http://github.com/tnine/Datanucleus-Cassandra-Plugin

    I use Pelops for this as well as for our own code for dealing with Satellite data. We have a ton of refactoring, so any tools that could help would be awesome.

    Thanks,
    Todd

    Todd Nine

    August 20, 2010 at 3:29 am

  24. […] looked really cool with its Mutators and Selectors, but it too dealt with columns – see the description.  What I was looking for was an object-oriented way to load and query Java […]

  25. hi, Dominic

    Pelops.addPool(
    "Main",
    new String[] { "cass1.database.com", "cass2.database.com", "cass3.database.com"},
    9160,
    new Policy());

    When the Pelops client gets a list of IP addresses of available Cassandra nodes and tries to talk to one of them, if that node is suddenly not available, can Pelops detect the failure of the node and try to connect to the next available node?
    In my test, I noticed that Pelops tries to contact all the nodes; if one of them is unavailable, a connection refused Exception is thrown. I wonder how I can make Pelops skip the unavailable node and try the next node in the list.

    Liyu

    October 16, 2010 at 1:54 am

    • Hi, there are some limitations and points worth covering here. I’m assuming you’re using trunk…

      Firstly if you look at the Cluster class, you’ll notice its refreshNodesSnapshot() function. When this is called, it queries the cluster to find the current list of nodes that key ranges have been assigned to. This list can be queried using the getCurrentNodesSnapshot() function. If you permanently remove a node from the cluster i.e. so that no keys are assigned to it, then when you (or more likely Pelops!) calls refreshNodesSnapshot the list will be appropriately updated.

      Pelops has a Runnable object called clusterWatcher, which periodically calls cluster.refreshNodesSnapshot() and then for each node in the resultant list calls its own function touchNodeContext(node).

      You can see that if a new node is added to the cluster, then Pelops will create a cache context for it as you would hope. However, at the moment it is not removing cache contexts for nodes that have been removed. This is something that needs to be addressed.

      However, from the point of view of code written against Pelops, Pelops should be able to detect when Thrift connections to a node are broken and not use them in operations, and in cases where it does use a broken connection, or the operation fails because of some other cluster configuration issue, it should fail over internally and repeat the operation with another connection from its cache that is hopefully not broken. So… in theory this is not something you should need to worry about.

      That all having been said, I think that we need to update Pelops so that it deletes the caches for removed nodes. I will put this on the todo list.

      Best, Dominic

      dominicwilliams

      October 18, 2010 at 10:46 am

    • Btw. I should mention, there is a Google Group for Pelops now. You might like to join that, as it is a better place for such tech discussion.

      dominicwilliams

      October 18, 2010 at 10:47 am

  26. Hi!

    great work with cages and pelops, i’m using them in a project i’m going to release soon (graphandra!).

    At the moment i’m using lucandra to index my data but i’ve noticed you have cassyndex on your github repo. I see you don’t use lucandra although in the pelops demo you mention it.

    Could you share why you left lucandra and what’s better for you in cassyndex?

    Thanks!

    claudio martella

    November 1, 2010 at 7:15 pm

    • Originally we tried to use Lucandra to index a database of school addresses (about 300,000 of them). The database needed to be quick to serve predictive search text boxes i.e. where the user types in the terms and suggestions are produced “instantaneously” below, which they can select. Lucandra can’t deal with common terms very well at all. So for example, if you wanted to search on a term like “school” or “college” its performance would collapse. You can setup block words with Lucene of course, but in fact it was important for us that users could type “school” or “college” as distinguishing terms.

      I originally wrote Cassyndex to serve this database, but it could also be used for more general text indexing. It does support block words, and we also index combinations of words which may include block words like “Hogwarts College” or “The Mill”. We need to add some basic caching to maximize performance and scalability, but it already works well enough for us so that is on the back burner for a bit.

      In addition to full text indexing, you can also use Cassyndex for things like case-insensitive indexes, which are important for username searches on large sites.

      dominicwilliams

      November 2, 2010 at 4:06 pm

  27. It is a pleasure to work with Pelops. Very developer-friendly. Great creation!

    For folks looking for a simple complete program that has been tested with Cassandra 0.7.0 (released on 2011-01-09), here it is:

    package com.sansthal.pelops;

    import java.util.List;

    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.log4j.Logger;
    import org.scale7.cassandra.pelops.Cluster;
    import org.scale7.cassandra.pelops.Mutator;
    import org.scale7.cassandra.pelops.Pelops;
    import org.scale7.cassandra.pelops.Selector;

    /**
     * @author Naga Vijayapuram
     */
    public class MyPelops {

        /**
         * The Main Method
         * @param args
         */
        public static void main(String[] args) throws Exception {

            final Logger log = Logger.getLogger(MyPelops.class);

            // -------------------------------------------------------------
            // -- Nodes, Pool, Keyspace, Column Family ---------------------
            // -------------------------------------------------------------

            // A comma separated List of Nodes
            String NODES = "localhost";

            // Thrift Connection Pool
            String THRIFT_CONNECTION_POOL = "ThriftConnectionPool1";

            // Keyspace
            String KEYSPACE = "ks1simple";

            // Column Family
            String COLUMN_FAMILY = "CF_Std_BytesType";

            // -------------------------------------------------------------
            // -- Cluster --------------------------------------------------
            // -------------------------------------------------------------

            Cluster cluster = new Cluster(NODES, 9160);

            Pelops.addPool(THRIFT_CONNECTION_POOL, cluster, KEYSPACE);

            // -------------------------------------------------------------
            // -- Mutator --------------------------------------------------
            // -------------------------------------------------------------

            Mutator mutator = Pelops.createMutator(THRIFT_CONNECTION_POOL);

            log.info("- Write Column -");
            mutator.writeColumn(COLUMN_FAMILY, "Row1",
                new Column().setName(" Name ".getBytes()).
                setValue(" Naga Vijayapuram ".getBytes()));
            mutator.writeColumn(COLUMN_FAMILY, "Row1",
                new Column().setName(" Work ".getBytes()).
                setValue(" Developer, Consultant ".getBytes()));
            mutator.writeColumn(COLUMN_FAMILY, "Row1",
                new Column().setName(" Skills ".getBytes()).
                setValue(" Cassandra, Hector, Pelops ".getBytes()));
            mutator.writeColumn(COLUMN_FAMILY, "Row1",
                new Column().setName(" Company ".getBytes()).
                setValue(" http://www.sansthal-us.com/ ".getBytes()));

            log.info("- Execute -");
            mutator.execute(ConsistencyLevel.ONE);

            // -------------------------------------------------------------
            // -- Selector -------------------------------------------------
            // -------------------------------------------------------------

            Selector selector = Pelops.createSelector(THRIFT_CONNECTION_POOL);

            int columnCount = selector.getColumnCount(COLUMN_FAMILY, "Row1", ConsistencyLevel.ONE);
            log.info("- Column Count = " + columnCount);

            List<Column> columnList = selector.getColumnsFromRow(
                COLUMN_FAMILY, "Row1",
                Selector.newColumnsPredicateAll(true, 10), ConsistencyLevel.ONE);
            log.info("- Size of Column List = " + columnList.size());

            for (Column column : columnList) {
                log.info("- Column: (" + new String(column.getName()) + "," +
                    new String(column.getValue()) + ")");
            }

            log.info("- All Done. Exit -");
            System.exit(0);
        }
    }

    Enjoy!

    Naga Vijayapuram

    January 16, 2011 at 4:50 am

  28. Hi, i’ve some trouble with Pelops.shutdown();
    In fact when i use this method at the end of the program, it blocks the program and doesn’t end.

    Have you some idea that can resolve this problem ?

    Thanks!

    dacanalr

    April 22, 2011 at 2:10 pm

    • Hi, I’ve not heard of that one. Please make sure you are using the 0.7.x branch if you’re on 0.7 (production). The Master branch snapshot is tracking 0.8 which is still in beta. Dan beat me to the other bit of advice – there is a Google Groups mailing list where you can get support.

      dominicwilliams

      April 25, 2011 at 10:11 pm

      • Thanks, now it works!

        dacanalr

        April 27, 2011 at 9:47 am

  29. @dacanalr: What does the output of jstack say?

    p.s. There’s a mailing list for Pelops that might be more helpful…

    Dan

    April 25, 2011 at 10:07 pm

