<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: HBase vs Cassandra: why we moved</title>
	<atom:link href="http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/feed/" rel="self" type="application/rss+xml" />
	<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/</link>
	<description>Occasionally useful posts about RIAs, Web scale computing &#38; miscellanea</description>
	<lastBuildDate>Wed, 16 May 2012 19:38:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: jbellis</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-455</link>
		<dc:creator><![CDATA[jbellis]]></dc:creator>
		<pubDate>Wed, 16 May 2012 19:38:25 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-455</guid>
		<description><![CDATA[That&#039;s a pretty good list.  I&#039;d add a few:

* Cassandra supports 1000s of ColumnFamilies in a cluster; HBase supports a handful (HBase book says &quot;two or three&quot; but let&#039;s be generous)
* Cassandra&#039;s p2p design gives you &quot;read slaves&quot; for free; HBase reads and writes all go through a single regionserver per region
* Cassandra offers built-in caching; HBase installations tend to rely heavily on memcached
* Cassandra is far, far ahead of HBase in dealing with compaction; this includes parallel (multiple threads per compaction) and concurrent (multiple simultaneous compactions per server), throttling to avoid latency spikes, &quot;leveled&quot; compaction to optimize for reads, and the ability to remove deleted or expired data during minor compactions as well as major
* Cassandra&#039;s p2p design provides full availability, as well as the multi-datacenter capabilities you mentioned.  HBase relies on the HDFS namenode SPOF as well as regionserver mini-SPOFs
* Multi-DC can also solve the batch/interactive conflict you mention; best practice for Cassandra is to create a &quot;virtual DC&quot; for your Hadoop jobs and let Cassandra replicate between that and your interactive &quot;DC,&quot; bidirectionally
* Cassandra manages its own local storage per node, which allows you to (for instance) dedicate a spindle to your commitlog, or pin certain columnfamilies to SSDs]]></description>
		<content:encoded><![CDATA[<p>That&#8217;s a pretty good list.  I&#8217;d add a few:</p>
<p>* Cassandra supports 1000s of ColumnFamilies in a cluster; HBase supports a handful (HBase book says &#8220;two or three&#8221; but let&#8217;s be generous)<br />
* Cassandra&#8217;s p2p design gives you &#8220;read slaves&#8221; for free; HBase reads and writes all go through a single regionserver per region<br />
* Cassandra offers built-in caching; HBase installations tend to rely heavily on memcached<br />
* Cassandra is far, far ahead of HBase in dealing with compaction; this includes parallel (multiple threads per compaction) and concurrent (multiple simultaneous compactions per server), throttling to avoid latency spikes, &#8220;leveled&#8221; compaction to optimize for reads, and the ability to remove deleted or expired data during minor compactions as well as major<br />
* Cassandra&#8217;s p2p design provides full availability, as well as the multi-datacenter capabilities you mentioned.  HBase relies on the HDFS namenode SPOF as well as regionserver mini-SPOFs<br />
* Multi-DC can also solve the batch/interactive conflict you mention; best practice for Cassandra is to create a &#8220;virtual DC&#8221; for your Hadoop jobs and let Cassandra replicate between that and your interactive &#8220;DC,&#8221; bidirectionally<br />
* Cassandra manages its own local storage per node, which allows you to (for instance) dedicate a spindle to your commitlog, or pin certain columnfamilies to SSDs</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Baclace</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-454</link>
		<dc:creator><![CDATA[Paul Baclace]]></dc:creator>
		<pubDate>Sun, 13 May 2012 01:30:41 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-454</guid>
		<description><![CDATA[I am revisiting HBase vs. Cassandra for a client.  HBase is having its first summit on May 22, 2012, so expect some announcements there.

Here are some notes that update the discussion as of May 12, 2012:
* Composite keys and ordering of records are now standard in Cassandra (just like HBase)
** this enables efficient parallel processing at the Mapper input stage for map-reduce.  Although it would not be as fast as map-reduce directly over hdfs, it would be similar to a Mapper stage reading from HBase RegionServers
* super columns are deprecated (good thing)
* atomic counters were added
* Both have TTL (time-to-live) metadata, which means self-cleaning (yeah).
* When FB switched from Cassandra to HBase, they said the change was driven by Cassandra team members leaving (perhaps going to DataStax) and also because eventual consistency was a problem in Cassandra.  
* So far, Cassandra is easier to administer since they have a funded support company (DataStax) that gives out plenty of free advice.
** HBase might catch up eventually; big-table technology is proven to work fine for Google.
* Michael Stack has often said HBase is good for TB-sized data and larger.
** For me, setting up HBase for 100GB on 4 nodes certainly seems harder than necessary. 
** A P2P installation mechanism for HBase is not yet available (I continue to nudge).
* Any distributed store can have performance-killing hotspots (all activity on one or two nodes) if keys are not chosen wisely.
* Under a heavy write and insert load, ANY storage mechanism (distributed or not) can get behind on compacting/splitting/re-org.
** This is a matter of admin discipline: there must be headroom for periodic cleanup/reorg (when Scotty says the warp drives cannot go faster without later downtime, he means it).
** The safer approach is to separate batch and interactive operations.
* Both have commit logs and can detect/repair disk errors (over-write, wrong sector written, etc.)
* Cassandra deployments split across data centers have actually been attempted by FB, but I have not seen any experience reports on how well that works.
* Cassandra only supports the Thrift interface, which does not have a way to stream large objects.]]></description>
		<content:encoded><![CDATA[<p>I am revisiting HBase vs. Cassandra for a client.  HBase is having its first summit on May 22, 2012, so expect some announcements there.</p>
<p>Here are some notes that update the discussion as of May 12, 2012:<br />
* Composite keys and ordering of records are now standard in Cassandra (just like HBase)<br />
** this enables efficient parallel processing at the Mapper input stage for map-reduce.  Although it would not be as fast as map-reduce directly over hdfs, it would be similar to a Mapper stage reading from HBase RegionServers<br />
* super columns are deprecated (good thing)<br />
* atomic counters were added<br />
* Both have TTL (time-to-live) metadata, which means self-cleaning (yeah).<br />
* When FB switched from Cassandra to HBase, they said the change was driven by Cassandra team members leaving (perhaps going to DataStax) and also because eventual consistency was a problem in Cassandra.<br />
* So far, Cassandra is easier to administer since they have a funded support company (DataStax) that gives out plenty of free advice.<br />
** HBase might catch up eventually; big-table technology is proven to work fine for Google.<br />
* Michael Stack has often said HBase is good for TB-sized data and larger.<br />
** For me, setting up HBase for 100GB on 4 nodes certainly seems harder than necessary.<br />
** A P2P installation mechanism for HBase is not yet available (I continue to nudge).<br />
* Any distributed store can have performance-killing hotspots (all activity on one or two nodes) if keys are not chosen wisely.<br />
* Under a heavy write and insert load, ANY storage mechanism (distributed or not) can get behind on compacting/splitting/re-org.<br />
** This is a matter of admin discipline: there must be headroom for periodic cleanup/reorg (when Scotty says the warp drives cannot go faster without later downtime, he means it).<br />
** The safer approach is to separate batch and interactive operations.<br />
* Both have commit logs and can detect/repair disk errors (over-write, wrong sector written, etc.)<br />
* Cassandra deployments split across data centers have actually been attempted by FB, but I have not seen any experience reports on how well that works.<br />
* Cassandra only supports the Thrift interface, which does not have a way to stream large objects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ladislav Urban</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-453</link>
		<dc:creator><![CDATA[Ladislav Urban]]></dc:creator>
		<pubDate>Thu, 10 May 2012 17:20:10 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-453</guid>
		<description><![CDATA[I have read in this article that it is difficult to install HBase. I think that is no longer true. There is an open source installer with Hadoop, HBase, Zookeeper and Flume at https://sourceforge.net/projects/syoncloud/]]></description>
		<content:encoded><![CDATA[<p>I have read in this article that it is difficult to install HBase. I think that is no longer true. There is an open source installer with Hadoop, HBase, Zookeeper and Flume at <a href="https://sourceforge.net/projects/syoncloud/" rel="nofollow">https://sourceforge.net/projects/syoncloud/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Setup Hadoop and Hbase Environment &#124; Mandal&#039;s Musings</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-451</link>
		<dc:creator><![CDATA[Setup Hadoop and Hbase Environment &#124; Mandal&#039;s Musings]]></dc:creator>
		<pubDate>Thu, 19 Apr 2012 06:24:17 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-451</guid>
		<description><![CDATA[[...] &amp; NoSQL Comparison : http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/ http://natishalom.typepad.com/nati_shaloms_blog/hbase/ [...]]]></description>
		<content:encoded><![CDATA[<p>[...] &amp; NoSQL Comparison : <a href="http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis" rel="nofollow">http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis</a> <a href="http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/" rel="nofollow">http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/</a> <a href="http://natishalom.typepad.com/nati_shaloms_blog/hbase/" rel="nofollow">http://natishalom.typepad.com/nati_shaloms_blog/hbase/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dominicwilliams</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-447</link>
		<dc:creator><![CDATA[dominicwilliams]]></dc:creator>
		<pubDate>Wed, 04 Apr 2012 10:28:19 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-447</guid>
		<description><![CDATA[Many of these choices come down to idiosyncrasies of individual teams, situations and engineers at the time decisions were made, but here are some thoughts: (i) at the time the decision was made, Cassandra was a much less mature database than HBase, a situation that has since changed, and (ii) Facebook has hundreds of engineers, and the additional administrative overhead of running a large HBase + HDFS installation won&#039;t have looked so bad, especially considering they were already expending resources to manage their sharded MySQL setup, which is the ultimate design/coding and admin time eater.]]></description>
		<content:encoded><![CDATA[<p>Many of these choices come down to idiosyncrasies of individual teams, situations and engineers at the time decisions were made, but here are some thoughts: (i) at the time the decision was made, Cassandra was a much less mature database than HBase, a situation that has since changed, and (ii) Facebook has hundreds of engineers, and the additional administrative overhead of running a large HBase + HDFS installation won&#8217;t have looked so bad, especially considering they were already expending resources to manage their sharded MySQL setup, which is the ultimate design/coding and admin time eater.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: viji (@vijiconnect)</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-445</link>
		<dc:creator><![CDATA[viji (@vijiconnect)]]></dc:creator>
		<pubDate>Wed, 04 Apr 2012 04:38:03 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-445</guid>
		<description><![CDATA[One of the best articles on HBase and Cassandra. (Any clue why Facebook is using HBase over Cassandra?)]]></description>
		<content:encoded><![CDATA[<p>One of the best articles on HBase and Cassandra. (Any clue why Facebook is using HBase over Cassandra?)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mohammad Habbab</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-417</link>
		<dc:creator><![CDATA[Mohammad Habbab]]></dc:creator>
		<pubDate>Mon, 26 Dec 2011 18:57:46 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-417</guid>
		<description><![CDATA[Thanks for the post! Although I&#039;m new to the whole distributed processing and NoSQL thing, I find your comparison very informative and accessible to a new-in-the-field developer.]]></description>
		<content:encoded><![CDATA[<p>Thanks for the post! Although I&#8217;m new to the whole distributed processing and NoSQL thing, I find your comparison very informative and accessible to a new-in-the-field developer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Oliver</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-403</link>
		<dc:creator><![CDATA[Joseph Oliver]]></dc:creator>
		<pubDate>Fri, 07 Oct 2011 22:01:33 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-403</guid>
		<description><![CDATA[Brilliant article! Clearly lays out the differences between HBase and Cassandra. By the way, at Ooyala we use a combination of Hadoop and Cassandra for dynamic realtime user analytics.]]></description>
		<content:encoded><![CDATA[<p>Brilliant article! Clearly lays out the differences between HBase and Cassandra. By the way, at Ooyala we use a combination of Hadoop and Cassandra for dynamic realtime user analytics.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: stefanorodighiero.net &#187; links for 2011-09-17</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-399</link>
		<dc:creator><![CDATA[stefanorodighiero.net &#187; links for 2011-09-17]]></dc:creator>
		<pubDate>Sat, 17 Sep 2011 15:05:49 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-399</guid>
		<description><![CDATA[[...] HBase vs Cassandra: why we moved « Dominic Williams (tags: cassandra hbase nosql bigdata) [...]]]></description>
		<content:encoded><![CDATA[<p>[...] HBase vs Cassandra: why we moved « Dominic Williams (tags: cassandra hbase nosql bigdata) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dominicwilliams</title>
		<link>http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/#comment-388</link>
		<dc:creator><![CDATA[dominicwilliams]]></dc:creator>
		<pubDate>Wed, 17 Aug 2011 09:01:59 +0000</pubDate>
		<guid isPermaLink="false">http://ria101.wordpress.com/?p=70#comment-388</guid>
		<description><![CDATA[Hi Bruno, check out the documentation on the Datastax.com website, and also the presentations from the last Cassandra conference in San Francisco.]]></description>
		<content:encoded><![CDATA[<p>Hi Bruno, check out the documentation on the Datastax.com website, and also the presentations from the last Cassandra conference in San Francisco.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
