I Techet: 2014

Fundamentals

Cassandra is a distributed NoSQL database without a master node (although there are seed nodes).
Seed nodes are only used for bootstrapping new nodes, i.e., they provide the source of gossip so that the cluster does not end up with partitioned information (where nodes form their own mini clusters and information becomes inconsistent).
Each node sends a gossip message every second to up to 3 other nodes to tell them what it knows about the cluster.
New nodes need to contact seed nodes to learn about the cluster. It is crucial that all nodes have the same set of seed nodes configured in cassandra.yaml.
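Dead nodes are marked down, and other nodes store hints for them ("hinted handoff") for up to 3 hours by default.
Replication factor and consistency levels together determine consistency: if W + R > replication factor (where W and R are the number of replicas that must acknowledge a write and a read), every read overlaps the latest write, so the data read is consistent.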
Cassandra uses three main protocols: gossip, Thrift (called rpc) and binary (called native).
Gossip is for inter-node communication only.
Thrift is the legacy client protocol (rpc_port, 9160 by default), used by CQLSH before 2.1 and by Thrift-based clients.
Binary is the CQL native protocol (native_transport_port, 9042 by default), used by the DataStax drivers for both requests and results.
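
The W + R > replication factor rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of Cassandra or any driver; the class and method names are made up for illustration. It uses the usual quorum size of floor(RF / 2) + 1.

```java
/** Illustrative helper (hypothetical names) for the W + R > RF rule. */
class ConsistencyCheck {

    /** Number of replicas in a quorum: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read replica set must overlap every write replica set. */
    static boolean isStronglyConsistent(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf); // 2 when RF = 3
        System.out.println("QUORUM write + QUORUM read: " + isStronglyConsistent(q, q, rf));  // true
        System.out.println("ONE write + ONE read:       " + isStronglyConsistent(1, 1, rf));  // false
        System.out.println("ALL write + ONE read:       " + isStronglyConsistent(rf, 1, rf)); // true
    }
}
```

With a replication factor of 3, writing and reading at QUORUM gives 2 + 2 > 3, whereas ONE/ONE gives 1 + 1 = 2, which leaves room for a read to miss the latest write.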

Configuring Cassandra Connection

To check the status of a node, use the following commands:

nodetool status // checks the status of the cluster

nodetool info // checks the status of the node

nodetool netstats // checks the network stats of the node

TODO: add some examples

rpc_address

This is the host name or IP address that CQLSH or a client uses to connect to the node. Note that in Cassandra 2.0.9 and earlier, this parameter is also the value stored in system.peers.rpc_address, which clients use for automatic node discovery. Because the value must resolve to a network interface on the node (so that Cassandra can bind to it and listen on it), it cannot be a public IP address that is configured on a gateway. This caused us great trouble when we tried to make a client connect via the public IP in an AWS VPC.

How the DataStax Java client finds the nodes:

Cluster.builder().addContactPoints() adds the list of Cassandra nodes that the client contacts on startup to discover the cluster. Each node is queried for the records in its system.peers table, and for each record (together with the node itself, because it is not present in its own peers table) the rpc_address is added to the list of Hosts; Hosts that are not in the records are removed. Note that the comparison is between the record's rpc_address and Host.address, which will not match if Host.address is a public IP, so that Host is removed.

Example: Suppose we have three Cassandra nodes

Node	Private IP	Public IP
Node1	10.220.0.20	54.220.0.20
Node2	10.220.0.21	54.220.0.21
Node3	10.220.0.22	54.220.0.22

And we call

Cluster.builder().addContactPoints("54.220.0.20", "54.220.0.21", "54.220.0.22");

Then, when the client starts, you may see the Hosts below, of which only 54.220.0.22 is reachable from your client (because the client cannot reach private IPs in a different VPC):

54.220.0.22, 10.220.0.20, 10.220.0.21
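
To make the example above concrete, here is a small simulation of the host refresh as described in this post. It is a sketch of the described behaviour only, not the driver's actual code; the class and method names are hypothetical, and it assumes the client's connection happened to land on Node3.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of the host refresh described in the text. */
class HostDiscoverySketch {

    /**
     * Keeps the node the client is connected to, adds every rpc_address found
     * in system.peers, and drops every contact point that matches neither.
     */
    static Set<String> refreshHosts(List<String> contactPoints,
                                    String connectedHost,
                                    List<String> peerRpcAddresses) {
        Set<String> known = new LinkedHashSet<>();
        known.add(connectedHost);
        known.addAll(peerRpcAddresses);

        Set<String> hosts = new LinkedHashSet<>(contactPoints);
        hosts.retainAll(known); // public IPs that match no record are removed
        hosts.addAll(known);    // private IPs from system.peers are added
        return hosts;
    }

    public static void main(String[] args) {
        List<String> contactPoints = Arrays.asList("54.220.0.20", "54.220.0.21", "54.220.0.22");
        // Assumption: the client is connected to Node3, whose peers table lists
        // the other two nodes by their private rpc_address.
        List<String> peers = Arrays.asList("10.220.0.20", "10.220.0.21");
        System.out.println(refreshHosts(contactPoints, "54.220.0.22", peers));
        // prints [54.220.0.22, 10.220.0.20, 10.220.0.21]
    }
}
```

The two public contact points for Node1 and Node2 disappear because no record carries their public IP, and they come back under private addresses that the client cannot reach.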

On these versions, the only solution is to implement an AddressTranslater class that translates each private IP to its public IP.
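
A translator of this kind boils down to a lookup table. The sketch below shows only the translation logic, using the JDK alone so that it runs without the driver; the class name and the static mapping are hypothetical. In a real client the class would implement the driver's AddressTranslater interface and be registered on the Cluster builder, but the exact interface name and method signature depend on the driver version, so treat that wiring as an assumption to check against your driver's documentation.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical private-to-public address translator (logic only, no driver dependency). */
class PrivateToPublicTranslator {

    private final Map<String, String> privateToPublic;

    public PrivateToPublicTranslator(Map<String, String> mapping) {
        this.privateToPublic = new HashMap<>(mapping);
    }

    /** Returns the public address for a known private one, or the input unchanged. */
    public InetSocketAddress translate(InetSocketAddress address) {
        InetAddress ip = address.getAddress();
        if (ip == null) {
            return address; // unresolved host name: nothing to look up
        }
        String publicIp = privateToPublic.get(ip.getHostAddress());
        if (publicIp == null) {
            return address; // not one of our nodes: leave as is
        }
        return new InetSocketAddress(publicIp, address.getPort());
    }

    public static void main(String[] args) {
        // Mapping taken from the example table above.
        Map<String, String> mapping = new HashMap<>();
        mapping.put("10.220.0.20", "54.220.0.20");
        mapping.put("10.220.0.21", "54.220.0.21");
        mapping.put("10.220.0.22", "54.220.0.22");

        PrivateToPublicTranslator translator = new PrivateToPublicTranslator(mapping);
        InetSocketAddress fromPeers = new InetSocketAddress("10.220.0.20", 9042);
        System.out.println(translator.translate(fromPeers).getAddress().getHostAddress());
    }
}
```

A hard-coded map is the simplest option; in practice the mapping would more likely be loaded from configuration so that it can change with the cluster.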

broadcast_rpc_address (only available in 2.1 and later)

This is the RPC address that a node advertises to clients for automatic node discovery. Clients use it to make connections to Cassandra, and it can be set to a public IP, which solves the problem above.

broadcast_address

The IP address used by nodes from another DC to make connections to the node. This is to solve the private/public IP issue above, but only works for inter-node communication. Hence you should always set this value to the public IP.

listen_address

The IP address on which this node listens for communication from other nodes. It is the counterpart of broadcast_address, and should always be set to the private IP address. Like rpc_address, this address must resolve to a network interface on the node.
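
Putting the four settings together for Node1 from the example table gives something like the fragment below. This is an illustrative sketch rather than a tested configuration: the values come from the example above, the keys are the ones found in cassandra.yaml (rpc_address and broadcast_rpc_address), and the last key only exists from 2.1 onwards.

```yaml
# cassandra.yaml on Node1 (illustrative values from the example table)
listen_address: 10.220.0.20         # private IP, bound locally for inter-node traffic
broadcast_address: 54.220.0.20      # public IP advertised to other nodes
rpc_address: 10.220.0.20            # private IP, bound locally for client traffic
broadcast_rpc_address: 54.220.0.20  # 2.1+: public IP advertised to clients
```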

Please note this post may change in the future, to fix mistakes and add new materials.

We all know the importance of Java GC, but the huge number of writings on the topic can be difficult to grasp, so I've written this post to help sort out some of the confusion (especially in terminology).

Note that you may need to have some basic understanding of how GC works. If not, these two links provide a good introduction:

http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots_02.html (young gen GC)

http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html (CMS)

The Generational Heap

Generation	What’s on it
Young gen (aka. New gen)	The pool from which memory is initially allocated for most objects.
Old gen (aka. Tenured space)	The pool containing objects that have existed for some time in the survivor space.
Perm gen	The pool containing all the reflective data of the virtual machine itself, such as class and method objects.
Code cache	Memory that is used for compilation and storage of native code.

HotSpot Memory Layout

heap = young gen (aka. new gen) + old gen (aka. tenured space)

young gen = eden space + survivor space 1 + survivor space 2
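
The layout above can be inspected from inside a running JVM through the standard java.lang.management API. The sketch below simply lists the memory pools the JVM reports; the class name is made up, and the pool names it prints vary with the collector and JVM version, so no particular output should be expected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the memory pools of the current JVM (illustrative class name). */
class HeapLayout {

    /** Names of the pools that belong to the heap (young gen and old gen spaces). */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools are the generations; non-heap pools include perm gen and the code cache.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getType() + "\t" + pool.getName() + "\tused=" + usedKb + "K");
        }
    }
}
```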

One important point to bear in mind:

Almost all objects on heap will die (apart from application scoped objects), either when they're young or old, so the overall used heap size should be related to the rate of object creation and their lifetime only, not how long the application has run (unless there is a memory leak).

The Collectors

This link (also this one) provides a good overview of the types of collectors as well as their combination in HotSpot.

This table summarises all collectors (apart from G1 collector):

	Compaction Collector	Non Compaction Collector
Serial (Single thread)	Young gen: Serial collector (aka. Copy collector) (-XX:+UseSerialGC). Old gen: Mark Sweep Compact (aka. Serial old collector, MSC) (-XX:+UseSerialGC); note this is the default old gen collector.	None
Parallel (Multi thread)	Young gen: Parallel collector (aka. Parallel scavenge, PS) (-XX:+UseParallelGC), which only works with the serial/parallel old collectors; Parallel new collector (-XX:+UseParNewGC), which mainly works with CMS but also works with MSC (because of full GC). Old gen: Parallel Scavenge Mark Sweep Compact (aka. Parallel MSC) (-XX:+UseParallelOldGC); note this forces -XX:+UseParallelGC.	None
Concurrent	None	Old gen: Concurrent Mark Sweep (-XX:+UseConcMarkSweepGC), note this only works with -XX:+UseParNewGC or -XX:-UseParNewGC (i.e., use serial collector)

Let me just highlight some important points:

A minor collection happens on the young gen only, while a major collection (aka. full GC) runs on the entire heap.
A major collection is triggered when old gen is full. It collects the old gen first, then the young gen, promoting qualified objects to old gen.
Stop-the-world events are not bad in themselves. All collector pauses are stop-the-world (whether it's a young gen collector or an old gen one), in order to calculate the GC roots and possibly compact the heap or copy objects (to survivor spaces). What is bad is a major collection that stops the world for too long, or frequent minor collections that add up to a long total pause time.
A full GC is slow because it involves compacting the heap (i.e., relocating live objects in old gen so they are close together).
The CMS (concurrent-mark-sweep) collector tries to avoid full GC (hence better response time) at the cost of throughput (because the collector runs concurrently with the application threads). When a full GC is inevitable (because the old gen is full or overly fragmented, or because CMS cannot keep up with the object creation rate), it falls back to the serial old collector (MSC) to compact the heap (just once).
A rule of thumb: parallel GC for throughput, concurrent GC for response time.
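
To see which young gen and old gen collectors a JVM is actually running, and how often they have run, the standard GarbageCollectorMXBean API can be queried. This is a minimal sketch with a made-up class name; the collector names and counts depend entirely on the JVM and its flags. Note that the reported time is the approximate accumulated collection time, which for a concurrent collector is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

/** Prints one line per collector in the current JVM (illustrative class name). */
class CollectorReport {

    /** Formats a collector's name, run count, accumulated time and managed pools. */
    static String describe(GarbageCollectorMXBean gc) {
        return gc.getName()
                + ": collections=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime()
                + ", pools=" + Arrays.toString(gc.getMemoryPoolNames());
    }

    public static void main(String[] args) {
        // Typically one bean for the young gen collector and one for the old gen collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc));
        }
    }
}
```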

Ways to Tune

There are two main ways to tune a JVM, as documented in Java SE 6 HotSpot GC tuning:

1. Explicitly setting the heap and off-heap parameters

2. Using ergonomics (only works for parallel collectors)

Also see the list of JVM options for 1.6 in the References for all JVM options in Java 6.
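
Whichever way you tune, it is worth confirming that the JVM actually picked up the flags. The sketch below (class name made up) prints the heap and GC flags the JVM was started with, together with the resulting maximum heap size. The flag values in the comments are arbitrary examples for the Java 6 era JVMs this post targets, not recommendations.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/**
 * Shows which heap and GC flags the JVM was started with (illustrative class name).
 *
 * Explicit sizing, for example:
 *   java -Xms4g -Xmx4g -Xmn1g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC JvmSettings
 * Ergonomics with the parallel collector, for example:
 *   java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=19 JvmSettings
 */
class JvmSettings {

    /** Keeps only the heap sizing (-Xms, -Xmx, -Xmn) and -XX: options. */
    static List<String> heapAndGcFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")
                    || arg.startsWith("-Xmn") || arg.startsWith("-XX:")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap/GC flags: " + heapAndGcFlags(jvmArgs));
        System.out.println("Max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```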

How to Monitor GC

VisualVM and JConsole are the most commonly used tools for visualising GC. Alternatively, the following command prints the GC information:

jstat -gcutil <pid> <period in ms>

References

Generational heap: http://stackoverflow.com/a/1262474/842860
Java Garbage Collection Basics: http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
Java SE 6 HotSpot GC tuning: http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
http://www.infoq.com/articles/Java_Garbage_Collection_Distilled
http://javabook.compuware.com/content/memory/how-garbage-collection-works.aspx
http://www.fasterj.com/articles/oraclecollectors1.shtml
Minor GC, with a list of all possible collector options: http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html
CMS: http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots_02.html
TLAB: http://stackoverflow.com/a/25515423/842860
List of JVM options for 1.6: http://stas-blogspot.blogspot.co.uk/2011/07/most-complete-list-of-xx-options-for.html