Learning Cassandra - Topics and Quick Notes

In 50 words:

Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web.

While I started learning Cassandra, I tried to put together all the topics that I went through. Along with topics, I also tried to maintain the resources for that topic. My intention to share that information on the blog is two fold. While Cassandra experts can help me with missed topics or better resources, this blog can be a good reference for those who are aspiring to learn Cassandra. So let us begin.

Resources:

Cassandra - The Definitive Guide by Eben Hewitt
DataStax course videos: https://academy.datastax.com/courses/
Cassandra kick-start: http://wiki.apache.org/cassandra/GettingStarted
Tutorials: TutorialsPoint, DataStax, Intellipaat, Lynda.com and our beloved Google.

Sample java code

https://github.com/imsatyam/cassandra_basic

Topics, Summary And Notes

Introduction

Relational Databases in Distributive Environment

ACID is not entirely true.
Schema
Sharding (Feature based, Key based, Lookup based).

Why Cassandra?

Distributed and decentralized (Server Symmetry vs Master Slave)
Elastically scalable (Vertical and Horizontal scaling)
Highly available, fault tolerant
Tunably consistent (Eventually consistent, Consistency - Strict/Casual/Weak, CAP Theorem (CA, CP, AP))
Row oriented (Sparse multidimensional hash-tables)

Origin, Use-cases and Users

Installation

Download tgz or installer. Fairly easy installation.
Directory Structure and notable file names:

bin: nodetool (Inspect cluster)
conf: storage-conf.xml (Configure keyspace and column families)
interface: cassandra.thrift (RPC client API)
javadoc
lib

Understanding cassandra.yaml
Install python and subsequent setup.py installation in pylib

Data Model

Basic rule of Cassandra Data Model:

Spread data evenly around the cluster and Minimize the number of partitions read.
Model around queries (QDD)
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

Important terms

Cluster: Outermost structure of Cassandra.
Keyspace: Outermost container for data in Cassandra

Replication factor
Replica placement strategy (SimpleStrategy/NetworkTopologyStrategy/OldN...)

Column Families: Container for an ordered collection of rows, each of which is itself an ordered collection of columns.

Attributes: Name and a comparator to sort the columns after column execution.
Not like RDBMS schema

Column: Basic unit of data structure

Column has name, value and timestamp
Wide and skinny rows
Column sorting: e.g. WITH CLUSTERING ORDER BY (posttime DESC);

Super Column: Special kind of column whose value is a map of subcolumns.
Primary Key: ((Partition Key), Cluster Key)
Partition: Group of rows sharing same partition key.

Design difference between RDBMS and Cassandra

No referential integrity
Secondary Indexes
Sorting: No ORDER BY here. It is a design decision. Though SliceRange is there.
Denormalization

Architecture https://wiki.apache.org/cassandra/ArchitectureInternals

System Keyspace stores metadata of the node.

node's token, cluster name, keyspace and schema def, is node bootstrapped.
Schema column family and Migration column family.

Peer-to-Peer distribution model. No Single Point of Failure (SPOF).
Gossip and Failure-Detection

org.apache.cassandra.gms.Gossiper class. Node registers itself to Gossiper at start.
Gossiper class maintains a list of nodes which are dead and alive.
Gossiper chooses a random node in the ring periodically (TimerTask) and starts session.
3 Messages. (GossipDigestSynMessage, GossipDigestAckMessage and GossipDigestAck2Message
Cassandra works on Phi Accrual Failure Detection. Simple understanding - Click Here
Failure detection: org.apache.cassandra.gms.FailureDetector

Anti-entropy and Read repair

Anti-entropy is the replica synchronization mechanism.

server initiates a TreeRequest/TreeReponse conversation to exchange Merkle trees with neighboring nodes.
Merkle tree is a hash representing the data in that column family.
Dynamo DB vs Cassandra in terms of Merkel tree.
org.apache.cassandra.service.AntiEntropyService and its static Differencer.

Memtables, SSTables, and Commit Logs

Memtable flushing is a nonblocking operation.
Each commit log maintains an internal bit flag to indicate whether it needs flushing.
There is only one bit flag per column family.
All writes are sequential. So, if speed of disc is less, performance hit occurs.

Hinted Handoff

If a node is not available.
Concern - hints build up considerably & flood the node with requests when it comes up.
Hinted handoff message priority and disabling hinted handoff.

Partitioner

How data is distributed across the node?
Murmur3Partitioner, RandomPartitioner, ByteOrderedPartitioner
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architecturePartitionerAbout_c.html

Compaction

http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html
SizeTieredCompactionStrategy,DateTieredCompactionStrategy and LeveledCompactionStrategy
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

Bloom Filter

http://billmill.org/bloomfilter-tutorial/
When a query is performed, the Bloom filter is checked first before accessing disk. Because false-negatives are not possible, if the filter indicates that the element does not exist in the set, it certainly doesn’t; but if the filter thinks that the element is in the set, the disk is accessed to make sure.

Tombstones

Staged Event-Driven Architecture (SEDA)

Stage is a basic unit of work and each stage can be handled by a different thread pool.
Stages in Cassandra: Read, Mutation, Gossip, Response, Anti-Entropy, Load Balance, Migration, Streaming

Snitch

Gather information about your network topology so that Cassandra can efficiently route requests.
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchesAbout_c.html

Security

Security mechanism is pluggable, which means that you can easily swap out one authentication method for another, or write your own.
Default: org.apache.cassandra.auth.AllowAllAuthenticator
Alternative: org.apache.cassandra.auth.SimpleAuthenticator

access.properties: e.g. Keyspace1=jsmith,Elvis Presley,dilbert
passwd.properties: e.g. jsmith=havebadpass Elvis\ Presley=graceland4evar

Accordingly replace the value of authenticator element in cassandra.yaml.

Miscellaneous Configuration

gc_grace_seconds: default - 864000 or 10 days
in_memory_compaction_limit_in_mb: default - 64 MB

Reading and Writing Data

Query differences between RDBMS and Cassandra

No update query
Record-Level Atomicity on Writes
No Server-Side Transaction Support
No Duplicate Keys

Consistency Levels

Read consistency levels: ONE, QUORUM, ALL
Write consistency levels: ZERO, ANY, ONE, QUORUM, ALL

Slice Predicate <Yet to understand completely :'(>

Java Clients

Thrift, Hector, Astyanax, Casser, Achilles
Casser uses Java 8.
These clients usually provide connection pooling and object mapping.
We can happily use Spring-data-cassandra.

Monitoring

Logging

Cassandra uses log4j. Default log level is INFO.
conf/log4j-server.properties (log4j.rootLogger, log4j.appender.R.File)
Tailing
Warning signs: DEBUG 12:39:56,312 attempting to connect to mywinbox/192.168.1.3

Connecting is ok, but it should connect. If it is stuck, it is a problem.

JMX and MBeans

Health Check

Check MBean for org.apache.cassandra.concurrent.ROW_MUTATION_STAGE
Check MBean for org.apache.cassandra.concurrent.ROW_READ_STAGE

Maintenance

Ring Information (nodetool -h <IP Address> <command>)

info: basic information about current state of a node.
ring: Nodes in the ring and their state. Concept of Range Tokens.

Getting Statistics

cfstats: Overview of column family. Gives the cluster metrics.
tpstats: Information on thread pools that cassandra maintains.

Basic Maintenance

repair: induces major compaction.
flush: forces write of data from memtable to SSTable.
cleanup

Snapshots

To make a copy of some or all of the keyspaces in a node and save it to what is essentially a separate database file.

Load Balancing the cluster

loadbalance: decommission node and send its tokens to others, then bootstrap it again.
stream: monitoring the load-balancing operation while it is happening.

Decommissioning a node

decommission
Steps:

gossiper shut down.
messaging service is shut down.
SEDA stage manager is shut down.
Node is set to be decommissioned.
Data is streamed to other nodes.
After receiving acknowledgement, node can leave the ring.

Updating nodes

removetoken: Cannot remove it's own token.
Compaction Threshold: number of SSTables that are in the queue to be compacted before a minor compaction is actually kicked off. Minimum - 4, Maximum - 32

Performance Tuning

Data Storage: Store the datafiles and the commit logs on separate hard disks.
Reply timeout: How long Cassandra will wait for other nodes to respond before deciding that the request is a failure. rpc_timeout_in_ms in yaml. Defaults to 5000 (5 sec)
Commit Logs: How large the commit log is allowed to grow before it stops appending new writes to a file and creates a new one. commitlog_rotation_threshold_in_mb in YAML.
Memtables:

size memtable can grow to before it's flushed. binary_memtable_throughput_in_mb
column values that will be stored in a memtable before being flushed.

Concurrency

concurrent_reads: default - 2 threads per processor core.
concurrent_writes: Match it with the number of clients that will concurrently write.

Caching

row cahe, key cache

Other Resources

Pagination

Basic Concepts: http://clojurecassandra.info/articles/cassandra_concepts.html
Introductory Pointers: https://dzone.com/articles/introduction-apache-cassandras
CQL: https://cassandra.apache.org/doc/cql3/CQL-2.2.html
Youtube O'Reilly

Pages

Potpourri

Learning Cassandra - Topics and Quick Notes

Satyam Shandilya

No comments:

Post a Comment

About Me

Popular Posts

Labels Cloud

Social

Instagram