In 50 words:
Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web.
When I started learning Cassandra, I tried to put together all the topics I went through, along with the resources I used for each one. My intention in sharing this on the blog is twofold: Cassandra experts can help me with missed topics or better resources, and the post can serve as a reference for anyone aspiring to learn Cassandra. So let us begin.
Resources:
- Cassandra - The Definitive Guide by Eben Hewitt
- DataStax course videos: https://academy.datastax.com/courses/
- Cassandra kick-start: http://wiki.apache.org/cassandra/GettingStarted
- Tutorials: TutorialsPoint, DataStax, Intellipaat, Lynda.com and our beloved Google.
Sample Java code
Topics, Summary And Notes
- Introduction
- Relational Databases in a Distributed Environment
- ACID is not entirely true.
- Schema
- Sharding (Feature based, Key based, Lookup based).
- Why Cassandra?
- Distributed and decentralized (Server Symmetry vs Master Slave)
- Elastically scalable (Vertical and Horizontal scaling)
- Highly available, fault tolerant
- Tunably consistent (Eventually consistent, Consistency - Strict/Causal/Weak, CAP Theorem (CA, CP, AP))
- Row oriented (Sparse multidimensional hash-tables)
- Origin, Use-cases and Users
- Installation
- Download tgz or installer. Fairly easy installation.
- Directory Structure and notable file names:
- bin: nodetool (Inspect cluster)
- conf: storage-conf.xml (Configure keyspace and column families)
- interface: cassandra.thrift (RPC client API)
- javadoc
- lib
- Understanding cassandra.yaml
- Install Python, then run the setup.py installation in pylib
- Data Model
- Basic rule of Cassandra Data Model:
- Spread data evenly around the cluster and Minimize the number of partitions read.
- Model around queries (query-driven design, QDD)
- http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
- Important terms
- Cluster: Outermost structure of Cassandra.
- Keyspace: Outermost container for data in Cassandra
- Replication factor
- Replica placement strategy (SimpleStrategy/NetworkTopologyStrategy/OldN...)
- Column Families: Container for an ordered collection of rows, each of which is itself an ordered collection of columns.
- Attributes: a name and a comparator that determines how columns are sorted when they are returned by a query.
- Not like RDBMS schema
- Column: Basic unit of data structure
- Column has name, value and timestamp
- Wide and skinny rows
- Column sorting: e.g. WITH CLUSTERING ORDER BY (posttime DESC);
- Super Column: Special kind of column whose value is a map of subcolumns.
- Primary Key: ((Partition Key), Clustering Key)
- Partition: Group of rows sharing same partition key.
- Design difference between RDBMS and Cassandra
- No referential integrity
- Secondary Indexes
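- Sorting: No SQL-style ORDER BY on arbitrary columns here. Sort order is a design decision made when the data model is defined. SliceRange is available for reading a range of columns, though.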
- Denormalization
- Architecture https://wiki.apache.org/cassandra/ArchitectureInternals
- System Keyspace stores metadata of the node.
- Node's token, cluster name, keyspace and schema definitions, and whether the node is bootstrapped.
- Schema column family and Migration column family.
- Peer-to-Peer distribution model. No Single Point of Failure (SPOF).
- Gossip and Failure-Detection
- org.apache.cassandra.gms.Gossiper class. Node registers itself to Gossiper at start.
- Gossiper class maintains a list of nodes which are dead and alive.
- Gossiper chooses a random node in the ring periodically (TimerTask) and starts session.
- Three messages per session: GossipDigestSynMessage, GossipDigestAckMessage and GossipDigestAck2Message.
- Cassandra uses Phi Accrual Failure Detection: instead of a binary up/down verdict, it computes a suspicion level for each node.
- Failure detection: org.apache.cassandra.gms.FailureDetector
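
To make the phi accrual idea more concrete, here is a minimal sketch in Python. It is not Cassandra's `FailureDetector`: the class name `PhiAccrualDetector` is hypothetical, and it assumes heartbeat inter-arrival times are exponentially distributed, which reduces phi to `elapsed / (mean_interval * ln 10)`. The longer a node stays silent relative to its usual heartbeat interval, the higher phi climbs.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy phi accrual detector (hypothetical, for illustration only)."""

    def __init__(self, window_size=1000):
        # Sliding window of recent heartbeat inter-arrival times.
        self.intervals = deque(maxlen=window_size)
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level at time `now`; 0.0 until an interval is known."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Assumption: exponential inter-arrival times, so the probability
        # that the next heartbeat is still pending is exp(-elapsed / mean).
        # phi = -log10(that probability) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(10):              # one heartbeat per second, at t = 0..9
    detector.heartbeat(float(t))

print(detector.phi(9.5))         # shortly after the last heartbeat: low phi
print(detector.phi(30.0))        # long silence: phi has grown a lot
```

A caller would compare `phi` against a threshold of its choosing and mark the node as suspect once the threshold is crossed.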
- Anti-entropy and Read repair
- Anti-entropy is the replica synchronization mechanism.
- The server initiates a TreeRequest/TreeResponse conversation to exchange Merkle trees with neighboring nodes.
- A Merkle tree is a hash tree representing the data in that column family.
- Dynamo DB vs Cassandra in terms of Merkle trees.
- org.apache.cassandra.service.AntiEntropyService and its static Differencer.
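
The Merkle tree exchange can be sketched roughly as follows. This is a simplified illustration, not the `AntiEntropyService` code: the function names are hypothetical, each leaf is a single serialized row rather than a token range, and both replicas are assumed to hold the same number of leaves. The point is that two replicas compare root hashes first and descend only into subtrees that differ.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last node with itself.
            level = level + [level[-1]]
        level = [_hash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk down from the root, following only mismatching subtrees."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top - 1, -1, -1):
        next_suspects = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                if (child < len(tree_a[depth])
                        and tree_a[depth][child] != tree_b[depth][child]):
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


replica_a = [b"k1:v1", b"k2:v2", b"k3:v3", b"k4:v4"]
replica_b = [b"k1:v1", b"k2:v2", b"k3:stale", b"k4:v4"]

tree_a = build_merkle(replica_a)
tree_b = build_merkle(replica_b)
print(differing_leaves(tree_a, tree_b))  # indexes of leaves needing repair
```

When the roots match, nothing else is compared, which is what keeps the exchange cheap for replicas that are already in sync.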
- Memtables, SSTables, and Commit Logs
- Memtable flushing is a nonblocking operation.
- Each commit log maintains an internal bit flag to indicate whether it needs flushing.
- There is only one bit flag per column family.
- All writes are sequential, so write performance depends on the disk's sequential throughput. A slow disk means a performance hit.
- Hinted Handoff
- If a node is not available, another node stores a hint for the write and replays it to the target node once it is back.
- Concern - hints build up considerably & flood the node with requests when it comes up.
- Hinted handoff message priority and disabling hinted handoff.
- Partitioner
- Determines how data is distributed across the nodes.
- Murmur3Partitioner, RandomPartitioner, ByteOrderedPartitioner
- https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architecturePartitionerAbout_c.html
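
As a rough illustration of what a partitioner does, the sketch below hashes each key to a token and assigns it to the first node on the ring whose token is greater than or equal to the key's token, wrapping around at the end. It uses MD5 in the spirit of RandomPartitioner, but the `TokenRing` class, the node names and the evenly spaced tokens are assumptions made for this example, not Cassandra's implementation.

```python
import bisect
import hashlib
from collections import Counter


def token_for(key: str) -> int:
    """Hash a partition key to a 128-bit token (MD5-based, as an example)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Toy token ring (hypothetical): maps keys to the node owning them."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token assigned to that node.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token; wrap to the start
        # of the ring when the key's token is beyond the last node token.
        idx = bisect.bisect_left(self._tokens, token_for(key))
        return self._ring[idx % len(self._ring)][1]


# Four hypothetical nodes with evenly spaced tokens over the 128-bit space.
nodes = {f"node{i}": i * (2 ** 128 // 4) for i in range(4)}
ring = TokenRing(nodes)

counts = Counter(ring.owner(f"user-{n}") for n in range(1000))
print(counts)  # keys per node; a hashing partitioner spreads them evenly
```

An order-preserving partitioner such as ByteOrderedPartitioner would instead derive the token from the key bytes themselves, which keeps keys sorted but can create hot spots.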
- Compaction
- http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html
- SizeTieredCompactionStrategy, DateTieredCompactionStrategy and LeveledCompactionStrategy
- http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
- Bloom Filter
- http://billmill.org/bloomfilter-tutorial/
- When a query is performed, the Bloom filter is checked first before accessing disk. Because false-negatives are not possible, if the filter indicates that the element does not exist in the set, it certainly doesn’t; but if the filter thinks that the element is in the set, the disk is accessed to make sure.
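
The read-path behaviour described above can be sketched with a small Bloom filter. This is a minimal example under stated assumptions: the `BloomFilter` class, the `read` helper and the dict standing in for an SSTable on disk are all hypothetical, and the hash positions are derived from SHA-256 purely for convenience.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical): no false negatives, rare false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # Any unset bit proves the key was never added.
        return all(self.bits[pos] for pos in self._positions(key))


def read(key, bloom, sstable):
    """Return (value, disk_was_read); `sstable` is a dict standing in for disk."""
    if not bloom.might_contain(key):
        return None, False          # definitely absent: skip the disk
    return sstable.get(key), True   # possibly present: check the disk


sstable = {"alice": "row-1", "bob": "row-2"}
bloom = BloomFilter()
for key in sstable:
    bloom.add(key)

print(read("alice", bloom, sstable))
print(read("carol", bloom, sstable))
```

A key that was added always passes `might_contain`, so a negative answer is safe to trust. A positive answer only means the disk has to be consulted.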
- Tombstones
- https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_deletes_c.html
- http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones.html
- Staged Event-Driven Architecture (SEDA)
- Stage is a basic unit of work and each stage can be handled by a different thread pool.
- Stages in Cassandra: Read, Mutation, Gossip, Response, Anti-Entropy, Load Balance, Migration, Streaming
- Snitch
- Gather information about your network topology so that Cassandra can efficiently route requests.
- https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchesAbout_c.html
- Security
- Security mechanism is pluggable, which means that you can easily swap out one authentication method for another, or write your own.
- Default: org.apache.cassandra.auth.AllowAllAuthenticator
- Alternative: org.apache.cassandra.auth.SimpleAuthenticator
- access.properties: e.g. Keyspace1=jsmith,Elvis Presley,dilbert
- passwd.properties: e.g. jsmith=havebadpass Elvis\ Presley=graceland4evar
- Accordingly replace the value of authenticator element in cassandra.yaml.
- Miscellaneous Configuration
- gc_grace_seconds: default 864000 seconds (10 days)
- in_memory_compaction_limit_in_mb: default - 64 MB
- Reading and Writing Data
- Query differences between RDBMS and Cassandra
- No update query
- Record-Level Atomicity on Writes
- No Server-Side Transaction Support
- No Duplicate Keys
- Consistency Levels
- Read consistency levels: ONE, QUORUM, ALL
- Write consistency levels: ZERO, ANY, ONE, QUORUM, ALL
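
The arithmetic behind these levels can be sketched in a few lines. The helper names below are hypothetical. The sketch assumes QUORUM means a majority of replicas, `replication_factor // 2 + 1`, and models ZERO and ANY as guaranteeing no replica acknowledgement (an assumption made for this example). A read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor, because the two sets must then overlap.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements a consistency level waits for (modelled)."""
    if level in ("ZERO", "ANY"):
        # Modelling assumption: no replica is guaranteed to have the write.
        return 0
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets are guaranteed to overlap."""
    reads = replicas_required(read_level, replication_factor)
    writes = replicas_required(write_level, replication_factor)
    return reads + writes > replication_factor


rf = 3
for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"),
                                ("ONE", "ALL"), ("ALL", "ANY")]:
    print(read_level, write_level,
          read_sees_latest_write(read_level, write_level, rf))
```

Lower levels trade this guarantee for latency and availability, which is the "tunable" part of tunable consistency.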
- Slice Predicate <Yet to understand completely :'(>
- Java Clients
- Thrift, Hector, Astyanax, Casser, Achilles
- Casser uses Java 8.
- These clients usually provide connection pooling and object mapping.
- We can happily use Spring-data-cassandra.
- Monitoring
- Logging
- Cassandra uses log4j. Default log level is INFO.
- conf/log4j-server.properties (log4j.rootLogger, log4j.appender.R.File)
- Tailing
- Warning signs: DEBUG 12:39:56,312 attempting to connect to mywinbox/192.168.1.3
- An attempt to connect is fine on its own, but the connection should then succeed. If the node stays stuck at this message, there is a problem.
- JMX and MBeans
- http://www.javalobby.org/java/forums/t49130.html
- https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_monitoring_c.html
- https://cassandra-zone.com/performance-monitoring/
- Health Check
- Check MBean for org.apache.cassandra.concurrent.ROW_MUTATION_STAGE
- Check MBean for org.apache.cassandra.concurrent.ROW_READ_STAGE
- Maintenance
- Ring Information (nodetool -h <IP Address> <command>)
- info: basic information about current state of a node.
- ring: Nodes in the ring and their state. Concept of Range Tokens.
- Getting Statistics
- cfstats: Overview of column family. Gives the cluster metrics.
- tpstats: Information on the thread pools that Cassandra maintains.
- Basic Maintenance
- repair: induces major compaction.
- flush: forces write of data from memtable to SSTable.
- cleanup
- Snapshots
- To make a copy of some or all of the keyspaces in a node and save it to what is essentially a separate database file.
- Load Balancing the cluster
- loadbalance: decommission node and send its tokens to others, then bootstrap it again.
- stream: monitoring the load-balancing operation while it is happening.
- Decommissioning a node
- decommission
- Steps:
- Gossiper is shut down.
- messaging service is shut down.
- SEDA stage manager is shut down.
- Node is set to be decommissioned.
- Data is streamed to other nodes.
- After receiving acknowledgement, node can leave the ring.
- Updating nodes
- removetoken: A node cannot remove its own token.
- Compaction Threshold: number of SSTables that are in the queue to be compacted before a minor compaction is actually kicked off. Minimum - 4, Maximum - 32
- Performance Tuning
- Data Storage: Store the datafiles and the commit logs on separate hard disks.
- Reply timeout: How long Cassandra will wait for other nodes to respond before deciding that the request is a failure. rpc_timeout_in_ms in yaml. Defaults to 5000 (5 sec)
- Commit Logs: How large the commit log is allowed to grow before it stops appending new writes to a file and creates a new one. commitlog_rotation_threshold_in_mb in YAML.
- Memtables:
- Size a memtable can grow to before it is flushed: binary_memtable_throughput_in_mb
- Number of column values that will be stored in a memtable before it is flushed.
- Concurrency
- concurrent_reads: default - 2 threads per processor core.
- concurrent_writes: Match it with the number of clients that will concurrently write.
- Caching
- Row cache, key cache
Other Resources
- Pagination
- https://datastax.github.io/java-driver/2.1.8/features/paging/
- https://gist.github.com/stevesun21/df3fa5141bd01a4f83fc
- https://github.com/PatrickCallaghan/datastax-paging-demo
- Basic Concepts: http://clojurecassandra.info/articles/cassandra_concepts.html
- Introductory Pointers: https://dzone.com/articles/introduction-apache-cassandras
- CQL: https://cassandra.apache.org/doc/cql3/CQL-2.2.html
- Youtube O'Reilly