Understand the Cassandra data model

The Cassandra data model defines:

  • Column family as a way to store and organize data
  • Table as a two-dimensional view of a multi-dimensional column family
  • Operations on tables using the Cassandra Query Language (CQL)

Cassandra 1.2+ relies on CQL schema, concepts, and terminology, though the older Thrift API remains available.

Table (CQL API terms) | Column Family (Thrift API terms)
Table is a set of partitions | Column family is a set of rows
Partition may be single-row or multi-row | Row may be skinny or wide
Partition key uniquely identifies a partition, and may be simple or composite | Row key uniquely identifies a row, and may be simple or composite
Column uniquely identifies a cell in a partition, and may be regular or clustering | Column key uniquely identifies a cell in a row, and may be simple or composite
Primary key consists of a partition key plus clustering columns, if any, and uniquely identifies a row in both its partition and table |

Row (Partition)

Row is the smallest unit that stores related data in Cassandra

  • Rows: individual rows constitute a column family
  • Row key: uniquely identifies a row in a column family
  • Row: stores pairs of column keys and column values
  • Column key: uniquely identifies a column value in a row
  • Column value: stores one value or a collection of values
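The row structure described above can be pictured as nested maps. The following is a minimal Python sketch, not Cassandra's actual storage format: the outer dictionary maps row keys to rows, and each row maps column keys to column values. The `users` data and the `get_value` helper are hypothetical names chosen for illustration.

```python
# Hypothetical column family modeled as nested dictionaries.
# Outer dict: row key -> row. Inner dict: column key -> column value.
users = {
    "user1": {
        "name": "Alice",
        # A column value may hold a collection of values.
        "emails": {"alice@example.com", "a.smith@example.com"},
    },
    "user2": {
        "name": "Bob",
    },
}


def get_value(column_family, row_key, column_key):
    """Look up one column value by its row key and column key."""
    return column_family[row_key][column_key]


print(get_value(users, "user2", "name"))
```

The row key selects a row within the column family, and the column key then selects one value within that row, which mirrors the two levels of uniqueness listed above.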

Rows may be described as skinny or wide

  • Skinny row: has a fixed, relatively small number of column keys
  • Wide row: has a relatively large number of column keys (hundreds or thousands); this number may increase as new data values are inserted
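The difference between skinny and wide rows can be sketched in Python as follows. This is an illustration under assumed data, not a description of Cassandra internals: the sensor readings, the timestamp format, and the `record_reading` helper are all hypothetical.

```python
# Skinny row: a fixed, small set of column keys known in advance.
skinny_row = {"name": "Alice", "city": "Austin", "state": "TX"}

# Wide row: the column keys are data themselves (here, timestamps),
# so the number of columns grows as new values are inserted.
wide_row = {}


def record_reading(row, timestamp, value):
    """Insert a reading under a new column key (the timestamp)."""
    row[timestamp] = value


for minute in range(3):
    record_reading(wide_row, f"2013-01-01T00:0{minute}", 20.0 + minute)

print(len(skinny_row), len(wide_row))
```

Each call to `record_reading` with a new timestamp adds a column key to the wide row, while the skinny row keeps the same set of column keys.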

Key (Partition Key)

Composite row key

multiple components separated by colons

Composite column key

multiple components separated by colons
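The colon notation for composite row keys and composite column keys can be sketched as below. The colon is treated here purely as a display notation, and this is not meant to describe how Cassandra encodes composite keys internally. The helper names `make_composite_key` and `split_composite_key` and the sample components are hypothetical.

```python
SEPARATOR = ":"


def make_composite_key(*components):
    """Join key components into one colon-separated string."""
    parts = [str(component) for component in components]
    for part in parts:
        # Reject components that would make the key ambiguous to split.
        if SEPARATOR in part:
            raise ValueError(f"component {part!r} contains the separator")
    return SEPARATOR.join(parts)


def split_composite_key(key):
    """Split a colon-separated key back into its string components."""
    return key.split(SEPARATOR)


row_key = make_composite_key("sensor42", 2013)           # composite row key
column_key = make_composite_key("01-15", "temperature")  # composite column key

print(row_key, column_key)
```

Note that splitting returns every component as a string, so a caller that needs the original types would have to convert them back.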

Column family (Table)

set of rows with a similar structure

Table with single-row partitions

Table with multi-row partitions
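The two table shapes can be contrasted with a small Python model. This is a simplified sketch under assumed data: a table is a dictionary from partition key to partition, and each partition is a dictionary from clustering-column values to a row. The function names, column names, and sample rows are hypothetical; letting a later row with the same primary key replace an earlier one is a simplification used to keep primary keys unique.

```python
def build_table(rows, partition_key, clustering_columns=()):
    """Group rows into partitions, keyed inside each partition by clustering values."""
    table = {}
    for row in rows:
        pk = tuple(row[c] for c in partition_key)
        ck = tuple(row[c] for c in clustering_columns)
        # Primary key = (pk, ck); a repeated primary key replaces the earlier row.
        table.setdefault(pk, {})[ck] = row
    return table


def rows_in_partition(table, pk):
    """Return a partition's rows ordered by their clustering values."""
    return [table[pk][ck] for ck in sorted(table[pk])]


# No clustering columns: the primary key is just the partition key,
# so every partition holds exactly one row.
user_rows = [
    {"user_id": "u1", "name": "Alice"},
    {"user_id": "u2", "name": "Bob"},
]
single_row_table = build_table(user_rows, partition_key=["user_id"])

# With a clustering column, one partition can hold many rows.
reading_rows = [
    {"sensor": "s1", "ts": 2, "temp": 21.5},
    {"sensor": "s1", "ts": 1, "temp": 21.0},
    {"sensor": "s2", "ts": 1, "temp": 19.0},
]
multi_row_table = build_table(
    reading_rows, partition_key=["sensor"], clustering_columns=["ts"]
)

print(rows_in_partition(multi_row_table, ("s1",)))
```

In the first table each partition key maps to a single row, while in the second the partition for sensor `s1` holds two rows that are distinguished by the clustering column `ts`.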