Cassandra Query Language (CQL) v1.0.0 (UPDATED)


NOTE: CQL V2 reference is available here http://crlog.info/2011/09/17/cassandra-query-language-cql-v2-0-reference/

This is an update to my previous post documenting the Cassandra Query Language (CQL). A few changes have been made to CQL, the biggest being the addition of the INSERT keyword. Previously, the UPDATE statement would perform an insert if a value did not already exist; the INSERT statement now does this explicitly. BATCH and ALTER TABLE are also now included in the mix. See the official doc here: https://github.com/apache/cassandra/blob/trunk/doc/cql/CQL.textile.

If you’re new to NoSQL and Cassandra, you can read this gentle Introduction to NoSQL and Apache Cassandra.

  1. Cassandra Query Language (CQL) v1.0.0
    1. Table of Contents
    2. USE
    3. SELECT
      1. Specifying Columns
      2. Column Family
      3. Consistency Level
      4. Filtering rows
      5. Limits
    4. ALTER TABLE
    5. INSERT
    6. UPDATE
      1. Column Family
      2. Consistency Level
      3. Timestamp
      4. TTL
      5. Specifying Columns and Row
    7. DELETE
      1. Specifying Columns
      2. Column Family
      3. Consistency Level
    8. BATCH
    9. TRUNCATE
    10. CREATE KEYSPACE
    11. CREATE COLUMNFAMILY
      1. Column Family Options
    12. CREATE INDEX
    13. DROP INDEX
    14. DROP
    15. Common Idioms
  2. Versioning
  3. Changes

Cassandra Query Language (CQL) v1.0.0

Table of Contents

USE

Synopsis:

 USE <KEYSPACE>;

A USE statement consists of the USE keyword, followed by a valid keyspace name. Its purpose is to assign the per-connection, current working keyspace. All subsequent keyspace-specific actions will be performed in the context of the supplied value.
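
As a minimal illustration, assuming a keyspace named app_data already exists (the name is hypothetical):

```sql
-- Make app_data the working keyspace for this connection;
-- later statements on this connection run against it.
USE app_data;
```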

SELECT

Synopsis:

 SELECT [FIRST N] [REVERSED] <SELECT EXPR> FROM <COLUMN FAMILY> [USING <CONSISTENCY>] [WHERE <CLAUSE>] [LIMIT N];

A SELECT is used to read one or more records from a Cassandra column family. It returns a result-set of rows, where each row consists of a key and a collection of columns corresponding to the query.

Specifying Columns

 SELECT [FIRST N] [REVERSED] name1, name2, name3 FROM ...
 SELECT [FIRST N] [REVERSED] name1..nameN FROM ...

The SELECT expression determines which columns will appear in the results and takes the form of either a comma separated list of names, or a range. The range notation consists of a start and end column name separated by two periods (..). The set of columns returned for a range is start and end inclusive.

The FIRST option accepts an integer argument and can be used to apply a limit to the number of columns returned per row. When this limit is left unset it defaults to 10,000 columns.

The REVERSED option causes the sort order of the results to be reversed.

It is worth noting that unlike the projection in a SQL SELECT, there is no guarantee that the results will contain all of the columns specified. This is because Cassandra is schema-less and there are no guarantees that a given column exists.
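
The forms above might look like the following sketch. It assumes a hypothetical users column family with text column names; the column names and key are illustrative only.

```sql
-- A comma-separated list of column names. A row that lacks one of
-- these columns simply comes back without it.
SELECT name, password FROM users WHERE KEY = 'user2';

-- A range: every column from 'a' through 'f', both ends inclusive.
SELECT 'a'..'f' FROM users WHERE KEY = 'user2';

-- At most 5 columns per row, with the sort order reversed.
SELECT FIRST 5 REVERSED 'a'..'z' FROM users WHERE KEY = 'user2';
```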

Column Family

 SELECT ... FROM <COLUMN FAMILY> ...

The FROM clause is used to specify the Cassandra column family applicable to a SELECT query.

Consistency Level

 SELECT ... [USING <CONSISTENCY>] ...

Following the column family clause is an optional consistency level specification.

Filtering rows

 SELECT ... WHERE KEY = keyname AND name1 = value1
 SELECT ... WHERE KEY >= startkey AND KEY <= endkey AND name1 = value1
 SELECT ... WHERE KEY IN ('<key>', '<key>', '<key>', ...)

The WHERE clause provides for filtering the rows that appear in results. The clause can filter on a key name, or range of keys, and in the case of indexed columns, on column values. Key filters are specified using the KEY keyword, a relational operator (one of =, >, >=, <, and <=), and a term value. When terms appear on both sides of a relational operator, it is assumed the filter applies to an indexed column. With column index filters, the term on the left of the operator is the name and the term on the right is the value to filter on.

Note: The greater-than and less-than operators (> and <) result in key ranges that are inclusive of the terms. There is no supported notion of “strictly” greater-than or less-than; these operators are merely supported as aliases to >= and <=.
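
A short sketch of each filter form, assuming a hypothetical users column family in which the state column has a secondary index:

```sql
-- Single row by key.
SELECT name FROM users WHERE KEY = 'user2';

-- Key range; both bounds are inclusive. Writing > and < here would
-- behave the same as >= and <=, as noted above.
SELECT name FROM users WHERE KEY >= 'user2' AND KEY <= 'user4';

-- Explicit list of keys.
SELECT name FROM users WHERE KEY IN ('user2', 'user3', 'user4');

-- Filter on an indexed column: name on the left, value on the right.
SELECT name FROM users WHERE state = 'TX';
```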

Limits

 SELECT ... WHERE <CLAUSE> [LIMIT N] ...

Limiting the number of rows returned can be achieved by adding the LIMIT option to a SELECT expression. LIMIT defaults to 10,000 when left unset.
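
Putting the optional clauses together in the order given by the synopsis, a query might look like this sketch (the users column family and key values are hypothetical):

```sql
-- Read at QUORUM, restrict to a key range, and cap the result at
-- 100 rows instead of the default 10,000.
SELECT name, password FROM users USING CONSISTENCY QUORUM
  WHERE KEY >= 'user2' AND KEY <= 'user9' LIMIT 100;
```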

ALTER TABLE

Synopsis:

 ALTER TABLE <COLUMN FAMILY> ADD <COLUMN> <TYPE>;
 ALTER TABLE <COLUMN FAMILY> ALTER <COLUMN> TYPE <TYPE>;
 ALTER TABLE <COLUMN FAMILY> DROP <COLUMN>;

An ALTER statement is used to manipulate the columns of a column family. It allows you to add new columns, and to alter and drop existing columns. No results are returned.
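
The three operations might be written as in the sketch below. It assumes the statement forms ADD column type, ALTER column TYPE type, and DROP column, and uses a hypothetical users column family with an email column.

```sql
-- Add a new typed column.
ALTER TABLE users ADD email text;

-- Change the type used to validate an existing column.
ALTER TABLE users ALTER email TYPE ascii;

-- Drop the column again.
ALTER TABLE users DROP email;
```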

INSERT

Synopsis:

 INSERT INTO <COLUMN FAMILY> (KEY, <col>, <col>, ...) VALUES (<key>, <val>, <val>, ...) [USING CONSISTENCY <LEVEL> [AND TIMESTAMP <timestamp>] [AND TTL <timeToLive>]];

An INSERT is used to write one or more columns to a record in a Cassandra column family. No results are returned.

INSERT works exactly like UPDATE, so for information about the Column Family and Consistency Level arguments please see the UPDATE section.
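
For illustration, a sketch of an INSERT against a hypothetical users column family; the TTL value is an arbitrary example:

```sql
-- Write two columns to the row keyed 'user2' at QUORUM,
-- expiring them after 86400 seconds (one day).
INSERT INTO users (KEY, password, name)
  VALUES ('user2', 'ch@ngem3b', 'second user')
  USING CONSISTENCY QUORUM AND TTL 86400;
```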

UPDATE

Synopsis:

 UPDATE <COLUMN FAMILY> [USING <CONSISTENCY> [AND TIMESTAMP <timestamp>] [AND TTL <timeToLive>]] SET name1 = value1, name2 = value2 WHERE KEY = keyname;

An UPDATE is used to write one or more columns to a record in a Cassandra column family. No results are returned.

Column Family

 UPDATE <COLUMN FAMILY> ...

Statements begin with the UPDATE keyword followed by a Cassandra column family name.

Consistency Level

 UPDATE ... [USING <CONSISTENCY>] ...

Following the column family identifier is an optional consistency level specification.

Timestamp

 UPDATE ... [USING TIMESTAMP <timestamp>] ...

UPDATE supports an optional client-supplied timestamp for the modification.

TTL

 UPDATE ... [USING TTL <timeToLive>] ...

UPDATE supports setting a time to live (TTL) for each of the columns in the UPDATE statement.

Specifying Columns and Row

 UPDATE ... SET name1 = value1, name2 = value2 WHERE KEY = keyname;
 UPDATE ... SET name1 = value1, name2 = value2 WHERE KEY IN ('<key>', '<key>', ...);

Rows are created or updated by supplying column names and values in term assignment format. Multiple columns can be set by separating the name/value pairs using commas. Each UPDATE statement requires a WHERE clause that uses the KEY keyword to name either a single key or, with the IN notation, a list of keys.
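
A sketch combining the optional clauses, assuming a hypothetical users column family; the timestamp and TTL values are arbitrary placeholders:

```sql
-- Set two columns on one row, with an explicit consistency level,
-- a client-supplied timestamp, and a one-hour TTL.
UPDATE users USING CONSISTENCY QUORUM AND TIMESTAMP 1318452291034 AND TTL 3600
  SET password = 'ps22dhds', name = 'second user'
  WHERE KEY = 'user2';

-- Apply the same assignment to several rows at once.
UPDATE users SET password = 'reset' WHERE KEY IN ('user3', 'user4');
```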

DELETE

Synopsis:

 DELETE [COLUMNS] FROM <COLUMN FAMILY> [USING <CONSISTENCY>] WHERE KEY = keyname1;
 DELETE [COLUMNS] FROM <COLUMN FAMILY> [USING <CONSISTENCY>] WHERE KEY IN (keyname1, keyname2);

A DELETE is used to perform the removal of one or more columns from one or more rows.

Specifying Columns

 DELETE [COLUMNS] ...

Following the DELETE keyword is an optional comma-delimited list of column name terms. When no column names are specified, the remove applies to the entire row(s) matched by the WHERE clause.

Column Family

 DELETE ... FROM <COLUMN FAMILY> ...

The column family name follows the list of column names.

Consistency Level

 DELETE ... [USING <CONSISTENCY>] ...

Following the column family identifier is an optional consistency level specification.

Specifying Rows

 DELETE ... WHERE KEY = keyname1
 DELETE ... WHERE KEY IN (keyname1, keyname2)

The WHERE clause is used to determine which row(s) a DELETE applies to. The first form allows the specification of a single keyname using the KEY keyword and the = operator. The second form allows a list of keyname terms to be specified using the IN notation and a parenthesized list of comma-delimited keyname terms.
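
The variations above could be sketched as follows, again using a hypothetical users column family:

```sql
-- Remove one column from one row.
DELETE name FROM users WHERE KEY = 'user2';

-- Remove two columns from several rows at QUORUM.
DELETE name, password FROM users USING CONSISTENCY QUORUM
  WHERE KEY IN ('user3', 'user4');

-- No column list: the whole row is removed.
DELETE FROM users WHERE KEY = 'user2';
```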

BATCH

Synopsis:

 BEGIN BATCH [USING CONSISTENCY <LEVEL> [AND TIMESTAMP <timestamp>]]
   INSERT or UPDATE or DELETE statements separated by semicolon or "end of line"
 APPLY BATCH

BATCH supports an optional client-supplied global timestamp, which will be used for each of the operations included in the batch.

A single consistency level is used for the entire batch. It appears after the BEGIN BATCH statement and uses the standard consistency level specification. Batches default to CONSISTENCY ONE when left unspecified.

NOTE: While there are no isolation guarantees, UPDATE queries are atomic within a given record.

Example:

 BEGIN BATCH USING CONSISTENCY QUORUM
   INSERT INTO users (KEY, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')
   UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2'
   INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c')
   DELETE name FROM users WHERE key = 'user2'
   INSERT INTO users (KEY, password, name) VALUES ('user4', 'ch@ngem3c', 'Andrew')
 APPLY BATCH

TRUNCATE

Synopsis:

 TRUNCATE <COLUMN FAMILY>

Accepts a single argument for the column family name, and permanently removes all data from said column family.

CREATE KEYSPACE

Synopsis:

 CREATE KEYSPACE <NAME> WITH strategy_class = <STRATEGY> AND strategy_options:<OPTION> = <VALUE> [AND strategy_options:<OPTION> = <VALUE>];

The CREATE KEYSPACE statement creates a new top-level namespace (aka “keyspace”). Valid names are strings of alphanumeric characters and underscores that begin with a letter. Properties such as replication strategy and count are specified during creation using the following accepted keyword arguments:

keyword           required  description
strategy_class    yes       The name of the replication strategy class to use for the new keyspace, for example SimpleStrategy or NetworkTopologyStrategy.
strategy_options  no        Most strategies require additional arguments which can be supplied by appending the option name to the strategy_options keyword, separated by a colon (:). For example, a strategy option of “DC1” with a value of “1” would be specified as strategy_options:DC1 = 1; replication_factor for SimpleStrategy could be strategy_options:replication_factor=3.
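
As a sketch of the colon notation described above, the following creates two keyspaces. The keyspace names and the data center names DC1 and DC2 are hypothetical; SimpleStrategy and NetworkTopologyStrategy are the strategy classes assumed here.

```sql
-- Single data center: three replicas of every row.
CREATE KEYSPACE app_data
  WITH strategy_class = 'SimpleStrategy'
  AND strategy_options:replication_factor = 3;

-- Multiple data centers: one replica in DC1 and two in DC2.
CREATE KEYSPACE app_data_multi
  WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:DC1 = 1
  AND strategy_options:DC2 = 2;
```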

CREATE COLUMNFAMILY

Synopsis:

 CREATE COLUMNFAMILY <COLUMN FAMILY> (KEY <type> PRIMARY KEY [, name1 type, name2 type, ...]);
 CREATE COLUMNFAMILY <COLUMN FAMILY> (KEY <type> PRIMARY KEY [, name1 type, name2 type, ...]) [WITH keyword1 = arg1 [AND keyword2 = arg2 [AND ...]]];

CREATE COLUMNFAMILY statements create new column family namespaces under the current keyspace. Valid column family names are strings of alphanumeric characters and underscores, which begin with a letter.

Specifying Key Type

 CREATE ... (KEY <type> PRIMARY KEY) ...

When creating a new column family, you must specify the key type. The list of possible key types is identical to that of column comparators/validators (see Specifying Column Type). It’s important to note that the key type must be compatible with the partitioner in use; for example, OrderPreservingPartitioner and CollatingOrderPreservingPartitioner both require UTF-8 keys.

Specifying Column Type (optional)

 CREATE ... (KEY <type> PRIMARY KEY, name1 type, name2 type) ...

It is possible to assign columns a type during column family creation. Columns configured with a type are validated accordingly when a write occurs. Column types are specified as a parenthesized, comma-separated list of column term and type pairs. The recognized types are:

type     description
bytea    Arbitrary bytes (no validation)
ascii    ASCII character string
text     UTF8 encoded string
varchar  UTF8 encoded string
uuid     Type 1, or type 4 UUID
varint   Arbitrary-precision integer
int      8-byte long (same as bigint)
bigint   8-byte long

Note: In addition to the recognized types listed above, it is also possible to supply a string containing the name of a class (a sub-class of AbstractType), either fully qualified, or relative to the org.apache.cassandra.db.marshal package.

Column Family Options (optional)

 CREATE COLUMNFAMILY ... WITH keyword1 = arg1 AND keyword2 = arg2;

A number of optional keyword arguments can be supplied to control the configuration of a new column family.

keyword                           default  description
comparator                        text     Determines sorting and validation of column names. Valid values are identical to the types listed in Specifying Column Type above.
comment                           none     A free-form, human-readable comment.
row_cache_size                    0        Number of rows whose entire contents to cache in memory.
key_cache_size                    200000   Number of keys per SSTable whose locations are kept in memory in “mostly LRU” order.
read_repair_chance                1.0      The probability with which read repairs should be invoked on non-quorum reads.
gc_grace_seconds                  864000   Time to wait before garbage collecting tombstones (deletion markers).
default_validation                text     Determines validation of column values. Valid values are identical to the types listed in Specifying Column Type above.
min_compaction_threshold          4        Minimum number of SSTables needed to start a minor compaction.
max_compaction_threshold          32       Maximum number of SSTables allowed before a minor compaction is forced.
row_cache_save_period_in_seconds  0        Number of seconds between saving row caches.
key_cache_save_period_in_seconds  14400    Number of seconds between saving key caches.
memtable_flush_after_mins         60       Maximum time to leave a dirty table unflushed.
memtable_throughput_in_mb         dynamic  Maximum size of the memtable before it is flushed.
memtable_operations_in_millions   dynamic  Number of operations in millions before the memtable is flushed.
replicate_on_write                false
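
Bringing the key type, column types, and options together, a definition might look like the sketch below. The users column family, its columns, and the option values are all illustrative assumptions.

```sql
-- Text row keys, three typed columns, and a few options.
-- Columns not listed here fall back to default_validation.
CREATE COLUMNFAMILY users (
    KEY text PRIMARY KEY,
    name text,
    password text,
    visits bigint
) WITH comparator = text
  AND default_validation = text
  AND comment = 'Registered users'
  AND gc_grace_seconds = 864000;
```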

CREATE INDEX

Synopsis:

 CREATE INDEX [index_name] ON <column_family> (column_name);

A CREATE INDEX statement is used to create a new, automatic secondary index for the named column.
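
A brief sketch, assuming a hypothetical users column family with a state column; the index name is also made up:

```sql
-- Index the state column so it can be used in a WHERE clause.
CREATE INDEX users_state_idx ON users (state);

-- With the index in place, rows can be filtered by column value.
SELECT name FROM users WHERE state = 'TX';
```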

DROP INDEX

Synopsis:

 DROP INDEX <INDEX_NAME>

A DROP INDEX statement is used to drop an existing secondary index.

A DROP INDEX statement searches all column families in the current keyspace for the specified index and deletes it if found.

DROP

Synopsis:

 DROP <KEYSPACE|COLUMNFAMILY> namespace;

DROP statements result in the immediate, irreversible removal of keyspace and column family namespaces.
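
For example, with hypothetical names (and bearing in mind that neither statement can be undone):

```sql
-- Remove one column family from the current keyspace.
DROP COLUMNFAMILY users;

-- Remove an entire keyspace and everything in it.
DROP KEYSPACE app_data;
```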

Common Idioms

Specifying Consistency

 ... USING <CONSISTENCY> ...

Consistency level specifications are made up of the keyword USING, followed by a consistency level identifier. Valid consistency levels are as follows:

  • CONSISTENCY ANY
  • CONSISTENCY ONE (default)
  • CONSISTENCY QUORUM
  • CONSISTENCY ALL
  • CONSISTENCY LOCAL_QUORUM
  • CONSISTENCY EACH_QUORUM

Term specification

Terms are used in statements to specify things such as keyspaces, column families, indexes, column names and values, and keyword arguments. The rules governing term specification are as follows:

  • Any single quoted string literal (example: 'apple').
  • Unquoted alpha-numeric strings that begin with a letter (example: carrot).
  • Unquoted numeric literals (example: 100).
  • UUID strings in hyphen-delimited hex notation (example: 1438fc5c-4ff6-11e0-b97f-0026c650d722).

Terms which do not conform to these rules result in an exception.
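
To make the four term forms concrete, here is a small sketch. The fruit and sessions column families and their columns are hypothetical, and how each value is interpreted still depends on the configured types.

```sql
-- 'apple' is a quoted string literal, red is an unquoted
-- alpha-numeric string, and 100 is an unquoted numeric literal.
INSERT INTO fruit (KEY, colour, stock) VALUES ('apple', red, 100);

-- A UUID term in hyphen-delimited hex notation used as a row key.
SELECT name FROM sessions WHERE KEY = 1438fc5c-4ff6-11e0-b97f-0026c650d722;
```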

How column name/value terms are interpreted is determined by the configured type.

type            term
ascii           Any string which can be decoded using ASCII charset
text / varchar  Any string which can be decoded using UTF8 charset
uuid            Standard UUID string format (hyphen-delimited hex notation)
uuid            The string now, to represent a type-1 (time-based) UUID with a date-time component based on the current time
uuid            Numeric value representing milliseconds since epoch
uuid            An iso8601 timestamp
int             Integer value capable of fitting in 8 bytes (same as bigint)
bigint          Integer value capable of fitting in 8 bytes
varint          Integer value of arbitrary size
bytea           Hex-encoded strings (converted directly to the corresponding bytes)

Versioning

Versioning of the CQL language adheres to the Semantic Versioning guidelines. Versions take the form X.Y.Z where X, Y, and Z are integer values representing major, minor, and patch level respectively. There is no correlation between Cassandra release versions and the CQL language version.

version  description
Patch    The patch version is incremented when bugs are fixed.
Minor    Minor version increments occur when new, but backward compatible, functionality is introduced.
Major    The major version must be bumped when backward incompatible changes are introduced. This should rarely (if ever) occur.

Changes

Tue, 22 Mar 2011 18:10:28 -0700 - Eric Evans <eevans@rackspace.com>
 * Initial version, 1.0.0

9 Responses to Cassandra Query Language (CQL) v1.0.0 (UPDATED)

  8. nish gowda says:

    Does Order by work in CQL?

    • Courtney says:

      No. Cassandra’s data is sorted when written; see http://wiki.apache.org/cassandra/DataModel#Columns
      You can, to some extent, use ordering on a slice range (http://wiki.apache.org/cassandra/API#SliceRange), but you should aim to model your data to avoid the need for sorting as much as you can.
