Understanding How CQL3 Maps to Cassandra’s Internal Data Structure: Sets, Lists, and Maps

July 24, 2013 John Berryman
Category: Uncategorized

Refer to my previous post for more in depth details into what Im doing here. This is just a quick reference regarding how Cassandra sets, lists, and maps work under the hood:

Maps

Defining:

cqlsh:test> CREATE TABLE phonelists (        ... user text PRIMARY KEY,        ... phoneNumbers map );

Inserting:

cqlsh:test> INSERT INTO phonelists (user, phonenumbers)        ... VALUES ('john',{'patricia':'555-4326','doug':'555-1579'});cqlsh:test> INSERT INTO phonelists (user, phonenumbers)        ... VALUES ('scott',{'bill':'555-7382','patricia':'555-4326', 'jane':'555-8743'});

Retrieving:

cqlsh:test> SELECT * FROM phonelists; user  | phonenumbers-------+------------------------------------------------------ scott | {bill: 555-7382, jane: 555-8743, patricia: 555-4326}  john |                 {doug: 555-1579, patricia: 555-4326}

Updating:

cqlsh:test> UPDATE phonelists        ... SET phonenumbers = phonenumbers + {'daniel':'555-0453'}        ... WHERE user='john';

The internal representation:

[default@test] list phonelists;-------------------RowKey: scott=> (column=, value=, timestamp=1374684062860000)=> (column=phonenumbers:bill, value='555-7382', timestamp=1374684062860000)=> (column=phonenumbers:jane, value='555-8743', timestamp=1374684062860000)=> (column=phonenumbers:patricia, value='555-4326', timestamp=1374684062860000)-------------------RowKey: john=> (column=, value=, timestamp=1374683971220000)=> (column=phonenumbers:doug, value='555-1579', timestamp=1374683971220000)=> (column=phonenumbers:patricia, value='555-4326', timestamp=1374683971220000)

Note that the above text components, e.g. doug and 555-1579′, are returned by cassandra-cli in their hex encoding. Ive taken the liberty to decode them so that the print out is more understandable. I will do this throughout the post.

Lists

Defining:

cqlsh:test> CREATE TABLE friendlists (        ... user text PRIMARY KEY,        ... friends list         ... );

Inserting:

cqlsh:test> INSERT INTO friendlists (user, friends)        ... VALUES ('john',['doug','patricia','scott']);cqlsh:test> INSERT INTO friendlists (user, friends)        ... VALUES ('patricia', ['john','lucifer']);

Retrieving:

cqlsh:test> SELECT * FROM friendlists; user     | friends----------+-------------------------     john | [doug, patricia, scott] patricia |         [john, lucifer]

Updating:

cqlsh:test> UPDATE friendlists        ... SET friends = friends + ['matt','eric']        ... WHERE user='john';cqlsh:test> UPDATE friendlists        ... SET friends = friends - ['lucifer']        ... WHERE user='patricia';

The internal representation:

[default@test] list friendlists;Using default limit of 100Using default column limit of 100-------------------RowKey: john=> (column=, value=, timestamp=1374687324950000)=> (column=friends:26017c10f48711e2801fdf9895e5d0f8, value='doug', timestamp=1374687206993000)=> (column=friends:26017c11f48711e2801fdf9895e5d0f8, value='patricia', timestamp=1374687206993000)=> (column=friends:26017c12f48711e2801fdf9895e5d0f8, value='scott', timestamp=1374687206993000)=> (column=friends:6c504b60f48711e2801fdf9895e5d0f8, value='matt', timestamp=1374687324950000)=> (column=friends:6c504b61f48711e2801fdf9895e5d0f8, value='eric', timestamp=1374687324950000)-------------------RowKey: patricia=> (column=, value=, timestamp=1374687352290000)=> (column=friends:3b817b80f48711e2801fdf9895e5d0f8, value='john', timestamp=1374687243064000)

Here the internal column name is more complicated because a UUID is appended to the name of the CQL field “friend”. This is used to keep track of the order of items in the list.

To be determined: Does a list item delete take more time than a list insert? I suspect so – I dont see how Cassandra can delete an element in the list without reading in all the elements in the list and then deleting column that has the value indicated in the delete. – UPDATE This is the case.

Set

Defining:

cqlsh:test> CREATE TABLE friendsets (        ... user text PRIMARY KEY,        ... friends set         ... );

Inserting:

cqlsh:test> INSERT INTO friendsets (user, friends)        ... VALUES ('john',{'doug','patricia','scott'});cqlsh:test> INSERT INTO friendsets (user, friends)        ... VALUES ('patricia', {'john','lucifer'});

Retrieving:

cqlsh:test> SELECT * FROM friendsets; user     | friends----------+-------------------------     john | {doug, patricia, scott} patricia |         {john, lucifer}

Updating:

cqlsh:test> UPDATE friendsets        ... SET friends = friends + {'matt','eric'}        ... WHERE user='john';cqlsh:test> UPDATE friendsets        ... SET friends = friends - {'lucifer'}        ... WHERE user='patricia';

The internal representation:

[default@test] list friendsets;Using default limit of 100Using default column limit of 100-------------------RowKey: john=> (column=, value=, timestamp=1374688135443000)=> (column=friends:'doug', value=, timestamp=1374688108307000)=> (column=friends:'eric', value=, timestamp=1374688135443000)=> (column=friends:'matt', value=, timestamp=1374688135443000)=> (column=friends:'patricia', value=, timestamp=1374688108307000)=> (column=friends:'scott', value=, timestamp=1374688108307000)-------------------RowKey: patricia=> (column=, value=, timestamp=1374688151386000)=> (column=friends:'john', value=, timestamp=1374688116595000)