Apache Cassandra Learning Step by Step (3): Samples ABC

====16 Feb 2012, by Bright Zheng (IT进行时)====

4. Samples ABC

We learn Cassandra step by step, covering its concepts and Java API usage through:

1. Concept Introduction

2. CLI

3. Java Sample Code

4.1. Get a Single Column by a Key

4.1.1. Sample Code

// Reads the single column "city" from row "512204" of the Npanxx column family.
public QueryResult<HColumn<String,String>> execute() {
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily("Npanxx");
    columnQuery.setKey("512204");
    columnQuery.setName("city");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    return result;
}

4.1.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---

HColumn(city=Austin)

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.1.3. CLI

[default@Tutorial] get Npanxx['512204']['city'];

=> (column=city, value=Austin, timestamp=1329234388328000)

Elapsed time: 16 msec(s).
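The timestamp in the CLI output is easier to read once converted. The sketch below assumes the value is microseconds since the Unix epoch, which is the usual convention for the CLI and for Hector's default clock; the class and method names are made up for illustration.

```java
import java.time.Instant;

class ColumnTimestampSketch {
    // Converts a column timestamp in microseconds since the epoch to an Instant.
    static Instant fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long microsOfSecond = Math.floorMod(micros, 1_000_000L);
        return Instant.ofEpochSecond(seconds, microsOfSecond * 1_000L);
    }

    public static void main(String[] args) {
        // Timestamp copied from the CLI output above.
        System.out.println(fromMicros(1329234388328000L)); // 2012-02-14T15:46:28.328Z
    }
}
```

The result falls two days before this article's date, which is consistent with when the sample data was loaded.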

4.2. Get multiple columns by a Key

4.2.1. Sample Code

// Reads several columns from row "TX Austin" of the StateCity column family.
public QueryResult<ColumnSlice<Long,String>> execute() {
    SliceQuery<String, Long, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);
    sliceQuery.setColumnFamily("StateCity");
    sliceQuery.setKey("TX Austin");
    // way 1: name the columns explicitly
    sliceQuery.setColumnNames(202L, 203L, 204L);
    // way 2: use setRange(start, finish, reversed, count)
    // set 'reversed' to true (and swap start and finish) to get the columns in reverse order
    //sliceQuery.setRange(202L, 204L, false, 5);
    QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();
    return result;
}

4.2.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---

ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS
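To make the setRange(start, finish, reversed, count) arguments concrete, here is a minimal sketch that mimics the slice semantics on an in-memory sorted map. It does not use Hector or Cassandra; SliceSketch, sampleRow and slice are hypothetical names, and the row content is copied from the StateCity data shown in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class SliceSketch {
    // The "TX Austin" row of StateCity, with column names sorted as longs.
    static NavigableMap<Long, String> sampleRow() {
        NavigableMap<Long, String> row = new TreeMap<>();
        row.put(202L, "30.27x097.74");
        row.put(203L, "30.27x097.74");
        row.put(204L, "30.32x097.73");
        row.put(205L, "30.32x097.73");
        row.put(206L, "30.32x097.73");
        return row;
    }

    // Returns up to 'count' column names between start and finish (both inclusive).
    // When reversed is true, start is the upper bound and names come back in descending order.
    static <N extends Comparable<N>> List<N> slice(
            NavigableMap<N, String> row, N start, N finish, boolean reversed, int count) {
        NavigableMap<N, String> range = reversed
                ? row.subMap(finish, true, start, true).descendingMap()
                : row.subMap(start, true, finish, true);
        List<N> names = new ArrayList<>();
        for (N name : range.keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(slice(sampleRow(), 202L, 204L, false, 5)); // [202, 203, 204]
        System.out.println(slice(sampleRow(), 206L, 204L, true, 2));  // [206, 205]
    }
}
```

The first call matches the three columns in the Maven output above: the count of 5 is only an upper limit, and the range itself holds three columns.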

4.2.3. CLI(TODO)

TODO: According to the CLI syntax, it seems Cassandra can't get multiple named columns with a single 'get' command?

4.3. Get multiple rows by a set of Key

4.3.1. Sample Code

// Reads the same four columns from several rows of Npanxx in one call.
public QueryResult<Rows<String,String,String>> execute() {
    MultigetSliceQuery<String, String, String> multigetSlicesQuery =
        HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    multigetSlicesQuery.setColumnFamily("Npanxx");
    multigetSlicesQuery.setColumnNames("city","state","lat","lng");
    multigetSlicesQuery.setKeys("512202","512203","512205","512206");
    QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();
    return results;
}

4.3.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),

512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.3.3. CLI(TODO)

TODO: N/A?

4.4. Get Slices from a Range of Rows by Key

4.4.1. Sample Code

GetRangeSlicesForStateCity.java

// Reads up to 5 rows from Npanxx, starting at key "512202" and ending at key "512205".
public QueryResult<OrderedRows<String,String,String>> execute() {
    RangeSlicesQuery<String, String, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    rangeSlicesQuery.setColumnFamily("Npanxx");
    rangeSlicesQuery.setColumnNames("city","state","lat","lng");
    rangeSlicesQuery.setKeys("512202", "512205");
    rangeSlicesQuery.setRowCount(5);
    QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();
    return results;
}

Important Note: The result is NOT what you might expect (the four rows 512202 through 512205). With the RandomPartitioner, rows are ordered by the token (an MD5 hash) of the key rather than by the key itself, so a key range follows token order, not key order. The partitioner is configured in conf/cassandra.yaml, but changing it is not recommended. The actual result is shown under "Sample Code run by Maven".
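The effect of hashing on row order can be sketched without a cluster. The code below assumes a token is the absolute value of the key's MD5 digest read as a big integer, which approximates what the RandomPartitioner does; TokenOrderSketch and its methods are hypothetical names, and the printed order is not claimed to match a real node.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TokenOrderSketch {
    // Approximates a RandomPartitioner token: abs(MD5(key)) as a BigInteger.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Sorts keys the way a token-ordered range scan would visit them.
    static List<String> sortByToken(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(TokenOrderSketch::token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("512202", "512203", "512204", "512205", "512206");
        for (String key : sortByToken(keys)) {
            System.out.println(key + " -> " + token(key));
        }
    }
}
```

Because neighbouring keys hash to unrelated tokens, a range from one key to another selects whatever happens to sit between their two tokens, which is why the sample returns a surprising set of rows.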

4.4.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({

512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))

})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.4.3. CLI(TODO)

TODO: N/A

4.5. Get Slices from a Range of Rows by Columns

4.5.1. Sample Code

GetSliceForAreaCodeCity.java

// Reads a range of columns from row "512" of the AreaCode column family.
public QueryResult<ColumnSlice<String,String>> execute() {
    SliceQuery<String, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("AreaCode");
    sliceQuery.setKey("512");
    // gets up to 5 columns "between" Austin and Austin__204 according to the comparator
    // set the 'reversed' argument to true (and swap start and finish) to read in descending order
    sliceQuery.setRange("Austin", "Austin__204", false, 5);
    QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
    return result;
}

4.5.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

ColumnSlice([

HColumn(Austin__202=30.27x097.74),

HColumn(Austin__203=30.27x097.74),

HColumn(Austin__204=30.32x097.73)

])

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.5.3. CLI

N/A

4.6. Get Slices from Indexed Columns

4.6.1. Sample Code

GetIndexedSlicesForCityState.java

// Finds Npanxx rows through the secondary index: state = TX, city = Austin, lat >= 30.30.
public QueryResult<OrderedRows<String, String, String>> execute() {
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    indexedSlicesQuery.setColumnFamily("Npanxx");
    indexedSlicesQuery.setColumnNames("city","lat","lng");
    indexedSlicesQuery.addEqualsExpression("state", "TX");
    indexedSlicesQuery.addEqualsExpression("city", "Austin");
    indexedSlicesQuery.addGteExpression("lat", "30.30");
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    return result;
}

4.6.2. Sample Code run by Maven

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({512204=Row(

512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.6.3. CLI

[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';

-------------------

RowKey: 512204

=> (column=city, value=Austin, timestamp=1329299521508000)

=> (column=lat, value=30.32, timestamp=1329299521540000)

=> (column=lng, value=097.73, timestamp=1329299521555000)

=> (column=state, value=TX, timestamp=1329299521524000)

-------------------

RowKey: 512206

=> (column=city, value=Austin, timestamp=1329299521618000)

=> (column=lat, value=30.32, timestamp=1329299521633000)

=> (column=lng, value=097.73, timestamp=1329299522491000)

=> (column=state, value=TX, timestamp=1329299521618000)

-------------------

RowKey: 512205

=> (column=city, value=Austin, timestamp=1329299521555000)

=> (column=lat, value=30.32, timestamp=1329299521586000)

=> (column=lng, value=097.73, timestamp=1329299521602000)

=> (column=state, value=TX, timestamp=1329299521571000)

3 Rows Returned.

Elapsed time: 16 msec(s).
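Note that the lat values in this sample are stored as strings, so a "greater than or equal" expression compares text rather than numbers. The sketch below reproduces the filter on in-memory data under that assumption; IndexedFilterSketch and its methods are hypothetical, and the rows are the Austin rows shown in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexedFilterSketch {
    // Builds one Npanxx-style row; every value is a plain string.
    static Map<String, String> row(String city, String state, String lat, String lng) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("city", city);
        columns.put("state", state);
        columns.put("lat", lat);
        columns.put("lng", lng);
        return columns;
    }

    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("512202", row("Austin", "TX", "30.27", "097.74"));
        rows.put("512203", row("Austin", "TX", "30.27", "097.74"));
        rows.put("512204", row("Austin", "TX", "30.32", "097.73"));
        rows.put("512205", row("Austin", "TX", "30.32", "097.73"));
        rows.put("512206", row("Austin", "TX", "30.32", "097.73"));
        return rows;
    }

    // Keeps rows where state = TX, city = Austin and lat >= minLat, comparing lat as text.
    static List<String> austinRowsWithLatAtLeast(Map<String, Map<String, String>> rows, String minLat) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : rows.entrySet()) {
            Map<String, String> columns = entry.getValue();
            if ("TX".equals(columns.get("state"))
                    && "Austin".equals(columns.get("city"))
                    && columns.get("lat").compareTo(minLat) >= 0) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(austinRowsWithLatAtLeast(sampleRows(), "30.30")); // [512204, 512205, 512206]
        // Text comparison pitfall: "9.50" sorts after "30.30" because '9' > '3'.
        System.out.println("9.50".compareTo("30.30") > 0); // true
    }
}
```

This is probably why the sample data zero-pads values such as 097.73: text comparison only agrees with numeric order when the strings have the same width.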

4.7. Insertion

4.7.1. Sample Code

InsertRowsForColumnFamilies.java

// Queues insertions for three column families and sends them in one batch.
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
    mutator.addInsertion("CA Burlingame", "StateCity", HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
    mutator.addInsertion("650", "AreaCode", HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));
    MutationResult mr = mutator.execute();
    return null;
}

4.7.2. Sample Code run by Maven

Omitted

4.7.3. CLI

[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';

[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';

[default@Tutorial] set Npanxx['650222']['lat']='37.57';

…
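In the Java sample, nothing is written until execute() is called: addInsertion only queues a mutation. A minimal in-memory sketch of that batching idea follows. It is not Hector's implementation; BatchSketch and its members are hypothetical, and column names are simplified to strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BatchSketch {
    // column family -> row key -> column name -> value
    final Map<String, Map<String, Map<String, String>>> store = new TreeMap<>();
    // mutations queued by addInsertion and not yet applied
    private final List<String[]> pending = new ArrayList<>();

    // Queues one column insertion; the store is untouched until execute().
    BatchSketch addInsertion(String key, String columnFamily, String name, String value) {
        pending.add(new String[] {key, columnFamily, name, value});
        return this;
    }

    // Applies all queued insertions and returns how many were applied.
    int execute() {
        int applied = pending.size();
        for (String[] m : pending) {
            store.computeIfAbsent(m[1], cf -> new TreeMap<>())
                 .computeIfAbsent(m[0], row -> new TreeMap<>())
                 .put(m[2], m[3]);
        }
        pending.clear();
        return applied;
    }

    public static void main(String[] args) {
        BatchSketch batch = new BatchSketch();
        batch.addInsertion("650", "AreaCode", "Burlingame__650", "37.57x122.34");
        batch.addInsertion("650222", "Npanxx", "city", "Burlingame");
        batch.addInsertion("650222", "Npanxx", "state", "CA");
        System.out.println(batch.store.isEmpty()); // true: nothing applied yet
        System.out.println(batch.execute());       // 3
        System.out.println(batch.store);
    }
}
```

The CLI, by contrast, sends each 'set' statement on its own, which is why the CLI version needs one command per column.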

4.8. Deletion

4.8.1. Sample Code

InsertRowsForColumnFamilies.java

// Deletes whole rows from three column families in one batch.
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
    // Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)
    // a null columnName means: delete the whole row
    mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
    mutator.addDeletion("650", "AreaCode", null, stringSerializer);
    mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
    // deleting a non-existent key like the following still inserts a tombstone
    // mutator.addDeletion("652", "AreaCode", null, stringSerializer);
    MutationResult mr = mutator.execute();
    return null;
}

4.8.2. Sample Code run by Maven

Omitted…

4.8.3. CLI

[default@Tutorial] del StateCity['CA Burlingame'];

[default@Tutorial] del AreaCode['650'];

[default@Tutorial] del Npanxx['650222'];

Important Note: Whichever you use, Java code or the CLI, a deletion still leaves the deleted row key in place, marked as a tombstone (hehe, a really good name), and the key can still be retrieved with the 'list' command, like this:

[default@Tutorial] list StateCity;

Using default limit of 100

-------------------

RowKey: CA Burlingame

-------------------

RowKey: TX Austin

=> (column=202, value=30.27x097.74, timestamp=1329297768323000)

=> (column=203, value=30.27x097.74, timestamp=1329297768338000)

=> (column=204, value=30.32x097.73, timestamp=1329297768354000)

=> (column=205, value=30.32x097.73, timestamp=1329297768370000)

=> (column=206, value=30.32x097.73, timestamp=1329297768385000)

2 Rows Returned.

Elapsed time: 16 msec(s).

As you can see, two rows are returned, even though the row 'CA Burlingame' has been deleted.

Even worse, deleting a non-existent key causes the 'insertion of a tombstone', which adds one more row to the column family!!!

Fortunately, the 'get' command no longer retrieves it.

[default@Tutorial] get StateCity['CA Burlingame'];

Returned 0 results.

Elapsed time: 0 msec(s).
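The difference between 'list' and 'get' can be modelled in a few lines. The sketch below is a toy in-memory model, not Cassandra's storage engine: an empty column map stands in for a row tombstone, timestamps are ignored, and TombstoneSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TombstoneSketch {
    // row key -> live columns; an empty map stands in for a row tombstone
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    void insert(String key, String name, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(name, value);
    }

    // Deleting writes a marker instead of removing the key, even if the key never existed.
    void delete(String key) {
        rows.put(key, new TreeMap<>());
    }

    // Like the CLI 'list': a scan that reports every row key, tombstoned or not.
    List<String> list() {
        return new ArrayList<>(rows.keySet());
    }

    // Like the CLI 'get': only live columns come back, so a tombstoned row looks empty.
    Map<String, String> get(String key) {
        return rows.getOrDefault(key, new TreeMap<>());
    }

    public static void main(String[] args) {
        TombstoneSketch stateCity = new TombstoneSketch();
        stateCity.insert("TX Austin", "202", "30.27x097.74");
        stateCity.insert("CA Burlingame", "650", "37.57x122.34");
        stateCity.delete("CA Burlingame");
        stateCity.delete("abc"); // never existed, still leaves a marker
        System.out.println(stateCity.list());               // [CA Burlingame, TX Austin, abc]
        System.out.println(stateCity.get("CA Burlingame")); // {}
    }
}
```

In this model the marker is needed so that a later merge of older data cannot bring the deleted row back; the price is that scans still see the key.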

Go deeper? Please read on.

When will Cassandra remove these tombstones? As far as I know, there are two ways:

1. Wait until gc_grace_seconds has elapsed (not verified yet)

gc_grace_seconds is set per column family and can be updated without a restart.

How do you find the current gc_grace_seconds? Simply use the CLI:

[default@Tutorial] show schema;

…

create column family StateCity

with column_type = 'Standard'

and comparator = 'LongType'

and default_validation_class = 'UTF8Type'

and key_validation_class = 'UTF8Type'

and rows_cached = 0.0

and row_cache_save_period = 0

and row_cache_keys_to_save = 2147483647

and keys_cached = 200000.0

and key_cache_save_period = 14400

and read_repair_chance = 1.0

and gc_grace = 864000 // 10 days, OMG

and min_compaction_threshold = 4

and max_compaction_threshold = 32

and replicate_on_write = true

and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'

and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';

…

2. The compaction event (under investigation, but no luck yet)

Compaction is triggered automatically.

But how do you trigger compaction manually? Use nodetool:

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial

Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial

Starting NodeTool

Then we can see some log messages in the Cassandra console.

But as far as I can tell, the tombstones are still there. (WHY???)

C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db

{

"4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],

"54582041757374696e": [["202","30.27x097.74",1329297768323000], ["203","30.27x097.74",1329297768338000], ["204","30.32x097.73",1329297768354000], ["205","30.32x097.73",1329297768370000], ["206","30.32x097.73",1329297768385000]],

"616263": []

}
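The row keys in the sstable2json output are hex-encoded bytes. A small sketch for decoding them, assuming the keys are UTF-8 text (HexKeySketch is a hypothetical helper, not part of Cassandra's tooling):

```java
import java.nio.charset.StandardCharsets;

class HexKeySketch {
    // Decodes a hex string such as "616263" into the UTF-8 text it encodes.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("4341204275726c696e67616d65")); // CA Burlingame
        System.out.println(decode("54582041757374696e"));         // TX Austin
        System.out.println(decode("616263"));                     // abc
    }
}
```

So the three entries are the rows 'CA Burlingame', 'TX Austin' and 'abc'; the last one has an empty column list.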

And they still appear in the output of the list command. (Argh, a ghost that refuses to leave? Big why???)

[default@Tutorial] list statecity;

Using default limit of 100

-------------------

RowKey: CA Burlingame

-------------------

RowKey: TX Austin

=> (column=202, value=30.27x097.74, timestamp=1329297768323000)

=> (column=203, value=30.27x097.74, timestamp=1329297768338000)

=> (column=204, value=30.32x097.73, timestamp=1329297768354000)

=> (column=205, value=30.32x097.73, timestamp=1329297768370000)

=> (column=206, value=30.32x097.73, timestamp=1329297768385000)

-------------------

RowKey: abc

3 Rows Returned.

Elapsed time: 31 msec(s).
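One possible explanation, not verified here: the two "ways" above may really be two conditions that must both hold, with compaction dropping a tombstone only once it is older than gc_grace. The sketch below does the arithmetic under that assumption; GcGraceSketch and purgeable are hypothetical names, and the deletion time is a made-up value used only for illustration.

```java
import java.time.Duration;

class GcGraceSketch {
    // gc_grace value from the 'show schema' output, in seconds.
    static final long GC_GRACE_SECONDS = 864_000L;

    // Assumed rule: a tombstone may be dropped by compaction only after gc_grace has elapsed.
    static boolean purgeable(long deletedAtSeconds, long nowSeconds, long gcGraceSeconds) {
        return deletedAtSeconds < nowSeconds - gcGraceSeconds;
    }

    public static void main(String[] args) {
        System.out.println(Duration.ofSeconds(GC_GRACE_SECONDS).toDays()); // 10

        long deletedAt = 0L; // hypothetical deletion time, in seconds
        long oneHourLater = deletedAt + Duration.ofHours(1).getSeconds();
        long elevenDaysLater = deletedAt + Duration.ofDays(11).getSeconds();
        System.out.println(purgeable(deletedAt, oneHourLater, GC_GRACE_SECONDS));    // false
        System.out.println(purgeable(deletedAt, elevenDaysLater, GC_GRACE_SECONDS)); // true
    }
}
```

If that rule applies, a manual compaction run minutes after the delete would be expected to keep the tombstones; lowering gc_grace on a test column family and compacting again would be one way to check.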

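A few gripes at this point:

1. Perhaps because I have not studied it deeply enough yet, the CLI feels rather weak; it seems suited to initial DDL modeling and simple data analysis.

2. The tombstone cleanup question has not been fully verified yet. I am shelving it for now as an open case and will add to or correct this article once I have an answer.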
