Row Key Definition
Define a Row Key in a DDL statement
The user can define the row key for a relation by adding certain properties to the relation’s DDL statement in the WITH
clause:
key.format
: This is the data format for the row key (i.e. how key bytes are serialized) in the messages coming from the relation’s topic. Supported formats for a row key are:primitive
,json
,protobuf
, andavro
.key.type
: It defines the structure of the row key’s fields:For a row key with a
primitive
format, thekey.type
defines the data type of key values. Valid data types for a primitive row key are:SMALLINT
,INTEGER
,BIGINT
,DOUBLE
, andVARCHAR
.For a row key in the
json
,avro
, orprotobuf
format, thekey.type
is astruct
whose fields define the names and data types of the key’s fields.
Examples
The following DDL statement defines a stream that has a primitive
row key of the VARCHAR
type:
The following DDL statement defines a stream that has a row key in the json
format. The row key in each record can be accessed via the rowkey
function, and it is a struct
that has a field named id
of the INTEGER
data type:
The following DDL statement defines a stream that has a row key in the protobuf
format. The row key in each record has two fields: an INTEGER
field named id
and a VARCHAR
field named region
. A protobuf descriptor for the row key is defined with the name customer_key_msg
within the descriptor source of the relation’s topic.
Row Key in CREATE AS SELECT
Statements
CREATE AS SELECT
StatementsCAS with Simple SELECT
SELECT
In a CAS statement with a SELECT clause consisting of only projection and filter operators, using SELECT
and WHERE
clauses, respectively, the row key definition for the new relation will be the same as the row key definition (if any) of the source.
As an example, the below CSAS statement creates a new pagevisits
stream by filtering some records from the pageviews
stream, which we defined above (see Examples). Given that pageviews
records have row keys with the primitive format,pagevisits
records will have row keys with a similar definition:
When running CAS, the key.format
for the new relation (sink) can be specified in the sink’s WITH
clause. For example, the below CSAS statement is similar to the above one; However, it changes the row key format to json
for the new pagevisits
stream, while the source relation pageviews
has its row key with the primitive
format:
CCAS with GROUP BY
GROUP BY
If the SELECT clause in a CREATE CHANGELOG AS SELECT statement has a GROUP BY
clause, the row key for the new relation will consist of columns in the GROUP BY
clause.
The key.format
for the new relation will be the same as a source relation’s key.format
. Similar to the example above, the key.format
can be changed using the sink’s WITH
clause.
For example, the below query runs grouping and aggregation on pageviews
using a tumbling window to create a new changelog visits_rate
. The row key for visits_rate
will have three fields: window_start
, window_end
, and userid
as they are the columns referred in the GROUP BY
clause:
The below query creates a new changelog region_stats
by running a grouping and aggregation on the source stream users
. Each record in region_stats
will have a row key with a single field named location
:
CSAS with JOIN
JOIN
If the SELECT clause in a CREATE STREAM AS SELECT statement has a JOIN
clause, the row key for the new relation will consist of the column from the left relation in the JOIN
criteria.
The key.format
for the new relation will be the same as the left source relation’s key.format
. Similar to the example above, the key.format
can be changed using the sink’s WITH
clause.
For example, the below query runs an interval join on two streams, pageviews
and users
, to create a new stream named pvusers
. Each record in pvusers
will have a row key with one field, userid
, as userid
is the join column in pageviews
, which is left-side of the join, referred in the join criteria: p.userid = u.userid
:
Last updated