CREATE STREAM AS SELECT

Syntax

CREATE STREAM
    stream_name
[WITH (stream_parameter = value [, ... ])] 
AS select_statement
[PARTITION BY partition_by_clause];

Description

CREATE STREAM AS is essentially a combination of two statements:

  • A statement that creates the new Stream.

  • An INSERT INTO statement that runs a SELECT statement and adds the results into the newly created Stream.
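As a rough illustration of the two-statement view described above, the copy example from the Examples section could be written in two steps. This is only a sketch: the explicit column list, the column types, and the WITH parameters of the first statement are assumptions for illustration, since the actual schema of pageviews is not given on this page.

```sql
-- Step 1 (hypothetical DDL): declare the sink Stream explicitly.
-- The columns and types are assumed from the sample records on this page.
CREATE STREAM pageviews_copy (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH ('topic' = 'pageviews_copy', 'value.format' = 'json');

-- Step 2: run the SELECT and add its results to the new Stream.
INSERT INTO pageviews_copy SELECT * FROM pageviews;

-- The single-statement form does both steps at once:
-- CREATE STREAM pageviews_copy AS SELECT * FROM pageviews;
```

With the single-statement form, the columns of the new Stream come from the SELECT list, so no column list needs to be written by hand.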

Arguments

stream_name

This specifies the name of the new Stream. Optionally, use <database_name>.<schema_name> as the prefix to the name to create the Relation in that scope. For case-sensitive names, the name must be wrapped in double quotes; otherwise, the lowercase name will be used.

WITH (<stream_parameter> = <value> [, ... ])

Optionally, this clause specifies one or more Stream Parameters (see the Stream Parameters section).
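A minimal sketch of a WITH clause carrying several comma-separated parameters at once. Every parameter name is taken from the examples on this page; the Stream name pageviews_avro_copy and the topic name pageviews_avro_topic are hypothetical, and whether a given combination is accepted may depend on the Store backing the sink.

```sql
-- Several Stream parameters in one WITH clause, separated by commas.
-- pageviews_avro_copy and pageviews_avro_topic are hypothetical names.
CREATE STREAM pageviews_avro_copy
  WITH (
    'topic' = 'pageviews_avro_topic',
    'value.format' = 'avro',
    'key.format' = 'avro',
    'topic.partitions' = '5',
    'topic.replicas' = '3'
  )
AS SELECT * FROM pageviews;
```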

select_statement

This statement specifies the SELECT statement to run.

PARTITION BY partition_by_clause

Optionally, this clause sets the partition key of the sink's records according to their values for a given set of columns. The PARTITION BY clause defines a comma-separated list of one or more columns as partitioning columns. By default, the key for the sink's records has the same data format as the sink's value data format. To use a different key format, set the key.format Stream parameter. PARTITION BY is supported for CREATE STREAM AS SELECT and INSERT INTO queries where the sink is a Stream. Currently, PARTITION BY only applies to queries whose sink Stream is backed by a Kafka Store.
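Since PARTITION BY is also supported for INSERT INTO queries, a sketch of that form may help. It assumes a sink Stream named pageviews_by_user (a hypothetical name) that already exists, is backed by a Kafka Store, and has columns matching the SELECT list. The placement of the clause after the SELECT statement is assumed to mirror the CREATE STREAM AS SELECT syntax shown above.

```sql
-- Assumption: pageviews_by_user already exists as a Kafka-backed Stream
-- with columns viewtime, userid, and pageid.
INSERT INTO pageviews_by_user
SELECT viewtime, userid, pageid
FROM pageviews
PARTITION BY userid;  -- each record is keyed by its userid value
```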

Stream Parameters

Kafka Specific Parameters

Kinesis Specific Parameters

Examples

Create a copy Stream

The following creates a replica of the source Stream.

CREATE STREAM pageviews_copy AS SELECT * FROM pageviews;

Create a Stream in a specific Schema within default Database

The following creates a replica of the source Stream, but the new Relation belongs to the Schema named schema2 in the session’s current Database.

CREATE STREAM schema2.pageviews_copy AS SELECT * FROM pageviews;

Create Stream in specific Schema and Database

The following creates a replica of the source Stream, but the new Relation belongs to the Schema named schema2 in the Database named db.

CREATE STREAM db.schema2.pageviews_copy AS SELECT * FROM pageviews;

Create a case-sensitive Stream

The following creates a replica of the source Stream. The new sink Relation has a case-sensitive name.

CREATE STREAM "PageViews" AS SELECT * FROM pageviews;

Create a case-sensitive Stream in a case-sensitive Schema and Database

The following creates a replica of the source Stream. The new sink Relation has a case-sensitive name and is in a case-sensitive Database and Schema.

CREATE STREAM "DataBase"."Schema"."PageViews :)" AS SELECT * FROM pageviews;

Create a new Stream that is backed by a specific Topic

The following creates a replica of the source Stream, but this new Stream is associated with the specified Topic called pageviewstwo.

CREATE STREAM
  pageviews2
  WITH ('topic' = 'pageviewstwo')
AS SELECT * FROM pageviews;

Copy data from one Store to another

The following copies data from a Kafka Store to a Kinesis Store. The query creates a replica of the source Stream, but this new Stream is associated with the specified Store called kinesis_store.

CREATE STREAM pageviews_kinesis
  WITH ('store' = 'kinesis_store') AS 
SELECT * FROM pageviews_kafka;

Convert data from JSON to Avro with a Kafka Store

The following creates a replica of the source Stream that has a data format of JSON, but the new sink Stream has a data format of Avro for its value and key.

CREATE STREAM
  pageviews_avro
  WITH ('value.format' = 'avro', 'key.format' = 'AVRO')
AS SELECT * FROM pageviews_json;

Convert data from JSON to Avro with a Kinesis Store

The following creates a replica of the source Stream that has a data format of JSON, but the new sink Stream has a data format of Avro for its value. Since the sink is a Kinesis stream, there is no key associated with the record, and so the value.format property is the only one that is necessary.

CREATE STREAM
  pageviews_avro
  WITH ('store' = 'kinesis_store', 'value.format' = 'avro')
AS SELECT * FROM pageviews_json;

Simple projection to a Kafka topic with a specific number of partitions and replicas

The following is a simple projection query where the sink Kafka topic has a specific number of partitions and replicas set.

CREATE STREAM
  pageviews2
  WITH ('topic.partitions' = '5', 'topic.replicas' = '3')
AS SELECT
  viewtime AS vtime,
  pageid AS pid
FROM pageviews;

Simple projection to a Kinesis stream with a specific number of shards

The following is a simple projection query where the sink Kinesis stream has a specific number of shards set.

CREATE STREAM pageviews2
  WITH ('topic.shards' = '4') AS 
SELECT
  viewtime AS vtime,
  pageid AS pid
FROM pageviews;

Create a Stream using an interval join

Interval joins between two Streams result in a STREAM sink Relation type.

CREATE STREAM pageviews_enriched AS
SELECT 
  p.userid AS pvid, 
  u.userid AS uid, 
  u.gender, 
  p.pageid, 
  u.interests[1] AS top_interest 
FROM 
  pageviews p JOIN users u WITHIN 5 minutes 
  ON u.userid = p.userid;

Create a Stream using a temporal join

A temporal join of two Relations where the left join side source is a Stream and the right join side source is a Changelog results in a STREAM output Relation type. In the example below, a new Stream called users_visits is created by performing a temporal join between the pageviews Stream and the users_log Changelog.

CREATE STREAM users_visits AS 
SELECT 
  p.userid AS pvid, 
  u.userid AS uid, 
  u.gender, 
  p.pageid, 
  u.interests[1] AS top_interest 
FROM 
  pageviews p JOIN users_log u 
  ON u.userid = p.userid;

Create a Stream that specifies the timestamp column

The following statement creates a new Stream, called pagestats, from the existing Stream pageviews. The timestamp Stream parameter, specified in the WITH clause, marks the viewtime column in pagestats as the timestamp column. Therefore, any subsequent query that refers to pagestats in its FROM clause will use this column for time-based operations.

CREATE STREAM pagestats 
  WITH ('timestamp'='viewtime') AS
SELECT viewtime, pageid 
FROM pageviews;
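To make the effect on later queries concrete, here is a sketch of a time-based operation that reads from a Stream created this way. It reuses the interval join syntax from the earlier example. The Stream names pagestats_by_user and pagestats_enriched are hypothetical, and the first statement adds userid to the projection only so that the join condition has a column to match on.

```sql
-- Hypothetical variant of pagestats that also keeps userid for the join.
CREATE STREAM pagestats_by_user
  WITH ('timestamp'='viewtime') AS
SELECT viewtime, userid, pageid
FROM pageviews;

-- The interval join below is a time-based operation, so it would use
-- viewtime as the timestamp of pagestats_by_user records.
CREATE STREAM pagestats_enriched AS
SELECT
  p.viewtime,
  p.pageid,
  u.gender
FROM
  pagestats_by_user p JOIN users u WITHIN 5 minutes
  ON u.userid = p.userid;
```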

Create a Stream that specifies the Kafka delivery guarantee

The following statement creates a new Stream, called pageviews_exactly_once, from the existing Stream pageviews. The delivery.guarantee Stream parameter, specified in the WITH clause, overrides the default delivery.guarantee of at_least_once with exactly_once. This configuration suits use cases that can tolerate higher latencies but cannot tolerate duplicate outputs.

CREATE STREAM pageviews_exactly_once 
  WITH ('delivery.guarantee'='exactly_once') AS
SELECT *
FROM pageviews;

Create a Stream with the PARTITION BY clause

The following statement creates a new Stream, called pageviews_partition_by, from the existing Stream pageviews. The PARTITION BY clause sets the key type for the output pageviews_partition_by Stream. Notice that in this example the source Stream's records have no key value set, while the sink Stream's records have the PARTITION BY values as their key. The sink Stream's key data format is JSON in this example because it inherits the sink's value data format by default.

CREATE STREAM pageviews_partition_by AS
SELECT viewtime, userid AS `UID`, pageid 
FROM pageviews
PARTITION BY "UID", pageID;

Given this input for pageviews:

KEY     VALUE
{}	{"viewtime":1690327704650, "userid":"User_9", "pageid":"Page_11"}
{}	{"viewtime":1690327705651, "userid":"User_6", "pageid":"Page_94"}

We can expect the following output in pageviews_partition_by:

KEY                                     VALUE
{"UID":"User_9", "pageid":"Page_11"}	{"viewtime":1690327704650, "UID":"User_9", "pageid":"Page_11"}
{"UID":"User_6", "pageid":"Page_94"}	{"viewtime":1690327705651, "UID":"User_6", "pageid":"Page_94"}

Create a Stream with the PARTITION BY clause to override existing key

The following statement creates a new Stream, called pageviews_partition_by, from the existing Stream pageviews. The PARTITION BY clause sets the key type for the output pageviews_partition_by Stream. This query also sets the key.format property of the sink Stream to PRIMITIVE. Notice that in this example the source Stream's records have the pageid column value set as the key in JSON format, while the output Stream's records have the PARTITION BY value as the key in PRIMITIVE format.

CREATE STREAM pageviews_partition_by
WITH ('key.format'='PRIMITIVE') AS
SELECT viewtime, userid AS `UID`, pageid 
FROM pageviews
PARTITION BY "UID";

Given this input for pageviews:

KEY                     VALUE
{"pageid":"Page_11"}	{"viewtime":1690327704650, "userid":"User_9", "pageid":"Page_11"}
{"pageid":"Page_94"}	{"viewtime":1690327705651, "userid":"User_6", "pageid":"Page_94"}

We can expect the following output in pageviews_partition_by:

KEY             VALUE
"User_9"	{"viewtime":1690327704650, "UID":"User_9", "pageid":"Page_11"}
"User_6"	{"viewtime":1690327705651, "UID":"User_6", "pageid":"Page_94"}
