WITH (Common Table Expression)

Syntax

WITH
    with_name AS (select_statement)
    [, ...]
select_statement;

Description

The WITH clause defines named subqueries that can be referenced by name in later CTEs and in the main select_statement. These named subqueries are called Common Table Expressions (CTEs), and each one presents a temporary view of the data projected by its select_statement. Because each CTE is defined once and referenced by name, CTEs help modularize queries, making them easier to maintain and more versatile than nested subqueries.
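As a local illustration only, the following sketch uses Python's built-in sqlite3 module with an in-memory SQLite database (a stand-in, not DeltaStream) to run the same query twice: once with a CTE and once with the equivalent inline subquery. The page6 CTE name and the sample pageviews rows are hypothetical.

```python
import sqlite3


def compare_cte_and_subquery():
    """Run the same query as a CTE and as an inline subquery (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical pageviews data; this is a SQLite table, not a DeltaStream Relation.
    conn.execute("CREATE TABLE pageviews (pageid TEXT, userid TEXT, viewtime INTEGER)")
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?)",
        [("Page_1", "User_1", 10), ("Page_6", "User_2", 20), ("Page_6", "User_3", 30)],
    )
    # Named subquery (CTE) that the main SELECT references by name.
    with_cte = conn.execute(
        """
        WITH page6 AS (SELECT userid, viewtime FROM pageviews WHERE pageid = 'Page_6')
        SELECT userid, viewtime FROM page6 ORDER BY viewtime
        """
    ).fetchall()
    # The same logic with the subquery written inline in the FROM clause.
    inline = conn.execute(
        """
        SELECT userid, viewtime
        FROM (SELECT userid, viewtime FROM pageviews WHERE pageid = 'Page_6') AS page6
        ORDER BY viewtime
        """
    ).fetchall()
    conn.close()
    return with_cte, inline


cte_rows, inline_rows = compare_cte_and_subquery()
print(cte_rows == inline_rows)  # True
```

The CTE form keeps the filtering logic in one named place, which matters more as the number of references grows.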

The result of a CTE is effectively a Relation that can be used just like any other Relation defined with DDL or a subquery. If a CTE has the same name as a Relation defined using DDL, the CTE takes precedence within the query.
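The same name-precedence behavior can be observed in SQLite. This sketch, again using Python's sqlite3 module as a local stand-in for DeltaStream, defines a users table with DDL and then a CTE also named users; within the WITH statement, the CTE is what gets read. The table contents are hypothetical.

```python
import sqlite3


def cte_shadows_table():
    """Show a CTE hiding a DDL-defined table of the same name (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    # A table defined with DDL, holding one hypothetical row.
    conn.execute("CREATE TABLE users (userid TEXT, interests TEXT)")
    conn.execute("INSERT INTO users VALUES ('User_1', 'News')")
    # Inside this statement, "users" resolves to the CTE, not to the table.
    shadowed = conn.execute(
        "WITH users AS (SELECT 'User_9' AS userid) SELECT userid FROM users"
    ).fetchall()
    # Outside the WITH clause, "users" refers to the table again.
    table_rows = conn.execute("SELECT userid FROM users").fetchall()
    conn.close()
    return shadowed, table_rows


print(cte_shadows_table())  # ([('User_9',)], [('User_1',)])
```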

Arguments

with_name

A name for the select_statement that defines the CTE.

select_statement

See SELECT.

Examples

Using CTEs from other CTEs

Like any other Relation, a CTE can be referenced in the SELECT statement of a later CTE. In this example, c1 projects the pageid and viewtime columns from pageviews, c2 adds a processing time column to the result of c1, and the main SELECT statement then projects proc_time, pageid, and viewtime from c2:

WITH
    c1 AS (SELECT pageid, viewtime FROM pageviews),
    c2 AS (SELECT NOW() AS proc_time, * FROM c1)
SELECT * FROM c2;
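A rough local equivalent of the c1 and c2 chain can be run against SQLite with Python's sqlite3 module. SQLite has no NOW() function, so this sketch assumes datetime('now') as the processing-time stand-in; the pageviews rows are hypothetical.

```python
import sqlite3


def chained_ctes():
    """Reference one CTE from another, mirroring c1 -> c2 (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (pageid TEXT, userid TEXT, viewtime INTEGER)")
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?)",
        [("Page_1", "User_1", 10), ("Page_2", "User_2", 20)],
    )
    cur = conn.execute(
        """
        WITH
            c1 AS (SELECT pageid, viewtime FROM pageviews),
            -- datetime('now') stands in for NOW(), which SQLite does not provide
            c2 AS (SELECT datetime('now') AS proc_time, c1.* FROM c1)
        SELECT * FROM c2 ORDER BY viewtime
        """
    )
    # Column names come from the projection of the last CTE in the chain.
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    conn.close()
    return columns, rows


columns, rows = chained_ctes()
print(columns)  # ['proc_time', 'pageid', 'viewtime']
```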

Joining a Stream CTE with a Changelog CTE

Each CTE represents a local Relation, meaning that its filtering, grouping/aggregation, or projection determines how it presents the underlying data to the query it is part of.

In the following example, c1 represents a Stream over pageviews containing the visits to Page_6, and c2 represents a Changelog over users that groups the changes by the userid column. When these two CTEs are joined, the JOIN operation is treated as a Stream-Changelog join, so it doesn't require a WITHIN window in the join criteria:

WITH
    c1 AS (SELECT * FROM pageviews WHERE pageid = 'Page_6'),
    c2 AS (
        SELECT userid, count(interests) AS interest_count
        FROM users
        GROUP BY userid)
SELECT
    p.userid AS pvid,
    u.userid AS uid,
    p.pageid,
    u.interest_count AS interest_count
FROM
    c1 p
JOIN
    c2 u
ON u.userid = p.userid;
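The shape of this join can be sketched in SQLite through Python's sqlite3 module, using hypothetical pageviews and users rows. SQLite evaluates both CTEs over a fixed snapshot, so this only shows what each CTE contributes (every matching page view from c1, one aggregated row per userid from c2); it does not model DeltaStream's Stream-Changelog semantics, such as how later changes to a user's count are matched with incoming page views.

```python
import sqlite3


def join_filtered_and_aggregated():
    """Join a filtering CTE with a GROUP BY CTE (batch SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (viewtime INTEGER, userid TEXT, pageid TEXT)")
    conn.execute("CREATE TABLE users (userid TEXT, interests TEXT)")
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?)",
        [(1, "User_1", "Page_6"), (2, "User_2", "Page_6"), (3, "User_1", "Page_1")],
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("User_1", "News"), ("User_1", "Sports"), ("User_2", "Movies")],
    )
    rows = conn.execute(
        """
        WITH
            -- c1: one row per matching page view (the Stream-like side)
            c1 AS (SELECT * FROM pageviews WHERE pageid = 'Page_6'),
            -- c2: one row per userid (the Changelog-like side)
            c2 AS (
                SELECT userid, count(interests) AS interest_count
                FROM users
                GROUP BY userid)
        SELECT p.userid AS pvid, u.userid AS uid, p.pageid, u.interest_count
        FROM c1 p
        JOIN c2 u ON u.userid = p.userid
        ORDER BY p.userid
        """
    ).fetchall()
    conn.close()
    return rows


print(join_filtered_and_aggregated())
# [('User_1', 'User_1', 'Page_6', 2), ('User_2', 'User_2', 'Page_6', 1)]
```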

Self-joining CTEs

In this example, a single CTE reshapes the pageviews Stream and is used twice in the JOIN operation to self-join the reshaped data. The joined data uses the columns exactly as the CTE's SELECT statement projects them, e.g. the user ID as an INTEGER, which lets the WHERE clause compare uid with the integer 5:

WITH
    c1 AS (
        SELECT
            viewtime,
            CAST(SUBSTRING(pageid FROM 6) AS INTEGER) AS pid,
            CAST(SUBSTRING(userid FROM 6) AS INTEGER) AS uid
        FROM pageviews)
SELECT
    pl.uid AS lid,
    pr.uid AS rid,
    pl.pid,
    pr.viewtime AS viewtime
FROM
    c1 pl
JOIN
    c1 pr
WITHIN 1 MINUTE
ON pr.uid = pl.uid
WHERE pl.uid != 5;
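The following SQLite sketch (run through Python's sqlite3 module) mirrors the self-join. SQLite has no WITHIN clause and no SUBSTRING ... FROM syntax, so it assumes substr(x, 6) in place of SUBSTRING(x FROM 6) and approximates WITHIN 1 MINUTE by requiring the two viewtime values, assumed here to be epoch milliseconds, to be at most 60,000 apart. The real WITHIN window is applied by DeltaStream to the records' event times, which may not be viewtime, so treat this as an approximation. The sample rows are hypothetical.

```python
import sqlite3


def self_join_cte():
    """Self-join one reshaping CTE, approximating WITHIN 1 MINUTE (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (viewtime INTEGER, pageid TEXT, userid TEXT)")
    # Hypothetical rows; viewtime is assumed to be epoch milliseconds.
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?)",
        [
            (1000, "Page_1", "User_1"),
            (20000, "Page_2", "User_1"),
            (500000, "Page_3", "User_1"),
            (3000, "Page_4", "User_5"),
        ],
    )
    rows = conn.execute(
        """
        WITH
            c1 AS (
                SELECT
                    viewtime,
                    -- substr(x, 6) plays the role of SUBSTRING(x FROM 6)
                    CAST(substr(pageid, 6) AS INTEGER) AS pid,
                    CAST(substr(userid, 6) AS INTEGER) AS uid
                FROM pageviews)
        SELECT pl.uid AS lid, pr.uid AS rid, pl.pid, pr.viewtime AS viewtime
        FROM c1 pl
        JOIN c1 pr
          -- stand-in for WITHIN 1 MINUTE: rows at most 60,000 ms apart
          ON pr.uid = pl.uid AND abs(pl.viewtime - pr.viewtime) <= 60000
        WHERE pl.uid != 5
        ORDER BY pl.viewtime, pr.viewtime
        """
    ).fetchall()
    conn.close()
    return rows


for row in self_join_cte():
    print(row)
```

User_1's view at 500000 only pairs with itself, because the other views fall outside the one-minute bound, and User_5 is removed by the WHERE clause on the integer uid.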

Creating a new Stream with MATCH_RECOGNIZE over a CTE

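This example shows a real-world pattern-matching query over bus trip updates, adapted with CTEs from the query in our Analyzing NYC Bus Data blog post. A local bus trip updates Relation is defined in the trip_updates CTE, which the MATCH_RECOGNIZE clause then uses to find, for each trip, three consecutive updates where the delay is positive and grows by more than 30 from one update to the next. For each match, the query emits the trip, the vehicle, and the midpoint timestamp (avg_ts) between the first and last of those updates: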

CREATE STREAM trips_delay_increasing
WITH
    trip_updates AS (
        SELECT
            trip,
            "stopTimeUpdate",
            vehicle,
            CAST(FROM_UNIXTIME("timestamp") AS TIMESTAMP) AS ts,
            "timestamp" AS epoch_secs,
            delay
        FROM
            nyc_bus_trip_updates)
AS SELECT
    trip,
    vehicle,
    CAST(
      FROM_UNIXTIME((start_epoch_secs + end_epoch_secs) / 2)
      AS TIMESTAMP
    ) AS avg_ts
FROM trip_updates
  MATCH_RECOGNIZE(
    PARTITION BY trip
    ORDER BY "ts"
    MEASURES
      C.row_timestamp AS row_timestamp,
      C.row_key AS row_key,
      C.row_metadata AS row_metadata,
      C.vehicle AS vehicle,
      A.epoch_secs AS start_epoch_secs,
      C.epoch_secs AS end_epoch_secs
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST C
    PATTERN (A B C)
    DEFINE
      A AS delay > 0,
      B AS delay > A.delay + 30,
      C AS delay > B.delay + 30
  ) AS MR WITH ('timestamp'='ts');
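The matching logic of this MATCH_RECOGNIZE clause can be sketched in plain Python, since SQLite does not support MATCH_RECOGNIZE. This is a minimal, batch-only sketch under stated assumptions: the TripUpdate record and its sample values are hypothetical, rows are ordered by epoch_secs (from which ts is derived), the midpoint uses true division, and FROM_UNIXTIME is assumed to interpret seconds as UTC. It emulates PATTERN (A B C) with AFTER MATCH SKIP TO LAST C, so the last row of one match can serve as the first row of the next.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import groupby


@dataclass
class TripUpdate:
    # Hypothetical record shape mirroring the trip_updates CTE columns used above.
    trip: str
    vehicle: str
    epoch_secs: int
    delay: int


def delay_increasing(updates):
    """Emulate PATTERN (A B C) with AFTER MATCH SKIP TO LAST C, per trip."""
    results = []
    # PARTITION BY trip, ORDER BY ts (ts derives from epoch_secs, so the order matches).
    ordered = sorted(updates, key=lambda u: (u.trip, u.epoch_secs))
    for trip, group in groupby(ordered, key=lambda u: u.trip):
        rows = list(group)
        i = 0
        while i + 2 < len(rows):
            a, b, c = rows[i], rows[i + 1], rows[i + 2]
            # DEFINE: A needs a positive delay; B and C each exceed the previous delay by > 30.
            if a.delay > 0 and b.delay > a.delay + 30 and c.delay > b.delay + 30:
                # ONE ROW PER MATCH: midpoint between A's and C's epoch seconds (assumed UTC).
                avg = (a.epoch_secs + c.epoch_secs) / 2
                results.append((trip, c.vehicle, datetime.fromtimestamp(avg, tz=timezone.utc)))
                # SKIP TO LAST C: the C row may start the next match as its A row.
                i += 2
            else:
                # No match starting here; try from the next row.
                i += 1
    return results


updates = [
    TripUpdate("T1", "V1", 100, 10),
    TripUpdate("T1", "V1", 200, 50),
    TripUpdate("T1", "V1", 300, 90),
    TripUpdate("T1", "V1", 400, 130),
    TripUpdate("T1", "V1", 500, 170),
    TripUpdate("T2", "V2", 100, 0),
    TripUpdate("T2", "V2", 200, 50),
    TripUpdate("T2", "V2", 300, 90),
]
for match in delay_increasing(updates):
    print(match)
```

Trip T1 yields two overlapping matches (rows at 100/200/300 and 300/400/500), while T2 yields none because its first delay is not positive.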
