Database

Databases are the foundation for organizing data in DeltaStream and provide the top-level building block of its namespacing model. Users can create Databases as logical groupings for different teams or projects. For instance, you can create a Database for a logging project and another for an ads team.
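
For example, the two groupings above might be created with statements like the following (a minimal sketch; the Database names are illustrative):

    CREATE DATABASE logging;
    CREATE DATABASE ads;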

Schema

A Schema is a logical grouping of relational objects such as Streams, Changelogs, Materialized Views, and Tables. As mentioned above, Schemas are grouped within a Database. The combination of Databases and Schemas enables users to organize their Streams, Changelogs, and other Database objects hierarchically in DeltaStream. These hierarchies are also one of the bases for providing role-based access control (RBAC) in DeltaStream, in the same way as in other relational databases.
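
As an illustration, the statements below create a Schema and set the client’s current scope; this sketch assumes the names logging and analytics, and that CREATE SCHEMA operates on the current Database:

    USE DATABASE logging;
    CREATE SCHEMA analytics;
    USE SCHEMA analytics;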

Relation

DeltaStream provides a relational model for streaming data where data is stored in Relations. The following Relation types are supported in DeltaStream:

  • Stream

  • Changelog

  • Materialized View

  • Table

In DeltaStream, these Relations are the building blocks of user applications and pipelines. Relation names can be fully or partially qualified by specifying the Database and/or Schema name in the format [<database_name>.<schema_name>.]<relation_name>, e.g. db1.public.pageviews. Otherwise, the current Database and Schema in the scope of the client are used to identify a Relation.
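
For instance, the same Relation can be referenced with a fully qualified name, or with a bare name once the client’s scope is set:

    -- Fully qualified: Database, Schema, and Relation name
    SELECT * FROM db1.public.pageviews;

    -- Partially qualified: resolved against the current Database and Schema
    USE DATABASE db1;
    USE SCHEMA public;
    SELECT * FROM pageviews;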

Stream

A Stream is a sequence of immutable, partitioned, and partially ordered events (we use events and records synonymously). A Stream is a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream. The records in a Stream are independent of each other, meaning there is no correlation between any two records in a Stream. A Stream declares the schema of its records, which includes the column names along with their types and optional constraints.
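
As a sketch, the DDL below declares a Stream over a Kafka topic; the column names and the WITH properties ('topic', 'value.format') are illustrative and depend on the store configuration:

    CREATE STREAM pageviews (
      viewtime BIGINT,
      userid   VARCHAR,
      pageid   VARCHAR
    ) WITH (
      'topic' = 'pageviews',    -- source Kafka topic (assumed name)
      'value.format' = 'JSON'   -- format of the message value
    );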

Changelog

Similar to a Stream, a Changelog is a sequence of partitioned and partially ordered events (we use events and records synonymously), and it is a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream. A Changelog defines a PRIMARY KEY for its records, which is used to represent change over time for records with the same primary key. Records in a Changelog correlate with each other based on the PRIMARY KEY: a record in a Changelog is either an insert record, if it is the first record with the given PRIMARY KEY appended to the Changelog, or an upsert record, if a previous record with the same PRIMARY KEY has already been inserted into the Changelog.
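
A Changelog is declared much like a Stream, with the addition of a PRIMARY KEY; the sketch below assumes a Kafka topic named users and illustrative columns:

    CREATE CHANGELOG users_log (
      registertime BIGINT,
      userid       VARCHAR,
      regionid     VARCHAR,
      PRIMARY KEY (userid)      -- later records with the same userid are upserts
    ) WITH (
      'topic' = 'users',
      'value.format' = 'JSON'
    );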

Materialized View

A Materialized View creates a snapshot of a streaming query result and continuously updates the snapshot as records arrive at the query input(s). A Materialized View is queryable in DeltaStream, meaning that you can query the Materialized View and the result of such a query is computed from the data in the snapshot at the time of the query. Note that queries on a Materialized View are not streaming queries; they are the same as queries on tables and materialized views in traditional relational databases.
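
For example, a Materialized View could maintain a continuously updated count of views per user over the hypothetical pageviews Stream above, and then be queried like a regular table:

    CREATE MATERIALIZED VIEW user_view_count AS
      SELECT userid, COUNT(*) AS view_count
      FROM pageviews
      GROUP BY userid;

    -- A non-streaming query, computed over the snapshot at query time
    SELECT * FROM user_view_count WHERE view_count > 10;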

Table

A Table is similar to a Materialized View in that it stores records from a streaming source, but unlike a Materialized View it does not support upserts. In other words, all records from a source or an upstream query operation, e.g. a JOIN or an aggregation, are stored as a sequence of records exactly as they were provided to the sink that writes to the Table. When used with records that have a primary key, e.g. records from a Changelog, the resulting rows in the Table represent the incremental changes to each record key.
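
As an illustration of this append-only behavior, the sketch below assumes a CREATE TABLE AS variant that persists every change record from the hypothetical users_log Changelog; the exact DDL may differ:

    CREATE TABLE users_history AS
      SELECT userid, regionid
      FROM users_log;
    -- Each upsert on a userid adds a new row to users_history,
    -- so the Table captures the incremental changes per key.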

Row Key

Each record in a Stream or Changelog can have a row key. Defining a row key for a Relation is optional. The value of the key for a given record is extracted from its corresponding message, which is read from the source Relation’s Topic. For example, if a Kafka topic is used as the Relation’s Topic, the key bytes of the Kafka messages are used to assign row key values to the Relation’s records, based on the Relation’s row key definition (if any). Some operations, such as GROUP BY and JOIN, impact the row key definition and add row keys to the records in their results. When query results are written to a sink, the records’ keys are written as the messages’ keys in the sink Relation’s Topic. For example, when the result of a join query is written to a Kafka topic, the row keys of the result records are set as the Kafka messages’ keys. See Row Key Definition for further information.
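
To illustrate, the hypothetical join below produces result records whose row key is the join key; when the results are written to the sink’s Kafka topic, that key is set as the Kafka message key:

    CREATE STREAM enriched_pageviews AS
      SELECT p.viewtime, p.pageid, u.userid, u.regionid
      FROM pageviews p
      JOIN users_log u ON p.userid = u.userid;
    -- The join column (userid) becomes the row key of the result records
    -- and is written as the message key in the sink Relation's Topic.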
