Store

Stores are where user streaming data resides. Apache Kafka and Amazon Kinesis are two examples of such stores. DeltaStream reads data from streaming stores, performs the desired computation, and writes the results back to the same store or to another one. Stores are owned and managed by users; to let DeltaStream access the data in a store, the user configures connectivity and access to it. For instance, if a user has an Apache Kafka cluster provided by Confluent Cloud, they can declare a Store in DeltaStream by setting up connectivity and access to that cluster. Once the Store is defined, DeltaStream can read from and write to Topics in the Kafka cluster. Currently, DeltaStream supports Apache Kafka (AWS MSK, Confluent Cloud, Redpanda, and others), Amazon Kinesis, PostgreSQL (only as a source for CDC streams), Snowflake, and Databricks (only as a sink for CTAS queries).
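
As an illustrative sketch, declaring a Store for a Kafka cluster might look like the statement below. The store name, URI, and property names here are placeholders rather than the exact DDL; consult the CREATE STORE reference for the properties your store type requires.

```sql
-- Hypothetical example: declare a Store for a Confluent Cloud Kafka cluster.
-- The name, URI, and credential properties are placeholders.
CREATE STORE confluent_kafka_store WITH (
  'type' = KAFKA,
  'uris' = 'pkc-12345.us-east-1.aws.confluent.cloud:9092',
  'kafka.sasl.username' = '<api_key>',
  'kafka.sasl.password' = '<api_secret>'
);
```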

Topic

In DeltaStream, a Topic is where the streaming data within a Store resides. A DeltaStream Topic is simply an interface over the event organization layer of the underlying physical store: for Apache Kafka Stores, a DeltaStream Topic corresponds to a Kafka topic, and for Amazon Kinesis Stores, it corresponds to a Kinesis stream. Topics store the data backing Streams and Changelogs. You can create, delete, and view the contents of Topics in DeltaStream.
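
To sketch how Topics are worked with, commands along the following lines would list the Topics in the current Store, inspect one, and print its live contents. The exact command forms are assumptions, and `pageviews` is a placeholder topic name.

```sql
-- Hypothetical Topic management commands; names are placeholders.
LIST TOPICS;                 -- list Topics in the current Store
DESCRIBE TOPIC pageviews;    -- inspect a Topic's metadata
PRINT TOPIC pageviews;       -- stream the Topic's contents to the console
```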

Schema Registry

A schema registry is a centralized repository for managing and validating the schemas of data in Apache Kafka topics. In DeltaStream, a Schema Registry object represents such a schema registry service for an Apache Kafka cluster. For instance, if you are using Confluent Cloud with its schema registry service, you can define a Schema Registry in DeltaStream that represents Confluent Cloud's schema registry service and assign it to the Stores that use that service. DeltaStream then uses the corresponding schema registry to fetch topic schemas and deserialize topic content.
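
As a hypothetical sketch, defining a Schema Registry and attaching it to an existing Store might look like the following; the object names, URI, and property names are illustrative placeholders, not the exact DDL.

```sql
-- Hypothetical example: define a Schema Registry for Confluent Cloud
-- and attach it to an existing Store. All names are placeholders.
CREATE SCHEMA_REGISTRY confluent_sr WITH (
  'type' = CONFLUENT_CLOUD,
  'uris' = 'https://psrc-12345.us-east-1.aws.confluent.cloud',
  'confluent_cloud.key' = '<api_key>',
  'confluent_cloud.secret' = '<api_secret>'
);

-- Assumed property name for associating the registry with a Store.
UPDATE STORE confluent_kafka_store WITH ('schema_registry.name' = confluent_sr);
```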

Entity

For non-streaming stores such as PostgreSQL, Snowflake, and Databricks, DeltaStream uses an Entity to represent the tables in those stores. Similar to Topics in streaming stores, Entities are used to refer to, inspect, add, or delete tables in PostgreSQL, Snowflake, and Databricks.
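
As an illustrative sketch, Entities might be managed with commands like the ones below; the command forms and the `public.orders` name are assumptions for illustration.

```sql
-- Hypothetical Entity management commands; names are placeholders.
LIST ENTITIES;                    -- list tables in the current Store
DESCRIBE ENTITY public.orders;    -- inspect a table's columns and metadata
DROP ENTITY public.orders;        -- delete the underlying table
```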
