Trial Quick Start

If you have signed up for DeltaStream’s free trial, you can use this guide to build an end-to-end streaming application with the provided Kafka store in DeltaStream’s web application. To simplify getting started for trial users, we automatically create an organization, trial-organization, along with a default store that contains synthetic data.

Using the available store and synthetic data, you will accomplish the following steps in this guide:

  1. Inspect the data in the streaming store.

  2. Create your first database.

  3. Create streams and changelogs for your Kafka topics.

  4. Create new streams, changelogs, and materialized views using DeltaStream’s continuous queries.

1. Inspect a Trial Store

Every trial user has a predefined store, trial_store, available upon signing in. The predefined store is an AWS MSK cluster with several topics; synthetic data producers continuously publish messages into these topics. Once you sign in to the trial account, you should see the trial store on the Stores page. The following image shows what you will see once you click on trial_store.

You can inspect the data in each topic by clicking on it. This takes you to a page where you can click the play button in the lower-right corner to see the live stream of data flowing into the topic. Here is an image of the content of the pageviews topic.
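
If you prefer the SQL page to the topic browser, you can inspect the same information with statements along the following lines. These commands are a sketch based on DeltaStream’s command reference, and the exact names (for example, PRINT TOPIC versus a newer equivalent) may differ in your version:

-- List the stores available to your organization; trial_store should appear.
LIST STORES;

-- Stream a sample of the live records flowing into the pageviews topic.
PRINT TOPIC pageviews;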

Once you have confirmed store connectivity and inspected the contents of the topics, you can start declaring databases and relations and writing queries on the streaming data.

2. Create a Database

DeltaStream provides a relational model on top of your streaming data. Similar to other relational systems, DeltaStream uses databases and schemas for namespacing and organizing your data. To create a new database, go to the Catalog page from the main menu and click the + button. Enter the database name, and click SAVE to create your first database.

For this guide, we named our database TestDB. Note that you can create as many databases as you need. Once you create a database, it will also have a schema named public, but you can add more schemas if you wish.
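
If you prefer DDL to the Catalog UI, the same database can be created from the SQL page. The statements below are a minimal sketch; the schema name analytics is only an illustration, and CREATE SCHEMA is assumed to target the current database:

CREATE DATABASE TestDB;

-- Optional: add a schema beyond the default public schema.
CREATE SCHEMA analytics;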

3. Create Streams and Changelogs

We will now create relations on top of our Kafka topics using DeltaStream’s DDL statements. If you want to treat the streaming data in a topic as an append-only stream, where each event is independent, you define it as a stream. In this guide, we declare a stream on the pageviews topic, since each pageview is an independent event. To do so, go to the SQL page by choosing SQL from the main menu and write the DDL statement there. Here is the statement to create the pageviews stream.

CREATE STREAM pageviews (
    viewtime BIGINT, 
    userid VARCHAR, 
    pageid VARCHAR
) WITH (
    'topic'='pageviews', 
    'value.format'='JSON'
);

Note that the above stream is created in the currently used database and schema, which are TestDB and public, respectively. Also, since no store is specified in the WITH clause, DeltaStream uses the default store, trial_store, described above as the store containing the pageviews topic.
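
For illustration, here is a hypothetical variant of the same DDL that pins the namespace and names the store explicitly instead of relying on the session defaults. The USE statements and the 'store' property reflect our reading of DeltaStream’s syntax, so treat them as assumptions:

USE DATABASE TestDB;
USE SCHEMA public;

-- Same definition as above, but with the store named explicitly.
CREATE STREAM pageviews_explicit (
    viewtime BIGINT,
    userid VARCHAR,
    pageid VARCHAR
) WITH (
    'store' = 'trial_store',
    'topic' = 'pageviews',
    'value.format' = 'JSON'
);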

We then declare a changelog for the users topic. A changelog indicates that we want to interpret the events in a topic as UPSERT events; the events must therefore have a primary key, and each event is interpreted as an insert or update for the given primary key. Use the following statement in the SQL page to declare the users_log changelog.

CREATE CHANGELOG users_log (
    registertime BIGINT, 
    userid VARCHAR, 
    regionid VARCHAR, 
    gender VARCHAR, 
    interests ARRAY<VARCHAR>, 
    contactinfo STRUCT<phone VARCHAR, city VARCHAR, "state" VARCHAR, zipcode VARCHAR>, 
    PRIMARY KEY(userid)
) WITH (
    'topic'='users', 
    'key.format'='json', 
    'key.type'='STRUCT<userid VARCHAR>', 
    'value.format'='json'
);

Once you declare the pageviews stream and the users_log changelog, you should be able to see them in the public schema of the TestDB database by navigating to the Databases page from the main menu.
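
You can also list the relations from the SQL page. The exact command below is an assumption modeled on DeltaStream’s other LIST commands and may differ in your version:

-- Assumed syntax: list the relations in the current schema.
LIST RELATIONS;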

4. Run Queries

Once you have declared streams and changelogs, you can write continuous queries in SQL to process the streaming data in real time. Let’s start with interactive queries, whose results are streamed back to the user. Such queries can be used to inspect your streams and changelogs or to build queries iteratively by inspecting their results. As an example, let’s inspect the pageviews stream using the following interactive query:

SELECT * FROM pageviews;

Once you run this query, DeltaStream compiles it into a streaming job, runs it, and streams the results back to the web app. You should see something like this on your screen:
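
Interactive queries are not limited to SELECT *; any SQL transformation works. For instance, the following illustrative query narrows the stream to a single page, where Page_10 is a hypothetical pageid value from the synthetic data:

-- Illustrative interactive query: pageviews for one page only.
SELECT viewtime, userid
FROM pageviews
WHERE pageid = 'Page_10';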

While interactive query results are streamed back to the user, DeltaStream also provides persistent queries: continuous queries whose results are written back to a store or materialized view. As a first step, let’s write a persistent query that joins the pageviews stream with the users_log changelog to create an enriched pageviews stream that includes user details for each pageview event. We also convert the epoch time to a timestamp with a time zone using the TO_TIMESTAMP_LTZ function.

CREATE STREAM enriched_pv 
AS SELECT
    TO_TIMESTAMP_LTZ(viewtime, 3) AS viewtime,  
    p.userid AS userid, 
    pageid, 
    TO_TIMESTAMP_LTZ(registertime, 3) AS registertime, 
    regionid, 
    gender, 
    interests, 
    contactinfo
FROM pageviews p
JOIN users_log u ON u.userid = p.userid;

Under the covers, the persistent query above creates a new topic in the trial store. When a new topic is created in the trial store, a prefix is added to the topic name. The prefix is based on your trial email plus some unique random characters. For example, the email test@gmail.com results in a topic prefix like t_testgmailcom_4evmsyg_, so creating the topic enriched_pv actually creates the topic t_testgmailcom_4evmsyg_enriched_pv, which can be viewed in the trial store’s topics list.

Note: This prefixing applies only to the trial store we have set up; prefixes are not added if you are using any other store, such as your own Apache Kafka cluster or AWS Kinesis.

Once you write the above query in the SQL page and click RUN, you should see a confirmation that the query succeeded, like the following:

Under the hood, DeltaStream compiles and launches the query as an Apache Flink streaming job. You should be able to see the query, along with its status, in the Query Management page by selecting Queries in the main menu. Once the query is running successfully, you will have a new Kafka topic for enriched_pv (with the trial prefix described above) in your Kafka cluster, and a new stream will be added to the streams in your database, TestDB. You can examine the contents of the new stream by running the following query from the SQL page in the web app:

SELECT * FROM enriched_pv;

The following image shows the result of running the above interactive query. Note that the result is streamed to the client as shown below:
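
Since enriched_pv now carries the joined user attributes, interactive queries can filter on those as well. The following query is illustrative, and Region_1 is a hypothetical regionid value from the synthetic data:

-- Illustrative query on the enriched stream: pageviews from one region.
SELECT userid, pageid, regionid
FROM enriched_pv
WHERE regionid = 'Region_1';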

Now that we have the enriched pageviews stream, let’s build a materialized view in which we compute the number of pageviews per user. Type the following statement in the SQL page to create this materialized view.

CREATE MATERIALIZED VIEW user_view_count
AS SELECT
    userid, 
    COUNT(*) AS view_count 
FROM enriched_pv 
GROUP BY userid;

Once you run the above query, DeltaStream launches a streaming job that runs the SELECT statement and materializes its result. The resulting materialized view can then be queried the same way you would query a materialized view in a traditional relational database; in DeltaStream, however, the data in the materialized view is always kept fresh by the streaming job. The following is a simple query to get the current view count for the user with the userid User_2.

SELECT * FROM user_view_count WHERE userid = 'User_2';

The result of the above query will be one row as shown below:

As you can see, the number of pageviews for User_2 is 3 at the time of running the above query. Now run the query again. You should see an updated pageview count for the user, which shows that every time you query a materialized view, you get the most up-to-date result. DeltaStream makes sure the data in the view is continuously updated by the continuous query that declared the materialized view. Here is an image that shows running the same query on the materialized view a few seconds later.

As you can see, the result has been updated to 11 from the previous value of 3.

You can see all of the relations, such as streams, changelogs, and materialized views, along with the queries and their relationships to one another, in the Stream 360 section of the DeltaStream web app. As shown in the image below, one query joins pageviews and users_log to create a new stream called enriched_pv, and another query creates a materialized view named user_view_count from enriched_pv.

Clean Up

Now that we have seen a basic use case, let’s clean up our environment. To do so, we first terminate the queries and then drop the created streams, changelogs, and materialized views. To terminate your queries, go to the Queries page. Then you can go to the corresponding database and schema to drop the streams, changelogs, and materialized views. Note that if a query uses a stream, changelog, or materialized view, you must terminate that query before dropping the relation. The same cleanup can also be done from the SQL page, as sketched below.
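
Here is a minimal sketch of that cleanup in SQL, assuming the relation names from this guide. The query ID placeholders are hypothetical; copy the real IDs from the Queries page:

-- Stop the persistent queries first (substitute the real query IDs).
TERMINATE QUERY <query_id>;

-- Then drop the relations, dependents first.
DROP MATERIALIZED VIEW user_view_count;
DROP STREAM enriched_pv;
DROP STREAM pageviews;
DROP CHANGELOG users_log;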
