Style Selector
Layout Style
Boxed Background Patterns
Boxed Background Images
Color Scheme

Working with Materialized View tables in ClickHouse

There must be something about January which makes John prod me into a blog post about something I’ve just teased out. So here we are, it’s 2020, it’s January, and what is fast (OK, not so fast) becoming an annual tradition.

Today’s post is a selection on snippets on Materialized Views. Materialized Views, if you haven’t met them, are tables automatically populated when data is inserted into some other table.

Materialized View inner tables

There are two ways to create a table populated from a Materialized View. This first, which we’ve used up to now, is like this:

CREATE MATERIALIZED VIEW QueriesPerSecond
(
    Date Date,
    DateTime DateTime,
    NodeID UInt16,
    QueryCount UInt32,
    QueryDOCount UInt32,
)
ENGINE = SummingMergeTree()
PARTITION BY toYearWeek(Date)
ORDER BY (Date, DateTime, NodeID)
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryDO AS UInt32) AS QueryDOCount,
FROM QueryResponse
WHERE QueryResponseHasQuery;

When records are added to table QueryResponse, the SELECT runs and the results added to QueriesPerSecond.

Well, almost. If you SHOW TABLES, you’ll see there’s now QueriesPerSecond and another table named .inner.QueriesPerSecond. The latter table is where the records actually end up.

How to refer to that inner table, a digression…

You need a little trick to refer to the inner table from SQL. You’ll need to enclose the table name in back-ticks. This also means that you can’t refer to inner tables in a different database to the current.

So to verify that the inner table is indeed the table containing the data for the view, SELECT * FROM `.inner.QueriesPerSecond` LIMIT 10.

… back to Materialized Views

All this is fine and dandy. But what if you want to, say, rename a column in QueryResponse? How do you change the SELECT associated with the Materialized View? Well, you can drop and re-create the Materialized View, but if you drop the view then the inner table also gets dropped and you lose all the existing data. What you need to do instead is to find out where the definition is saved on the server:

:) select metadata_path from system.tables where name='QueriesPerSecond'
+-metadata_path--------------------------------------------+
| /var/lib/clickhouse/metadata/wombat/QueriesPerSecond.sql |
+----------------------------------------------------------+
:) detach table QueriesPerSecond

Head to the server, edit that .sql file to your taste, and reattach the table:

:) attach table QueriesPerSecond

So, you can do it, but it’s not great. You can’t do it from SQL, and you need to log into the server and directly edit files in the ClickHouse working space.

Materialized View without inner tables

Here’s what we’ll be doing from now on. It turns out you can create a normal table to hold the results of the materialized view updates:

CREATE TABLE QueriesPerSecond
(
    Date Date,
    DateTime DateTime,
    NodeID UInt16,
    QueryCount UInt32,
    QueryDOCount UInt32,
)
ENGINE = SummingMergeTree()
PARTITION BY toYearWeek(Date)
ORDER BY (Date, DateTime, NodeID);

CREATE MATERIALIZED VIEW QueriesPerSecondMV
TO QueriesPerSecond
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryDO AS UInt32) AS QueryDOCount,
FROM QueryResponse
WHERE QueryResponseHasQuery;

To change the view SELECT, drop QueriesPerSecondMV and re-create it. Unlike the materialized view with the inner table we saw earlier, this won’t delete the underlying table. So now we can modify the materialized view query from SQL, rather than having to monkey with files on the server.

Convert from inner table Materialized View to a separate table Materialized View

So, can you convert from the inner table to a separate table Materialized View? Yes.

:) DETACH TABLE QueriesPerSecond
:) RENAME TABLE `.inner.QueriesPerSecond` TO QueriesPerSecond
:) CREATE MATERIALIZED VIEW QueriesPerSecondMV
TO QueriesPerSecond
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryDO AS UInt32) AS QueryDOCount,
FROM QueryResponse
WHERE QueryResponseHasQuery;

Renaming the old inner table to the previous name of the view plus table overwrites any old definition of that view, and we have a pure data table. Create a new materialized view to populate it, and we’re done.