SQL Multi-Substring Search in Yandex Clickhouse

Nowadays, also analytical databases are more frequently used to search for substrings or patterns in character strings. Thus, most databases provide a like operator or more sophisticated regex functions. However, especially long regex patterns or many of them can be challenging. In a recent project, I had to search for not one, but for 150 substrings at the same time. Thankfully, my database of choice (Yandex Clickhouse) offers several possibilities to handle this:...

January 16, 2021 · 3 min · Christian Hotz-Behofsits

Bigquery: Least but not NULL

Let’s start with a simple SQL statement that returns the smallest value from each of several columns (similar to min only not across observations, but columns): SELECT LEAST(date1, date2, date3) as min_date FROM tbl In BigQuery it works nicely till you encounter NULL values, then it will allways pick the NULL. Unfortunately it does not provide a IGNORE NULLS option such as ARRAY_AGG or others do. However, similar behaviour can be archieved with the following UDF:...

January 14, 2021 · 1 min · Christian Hotz-Behofsits

Interpolate missing values in SQL

In this I demonstrate how missing values can be interpolated in SQL. The code is written for BigQuery and I provide a user defined function, however, similar solutions may exist for other databases. Although, SQL may not be the first choice to do such tasks, it is possible. Before one can replace missing values, you have to have missing values. At least in my applications, often the whole rows are missing and not just selective fields....

May 24, 2020 · 2 min · Christian Hotz-Behofsits

Bigquery & Embeddings

One can argue if it is wise to store embeddings directly in bigquery or calculate the similarities in SQL. For sure, in some cases a library (e.g. gensim) or approximations (e.g. Facebook faiss) are more appropriate. However, in our setting we wanted to use BigQuery. Therefore, arrays are used to store the word vectors and I created SQL functions to calculate pairwise cosine similarities. .notice{padding:18px;line-height:24px;margin-bottom:24px;border-radius:4px;color:#444;background:#e7f2fa}.notice p:last-child{margin-bottom:0}.notice-title{margin:-18px -18px 12px;padding:4px 18px;border-radius:4px 4px 0 0;font-weight:700;color:#fff;background:#6ab0de}....

October 27, 2019 · 3 min · Christian Hotz-Behofsits