Bigquery: Least but not NULL

Let’s start with a simple SQL statement that returns the smallest value from each of several columns (similar to min only not across observations, but columns): SELECT LEAST(date1, date2, date3) as min_date FROM tbl In BigQuery it works nicely till you encounter NULL values, then it will allways pick the NULL. Unfortunately it does not provide a IGNORE NULLS option such as ARRAY_AGG or others do. However, similar behaviour can be archieved with the following UDF:...

January 14, 2021 · 1 min · Christian Hotz-Behofsits

Merge large CSV files with headers

If you have ever worked with classic data warehousing tools, you may already know the problem: The CSV export splits the output into different files, often with one file containing no more than 1 million observations. However, for analysis purposes, this format is not always optimal and must be merged into a single file. In bash this is not a big problem with cat, but only if there is no header present....

January 9, 2021 · 1 min · Christian Hotz-Behofsits

Interpolate missing values in SQL

In this I demonstrate how missing values can be interpolated in SQL. The code is written for BigQuery and I provide a user defined function, however, similar solutions may exist for other databases. Although, SQL may not be the first choice to do such tasks, it is possible. Before one can replace missing values, you have to have missing values. At least in my applications, often the whole rows are missing and not just selective fields....

May 24, 2020 · 2 min · Christian Hotz-Behofsits

Bigquery & Embeddings

One can argue if it is wise to store embeddings directly in bigquery or calculate the similarities in SQL. For sure, in some cases a library (e.g. gensim) or approximations (e.g. Facebook faiss) are more appropriate. However, in our setting we wanted to use BigQuery. Therefore, arrays are used to store the word vectors and I created SQL functions to calculate pairwise cosine similarities. .notice{padding:18px;line-height:24px;margin-bottom:24px;border-radius:4px;color:#444;background:#e7f2fa}.notice p:last-child{margin-bottom:0}.notice-title{margin:-18px -18px 12px;padding:4px 18px;border-radius:4px 4px 0 0;font-weight:700;color:#fff;background:#6ab0de}....

October 27, 2019 · 3 min · Christian Hotz-Behofsits