Learning Emoji Representations from Observed Usage

Nowadays it is hard to imagine daily communication without emojis like 😄, 😬 or ❤️. These cute pictograms are not only ideal for expressing emotions, they are also standardized, which makes them well suited for analysis. However, before classical analyses such as clustering can be performed, a numerical representation is required. A simple method would be one-hot encoding, an obvious choice given the very limited vocabulary (there are only a few thousand emojis)....

November 5, 2019 · 3 min · Christian Hotz-Behofsits

Bigquery & Embeddings

One can argue about whether it is wise to store embeddings directly in BigQuery or to calculate the similarities in SQL. In some cases, a library (e.g. gensim) or an approximation (e.g. Facebook's faiss) is certainly more appropriate. However, in our setting we wanted to use BigQuery. Therefore, arrays are used to store the word vectors, and I created SQL functions to calculate pairwise cosine similarities....
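To make the computation concrete, here is a minimal Python sketch of pairwise cosine similarity over vectors stored as plain arrays. It is not the SQL from the post; the function names `cosine_similarity` and `pairwise_cosine` and the toy vectors are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_cosine(vectors):
    """Similarity for every unordered pair of labels (hypothetical helper)."""
    labels = sorted(vectors)
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }


# Toy word vectors, made up for this example.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "kitten": [2.0, 0.0],
}
similarities = pairwise_cosine(toy_vectors)
```

Each unordered pair is computed only once because cosine similarity is symmetric. This halves the work, which matters once the vocabulary grows.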

October 27, 2019 · 3 min · Christian Hotz-Behofsits

Recommender Systems in R

A former WU member, Michael Hahsler, created a really nice package called recommenderlab, which allows you to build collaborative filtering systems. But before you can use it, you have to install all required packages: install.packages(c("recommenderlab", "dplyr", "readr")) … and load them: library(recommenderlab); library(tibble); library(dplyr); library(readr). I asked my students to answer some questions. One question was about their favourite TV series, which is a good use case for a recommender system (Netflix, for example, does pretty much the same)....

October 26, 2019 · 4 min · Christian Hotz-Behofsits

Using Embeddings in R

Different file formats exist for storing distributed vector or word representations, also known as embeddings. One of the most convenient is the text format used by the original word2vec implementation. In this format, each row starts with a label (an item of the vocabulary) followed by the vector components, and the fields are separated by ordinary spaces. The following function processes such *.vec files and can be used to load them directly into R:...
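The R function itself is cut off in this excerpt. Purely to illustrate the row layout described above, here is a minimal Python sketch (not the post's R code). The name `parse_vec_lines` and the `skip_header` flag are assumptions. The flag covers files that begin with a header row holding the vocabulary size and vector dimension, which the word2vec text format can include.

```python
def parse_vec_lines(lines, skip_header=False):
    """Parse rows of the form 'label c1 c2 ... cn' into a dict of float lists."""
    embeddings = {}
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue  # assumed header row: "<vocabulary size> <dimension>"
        # Split on single spaces and drop empty fields from trailing spaces.
        fields = [field for field in line.rstrip("\n").split(" ") if field]
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        label, *components = fields
        embeddings[label] = [float(value) for value in components]
    return embeddings


# Made-up rows in the described format, used only for demonstration.
sample_rows = ["king 0.1 0.2\n", "queen 0.3 0.4 \n"]
vectors = parse_vec_lines(sample_rows)
```

For a real file, the same function can be fed an open file handle, for example `parse_vec_lines(open(path, encoding="utf-8"))`, where `path` is whatever file you want to load.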

August 8, 2019 · 3 min · Christian Hotz-Behofsits

Migrate Data from S3 to Google Cloud

I noticed that Google Cloud Storage is a bit cheaper than Amazon S3. Although the difference does not seem like much, it can add up if you have some big data sets or huge backups stored in Amazon's storage solution. This tutorial shows how you can migrate from/to Google Cloud Storage. But keep in mind that you will have to pay for the file transfer! You can do that locally or in your Cloud Shell....

October 2, 2017 · 2 min · Christian Hotz-Behofsits