A former WU member, Michael Hahsler, created a really nice package called recommenderlab, which allows you to build collaborative filtering systems.

Wrangling data can be a real pain, especially if you work with unstructured or semi-structured data. I had to convert a JSON object into a data frame in R, which should be easy using jsonlite, but it was not. So I used tidyjson, and I really like it.
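The general idea of turning nested JSON into flat rows can be sketched outside of R as well. The following is a minimal Python illustration using only the standard library. It is not how tidyjson works internally, and the function names `flatten` and `json_to_rows` as well as the sample data are made up for this example.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys.
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def json_to_rows(text):
    """Parse a JSON string and return a list of flat row dicts."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return [flatten(record) for record in records]


# Hypothetical sample input, not taken from the original post.
sample = (
    '[{"id": 1, "user": {"name": "a", "city": "Vienna"}},'
    ' {"id": 2, "user": {"name": "b"}}]'
)
rows = json_to_rows(sample)
```

Each dict in `rows` corresponds to one row, and keys that are missing in a record (here `user.city` in the second one) would become missing values once the rows are loaded into a table.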

I noticed that Google Cloud Storage is a bit cheaper than Amazon S3. Although the difference does not seem like much, it can add up if you have big data sets or huge backups stored in Amazon's storage solution. This tutorial shows how you can migrate from/to Google Cloud Storage. But keep in mind that you will have to pay for the file transfer!

I am using S3 for my database backups because it is cheap and has worked quite well, but the database got bigger and bigger. So I tried to optimize the process, with success.

MySQL and MariaDB are popular databases that are often used for data scraping because they can handle a lot of INSERTs in a short time. In this post I will show you how to add a new user and set the permissions to access a database.

PewDiePie announced just a short while ago that he is taking a break from YouTube, but he is still one of the best-known human brands on the platform. I tried to visualise how the sentiment of his video comments developed.

Apache Drill is a really easy-to-use, Dremel-based big data analysis tool, so it's perfect if you have a lot of static data (read-only workloads) and want to use SQL. And best of all: it does not require Hadoop/HDFS :)

Just Another Gibbs Sampler (JAGS) is a really nice piece of software, but calculating long Markov chains takes some time. I tried to optimise this process by using different versions, including some self-compiled binaries.

I have to monitor some cron jobs, and cron supports email notifications on errors. I don't want to manage and maintain another mail server, so I installed Postfix and configured it as a relay: the system is able to send mail, but delivery is handled by another mail server. The new Postfix setup accepts only local SMTP connections, and port 25 is not accessible from the outside.

Some files include a byte order mark (BOM), which can be quite annoying. In this post I explain how to remove it.
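As a rough illustration of the idea, here is a minimal Python sketch that strips a UTF-8 BOM from raw bytes. It only handles the UTF-8 case, the helper name `strip_bom` is made up for this example, and the approach in the post itself may differ.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        # The UTF-8 BOM is the three bytes EF BB BF.
        return data[len(codecs.BOM_UTF8):]
    return data


cleaned = strip_bom(b"\xef\xbb\xbfid,name\n1,a\n")
```

When reading text files in Python, opening them with `encoding="utf-8-sig"` has a similar effect, because that codec skips a leading BOM while decoding.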

I was not able to install the R-package FSelector, because a dependency could not be installed properly.

Some great people from the statmath institute created a WU-style LaTeX presentation theme. In this post I will explain how to use and extend it.

This tutorial shows how to set up an open proxy on Ubuntu. I am using such a setup for scraping purposes, and it works quite well.