If you have to analyse a lot of data, you need a good backend that lets you fetch and store your results piecewise in small chunks: try MonetDB(Lite). I often used CSV or TSV files because they are convenient, but from time to time you have to do some filtering, joins or other data operations. So I looked for a nice database, and MonetDB seems to have some nice properties...
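
A minimal sketch of that chunked workflow with MonetDBLite and DBI; the directory, table name and the iris chunks are placeholders rather than anything from the actual setup:

```r
library(DBI)

# embedded MonetDB, persisted in a local directory (placeholder path)
con <- dbConnect(MonetDBLite::MonetDBLite(), "/tmp/monetdb")

# store results piecewise: create the table with the first chunk,
# then append the rest
chunks <- split(iris, iris$Species)
dbWriteTable(con, "results", chunks[[1]])
for (chunk in chunks[-1]) {
  dbWriteTable(con, "results", chunk, append = TRUE)
}

# filtering, joins and aggregation run inside the database, not in R
dbGetQuery(con, 'SELECT "Species", COUNT(*) AS n FROM results GROUP BY "Species"')

dbDisconnect(con, shutdown = TRUE)
```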

Sadly, installing Julia is not always simple. Although there is a precompiled Julia package in the repositories, it is badly outdated. In this "recipe", I show how to install the latest Julia version on Debian Jessie.

A former WU member, Michael Hahsler, created a really nice package called recommenderlab, which allows you to build collaborative filtering systems.
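
Basic usage looks roughly like this, following the MovieLense example that ships with the package:

```r
library(recommenderlab)

data(MovieLense)  # user/movie rating matrix bundled with recommenderlab

# user-based collaborative filtering, trained on the first 900 users
rec <- Recommender(MovieLense[1:900], method = "UBCF")

# top-5 recommendations for five users the model has not seen
pred <- predict(rec, MovieLense[901:905], n = 5)
as(pred, "list")
```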

Wrangling data can be a real pain, especially if you work with unstructured or semi-structured data. I had to convert a JSON object into a data frame in R, which should have been easy using jsonlite, but it was not. So I used tidyjson instead, and I really like it.
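
A tiny sketch of the tidyjson approach; the JSON below is made up, the object in the post was more deeply nested:

```r
library(tidyjson)
library(dplyr)

json <- '[{"name": "a", "score": 1}, {"name": "b", "score": 2}]'

json %>%
  gather_array() %>%            # one row per array element
  spread_values(                # pull fields out into columns
    name  = jstring("name"),
    score = jnumber("score")
  )
```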

I noticed that Google Cloud Storage is a bit cheaper than Amazon S3. Although the difference seems small, it can add up if you have some big data sets or huge backups stored in Amazon's storage solution. This tutorial shows how you can migrate from/to Google Cloud Storage. But keep in mind that you will have to pay for the file transfer!

I am using S3 for my database backups because it is cheap and has worked quite well, but the database got bigger and bigger. So I tried to optimise the process, with success.

MySQL/MariaDB are popular databases that are often used for data scraping because they can handle a lot of INSERTs in a short time. In this post I show how to add a new user and set the permissions to access a database.
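
The post itself is about the SQL statements; as a sketch, the same commands can be issued from R via DBI/RMariaDB (host, user, password and database names are placeholders):

```r
library(DBI)

# connect as an administrative user first
con <- dbConnect(RMariaDB::MariaDB(), host = "localhost",
                 username = "root", password = "xxx")

# create a new user and grant it only what a scraper needs
dbExecute(con, "CREATE USER 'scraper'@'localhost' IDENTIFIED BY 'secret'")
dbExecute(con, "GRANT SELECT, INSERT ON scraping.* TO 'scraper'@'localhost'")
dbExecute(con, "FLUSH PRIVILEGES")

dbDisconnect(con)
```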

PewDiePie announced a YouTube break just a few moments ago, but he is still one of the best-known human brands on YouTube. I tried to visualise the sentiment development of his video comments.
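
A minimal sketch of the scoring step with syuzhet; the comments below are invented, the real ones come from the YouTube API:

```r
library(syuzhet)

comments <- c("best video ever", "this is so boring", "I love his channel")

# one sentiment score per comment, based on the Bing lexicon
scores <- get_sentiment(comments, method = "bing")

# the cumulative score over the comment sequence shows the development
plot(cumsum(scores), type = "l",
     xlab = "comment", ylab = "cumulative sentiment")
```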

Apache Drill is a really easy-to-use, Dremel-based big-data analysis tool. It is perfect if you have a lot of static data (read-only workloads) and want to use SQL. And best of all: it does not require Hadoop/HDFS :)
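
The sergeant package lets you talk to Drill from R; a rough sketch, assuming a drillbit on localhost and a placeholder CSV path:

```r
library(sergeant)

# connect via Drill's REST interface
dc <- drill_connection("localhost")

# plain SQL over a flat file, no Hadoop/HDFS involved
drill_query(dc, "SELECT * FROM dfs.`/tmp/data/results.csv` LIMIT 10")
```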

JAGS (Just Another Gibbs Sampler) is a really nice piece of software, but if you want to compute long Markov chains it takes some time. I tried to optimise this process by using different versions, including some self-compiled binaries.
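
A small rjags example with a toy model, just to show what running such a chain looks like:

```r
library(rjags)

# toy model: estimate mean and precision of a normal sample
model_string <- "
model {
  for (i in 1:N) {
    x[i] ~ dnorm(mu, tau)
  }
  mu  ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.01, 0.01)
}"

set.seed(42)
dat <- list(x = rnorm(100, mean = 5, sd = 2), N = 100)

jm      <- jags.model(textConnection(model_string), data = dat, n.chains = 2)
samples <- coda.samples(jm, variable.names = c("mu", "tau"), n.iter = 5000)
summary(samples)
```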

I have to monitor some crontabs, which support email notifications on errors. I don't want to manage and maintain another mail server, so I installed Postfix and configured it as a relay. The system is thus able to send mail, but another mail server handles the delivery. The new Postfix setup accepts only local SMTP connections, and port 25 is not accessible from the outside.

Some files include a byte order mark (BOM), which can be quite annoying. In this post I explain how to remove it.
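
One way to detect and strip the BOM from within R; the file name is a placeholder:

```r
# the UTF-8 BOM is the byte sequence EF BB BF at the very start of a file
first_bytes <- readBin("data.csv", what = "raw", n = 3)
identical(first_bytes, as.raw(c(0xef, 0xbb, 0xbf)))

# read.csv can strip it on the fly (R >= 3.0.0)
df <- read.csv("data.csv", fileEncoding = "UTF-8-BOM")
```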

I was not able to install the R package FSelector because a dependency could not be installed properly.
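
For what it's worth, a guess at the usual culprit: FSelector pulls in RWeka, which needs a working rJava setup, so installing rJava on its own often surfaces the real error:

```r
# assumption: the failing dependency is the RWeka/rJava chain
install.packages("rJava")      # shows Java configuration problems directly
install.packages("FSelector")
```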