If you have to analyse a lot of data, you need a good backend to fetch and store your results piecewise in small chunks. Try MonetDB(Lite). I often used CSV or TSV files because they are convenient. But from time to time you have to do some filtering, joins or other data operations. So I looked for a nice database, and MonetDB seems to have some nice properties...
Sadly, installing Julia is not always simple. Although there is a precompiled Julia package in the repositories, it is badly outdated. In this "recipe", I show how to install the latest Julia version on Debian Jessie.
Wrangling data can be a real pain, especially if you work with unstructured or semi-structured data. I had to convert a JSON object into a data frame in R, which should be easy using jsonlite, but it was not. So I used tidyjson, and I really like it.
In this tutorial you will learn how to build a basic "shitstorm" detection system for your Facebook page. A shitstorm is also known as a social media crisis or "getting flak on social media". Although the term is mostly used in German-speaking countries, I'll use it throughout this tutorial because it is really expressive.
I noticed that Google Cloud Storage is a bit cheaper than Amazon S3. Although the difference does not seem like much, it can add up if you have some big data sets or huge backups stored in Amazon's storage solution. This tutorial shows how you can migrate from/to Google Cloud Storage. But keep in mind that you will have to pay for the file transfer!
MySQL and MariaDB are popular databases that are often used for data scraping because they can handle a lot of INSERTs in a short time. In this post I will show you how to add a new user and set the permissions to access a database.
PewDiePie has just announced that he is taking a break from YouTube, but he is still one of the best-known human brands on the platform. I tried to visualise how the sentiment of his video comments has developed.
Apache Drill is a really easy-to-use, Dremel-based big data analysis tool. So it's perfect if you have a lot of static data (read-only workloads) and want to use SQL. And best of all: it does not require Hadoop/HDFS :)
Just Another Gibbs Sampler (JAGS) is a really nice piece of software, but if you want to calculate long Markov chains, it takes some time. I tried to optimise this process by using different versions, including some self-compiled binaries.
(September 15th, 2016)
I have to monitor some crontabs, and they support email notification on errors. I don't want to manage and maintain another mail server, so I installed Postfix and configured it as a relay. The system is able to send mail, but another mail server handles the delivery. The new Postfix setup accepts only local SMTP connections, and port 25 is not accessible from the outside.