I had to implement a web scraping script that fetches data from a website at a fixed interval. The problem was that the data provider sometimes corrected values after publishing them, so the affected values in my scraping database had to be fetched again and updated as well.

We wanted to speed up a PHP web scraper. It uses a while loop to load all data sequentially from a CSV file. Performance was quite bad because the script blocks on every loop iteration while it waits for an HTTP response. I used the following approach to parallelise the execution with minimal rewriting effort.
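
One common way to get this kind of parallelism in PHP is the `curl_multi` API, which runs several transfers at once and only waits when none of them has data ready. The sketch below is an illustration under that assumption, not necessarily the approach used in the original script. `fetchAllParallel` and its parameters are hypothetical names.

```php
<?php
// Sketch: fetch several URLs concurrently with PHP's curl_multi API.
// fetchAllParallel() is a hypothetical helper, not part of the original scraper.
function fetchAllParallel(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    // Register one easy handle per URL, keeping the caller's array keys.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            // Sleep until there is activity instead of busy-looping.
            curl_multi_select($multi);
        }
    } while ($running && $status == CURLM_OK);

    // Read the per-transfer result codes from the multi handle.
    $ok = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        $ok[$key] = ($info['result'] === CURLE_OK);
    }

    // Collect the response bodies; failed transfers become null.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = !empty($ok[$key]) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($multi, $ch);
    }

    return $results;
}
```

With a helper like this, the sequential loop only has to collect its URLs into an array first and can then process the returned bodies as before. For long URL lists it is probably wise to call it in batches (for example via `array_chunk`) so the remote server is not hit with hundreds of simultaneous requests.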

HipHop was Facebook's attempt to make PHP faster. It translated PHP source code to C++, which was then compiled to a native binary, and that worked quite well. Now they are working on another (closely related) approach called HHVM (HipHop Virtual Machine), which is even nicer. I installed it on an Ubuntu VM and it runs like a charm.

Logging is an important part of debugging and monitoring applications. But which library should I use?