I am using crontab to trigger a PHP scraping script on a regular basis, but the script gets called multiple times.

Some research revealed that other people had the same issue (sp1, so2), but none of the suggested “solutions” worked for me:

  • Some mentioned that this can happen if multiple crond services are running in parallel. (I ran sudo service crond restart to stop and restart the service, but that did not solve the problem.)
  • One hotfix is to use flock or a similar approach to prevent overlapping executions, but I would like to find the cause rather than fight the symptoms.
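
For reference, the flock workaround mentioned above could look roughly like the sketch below. It assumes the flock utility from util-linux is installed; the lock file path and the run_exclusive helper are made up for this demo and are not part of my setup.

```shell
#!/bin/sh
# Hypothetical lock file path, chosen only for this demo.
lock=/tmp/scraping-demo.lock

# Run the given command only if nobody else holds the lock.
# -n makes flock give up at once instead of waiting for the lock.
run_exclusive() {
    flock -n "$lock" "$@"
}

# The outer call takes the lock and, while holding it, tries to start a
# second run. The inner flock cannot get the lock, so 'inner run' is never
# printed and the fallback message is shown instead.
run_exclusive sh -c "flock -n '$lock' echo 'inner run' || echo 'skipped: already running'"
```

In a crontab the same idea would mean putting flock -n and a lock file in front of the actual command. That only hides overlapping runs, it does not explain why they are started.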

To run this script I created a new user (incl. group) named scraping with the appropriate permissions and edited that user's crontab:

sudo crontab -u scraping -e

The script should be called once a day (at 9am), so I added the following line:

* 09 * * * /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log

Shortly before 10am, however, ps shows many instances of the job running at the same time:

chris@linux-server:~$ ps aux | grep "/bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log"
scraping 22779  0.0  0.0   4476     0 ?        Ss   09:42   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22792  0.0  0.0   4476     0 ?        Ss   09:43   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22810  0.0  0.0   4476     0 ?        Ss   09:44   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22828  0.0  0.0   4476     0 ?        Ss   09:45   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22847  0.0  0.0   4476     0 ?        Ss   09:46   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22862  0.0  0.0   4476     0 ?        Ss   09:47   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22881  0.0  0.0   4476     0 ?        Ss   09:48   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22895  0.0  0.0   4476     0 ?        Ss   09:49   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22913  0.0  0.0   4476     0 ?        Ss   09:50   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22930  0.0  0.0   4476     0 ?        Ss   09:51   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 22943  0.0  0.0   4476     0 ?        Ss   09:52   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log
scraping 23013  0.0  0.0   4476     0 ?        Ss   09:54   0:00 /bin/sh -c /usr/bin/hhvm /var/WebScraping/index.php > /var/log/scraping_stdout.log 2> /var/log/scraping_stderr.log

So there are some strange things going on:

  • Why was none of the processes started at exactly 09:00am?
  • Why are the process start times roughly one minute apart?

I also checked whether multiple cron instances are running, and this is indeed the case, although there should be only one crond (I restarted the service manually just a few hours earlier).

chris@linux-server:~$ ps -e | grep cron
20710 ?        00:00:00 cron
22774 ?        00:00:00 cron
22790 ?        00:00:00 cron
22806 ?        00:00:00 cron
22824 ?        00:00:00 cron
22844 ?        00:00:00 cron
22860 ?        00:00:00 cron
22877 ?        00:00:00 cron
22893 ?        00:00:00 cron
22909 ?        00:00:00 cron
22925 ?        00:00:00 cron
22940 ?        00:00:00 cron
23011 ?        00:00:00 cron

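Further investigation revealed that the cronjob was triggered 60 times (once every minute from 09:00 to 09:59). So the problem was the first * in the crontab: that field is the minute, and * matches every minute of the hour. This also explains the one-minute spacing of the start times and the additional cron processes, since cron forks a child for every job it starts. With 0 09 * * * the job runs only once, at 09:00.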
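
To double-check the effect of the minute field, here is a small sketch that counts how often a minute/hour pair matches within one day. It is a simplified model, not how cron is implemented: a field is either * or a single number, and the helper names field_matches and runs_per_day are made up for illustration.

```shell
#!/bin/sh
# Simplified model: a field is either "*" (any value) or a single number.
# Real cron also accepts lists, ranges and steps, which are ignored here.
field_matches() {
    # $1 = field from the crontab line, $2 = current clock value
    [ "$1" = "*" ] || [ "$1" -eq "$2" ]
}

# Count the minutes of one day on which the "minute hour" pair matches.
runs_per_day() {
    # $1 = minute field, $2 = hour field
    count=0
    hour=0
    while [ "$hour" -lt 24 ]; do
        minute=0
        while [ "$minute" -lt 60 ]; do
            if field_matches "$1" "$minute" && field_matches "$2" "$hour"; then
                count=$((count + 1))
            fi
            minute=$((minute + 1))
        done
        hour=$((hour + 1))
    done
    echo "$count"
}

runs_per_day '*' 09   # minute and hour fields from my crontab line: prints 60
runs_per_day 0 09     # minute pinned to 0: prints 1
```

The * has to be quoted on the command line so the shell does not expand it to file names. Inside a crontab no quoting is needed.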