Using a JSON-API in R

Wrangling data can be a real pain, especially if you work with unstructured or semi-structured data. I had to convert a JSON object into a data frame in R, which should be easy using jsonlite, but it was not. So I used tidyjson, and I really like it.

First of all, load all libraries (e.g., httr for interacting with websites, tidyjson to parse the response, and dplyr because you are a cool kid). You will notice the slightly modified command in the case of dplyr: suppressMessages hides the startup messages that dplyr prints when it is attached. If you want to use the newest package versions, you can use devtools to install them directly from the source code repository (e.g., devtools::install_github("jeremystan/tidyjson")).

library(httr)
library(tidyjson)
suppressMessages(library(dplyr))

Now we can load the data from the API (I had to modify the URL a bit, because it contains a private key). The paste0 command is used to concatenate the URL parts, and the content function extracts the message (also known as the HTTP body) from the response.

urlArtist <- paste0("http://yourkey.api3.nextbigsound.com/metrics/artist/","356",".json")
json <- content(GET(urlArtist), "text")
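
If you want to reuse this for more than one artist, the two lines above can be wrapped into small helpers. This is only a sketch: build_artist_url and fetch_json are names I made up for illustration, "yourkey" is still a placeholder, and the status check with stop_for_status is my addition, not something the API requires.

```r
# Hypothetical helper: builds the metrics URL for one artist.
# `api_key` and `artist_id` are placeholders, not real credentials.
build_artist_url <- function(api_key, artist_id) {
  paste0("http://", api_key, ".api3.nextbigsound.com/metrics/artist/",
         artist_id, ".json")
}

# Hypothetical helper: fetches the URL and returns the body as text.
# It stops with an error if the server answers with an HTTP error status.
fetch_json <- function(url) {
  response <- httr::GET(url)
  httr::stop_for_status(response)
  httr::content(response, as = "text", encoding = "UTF-8")
}

urlArtist <- build_artist_url("yourkey", "356")
urlArtist
# json <- fetch_json(urlArtist)  # needs a valid key and network access
```

Failing early on a bad status code saves you from feeding an error page into the JSON parser later on.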

The message (json variable) now contains a JSON string, which looks like this:

[
  {
    "Service": {
      "name": "Last.fm",
      "id": 2
    },
    "Profile": {
      "url": "http:\/\/www.last.fm\/music\/kanye+west",
      "id": 174990
    },
    "Metric": {
      "plays": {
        "17260": 193240062,
        "17261": 193277000,
        "17262": 193314868,
        "17263": 193361600,
        "17264": 193426101,
        "17265": 193463760,
        "17266": 193496738,
        "17267": 193528758
      },
      "fans": {
        "17260": 4141733,
        "17261": 4142021,
        "17262": 4142279,
        "17263": 4142410,
        "17264": 4142864,
        "17265": 4143160,
        "17266": 4143450,
        "17267": 4143715
      },
      "comments": [

      ]
    }
  }
]
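
If you don't have an API key, you can still play with the structure by pasting a shortened copy of the response into a string. The sketch below assumes a tidyjson version that provides gather_object (older releases may name it differently); sample_json and metric_types are just names I picked. It lists every key inside "Metric" together with its JSON type, which shows which entries are objects and which are arrays.

```r
library(tidyjson)
suppressMessages(library(dplyr))

# Shortened copy of the response, keeping the same nesting.
sample_json <- '[{
  "Service": {"name": "Last.fm", "id": 2},
  "Metric": {
    "plays": {"17260": 193240062, "17261": 193277000},
    "fans": {"17260": 4141733, "17261": 4142021},
    "comments": []
  }
}]'

# Unroll the outer array, step into "Metric", collect its keys
# in a column called "metric" and add a "type" column for each value.
metric_types <- sample_json %>%
  as.tbl_json %>% gather_array %>%
  enter_object("Metric") %>% gather_object("metric") %>%
  json_types()

metric_types
```

Looking at the type column first makes it easier to decide which entries to keep before going one level deeper.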

We want to transform it into a proper data frame. This can be done using tidyjson and dplyr.

music_metrics <- json %>% 
  # convert to a json table and unroll the array of 
  # service-objects (the square brackets in json)
  as.tbl_json %>% gather_array %>%
  # extract the service name (e.g., Last.fm) and the Profile url
  spread_values(
    service = jstring("Service", "name"), 
    url = jstring("Profile", "url")
  ) %>% 
  # now we enter the object with the key "Metric" and 
  # use the alias "metric" for the key-names (e.g., plays, fans)
  enter_object("Metric") %>% gather_object("metric") %>% 
  # the Metric object contains some objects (e.g., plays, fans), 
  # but also some empty arrays (e.g., comments)
  # we don't want empty arrays, so we output the type of each 
  # value and keep just the objects (using dplyr's filter function)
  json_types() %>% filter(type == "object") %>% 
  # now we want all keys of this object (e.g., 17266), because 
  # they represent the date. So we call them simply "date"
  gather_object("date") %>% 
  # Now we extract the value. The date is still in a strange format 
  # (day counts since the unix epoch)
  spread_values(value = jnumber()) %>%
  # to translate these to standard Unix timestamps in seconds, 
  # multiply the day count by 86,400
  # ... and define the reference day (01.01.1970)
  mutate(date = as.POSIXct(as.numeric(date) * 86400, origin = "1970-01-01")) %>%
  # tidyjson adds some additional columns. We don't need 
  # them anymore, so we drop them ...
  select(-type, -document.id, -array.index)

Yeah, now we have a data frame :)

music_metrics
   service url                                 metric date                 value
 1 Last.fm http://www.last.fm/music/kanye+west plays  1491264000.00 193240062.00
 2 Last.fm http://www.last.fm/music/kanye+west plays  1491350400.00 193277000.00
 3 Last.fm http://www.last.fm/music/kanye+west plays  1491436800.00 193314868.00
 4 Last.fm http://www.last.fm/music/kanye+west plays  1491523200.00 193361600.00
 5 Last.fm http://www.last.fm/music/kanye+west plays  1491609600.00 193426101.00
 6 Last.fm http://www.last.fm/music/kanye+west plays  1491696000.00 193463760.00
 7 Last.fm http://www.last.fm/music/kanye+west plays  1491782400.00 193496738.00
 8 Last.fm http://www.last.fm/music/kanye+west plays  1491868800.00 193528758.00
 9 Last.fm http://www.last.fm/music/kanye+west fans   1491264000.00   4141733.00
10 Last.fm http://www.last.fm/music/kanye+west fans   1491350400.00   4142021.00
11 Last.fm http://www.last.fm/music/kanye+west fans   1491436800.00   4142279.00
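
The date arithmetic is easy to check on its own. Here is a small base R sketch that takes the first and last day counts from the JSON and converts them in two ways. The variable names are mine, and pinning the time zone to UTC is my choice so that the printed result does not depend on the local setting.

```r
# Day counts as they appear as keys in the JSON (character strings).
day_keys <- c("17260", "17267")

# Option 1: treat them as days since 1970-01-01 and get Date objects.
as_dates <- as.Date(as.numeric(day_keys), origin = "1970-01-01")

# Option 2: convert to seconds first (one day = 86,400 seconds)
# and build date-times in UTC.
as_times <- as.POSIXct(as.numeric(day_keys) * 86400,
                       origin = "1970-01-01", tz = "UTC")

format(as_dates)              # "2017-04-04" "2017-04-11"
format(as_times, "%Y-%m-%d")  # same two days
as.numeric(as_times)          # seconds since the epoch, e.g. 1491264000
```

If you only care about the day, as.Date is the shorter route. The POSIXct variant is handy when you want to mix these values with real timestamps.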