Animating a Day of Traffic

Animating a Day of Traffic

2019, Sep 20    
traffic_flow_gif.utf8.md

When you’re undertaking exploration, the process is never as clean, linear or as straightforward as your explanation after you’re done. In this post I’ll show you how I came from the cold to interface with a road traffic API and create an animated gif for a day of road traffic flow. We’ll come across a few stumbling blocks and rather than brush over these, I’ll show you how I identified the problems and worked past them.

Traffic API

The API is WebTris API and we’ll use the webTRISr R package to interface with it.

library(webTRISr)
library(dplyr)

webTRISr has a few simple commands to get meta data from the API.

webtris_areas()

The WebTris API covers the Strategic Road Network. This network is managed by Highways England and contains England’s motorways plus a few key A-Roads. Highways England break the network down into areas and issue contracts to manage these areas. There are different types of contract, but the DBFO area contacts are for a company to Design, Build, Finance and Operate the area.

head(webtris_sites())

Across the Strategic Road Network are traffic measuring sites. These sites are fixed points across the network with some equipment that count how many vehicles go past. They may also record speed, size and which lanes vehicles are travelling in. A few of these systems are MIDAS, ANPR and various induction loop schemes.

nrow(webtris_sites())
[1] 17690

Sites on a map

We can use the API to check where the measuring points are. Stack Overflow has this and this guide which helps explain mapping in R.

library(ggplot2)
library(ggmap)

Load the map meta data and sites

uk <- map_data(map = "world", region = "UK") 
sites <- webtris_sites()[, c("Id", "Description", "Latitude", "Longitude")]
colnames(sites) <- c("id", "description", "lat", "lon") # we'll use description and Id later when mapping flow to locations

and create the map.

ggplot() + 
  geom_polygon(data = uk, aes(x = long, y = lat, group = group)) +
  coord_map() +
  geom_point(data = sites, aes(x = lon, y = lat, fill = "red", alpha = 0.8), size = 1, shape = 21) +
  ggtitle("Traffic Monitoring Sites Across England") +
   theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks = element_blank(),
        panel.background = element_blank(),
        legend.position = "none")

This map shows all the monitoring points for England’s Strategic Road Network.

Flow reports

We can use the webTRISr interface to run traffic reports for a site.

testReport1 <- webtris_report(sites=c("7"), start_date="01-01-2017", end_date="01-03-2017", report_type="daily")
Fetched page 1 of approximately 10000 rows

We get a warning about how much data we’ve returned. Let’s check how many rows we have.

nrow(testReport1)
[1] 5280

We can try another report to see if we should care about the warning.

testReport2 <- webtris_report(sites=c("7"), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')
Fetched page 1 of approximately 10000 rows
nrow(testReport2)
[1] 96

We’ve returned much less than testReport1, but we still got a warning. If the warning was to say we didn’t get all the data, I’d expect testReport2 to have at least 5280 rows.

Let’s press on and look at a report.

head(testReport2)

It was easy to plot all the sites, can we use this report to plot something more interesting like vehicle flow?

Calculate flow

This paper from the TRL gives an in-depth introduction to traffic and flow modelling. We basically need to count how many vehicles passed a measuring site in a time.

Each report returns results over 15 minute intervals for one site. I’m not interested in vehicle size or average speed, so we can discard those columns.

reportCols <- c("Site Name", "Report Date", "Time Period Ending", "Time Interval", "Total Volume")
testReport2[1:10, reportCols]

We will be able to work out flow by dividing Total Volume / 15 minutes to get vehicles/minute.

More flow reports

Now we’ve seen how to get the flow report for one site, let’s try and get the entire UK. First check that we can get a day of data across a few sites.

siteIds <- unlist(sites["id"])

Try 30 sites.

testAllUKDailyReport <- webtris_report(sites=c(siteIds[1:30]), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')[, reportCols]
Fetched page 1 of approximately 10000 rows
testAllUKDailyReport$flow <- testAllUKDailyReport[, "Total Volume"] / 15
 

unique(testAllUKDailyReport[, "Site Name"])
 [1] "A14/1908A" "A1M/2259B" "A1M/9869K" "A2/8392M"  "M1/2413B"  "M1/2771A"  "M1/4846A" 
 [8] "M1/4890A"  "M2/8528A"  "M25/4490B" "M25/4876A" "M25/5135B" "M25/5764B" "M27/9273A"
[15] "M27/9387A" "M3/2039B"  "M3/2165M"  "M3/2173A"  "M4/3479A"  "M5/7138M"  "M5/7482B" 
[22] "M6/5838B"  "M6/6988B"  "M62/2099A" "M62/2328B"

That worked. Let’s try it with 50 sites.

testAllUKDailyReport <- webtris_report(sites=c(siteIds[1:50]), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')
Fetched page 1 of approximately 10000 rows
 

unique(testAllUKDailyReport[, "Site Name"])
Error: Can't find column `Site Name` in `.data`.

That didn’t work. Perhaps there was some bad data? Let’s just look at sites 30 to 50.

testAllUKDailyReport <- webtris_report(sites=c(siteIds[30:50]), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')
Fetched page 1 of approximately 10000 rows
 

unique(testAllUKDailyReport[, "Site Name"])
 [1] "A1M/9468J"  "M1/2607J"   "M1/2717J"   "M1/3949L"   "M1/4085B"   "M1/5157B"   "M1/5163J"  
 [8] "M2/8528A"   "M20/6568B1" "M25/4101L"  "M25/4116A"  "M25/4465A"  "M25/5165M"  "M42/6168A" 
[15] "M5/9061A"   "M54/3154A"  "M56/8150A"  "M6/5962B"   "M6/7412B"   "M621/1133A"

This worked. It suggests to me that the API is rate limited. Let’s build our results covering England in chunks of 30.

chunkSize <- 29
siteBatch <- chunkSize
day <- "27-05-2019"

allUKDailyReport <- webtris_report(sites=c(siteIds[1:chunkSize]), start_date=day, end_date=day, report_type ='daily')[, reportCols]

 
while (siteBatch <= nrow(sites)){
  reportChunk <- webtris_report(sites=c(siteIds[(siteBatch + 1):(siteBatch + chunkSize)]), 
                                start_date=day, end_date=day, report_type ='daily')[, reportCols]

  allUKDailyReport <- rbind(allUKDailyReport, reportChunk)
  siteBatch <- siteBatch + chunkSize
}
Error: parse error: premature EOF
                                       
                     (right here) ------^

I’m feeling impatient and don’t want to look into this premature end of file error.

Map flow to location

We can get the location of a report by mapping Site Name from webtirs_report() to the description in webtris_site().

head(sites[sites$description == "A1M/9869K", ])

We then process our results for plotting.

allUKDailyLocations <- inner_join(sites, allUKDailyReport, by = c("description" = "Site Name"))
allUKDailyLocations$vehmin <- allUKDailyLocations[, "Total Volume"] / 15

colnames(allUKDailyLocations) <- c("id", "description", "lat", "lon", "reportDate", "timePeriodEnding", "timeInterval", "totalVolume", "vehmin")

Plot flow

I use this to plot vehicle flow during 15 minute intervals on a map. A smaller timeInterval is earlier in the day and a bigger one is later in the day.

library(gridExtra)
MapPlotter <- function(timeInterval){
  ggplot() +
    geom_polygon(data = uk, aes(x = long, y = lat, group = group), fill = "grey") +
    coord_map() +
    geom_point(data = allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, ],
               aes(x = lon, y = lat, fill = vehmin), size = 1, shape = 21, alpha = 0.5) +
    xlab(paste("Time of day", 
                  allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, "timePeriodEnding"])) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          axis.ticks = element_blank(),
          panel.background = element_blank()) +
    # anchor the colour on max flow - so all charts have a similar scale
    scale_fill_gradient(limits = c(0, max(allUKDailyLocations$vehmin, na.rm = T)),
                        low = "deepskyblue", high = "grey15", name = "Flow")
}

I should say, I’m only plotting these maps in a grid to show you a few time intervals at once. When I did this for my own benefit, I looked at one fullscreen map at a time.

grid.arrange(MapPlotter(2), MapPlotter(45), MapPlotter(60), MapPlotter(90))

After trying a few different values for timeInterval, the flow doesn’t seem to change. Have we loaded the data correctly or is there something else at play? Perhaps the data for time interval doesn’t change, or a measuring site isn’t reporting data. Maybe we made a false assumption about how the reports work.

Let’s investigate.

Check data quality

Perhaps this is because of the end of file error we saw earlier. First I check how many rows a few of the sites give us.

table(allUKDailyLocations[!is.na(allUKDailyLocations[ ,"vehmin"]) , "id"])[1:30]

 2  3  5  7  8 13 14 15 17 18 19 20 21 23 24 25 26 28 30 31 32 33 34 35 36 38 41 43 44 46 
96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 94 96 96 96 96 96 96 96 

Most of these sites have 96 rows. We can confirm how many time intervals are in the data set.

unique(allUKDailyLocations[, "timeInterval"])
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[31] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
[61] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
[91] 90 91 92 93 94 95

We have 96 time intervals. Which sites have less than 96 records?

tab <- table(allUKDailyLocations[!is.na(allUKDailyLocations[ ,"vehmin"]) , "id"])

tab[which(tab < 96)]

   34    47   201   256   461   547  1009  1017  1499  1876  2051  2193  2427  2829  3323 
   94    89    74    94     1    92     1    94    79    90    84    42    94    93    74 
 3361  3402  3437  3457  3656  3658  3918  4214  4659  5153  5290  5433  5474  5689  5768 
   95    95     1    95    95     1    86    93    90     1    95    88     1    94    94 
 5849  5898  5912  6246  6327  6340  6341  6351  6352  6491  6498  6499  6506  6514  6648 
   93     1    93    93    92    30    30    92    92    68    68    68    68    68     8 
 6649  6705  6706  6708  6709  6710  6881  6882  7010  7011  7146  7148  7208  7215  7218 
    7    92    92    30     4     4    92    92    34    34    34    34    68    30    68 
 7315  7316  8335  8336  8632  8786  8787  8812  8813  9408 10061 
   86    86    45    45    66    92    92    42    42    87    93 

About 20 sites have less than a full day of data (96 records), I don’t think the premature EOF error is our problem - if it was I would expect a large number of sites without full result sets (missing rows).

We can plot a few of the sites with a full set of data to make sure the flow changes over time.

ggplot(allUKDailyLocations[allUKDailyLocations[, "id"] %in% c(2,3,31,32,75,77), ]) +
  geom_line(aes(x = timeInterval, y = vehmin, colour = as.factor(id)))

The data from these sites change, but why doesn’t our map change? Let’s check change in flow (flowDelta) for all sites.

maxFlow <- allUKDailyLocations %>% group_by(id) %>% filter(vehmin == max(vehmin))
minFlow <- allUKDailyLocations %>% group_by(id) %>% filter(vehmin == min(vehmin))
# vehmin is vehicles a minute
flowDelta <- inner_join(maxFlow, minFlow, by = "id") %>% mutate(delta = vehmin.x - vehmin.y)
summary(flowDelta$delta)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   3.588   0.000 121.000 
ggplot(flowDelta) +
  geom_histogram(aes(x = delta))

This suggests most of our sites have no change in flow. It’s a highly aggregated view of our data and before taking it at face value I’d like to spot check a few sites that don’t report any flow.

unique(flowDelta[flowDelta$delta == 0, "description.x"])[1:10, ]
allUKDailyLocations[allUKDailyLocations$description == "M4/3838B5", ]

The structure of the report for the first site looks sensible, the only thing odd is no vehicle count. Perhaps the big spike showing most sites have no change in vehicle flow is because most of the sensors across the network are broken and not recording any traffic (or there are large parts of the Network with no vehicles passing through). Next, I look at the sites that have a change in flow, i.e sites that do have data.

ggplot(flowDelta[flowDelta$totalVolume.x > 0, ]) +
  geom_histogram(aes(x = delta))

I can make sense of this chart. It shows sites with vehicle flow and makes me question our first histogram that showed lots of sites with no change in vehicle flow. Aggregates can catch you out in silly ways so I’d like to double check and see how many unique sites had 0 flow.

count(unique(flowDelta[flowDelta$delta == 0, "description.x"]))

Huh, only 9 sites don’t have a change in vehicle flow at some time during the day. I was expecting a really high number. The histogram said 80,000 sites have no change in flow, why don’t we see this here? Are a few sites with no flow data skewing this branch of investigation? How many sites do have a change in flow.

count(unique(flowDelta[flowDelta$delta != 0, "description.x"]))

Yes, a few sites with 0 change in flow are skewing the results. There are 9 sites with no change in flow but 6421 with some change. When I filtered on the min and max flow, it must have returned all the rows for sites with 0 flow, matching all 96 time intervals for a site to give us `(96 * 95) * 9 = 82080’ ‘sites’ with no flow.

So why doesn’t our map change?

From the histogram earlier, most sites have a variation in flow between 0 - 40 vehicles a minute but the colour axis had to cover a much bigger range. What if we plot the flow on a log scale?

Plot log flow

LogMapPlotter <- function(timeInterval){
    ggplot() +
    geom_polygon(data = uk, aes(x = long, y = lat, group = group), fill = "grey") +
    coord_map() +
    geom_point(data = allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, ],
               aes(x = lon, y = lat, fill = log(vehmin)), size = 1, shape = 21, alpha = 0.5) +
    xlab(paste("Time of day", 
                  allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, "timePeriodEnding"])) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          axis.ticks = element_blank(),
          panel.background = element_blank()) +
    scale_fill_gradient(limits = c(0, max(log(allUKDailyLocations$vehmin), na.rm = T)),
                        low = "deepskyblue", high = "grey15", name = "Flow")
}
grid.arrange(LogMapPlotter(2), LogMapPlotter(45), LogMapPlotter(60), LogMapPlotter(90))

This looks better - we can see the flow at sites changing. We can use these images to make an animated gif.

Save images for the gif

To create the gif, we just need to play one image after the other. We’ll turn the ggplot objects to .png by saving them in a temporary location. We can then read and combine them into a gif.

We start by preparing the meta data, including a place to save the images.

imageStorePath <- "images\\gif_tmp\\"
plots <- list()
timeIntervals <- unique(allUKDailyLocations[, "timeInterval"])

We then loop through each time interval, create a plot and save a compressed version as a .png on disk. The images need to be compressed, otherwise it takes a few hours to create the gif.

for (t in timeIntervals){
  p <- LogMapPlotter(t)
  # https://stackoverflow.com/questions/45113597/error-unable-to-start-png-device?rq=1
  ggsave(filename = paste0(imageStorePath, sprintf('%02.f', t), ".png"), plot = p, device = "png", dpi = 150) 

  plots[[t + 1]] <- p
}
Saving 7 x 7 in image

Use the magik library

In a previous post - Growing a Lorenz Butterfly in R, I created a gif with a system call to ImageMagick. Whilst working on this post, I found this solution using r image magick which may be neater. The package is quite new and according to the vignette “…being a first release, documentation is still sparse.”

library(magick)
package 㤼㸱magick㤼㸲 was built under R version 3.6.1Linking to ImageMagick 6.9.9.14
Enabled features: cairo, freetype, fftw, ghostscript, lcms, pango, rsvg, webp
Disabled features: fontconfig, x11

Create a test gif

Before we loop through all the images and make the full gif, I’d like to test the process on a few images. Foreach image we created we need to:

  • Read it
  • Join it to our list
  • Animate the result
density <- 10 # default desnity = 72dpi

img1 <- image_read(paste0(imageStorePath, "\\00.png"), density = density)
img2 <- image_read(paste0(imageStorePath, "\\01.png"), density = density)
img3 <- image_read(paste0(imageStorePath, "\\02.png"), density = density)
imgGroup <- image_join(img1, img2, img3)
image_animate(imgGroup, fps = 1)

The gif appears in my R-Studio viewer, but doesn’t display in this notebook. I’ll try with an example from the documentation.

# Download earth gif and make it a bit smaller for vignette
earth <- image_read("https://jeroen.github.io/images/earth.gif") %>%
  image_scale("200x") %>%
  image_quantize(128)

length(earth)
[1] 44
earth

This doesn’t fare any better. So I’ll just embed the test gif manually for you to see.

You have to look closely, but the flow does change.

Create the gif

To make our gif for one day of traffic flow we simply have to repeat the process on all the images. This time, instead of manually reading each image, we check the image directory for all images we need,

files <- list.files(path = imageStorePath, pattern = "*.png", full.names = T)

read them all in,

images <- sapply(files, FUN = image_read)

join them all together

imageGroup <- image_join(images)

create the gif

startTime <- Sys.time()

gif <- image_animate(imageGroup, fps = 4)

endTime <- Sys.time()
endTime - startTime
Time difference of 8.65783 secs

and render it by writing the gif to disk.

startTime <- Sys.time()

image_write(gif, paste0(i