
Animating a Day of Traffic
When you’re exploring, the process is never as clean, linear or straightforward as the explanation you give once you’re done. In this post I’ll show you how I came to a road traffic API cold and created an animated gif of a day of road traffic flow. We’ll hit a few stumbling blocks along the way, and rather than brush over them, I’ll show you how I identified the problems and worked past them.
Traffic API
The API is the WebTRIS API, and we’ll use the webTRISr R package to interface with it.
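If you don’t already have the package, it’s on CRAN at the time of writing (worth double-checking if you’re reading this much later):
# install once; webTRISr wraps the WebTRIS HTTP API
install.packages("webTRISr")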
library(webTRISr)
library(dplyr)
webTRISr has a few simple commands to get metadata from the API.
webtris_areas()
The WebTRIS API covers the Strategic Road Network. This network is managed by Highways England and contains England’s motorways plus a few key A-Roads. Highways England break the network down into areas and issue contracts to manage these areas. There are different types of contract, but the DBFO area contracts are for a company to Design, Build, Finance and Operate the area.
head(webtris_sites())
Across the Strategic Road Network are traffic measuring sites. These sites are fixed points across the network with equipment that counts how many vehicles go past. They may also record speed, size and which lanes vehicles are travelling in. A few of these systems are MIDAS, ANPR and various induction loop schemes.
nrow(webtris_sites())
[1] 17690
Sites on a map
We can use the API to check where the measuring points are. Stack Overflow has a couple of guides that help explain mapping in R.
library(ggplot2)
library(ggmap)
Load the map metadata and the sites
uk <- map_data(map = "world", region = "UK")
sites <- webtris_sites()[, c("Id", "Description", "Latitude", "Longitude")]
colnames(sites) <- c("id", "description", "lat", "lon") # we'll use description and Id later when mapping flow to locations
and create the map.
ggplot() +
  geom_polygon(data = uk, aes(x = long, y = lat, group = group)) +
  coord_map() +
  # fill and alpha are constants, not mappings, so they belong outside aes()
  geom_point(data = sites, aes(x = lon, y = lat), fill = "red", alpha = 0.8, size = 1, shape = 21) +
  ggtitle("Traffic Monitoring Sites Across England") +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks = element_blank(),
        panel.background = element_blank(),
        legend.position = "none")
This map shows all the monitoring points for England’s Strategic Road Network.
Flow reports
We can use the webTRISr interface to run traffic reports for a site.
testReport1 <- webtris_report(sites=c("7"), start_date="01-01-2017", end_date="01-03-2017", report_type="daily")
Fetched page 1 of approximately 10000 rows
We get a warning about how much data we’ve returned. Let’s check how many rows we have.
nrow(testReport1)
[1] 5280
We can try another report to see if we should care about the warning.
testReport2 <- webtris_report(sites=c("7"), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')
Fetched page 1 of approximately 10000 rows
nrow(testReport2)
[1] 96
We’ve returned much less than testReport1, but we still got a warning. If the warning really meant we didn’t get all the data, I’d expect testReport2 to contain at least 5280 rows.
Let’s press on and look at a report.
head(testReport2)
It was easy to plot all the sites. Can we use this report to plot something more interesting, like vehicle flow?
Calculate flow
This paper from the TRL gives an in-depth introduction to traffic and flow modelling. We basically need to count how many vehicles passed a measuring site in a given period of time.
Each report returns results over 15 minute intervals for one site. I’m not interested in vehicle size or average speed, so we can discard those columns.
reportCols <- c("Site Name", "Report Date", "Time Period Ending", "Time Interval", "Total Volume")
testReport2[1:10, reportCols]
We can work out flow by dividing Total Volume by the 15-minute interval to get vehicles per minute.
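As a quick sketch on the report we already have (using the column names from reportCols above):
# each row covers a 15 minute interval, so dividing the count by 15
# gives vehicles per minute
testReport2$flow <- testReport2[["Total Volume"]] / 15
head(testReport2[, c("Time Interval", "Total Volume", "flow")])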
More flow reports
Now we’ve seen how to get the flow report for one site, let’s try and get the whole network. First, check that we can get a day of data across a few sites.
siteIds <- unlist(sites["id"])
Try 30 sites.
testAllUKDailyReport <- webtris_report(sites=c(siteIds[1:30]), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')[, reportCols]
Fetched page 1 of approximately 10000 rows
testAllUKDailyReport$flow <- testAllUKDailyReport[, "Total Volume"] / 15
unique(testAllUKDailyReport[, "Site Name"])
[1] "A14/1908A" "A1M/2259B" "A1M/9869K" "A2/8392M" "M1/2413B" "M1/2771A" "M1/4846A"
[8] "M1/4890A" "M2/8528A" "M25/4490B" "M25/4876A" "M25/5135B" "M25/5764B" "M27/9273A"
[15] "M27/9387A" "M3/2039B" "M3/2165M" "M3/2173A" "M4/3479A" "M5/7138M" "M5/7482B"
[22] "M6/5838B" "M6/6988B" "M62/2099A" "M62/2328B"
That worked. Let’s try it with 50 sites.
testAllUKDailyReport <- webtris_report(sites=c(siteIds[1:50]), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')
Fetched page 1 of approximately 10000 rows
unique(testAllUKDailyReport[, "Site Name"])
Error: Can't find column `Site Name` in `.data`.
That didn’t work. Perhaps there was some bad data? Let’s just look at sites 30 to 50.
testAllUKDailyReport <- webtris_report(sites=c(siteIds[30:50]), start_date="01-01-2017", end_date="01-01-2017", report_type ='daily')
Fetched page 1 of approximately 10000 rows
unique(testAllUKDailyReport[, "Site Name"])
[1] "A1M/9468J" "M1/2607J" "M1/2717J" "M1/3949L" "M1/4085B" "M1/5157B" "M1/5163J"
[8] "M2/8528A" "M20/6568B1" "M25/4101L" "M25/4116A" "M25/4465A" "M25/5165M" "M42/6168A"
[15] "M5/9061A" "M54/3154A" "M56/8150A" "M6/5962B" "M6/7412B" "M621/1133A"
This worked. It suggests to me that the API limits how many sites a single request can cover. Let’s build our results covering England in chunks of 30.
chunkSize <- 29
siteBatch <- chunkSize
day <- "27-05-2019"
allUKDailyReport <- webtris_report(sites = c(siteIds[1:chunkSize]), start_date = day, end_date = day, report_type = 'daily')[, reportCols]
while (siteBatch <= nrow(sites)){
  reportChunk <- webtris_report(sites = c(siteIds[(siteBatch + 1):(siteBatch + chunkSize)]),
                                start_date = day, end_date = day, report_type = 'daily')[, reportCols]
  allUKDailyReport <- rbind(allUKDailyReport, reportChunk)
  siteBatch <- siteBatch + chunkSize
}
Error: parse error: premature EOF
(right here) ------^
I’m feeling impatient and don’t want to look into this premature end of file error.
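If I had to guess without digging in, the last chunk indexes past the end of siteIds, so the request goes up with NA site IDs and the response breaks the JSON parser. A defensive version of the loop (a sketch I haven't run against the live API) would clamp the upper index and skip any chunk that fails:
# a sketch: clamp the index so the final chunk never runs past the
# end of siteIds, and skip chunks whose requests fail
allUKDailyReport <- NULL
siteBatch <- 0
while (siteBatch < length(siteIds)) {
  upper <- min(siteBatch + chunkSize, length(siteIds))
  chunk <- try(webtris_report(sites = siteIds[(siteBatch + 1):upper],
                              start_date = day, end_date = day,
                              report_type = 'daily')[, reportCols],
               silent = TRUE)
  if (!inherits(chunk, "try-error")) {
    allUKDailyReport <- rbind(allUKDailyReport, chunk)
  }
  siteBatch <- upper
}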
Map flow to location
We can get the location of a report by mapping the Site Name from webtris_report() to the description in webtris_sites().
head(sites[sites$description == "A1M/9869K", ])
We then process our results for plotting.
allUKDailyLocations <- inner_join(sites, allUKDailyReport, by = c("description" = "Site Name"))
allUKDailyLocations$vehmin <- allUKDailyLocations[, "Total Volume"] / 15
colnames(allUKDailyLocations) <- c("id", "description", "lat", "lon", "reportDate", "timePeriodEnding", "timeInterval", "totalVolume", "vehmin")
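A small aside before plotting: each day has 96 fifteen-minute intervals, and I'm assuming interval 0 starts at midnight. This little helper (my own sketch, not part of webTRISr) converts an interval index into the clock time it starts at:
# map a 15-minute interval index to the clock time it starts at
# (assumes interval 0 begins at midnight)
intervalToTime <- function(t) sprintf("%02d:%02d", (t * 15) %/% 60, (t * 15) %% 60)
intervalToTime(c(2, 45, 60, 90))
[1] "00:30" "11:15" "15:00" "22:30"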
Plot flow
I use the function below to plot vehicle flow during 15-minute intervals on a map. A smaller timeInterval is earlier in the day and a bigger one is later in the day.
library(gridExtra)

MapPlotter <- function(timeInterval){
  ggplot() +
    geom_polygon(data = uk, aes(x = long, y = lat, group = group), fill = "grey") +
    coord_map() +
    geom_point(data = allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, ],
               aes(x = lon, y = lat, fill = vehmin), size = 1, shape = 21, alpha = 0.5) +
    xlab(paste("Time of day",
               allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, "timePeriodEnding"])) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          axis.ticks = element_blank(),
          panel.background = element_blank()) +
    # anchor the colour on max flow - so all charts have a similar scale
    scale_fill_gradient(limits = c(0, max(allUKDailyLocations$vehmin, na.rm = T)),
                        low = "deepskyblue", high = "grey15", name = "Flow")
}
I should say, I’m only plotting these maps in a grid to show you a few time intervals at once. When I did this for my own benefit, I looked at one fullscreen map at a time.
grid.arrange(MapPlotter(2), MapPlotter(45), MapPlotter(60), MapPlotter(90))
After trying a few different values for timeInterval, the flow doesn’t seem to change. Have we loaded the data correctly, or is there something else at play? Perhaps the data for a time interval doesn’t change, or a measuring site isn’t reporting data. Maybe we made a false assumption about how the reports work.
Let’s investigate.
Check data quality
Perhaps this is because of the end of file error we saw earlier. First I check how many rows a few of the sites give us.
table(allUKDailyLocations[!is.na(allUKDailyLocations[, "vehmin"]), "id"])[1:30]
2 3 5 7 8 13 14 15 17 18 19 20 21 23 24 25 26 28 30 31 32 33 34 35 36 38 41 43 44 46
96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 94 96 96 96 96 96 96 96
Most of these sites have 96 rows. We can confirm how many time intervals are in the data set.
unique(allUKDailyLocations[, "timeInterval"])
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[31] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
[61] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
[91] 90 91 92 93 94 95
We have 96 time intervals. Which sites have fewer than 96 records?
tab <- table(allUKDailyLocations[!is.na(allUKDailyLocations[, "vehmin"]), "id"])
tab[which(tab < 96)]
34 47 201 256 461 547 1009 1017 1499 1876 2051 2193 2427 2829 3323
94 89 74 94 1 92 1 94 79 90 84 42 94 93 74
3361 3402 3437 3457 3656 3658 3918 4214 4659 5153 5290 5433 5474 5689 5768
95 95 1 95 95 1 86 93 90 1 95 88 1 94 94
5849 5898 5912 6246 6327 6340 6341 6351 6352 6491 6498 6499 6506 6514 6648
93 1 93 93 92 30 30 92 92 68 68 68 68 68 8
6649 6705 6706 6708 6709 6710 6881 6882 7010 7011 7146 7148 7208 7215 7218
7 92 92 30 4 4 92 92 34 34 34 34 68 30 68
7315 7316 8335 8336 8632 8786 8787 8812 8813 9408 10061
86 86 45 45 66 92 92 42 42 87 93
Around 70 sites have less than a full day of data (96 records). I don’t think the premature EOF error is our problem; if it were, I’d expect a much larger number of sites to be missing rows.
We can plot a few of the sites with a full set of data to make sure the flow changes over time.
ggplot(allUKDailyLocations[allUKDailyLocations[, "id"] %in% c(2, 3, 31, 32, 75, 77), ]) +
  geom_line(aes(x = timeInterval, y = vehmin, colour = as.factor(id)))
The data from these sites change, but why doesn’t our map change? Let’s check the change in flow (flowDelta) for all sites.
maxFlow <- allUKDailyLocations %>% group_by(id) %>% filter(vehmin == max(vehmin))
minFlow <- allUKDailyLocations %>% group_by(id) %>% filter(vehmin == min(vehmin))
# vehmin is vehicles a minute
flowDelta <- inner_join(maxFlow, minFlow, by = "id") %>% mutate(delta = vehmin.x - vehmin.y)
summary(flowDelta$delta)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 3.588 0.000 121.000
ggplot(flowDelta) +
  geom_histogram(aes(x = delta))
This suggests most of our sites have no change in flow. It’s a highly aggregated view of our data and before taking it at face value I’d like to spot check a few sites that don’t report any flow.
unique(flowDelta[flowDelta$delta == 0, "description.x"])[1:10, ]
allUKDailyLocations[allUKDailyLocations$description == "M4/3838B5", ]
The structure of the report for the first site looks sensible; the only odd thing is that there’s no vehicle count. Perhaps the big spike showing most sites with no change in vehicle flow is because most of the sensors across the network are broken and not recording any traffic (or there are large parts of the network with no vehicles passing through). Next, I look at the sites that do have a change in flow, i.e. sites that do have data.
ggplot(flowDelta[flowDelta$totalVolume.x > 0, ]) +
  geom_histogram(aes(x = delta))
I can make sense of this chart. It shows sites with vehicle flow, and it makes me question our first histogram, which showed lots of sites with no change in vehicle flow. Aggregates can catch you out in silly ways, so I’d like to double-check how many unique sites had 0 flow.
count(unique(flowDelta[flowDelta$delta == 0, "description.x"]))
Huh, only 9 sites don’t have a change in vehicle flow at some point during the day. I was expecting a really high number: the histogram said 80,000 sites have no change in flow, so why don’t we see that here? Are a few sites with no flow data skewing this branch of the investigation? How many sites do have a change in flow?
count(unique(flowDelta[flowDelta$delta != 0, "description.x"]))
Yes, a few sites with 0 change in flow are skewing the results. There are 9 sites with no change in flow but 6421 with some change. When I filtered on the min and max flow, it must have returned all the rows for sites with 0 flow, matching all 96 time intervals for a site to give us (96 * 95) * 9 = 82080 'sites' with no flow.
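With hindsight, a tidier sketch (not what I ran at the time) avoids the fan-out by summarising one delta per site:
# one row per site: the gap between its busiest and quietest interval,
# so a constant-flow site contributes exactly one zero
flowDeltaPerSite <- allUKDailyLocations %>%
  filter(!is.na(vehmin)) %>%
  group_by(id) %>%
  summarise(delta = max(vehmin) - min(vehmin))
sum(flowDeltaPerSite$delta == 0) # should be close to the 9 constant-flow sites found above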
So why doesn’t our map change?
From the histogram earlier, most sites have a variation in flow between 0 and 40 vehicles a minute, but the colour axis had to cover a much bigger range. What if we plot the flow on a log scale?
Plot log flow
LogMapPlotter <- function(timeInterval){
  ggplot() +
    geom_polygon(data = uk, aes(x = long, y = lat, group = group), fill = "grey") +
    coord_map() +
    geom_point(data = allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, ],
               aes(x = lon, y = lat, fill = log(vehmin)), size = 1, shape = 21, alpha = 0.5) +
    xlab(paste("Time of day",
               allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, "timePeriodEnding"])) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          axis.ticks = element_blank(),
          panel.background = element_blank()) +
    scale_fill_gradient(limits = c(0, max(log(allUKDailyLocations$vehmin), na.rm = T)),
                        low = "deepskyblue", high = "grey15", name = "Flow")
}
grid.arrange(LogMapPlotter(2), LogMapPlotter(45), LogMapPlotter(60), LogMapPlotter(90))
This looks better - we can see the flow at sites changing. We can use these images to make an animated gif.
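One caveat before we do: log(0) is -Inf, so any site with zero flow in an interval silently falls off the colour scale. A variant of the fill mapping using log1p() (a sketch, not what I used for the gif) would keep those sites visible:
# log1p(x) = log(1 + x): zero-flow sites map to 0 instead of -Inf,
# so they stay on the colour scale
ggplot(allUKDailyLocations[allUKDailyLocations$timeInterval == 45, ]) +
  geom_point(aes(x = lon, y = lat, fill = log1p(vehmin)),
             size = 1, shape = 21, alpha = 0.5) +
  coord_map() +
  scale_fill_gradient(low = "deepskyblue", high = "grey15", name = "Flow")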
Save images for the gif
To create the gif, we just need to play one image after another. We’ll turn the ggplot objects into .png files by saving them in a temporary location. We can then read them back and combine them into a gif.
We start by preparing the metadata, including a place to save the images.
imageStorePath <- "images\\gif_tmp\\"
plots <- list()
timeIntervals <- unique(allUKDailyLocations[, "timeInterval"])
We then loop through each time interval, create a plot and save a compressed version as a .png on disk. The images need to be compressed; otherwise it takes a few hours to create the gif.
for (t in timeIntervals){
  p <- LogMapPlotter(t)
  # https://stackoverflow.com/questions/45113597/error-unable-to-start-png-device?rq=1
  ggsave(filename = paste0(imageStorePath, sprintf('%02.f', t), ".png"), plot = p, device = "png", dpi = 150)
  plots[[t + 1]] <- p
}
Saving 7 x 7 in image
Use the magick library
In a previous post, Growing a Lorenz Butterfly in R, I created a gif with a system call to ImageMagick. Whilst working on this post, I found a solution using the magick R package, which may be neater. The package is quite new and, according to the vignette, “…being a first release, documentation is still sparse.”
library(magick)
package 'magick' was built under R version 3.6.1
Linking to ImageMagick 6.9.9.14
Enabled features: cairo, freetype, fftw, ghostscript, lcms, pango, rsvg, webp
Disabled features: fontconfig, x11
Create a test gif
Before we loop through all the images and make the full gif, I’d like to test the process on a few images. For each image we created, we need to:
- Read it
- Join it to our list
- Animate the result
density <- 10 # default density is 72 dpi
# imageStorePath already ends with a path separator
img1 <- image_read(paste0(imageStorePath, "00.png"), density = density)
img2 <- image_read(paste0(imageStorePath, "01.png"), density = density)
img3 <- image_read(paste0(imageStorePath, "02.png"), density = density)
imgGroup <- image_join(img1, img2, img3)
image_animate(imgGroup, fps = 1)
The gif appears in my RStudio viewer, but doesn’t display in this notebook. I’ll try with an example from the documentation.
# Download earth gif and make it a bit smaller for vignette
earth <- image_read("https://jeroen.github.io/images/earth.gif") %>%
image_scale("200x") %>%
image_quantize(128)
length(earth)
[1] 44
earth
This doesn’t fare any better. So I’ll just embed the test gif manually for you to see.
You have to look closely, but the flow does change.
Create the gif
To make our gif for one day of traffic flow we simply have to repeat the process on all the images. This time, instead of manually reading each image, we check the image directory for all images we need,
files <- list.files(path = imageStorePath, pattern = "*.png", full.names = T)
read them all in,
images <- sapply(files, FUN = image_read)
join them all together
imageGroup <- image_join(images)
create the gif
startTime <- Sys.time()
gif <- image_animate(imageGroup, fps = 4)
endTime <- Sys.time()
endTime - startTime
Time difference of 8.65783 secs
and render it by writing the gif to disk.
startTime <- Sys.time()
image_write(gif, paste0(imageStorePath, "road_gif.gif"))
endTime <- Sys.time()
endTime - startTime
Time difference of 1.06791 hours
Here we have it: a gif of traffic flow across England on Monday 27th May! It isn’t very informative, but I find it mildly entertaining to watch.
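An hour to write one gif is painful. If I were doing this again, I’d try shrinking the frames before animating; image_scale() ships with magick, though I haven't timed this sketch:
# scale every frame to 400px wide before animating; smaller frames
# should mean a smaller gif and a quicker write
imagesSmall <- image_scale(imageGroup, "400x")
gifSmall <- image_animate(imagesSmall, fps = 4)
image_write(gifSmall, paste0(imageStorePath, "road_gif_small.gif"))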
A better colour scheme
Perhaps I can make the gif a bit more informative by using a diverging colour series.
maxColour <- max(log(allUKDailyLocations$vehmin), na.rm = T)

LogMapPlotterDiverging <- function(timeInterval){
  ggplot() +
    geom_polygon(data = uk, aes(x = long, y = lat, group = group), fill = "grey") +
    coord_map() +
    geom_point(data = allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, ],
               aes(x = lon, y = lat, fill = log(vehmin)), size = 1, shape = 21, alpha = 0.5) +
    xlab(paste("Time of day",
               allUKDailyLocations[allUKDailyLocations[, "timeInterval"] == timeInterval, "timePeriodEnding"])) +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          axis.ticks = element_blank(),
          panel.background = element_blank()) +
    scale_fill_gradient2(limits = c(0, maxColour),
                         low = "blue", high = "red", mid = "white", midpoint = maxColour / 2, name = "Flow")
}
grid.arrange(LogMapPlotterDiverging(2), LogMapPlotterDiverging(45), LogMapPlotterDiverging(60), LogMapPlotterDiverging(90))
In these charts, red shows sites with a high flow of traffic, white sits at the midpoint of the colour scale and blue indicates a low flow. We also see quite a few grey sites - these are sites that return null data.
length(unique(allUKDailyLocations[is.na(allUKDailyLocations$vehmin), "id"]))
[1] 2438
There are a few thousand sites that don’t return vehicle count data.
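If the grey points bother you, one option (a sketch; the maps below keep them) is to drop the null rows before plotting:
# drop intervals with no vehicle count; the grey points disappear,
# at the cost of hiding which sites are broken
allUKReporting <- allUKDailyLocations[!is.na(allUKDailyLocations$vehmin), ]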
I use this new colour scheme to build a gif.
for (t in timeIntervals){
  p <- LogMapPlotterDiverging(t)
  # https://stackoverflow.com/questions/45113597/error-unable-to-start-png-device?rq=1
  ggsave(filename = paste0(imageStorePath, sprintf('%02.f', t), ".png"), plot = p, device = "png", dpi = 150)
  plots[[t + 1]] <- p
}
Saving 7 x 7 in image
files <- list.files(path = imageStorePath, pattern = "*.png", full.names = T)
images <- sapply(files, FUN = image_read)
imageGroup <- image_join(images)
gif <- image_animate(imageGroup, fps = 4)
image_write(gif, paste0(imageStorePath, "road_gif.gif"))
If we watch the gif long enough, we see the flow rise during the day and fall during the evening. We also see that many of the measuring sites around the London M25 ring road are broken, and that the M4 from London to Bristol, along with the M1 from Leeds before it branches into the A1, have high flow on this day.
Some supporting charts would help us further interpret this gif but for now, thanks for reading.