As managing director of Innovation at Blueprint Technologies, I have the pleasure of working directly with some of the most talented data scientists in the world, both within our company and through our various partners. A common theme I have found in projects involving data science is the need for significant amounts of data.
Recently, we worked with our largest partner, Microsoft, on a video analytics project. It was an incredible opportunity to experiment with Azure for video processing and analysis. The case study for this project will be published soon, so rather than detail the solution, I’ll cover a problem we had to overcome early in the project: availability of relevant video data.
We had 60 days to go from whiteboard to market with a video analytics solution for a specific use case within a specific industry. We needed overhead video footage of people and vehicles within a city environment. After a bit of Google-Fu, we found quite a few overhead static imagery datasets, but we needed video. The few video datasets we did find lacked the consistency we wanted. We had to figure out something different. Quickly.
We experimented with generating the video we needed using drones. The approach lacked the traffic density we needed.
Attempts to capture footage of live traffic resulted in warnings from local law enforcement about the use of civilian drones in high-traffic areas. It was time to try something different. Or get arrested.
Previously, we had success generating training data for machine learning models using video games. In fact, at the Apache Spark + AI Summit a few years ago, we presented our research in training collision detection for an autonomous drone experiment using Doom.
Because it would let us build a world to fit our needs, we originally intended to use Minecraft. In 2014, Microsoft acquired Mojang, the game studio that created Minecraft. Two years later, Microsoft publicly unveiled Project Malmo, “a sophisticated AI experimentation platform built on top of Minecraft, and designed to support fundamental research in artificial intelligence.” You can check out Project Malmo here. However, Minecraft lacked the vehicles and associated driving behaviors we needed for the project.
We were then introduced to AirSim, an open source simulator for autonomous vehicles from Microsoft AI & Research, built on Unreal Engine. Based on the demos, it appeared to have everything we needed to generate our video data. You can check out AirSim here. However, building AirSim on a MacBook was taking more time than anticipated. The documentation did note that “It should be possible to build AirSim on OSX as well, but it isn’t actively tested.” Yet again, we had to find a different way.
The solution to our problem turned out to be one of the most ambitious simulations of a city available: Grand Theft Auto V. It was created by Rockstar North and the studio took great care in attempting to recreate Los Angeles (Los Santos as it is referred to in the game). The studio sent out multiple research teams throughout Los Angeles and shot over 250,000 images and hours of video. From the Los Angeles International Airport and Beverly Hills to landmarks such as the Hollywood sign and the Griffith Observatory, Grand Theft Auto V had all the elements we needed to generate our video data.
The game includes a Director Mode, which allowed us to control traffic density, pedestrian population, time of day, weather, and camera angle. Camera control proved to be the most beneficial, since our early attempts in the game had relied on hovering over a particular area of the city in a helicopter.
This approach saved weeks of time. We avoided having to program traffic simulations and randomization patterns. It provided high enough fidelity that we avoided having to travel to physical locations to film the video footage we needed (and avoided getting arrested). It provided the flexibility necessary to generate hours’ worth of relevant training and testing data. Using this data, we were able to train various algorithms for object identification and tracking. In addition, the video data is used to train activity similarity models and improve the overall accuracy of the models.
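For readers wondering how hours of captured gameplay might become training and testing sets, here is a minimal sketch of one way to do it. It is not the pipeline we used on the project. The clip names, durations, frame rate, sampling interval, and 80/20 split are all assumptions made up for illustration. The sketch picks frame indices at a fixed time interval and splits by whole clip rather than by frame, so near-identical neighboring frames do not end up on both sides of the split.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A recorded gameplay clip (all values here are hypothetical)."""
    name: str
    duration_s: float
    fps: float


def sample_frame_indices(clip, every_s):
    """Return the indices of frames taken once every `every_s` seconds."""
    if every_s <= 0:
        raise ValueError("every_s must be positive")
    total_frames = int(clip.duration_s * clip.fps)
    # Step is at least one frame, even for very small intervals.
    step = max(1, round(every_s * clip.fps))
    return list(range(0, total_frames, step))


def split_clips(clips, test_fraction=0.2, seed=0):
    """Shuffle clips reproducibly and split them into (train, test) lists.

    Splitting by clip keeps adjacent, near-duplicate frames from the same
    recording out of both sets at once.
    """
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical recordings; names and lengths are illustrative only.
    clips = [Clip(f"clip_{i}", duration_s=600.0, fps=30.0) for i in range(5)]
    train, test = split_clips(clips)
    for clip in train:
        frames = sample_frame_indices(clip, every_s=0.5)
        print(clip.name, "train frames:", len(frames))
    print("test clips:", [c.name for c in test])
```

The frame indices could then be handed to whatever video decoding library you prefer to extract the actual images.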
Although using the game to train machine learning models may not be what the designers had in mind, the approach proved to be a quick and efficient way of generating pedestrian and vehicle-in-motion activity. Unfortunately, we didn’t have programmatic access to the game (we used an Xbox), so we had to actually play the game to get the locations and activity we wanted. The sacrifices we make in the name of data science…