A new technique turns YouTube videos into 3D reconstructions of matches.


The key to the approach is a convolutional neural network—a type of artificial intelligence algorithm loosely modeled on the part of the brain that processes visual data—that researchers trained to estimate how far the surfaces of each player are from the camera that recorded the match. The network analyzed 12,000 2D images of players extracted from the soccer videogame FIFA alongside the corresponding 3D data from the game engine to learn how the two correlate. That allowed it to estimate depth maps for players from unseen 2D images. When shown unseen videos, the system accurately predicted depth maps for each player and combined them with the color footage to reconstruct each player in 3D. The players were then superimposed on a virtual soccer pitch allowing the match to be viewed in any 3D content viewer from a variety of angles.