The sensor platform consists of four cameras, two visible-band (color) and two long-wave infrared (LWIR, thermal), as well as a LiDAR. The color cameras are Point Grey Flea2 units capturing at 1280 × 960 resolution, while the thermal cameras are Xenics Gobi-640-GigE units capturing at 640 × 480 resolution with a thermal sensitivity of 50 mK. The cameras are synchronized by a software trigger and mounted on a common baseline.
The LiDAR is a Trimble GX Advanced TLS, capable of capturing up to 5000 points per second with less than 2 mm error at ranges of up to 50 m; it is highly accurate but slow. The camera baseline was mounted very close to the LiDAR and facing the same direction, so that we could cover the cameras' viewing volume by scanning a region of the scene with a similar angular extent. Each scan took around 8 minutes and produced a point cloud of approximately 300,000 3D points per scene on average, with an accuracy better than 2 mm. Because this capture time greatly exceeds the exposure time of the cameras, we scan only static scenes. We do, however, include posed scenes with pedestrians and vehicles that remain still for the duration of the scan, so that the data better reflects natural scenes.
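As a rough consistency check on the timing figures, the arithmetic below compares the scanner's rated peak rate with the effective rate implied by the reported averages. It is a minimal sketch that uses only the approximate numbers quoted above (5000 points/s, about 300,000 points, about 8 minutes); the constant and helper names are illustrative, and the effective rate is a whole-scan average that says nothing about how the scanner actually spends its time.

```python
# Approximate figures quoted in the text; names are illustrative.
RATED_POINTS_PER_SECOND = 5000   # rated peak capture rate of the LiDAR
POINTS_PER_SCENE = 300_000       # approximate average point-cloud size
SCAN_TIME_SECONDS = 8 * 60       # "around 8 minutes" per scan


def min_scan_time_s(points: int, rate_pps: float) -> float:
    """Lower bound on scan time if the scanner ran at its rated peak throughout."""
    return points / rate_pps


def effective_rate_pps(points: int, duration_s: float) -> float:
    """Average number of points captured per second over a whole scan."""
    return points / duration_s


if __name__ == "__main__":
    lower = min_scan_time_s(POINTS_PER_SCENE, RATED_POINTS_PER_SECOND)
    rate = effective_rate_pps(POINTS_PER_SCENE, SCAN_TIME_SECONDS)
    print(f"minimum scan time at rated peak: {lower:.0f} s")  # → 60 s
    print(f"effective average rate: {rate:.0f} points/s")     # → 625 points/s
```

At the rated peak, 300,000 points would take only about a minute, so the roughly 8-minute scans correspond to an effective average of about 625 points per second. Either way, a single scan lasts far longer than any camera exposure, which is why only static scenes are captured.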