









The ability to digitally capture the 3D morphology of a specimen has revolutionised palaeontology in recent years. Working in the virtual realm permits investigators to section, profile, manipulate and colour a specimen in ways that would be difficult or impossible, and often destructive, if applied to the real fossils. The use of digital models has facilitated a wide range of research into areas including locomotion, feeding, body mass calculations, documentation, conservation, hydro- and aerodynamics, and many others (Anderson et al., 2011; Bates et al., 2012; Bates and Falkingham, 2012; Bates et al., 2009; Falkingham et al., 2009; Farlow et al., 2012; Gidmark et al., 2013; Hutchinson et al., 2011; Panagiotopoulou et al., 2011; Rayfield, 2007; Sellers et al., 2012). Equally importantly, digitization of specimens in the internet age has enabled an unprecedented level of data sharing and collaboration, exemplified by online repositories such as Digimorph, and by online journals such as this, which enable digital models to accompany publications as electronic supplementary material (see appendices).

Until recently, however, digitization of fossils remained the purview of those with access to expensive hardware such as computed tomography (CT) and laser scanners, or expensive software-based photogrammetric solutions. While CT machines remain a requirement for internal morphology, methods with which to digitize external morphology have reached such a low cost that they have become available to anyone. Photogrammetric techniques can now be employed with a basic consumer camera, desktop PC, and free software (Falkingham, 2012).

One of the most significant recent developments in photogrammetric software has been the use of the GPU, or Graphics Processing Unit, found in many modern desktop and laptop computers, particularly in machines built for running computer games. The GPU contains many cores, far exceeding the number of cores found on the CPU, or Central Processing Unit (the 'processor' of the computer). While GPU cores are more specialized than those on the CPU, they can be used to dramatically speed up 3D applications (provided the software is written to take advantage of the GPU). Importantly, unlike more industrial hardware such as laser scanners, GPUs form part of a major consumer-driven market, the video game industry, which drives prices down and processing power up at exceptional rates. Coupled with the falling cost of rapid-prototype machines (3D printers), there is now also a growing consumer demand for 3D digitization, which means that palaeontologists can take advantage of software and hardware developments at prices determined by aggressive market forces.

One such recent development is in using depth sensors, designed for interacting with computer games without a traditional controller, to scan and digitize objects and environments. Such sensors first came into being (at least in the mainstream) with the Microsoft Kinect™ for Xbox 360®, released for the gaming console in 2010. This was later followed by the Microsoft Kinect™ for Windows®, and the Asus® Xtion Pro, which possess similar specifications but are designed for developers and a personal computer environment.

Although developed for computer games, the low-cost depth sensor was quickly 'hacked' for other uses, particularly in the fields of robotics (Stowers et al., 2011) and rehabilitation (Chang et al., 2011). Of pertinence here is that one of these uses included real time mapping of the environment, and tracking of the sensor (Newcombe et al., 2011). Despite initially being developed for robots navigating environments, the 3D mapping functionality can be used to create digital 3D models of objects or environments, and software applications designed for this purpose are now available (Izadi et al., 2011). These applications are possible because of recent increases in the processing power of consumer GPUs, which make it possible to record a point cloud from the depth sensor and mesh that point cloud in real time to produce a 3D model.

In this paper, I aim to demonstrate the applicability of these gaming peripherals and associated software packages for scanning and digitising palaeontological specimens.






Throughout this paper the Microsoft® Kinect™ for Windows® was used (available for around $150). The Kinect™ sensor incorporates an RGB camera, recording at 640x480 pixels, and a depth sensor which uses an infrared laser. Alternatives include the original Kinect™ for Xbox 360® and the Asus® Xtion PRO, both available for a similar price to the Kinect™ for Windows®, but with minor differences. Both alternatives use an open source driver to communicate with the PC, making them compatible with Windows, Unix, and Mac operating systems, while the Kinect™ for Windows® requires a Windows PC. The Kinect™ for Windows® has a shorter minimum range than the other options (40 cm vs 80 cm), enabling closer scanning, while the Xtion PRO benefits from being powered by the USB cable, thereby offering greater mobility if attached to a laptop or tablet PC. It is worth noting that because the Kinect™ for Xbox 360® has been predominantly developed for, and sold to, the computer game market, devices are available for very low cost (<$80 at the time of writing). The sensor was connected to a laptop running Windows 7, containing an Intel® 8-core CPU (2.20 GHz) and an Nvidia GTX 560M GPU.



There are currently several software options available for using the Kinect™ or Xtion PRO as a 3D scanner, including both commercial and non-commercial programs. Most programs will work equally well with either hardware system. The two major programs that are relatively mature at this stage are ReconstructMe, a commercial code with an additional non-commercial license available, and Kinfu, part of the Point Cloud Library, available as part of the pre-release v1.7 source.

For this paper, the non-commercial console version of ReconstructMe was used. In order to capture an object or environment in 3D, the sensor is held in the hand and moved slowly over the specimen to be digitised. The depth sensor records xyz coordinates of points within the scene which are meshed in real time (producing a solid digital model rather than a cloud of points). If movements of the sensor are made too quickly, and the scene changes too much for the software to track, recording will pause and the user must return the sensor to the last known position (as shown on screen). This loss of tracking can be particularly troublesome when first starting to use the sensor, but experience in moving the sensor by hand will result in more stable data acquisition.
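How a depth reading becomes an xyz coordinate can be illustrated with the standard pinhole camera model. The sketch below is not ReconstructMe's implementation, which is not described here; the function name `depth_to_points` and the intrinsic values (`FX`, `FY`, `CX`, `CY`) are assumptions chosen as nominal figures for a 640x480 depth image, and a real sensor would need calibrated values.

```python
def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image into xyz points (metres).

    depth_mm is a list of rows of depth readings in millimetres;
    a reading of 0 is treated as 'no data' and skipped.
    """
    points = []
    for v, row in enumerate(depth_mm):        # v = pixel row
        for u, d in enumerate(row):           # u = pixel column
            if d <= 0:
                continue
            z = d / 1000.0                    # millimetres to metres
            x = (u - cx) * z / fx             # pinhole model, horizontal
            y = (v - cy) * z / fy             # pinhole model, vertical
            points.append((x, y, z))
    return points


# Assumed nominal intrinsics for a 640x480 depth image (hypothetical values).
FX = FY = 525.0
CX, CY = 319.5, 239.5

# A tiny 2x2 'depth image': two valid readings, two missing.
tiny_depth = [[0, 1000],
              [2000, 0]]
tiny_points = depth_to_points(tiny_depth, FX, FY, CX, CY)
```

Each valid pixel yields one point, so a full frame can contribute up to 640 x 480 points before the software fuses successive frames into a mesh.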



In order to illustrate the abilities - and weaknesses - of this technology for palaeontology, three specimens of varying size and complexity were digitized: A block containing dinosaur tracks (block ~30 cm across, tracks ~10 cm in length), an Elephant tooth (~30 cm), and a mounted skeleton of a Pronghorn (Antilocapra americana) (~1.2 m length). The specimens were sourced from the teaching and research collections at Brown University. Both the Elephant tooth and the track block were digitized laid on a table, and so only one side was recorded.


Additional comparative data

For comparison of the results of scanning with the Kinect™ with more commonly used digital acquisition techniques, photogrammetric models were produced of all specimens.  Photographs were taken with a Sony Nex-6 and 16-50mm lens. All photos were taken at 16mm focal length and a resolution of 4912 x 3264. The number of photographs was arbitrarily chosen in each case so as to maximize coverage of the specimens, whilst maintaining a high resolution of the specimen. The photographs were processed using VisualSFM (Wu, 2007, 2011; Wu et al., 2011) for the sparse reconstruction, and CMVS/PMVS (Furukawa et al., 2010a, b; Furukawa and Ponce, 2010) through the VisualSFM application for the dense reconstruction. This was carried out on the same laptop used for scanning with the Kinect™ sensor.

The time taken to generate the models with both methods was noted (Table 1) for comparison. Data acquisition was defined as the time taken to move the Kinect™ over the specimen and record the digital model, or the total time between first and last photograph in the case of the photogrammetric models. Data processing time was zero for the Kinect™ as ending the scan results in a complete 3D model almost instantaneously. For the photogrammetry, data processing time included the time taken for feature detection, feature matching, sparse reconstruction, and dense reconstruction. Note that in all three cases of photogrammetry, the dense reconstruction accounted for over 90% of the total time. Both photogrammetric and Kinect™ models were then cleaned of extraneous points using the cropping tool in CloudCompare v2.4, though the time taken to do this (1-2 minutes in each case) was not recorded as the same process, a simple crop, was used for both methods.

Models were compared using the cloud/mesh distance tool in CloudCompare, after scaling the photogrammetric models accordingly (because photogrammetry is a scale-less method, an object of known size, such as a scale bar, must be included in the final model such that the resultant point cloud can be correctly scaled). The output mesh produced by the Kinect™ and ReconstructMe was compared directly to the dense photogrammetric point cloud, rather than meshing the point cloud first. This method of comparison was chosen for two reasons. Firstly, producing a mesh from the dense output of VisualSFM and PMVS/CMVS is a process highly dependent on user-supplied variables, and as such meshes can vary in quality, and in relation to the raw data, depending on workflow. Secondly, generating meshes for objects such as the Pronghorn skeleton is difficult because of the proximity of features such as the ribs, and often requires clean-up in post-processing, adding a second subjective source of error (see discussion). That the output from the Kinect™ is in meshed format, as opposed to a point cloud, will be addressed in the discussion section.
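The idea behind a cloud/mesh distance can be sketched as follows: for every point in the cloud, find the closest point on any triangle of the mesh and record the separation. This is a minimal brute-force illustration, not CloudCompare's implementation; the helper names are hypothetical, the distances are unsigned (the figures report signed +/- values), and triangles are assumed to be non-degenerate.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def along(origin, direction, t):
    """Return origin + t * direction."""
    return tuple(o + t * d for o, d in zip(origin, direction))


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc (region-based test)."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                  # nearest to vertex a
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                  # nearest to vertex b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return along(a, ab, d1 / (d1 - d3))       # nearest to edge ab
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                  # nearest to vertex c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return along(a, ac, d2 / (d2 - d6))       # nearest to edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return along(b, sub(c, b), w)             # nearest to edge bc
    denom = va + vb + vc                          # projection falls inside
    return along(along(a, ab, vb / denom), ac, vc / denom)


def cloud_to_mesh_distances(points, vertices, faces):
    """Unsigned distance from each point to the nearest mesh triangle."""
    distances = []
    for p in points:
        best = math.inf
        for i, j, k in faces:
            q = closest_point_on_triangle(p, vertices[i], vertices[j], vertices[k])
            best = min(best, math.dist(p, q))
        distances.append(best)
    return distances


# One triangle in the z = 0 plane and three test points.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
cloud = [(0.25, 0.25, 2.0), (2.0, 0.0, 0.0), (0.5, -1.0, 0.0)]
distances = cloud_to_mesh_distances(cloud, vertices, faces)
```

The brute-force loop visits every triangle for every point, which is only practical for small examples; dedicated tools use spatial indexing (such as an octree) to handle dense clouds.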





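Table 1 - Data acquisition time (mins), data processing time (mins), and number of vertices for each model.

Elephant molar (lateral)
Photogrammetry, data acquisition: 1:27 (42 pictures)

Photogrammetry, data acquisition: 1:16 (40 pictures)

Dinosaur Track
Photogrammetry, data acquisition: 1:00 (29 pictures)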






The outputs of the Kinect™ sensor and ReconstructMe are presented in Figures 1-3, alongside photogrammetric models and comparison data (also see appendices). The resolution of detail in resultant models is considerably lower in the Kinect™ models than the photogrammetric models, as is particularly evident in the track and elephant tooth, where the Kinect™ models clearly show a lack of finer features compared to the photogrammetric point cloud. Resolution of the Kinect is on the order of 5-10 mm in the best case, while the photogrammetric methods can resolve details at the millimetre scale in this instance. The Kinect™ models are natively scaled correctly however, and as such can be measured directly in any 3D modelling package. This is in contrast to the photogrammetric method where models must be scaled by the user according to an object of known size within the final 3D model.
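The user-applied scaling of a photogrammetric model amounts to a single multiplication. The sketch below assumes the two ends of a scale bar have been picked in the unscaled model and that its true length is known; the function names and the example coordinates are hypothetical.

```python
import math


def scale_factor(bar_end_a, bar_end_b, true_length_mm):
    """Ratio of the known scale-bar length to its length in model units."""
    return true_length_mm / math.dist(bar_end_a, bar_end_b)


def scale_cloud(points, factor):
    """Multiply every coordinate by the same factor (uniform scaling)."""
    return [(x * factor, y * factor, z * factor) for x, y, z in points]


# Hypothetical example: a 100 mm scale bar spans 0.5 model units.
factor = scale_factor((0.0, 0.0, 0.0), (0.0, 0.5, 0.0), 100.0)
scaled = scale_cloud([(1.0, 2.0, 3.0)], factor)
```

After scaling, coordinates are in millimetres and can be compared directly with the natively scaled Kinect™ mesh.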

The block containing dinosaur tracks proved to be at the lower limits of usefulness in digitizing with the Kinect™, with details poorly resolved (Figure 1). The model produced using the Kinect™ appears smoothed in relation to the physical specimen or photogrammetric model. However, despite this smoothing, the model is reasonably accurate, the largest inaccuracies occurring around the tips of digits, where the relief becomes smaller than the 5 mm resolution of the Kinect™.  In these areas, the photogrammetric point cloud and Kinect™ mesh differ by ~5 mm, though the majority of the surface of the block differs only by 1-2 mm.

As with the track block, the Elephant tooth was also more smoothed in the Kinect™ mesh, though the overall morphology was well captured (Figure 2). The primary location of difference between the Kinect™ and photogrammetric point cloud was on the tooth roots, where the tips were not recorded by the Kinect™ resulting in 16 mm maximum difference between methods. Smaller features such as the cracks were recorded by photogrammetry, but lost entirely in the Kinect™ model. Again however, most of the model maintained just a few mm difference to the photogrammetric point cloud.

The mounted Pronghorn skeleton offered a different challenge, as the general morphology was much larger and more complex than that of either the track block or Elephant tooth. The vertebrae were poorly differentiated by the Kinect™, but this is also true of the photogrammetric model, albeit to a lesser extent (Figure 3). There are some areas where the Kinect™ failed to resolve detail, particularly the tips of the snout and horns, and the transverse processes of the lumbar vertebrae. Conversely, the Kinect™ model successfully recorded more of the tail than did the photogrammetric reconstruction. The maximum difference between photogrammetric point cloud and Kinect™ mesh (300 mm) however is in the poles located between fore- and hind limbs, forming part of the mount. These poles were only 10 mm in diameter, and were not fully recorded by the Kinect™.

Figure 1 - Top left, the fossil track digitised. Top right, the photogrammetric point cloud produced using VisualSFM. Bottom left, the 3D model generated using the Kinect™ and ReconstructMe. Bottom right, the result of calculating the cloud-mesh distances. The models generally correspond closely (mostly within +/- 1.5 mm), though the Kinect™ mesh is clearly lacking the finer details.


Figure 2 - Top left, the Elephant tooth used. Top right, the photogrammetric point cloud produced using VisualSFM. Bottom left, the 3D model generated using the Kinect™ and ReconstructMe. Bottom right, the result of calculating the cloud-mesh distances.  Note that for the majority of the specimen, differences are limited to +/- 5 mm, but some of the more complex morphology, particularly the roots, is up to 16 mm different between the Kinect model and the photogrammetry model.



Figure 3 - Top left, the mounted Pronghorn skeleton. Top right, the photogrammetric point cloud produced using VisualSFM. Bottom left, the 3D model generated using the Kinect™ and ReconstructMe. Bottom right, the cloud-mesh distances. Note that although the maximum and minimum distances seem very high, these values are generally limited to the poles between the legs, which were too small to be recorded by the Kinect™.


Figure 4 - Difficulties in meshing. Top, the photogrammetric point cloud meshed using the Poisson Surface Reconstruction feature in CloudCompare, Octree depth 10.  Bottom, the mesh produced by the Kinect and ReconstructMe.




Copyright © 2006
