DATA COLLECTION

Processing the imagery collected by an unmanned aerial vehicle (UAV) is broadly a three-step process:

Structure From Motion (SFM)

Multi-View Stereo (MVS)

Image rectification

 

These steps can be broken down into the following sub-processes:

Image Feature Extraction

Each drone image contains a collection of distinctive features that differentiate it from other images; these are known as keypoints. Keypoints are extracted from each image using automatic computer vision algorithms (SIFT, BRISK, etc.), and typically correspond to building corners, road edges, and other high-contrast details.
Generally, images with good texture variation yield 40,000+ features. This makes it easy to see why photogrammetry performs poorly in areas of low texture variation such as water bodies, dense forest, sand, and sky: keypoint extraction is difficult on textureless surfaces. A short extraction sketch follows the figure below.
The image below shows three kinds of terrain: a land/bridge region, sand, and water.

Each circle represents a unique feature detected using the BRISK detector.
The density of features is very high in the leftmost (land) and bridge regions, since they contain many edges, colour changes, etc.
Very few features are detected in the sand and water regions because those surfaces are textureless.
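
As a minimal sketch, keypoints like those in the figure can be extracted with OpenCV's BRISK detector; the file names here are placeholders.

```python
import cv2

# Load a drone image in grayscale; the path is a placeholder.
image = cv2.imread("drone_frame_001.jpg", cv2.IMREAD_GRAYSCALE)

# BRISK is a fast binary detector/descriptor available in OpenCV.
brisk = cv2.BRISK_create()
keypoints, descriptors = brisk.detectAndCompute(image, None)
print(f"Detected {len(keypoints)} keypoints")

# Draw each keypoint as a circle, as in the figure above.
visual = cv2.drawKeypoints(
    image, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("keypoints.jpg", visual)
```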

Feature Matching

The extracted features are then searched for in nearby images, and matching is performed. Using GPS data to shortlist candidate images makes the matching process much faster and more accurate. From the matched features, the fundamental matrix is derived and the relative position between the two cameras is estimated. Techniques like FLANN (Fast Library for Approximate Nearest Neighbors) are often used for the search-and-match step.
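
A minimal sketch of this step with OpenCV, assuming two overlapping frames: SIFT keypoints are matched with FLANN, filtered with Lowe's ratio test, and the fundamental matrix is estimated with RANSAC. The file names and the 0.7 ratio threshold are illustrative choices, not values from any specific pipeline.

```python
import cv2
import numpy as np

img1 = cv2.imread("frame_a.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("frame_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN with a KD-tree index, suited to float descriptors like SIFT's.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test discards ambiguous matches.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC rejects remaining outliers while fitting the fundamental matrix.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
print(f"{int(inlier_mask.sum())} inliers out of {len(good)} matches")
```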

Bundle Adjustment (BA)

The relative position estimated from the fundamental matrix is generally prone to errors. BA is used to simultaneously refine the 3D coordinates (latitude, longitude, elevation), the orientation parameters (yaw, pitch, roll), and the optical characteristics (distortion parameters) of the camera(s) used to acquire the images. BA is a nonlinear iterative optimization in which the objective function is the Mean Reprojection Error (MRE) and the parameters are the positions, orientations, and camera distortion coefficients. BA can be of two types: incremental or global.
Geotag data stored in the images is used to georeference and scale the model.
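
The sketch below illustrates BA as nonlinear least squares over reprojection error, using SciPy on a small synthetic scene (two cameras, twenty points). All numbers here (intrinsics, poses, noise level) are made up for illustration; production pipelines use sparse solvers such as Ceres that exploit the sparse structure of the Jacobian.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

# Synthetic ground truth: 2 cameras, 20 points, shared intrinsics K.
# In a real pipeline these come from feature matching and triangulation.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
points3d = rng.uniform(-1.0, 1.0, (20, 3)) + [0.0, 0.0, 5.0]
rvecs = [np.zeros(3), np.array([0.0, 0.1, 0.0])]   # Rodrigues rotations
tvecs = [np.zeros(3), np.array([-0.5, 0.0, 0.0])]  # translations

# Observations: every point projected into every camera.
cam_idx, pt_idx, observed = [], [], []
for ci in range(2):
    proj, _ = cv2.projectPoints(points3d, rvecs[ci], tvecs[ci], K, None)
    for pi, uv in enumerate(proj.reshape(-1, 2)):
        cam_idx.append(ci); pt_idx.append(pi); observed.append(uv)

def residuals(params):
    # params packs 6 values per camera (rvec, tvec), then the 3D points.
    cams = params[:12].reshape(2, 6)
    pts = params[12:].reshape(-1, 3)
    err = []
    for ci, pi, uv in zip(cam_idx, pt_idx, observed):
        proj, _ = cv2.projectPoints(pts[pi:pi + 1], cams[ci, :3],
                                    cams[ci, 3:], K, None)
        err.append(proj.ravel() - uv)
    return np.concatenate(err)

# Perturb the ground truth to simulate noisy initial estimates, then refine.
x0 = np.hstack([np.hstack([r, t]) for r, t in zip(rvecs, tvecs)]
               + [points3d.ravel()])
x0 += rng.normal(0.0, 0.01, x0.shape)
result = least_squares(residuals, x0, method="trf")
print("final reprojection cost:", result.cost)
```

For readability this sketch leaves the camera distortion coefficients out of the parameter vector; real BA includes them, as described above.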

Depth Map estimation and Point Cloud generation

A depth value is estimated for every pixel of every image using a Multi-View Stereo (MVS) algorithm. MVS algorithms can construct highly detailed 3D models from a set of posed images, so the output of SFM acts as the input to the MVS algorithm, which outputs a depth map for each input image.
Each depth map is then fused with the depth maps of the neighbouring images to obtain 3D points. Together, these points are called the dense point cloud, which can contain more than 10 million (1 crore) points even for a relatively small area.
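
As a simplified two-view stand-in for full MVS, the sketch below computes a disparity map for a rectified stereo pair with OpenCV's semi-global matcher and converts it to depth. The file names, focal length, and baseline are assumptions; a real pipeline repeats this across many overlapping views and fuses the results.

```python
import cv2
import numpy as np

# A rectified pair; in an aerial pipeline, two overlapping frames
# rectified using the poses recovered by SFM. Paths are placeholders.
left = cv2.imread("rect_left.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("rect_right.jpg", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; disparities come back in fixed point (x16).
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# depth = f * B / d, with focal length f (pixels) and baseline B (metres);
# both values below are assumed for illustration.
f, B = 800.0, 0.5
depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = f * B / disparity[valid]
```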

Digital Elevation Model (DEM)

The 3D points are triangulated and gridded in 2.5D space to create a 2.5D Digital Elevation Model (a raster). Every pixel in the raster carries latitude, longitude, and elevation information. Interpolation techniques like IDW (Inverse Distance Weighting) are often used for the 3D point cloud to 2.5D grid/raster conversion, as sketched below.
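
A minimal sketch of the IDW conversion, assuming the dense cloud is an (N, 3) array of x, y, z coordinates in a projected (metre-based) system; the grid resolution, neighbour count, and power are illustrative defaults.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_dem(cloud, grid_res=1.0, k=8, power=2.0):
    # Rasterize an (N, 3) x/y/z point cloud into a 2.5D elevation grid:
    # each cell gets the inverse-distance-weighted mean of the elevations
    # of its k nearest cloud points.
    xy, z = cloud[:, :2], cloud[:, 2]
    x_min, y_min = xy.min(axis=0)
    x_max, y_max = xy.max(axis=0)
    gx, gy = np.meshgrid(np.arange(x_min, x_max, grid_res),
                         np.arange(y_min, y_max, grid_res))
    cells = np.column_stack([gx.ravel(), gy.ravel()])

    tree = cKDTree(xy)
    dist, idx = tree.query(cells, k=k)
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    dem = (weights * z[idx]).sum(axis=1) / weights.sum(axis=1)
    return dem.reshape(gx.shape)

# Toy cloud: 10,000 noisy points over a gentle slope.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, (10_000, 2))
cloud = np.column_stack([xy, 0.1 * xy[:, 0] + rng.normal(0.0, 0.2, 10_000)])
print(idw_dem(cloud).shape)  # one elevation value per raster cell
```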

Orthomosaic (HD Maps)

Each photo is orthorectified using the DEM. The orthorectification step involves creating a visibility (occlusion) map with respect to each image; these maps tell us which pixels are visible and which are occluded (not visible) from a particular image. Only visible pixels are selected and their colour values extracted. The orthorectified, occlusion-free photos are then mosaiced together to create a large HD map.
Once the processing is over, post-processing techniques such as image blending and colour/contrast adjustment are applied to remove the seam lines present at the boundaries between images. Blending makes the colour uniform across the mosaic and removes artifacts; a feathering sketch is shown below.
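
A feathering sketch of the blending idea: each tile's contribution is weighted by its pixels' distance from the tile border, so colours transition smoothly across the seam instead of jumping. Production pipelines often use multi-band blending instead; the function and mask convention here are illustrative.

```python
import cv2
import numpy as np

def feather_blend(img_a, img_b, mask_a, mask_b):
    # Blend two overlapping orthophotos. Masks are uint8 images where
    # non-zero marks the valid (non-empty) pixels of each tile.
    # A pixel's weight grows with its distance from the tile border.
    w_a = cv2.distanceTransform(mask_a, cv2.DIST_L2, 5).astype(np.float32)
    w_b = cv2.distanceTransform(mask_b, cv2.DIST_L2, 5).astype(np.float32)
    total = w_a + w_b
    total[total == 0] = 1.0  # avoid division by zero outside both tiles
    w_a = (w_a / total)[..., None]
    w_b = (w_b / total)[..., None]
    return (img_a * w_a + img_b * w_b).astype(np.uint8)
```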

Drone Flight Training & Data Analysis