1. Objective

BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

Demo

Why BundleSDF?

Near real-time (10 Hz)
Lightweight neural implicit model, the "Neural Object Field"
Handles novel, unknown, dynamic objects

Can the idea behind BundleSDF be applied to map building?

Settings of BundleSDF

Uses an RGB-D camera
Assumes a rigid object
Requires an object segmentation mask for the first frame

2. Approach

2.1. Pose Tracking

2.1.1. Object Extraction

2.1.2. RANSAC Pose Estimation

2.1.3. Memory Pool

Stores keyframes
Reduces long-term pose drift by using past keyframes
Keyframe: an RGB-D image together with its estimated pose
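
As an illustration of the memory pool as a data structure, here is a minimal sketch. The class names `Keyframe` and `MemoryPool`, the object-to-camera pose convention, and the overlap proxy (cosine between camera viewing directions in the object frame) are all assumptions made for this example and are not taken from BundleSDF itself.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Keyframe:
    rgb: np.ndarray    # (H, W, 3) color image
    depth: np.ndarray  # (H, W) depth image
    pose: np.ndarray   # (4, 4) estimated object-to-camera transform (assumed convention)


@dataclass
class MemoryPool:
    keyframes: list = field(default_factory=list)

    def add(self, kf: Keyframe) -> None:
        self.keyframes.append(kf)

    @staticmethod
    def view_dir(pose: np.ndarray) -> np.ndarray:
        # Camera optical axis (+z) expressed in the object frame: R^T @ [0, 0, 1].
        return pose[:3, :3].T @ np.array([0.0, 0.0, 1.0])

    def select_top_k(self, pose: np.ndarray, k: int) -> list:
        # Overlap proxy (assumption): cosine between viewing directions.
        query = self.view_dir(pose)
        scores = [float(query @ self.view_dir(kf.pose)) for kf in self.keyframes]
        order = np.argsort(scores)[::-1][:k]
        return [self.keyframes[i] for i in order]
```

Keeping poses inside the keyframes means a later optimization step can overwrite `kf.pose` in place, so corrections propagate to every future lookup.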

2.1.4. Pose Graph Optimization

1. Select the K memory frames with the largest viewing overlap with the new frame
2. Optimize the whole pose graph with the Gauss-Newton algorithm and iterative re-weighting
3. Update the poses of both the new keyframe and the K memory frames
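
To make "Gauss-Newton with iterative re-weighting" concrete, the sketch below applies it to a much smaller toy problem: robustly fitting a single 2D rigid transform to point correspondences that contain outliers. This is not the BundleSDF pose graph; the Huber weighting, the function names, and the parameters `iters` and `delta` are assumptions chosen for illustration.

```python
import numpy as np


def huber_weights(r_norm, delta):
    # IRLS weight for the Huber loss: 1 inside the threshold, delta/|r| outside.
    w = np.ones_like(r_norm)
    mask = r_norm > delta
    w[mask] = delta / r_norm[mask]
    return w


def gauss_newton_irls(src, dst, iters=20, delta=0.1):
    """Estimate (theta, t) such that R(theta) @ src_i + t is close to dst_i."""
    theta, t = 0.0, np.zeros(2)
    for _ in range(iters):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        dR = np.array([[-s, -c], [c, -s]])   # derivative of R w.r.t. theta
        res = src @ R.T + t - dst            # (N, 2) residuals
        # Re-weighting step: recompute robust weights from current residuals.
        w = huber_weights(np.linalg.norm(res, axis=1), delta)
        # Jacobian of each residual w.r.t. (theta, tx, ty): shape (N, 2, 3).
        J = np.zeros((len(src), 2, 3))
        J[:, :, 0] = src @ dR.T
        J[:, 0, 1] = 1.0
        J[:, 1, 2] = 1.0
        # Weighted normal equations: (J^T W J) dx = -J^T W r.
        H = np.einsum("n,nij,nik->jk", w, J, J)
        g = np.einsum("n,nij,ni->j", w, J, res)
        dx = np.linalg.solve(H, -g)
        theta += dx[0]
        t += dx[1:]
    return theta, t
```

The same loop structure carries over to a pose graph: the state becomes the stacked poses of all selected frames, and each edge contributes its own residual block and weight.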

2.2. 3D Reconstruction - Neural Object Field

2.3. Rendering

Efficient Ray Sampling & Hybrid SDF Modeling

Sample points only within a certain range of the model surface. Space is divided into three regions:

Yellow: uncertain free space
Red: empty space
Blue: near-surface space
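
One way to picture the three regions is to label sample depths along a single camera ray. The sketch below is an interpretation, not the authors' code: it assumes that rays without a reliable observation (outside the object mask or with missing depth) fall into uncertain free space, that samples well in front of the measured depth are empty space, and that samples within a truncation distance `trunc` of the measured depth are near-surface. Samples far behind the surface are marked as skipped. All names and thresholds are hypothetical.

```python
import numpy as np

UNCERTAIN, EMPTY, NEAR_SURFACE, SKIPPED = 0, 1, 2, 3


def classify_ray_samples(z_vals, depth, in_mask, trunc):
    """Label sample depths along one ray (assumed region definitions)."""
    z_vals = np.asarray(z_vals, dtype=float)
    labels = np.full(z_vals.shape, SKIPPED)
    if not in_mask or depth <= 0:
        # No reliable depth on this ray: treat every sample as uncertain.
        labels[:] = UNCERTAIN
        return labels
    front = z_vals < depth - trunc    # clearly in front of the surface
    behind = z_vals > depth + trunc   # hidden behind the surface
    labels[front] = EMPTY
    labels[~front & ~behind] = NEAR_SURFACE
    return labels
```

For example, `classify_ray_samples([0.5, 1.0, 1.5], depth=1.0, in_mask=True, trunc=0.05)` labels the first sample as empty, the second as near-surface, and leaves the third skipped.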

Neural Object Field Training Loss

3. Metrics

6-DoF Pose Estimation

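ADD(-S): Average distance between the model points transformed by the estimated pose and by the ground truth pose. ADD compares corresponding points; ADD-S, intended for symmetric objects, compares each point with its closest point instead.

$\text{ADD}(\hat{T}, T) = \frac{1}{N} \sum_{i=1}^N \| (\hat{R} x_i + \hat{t}) - (R x_i + t) \|_2$

$\text{ADD-S}(\hat{T}, T) = \frac{1}{N} \sum_{i=1}^N \min_{1 \le j \le N} \| (\hat{R} x_i + \hat{t}) - (R x_j + t) \|_2$

Here $x_1, \dots, x_N$ are the model points, $\hat{T} = (\hat{R}, \hat{t})$ is the estimated pose, and $T = (R, t)$ is the ground truth pose.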
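
A minimal NumPy sketch of both metrics, following their standard definitions: ADD averages the distance between corresponding model points under the two poses, while ADD-S averages the distance from each estimated point to its nearest ground truth point. Function names are illustrative, and the brute-force pairwise distance is only suitable for small point sets.

```python
import numpy as np


def transform(points, R, t):
    # Apply a rigid transform to an (N, 3) array of points.
    return points @ R.T + t


def add_metric(points, R_est, t_est, R_gt, t_gt):
    # Mean distance between corresponding points under the two poses.
    diff = transform(points, R_est, t_est) - transform(points, R_gt, t_gt)
    return float(np.linalg.norm(diff, axis=1).mean())


def add_s_metric(points, R_est, t_est, R_gt, t_gt):
    # Mean distance from each estimated point to its nearest ground truth point.
    est = transform(points, R_est, t_est)
    gt = transform(points, R_gt, t_gt)
    d = np.linalg.norm(est[:, None, :] - gt[None, :, :], axis=2)  # (N, N)
    return float(d.min(axis=1).mean())
```

For an object with rotational symmetry, an estimate that is off by a symmetry rotation gives a large ADD but an ADD-S near zero, which is why the symmetric variant is reported alongside ADD.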

3D Reconstruction

Chamfer Distance: Symmetric average nearest-neighbor distance between the vertices of the reconstructed mesh $\hat{M}$ and the ground truth mesh $M$

$\text{CD}(M, \hat{M}) = \frac{1}{|M|} \sum_{x \in M} \min_{y \in \hat{M}} \|x - y\|_2 + \frac{1}{|\hat{M}|} \sum_{y \in \hat{M}} \min_{x \in M} \|x - y\|_2$
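
The formula can be evaluated directly on two vertex arrays. The sketch below is a brute-force NumPy version with an illustrative function name; it builds the full pairwise distance matrix, so it is meant for small meshes or sampled point sets.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # (N, M) pairwise
    # Mean nearest-neighbor distance a -> b plus mean nearest-neighbor distance b -> a.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

For larger point sets, a spatial index such as `scipy.spatial.cKDTree` can replace the dense matrix with nearest-neighbor queries.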