Deep VIO
Project type
Deep learning-based visual-inertial odometry (VIO)
Date
April 2026
Location
Worcester, MA
Tech stack: PyTorch, ResNet-18, Bidirectional LSTM, FiLM Conditioning, Blender 4.5, 3D Gaussian Splatting, Python
A deep learning system for 6-DoF pose estimation that fuses monocular camera images with IMU data to estimate motion in real time. Built an end-to-end pipeline from synthetic data generation to model deployment, with three model variants: vision-only (DeepVO), IMU-only (DeepIO), and a fused visual-inertial model (DeepVIO). Training all three makes it possible to isolate each modality's contribution. Addressed the modality imbalance problem in multi-modal learning through FiLM (feature-wise linear modulation) conditioning and a two-stage training strategy.
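As a rough illustration of the FiLM conditioning described above, the NumPy sketch below shows IMU features producing a per-channel scale and shift that modulate a visual feature map. The tensor shapes, the single linear projection, the random weights, and the name `film_modulate` are assumptions made for this example; the project itself uses PyTorch, and its actual layer layout is not reproduced here.

```python
import numpy as np

def film_modulate(visual, imu, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: visual * gamma(imu) + beta(imu), channel by channel.

    visual: (batch, channels, height, width) visual feature map
    imu:    (batch, imu_dim) IMU feature vector
    w_*:    (imu_dim, channels) projection weights (hypothetical)
    b_*:    (channels,) projection biases (hypothetical)
    """
    gamma = imu @ w_gamma + b_gamma  # (batch, channels) scale
    beta = imu @ w_beta + b_beta     # (batch, channels) shift
    # Broadcast the per-channel parameters over the spatial dimensions.
    return visual * gamma[:, :, None, None] + beta[:, :, None, None]

rng = np.random.default_rng(0)
batch, channels, height, width, imu_dim = 2, 8, 4, 4, 6
visual = rng.standard_normal((batch, channels, height, width))
imu = rng.standard_normal((batch, imu_dim))

# With zero weights, a scale bias of one, and a shift bias of zero,
# the layer passes the visual features through unchanged.
w_zero = np.zeros((imu_dim, channels))
identity = film_modulate(visual, imu, w_zero, np.ones(channels),
                         w_zero, np.zeros(channels))

# Small random weights let the IMU features nudge each channel.
w_gamma = 0.1 * rng.standard_normal((imu_dim, channels))
w_beta = 0.1 * rng.standard_normal((imu_dim, channels))
modulated = film_modulate(visual, imu, w_gamma, np.ones(channels),
                          w_beta, np.zeros(channels))
```

Unlike concatenation, where the downstream layers are free to ignore one input, this form makes every visual channel pass through an IMU-dependent scale and shift.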
Highlights
Built a synthetic data generation pipeline in Blender 4.5 with 3D Gaussian Splatting rendering, producing synchronized camera-IMU sequences across 4 trajectory families and multiple ground textures.
Diagnosed data quality as the primary bottleneck: 54% of the initial training images were too featureless for visual odometry. Curated the dataset using Laplacian-variance image-sharpness analysis.
Replaced a from-scratch CNN with a pretrained ResNet-18 backbone, cutting test loss by 90%.
Designed a FiLM-based fusion module where IMU features generate per-channel scale and shift parameters that modulate visual features, replacing naive concatenation.
Introduced a two-stage training strategy (visual pretraining → frozen-encoder fusion → joint finetuning) that achieved balanced modality fusion (gate α ≈ 0.43).
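The Laplacian-variance curation step mentioned in the highlights can be sketched as follows. This is a minimal NumPy version, assuming grayscale images, a 4-neighbour Laplacian evaluated on interior pixels only, and a hypothetical threshold of 100.0; the threshold actually used in the project is not stated. The names `laplacian_variance` and `is_featureless` are illustrative.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour discrete Laplacian over interior pixels."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def is_featureless(gray, threshold):
    """Flag images whose Laplacian variance falls below the threshold."""
    return laplacian_variance(gray) < threshold

THRESHOLD = 100.0  # hypothetical cut-off, not taken from the project

# A uniform image has no edges, so its Laplacian is zero everywhere.
flat = np.full((32, 32), 128.0)

# A checkerboard has strong edges, so its Laplacian variance is large.
yy, xx = np.indices((32, 32))
checker = np.where((yy // 4 + xx // 4) % 2 == 0, 0.0, 255.0)

for name, image in [("flat", flat), ("checker", checker)]:
    print(name, laplacian_variance(image), is_featureless(image, THRESHOLD))
```

With OpenCV available, `cv2.Laplacian(gray, cv2.CV_64F).var()` is a commonly used equivalent, though its border handling differs from this sketch.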
Results
92% reduction in test loss (0.3017 → 0.0237) across pipeline iterations.
1.5 m average absolute trajectory error (ATE) on test sequences with the ResNet-18 visual model; 3.5 m with the fused model.
First successful visual-inertial fusion in our experiments: the fused model outperformed the vision-only model on the joint metric.
Gate analysis confirmed balanced fusion (α ≈ 0.43), compared with an IMU-dominated baseline (α = 0.07–0.18).
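For reference, the two headline metrics above can be computed along these lines. The sketch assumes ATE is taken as the root-mean-square position error between time-synchronized estimated and ground-truth positions, with no trajectory alignment step; the project's exact ATE definition is not given, and the toy trajectory is invented for illustration.

```python
import numpy as np

def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before

def absolute_trajectory_error(estimated, ground_truth):
    """RMSE of per-frame position error, in the units of the inputs.

    Assumes both trajectories are (N, 3) arrays sampled at the same
    timestamps and expressed in the same frame (no alignment is done).
    """
    est = np.asarray(estimated, dtype=np.float64)
    gt = np.asarray(ground_truth, dtype=np.float64)
    errors = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

# Test-loss figures quoted in the results.
reduction = relative_reduction(0.3017, 0.0237)
print(f"{reduction:.1%}")  # prints 92.1%

# Toy example: a straight-line path with a constant 0.5 m lateral offset.
gt = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
est = gt + np.array([0.0, 0.5, 0.0])
ate = absolute_trajectory_error(est, gt)
print(ate)  # prints 0.5
```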