Deep VIO

Project type: Deep learning-based VIO
Date: April 2026
Location: Worcester, MA

Tech stack: PyTorch, ResNet-18, Bidirectional LSTM, FiLM Conditioning, Blender 4.5, 3D Gaussian Splatting, Python
A deep learning system for 6-DoF pose estimation that fuses monocular camera images with IMU data to estimate motion in real time. Built an end-to-end pipeline from synthetic data generation to model deployment, including three model variants: vision-only (DeepVO), IMU-only (DeepIO), and a fused visual-inertial model (DeepVIO). Comparing the three systematically isolates each modality's contribution. Solved the modality imbalance problem in multi-modal learning through FiLM-based feature conditioning and a two-stage training strategy.
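
As a rough illustration of the FiLM-based conditioning described above, the sketch below lets IMU features produce a per-channel scale and shift that modulate the visual features. It uses NumPy rather than PyTorch to stay dependency-light. The class name `FiLMFusion`, the 64-dimensional IMU feature, and the weight initialization are assumptions made for illustration, not the project's actual code; 512 matches the width of ResNet-18's pooled feature vector.

```python
import numpy as np


class FiLMFusion:
    """Hypothetical FiLM layer: IMU features scale and shift visual channels."""

    def __init__(self, imu_dim: int, channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Two linear projections from IMU features: one for scale, one for shift.
        self.w_gamma = rng.normal(0.0, 0.01, size=(imu_dim, channels))
        self.b_gamma = np.ones(channels)   # scale starts at 1 (identity)
        self.w_beta = rng.normal(0.0, 0.01, size=(imu_dim, channels))
        self.b_beta = np.zeros(channels)   # shift starts at 0

    def __call__(self, visual: np.ndarray, imu: np.ndarray) -> np.ndarray:
        # visual: (batch, channels), imu: (batch, imu_dim)
        gamma = imu @ self.w_gamma + self.b_gamma   # per-channel scale
        beta = imu @ self.w_beta + self.b_beta      # per-channel shift
        return gamma * visual + beta


# Demo with assumed sizes: 512 visual channels, 64 IMU features.
rng = np.random.default_rng(1)
visual = rng.normal(size=(2, 512))
imu = rng.normal(size=(2, 64))

film = FiLMFusion(imu_dim=64, channels=512)
fused = film(visual, imu)
print(fused.shape)
```

With an all-zero IMU input this layer reduces to the identity on the visual features, which is one reason the scale bias is initialized to 1. In a PyTorch model the two projections would typically be `nn.Linear` layers applied to the IMU features.
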
Highlights

Built a synthetic data generation pipeline in Blender 4.5 with 3D Gaussian Splatting rendering, producing synchronized camera-IMU sequences across 4 trajectory families and multiple ground textures.
Diagnosed data quality as the primary bottleneck (54% of initial training images were too featureless for visual odometry) and curated the dataset using Laplacian-variance image-sharpness analysis.
Replaced a from-scratch CNN with a pretrained ResNet-18 backbone, cutting test loss by 90%.
Designed a FiLM-based fusion module where IMU features generate per-channel scale and shift parameters that modulate visual features, replacing naive concatenation.
Introduced a two-stage training strategy (visual pretraining → frozen-encoder fusion → joint finetuning) that achieved balanced modality fusion (gate α ≈ 0.43).
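
The Laplacian-variance curation step in the highlights above could be sketched roughly as follows. This is a minimal NumPy version under stated assumptions: the helper names `laplacian_variance` and `keep_textured` are hypothetical, and the threshold of 100.0 is a placeholder, not the value used in the project.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over the image interior."""
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def keep_textured(frames, threshold: float):
    """Return indices of frames whose Laplacian variance meets the threshold."""
    return [i for i, f in enumerate(frames) if laplacian_variance(f) >= threshold]


# Demo: a featureless frame versus a noisy, texture-rich frame.
rng = np.random.default_rng(0)
flat = np.full((32, 32), 128.0)
textured = rng.uniform(0.0, 255.0, size=(32, 32))

kept = keep_textured([flat, textured], threshold=100.0)  # placeholder threshold
print(kept)
```

A uniform image has zero Laplacian response everywhere, so its score is exactly zero, while textured frames score high. In practice the threshold would be chosen by inspecting the score distribution over the dataset.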

Results

92% reduction in test loss (0.3017 → 0.0237) across pipeline iterations.
1.5 m average absolute trajectory error (ATE) on test sequences with the ResNet-18 visual model; 3.5 m with the fused model.
First successful visual-inertial fusion in our experiments: the fused model outperformed the vision-only model on the joint metric.
Gate analysis confirmed balanced fusion (α ≈ 0.43), compared with the IMU-dominated baseline (α = 0.07–0.18).
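
For readers unfamiliar with the ATE figures above, here is one common way to compute the metric: the root-mean-square position error per sequence, averaged over sequences. This sketch assumes the estimated and ground-truth trajectories are already time-synchronized and expressed in the same frame, so it skips the alignment step that many evaluations apply first. The function names and the toy trajectories are illustrative and do not come from the project.

```python
import numpy as np


def ate_rmse(est_xyz, gt_xyz) -> float:
    """RMSE of per-pose position error between two (N, 3) trajectories."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    if est.shape != gt.shape:
        raise ValueError("trajectories must have the same shape")
    err = np.linalg.norm(est - gt, axis=1)  # Euclidean error at each pose
    return float(np.sqrt(np.mean(err ** 2)))


def mean_ate(pairs) -> float:
    """Average ATE over (estimate, ground_truth) trajectory pairs."""
    return float(np.mean([ate_rmse(e, g) for e, g in pairs]))


# Toy data: a straight 1 m path sampled at 50 poses.
t = np.linspace(0.0, 1.0, 50)
gt = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
est_a = gt + np.array([3.0, 4.0, 0.0])  # constant offset of 5 m (3-4-5 triangle)
est_b = gt.copy()                       # perfect estimate

avg = mean_ate([(est_a, gt), (est_b, gt)])
print(round(avg, 3))
```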
