HomographyNet: Supervised & Unsupervised Deep Homography Estimation

Project type: University course project

Date: February 2026

Location: Worcester, MA

Tech stack: PyTorch, OpenCV, NumPy, CNN, Spatial Transformer Networks, Tensor DLT, MSCOCO Dataset

A deep learning approach to homography estimation that replaces the entire classical panorama-stitching pipeline (corner detection, ANMS, feature description, matching, and RANSAC) with a single end-to-end CNN. Implemented two variants: a supervised model trained on synthetic ground-truth homographies, and an unsupervised model that learns directly from photometric consistency without any homography labels. Built the full data generation pipeline, network architectures, and training loop from scratch using image patches sampled from MSCOCO.

Highlights

Built a synthetic data generation pipeline from MSCOCO images: random patch extraction with bounded "active region" sampling to guarantee post-warp validity, random corner perturbation in [−ρ, ρ] plus translation, and inverse-homography warping (via cv2.getPerspectiveTransform + cv2.warpPerspective) to extract the second patch without introducing black pixels or masks.
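Adopted the 4-point homography parameterization (H4Pt = C_B − C_A), which regresses the displacements of the four patch corners instead of the 9 entries of H directly. The original HomographyNet paper motivates this choice by noting that the entries of H mix rotational and translational terms of very different scale, which makes direct regression hard to optimize.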
Supervised model: Implemented a VGG-style CNN that takes depth-stacked patch pairs (Mₚ × Nₚ × 2K) and regresses the 8-D corner displacement vector. Trained with L2 loss against the ground-truth H4Pt.
Unsupervised model: Implemented the photometric variant with two custom differentiable layers:

Tensor DLT — converts the predicted 4-point homography and the source patch corners into the full 3×3 homography matrix in a differentiable manner, enabling gradient flow through the geometric transform.
Spatial Transformer Network — a differentiable bilinear-interpolation warping layer that applies the estimated H to patch P_A.
Trained with the photometric loss ℓ = ‖w(P_A, H4Pt) − P_B‖₁ — no homography labels used during training, only the warped-vs-target patch reconstruction error.
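
The Tensor DLT step described above can be sketched in PyTorch as follows. This is a minimal illustration rather than the project's implementation: the function name `tensor_dlt`, the (x, y) corner ordering, and the choice to fix h33 = 1 and solve the resulting 8×8 linear system are assumptions made for the example.

```python
import torch

def tensor_dlt(corners_a: torch.Tensor, h4pt: torch.Tensor) -> torch.Tensor:
    """Batched, differentiable 4-point -> 3x3 homography (h33 fixed to 1).

    corners_a: (B, 4, 2) source corners as (x, y) pairs.
    h4pt:      (B, 8) predicted corner displacements (dx1, dy1, ..., dx4, dy4).
    """
    corners_b = corners_a + h4pt.reshape(-1, 4, 2)
    x, y = corners_a[..., 0], corners_a[..., 1]   # each (B, 4)
    u, v = corners_b[..., 0], corners_b[..., 1]
    zeros, ones = torch.zeros_like(x), torch.ones_like(x)

    # Each correspondence (x, y) -> (u, v) gives two linear equations
    # in the eight unknowns h11, h12, h13, h21, h22, h23, h31, h32.
    rows_u = torch.stack([x, y, ones, zeros, zeros, zeros, -u * x, -u * y], dim=-1)
    rows_v = torch.stack([zeros, zeros, zeros, x, y, ones, -v * x, -v * y], dim=-1)
    A = torch.cat([rows_u, rows_v], dim=1)        # (B, 8, 8)
    b = torch.cat([u, v], dim=1).unsqueeze(-1)    # (B, 8, 1)

    # torch.linalg.solve is differentiable, so gradients reach h4pt.
    h8 = torch.linalg.solve(A, b).squeeze(-1)     # (B, 8)
    h9 = torch.cat([h8, torch.ones_like(h8[:, :1])], dim=1)
    return h9.reshape(-1, 3, 3)
```

In a pipeline like the one described, the returned matrix would feed the warping layer, and the L1 difference between the warped P_A and P_B would serve as the loss. If the warping layer works in normalized coordinates, H would additionally need to be conjugated by the pixel-to-normalized scaling matrix; that step is not shown here.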

Demonstrated the classical-to-learned progression end to end: the task that takes 6 hand-engineered stages in MyAutoPano (Harris + ANMS + descriptors + ratio-test matching + RANSAC + DLT) collapses into a single forward pass of a CNN.
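
The corner sampling and 4-point labelling from the first two highlights can be sketched in NumPy as follows. This is an illustrative sketch, not the project's code: the function name `sample_corner_pair`, the 128-pixel patch size, ρ = 32, and the `max_shift` translation bound are all assumed values chosen for the example.

```python
import numpy as np

def sample_corner_pair(img_h, img_w, patch=128, rho=32, max_shift=16, rng=None):
    """Sample patch corners C_A, perturbed corners C_B, and the H4Pt label."""
    rng = np.random.default_rng() if rng is None else rng

    # Active region: keep the patch far enough from the border that every
    # perturbed corner (at most rho + max_shift away) stays inside the image.
    margin = rho + max_shift
    if img_w - patch - margin <= margin or img_h - patch - margin <= margin:
        raise ValueError("image too small for this patch size and margin")
    x0 = int(rng.integers(margin, img_w - patch - margin))  # high is exclusive
    y0 = int(rng.integers(margin, img_h - patch - margin))

    corners_a = np.array(
        [[x0, y0], [x0 + patch, y0], [x0 + patch, y0 + patch], [x0, y0 + patch]],
        dtype=np.float32,
    )

    # Independent per-corner jitter in [-rho, rho] plus one shared translation.
    jitter = rng.integers(-rho, rho + 1, size=(4, 2))
    shift = rng.integers(-max_shift, max_shift + 1, size=(1, 2))
    corners_b = corners_a + (jitter + shift).astype(np.float32)

    # 8-D regression target: H4Pt = C_B - C_A, flattened as (dx1, dy1, ...).
    h4pt = (corners_b - corners_a).reshape(-1)
    return corners_a, corners_b, h4pt
```

Under the same assumptions, the second patch would then come from computing `H = cv2.getPerspectiveTransform(corners_a, corners_b)`, warping the image with `cv2.warpPerspective(img, np.linalg.inv(H), (img_w, img_h))`, and cropping at the `corners_a` location. Because every perturbed corner lies inside the image, that crop should contain no black border pixels.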

Figure: comparison of supervised, ground-truth, and classical results
