Sayan Deb Sarkar

I'm a first-year PhD student at Stanford University, advised by Prof. Iro Armeni in the Gradient Spaces Group, which is part of the Stanford Vision Lab (SVL).

Before starting my PhD, I was a CS master's student at ETH Zürich supervised by Prof. Marc Pollefeys, working on aligning real-world 3D environments from multi-modal data. I graduated with a Bachelor's in Information Technology from Manipal University, India, where I worked on face recognition and medical imaging problems.

In 2020-21, I had a wonderful time working with Shreyas Hampali, Sinisa Stekovic and Mahdi Rad at Prof. Vincent Lepetit's lab on hand-object pose estimation and Monte Carlo Scene Search for 3D scene understanding. I view them as my mentors as I entered research, and I strive to learn from them.

I am always looking for research collaborations, so get in touch if you have something relevant. If you're around the Bay Area, feel free to reach out for a cup of coffee!

Email  /  CV  /  Google Scholar  /  Github  /  Twitter  /  LinkedIn

News
Research

My research interests lie at the intersection of Computer Vision and Machine Learning, specifically in multimodal data representations for 3D scene understanding.

CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv | Project Page | Video | Code
Computer Vision and Pattern Recognition (CVPR), 2025
🏆 Highlight (top 3%)

Cross-modal alignment method for 3D scenes that learns a unified, modality-agnostic embedding space, enabling scene-level alignment without semantic annotations.

SGAligner: 3D Scene Alignment with Scene Graphs
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv | Project Page | Video | Code
International Conference on Computer Vision (ICCV), 2023

3D scene graph alignment that is robust to in-the-wild scenarios, powering point cloud registration and map integration.

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2022
🏆 Oral (top 4.2%)
arXiv | Project Page | Video | Code

Efficient network for joint two-hand and object pose estimation in complex interactions, paired with the new H2O-3D dataset of two-hand interaction with YCB objects.

Monte Carlo Scene Search for 3D Scene Understanding
Shreyas Hampali*, Sinisa Stekovic*, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2021
arXiv | Project Page | Video | Code

Monte Carlo Tree Search (MCTS) based analysis-by-synthesis method to recover the complete scene (3D layout + objects) from a noisy RGB-D scan.

General 3D Room Layout from a Single View by Render-and-Compare
Sinisa Stekovic, Shreyas Hampali, Mahdi Rad, Sayan Deb Sarkar, Friedrich Fraundorfer, Vincent Lepetit
European Conference on Computer Vision (ECCV), 2020
arXiv | Project Page | Video | Code

3D layout estimation from a single perspective view that recovers complex non-cuboid layouts by solving a constrained discrete optimization problem.

Course Projects

Ray Tracing
Computer Graphics Rendering Competition, Autumn Semester 2022

Implemented a ray tracer on the Nori framework with features such as advanced camera models, participating media, photon mapping, and the Disney BRDF.

Misc

  • Workshop Organisation: CV4AEC@CVPR 2023, 2024
  • Conference Review: CVPR, ICCV, ECCV
  • Teaching
    Teaching Assistant (Lead), Computer Vision For The Built Environment, Winter 2025
