Sayan Deb Sarkar

I'm a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). In summer '25, I interned with the Microsoft Spatial AI Lab, working on efficient video understanding in spatial context.

Before starting my PhD, I was a CS master's student at ETH Zürich, supervised by Prof. Marc Pollefeys, working on aligning real-world 3D environments from multi-modal data. I graduated with a Bachelor's in Information Technology from Manipal University, India, where I spent time working on face recognition and medical imaging problems.

In 2020-21, I spent a wonderful time working with Shreyas Hampali and Mahdi Rad at Prof. Vincent Lepetit's lab on hand-object pose estimation and Monte Carlo Scene Search for 3D scene understanding. I view them as the mentors who introduced me to research, and I strive to learn from them.

My research interests are in multimodal 3D scene understanding and interactive editing. I am always looking for research collaborations; get in touch if you have something relevant. If you're around the Bay Area, feel free to reach out for a cup of coffee!

Email  /  CV  /  Google Scholar  /  Github  /  Twitter  /  LinkedIn

News
Research

My research interests lie at the intersection of Computer Vision and Machine Learning, specifically in multimodal data representations for spatial understanding.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
Sayan Deb Sarkar, Sinisa Stekovic, Vincent Lepetit, Iro Armeni
arXiv | Project Page | Video | Code
Neural Information Processing Systems (NeurIPS), 2025

A training-free method that steers a pre-trained generative rectified flow model with differentiable guidance for robust, geometry-aware 3D appearance transfer across shapes and modalities.

SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment
Binod Singh*, Sayan Deb Sarkar*, Iro Armeni
arXiv | Project Page | Video | Code
arXiv 2025

3D Scene Graph alignment framework across modalities using open-vocabulary cues and learned joint embeddings, achieving robust performance under noise and low overlap.
Master Student Project.

CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv | Project Page | Video | Code
Computer Vision and Pattern Recognition (CVPR), 2025
🏆 Highlight (top 3%)
Featured: Open Robotics

Cross-modal alignment method for 3D scenes that learns a unified, modality-agnostic embedding space, enabling scene-level alignment without semantic annotations.

SGAligner: 3D Scene Alignment with Scene Graphs
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv | Project Page | Video | Code
International Conference on Computer Vision (ICCV), 2023
Featured: RSIP Computer Vision Magazine, Learn OpenCV Blog

3D scene graph alignment that is robust to in-the-wild scenarios, powering point cloud registration and map integration.

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2022
🏆 Oral (top 4.2%)
arXiv | Project Page | Video | Code

Efficient network for joint two-hand and object pose estimation in complex interactions, paired with the new H2O-3D dataset of two-hand interaction with YCB objects.

Monte Carlo Scene Search for 3D Scene Understanding
Shreyas Hampali*, Sinisa Stekovic*, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2021
arXiv | Project Page | Video | Code

Monte Carlo Tree Search (MCTS)-based analysis-by-synthesis method to recover a complete scene (3D layout + objects) from a noisy RGB-D scan.

General 3D Room Layout from a Single View by Render-and-Compare
Sinisa Stekovic, Shreyas Hampali, Mahdi Rad, Sayan Deb Sarkar, Friedrich Fraundorfer, Vincent Lepetit
European Conference on Computer Vision (ECCV), 2020
arXiv | Project Page | Video | Code

3D layout estimation from a single perspective view that recovers complex non-cuboid layouts by solving a constrained discrete optimization problem.

Course Projects

Ray Tracing
Computer Graphics Rendering Competition, Autumn Semester 2022

Implemented a ray tracer on the Nori framework with features such as advanced camera models, participating media, photon mapping, and the Disney BRDF.

Misc

  • Workshop Organisation: CV4AEC@CVPR 2023, 2024
  • Conference Review: CVPR, ICCV, ECCV, NeurIPS, ICRA
  • Teaching
    Teaching Assistant (Lead), Computer Vision For The Built Environment, Winter 2025
