Sayan Deb Sarkar
I'm a 1st-year PhD student at Stanford University in the Gradient Spaces Group,
advised by Prof. Iro Armeni,
part of the Stanford Vision Lab (SVL).
Before starting my PhD, I was a CS master's student at ETH Zürich supervised by Prof. Marc Pollefeys, working on
aligning real-world 3D environments from multimodal data. I graduated with a Bachelor's in
Information Technology from Manipal University, India, where I spent time working on face recognition and medical imaging problems.
In 2020-21, I spent a wonderful time working with Shreyas Hampali,
Sinisa Stekovic and Mahdi Rad at
Prof. Vincent Lepetit's
lab on hand-object pose estimation and Monte Carlo scene search for 3D scene understanding.
I view them as my mentors in entering research and strive to learn from them.
I am always looking for research collaborations; get in touch if you have something relevant.
If you're around the Bay Area, feel free to reach out for a cup of coffee!
Email  / 
CV  / 
Google Scholar  / 
Github  / 
Twitter  / 
LinkedIn
Research
My research interests lie at the intersection
of Computer Vision and Machine Learning, specifically in multimodal data representations for 3D scene understanding.
CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv |
Project Page |
Video |
Code
Computer Vision and Pattern Recognition (CVPR), 2025
🏆 Highlight (top 3%)
Cross-modal alignment method for 3D scenes that learns a unified, modality-agnostic embedding space, enabling scene-level alignment without semantic annotations.
SGAligner: 3D Scene Alignment with Scene Graphs
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv |
Project Page |
Video |
Code
International Conference on Computer Vision (ICCV), 2023
3D scene graph alignment that is robust to in-the-wild scenarios, powering point cloud registration and map integration.
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2022
🏆 Oral (top 4.2%)
arXiv |
Project Page |
Video |
Code
Efficient network for joint two-hand and object pose estimation in complex interactions, paired with the new H2O-3D dataset of two-hand interaction with YCB objects.
Monte Carlo Scene Search for 3D Scene Understanding
Shreyas Hampali*, Sinisa Stekovic*, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2021
arXiv |
Project Page |
Video |
Code
Monte Carlo Tree Search (MCTS)-based analysis-by-synthesis method to recover a complete scene (3D layout + objects) from a noisy RGB-D scan.
General 3D Room Layout from a Single View by Render-and-Compare
Sinisa Stekovic, Shreyas Hampali, Mahdi Rad, Sayan Deb Sarkar, Friedrich Fraundorfer, Vincent Lepetit
European Conference on Computer Vision (ECCV), 2020
arXiv |
Project Page |
Video |
Code
3D layout estimation from a single perspective view that recovers complex non-cuboid layouts by solving a constrained discrete optimization problem.
Misc
Workshop Organisation: CV4AEC@CVPR 2023, 2024
Conference Review: CVPR, ICCV, ECCV