Sayan Deb Sarkar
I'm a 2nd-year PhD student at Stanford University in the Gradient Spaces Group (part of the Stanford Vision Lab, SVL),
advised by Prof. Iro Armeni. In summer '25, I interned with the Microsoft Spatial AI Lab, working on efficient video understanding in spatial contexts.
Before starting my PhD, I was a CS master's student at ETH Zürich, supervised by Prof. Marc Pollefeys, working on
aligning real-world 3D environments from multi-modal data. I graduated with a Bachelor's degree in
Information Technology from Manipal University, India, where I worked on face recognition and medical imaging problems.
In 2020-21, I spent a wonderful time working with Shreyas Hampali and Mahdi Rad in
Prof. Vincent Lepetit's
lab on hand-object pose estimation and Monte Carlo scene search for 3D scene understanding.
I consider them my mentors from when I was entering research, and I strive to keep learning from them.
My research interests are in multimodal 3D scene understanding and interactive editing. I am always looking for research collaborations, so get in touch if you have something relevant.
If you're around the Bay Area, feel free to reach out for a cup of coffee!
Email / CV / Google Scholar / GitHub / Twitter / LinkedIn
Research
My research interests lie at the intersection of Computer Vision and Machine Learning, specifically in the areas of multimodal data representations for spatial understanding.
GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
Sayan Deb Sarkar, Sinisa Stekovic, Vincent Lepetit, Iro Armeni
arXiv |
Project Page |
Video |
Code
Neural Information Processing Systems (NeurIPS), 2025
A training-free method that steers pre-trained generative rectified flow with differentiable guidance for robust, geometry-aware 3D appearance transfer across shapes and modalities.
SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment
Binod Singh*, Sayan Deb Sarkar*, Iro Armeni
arXiv |
Project Page |
Video |
Code
arXiv 2025
3D Scene Graph alignment framework across modalities using open-vocabulary cues and learned joint embeddings, achieving robust performance under noise and low overlap.
Master's student project.
CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv |
Project Page |
Video |
Code
Computer Vision and Pattern Recognition (CVPR), 2025
🏆 Highlight (top 3%)
Featured: Open Robotics
Cross-modal alignment method for 3D scenes that learns a unified, modality-agnostic embedding space, enabling scene-level alignment without semantic annotations.
SGAligner: 3D Scene Alignment with Scene Graphs
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Dániel Béla Baráth, Iro Armeni
arXiv |
Project Page |
Video |
Code
International Conference on Computer Vision (ICCV), 2023
Featured: RSIP Computer Vision Magazine, Learn OpenCV Blog
3D scene graph alignment that is robust to in-the-wild scenarios, powering point cloud registration and map integration.
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2022
🏆 Oral (top 4.2%)
arXiv |
Project Page |
Video |
Code
Efficient network for joint two-hand and object pose estimation in complex interactions, paired with the new H2O-3D dataset of two-hand interaction with YCB objects.
Monte Carlo Scene Search for 3D Scene Understanding
Shreyas Hampali*, Sinisa Stekovic*, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit
Computer Vision and Pattern Recognition (CVPR), 2021
arXiv |
Project Page |
Video |
Code
Monte Carlo Tree Search (MCTS)-based analysis-by-synthesis method that recovers the complete scene (3D layout and objects) from a noisy RGB-D scan.
General 3D Room Layout from a Single View by Render-and-Compare
Sinisa Stekovic, Shreyas Hampali, Mahdi Rad, Sayan Deb Sarkar, Friedrich Fraundorfer, Vincent Lepetit
European Conference on Computer Vision (ECCV), 2020
arXiv |
Project Page |
Video |
Code
3D layout estimation from a single perspective view that recovers complex, non-cuboid layouts by solving a constrained discrete optimization problem.
Misc
Workshop Organisation: CV4AEC@CVPR 2023, 2024
Conference Review: CVPR, ICCV, ECCV, NeurIPS, ICRA