Vaibhavi Singh
I am a graduate student in Computer Science at NYU Courant, specializing in spatial reasoning and planning for robot learning.
I aim to build learning and representation systems that generalize more broadly and learn faster.
Before NYU, I built spatial rendering and imaging systems at Adobe that power Photoshop, Substance 3D, and Creative Cloud. I then worked at Salesforce and at a healthcare AI startup, where I modeled noisy temporal data for clinical applications.
I am seeking graduate research internships and would love to connect about research opportunities!
Email / LinkedIn / GitHub / Twitter / Writings
|
GVR-Bench: Probing Visual Reasoning Abilities in Vision-Language Models
Geometric Reasoning, Visual Grounding, Spatial Transformations
Developed a systematic evaluation framework to probe the visual reasoning capabilities of Vision-Language Models in deterministic settings. Using a suite of programmatically generated geometric tasks (e.g., precise rotations and spatial translations), demonstrated a critical dissociation between perceptual fidelity and logical execution: state-of-the-art models achieved only 16.8% pixel-level accuracy despite maintaining high perceptual similarity. Our analysis established a formal error taxonomy, classifying failures into geometric imprecision, grounding errors, and hallucinations, and provides empirical evidence that current end-to-end architectures require hybrid spatial computation modules for precise robotic manipulation.
Paper
Code
Slides
|
Tokenization Density vs. Scale: Dissecting Self-Supervised Learning Under Resource Constraints
Vision Transformers, Spatial Tokenization, Self-Supervised Learning, Fine-Grained Recognition
Trained self-supervised DINO (v1) models from scratch on low-resolution (96×96 pixel) images to investigate representation learning under severe resource constraints. Identified spatial tokenization density as the primary architectural bottleneck: switching from 16×16 patches (36 tokens/image) to 8×8 patches (144 tokens/image) improved fine-grained classification accuracy by 6 percentage points, outweighing gains from model depth or dataset scale. Demonstrated that a strategically curated, domain-aligned unlabeled dataset (74K task-relevant samples) delivers superlinear returns relative to generic datasets 10× larger under compute constraints. The results indicate that tokenization granularity sets the ceiling on discriminative capacity for low-resolution inputs, with implications for efficient vision encoders in robotics, where spatial resolution and data efficiency are critical.
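As a quick check on the token counts above: a Vision Transformer that splits an H×W image into non-overlapping P×P patches produces (H/P)·(W/P) patch tokens, not counting the [CLS] token. The minimal sketch below uses a hypothetical helper, `num_patch_tokens`, which is illustrative and not taken from the project's code, to reproduce the 36 and 144 figures for 96×96 inputs.

```python
def num_patch_tokens(height: int, width: int, patch: int) -> int:
    """Count the non-overlapping patch tokens a ViT extracts (excluding [CLS]).

    Hypothetical helper for illustration; assumes square, non-overlapping
    patches and an image size divisible by the patch size.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


# Compare the two patch sizes discussed above on 96x96 images.
for patch in (16, 8):
    tokens = num_patch_tokens(96, 96, patch)
    print(f"{patch}x{patch} patches -> {tokens} tokens/image")
# 16x16 patches -> 36 tokens/image
# 8x8 patches -> 144 tokens/image
```

Halving the patch size quadruples the token count. Because self-attention cost grows quadratically with sequence length, this raises per-layer attention cost roughly 16×, which is the compute trade-off behind the denser tokenization.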
Paper
Slides
|
ICLR 2026: Reviewer, Latent & Implicit Thinking Workshop
CVPR 2026: Reviewer, SynData4CV Workshop
NeurIPS 2025
ACL ARR 2026
|
Machine Learning Engineer
Healthcare AI Startup, India
2024 – 2025
Built clinical risk prediction models (XGBoost, TCN) that achieved an F1 score of 0.87 through feature engineering, SMOTE oversampling for class imbalance, and hyperparameter optimization. Processed sparse electronic medical record (EMR) data for early-stage healthcare applications.
|
Software Engineer II
Salesforce, India
2023 – 2024
Engineered petabyte-scale data ingestion pipelines for Einstein AI and real-time analytics, reducing latency by 30%. Scaled multi-tenant Kubernetes infrastructure on AWS for 200+ microservices.
|
ML Systems Engineer (MTS II)
Adobe, India
2021 – 2023
Optimized heterogeneous compute (CPU/GPU) architectures for Photoshop and Creative Cloud applications, reducing latency for 20M+ users. Extended core C++ image-processing engines to handle complex image analysis and color rendering, ensuring high-throughput performance under strict SLAs.
|
Cloud Infrastructure Engineer (MTS I)
Adobe, India
2019 – 2021
Scaled distributed microservices for Adobe Cloud Platform, optimizing high-throughput request handling for 10M+ daily users. Reduced compute overhead by 12% through system-level performance profiling.
|
M.S. Computer Science (Machine Learning)
New York University, Courant Institute
2025 – 2027 (expected)
GPA: 3.89/4.00
Research focus: spatial reasoning and planning in foundation models
Coursework: Deep Learning (Yann LeCun), Computer Vision (Saining Xie)
|
B.E. Computer Engineering (Hons)
Netaji Subhas Institute of Technology, University of Delhi
2015 – 2019
First Class with Distinction
Graduated in the top 10% of the department
Recipient of the EPFL-Swiss Government scholarship (Scala Days 2019)
Google Summer of Code Mentor, Anita Borg Institute
|
Outside research, I enjoy exercising, sketching, journaling, and reading about psychology.
I love the gym and strength training, colorful sketches, beautiful buildings, writing endlessly, and exploring the limits of human thinking and consciousness.
If you also enjoy any of these things, we should connect!
|
Last updated: April 21, 2026
Website template from Jon Barron