Information Density vs. Model Scale: Tradeoffs for Downstream Reasoning
Scaling Laws, Data-Efficient Learning, Foundation Models, Tokenization
Investigated architectural bottlenecks in representation learning that limit performance on complex reasoning tasks. Through systematic ablation, identified tokenization granularity as the primary constraint: a 4× finer encoding (144 vs. 36 tokens per image) outweighed both model depth and dataset scale for fine-grained problem-solving.
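For context, the 4× figure follows directly from the patch grid: halving the patch side quadruples tokens per image. A minimal illustrative sketch (not the project's code; the 96 px input and patch sizes are assumed for illustration only):

```python
# Illustrative only: how patch granularity sets tokens-per-image for a
# ViT-style tokenizer. Image and patch sizes below are assumed, not taken
# from the paper.

def tokens_per_image(image_size: int, patch_size: int) -> int:
    """Count of non-overlapping square patches (tokens) per image."""
    assert image_size % patch_size == 0, "patch size must divide image size"
    return (image_size // patch_size) ** 2

coarse = tokens_per_image(96, 16)  # 6x6 grid  -> 36 tokens/image
fine = tokens_per_image(96, 8)     # 12x12 grid -> 144 tokens/image
print(coarse, fine, fine / coarse)  # 36 144 4.0 (the 4x finer encoding)
```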
Paper
Slides