2025
ACM MM 2025
MultiRef: Controllable Image Generation with Multiple Visual References
- Ruoxi Chen, Dongping Chen, Siyuan Wu, Sinan Wang, Shiyun Lang, Petr Sushko, Gaoyang Jiang, Yao Wan, Ranjay Krishna
- Accepted by ACM MM 2025
- [arXiv] [Hugging Face]
- Research Area: Multimodal AIGC, Controllable Image Generation, Benchmarking and Evaluation
- Key Points: Developed RefBlend, a data engine for synthesizing complex multi-reference image-text pairs; compared the performance of agentic frameworks with that of end-to-end models.
- Key Contributions: Curated the MultiRef dataset and benchmark; managed its end-to-end open-source release on Hugging Face.
- Project Period: March 2025 - June 2025
Preprints
Reinforced Visual Perception with Tools
- Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, Ranjay Krishna
- [arXiv]
- Research Area: MLLM, Reinforcement Learning, Visual Reasoning, Tool Use
- Key Points: Integrated GRPO to enable MLLMs to autonomously use visual tools for complex reasoning; developed a novel RL-based training pipeline for tool-augmented perception.
- Key Contributions: Contributed to baseline reproduction and dataset construction to support effective RL-driven training for visual tool use.
- Project Period: January 2025 - May 2025