Publications

2025

ACM MM 2025

MultiRef: Controllable Image Generation with Multiple Visual References

  • Ruoxi Chen, Dongping Chen, Siyuan Wu, Sinan Wang, Shiyun Lang, Petr Sushko, Gaoyang Jiang, Yao Wan, Ranjay Krishna
  • Accepted at ACM MM 2025
  • [arXiv] [Hugging Face]
  • Research Area: Multimodal AIGC, Controllable Image Generation, Evaluation Benchmarking
  • Key Points: Developed RefBlend, a data engine for synthesizing complex multi-reference image-text pairs; analyzed the performance of agentic frameworks vs. end-to-end models.
  • Key Contributions: Curated the MultiRef dataset and benchmark; managed end-to-end open-source hosting on Hugging Face.
  • Project Period: March 2025 - June 2025

Preprints

Reinforced Visual Perception with Tools

  • Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, Ranjay Krishna
  • [arXiv]
  • Research Area: MLLM, Reinforcement Learning, Visual Reasoning, Tool Use
  • Key Points: Integrated GRPO to enable MLLMs to autonomously use visual tools for complex reasoning; developed a novel RL-based training pipeline for tool-augmented perception.
  • Key Contributions: Contributed to baseline reproduction and dataset construction to enable effective RL-driven training for visual tool use.
  • Project Period: January 2025 - May 2025