Sinan Wang

2025

MultiRef: Controllable Image Generation with Multiple Visual References

Ruoxi Chen, Dongping Chen, Siyuan Wu, Sinan Wang, Shiyun Lang, Petr Sushko, Gaoyang Jiang, Yao Wan, Ranjay Krishna
Accepted by ACM MM 2025
[arXiv] [Hugging Face]
Research Area: Multimodal AIGC, Controllable Image Generation, Evaluation Benchmarking
Key Points: Developed RefBlend, a data engine for synthesizing complex multi-reference image-text pairs; analyzed the performance of agentic frameworks vs. end-to-end models.
Key Contributions: Curated the MultiRef dataset and benchmark; managed end-to-end open-source hosting on Hugging Face.
Project Period: March 2025 - June 2025

Reinforced Visual Perception with Tools

Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, Ranjay Krishna
[arXiv]
Research Area: MLLM, Reinforcement Learning, Visual Reasoning, Tool Usage
Key Points: Integrated GRPO to enable MLLMs to autonomously use visual tools for complex reasoning; developed a novel RL-based training pipeline for tool-augmented perception.
Key Contributions: Contributed to baseline reproduction and dataset construction to enable effective RL-driven training for visual tool use.
Project Period: January 2025 - May 2025