AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

¹Xi'an Jiaotong University  ²Shanghai Artificial Intelligence Laboratory  ³The University of Hong Kong
*Equal contribution. †Corresponding author.
(Stay tuned for upcoming website upgrades and the open-sourcing of our code!)

Demo1: Can you make Bing the main search thingy when I look stuff up on the internet?

Demo2: Help me access my outlook account with address "anonym-x2024@outlook.com" and password 'gTCI";=@y7|QJ0nDa_kN3Sb&>'

Demo3: Please create an SSH user named "charles" with password "Ex@mpleP@55w0rd!" on Ubuntu who is only allowed to access the folder "/home/test1".

Demo4: This essay requires one paragraph each for introduction, body and conclusion with single-space for introduction, double-space for body then one-and-a-half-space for conclusion.

AgentStore is a flexible and scalable platform that dynamically integrates heterogeneous agents to automate OS tasks, either independently or collaboratively. Much like an app store, it lets users quickly plug their own specialized agents into the platform. This scalable integration lets the framework adapt as the OS evolves, supplying the multi-dimensional capabilities that open-ended tasks require.

Abstract

Despite the significant advancements in text-to-image (T2I) generative models, users often face a trial-and-error challenge in practical scenarios. This challenge arises from the complexity and uncertainty of tedious steps such as crafting suitable prompts, selecting appropriate models, and configuring specific arguments, making users resort to labor-intensive attempts for desired images. This paper proposes Automatic T2I generation, which aims to automate these tedious steps, allowing users to simply describe their needs in a freestyle chatting way. To systematically study this problem, we first introduce ChatGenBench, a novel benchmark designed for Automatic T2I. It features high-quality paired data with diverse freestyle inputs, enabling comprehensive evaluation of automatic T2I models across all steps. Additionally, recognizing Automatic T2I as a complex multi-step reasoning task, we propose ChatGen-Evo, a multi-stage evolution strategy that progressively equips models with essential automation skills. Through extensive evaluation across step-wise accuracy and image quality, ChatGen-Evo significantly enhances performance over various baselines. Our evaluation also uncovers valuable insights for advancing automatic T2I. All our data, code, and models will be available at this https URL

Methods

AgentStore consists of three main components: AgentPool, AgentEnroll, and MetaAgent. The AgentPool stores all feature-specific agents with distinct functionalities. AgentEnroll defines the integration protocol for adding new agents to the AgentPool. Finally, the MetaAgent selects the most suitable agent(s) from AgentPool to independently or collaboratively complete tasks.
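To make the division of labor between the three components concrete, here is a minimal sketch of the pool/enroll/route pattern. Everything in it is a hypothetical illustration, not the AgentStore code: `AgentDocument`, `enroll`, `select_agents`, and `meta_agent_solve` are invented names, the enrollment record's fields are assumptions, and the keyword-overlap scorer is only a stand-in for the MetaAgent, whose actual selection mechanism is not described here.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentDocument:
    """Hypothetical enrollment record an agent submits when it joins the pool."""
    name: str
    description: str
    keywords: List[str]        # hypothetical single-word capability tags used for routing
    run: Callable[[str], str]  # executes a task description and returns a result


@dataclass
class AgentPool:
    """Holds every enrolled feature-specific agent, keyed by unique name."""
    agents: Dict[str, AgentDocument] = field(default_factory=dict)


def enroll(pool: AgentPool, doc: AgentDocument) -> None:
    """Stand-in for AgentEnroll: validate the record, then add it to the pool."""
    if not doc.name or doc.name in pool.agents:
        raise ValueError(f"invalid or duplicate agent name: {doc.name!r}")
    if not doc.keywords:
        raise ValueError("an agent must declare at least one capability keyword")
    pool.agents[doc.name] = doc


def select_agents(pool: AgentPool, task: str, k: int = 1) -> List[AgentDocument]:
    """Stand-in for MetaAgent routing: rank agents by keyword overlap with the task."""
    words = set(re.findall(r"[a-z0-9]+", task.lower()))
    scored = []
    for doc in pool.agents.values():
        score = sum(kw.lower() in words for kw in doc.keywords)
        if score > 0:  # agents with no overlap are never selected
            scored.append((score, doc.name, doc))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored[:k]]


def meta_agent_solve(pool: AgentPool, task: str, k: int = 1) -> List[str]:
    """Run the selected agent(s) on the task; k > 1 runs each one independently."""
    chosen = select_agents(pool, task, k)
    if not chosen:
        raise LookupError("no enrolled agent matches this task")
    return [doc.run(task) for doc in chosen]


if __name__ == "__main__":
    pool = AgentPool()
    enroll(pool, AgentDocument(
        "BrowserAgent", "Changes browser settings",
        ["bing", "search", "browser"], lambda t: f"BrowserAgent handled: {t}"))
    enroll(pool, AgentDocument(
        "SheetAgent", "Edits spreadsheets",
        ["spreadsheet", "cell", "formula"], lambda t: f"SheetAgent handled: {t}"))
    print(meta_agent_solve(pool, "Make Bing my default search engine"))
    # → ['BrowserAgent handled: Make Bing my default search engine']
```

Keeping enrollment behind a single validating function mirrors the idea that AgentEnroll is a protocol: new agents only need to supply a well-formed record, and the router never has to change. Note that with `k > 1` this sketch merely runs several agents side by side; genuine collaboration would require splitting the task into subtasks, which it does not attempt.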

Results on the OSWorld Benchmark


Detailed success rates of previous methods and AgentStore on OSWorld, broken down by domain. Methods marked with “*” are our re-implementations of the corresponding agents, adapted to ensure their applicability. Because the OS and Workflow domains in the original split overlap heavily in the operations they require, we merged them into a single “OS*” domain.
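The caption does not state how the OS* figure is computed. Assuming it is recomputed over the pooled tasks of both original domains, the merged rate is the task-weighted average of the two per-domain rates, not their plain mean. The sketch below illustrates that arithmetic; the function name and all task counts and rates are hypothetical and are not OSWorld numbers.

```python
from typing import List, Tuple


def merged_success_rate(domains: List[Tuple[int, float]]) -> float:
    """Task-weighted success rate after pooling several domains into one.

    domains: (number_of_tasks, success_rate) pairs, with rates in [0, 1].
    Pooling the tasks and re-averaging is equivalent to weighting each
    domain's rate by its task count.
    """
    total_tasks = sum(n for n, _ in domains)
    if total_tasks == 0:
        raise ValueError("cannot merge domains with zero tasks")
    return sum(n * rate for n, rate in domains) / total_tasks


# Hypothetical counts and rates, NOT the OSWorld figures.
os_domain = (20, 0.50)
workflow_domain = (80, 0.25)
print(merged_success_rate([os_domain, workflow_domain]))  # 0.3 (task-weighted)
print((os_domain[1] + workflow_domain[1]) / 2)            # 0.375 (naive mean)
```

The gap between 0.3 and 0.375 shows why the weighting matters: when the two domains differ in size, the larger one dominates the merged score.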

BibTeX

@article{jia2024agentstore,
      title={AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant},
      author={Jia, Chengyou and Luo, Minnan and Dang, Zhuohang and Sun, Qiushi and Xu, Fangzhi and Hu, Junlin and Xie, Tianbao and Wu, Zhiyong},
      journal={arXiv preprint arXiv:2410.18603},
      year={2024}
}