
OmniParser V2: Turning Any LLM into a Computer Use Agent - Microsoft Research

OmniParser V2 takes this capability to the next level. Compared to its predecessor, it achieves higher accuracy in detecting smaller interactable elements and faster inference, making it a useful tool for GUI automation. In particular, OmniParser V2 is trained with a larger set of interactive element detection data and icon funct...