Our extensive test of 25 LLMs (including API-based and open-source models) shows that, while top commercial LLMs demonstrate a strong ability to act as agents in complex environments, there is a significant performance gap between them and their open-source competitors.
This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call.
First, here is a generalized framework for an autonomous agent:
Initialize Goal: Define the objective for the AI.
Task Creation: The AI checks its memory for the last X tasks completed (if any), and then uses its objective and the context of its recently completed tasks to generate a list of new tasks.
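The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Agent` class, its method names, the prompt wording, and the `fake_model` stub are all hypothetical, and a real agent would pass a function that calls an actual LLM in place of the stub.

```python
from collections import deque
from typing import Callable, Deque, List


class Agent:
    """Hypothetical agent covering only goal initialization and task creation."""

    def __init__(self, objective: str, generate: Callable[[str], str], memory_size: int = 3):
        # Initialize Goal: store the objective for the AI.
        self.objective = objective
        # `generate` stands in for an LLM call (assumption: text in, text out).
        self.generate = generate
        # Memory keeps only the last X completed tasks (X = memory_size).
        self.completed: Deque[str] = deque(maxlen=memory_size)

    def record_completed(self, task: str) -> None:
        self.completed.append(task)

    def build_prompt(self) -> str:
        # Combine the objective with the context of recently completed tasks.
        recent = "\n".join(f"- {t}" for t in self.completed) or "- (none)"
        return (
            f"Objective: {self.objective}\n"
            f"Recently completed tasks:\n{recent}\n"
            "List the next tasks, one per line."
        )

    def create_tasks(self) -> List[str]:
        # Task Creation: ask the model for new tasks and parse one task per line.
        reply = self.generate(self.build_prompt())
        tasks = []
        for line in reply.splitlines():
            task = line.strip().lstrip("-").strip()
            if task:
                tasks.append(task)
        return tasks


def fake_model(prompt: str) -> str:
    # Stub used instead of a real LLM so the sketch runs offline.
    return "- Search for sources\n- Summarize findings"


agent = Agent("Write a market report", fake_model, memory_size=2)
agent.record_completed("Collect requirements")
agent.record_completed("Draft outline")
agent.record_completed("Pick data sources")
tasks = agent.create_tasks()
print(tasks)
```

Using a bounded `deque` is one simple way to model "the last X tasks": older entries drop out of memory automatically, so the prompt only ever carries recent context.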
Analysis of safety preparations and evaluations for GPT-4V, a multimodal language model with image analysis capabilities, including early access testing, red teaming, and mitigations for potential risks and limitations.