GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6
🥤 Cola [NeurIPS 2023]
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*,†,♠ Bo Li*,♠ Sheng Shen♣ Jingkang Yang♠
Chunyuan Li♦ Kurt Keutzer♣ Trevor Darrell♣ Ziwei Liu‡,♠
♠S-Lab, Nanyang Technological University
♣University of California, Berkeley ♦Microsoft Research, Redmond
*Equal Contribution †Project Lead ‡Corresponding Author
cliangyu • GitHub - cliangyu/Cola: [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
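The idea the title points at, in rough outline: several vision-language models each propose an answer, and an LLM reasons over the proposals to produce the final one. A schematic sketch of that coordination pattern in plain Python; the `answer` and `generate` methods are hypothetical stand-ins, not the repo's actual API:

```python
def coordinate(question: str, image, vlms: list, llm) -> str:
    # Each vision-language model proposes a candidate answer
    # (hypothetical .answer() interface, for illustration only).
    candidates = [vlm.answer(image, question) for vlm in vlms]
    # The LLM sees all proposals and coordinates them into one answer.
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"Model {i} answers: {c}" for i, c in enumerate(candidates))
        + "\nConsidering the answers above, give the final answer:"
    )
    return llm.generate(prompt)
```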
Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through Large Language Model (LLM) based pipeline orchestration. It is uniquely positioned to extract invaluable insights from diverse unstructured data types, including lengthy text, images, and audio and video files.
towhee-io • GitHub - towhee-io/towhee: Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
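For flavor, a minimal Towhee pipeline that decodes an image file and embeds it with a timm backbone, following the chaining pattern in the repo's README; the operator names are assumed from the Towhee hub, so treat this as an untested sketch:

```python
from towhee import pipe, ops

# Chain operators into a pipeline: decode an image, then embed it.
img_embed = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', 'vec', ops.image_embedding.timm(model_name='resnet50'))
        .output('vec')
)

vec = img_embed('example.jpg').get()[0]  # 2048-d vector for resnet50
```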
voyage-multimodal-3: all-in-one embedding model for interleaved text, images, and screenshots
Voyage AI • blog.voyageai.com
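A minimal sketch of calling the model through the voyageai Python client, assuming a VOYAGE_API_KEY in the environment; method and field names follow Voyage's documentation at the time of writing, so verify against the current client:

```python
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

# Each input is an interleaved sequence of text strings and PIL images.
inputs = [["A screenshot of the quarterly dashboard", Image.open("dash.png")]]

result = vo.multimodal_embed(
    inputs=inputs,
    model="voyage-multimodal-3",
    input_type="document",  # use "query" when embedding search queries
)
print(len(result.embeddings[0]))  # one vector per interleaved input
```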
txtai
neuml.github.io
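txtai is NeuML's open-source embeddings database for semantic search and LLM workflows. A minimal sketch of its core API in recent txtai releases; the model name is an arbitrary sentence-transformers choice, not a txtai requirement:

```python
from txtai import Embeddings

# Build an in-memory semantic index over a handful of documents.
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2")
docs = [
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed",
]
embeddings.index(docs)

# Returns (id, score) pairs; the id indexes into docs.
print(embeddings.search("melting arctic ice", 1))
```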
The human-centric platform for production ML & AI. Access data easily, scale compute cost-efficiently, and ship to production confidently with fully managed infrastructure, running securely in your cloud.
Infrastructure for ML, AI, and Data Science | Outerbounds
LLaVA v1.5 is a new open-source multimodal model stepping onto the scene as a contender against GPT-4's multimodal capabilities. It uses a simple projection module to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle images and text. The model is trained in two stages: first, only the projection is updated on image-text pairs to align visual features with the language model's embedding space; then, the projection and the LLM are fine-tuned together on visual instruction data.
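To make the connector concrete, here is a minimal PyTorch sketch of a LLaVA-style projector mapping CLIP ViT-L/14 patch features into the LLM's token-embedding space. LLaVA v1 used a single linear layer and v1.5 a two-layer MLP; the dimensions below are the usual ones for ViT-L/14-336 and Vicuna-7B, and this is a sketch, not the reference implementation:

```python
import torch
import torch.nn as nn

class LlavaStyleProjector(nn.Module):
    """Two-layer MLP connector in the LLaVA v1.5 style (sketch)."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from CLIP ViT-L/14.
        # The output lives in the LLM embedding space and is prepended to the
        # text embeddings as "visual tokens".
        return self.proj(patch_features)

# 576 patches = (336 / 14) ** 2 for ViT-L/14 at 336px input.
tokens = LlavaStyleProjector()(torch.randn(1, 576, 1024))  # (1, 576, 4096)
```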