GitHub - cliangyu/Cola: [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
Repository for the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning", including 1.84M CoT rationales extracted across 1,060 tasks
Paper Link : https://arxiv.org/abs/2305.14045
kaistAI • GitHub - kaistAI/CoT-Collection: [Under Review] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Nicolay Gerold added
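As a rough illustration of what "chain-of-thought fine-tuning" means in practice, the sketch below shows a single training instance where the model is supervised to emit the rationale before the final answer. The field names and formatting are illustrative assumptions, not the exact CoT-Collection schema.

```python
# Minimal sketch (assumed format, not the CoT-Collection loader) of a CoT
# fine-tuning instance: the completion contains the rationale followed by the answer.
example = {
    "source": "Premise: All birds can fly. Penguins are birds.\nQuestion: According to the premise, can penguins fly?",
    "rationale": "The premise states that all birds can fly and that penguins are birds, so under the premise penguins can fly.",
    "target": "yes",
}

def to_training_text(ex: dict) -> tuple[str, str]:
    # Build the prompt and the supervised completion (rationale, then answer).
    prompt = ex["source"] + "\nLet's think step by step."
    completion = f" {ex['rationale']} Therefore, the answer is {ex['target']}."
    return prompt, completion

prompt, completion = to_training_text(example)
```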
LLaVA v1.5 is a new open-source multimodal model stepping onto the scene as a contender against GPT-4's multimodal capabilities. It uses a simple projection matrix to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle images and text. The model is trained in two stages: first, updated ...
This AI newsletter is all you need #68
Nicolay Gerold added
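As a rough illustration of the projection idea described above (not the official LLaVA code), the sketch below maps frozen CLIP ViT-L/14 patch features into the LLM's token-embedding space with a learned projection; the dimensions and module names are assumptions chosen to match the commonly reported ViT-L/14 (1024-d) and Vicuna-7B (4096-d) sizes.

```python
# Hypothetical sketch of a vision-to-LLM projector: frozen vision-encoder patch
# features are projected into the language model's embedding space so they can be
# consumed as "visual tokens" alongside text embeddings.
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # A single linear layer, per the "simple projection matrix" description above;
        # later LLaVA variants reportedly use a small MLP instead.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from the frozen vision encoder
        return self.proj(patch_features)  # -> (batch, num_patches, llm_dim)

# Usage sketch: project the patch features, then prepend them to the text embeddings
# before running the language model.
projector = VisionToLLMProjector()
dummy_patches = torch.randn(1, 576, 1024)   # e.g. 24x24 patches from ViT-L/14 at 336px
visual_tokens = projector(dummy_patches)     # (1, 576, 4096)
```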
Guideline following Large Language Model for Information Extraction
Model Card for GoLLIE 34B
We present GoLLIE, a Large Language Model trained to follow annotation guidelines. GoLLIE outperforms previous approaches on zero-shot Information Extraction and allows the user to perform inferences with annotation schemas defined on the fly. Different from...
HiTZ/GoLLIE-34B · Hugging Face
Nicolay Gerold added
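A hedged sketch of the "annotation schemas defined on the fly" idea: the user writes label definitions with guideline docstrings as Python classes, and those definitions are serialized into the prompt together with the text to annotate. The exact prompt format GoLLIE expects may differ; the class names and the prompt builder below are illustrative assumptions.

```python
# Illustrative sketch of guideline-as-code prompting (not GoLLIE's exact format):
# each class docstring carries the annotation guideline for one label.
import inspect
from dataclasses import dataclass

@dataclass
class Launcher:
    """A vehicle designed primarily to carry payloads from Earth's surface to space,
    e.g. Saturn V, Falcon 9."""
    mention: str

@dataclass
class Mission:
    """A specific spaceflight operation with a defined objective, e.g. Apollo 11."""
    mention: str

def build_prompt(schema_classes, text: str) -> str:
    # Serialize the guideline classes, then append the text the model should annotate.
    guidelines = "\n\n".join(inspect.getsource(c) for c in schema_classes)
    return f"{guidelines}\n\n# Text to annotate:\ntext = {text!r}\n# Annotations:\nresult ="

prompt = build_prompt([Launcher, Mission], "The Saturn V carried Apollo 11 to the Moon.")
# The expected completion would be something like:
# [Launcher(mention="Saturn V"), Mission(mention="Apollo 11")]
```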
a couple off the top of my head:
- LLM in the loop with preference optimization
- synthetic data generation
- cross modality "distillation" / dictionary remapping
- constrained decoding
r/MachineLearning - Reddit
Nicolay Gerold added
Additional LLM paradigms beyond RAG
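One of the paradigms listed above, constrained decoding, can be sketched in a few lines: at each step the logits of tokens outside an allowed set are masked so the chosen token always satisfies the constraint. The token ids and greedy selection below are illustrative assumptions, not any specific library's API.

```python
# Minimal sketch of constrained decoding: mask disallowed tokens to -inf before
# picking the next token, so generation stays within the allowed set.
import torch

def constrained_next_token(logits: torch.Tensor, allowed_token_ids: list[int]) -> int:
    # logits: (vocab_size,) raw scores for the next token
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return int(torch.argmax(logits + mask))  # greedy pick among allowed tokens only

# Usage sketch: restrict the next token to (hypothetical) ids for "yes" / "no".
vocab_size = 32000
fake_logits = torch.randn(vocab_size)
allowed = [3869, 694]  # hypothetical token ids for 'yes' and 'no'
next_id = constrained_next_token(fake_logits, allowed)
```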
Key unlock: Multimodal models can reason about images, video, or even physical environments without significant tailoring.
Sarah Wang • The Next Token of Progress: 4 Unlocks on the Generative AI Horizon
Darren LI added
Our extensive tests over 25 LLMs (including APIs and open-sourced models) show that, while top commercial LLMs present a strong ability to act as agents in complex environments, there is a significant disparity in performance between them and open-sourced competitors.
Xiao Liu • AgentBench: Evaluating LLMs as Agents
Darren LI added
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
An analysis of GPT-4V, a large multimodal model with visual understanding, discussing its capabilities, input modes, working modes, prompting techniques, and potential applications in various domains.
browse.arxiv.org
Darren LI added