Long-Context Retrieval Models with Monarch Mixer
You can increase context length in two ways. First, you can train the model with longer context lengths. That’s difficult because it’s much more computationally expensive, and it’s hard to find datasets with long context lengths (most documents in CommonCrawl have fewer than 2,000 tokens).
The second, more com...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Nicolay Gerold added
One of the focus areas at Together Research is new architectures for long context, improved training, and inference performance over the Transformer architecture. Spinning out of a research program from our team and academic collaborators, with roots in signal processing-inspired sequence models, we are excited to introduce the StripedHyena models.
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
Nicolay Gerold added
- Cohere introduced Embed v3, an advanced model for generating document embeddings, boasting top performance on several benchmarks. It excels at matching documents to query topics and at judging content quality, improving search applications and retrieval-augmented generation (RAG) systems. The new version offers models with 1024 or 384 dimensions and supports over 100 languages.
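For context, a minimal sketch of calling Embed v3 through Cohere's Python SDK; the model names match the release, while the API key, documents, and query are placeholders:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Embed documents for indexing; Embed v3 distinguishes documents
# from queries via `input_type`.
doc_emb = co.embed(
    texts=["Monarch Mixer targets long-context retrieval.",
           "StripedHyena explores architectures beyond Transformers."],
    model="embed-english-v3.0",  # 1024-dim; embed-english-light-v3.0 is 384-dim
    input_type="search_document",
)

# Embed the query with the matching query-side input_type.
query_emb = co.embed(
    texts=["efficient long-context models"],
    model="embed-english-v3.0",
    input_type="search_query",
)

print(len(doc_emb.embeddings), len(query_emb.embeddings[0]))  # 2 docs, 1024 dims
```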
FOD#27: "Now And Then"
Nicolay Gerold added
LLMs struggle with tasks that require extensive knowledge. This limitation highlights the need to supplement LLMs with non-parametric knowledge. The paper Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts analyzes the effects of different types of non-parametric knowledge, such as textual passages and knowledge graphs.
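The excerpt doesn't include the paper's prompt format, so the sketch below only illustrates the general technique of serializing knowledge-graph triples into the prompt so the model can ground answers about long-tail entities. The triples, template, and helper names are all hypothetical:

```python
# Hypothetical sketch: injecting KG triples into an LLM prompt.
def triples_to_text(triples):
    """Serialize (subject, relation, object) triples into prompt lines."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)

def build_prompt(question, triples):
    return (
        "Answer the question using only the facts below.\n\n"
        f"Facts:\n{triples_to_text(triples)}\n\n"
        f"Question: {question}\nAnswer:"
    )

triples = [
    ("Aki Kaurismäki", "directed", "Le Havre"),
    ("Le Havre", "release_year", "2011"),
]
prompt = build_prompt("When was Le Havre by Aki Kaurismäki released?", triples)
# `prompt` can now be sent to any LLM completion endpoint.
```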
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Nicolay Gerold added
memary: Open-Source Longterm Memory for Autonomous Agents
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve relevant information.
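The excerpt doesn't show memary's actual API, so the sketch below only illustrates the pattern it describes: store facts as graph triples and pull back just the facts relevant to the current query instead of the whole corpus. All class and method names here are hypothetical, not memary's real interface:

```python
# Hypothetical sketch of the retrieval pattern memary describes;
# not memary's real API.
from collections import defaultdict

class GraphMemory:
    """Tiny in-memory knowledge-graph store keyed by entity."""
    def __init__(self):
        self.edges = defaultdict(list)  # entity -> [(relation, object)]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def retrieve(self, entities, max_facts=10):
        """Return only facts touching the queried entities, keeping
        the slice of memory passed to the LLM small."""
        facts = [(s, r, o) for s in entities for (r, o) in self.edges[s]]
        return facts[:max_facts]

memory = GraphMemory()
memory.add("memary", "stores", "knowledge graphs")
memory.add("memary", "tracks", "user knowledge")
print(memory.retrieve(["memary"]))
```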
GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.
Nicolay Gerold added
Data
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). We found that removing the in-built alignment of these datasets boosted performance on MT Bench and made the model more helpful.
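DPO fine-tunes directly on preference pairs with no separate reward model: the policy is pushed to prefer the chosen completion over the rejected one relative to a frozen reference model. A minimal sketch of the DPO loss in PyTorch, assuming you already have summed token log-probabilities for each completion (beta=0.1 is an illustrative default; libraries such as TRL provide a full trainer for this):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss from summed sequence log-probs under the policy
    and the frozen reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin): widen the gap between how much the
    # policy (vs. reference) prefers the chosen over the rejected answer.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of two preference pairs with made-up log-probs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.5, -13.0]),
                torch.tensor([-10.5, -12.2]), torch.tensor([-11.0, -12.8]))
print(loss)
```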
HuggingFaceH4/zephyr-7b-alpha · Hugging Face
Nicolay Gerold added
Simply accessing LLMs via APIs has limitations. Instead, combining them with other data sources and tools can enable more powerful applications. In this chapter, we will introduce LangChain as a way to overcome LLM limitations and build innovative language-based applications.
Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs
LangChain enables building dynamic, data-aware applications that go beyond what is possible by simply accessing LLMs via API calls.
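As a taste of the pattern the book builds on, a minimal sketch of a prompt-plus-LLM chain; the import paths reflect the 0.0.x-era LangChain API the book targets and may differ in newer releases, and an OPENAI_API_KEY environment variable is assumed:

```python
# Minimal LangChain sketch; module paths may differ in current releases.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Summarize what {topic} is in two sentences.",
)
llm = OpenAI(temperature=0)  # assumes OPENAI_API_KEY is set
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(topic="retrieval-augmented generation"))
```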