LLMs
Matei Zaharia, Omar Khattab, Lingjiao Chen, et al. • The Shift From Models to Compound AI Systems
How do models represent style, and how can we more precisely extract and steer it?
A commonly requested feature in almost any LLM-based writing application is “I want the AI to respond in my style of writing,” or “I want the AI to adhere to this style guide.” Aside from costly and complicated multi-stage finetuning processes like Anthropic’s RL with...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
The quality of the dataset is 95% of everything. The remaining 5% is just not ruining it with bad parameters.
After 500+ LoRAs made, here is the secret
OpenGPTs
This is an open source effort to create a similar experience to OpenAI's GPTs. It builds upon LangChain, LangServe and LangSmith. OpenGPTs gives you more control, allowing you to configure:
- The LLM you use (choose between the 60+ that LangChain offers)
- The prompts you use (use LangSmith to debug those)
- The tools you give it (choose from...
github.com • Langchain-Ai/Opengpts
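A minimal sketch of what that configurability looks like in LangChain terms, since OpenGPTs builds on it. The model name, system prompt, and LCEL chaining here are illustrative assumptions, not code from the OpenGPTs repo:

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Swap in any of the chat models LangChain supports.
llm = ChatOpenAI(model="gpt-4", temperature=0)

# The prompt is just another configurable piece (debuggable in LangSmith).
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical assistant."),
    ("human", "{question}"),
])

# LCEL: pipe the prompt into the model to get a runnable chain.
chain = prompt | llm
print(chain.invoke({"question": "What does OpenGPTs let me configure?"}).content)
```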
Announcing Together Inference Engine – the fastest inference available
November 13, 2023・By Together
The Together Inference Engine is multiple times faster than any other inference service, with 117 tokens per second on Llama-2-70B-Chat and 171 tokens per second on Llama-2-13B-Chat.
Today we are announcing Together Inference Engine, the world’s...
Announcing Together Inference Engine – the fastest inference available
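Those throughput figures translate directly into latency budgets. A back-of-the-envelope sketch using the quoted numbers; the 500-token completion length is a made-up input, and prefill and network overhead are ignored:

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    # Decode time scales linearly with output length
    # at a fixed tokens/sec throughput.
    return n_tokens / tokens_per_second

# A 500-token completion at the quoted throughputs:
print(generation_seconds(500, 117))  # ~4.3 s on Llama-2-70B-Chat
print(generation_seconds(500, 171))  # ~2.9 s on Llama-2-13B-Chat
```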
OpenAI is treating its new marketplace seriously now: The brand new GPT store will come with REVENUE SHARING.... (missing in the Plugins launch)
and launching a Stateful Assistants API:
- Persistent Threads (/api/openai/threads)
- Built-in Retrieval (chunking etc. done for you)
- Code Interpreter (RIP Adv Data Analysis?)
- Speech to Text and Text to...
swyx • Tweet
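A minimal sketch of the stateful flow the tweet describes, using the OpenAI Python client's beta Assistants endpoints from that launch; the assistant instructions, message content, and polling loop are illustrative assumptions:

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Retrieval and code interpreter are enabled as tools on the assistant.
assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="Answer questions about the attached files.",
    tools=[{"type": "retrieval"}, {"type": "code_interpreter"}],
)

# Threads persist server-side: append messages instead of resending history.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarize the report."
)

# A run executes the assistant against the thread; poll until it finishes.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(0.5)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

for msg in client.beta.threads.messages.list(thread_id=thread.id).data:
    print(msg.role, msg.content[0].text.value)
```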
Menlo Ventures released a report on ‘The State of Generative AI in the Enterprise’ and found that adoption is trailing the hype. Details below:
Generative AI still represents less than 1% of cloud spend by surveyed enterprises, including just an 8% increase in 2023.
Safety and ROI continue to be prime concerns, and the tangible advantages of being...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Memory Considerations
Since co-occurrence matrices are square, they grow quadratically with the number of entities being embedded. For 50k entities and a 32-bit data format, a dense matrix will already be at 10GB. 100k entities puts it at 40GB.
If you are trying to embed even more entities than that or have limited RAM available, you may need to use a...
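The arithmetic behind those numbers, plus the usual workaround: a sparse format that only stores nonzero counts, which is presumably where the truncated sentence was heading. The scipy usage below is an assumption for illustration, not from the original:

```python
import numpy as np
from scipy import sparse

def dense_cooc_bytes(n_entities: int, dtype=np.float32) -> int:
    # A dense co-occurrence matrix is n x n, so memory grows
    # quadratically with the number of entities.
    return n_entities**2 * np.dtype(dtype).itemsize

print(dense_cooc_bytes(50_000) / 1e9)   # ~10.0 GB, matching the figure above
print(dense_cooc_bytes(100_000) / 1e9)  # ~40.0 GB

# Co-occurrence counts are typically mostly zeros, so a sparse
# matrix stores only the observed pairs.
rows = [0, 0, 1]            # toy example: entity pairs that co-occur
cols = [1, 2, 2]
counts = [3.0, 1.0, 5.0]
m = sparse.coo_matrix(
    (counts, (rows, cols)), shape=(50_000, 50_000), dtype=np.float32
).tocsr()
print(m.data.nbytes + m.indices.nbytes + m.indptr.nbytes)  # bytes actually used
```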