LLMs
In general, I see LLMs being used in two broad categories: data processing, which is more of a worker use case where quality matters more than latency, and user interactions, where latency is a big factor. For the latency-sensitive case, I think a faster fallback is necessary. Or you escalate upwards: you first rely on a smaller, more... See more
Discord - A New Way to Chat with Friends & Communities
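A minimal sketch of the escalation idea above, assuming caller-supplied `complete` and `is_good_enough` helpers (both hypothetical, not from the source):

```python
from typing import Callable

def answer_with_escalation(
    prompt: str,
    models: list[str],                      # ordered cheapest/fastest first
    complete: Callable[[str, str], str],    # (model, prompt) -> answer
    is_good_enough: Callable[[str], bool],  # quality check, e.g. schema validation
) -> str:
    """Try the small, fast model first; escalate only when quality falls short."""
    answer = ""
    for model in models:
        answer = complete(model, prompt)
        if is_good_enough(answer):
            break  # good enough: stop before paying for a bigger model
    return answer  # worst case: the largest model's attempt
```

Adding a per-tier timeout to the same loop gives the fallback variant for the latency-sensitive case.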
How can we make interacting with conversational models feel more natural?
Every conversational interface to a language model adopts the same pattern:
- A chat history sidebar, with each conversation lasting just a few turns
- New sessions always begin in a brand-new thread
- Every user query must always elicit exactly one response
None of these assumptions... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Why is Discord such a good GTM for AI applications?
Text interface. Most users are just generating images, videos, and audio in these Discord servers. Prompts are easily expressible in simple text commands. It’s why we’ve seen image generation strategies like Midjourney (all-in-one) flourish in Discord while more raw diffusion models haven’t grown... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
We identified 30 types of tasks that UX professionals used generative AI tools for in their work. We grouped these tasks under four roles: content editor, research assistant, ideation partner, or design assistant.
- Content editor: Generating and editing text, from microcopy to social media posts, based on specifications or copy given by UX
Mingjin Zhang • AI as a UX Assistant
GPT-4 Turbo can accept images as inputs in the Chat Completions API, enabling use cases such as generating captions, analyzing real world images in detail, and reading documents with figures. For example, BeMyEyes uses this technology to help people who are blind or have low vision with daily tasks like identifying a product or navigating a store.... See more
New models and developer products announced at DevDay
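For reference, a minimal call matching the capability described above, using the OpenAI Python SDK; the prompt and image URL are placeholder assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single user message mixing text and image content.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # GPT-4 Turbo with vision, per the announcement
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this product label for a low-vision user."},
            {"type": "image_url", "image_url": {"url": "https://example.com/label.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```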
- You have access to a proprietary asset (like data) that others don’t have easy access to. In our “write job postings” example, perhaps you have a corpus of thousands of job postings including some outcome scores (as to how well they did). You could use this data to create better job postings. Others don’t have ready access to this data. Note: The
Dharmesh Shah • How To Build a Defensible A.I. Startup
Protecting LLM products:
(1) Is hard to bootstrap. This already hints at existing customers, or you need to get a bunch of your customers to co-develop (insurance model → companies pooling their data to solve a problem they all share). This runs into a number of issues: the competitive drive of the companies, and data privacy and security.
(2) Reserved for existing companies. This is the co-pilot model.
(3) This might be the most sustainable one, but it is also the hardest one. I have not seen anything in that direction yet besides OpenAI.
November 13, 2023・By Together
The Together Inference Engine is multiple times faster than any other inference service, with 117 tokens per second on Llama-2-70B-Chat and 171 tokens per second on Llama-2-13B-Chat.
Today we are announcing Together Inference Engine, the world’s... See more
Announcing Together Inference Engine – the fastest inference available
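To put those throughput figures in user-facing terms, a quick back-of-the-envelope (the 250-token reply length is an assumption, and time-to-first-token and network overhead are ignored):

```python
response_tokens = 250  # assumed length of a typical chat reply

for model, tokens_per_sec in [("Llama-2-70B-Chat", 117), ("Llama-2-13B-Chat", 171)]:
    seconds = response_tokens / tokens_per_sec
    print(f"{model}: {seconds:.1f} s for {response_tokens} tokens")

# Llama-2-70B-Chat: 2.1 s for 250 tokens
# Llama-2-13B-Chat: 1.5 s for 250 tokens
```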
The need for better AI or LLM-specific infrastructure, along with the host of problems that come with the non-determinism of LLMs, means that there's more software work ahead of us, not less. Abstraction layers like LLMs create more possibilities and thus more work.
Is this a good thing or a bad thing? I’m not sure.
A great example of this is frontend... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
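One concrete instance of that extra software work, as a hedged sketch (the `generate` callable stands in for whatever LLM client you use): wrapping a non-deterministic model call in validation and retries so downstream code sees a stable contract.

```python
import json
from typing import Callable

def json_with_retries(
    generate: Callable[[str], str],  # your LLM call; any client works here
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Sample until the model emits valid JSON; the glue code LLMs make necessary."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            return json.loads(raw)  # validate the non-deterministic output
        except json.JSONDecodeError as err:
            last_error = err  # try again; another sample may parse
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```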
- performance: it will improve your LLM performance on given use cases (e.g., coding, extracting text). Mainly, the LLM will specialize in a given task (a specialist will always beat a generalist in its domain)
- control: you can refine how your model should behave on specific inputs and outputs, resulting in a more robust product
- modularization:... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Motivation for finetuning
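A minimal sketch of acting on those motivations with OpenAI's fine-tuning API (the JSONL file name and base model are placeholder choices, not from the source):

```python
from openai import OpenAI

client = OpenAI()

# Training data is JSONL, one chat-format example per line, e.g.:
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("specialist_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # base model to specialize
)
print(job.id)  # poll the job; on success you get a fine-tuned model id
```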