LLMs
The Gemini API context caching feature is designed to reduce the cost of requests that contain repeat content with high input token counts.
When to use context caching
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases...
Context caching guide | Google AI for Developers | Google for Developers
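To make the cost argument concrete, here is a minimal arithmetic sketch in Python. The billing model (cache creation billed at the regular input rate, a discounted rate for cached tokens, and an hourly storage charge) and every price below are hypothetical assumptions for illustration, not figures from the Gemini documentation; the names `Pricing`, `cost_without_cache`, and `cost_with_cache` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices, all in currency units per 1M tokens."""
    input_per_mtok: float          # regular (uncached) input tokens
    cached_per_mtok: float         # input tokens served from the cache
    storage_per_mtok_hour: float   # keeping cached tokens alive for one hour


def cost_without_cache(p: Pricing, context_tokens: int, query_tokens: int,
                       n_requests: int) -> float:
    # Every request resends the full context plus its short query.
    return n_requests * (context_tokens + query_tokens) * p.input_per_mtok / 1e6


def cost_with_cache(p: Pricing, context_tokens: int, query_tokens: int,
                    n_requests: int, hours: float) -> float:
    # Assumption: creating the cache is billed once at the regular input rate.
    create = context_tokens * p.input_per_mtok / 1e6
    # Each request then reads the context at the discounted cached rate.
    cached_reads = n_requests * context_tokens * p.cached_per_mtok / 1e6
    # The short, request-specific part is still billed as regular input.
    queries = n_requests * query_tokens * p.input_per_mtok / 1e6
    # Assumption: storage is billed per token per hour of cache lifetime.
    storage = context_tokens * p.storage_per_mtok_hour * hours / 1e6
    return create + cached_reads + queries + storage


prices = Pricing(input_per_mtok=1.0, cached_per_mtok=0.25,
                 storage_per_mtok_hour=1.0)

# 100 short questions against a 500k-token context within one hour.
many_uncached = cost_without_cache(prices, 500_000, 200, 100)
many_cached = cost_with_cache(prices, 500_000, 200, 100, hours=1.0)

# A single question against the same context.
one_uncached = cost_without_cache(prices, 500_000, 200, 1)
one_cached = cost_with_cache(prices, 500_000, 200, 1, hours=1.0)

print(f"100 requests: uncached={many_uncached:.2f}, cached={many_cached:.2f}")
print(f"1 request:    uncached={one_uncached:.4f}, cached={one_cached:.4f}")
```

Under these made-up prices, caching is cheaper for the 100-request case and more expensive for the single request, which matches the guidance that caching suits a large context referenced repeatedly by shorter requests.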
I’ve been giving talks and speaking with engineers and non-technical audiences about interpretability since 2022, and I still struggle to explain exactly what a “feature” is. I often use words like “concept” or “style”, or establish metaphors to debugging programs or making fMRI scans of brains. Both metaphors help people outside of the subfield...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Overview
MaxText is a high-performance, highly scalable, open-source LLM written in pure Python/Jax and targeting Google Cloud TPUs and GPUs for training and inference. MaxText achieves high MFUs and scales from single host to very large clusters while staying simple and "optimization-free" thanks to the power of Jax and the XLA compiler.
MaxText...
google • GitHub - google/maxtext: A simple, performant and scalable Jax LLM!
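For readers unfamiliar with the "MFU" figure mentioned above, the following Python sketch shows one common way to estimate model FLOPs utilization for training. It assumes the widely used approximation of about 6 FLOPs per parameter per token for a dense transformer (forward plus backward pass, ignoring attention FLOPs). It is not necessarily how MaxText computes its reported numbers, and the model size, throughput, and hardware peak in the example are hypothetical.

```python
def training_flops_per_token(n_params: float) -> float:
    # Approximation: ~2 FLOPs per parameter for the forward pass and
    # ~4 for the backward pass; attention FLOPs are ignored.
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float,
        n_chips: int, peak_flops_per_chip: float) -> float:
    """Fraction of the hardware's theoretical peak spent on model FLOPs."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    peak = n_chips * peak_flops_per_chip
    return achieved / peak


# Hypothetical run: a 1B-parameter model processing 1M tokens/s
# on 64 accelerators, each with a 200 TFLOP/s peak.
example_mfu = mfu(n_params=1e9, tokens_per_second=1e6,
                  n_chips=64, peak_flops_per_chip=2e14)
print(f"MFU = {example_mfu:.1%}")
```

Because the numerator counts only the FLOPs the model mathematically requires, extra work such as rematerialization does not inflate the figure, which is why MFU is often preferred over raw hardware utilization when comparing training setups.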
Today, we’re releasing the Assistants API, our first step towards helping developers build agent-like experiences within their own applications. An assistant is a purpose-built AI that has specific instructions, leverages extra knowledge, and can call models and tools to perform tasks. The new Assistants API provides new capabilities such as Code...
New models and developer products announced at DevDay
The xAI PromptIDE is an integrated development environment for prompt engineering and interpretability research. It accelerates prompt engineering through an SDK that allows implementing complex prompting techniques and rich analytics that visualize the network's outputs. We use it heavily in our continuous development of Grok.
PromptIDE
Two ways for an AI company to protect itself from competition: (a) depend not just on AI but also deep domain knowledge about a particular field, (b) have a very close relationship with the end users.
Paul Graham • Tweet
What’s the best way for an end user to organize and explore millions of latent space features?
I’ve found tens of thousands of interpretable features in my experiments, and frontier labs have demonstrated results with a thousand times more features in production-scale models. No doubt, as interpretability techniques advance, we’ll see feature maps...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
pair-preference-model-LLaMA3-8B by RLHFlow: Really strong reward model, trained to take in two inputs at once, which is the top open reward model on RewardBench (beating one of Cohere’s).
DeepSeek-V2 by deepseek-ai (21B active, 236B total param.): Another strong MoE base model from the DeepSeek team. Some people are questioning the very high MMLU...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
We generally lean towards picking more advanced commercial LLMs to quickly validate our ideas and obtain early feedback from users. Although they may be expensive, the general idea is that if problems can't be adequately solved with state-of-the-art foundational models like GPT-4, then more often than not, those problems may not be addressable...