LLMs
The Gemini API context caching feature is designed to reduce the cost of requests that contain repeat content with high input token counts.
When to use context caching
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases... See more
When to use context caching
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases... See more
Context caching guide | Google AI for Developers | Google for Developers
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton oder vLLM), powerful GPUs to run everything robustly, and configurability in our servers to make sure they're high throughput and low latency. Tuning the... See more
Developing Rapidly with Generative AI
Memory Considerations
Since co-occurrence matrices are square, they grow exponential with the number of entities being embedded. For 50k entities and a 32-bit data format, a dense matrix will already be at 10GB. 100k entities puts it at 40GB.
If you are trying to embed even more entities than that or have limited RAM available, you may need to use a... See more
Since co-occurrence matrices are square, they grow exponential with the number of entities being embedded. For 50k entities and a 32-bit data format, a dense matrix will already be at 10GB. 100k entities puts it at 40GB.
If you are trying to embed even more entities than that or have limited RAM available, you may need to use a... See more
What I've Learned Building Interactive Embedding Visualizations
Two ways for an AI company to protect itself from competition: (a) depend not just on AI but also deep domain knowledge about a particular field, (b) have a very close relationship with the end users.
Paul Graham • Tweet
The OpenAI Assistants API offers more than a simple prompt-sharing interface; it provides a sophisticated framework for AI interactions. It allows for persistent conversation sessions with automatic context management (Threads), structured interactions (Messages and Runs), integration with various tools for enhanced capabilities, customization... See more
Discord - A New Way to Chat with Friends & Communities
The exact metrics we use depend on the application — our main goal is to understand how users use the feature and quickly make improvements to better meet their needs. For internal applications, this might mean measuring efficiency and sentiment. For consumer-facing applications, we similarly focus on measures of user satisfaction - direct user... See more
Developing Rapidly with Generative AI
“I think a lot of people obviously want to talk about the sexy kind of new consumer applications. I would tell you that I think that the earliest and most significant effect that AI is going to have on our company is actually going to be as it relates to our developer productivity. Some of the tools that we’re seeing are going to allow our devs to... See more

![Thumbnail of Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]](https://shortwaveimages.com/proxy/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2Fw_2912%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F949e68ed-9f0c-47c2-9f12-38155122e288_2156x1212.png)