LLMs
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurable servers tuned for high throughput and low latency. Tuning the...
Developing Rapidly with Generative AI
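The GPU requirement above can be made concrete with a back-of-envelope estimate. This is a minimal sketch, assuming fp16/bf16 weights and KV cache and a hypothetical 7B-parameter model with Llama-2-7B-like dimensions. It counts only weights plus KV cache and ignores activations, framework overhead and memory fragmentation, so a real server such as vLLM or Triton will need headroom beyond this figure.

```python
"""Back-of-envelope GPU memory estimate for serving a decoder-only LLM.

All model dimensions used below are illustrative assumptions, not measurements.
"""
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelConfig:
    n_params: float      # total parameter count
    n_layers: int        # number of transformer blocks
    n_kv_heads: int      # key/value heads (fewer than query heads under GQA)
    head_dim: int        # dimension of each attention head
    bytes_per_elem: int  # 2 for fp16/bf16 weights and cache


def weight_bytes(cfg: ModelConfig) -> float:
    """Memory needed just to hold the weights."""
    return cfg.n_params * cfg.bytes_per_elem


def kv_cache_bytes_per_token(cfg: ModelConfig) -> int:
    """Keys and values (factor 2) are cached for every layer and every KV head."""
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def serving_bytes(cfg: ModelConfig, batch_size: int, seq_len: int) -> float:
    """Weights plus KV cache for `batch_size` concurrent sequences of `seq_len` tokens.

    Ignores activations, CUDA context and allocator fragmentation.
    """
    return weight_bytes(cfg) + batch_size * seq_len * kv_cache_bytes_per_token(cfg)


if __name__ == "__main__":
    # Hypothetical 7B model with Llama-2-7B-like shape: 32 layers, 32 KV heads, head_dim 128.
    cfg = ModelConfig(n_params=7e9, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2)
    print(f"weights:      {weight_bytes(cfg) / GIB:.1f} GiB")
    print(f"KV per token: {kv_cache_bytes_per_token(cfg) / 1024:.0f} KiB")
    print(f"batch 8 x 4k: {serving_bytes(cfg, 8, 4096) / GIB:.1f} GiB")
```

Under these assumptions each token of context costs 512 KiB of cache, so eight concurrent 4,096-token requests need 16 GiB of KV cache, more than the roughly 13 GiB of weights. Batch size and context length, not just parameter count, therefore drive GPU choice when tuning for throughput.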
However, development time and maintenance can offset these savings. Hiring skilled data scientists, machine learning engineers, and DevOps professionals can be expensive and time-consuming. Using available resources to “reimplement” solutions hinders innovation and leads to a lack of focus. Since you no longer work on improving your model or...
Understanding the Cost of Generative AI Models in Production
A couple off the top of my head:
- LLM in the loop with preference optimization
- synthetic data generation
- cross modality "distillation" / dictionary remapping
- constrained decoding
r/MachineLearning - Reddit
Additional LLM paradigms beyond RAG
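Of the paradigms listed, constrained decoding is the easiest to show in isolation: at each step, tokens that would break the constraint are masked to negative infinity before the next token is chosen. This is a minimal sketch, assuming a toy scoring function in place of a real model and a constraint that is simply a fixed set of allowed token sequences (for example, classification labels). The function names and toy scores are hypothetical; grammar- or schema-based constraints compiled to automata are not attempted here.

```python
"""Toy constrained decoding: tokens that would leave the set of allowed outputs are
masked to -inf before the greedy argmax. The scoring function, vocabulary and
labels are hypothetical stand-ins for a real model and constraint.
"""
import math
from typing import Callable, Sequence


def allowed_next_tokens(prefix: list[str], targets: Sequence[list[str]]) -> set[str]:
    """Tokens that keep `prefix` on the path to at least one target sequence."""
    n = len(prefix)
    return {seq[n] for seq in targets if len(seq) > n and seq[:n] == prefix}


def constrained_greedy_decode(
    logits_fn: Callable[[list[str]], dict[str, float]],
    allowed: Sequence[list[str]],
    eos: str = "<eos>",
) -> list[str]:
    """Greedy decoding restricted to `allowed`; assumes logits_fn scores the full vocabulary."""
    # Append EOS to each allowed sequence so decoding can only stop on a complete match.
    targets = [list(seq) + [eos] for seq in allowed]
    out: list[str] = []
    while True:
        legal = allowed_next_tokens(out, targets)
        logits = logits_fn(out)
        # The constraint: illegal tokens get -inf, so they can never win the argmax.
        masked = {tok: (s if tok in legal else -math.inf) for tok, s in logits.items()}
        best = max(masked, key=masked.get)
        if best == eos:
            return out
        out.append(best)


if __name__ == "__main__":
    def toy_logits(prefix: list[str]) -> dict[str, float]:
        # Hypothetical scores: unconstrained greedy decoding would emit "maybe".
        return {"yes": 1.0, "no": 2.0, "maybe": 3.0, "<eos>": 0.0}

    print(constrained_greedy_decode(toy_logits, allowed=[["yes"], ["no"]]))
```

With `allowed=[["yes"], ["no"]]`, the toy model's favourite token "maybe" becomes unreachable and the decoder returns `["no"]`, the highest-scoring legal option. Practical systems apply the same masking idea over a real tokenizer vocabulary, usually with the constraint expressed as a grammar or regular expression.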
My $0.02 is that a lot of the future research/work there will be figuring out how to identify effective sub-graphs to provide additional context, to avoid having to pass in the entire graph, as well as trying to identify ontology-less structures in real time, which includes NER (named entity recognition) and RE (relation extraction), as well as named entity/relationship...
r/MachineLearning - Reddit
What’s the best way for an end user to organize and explore millions of latent space features?
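One simple reading of the sub-graph idea in the comment above: find the entities a query mentions, then hand the LLM only the triples near them instead of the whole graph. This is a minimal sketch under loud assumptions: the graph is a list of hypothetical (head, relation, tail) triples, entity detection is naive case-insensitive substring matching standing in for real NER/entity linking, and "effective" is approximated by plain hop distance rather than any learned relevance score.

```python
"""Sketch: give an LLM only the k-hop neighbourhood of entities found in the query,
instead of the whole knowledge graph. Entity matching is naive substring lookup,
a hypothetical stand-in for real NER/entity linking.
"""
from collections import deque

Triple = tuple[str, str, str]  # (head, relation, tail)


def k_hop_subgraph(triples: list[Triple], seeds: set[str], k: int) -> list[Triple]:
    """Triples of the subgraph induced by nodes within k undirected hops of any seed."""
    adj: dict[str, list[str]] = {}
    for h, _, t in triples:
        adj.setdefault(h, []).append(t)
        adj.setdefault(t, []).append(h)

    # Breadth-first search from all seeds at once, recording hop distance.
    dist = {s: 0 for s in seeds if s in adj}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Keep a triple only if both endpoints were reached.
    return [(h, r, t) for h, r, t in triples if h in dist and t in dist]


def graph_context(query: str, triples: list[Triple], k: int = 1) -> str:
    """Render the query's neighbourhood as text lines for a prompt."""
    entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
    # Naive matching: short names can produce false positives inside longer words.
    seeds = {e for e in entities if e.lower() in query.lower()}
    return "\n".join(f"{h} --{r}--> {t}" for h, r, t in k_hop_subgraph(triples, seeds, k))


if __name__ == "__main__":
    kg = [
        ("Paris", "capital_of", "France"),
        ("France", "member_of", "EU"),
        ("Berlin", "capital_of", "Germany"),
    ]
    print(graph_context("What is Paris the capital of?", kg, k=2))
```

Raising `k` trades prompt size for recall; the unrelated Berlin/Germany triple never reaches the prompt because it is not connected to any entity in the query.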
I’ve found tens of thousands of interpretable features in my experiments, and frontier labs have demonstrated results with a thousand times more features in production-scale models. No doubt, as interpretability techniques advance, we’ll see feature maps...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Principles for growable tools
There are three critical pieces to building a tool that can grow around its users over time.
- Design around play. Sometimes I call this design around experimentation. Using the tool for day-to-day work should involve playing and experimenting with what’s possible with the tool. Whether that’s writing small programs to
Beyond customization: build tools that grow with us | thesephist.com
📦 Service Deployment - Ray Serve (https://lnkd.in/eAV-Y6RN)
🧰 Data Transformation - Ray Data (https://lnkd.in/e7wYmenc)
🔌 LLM Integration - AIConfig (https://lnkd.in/esvH5NQa)
🗄 Vector Database - Weaviate (https://weaviate.io/)
📚 Supervised LLM Fine-Tuning - Hugging Face TRL (https://lnkd.in/e8_QYF-P)
📈 LLM Observability - Weights & Biases Traces (https...
Paul Venuto • feed updates
OpenGPTs
This is an open source effort to create a similar experience to OpenAI's GPTs. It builds upon LangChain, LangServe and LangSmith. OpenGPTs gives you more control, allowing you to configure:
- The LLM you use (choose from the 60+ that LangChain offers)
- The prompts you use (use LangSmith to debug those)
- The tools you give it (choose from
github.com • Langchain-Ai/Opengpts
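To make the three configurable pieces concrete, here is a hypothetical configuration object bundling a model choice, a system prompt and a whitelist of tools. The class, field and method names are invented for illustration; this is not the actual schema of OpenGPTs, LangChain or LangServe, and the model identifier is a placeholder.

```python
"""Hypothetical sketch of the three knobs described above (model, prompt, tools).
Names and fields are illustrative only; this is not OpenGPTs' or LangChain's API.
"""
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AssistantConfig:
    llm: str            # identifier of whichever model backend you pick
    system_prompt: str  # the prompt you iterate on while debugging
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def call_tool(self, name: str, arg: str) -> str:
        """Dispatch a tool call requested by the model; tools not enabled are rejected."""
        if name not in self.tools:
            raise KeyError(f"tool {name!r} is not enabled for this assistant")
        return self.tools[name](arg)


if __name__ == "__main__":
    cfg = AssistantConfig(
        llm="example-model",  # hypothetical model identifier
        system_prompt="You are a concise research assistant.",
        tools={"word_count": lambda text: str(len(text.split()))},
    )
    print(cfg.call_tool("word_count", "open source GPTs"))
```

Keeping the tool set as an explicit whitelist means swapping the model or the prompt never silently grants the assistant new capabilities.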
Overview
MaxText is a high-performance, highly scalable, open-source LLM written in pure Python/Jax, targeting Google Cloud TPUs and GPUs for training and inference. MaxText achieves high MFU (model FLOPs utilization) and scales from a single host to very large clusters while staying simple and "optimization-free" thanks to the power of Jax and the XLA compiler.
MaxText...
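MFU compares the FLOPs a model mathematically requires with the hardware's peak throughput. This is a minimal sketch using the common approximation of about 6N training FLOPs per token for a dense transformer with N parameters (2N forward, 4N backward, attention cost ignored). The run parameters are hypothetical, and this is not necessarily how MaxText itself accounts for MFU.

```python
"""Model FLOPs utilization (MFU) for dense-transformer training, using the common
~6 * N FLOPs-per-token approximation (2N forward + 4N backward, attention ignored).
All numbers in the example are hypothetical.
"""


def training_mfu(n_params: float, tokens_per_sec: float, n_chips: int,
                 peak_flops_per_chip: float) -> float:
    achieved = 6.0 * n_params * tokens_per_sec  # model FLOPs performed per second
    peak = n_chips * peak_flops_per_chip        # hardware ceiling for the whole cluster
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical run: 8B-parameter model, 256 accelerators at 400 TFLOP/s peak each.
    mfu = training_mfu(n_params=8e9, tokens_per_sec=1.0e6, n_chips=256, peak_flops_per_chip=400e12)
    print(f"MFU = {mfu:.1%}")
```

For the hypothetical run, 6 × 8e9 × 1e6 = 4.8e16 FLOP/s achieved against a 1.024e17 FLOP/s peak gives an MFU of 46.875%, printed as 46.9%.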