Collections
AI Template & SPR Library Featuring advanced prompts and SPRs 🟢 Website 🔵 LinkedIn 🔴 Patreon ⚪ Discord Prompt Engineering Advan
Simply adding "Repeat the question before answering it." somehow makes the models answer the trick question correctly. Probable explanations: Repea
xo xo is a command-line tool to generate idiomatic code for different languages based on a database schema or a custom query. Installing | B
We had a CLI tool, written mostly as a bunch of shell scripts, with a ton of available commands that performed all kinds of utility functions related
ShellCheck finds bugs in your shell scripts. You can cabal, apt, dnf, pkg or brew install it locally right now.
dasel Dasel (short for data-selector) allows you to query and modify data structures using selector strings. Comparable to jq / yq, but sup
Nomic Atlas Python Client Explore, label, search and share massive datasets in your web browser. This repository contains Python bindings for working
Introduction This library provides utilities for generating and scoring text explanations of sparse autoencoder (SAE) features. The explainer and scor
Gemma Scope Tutorial This is a barebones tutorial on how to use Gemma Scope, Google DeepMind's suite of Sparse Autoencoders (SAEs) on every layer and
What is feder Feder is a JavaScript tool designed to aid in the comprehension of embedding vectors. It visualizes index files from Faiss, HNSWlib, and
Jazz.Tools, PowerSync, Fireproof, Automerge, DXOS, ElectricSQL, and of course, Berlin's own: Yjs.
Mem0: The Memory Layer for Personalized AI Mem0 provides a smart, self-improving memory layer for Large Language Models, enabling personalized AI expe
🎤 audapolis An editor for spoken-word media with transcription. audapolis aims to make the workflow for spoken-word-heavy media editing easier, fas
Get Started · Examples · Try the Demo · Docs · Discord Instant is a client-side database that makes it easy to build real-time and co
Build UIs without the grunt work. Storybook is a frontend workshop for building UI components and pages in isolation. Thousands of teams use it for UI d
Papermark The open-source DocSend alternative. papermark.io Papermark is the open-source document sharing alternative to DocSend
Empower JavaScript with native APIs. Liberate your development by using platform APIs directly without leaving your love of JavaScript.
Welcome to Extension Extension is a plug-and-play, zero-config, cross-browser extension development tool for browser extensions with bu
#sapling Over winter break last year, I tried learning everything I could about AWS. I took a bunch of notes and decided to compile them all here. Whi
Terraform Providers: Terraform is primarily used for defining the infrastructure resources. Its strength lies in its vast collection of providers that
Pulumi Examples This repository contains examples of using Pulumi to build and deploy cloud applications and infrastructure across major programming
Burrow Burrow is a serverless and globally-distributed HTTP proxy for Go built on AWS Lambda. It is designed to be completely compatible with the stan
Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by ANY LLM of your choice, statistical methods, or NLP models tha
You can think your way into solving a deterministic system, but you cannot think your way into solving a probabilistic system. The first thing that I
Zerox OCR A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, t
What is Pingora Pingora is a Rust framework to build fast, reliable and programmable networked systems. Pingora is battle tested as it has been servin
Luminal is a deep learning library that uses composable compilers to achieve high performance. Current ML libraries tend to be large and comple
orch orch is a library for building language model powered applications and agents for the Rust programming language. It was primarily built for usa
Rottnest : Data Lake Indices You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all y
Who will this data model serve? These are the stakeholders and users of the data model. Why does this data model need to be built? What is the purpos
Our Goals We made it lightweight and kept the efficiency in mind: Self-contained We ship
SQL Studio Single binary, single command SQL database explorer. SQL studio supports SQLite, libSQL, PostgreSQL, MySQL and DuckDB. Local SQLite DB File
Optimizing Further Creating so many indices and aggregating so many tables is sub-optimal. To optimize this, we employ materialized views, which creat
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactiv
Local database for development Each table in the database had an accompanying script that would generate a subset of the data for use in local develop
Indexify - Extraction and Retrieval from Videos, PDF and Audio for Interactive AI Applications
The solution: The ingestion service To meet these unique demands, the Search Infrastructure team implemented the Ingestion Service to gracefully handl
Surya Surya is a document OCR toolkit that does: OCR in 90+ languages that benchmarks favorably vs cloud services Line-level text detection in any la
ETL The part of the system I'm most proud of, and on which I spent the most effort, is the ETL process. We had a series of shell scripts for each data
Public APIs A collective list of free APIs for use in software and web development
A better type of backend. Convex is the fullstack TypeScript development platform. Replace your database, server functions and glue code.
The complete Protobuf platform. Accelerate gRPC adoption with the Buf Schema Registry — built by the world's Protobuf experts.
Using feature flags It's always a good idea to put new features behind a feature flag. This contributes to a rollout strategy that can surface user fe
Shipping to production Before shipping to production, we think about all the different artifacts that might be affected by a new feature. Here's a non
Build & Deployments Our build process starts by pushing changes to a repository on GitHub. When code is pushed to a repository through a pull request,
Portkey's AI Gateway is the interface between your app and hosted LLMs. It streamlines API requests to OpenAI, Anthropic, Mistral, LLama2, Anyscale, G
dstack is an open-source toolkit and orchestration engine for running GPU workloads. It's designed for development, training, and deployment of gen AI
LanceDB LanceDB is an open-source vector database for AI that's designed to store, manage, query and retrieve embeddings on large-scale multi-modal da
Take a look at our official page for user documentation and examples: langtest.org Key Features Generate and execute more than 50 distinct types of t
TurboSeek An open source AI search engine. Powered by Together.ai. Tech stack Next.js app router with Tailwind Together AI for LLM inference Mix
Welcome to Quartz 4 (Jun 13, 2024): Quartz is a fast, batteries-included static-site generator that transforms Markdown content into fully functi
libsearch 🔎 Simple, index-free text search for JavaScript, used across my personal projects like YC Vibe Check, linus.zone/entr, and my personal pr
Turn questions into data insights. Make your team more informed and save time, by using AI for Data Analysis on your
In your sport of choice: Perform a 5-minute Zone 5 effort. Make this a Very Hard effort but leave yourself room to improve next time. Calculate the
Tempo training feels different than we expect. Not a lot of huffing of the breath. No burning in the legs. Heart rate responds slowly. Aim for a 3
We can detect factually inconsistent summaries via the natural language inference (NLI) task. The NLI task works like this: Given a premise sentence a

Google DeepMind used a similar idea to make LLMs faster in Accelerating Large Language Model Decoding with Speculative Sampling. Their algorithm uses a
FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm.
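The idea behind trie-based keyword matching can be sketched in a few lines. This is a minimal illustration, not the FlashText module's own implementation or API: the class name `KeywordTrie` and its methods are hypothetical, and the sketch assumes case-insensitive matching that only accepts keywords on word boundaries.

```python
class KeywordTrie:
    """Illustrative character trie mapping keywords to replacement terms.

    Hypothetical sketch; not the FlashText library's code.
    """

    _END = "_end_"  # multi-character marker, so it cannot collide with a single-character key

    def __init__(self):
        self.root = {}

    def add_keyword(self, keyword, clean_name=None):
        """Store a keyword; matches are reported as clean_name (defaults to the keyword)."""
        node = self.root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[self._END] = clean_name or keyword

    def _longest_match(self, text, start):
        """Return (end_index, clean_name) for the longest keyword at start, or None."""
        node, best, i = self.root, None, start
        while i < len(text) and text[i].lower() in node:
            node = node[text[i].lower()]
            i += 1
            # Accept a match only if it ends on a word boundary.
            if self._END in node and (i == len(text) or not text[i].isalnum()):
                best = (i, node[self._END])
        return best

    def _scan(self, text):
        """Yield (start, end, clean_name_or_None) spans covering the whole text."""
        i = 0
        while i < len(text):
            at_boundary = i == 0 or not text[i - 1].isalnum()
            match = self._longest_match(text, i) if at_boundary else None
            if match:
                end, name = match
                yield i, end, name
                i = end
            else:
                yield i, i + 1, None
                i += 1

    def extract_keywords(self, text):
        return [name for _, _, name in self._scan(text) if name is not None]

    def replace_keywords(self, text):
        return "".join(
            name if name is not None else text[start:end]
            for start, end, name in self._scan(text)
        )


trie = KeywordTrie()
trie.add_keyword("Big Apple", "New York")
trie.add_keyword("Bay Area")
print(trie.extract_keywords("I love big apple and Bay Area."))
print(trie.replace_keywords("I love big apple and Bay Area."))
```

Each character of the text is visited a bounded number of times per candidate start, so the cost depends mainly on the text length rather than on how many keywords are stored.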
FuzzTypes FuzzTypes is a set of "autocorrecting" annotation types that expands upon Pydantic's included data conversions. Designed for simplicity, it
Unlike some other popular algorithms, DiskANN is designed to keep memory usage to a minimum. This makes it a great match for use cases where Turso alr
Welcome to RAGatouille Easily use and train state of the art retrieval methods in any RAG pipeline. Designed for modularity and ease-of-use, backed by
rerankers A lightweight unified API for various reranking models. Developed by @bclavie as a member of answer.ai Welcome to rerankers! Our goal is
Large Language Model State Machine (llmstatemachine) Introduction llmstatemachine is a library for creating agents with GPT-based language models an
💸🤑 Announcing our Bounty Program: Help the Julep community fix bugs and ship features and get paid. More details here. Start your project with conve
ht - headless terminal ht (short for headless terminal) is a command line program that wraps an arbitrary other binary (e.g. bash, vim, etc.) with a V
nanosearch Nanosearch is an in-memory search engine designed for small (< 10,000 URL) websites. With Nanosearch, you can build a search engine in a fe
Unitxt is a python library for getting data fired up and set for utilization. In one line of code, it preps a dataset or mixtures-of-datasets into an
LLM-PowerHouse: A Curated Guide for Large Language Models with Custom Training and Inferencing Welcome to LLM-PowerHouse, your ultimate resource for u
Mistral-finetune mistral-finetune is a light-weight codebase that enables memory-efficient and performant finetuning of Mistral's models. It is b
NeMo Curator NeMo Curator is a Python library specifically designed for scalable and efficient dataset preparation. It greatly accelerates data curati
Most commonly, ETL means moving data from some source system (e.g. a production database, Slack API) into an analytical data warehouse (e.g. Snowflake
This job copied 12m rows from Clickhouse to Snowflake in 16 minutes using: 5 CPUs: at $0.192 / CPU hour that comes out to $0.26 4.4 GiB of memory: at
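The CPU figure quoted above can be checked with a short calculation. This is a sketch that assumes cost scales linearly with units and wall-clock time; the helper name `compute_cost` is hypothetical, and only the CPU line is reproduced because the memory rate is not given in full.

```python
def compute_cost(units, rate_per_unit_hour, minutes):
    """Cost of `units` resources billed at an hourly rate for `minutes` of runtime."""
    return units * rate_per_unit_hour * minutes / 60


# 5 CPUs at $0.192 per CPU hour for a 16-minute job.
cpu_cost = compute_cost(units=5, rate_per_unit_hour=0.192, minutes=16)
print(f"${cpu_cost:.2f}")  # prints $0.26
```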
Traditional ETL solutions are still quite powerful when it comes to: Common connectors with small-medium data volumes: we still have a lot of respect
Koheesio CI/CD Package Meta Koheesio, named after the Finnish word for cohesion, is a robust Python framework for build
PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible ar
Deep learning at the speed of light. Luminal is a deep learning library that uses composable compilers to achieve high performance. use luminal::prelu
STUMPY STUMPY is a powerful and scalable Python library that efficiently computes something called the matrix profile, which is just an academic way o
Who is this document for? This document is for engineers and researchers (both individuals and teams) interested in maximizing the performance of deep
First Stage Ranking After candidates are retrieved, the system needs to rank them by value to the
Second Stage Here, Instagram uses a Multi-Task Multi Label (MTML) neural network model. As the nam
Causal Ranker: A Causal Adaptation Framework for Recommendation Models. Jeong-Yoon Lee, Sudeep Das. Most machine learning algorithms used in personalizati
The RDRec framework has two main stages: Interaction Rationale Distillation: This step involves extracting detailed user preferences and item attrib
Koyeb is a developer-friendly serverless platform designed to let businesses easily deploy reliable and scalable applications globally. The platform h
The Most Affordable Cloud for AI/ML Inference at Scale Deploy AI/ML production models without headaches on the lo
MLServer aims to provide an easy way to start serving your machine learning models through a REST and gRPC interface, fully compliant with KFServing's
The human-centric platform for production ML & AI. Access data easily, scale compute cost-efficiently, and ship to production confidently with fully man
When data is extracted and transformed, it’s time to visualize and get the value from all your hard work. Visuals are done through Analytics and Busin
Rill is the fastest path from data lake to dashboard. Unlike most BI tools, Rill comes with its own embedded in-memory database. Data and compute are
The Design Philosophy of Great Tables Author Rich Iannone and Michael Chow Published
A Python API for Intelligent Visual Discovery
Data Collection, Experimentation, Evaluation and Deployment, Monitoring and Response. Metadata: Data catalogs, Amundsen, AWS Glue, Hive metastores, Weights & Bi
Hi everyone! How do you guys go about choosing the granularity of your ML response? For instance, let us say you have been tasked with predicting th
Requirements (or constraints): What does success look like? What can we not do? Methodology: How will we use data and code to achieve success? Impleme
One communication smell I see looks like this: ❌ We can improve the load time by caching images on a CDN. “Improve” is vague. Would it improve by .0001
Let’s clarify by sharing the before-and-after (delta) and tying it to team goals. ✅ Impact: Cut home page load time by 66%, from 3 seconds to 1 second
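The before-and-after delta in that example is a simple relative reduction. A minimal sketch, using a hypothetical helper named `percent_reduction`:

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Home page load time going from 3 seconds to 1 second.
reduction = percent_reduction(before=3.0, after=1.0)
print(f"{reduction:.1f}%")  # prints 66.7%
```

The exact value is two thirds, roughly 66.7%, which the example quotes as 66%.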
Model description Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM. The model was trai
Stable Beluga 2 Use Stable Chat (Research Preview) to test Stability AI's best language models for free Model Description Stable Bel
DiscoLM German 7B v1 - GGUF Model creator: Disco Research Original model: DiscoLM German 7B v1 Description This repo contains GGU
pair-preference-model-LLaMA3-8B by RLHFlow: Really strong reward model, trained to take in two inputs at once, which is the top open reward model on R
Dify is an LLM application development platform that has helped build over 100,000 applications. It integrates BaaS and LLMOps, covering the essential
Overview GPTScript is a new scripting language to automate your interaction with a Large Language Model (LLM), namely OpenAI. The ultimate goal is to
SudoLang v1.0.9 Introduction SudoLang is a pseudolanguage designed for interacting with LLMs. It provides a user-friendly interface that combines natu
Onyxia Datalab: The modern datascience stack made accessible. Pool computing resources and provide a state of the art work environment to your data scient
End to end ML Project Project setup: Open this in VSCode Install Dev Containers Do Cmd + Shift + P -> Dev Containers: Rebuild Container Without Cache
Ensuring availability during peak traffic by maintaining all GPU instance types could lead to prohibitively high costs. To avoid the financial strain
More than reading popular books on Design Patterns, two things that helped me write and structure a large codebase better were
It's up to the architect to foresee these outcomes and decide whether they are comfortable with them. Does it matter where this lands, or does it make
Meet Quill: Your AI-Powered Financial Research Assistant
Scientific Research At Your Fingertips. Epsilon uses AI to answer research questions with academic literature
The No-Code Generative AI Platform Deploy custom AI Assistants, Chatbots, and Workflow Automations to make your company 10x more efficient.
The backbone for Versatile AI. Meet Instill Cloud, a no-code/low-code platform that accelerates AI application development by 10x. Effortlessly connect
What is Kestra Kestra is a universal open-source orchestrator that makes both scheduled and event-driven workflows easy. By bringing Infrastructure as
The what and why of Dagster Welcome to Dagster! If you’re here, you probably have questions about data, orchestration, Dagster, and how everything fit
The last core data stack tool is the orchestrator. It’s used quickly as a data orchestrator to model dependencies between tasks in complex heterogeneo
So what abstractions do we have as of today? For example, let’s take the resource abstraction (Dagster, Prefect, referred to as an operator in Airflow
I mostly use a UI I made myself: https://github.com/shinomakoi/AI-Messenger Works with llama.cpp and Exllama V2, supports LLaVA, character cards and moa
Ollama Web UI is another great option - https://github.com/ollama-webui/ollama-webui. It has a look and feel similar to the ChatGPT UI, offers an easy way to in
Vercel AI SDK An open source library for building AI-powered user interfaces. The Vercel AI SDK is an open-source library designed to help developers
HeimdaLLM Pronounced [ˈhaɪm.dɔl.əm] or HEIM-dall-EM HeimdaLLM is a robust static analysis framework for validating that LLM-generated structured outpu
LMQL is a programming language for LLMs. Robust and modular LLM prompting using types, templates, constraints and an optimizing runtime.
Super JSON Mode is a Python framework that enables the efficient creation of structured output from an LLM by breaking up a target schema into atomic
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable
The next-generation command line. The source of truth for your team’s secrets, scripts, and SSH credentials.
Code at the speed of thought. Zed is a high-performance, multiplayer code
If you don’t have time to clean up, you don’t have time to cook Professionals understand that the project is the whole project, not simply the fun
Hindsight is 2020. I don’t mean “hindsight is 20/20.” I’m talking about the year 2020. See, early in 2023, I wrote an article about starting and growing
A solid foundation requires a robust team structure. Hiring individuals who not only have the requisite skills but also align with the company’s cultu
This dataset is an attempt to replicate the results of Microsoft's Orca Our dataset consists of: ~1 million of FLANv2 augmented with GPT-4 completion
Repository for the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning", includin
TabLib Access on Hugging Face 🤗 (Sample, Full Dataset) Read the Paper (TabLib) Introduction Huge datasets have been critical for the performance
RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots
The shift described represents a fundamental change in the "architecture" of how creative value is captured, moving from individual products to entire
Nietzsche wrote in Thus Spoke Zarathustra that “Man muss noch Chaos in sich haben, um einen tanzenden Stern gebären zu können” – “One must still have
Claude’s outputs are the product of a form of mimicry, rather than as a report of genuine internal states. Consciousness is about internal states; the
I say please and thank you, not because I believe it matters to him, but because it matters to me. If I am going to use natural English to communicate
“Worry can become like a bad habit of the mind. The rule of neuroplasticity—that our brain keeps changing based on our repeated activity—says that wha
many of the quantities related to suffering are based on perceptions and similar uncertain inferences. The uncertainty will necessarily increase fru
Scientists have demonstrated that, as the years go by, much of what we think we remember is false. It seems our brains can't store every detail we exp
A leading indicator of personal growth is how curious you are able to be with all your emotions - especially the ones you weren't allowed to feel as a
a big theme from opening sessions at @bowmansschool is the importance of paying attention to your feelings a few people have said they don't have acc
How to be more emotionally intelligent (without trying so hard) 🧵 for @threadapalooza
The most competent people I know are pretty good at basically anything they put their minds to, because they just design a process and run it. I think
www.simonstalenhag.se
The text actually moves back and forth between all of these. Few novels pay less attention to the rules of fiction than Zen and the Art of Motorcycle
surprise may be a better proxy for creativity than quality. A polished output is not necessarily creative, but a surprising one might be. Yet even sur

In large part this is because education, like most social systems, is slow to adapt, iterate, and evolve to be relevant for changing times. The glacia
generative AI is but the latest in a line of innovations that draws attention to the flaws of the modern education system, leaving us to question how
The humanities, rightly understood, are the things that technology cannot take away or substitute for. Of course, I don’t mean ‘humanities’ in the way
ideas only become clear once you begin to work on them
The friction between idea and ability that AI evangelists promise to eradicate is not a problem suffered by a disadvantaged few. It’s the fundamental
the choice humanity faces in every age is between the idea of power and the power of ideas
“I would make sushi in my dreams. I would jump out of bed at night with ideas.” Leonardo Da Vinci, Michael Ferrero, and Colin Chapman all did the s
The five principles of prompting I developed work equally well as management techniques for humans: Give direction. Describe the desired style in de
People who are good at solving poorly defined problems don't get the same kind of kudos. They don’t get any special titles or clubs. There is no test
2) Clear Roles & Decision Rights Without clear swim lanes and decision rights, individuals on teams feel disempowered and projects tend to stall out.