Sublime
An inspiration engine for ideas
By using this repo https://t.co/HOobVEmInD you can convert any decoder model (such as Llama 3.1 or Gemma 2) into an encoder (such as RoBERTa). Why would you want to do that? Because modern decoders support huge context sizes (128k+ tokens) while the longest-context encoder tops out around 4K tokens (Longformer), but encoders excel in such tasks as...
BURKOV · x.com
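The repo link above is shortened, so no claims about its API here; below is only a minimal sketch of the general idea, mean-pooling the hidden states of an off-the-shelf Hugging Face causal LM into a sentence embedding. The model name is a placeholder, and the sketch skips the parts a real conversion needs (enabling bidirectional attention, fine-tuning for embeddings).

```python
# A minimal sketch (not the linked repo's method): mean-pool a causal LM's last
# hidden states into a fixed-size embedding. Model name is a placeholder.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B"  # any decoder-only HF model you have access to

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without one
model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool the last hidden layer over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)     # (batch, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

vectors = embed(["encoders excel at retrieval", "decoders excel at generation"])
print(vectors.shape)  # (2, hidden_size)
```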
chatgpt embeds an invisible watermark throughout its messages.
so i built a decoder to find and remove it.
here is a recording of me copying an output, pasting it into the decoder, showing the hidden unicode characters, and then deciphering those characters (string of punctuation and an...
Riley Coyote · x.com
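The decoder in the clip isn't linked, but the detection step is easy to reproduce: scan the pasted text for zero-width and other invisible code points and strip them. A rough sketch; the character set below is my own guess, not necessarily what the tool in the recording checks for.

```python
# Minimal sketch of a "decoder": surface invisible Unicode code points in a
# pasted message and return a cleaned copy. The character set is a guess.
import unicodedata

INVISIBLES = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
    "\u00a0",  # no-break space (renders like a normal space)
}

def reveal(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, unicode name) for every suspicious character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in INVISIBLES
    ]

def strip_invisibles(text: str) -> str:
    """Map the no-break space to a normal space and drop the rest."""
    out = []
    for ch in text:
        if ch == "\u00a0":
            out.append(" ")
        elif ch not in INVISIBLES:
            out.append(ch)
    return "".join(out)

sample = "hello\u200b world\u2060!"
print(reveal(sample))            # [(5, 'U+200B', 'ZERO WIDTH SPACE'), ...]
print(strip_invisibles(sample))  # "hello world!"
```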
Today's experiment 🪄: inverting OpenAI's text-embedding-ada-002 model to reconstruct input texts from embeddings alone.
A LOT of interesting tidbits here. I'll begin with these (cherry-picked) samples. Left column is the input; the middle column is reconstructed from each paragraph's embedding only https://t.co/VHx2paKCxK
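The inversion model itself isn't described in this excerpt, so no attempt to reproduce it here. As a point of reference, the crudest baseline treats inversion as search: embed candidate texts and keep whichever one lands closest to the target vector. A sketch against the openai Python client (v1+), with the candidate pool hard-coded for illustration.

```python
# Crude "inversion as search" baseline: embed a pool of candidate texts and keep
# the one closest to the target embedding. The real approach is a trained
# inversion model; this only illustrates the scoring side.
# Assumes openai>=1.0 and OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()
MODEL = "text-embedding-ada-002"

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

def closest_candidate(target_vec: np.ndarray, candidates: list[str]) -> str:
    cand_vecs = embed(candidates)
    # ada-002 embeddings are unit-norm, so a dot product is cosine similarity
    scores = cand_vecs @ target_vec
    return candidates[int(np.argmax(scores))]

secret = "the launch is scheduled for next tuesday"
target = embed([secret])[0]
guesses = [
    "the launch happens next tuesday",
    "our quarterly earnings beat expectations",
    "please reset your password",
]
print(closest_candidate(target, guesses))
```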
PDFs are satan’s file format.
Almost everyone who builds RAG needs to deal with them, and it sucks.
Solutions on the market are either too slow, too expensive, or not OSS.
It should be easier, which is why we're open-sourcing https://t.co/0gCZxzbkWu
Ishaan Kapoor · x.com
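The open-sourced repo sits behind a shortened link, so nothing below is its API. For context, this is the kind of naive baseline (plain text extraction with pypdf plus fixed-size chunking) that dedicated PDF parsers try to beat, since it mangles tables, multi-column layouts, and reading order.

```python
# Naive PDF-to-text baseline with pypdf: the sort of extraction that dedicated
# parsers improve on (it drops tables, columns, headers/footers, etc.).
from pypdf import PdfReader

def pdf_to_chunks(path: str, chunk_chars: int = 1000) -> list[str]:
    """Extract page text and split it into fixed-size chunks for a RAG index."""
    reader = PdfReader(path)
    text = "\n".join((page.extract_text() or "") for page in reader.pages)
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

chunks = pdf_to_chunks("report.pdf")
print(len(chunks), chunks[0][:200])
```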
flowchart explaining the rough process
data contracts / SSTORE2 are well understood at this point, so I'll skip that
the main unlock is storing compressed data, then decompressing and loading it at runtime via a helper script injected into the data URI
"on-chain npm" etc...
