Sublime
An inspiration engine for ideas
By using this repo https://t.co/HOobVEmInD you can convert any decoder model (such as Llama 3.1 or Gemma 2) into an encoder (such as RoBERTa). Why would you want to do that? Because modern decoders support huge context sizes (128k+ tokens) while the longest-context encoder tops out around 4K tokens (Longformer), but encoders excel in such tasks as...
BURKOV · x.com
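The repo link above is shortened, so no claims about its API here; below is only a minimal sketch of the general idea, mean-pooling the hidden states of an off-the-shelf Hugging Face causal LM into a sentence embedding. The model name is a placeholder, and the sketch skips the parts a real conversion needs (enabling bidirectional attention, fine-tuning for embeddings).

```python
# A minimal sketch (not the linked repo's method): mean-pool a causal LM's last
# hidden states into a fixed-size embedding. Model name is a placeholder.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B"  # any decoder-only HF model you have access to

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without one
model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool the last hidden layer over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)     # (batch, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

vectors = embed(["encoders excel at retrieval", "decoders excel at generation"])
print(vectors.shape)  # (2, hidden_size)
```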
chatgpt embeds an invisible watermark throughout its messages.
so i built a decoder to find and remove it.
here is a recording of me copying an output, pasting it into the decoder, showing the hidden unicode characters, and then deciphering those characters (string of punctuation and an...
Riley Coyote · x.com
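The decoder in the clip isn't linked, but the detection step is easy to reproduce: scan the pasted text for zero-width and other invisible code points and strip them. A rough sketch; the character set below is my own guess, not necessarily what the tool in the recording checks for.

```python
# Minimal sketch of a "decoder": surface invisible Unicode code points in a
# pasted message and return a cleaned copy. The character set is a guess.
import unicodedata

INVISIBLES = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
    "\u00a0",  # no-break space (renders like a normal space)
}

def reveal(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, unicode name) for every suspicious character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in INVISIBLES
    ]

def strip_invisibles(text: str) -> str:
    """Map the no-break space to a normal space and drop the rest."""
    out = []
    for ch in text:
        if ch == "\u00a0":
            out.append(" ")
        elif ch not in INVISIBLES:
            out.append(ch)
    return "".join(out)

sample = "hello\u200b world\u2060!"
print(reveal(sample))            # [(5, 'U+200B', 'ZERO WIDTH SPACE'), ...]
print(strip_invisibles(sample))  # "hello world!"
```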
Today's experiment 🪄: inverting OpenAI's text-embedding-ada-002 model to reconstruct input texts from embeddings alone.
A LOT of interesting tidbits here. I'll begin with these (cherry-picked) samples. Left column is the input; the middle column is reconstructed from each paragraph's embedding only https://t.co/VHx2paKCxK
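The inversion model itself isn't described in this excerpt, so no attempt to reproduce it here. As a point of reference, the crudest baseline treats inversion as search: embed candidate texts and keep whichever one lands closest to the target vector. A sketch against the openai Python client (v1+), with the candidate pool hard-coded for illustration.

```python
# Crude "inversion as search" baseline: embed a pool of candidate texts and keep
# the one closest to the target embedding. The real approach is a trained
# inversion model; this only illustrates the scoring side.
# Assumes openai>=1.0 and OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()
MODEL = "text-embedding-ada-002"

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

def closest_candidate(target_vec: np.ndarray, candidates: list[str]) -> str:
    cand_vecs = embed(candidates)
    # ada-002 embeddings are unit-norm, so a dot product is cosine similarity
    scores = cand_vecs @ target_vec
    return candidates[int(np.argmax(scores))]

secret = "the launch is scheduled for next tuesday"
target = embed([secret])[0]
guesses = [
    "the launch happens next tuesday",
    "our quarterly earnings beat expectations",
    "please reset your password",
]
print(closest_candidate(target, guesses))
```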
PDFs are satan’s file format.
Almost everyone who builds RAG needs to deal with them, and it sucks.
Solutions on the market are either too slow, too expensive, or not OSS.
It should be easier, which is why we're open-sourcing https://t.co/0gCZxzbkWu
Ishaan Kapoor · x.com
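The open-sourced repo sits behind a shortened link, so nothing below is its API. For context, this is the kind of naive baseline (plain text extraction with pypdf plus fixed-size chunking) that dedicated PDF parsers try to beat, since it mangles tables, multi-column layouts, and reading order.

```python
# Naive PDF-to-text baseline with pypdf: the sort of extraction that dedicated
# parsers improve on (it drops tables, columns, headers/footers, etc.).
from pypdf import PdfReader

def pdf_to_chunks(path: str, chunk_chars: int = 1000) -> list[str]:
    """Extract page text and split it into fixed-size chunks for a RAG index."""
    reader = PdfReader(path)
    text = "\n".join((page.extract_text() or "") for page in reader.pages)
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

chunks = pdf_to_chunks("report.pdf")
print(len(chunks), chunks[0][:200])
```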
flowchart explaining the rough process
data contracts / SSTORE2 are well understood at this point, so I'll skip that
the main unlock is storing compressed data, then decompressing and loading it at runtime via a helper script injected into the data URI
"on-chain npm" etc...
