Sublime

An inspiration engine for ideas


Nice work by @jingfeng_yao, @XinggangWang, @shushengyang, Baoyuan Wang. References: ViTDet: https://t.co/wAUiofBLbc SAM: https://t.co/1al2SdcCpK ViTPose: https://t.co/WNSdLg6q0q ViTMatte: https://t.co/Kzh9gH2qDe

Lucas Beyer (bl16) x.com

Yann LeCun: "We're never going to get to human level AI by just training on text" https://t.co/ucDtggCwLO

Lior⚡ x.com

Meta announces "An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels". This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision...

x.com
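To make the patches-versus-pixels contrast in the post above concrete, here is a minimal sketch of how the token sequence changes when each pixel becomes its own token. The `patchify` helper and the 224x224 input size are assumptions for illustration only; they are not taken from the paper.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Hypothetical helper: returns an array of shape
    (num_tokens, patch * patch * C), one flattened patch per row.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = image.reshape(h // patch, patch, w // patch, patch, c)
    tokens = tokens.transpose(0, 2, 1, 3, 4)  # group the two grid axes first
    return tokens.reshape(-1, patch * patch * c)

# Assumed input size, chosen only to make the arithmetic easy to follow.
image = np.zeros((224, 224, 3), dtype=np.float32)

patch_tokens = patchify(image, 16)  # ViT-style 16x16 patches
pixel_tokens = patchify(image, 1)   # every pixel is its own token

print(patch_tokens.shape)  # (196, 768)
print(pixel_tokens.shape)  # (50176, 3)
```

With a patch size of 1, no neighbouring pixels are grouped before the Transformer sees them, so the locality prior built into patching is gone. The sequence also grows by a factor of 256 under these assumed sizes.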

@giffmana Great read! You may find this blog from @ppwwyyxx interesting. It goes into details of different resize/resample options in CV libraries and their consequences. This understanding helped us a lot back in the days: https://t.co/w8UVA7S6yz

Alexander Kirillov x.com

Video of my talk on self-supervised learning, energy-based models, and training methods for joint-embedding architectures (e.g. Siamese nets) in contrastive and non-contrastive modes. Given at the French-German Symposium on ML. (with panel discussion). https://t.co/zCf4PBm9O7

Yann LeCun x.com

No, brains don't build generative models at the pixel level. They learn abstract representations that *eliminate* noise, unpredictable stuff, and irrelevant information. The salvation is in Joint Embedding Predictive Architectures (JEPA). https://t.co/42ApHRbge9

Yann LeCun x.com

I don't wanna say "I told you so", but I told you so. Quote: "Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and...

Yann LeCun x.com

Congrats John and Geoff! Both are former colleagues. I did my postdoc in Geoff's lab in Toronto. After that, I joined Bell Labs, where John was a part-time scientist (and a professor at Caltech). In fact, John was the reason why the department I joined was working on neural...

Yann LeCun x.com

A short post on the best architectures for real-time image and video processing. TL;DR: use convolutions with stride or pooling at the low levels, and stick self-attention circuits at higher levels, where feature vectors represent objects. PS: ready to bet that Tesla FSD uses convolutions (or perhaps more...

Yann LeCun x.com
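The TL;DR in the post above (strided convolutions at the low levels, self-attention on top) can be sketched in a few lines of NumPy. This is an illustrative toy, not the architecture from the post: the layer sizes, random weights, and the helper names `strided_conv` and `self_attention` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def strided_conv(x, weight, stride):
    """Valid 2D convolution plus ReLU. x: (H, W, Cin), weight: (k, k, Cin, Cout)."""
    k = weight.shape[0]
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.empty((h_out, w_out, weight.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Contract the (k, k, Cin) window against the kernel -> (Cout,)
            out[i, j] = np.tensordot(window, weight, axes=3)
    return np.maximum(out, 0.0)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over (N, D) tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Assumed toy sizes: a 64x64 RGB image and two stride-4 convolution stages.
image = rng.standard_normal((64, 64, 3))
w1 = rng.standard_normal((4, 4, 3, 16)) * 0.1
w2 = rng.standard_normal((4, 4, 16, 32)) * 0.1

features = strided_conv(strided_conv(image, w1, 4), w2, 4)  # (4, 4, 32)
tokens = features.reshape(-1, features.shape[-1])           # (16, 32)

wq, wk, wv = (rng.standard_normal((32, 32)) * 0.1 for _ in range(3))
out = self_attention(tokens, wq, wk, wv)

print(image.shape[0] * image.shape[1], "pixels ->", tokens.shape[0], "tokens")
```

The point of the ordering is cost: attention scales quadratically with the number of tokens, so under these assumed sizes the two strided stages shrink 4096 pixel positions to 16 feature vectors before any attention is computed.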