Benchmarking
31%
32 posts analyzed over the last 12 weeks
Over the last 12 weeks
Average across all posts
This month vs. the previous one
Growth trend (6 weeks)
Best week: April 20 (293 likes on average)
Niels Rogge
Machine Learning Engineer at ML6 & Hugging Face
Hugging Face just released "ML-Intern"! 🔥 It's an open-source implementation of the real research loop that ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements id…
Niels Rogge
Machine Learning Engineer at ML6 & Hugging Face
This week, Mistral AI released a new model, Medium 3.5, but it wasn't well-received. 🇫🇷🥐😥 Various people noticed that it uses an outdated architecture based on Llama 2 and is priced higher than models such as DeepSe…
Maor Shlomo
Founder at Base44 | Prev: CEO and Co-Founder at Explorium | Forbes 30 under 30
We’re introducing a new model benchmark. And it’s a different kind of benchmark. (Basemark? Vibench?) A different kind because it’s breathing, constantly updated from millions of builders. Not a closed set of tasks. F…
Tom Aarsen
🤗 Sentence Transformers & NLTK maintainer, MLE @ Hugging Face
BidirLM-Omni-2.5B-Embedding is live! A single bidirectional encoder that embeds text, images, and audio into the same space. Here's the details: Benchmark sweep: 🥇 #1 open-data model on MTEB Multilingual V2 (text, #15 …
Ethan Mollick
Associate Professor at The Wharton School. Author of Co-Intelligence
I find that open weights models over-perform on benchmarks compared to actual real-world usage, and the new Kimi 2.6 Thinking feels like no exception. For example, a small amount of use will show that Kimi is not as good…
Nandan Mullakara
Follow for Agentic AI, Gen AI & RPA trends | Co-author: Agentic AI & RPA Projects | Favikon TOP 200 in AI | Oanalytica Who’s Who in Automation | Founder, Bot Nirvana | Ex-Fujitsu Head of Digital Automation
𝗜 𝗸𝗲𝗲𝗽 𝘀𝗲𝗲𝗶𝗻𝗴 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗳𝗮𝗶𝗹𝘂𝗿𝗲 𝗶𝗻 𝗮𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗼𝗻 𝗮𝗳𝘁𝗲𝗿 𝟭𝟬+ 𝘆𝗲𝗮𝗿𝘀. Different company. Different tools. Different team. 𝗦𝗮𝗺𝗲 𝗿𝗼𝗼𝘁 𝗰𝗮𝘂𝘀𝗲 every single time. Nobody …
Nick Saraev
Founder at Maker School: the straightest-line path to building an AI agency (2K+ members, ~$250K MRR) | Co-founder at LeftClick, an AI growth agency serving multibillion dollar portfolio companies.
Opus 4.7 dropped a few days ago and half the internet is treating it like some massive breakthrough. It isn't. It's a marginal step up over 4.6, where some benchmarks move 3-4 points, one or two actually regress, and t…
Nick Saraev
Founder at Maker School: the straightest-line path to building an AI agency (2K+ members, ~$250K MRR) | Co-founder at LeftClick, an AI growth agency serving multibillion dollar portfolio companies.
The system card for Claude Mythos Preview is 244 pages of "holy crap." This is the most capable model ever released by any lab. It's exceptional at automation, software engineering, general reasoning, and—a little conce…
Vincent RYCKBOSCH
Everyone knows how to vibe-code a prototype. I ship your AI SaaS that goes to production and sells, in 3 months → vria-consulting.fr
Alexandr Wang sold Scale AI to Meta for 14.3 billion. 9 months later, he is burying open source. He is 27. He is Chief AI Officer at Meta. And he just launched Muse Spark, the first proprietary model in the his…
Jordan Bastin
Full-stack developer | SaaS Builder | AI, automation & business apps
Anthropic accused of throttling Claude's performance. When you work with an AI tool every day, you end up "feeling" it. And that's exactly what happened this week. Developers started to…