Fadell trashes AI code; Cloudflare grabs Vite

Podcast Hot Take

Tony Fadell on Lenny's Podcast: Taste in the AI Era

Tony Fadell — father of the iPod, iPhone, and Nest — argues that because AI makes building trivially easy, the products that stand out are the ones that are really well thought through, and that builders must not "cognitively surrender to the machine."^{[1]Lenny's Podcast — Tony Fadell} His sharpest claim: real software architects who saw the leaked Claude source code "threw up" at its brittle main loop — a warning that AI-written code buys short-term gain for long-term technical debt unless humans architect, scope, and review it.

Taste, opinion-based decisions, and micromanagement

~08:05 Fadell distinguishes data-driven from opinion-based decisions: for a 1.0 in a brand-new category there are no analogues, so a small set of "taste makers" must make opinion-based calls in a "benevolent dictatorship" — especially in B2C, where customers can't react until the whole product-and-marketing ecosystem ships. ~13:09 He defends micromanagement as sweating only the few details that matter most to the customer (he cites the iPhone keyboard's hardware/software/filtering/graphics layers), clarifying that you micromanage the decision and the key data, not operations.

~02:00 The iPhone virtual-vs-physical keyboard fight was Apple's longest, most heated argument; Fadell ran speed/error tests and judged multi-touch "good enough," and Steve Jobs ultimately ended the debate: "we are going this way," or get out of the room. He notes Jobs was often wrong (Mac-only iPod, no stylus, no Windows connectivity), and that "skunk works" projects kept rejected ideas alive until they became key features.

"Because it's so easy to build, the things that stand out are the things that are really well thought through."

Three generations, and the press-release-first method

~21:16 Innovation starts from pain plus a newly-available technology. ~30:21 His core product law: everything needs three generations — "make the product, fix the product, then fix the business." The first iPod sold only to Mac geeks (<1% of the market); Windows connectivity and the iTunes Store made it take off and arguably saved a near-bankrupt Apple. ~34:24 On marketing, he champions meeting customers in their context via "earned and owned media," and working backwards from a press release (Amazon's method) — even, half-jokingly, writing the infomercial first and dialing it back to the honest truth.

"Make the product, fix the product, then fix the business. I've never seen anyone get it all right the first time."

The AI takes: brittle code and "luxury vs fast software"

~49:30 Fadell warns against the belief that "I can just make a prompt and it gets spit out." ~50:32 He cites the leaked Claude source code, claiming real architects "threw up" at the brittle, unreadable main loop, even as Dario Amodei reportedly says 90–100% of Anthropic's code is AI-written and monitored. The point: AI code can run and pass tests, but without a "mixture of experts" (architects, optimizers, coders, security reviewers) structuring it, you accumulate massive technical debt. ~54:33 He contrasts throwaway "fast software" (H&M) with crafted "luxury software" (his repeated example: the Flighty app), and critiques OpenAI as a viral tech demo that scattered into Sora/Codex without product thinking, while Anthropic now carries a higher valuation and more revenue.

"Anybody who looked at the [leaked Claude] code who's a real software architect and engineer threw up… This is the main loop. People are like, how can you do this? This looks brittle."

The next device, hardware's comeback, and ethics

~65:43 Fadell insists we'll still need a display long-term (likely a foldable slab) — Humane-style projection failed because it still needs a surface and "you can't glance at a map" with voice alone. But he wants to flip the hierarchy to voice-primary, keyboard-secondary, tap-tertiary once models earn memory, intelligence, and social trust (he likens the wait to full self-driving). ~73:49 He notes everyone is rushing back to "atoms + software" now that pure SaaS looks vibe-codeable (Waymo, Snap Specs, his own Build Collective portfolio). ~84:57 He closes on ethics: don't design to addict users, comparing phones/social media to junk food without "nutrition labels."

"Don't surrender to the machine. We can use the machines, but don't cognitively surrender."

Tools: Claude / Claude Code, Codex, ChatGPT, Anthropic, Sora, Siri, Wispr Flow, Flighty, Waymo, Snap Specs, Humane, Cerebras

Podcast Developer Tools

Kun Chen Interview

Shipping 40 PRs a Day: An Ex-Meta L8's Agent Workflow

Ex-Meta/Microsoft L8 engineer Kun Chen, now a solo AI builder, walks through a plan–code–validate workflow that ships 20–40 PRs a day across 5+ parallel sessions and 20–30 concurrent agents.^{[2]Kun Chen Interview} His key move: the human is only needed at the start (planning) and end (validation), so he refuses to review first-pass agent code line-by-line — and built three custom tools (Lavish, Treehouse, No Mistakes) to remove himself from the loop everywhere else.

The plan–code–validate model

~01:00 Chen spends the most time in planning (human-led with agent help), delegates coding almost entirely to agents, and runs validation mostly via agents with his judgment reserved for ambiguity. ~04:02 Because the human is only needed at the bookends, the goal is to maximize autonomous agent runtime and parallelize sessions — at least 5 active, ~20–30 agents on average. ~03:01 More detailed plans let agents run longer before bouncing back; one-line prompts finish fast and stall.

Lavish, Treehouse, and No Mistakes

~08:05 Lavish (npx lavish) replaces the unreadable "wall of text" plans agents produce with rich interactive HTML artifacts you can annotate and click through — inspired by an "HTML over markdown" article, with best-practice instructions ("push back on me," surface risks) baked in. ~13:07 Treehouse is a "no-brainer" git-worktree manager that draws from a pool of worktrees with dependencies pre-installed, killing the cognitive load of naming/tracking worktrees and reinstalling node_modules. ~32:25 No Mistakes (alias nm) is a fresh-context validation pipeline: intent analysis, branch, rebase on main, a high-recall review, a CI-style test phase with screenshot/video evidence, doc updates, lint, then push and open a PR with a risk assessment. Obvious bugs auto-fix; product-implication fixes escalate to him.
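As a rough illustration of the No Mistakes flow described above, here is a minimal Python sketch of a staged validation pass that auto-fixes obvious bugs and escalates product questions. It is not Chen's code: the stage names paraphrase the summary, and `Finding`, `Report`, and `run_pipeline` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Finding:
    stage: str
    message: str
    needs_human: bool  # True when the fix has product implications


@dataclass
class Report:
    auto_fixed: List[Finding] = field(default_factory=list)
    escalated: List[Finding] = field(default_factory=list)


# Stage order paraphrased from the summary above (names are assumptions).
STAGES = [
    "intent_analysis", "branch", "rebase_on_main", "review",
    "test", "docs", "lint", "push_and_open_pr",
]


def run_pipeline(checks: Dict[str, Callable[[], List[Finding]]]) -> Report:
    """Run each stage in order and route findings: obvious bugs are
    recorded as auto-fixed, product questions go to the human."""
    report = Report()
    for stage in STAGES:
        check = checks.get(stage)
        if check is None:
            continue  # stage has no check registered in this sketch
        for finding in check():
            bucket = report.escalated if finding.needs_human else report.auto_fixed
            bucket.append(finding)
    return report


def fake_review() -> List[Finding]:
    # Stand-in for an agent review; a real one would read the diff.
    return [
        Finding("review", "off-by-one in pagination", needs_human=False),
        Finding("review", "changes default billing plan", needs_human=True),
    ]


report = run_pipeline({"review": fake_review})
```

The design point this tries to capture is the routing rule: the human only sees the `escalated` list, which is what keeps him out of the loop for everything else.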

"If you review every single line of code, you become the bottleneck. So I don't review this first pass code from the agent."

"Eventually I got to a point where I find myself never catching anything the agents don't catch."

Sub-agents, benchmarks, and advice

~24:17 He uses sub-agents mainly to protect the main session's context window — carving out exploration or running isolated experiments in parallel (e.g., a ProgramBench run ~200×8 across TypeScript, JavaScript, and Python to see which language helps agents pass more tests with fewer tokens). ~42:28 His bigger point: average engineers ship 10–15 PRs a month, so when agents 10x that, team processes — especially PR review — break, and many startups have stopped waiting on human review. His advice for builders: build constantly (even throwaway toys), push yourself to run more agents in parallel as a forcing function (without token-maxing for its own sake), and turn anything you do manually into an agent task. He also plugs Claude Code's /insights command for surfacing skill and memory tweaks.

"When you start to write like 10 times more PRs, we are not ready for that. We have to move ourselves out of the loop as much as possible."

Tools: Lavish, Treehouse, No Mistakes, OpenCode, Claude Code (/insights), Codex, Cursor, Claude Design, ProgramBench, tmux, git worktree, Playwright, Electron, Linear

Industry Developer Tools

Better Stack

Cloudflare Buys Vite (VoidZero): Big Tech's Open-Source Land Grab

Cloudflare acquired VoidZero, Evan You's company behind Vite, Vitest, Rolldown, and OXC ("basically half the modern JavaScript toolchain"), promising the projects stay open source and community-governed, with a $1M Vite Ecosystem Fund for maintainers.^{[3]Better Stack — Big Tech Is Buying Up Open Source} Better Stack frames it as one move in an accelerating pattern: Anthropic bought Bun, Cloudflare grabbed Astro, and OpenAI took the team behind uv, all part of a war to own the default toolchain so AI-generated code deploys to your platform.

The deal

~00:00 The whole VoidZero company — Evan You included — joins Cloudflare; Vue is explicitly excluded and stays community-governed. Cloudflare pledges all projects remain open source, vendor-agnostic, and community-driven, and seeded a separate $1M Vite Ecosystem Fund run by the core team to pay maintainers. VoidZero had previously raised $17M in VC, giving a rough reference for scale. A previously announced VoidZero deployment platform ("Void") will be shut down and open-sourced, with learnings folded into Cloudflare's own CLI (cf dev, cf deploy wrapping Vite primitives).

Why everyone is buying dev tools

~01:01 The strategic logic is deployment ownership: own the default tooling and you own the stack from dev to deploy. ~03:01 Crucially, AI models currently recommend Vercel by a wide margin for deployment — so whoever owns what AI reaches for by default wins. Cloudflare's own Vite plugin already had ~10M weekly downloads (~10% of Vite's base).

"Pretty much every front end outside of Next.js is being built using Vite, and if Cloudflare didn't support that well, AI just isn't going to use it."

The concerns

~03:30 Better Stack flags real risks: acqui-hired teams drift toward employer priorities over time; a Hacker News commenter cited Cloudflare's BastionZero acquisition (bugs piled up, changelog went dark, a month's shutdown notice) as a cautionary tale; and consolidating a historically neutral tool under one vendor is a structural change. The MIT license is the safety valve — if Cloudflare misbehaves, the community can fork, à la Redis/Valkey.

"Vite has always been a neutral tool. And while they promise to continue to be, feeling a little bit uncomfortable about it now being owned by a single vendor… I do think is slightly valid."

Tools: Vite, Vitest, Rolldown, OXC, Vue, Cloudflare, VoidZero, Bun, Astro, uv, Vercel, Next.js, TanStack, Redis/Valkey

AI Models

Better Stack AI Search

Gemma 4 12B: Google Deletes the Encoder

Google DeepMind's Gemma 4 12B is an encoder-free multimodal model: vision and audio inputs skip dedicated encoder networks entirely and flow straight into the language backbone via lightweight linear projection layers.^{[4]Better Stack — Gemma 4 12B} The result approaches the quality of Google's 26B models, runs on a 16 GB-VRAM laptop fully offline, and ships under Apache 2.0.^{[5]AI Search — AI NEWS}

How encoder-free works

~00:00 Traditional multimodal models chain a vision encoder (~550M params) and an audio encoder before the LLM. Gemma 4 12B chops images into 48×48 pixel patches (2,304 values each) and passes them through a single 35M-param linear projection that just reformats raw pixels into the LLM's token space — no attention, no edge/object detection, "zero analytical thinking." Audio is sliced into 40ms / 640-value frames and projected the same way; because audio is already a chronological sequence like text, the transformer handles it natively.
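To make the "just reformat raw values" idea concrete, here is a small pure-Python sketch of patching an image, framing audio, and pushing each piece through one linear projection. It is an illustration under stated assumptions, not Gemma's code: patches are treated as single-channel so that 48×48 gives the 2,304 values quoted, the 16 kHz sample rate is inferred from 640 values per 40 ms, and the output width of 8 is a toy number far smaller than a real model's.

```python
import random

PATCH = 48                          # patch side from the text: 48 * 48 = 2,304 values
SAMPLE_RATE = 16_000                # assumption: 640 values per 40 ms implies 16 kHz
FRAME = SAMPLE_RATE * 40 // 1000    # 640 samples per 40 ms frame
TOY_WIDTH = 8                       # toy embedding width, chosen only for the demo


def patchify(image, patch=PATCH):
    """Split an H x W single-channel image (list of rows) into flat patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def frame_audio(samples, frame=FRAME):
    """Slice a sample list into consecutive non-overlapping frames."""
    return [samples[i:i + frame] for i in range(0, len(samples) - frame + 1, frame)]


def make_projection(in_dim, out_dim, seed=0):
    """Random weight matrix standing in for a learned linear projection."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)] for _ in range(out_dim)]


def project(vector, weights):
    """Plain matrix-vector product: no attention, no feature detection."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


image = [[0.5] * 144 for _ in range(96)]             # 96 x 144 image gives 2 x 3 patches
patches = patchify(image)
weights = make_projection(PATCH * PATCH, TOY_WIDTH)
tokens = [project(p, weights) for p in patches]      # one vector per patch
frames = frame_audio([0.0] * SAMPLE_RATE)            # one second of silence
```

Each patch or frame becomes one vector in the language model's input sequence; all the actual visual or acoustic reasoning is left to the transformer layers that follow.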

"DeepMind shrunk it by completely deleting all of that heavy brain power. They realized the main language backbone is already incredibly smart and has plenty of layers to do the actual visual reasoning."

Performance and deployment

~06:03 Gemma 4 12B gets close to Google's 26B models while fitting on 16 GB+ VRAM, and ships with native multi-token-prediction drafters for fast local inference without quantization. ~07:06 The presenter found Google's official AI Edge Gallery app too buggy for image tests and instead ran the 8-bit model in MLX on an M2 MacBook Pro, calling image-reasoning speed "insane" and "the best local model I tested." AI Search adds that the 12B sits between the phone-class 4B and the 26B MoE variant, supports agentic behaviors and audio transcription, and runs in LM Studio and Ollama.

Tools: Gemma 4 12B, MLX, Google AI Edge Gallery, LM Studio, Ollama, multi-token prediction (MTP)

Developer Tools Hot Take

AICodeKing

Google Antigravity 3.0: /teamwork, Science Skills, Flash Low

Google's Antigravity coding platform shipped a batch of updates: the multi-agent /teamwork preview drops from the $200 Ultra plan to all paid tiers, a DeepMind "science skills" bundle wires in 30+ scientific databases, and a new "Flash low" mode trims thinking effort for trivial edits.^{[6]AICodeKing — Antigravity 3.0} AICodeKing's take: Antigravity is becoming a full multi-surface agent platform — though token costs and quota visibility remain weak spots.

Multi-agent /teamwork for everyone

~01:02 /teamwork orchestrates specialized sub-agents (orchestrator, explorer, implementer, reviewer, stress tester, auditor) in parallel. It was originally demoed building a functional OS from one prompt — using 93 sub-agents, 15,000+ model calls, and 2.6B+ total tokens — and is now available on the $20 plan, not just $200 Ultra. The host recommends reserving it for big refactors or full app builds where parallelism justifies the token burn.

Science skills, new Flash, and low-effort mode

~03:03 The DeepMind science-skills plugin turns Antigravity into a scientific workbench with structured access to AlphaFold, UniProt, PubMed, arXiv, ChEMBL, ClinVar, PDB, PubChem and 30+ more — with proper scripts and citations rather than generic web search. ~06:04 A new Gemini 3.5 Flash version improves endurance on long tasks (Google reset all rate limits to allow testing). ~07:05 "Flash low" cuts thinking effort for typo fixes, doc updates, and CSS tweaks. ~08:05 Antigravity CLI v1.0.4 adds session syncing between CLI and desktop.

"If a skill can make a cheaper model use the right tools, avoid mistakes, and spend fewer tokens, then that is way better than just throwing the biggest model at every task."

~09:06 The host's verdict: Antigravity now feels "less like a normal AI editor and more like a full agent platform" — but Google still needs much better usage tracking and quota visibility.

Tools: Antigravity (desktop + CLI), Gemini 3.5 Flash, AlphaFold, UniProt, PubMed, arXiv, ChEMBL, ClinVar, PDB, PubChem

AI Tools

AI Engineer

Rafael Levi at AI Engineer: Self-Healing Scrapers with Bright Data MCP

Bright Data's Rafael Levi argues the trick to web scraping at scale isn't having an LLM parse every page — it's using the LLM once to write a scraper, then letting an MCP server explore, write, run, and self-heal it.^{[7]AI Engineer — Rafael Levi, Bright Data} A live demo bypasses Walmart's anti-bot CAPTCHAs and shows the scraper-first approach saving ~62% of tokens — roughly 1M tokens per 3-page scrape.

~00:15 Levi reframes scraping: don't burn millions of tokens parsing pages — build a scraper that parses for you. ~01:16 Bright Data's MCP lets an agent explore a site's selectors, write the scraper, run it, and maintain it — he runs collections every 30 minutes where an LLM spins up, checks the data, auto-fixes breaks, and shuts down.
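The scraper-first loop can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions, not Bright Data's MCP or API: the scraper is reduced to a single CSS class name, and `rewrite_selector` is a hypothetical stand-in for the one step where an LLM would be asked to re-inspect the page.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class.
    Nested elements with the same tag name are not handled in this sketch."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self._open_tag: Optional[str] = None   # tag of the matching element we are inside
        self.rows: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and self.css_class in classes:
            self._open_tag = tag

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None and data.strip():
            self.rows.append(data.strip())


def scrape(html: str, css_class: str) -> List[str]:
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_with_self_heal(
    html: str,
    css_class: str,
    rewrite_selector: Callable[[str], str],
    max_repairs: int = 1,
) -> Tuple[List[str], str, int]:
    """Run the cheap scraper; only when it returns nothing, pay for a repair."""
    rows = scrape(html, css_class)
    repairs = 0
    while not rows and repairs != max_repairs:
        css_class = rewrite_selector(html)   # the only step that would spend LLM tokens
        rows = scrape(html, css_class)
        repairs += 1
    return rows, css_class, repairs


old_page = '<span class="price">$19.99</span><span class="price">$24.50</span>'
new_page = '<div class="cost">$19.99</div><div class="cost">$24.50</div>'


def fake_llm(html: str) -> str:
    # Hypothetical repair step; a real agent would inspect the page.
    return "cost"
```

On the unchanged page the repair function is never called, which is where the token savings in this approach come from; it only runs after the site's markup changes and the old selector returns nothing.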

~03:17 The demo uses Claude Code with Bright Data's GitHub skills to build a two-input Walmart scraper (a search term such as "headphones", plus max pages). The MCP fetches the page with its "scrape as markdown" tool to find selectors, then writes a Python scraper on the Web Unlocker API. ~11:20 Without MCP, the agent hits a robot-verification wall; with it, the CAPTCHA (including click-and-hold) is solved transparently. ~13:29 A token breakdown shows ~62% savings. ~18:34 He cites a 150M-IP pool, remote browsers with pre-recorded human-like mouse/typing, 500+ domain APIs, and court wins against Meta and Musk as legal precedent that public data is fair game.

"Public data is public data. It doesn't matter how you collect it. The judge said it's like walking on the street, writing down prices on a counter, and selling it — it's public."

Tools: Bright Data MCP, Web Unlocker, Scrape-as-Markdown, Remote Browser, Claude Code, Claude Haiku, 500+ pre-built domain APIs

Developer Tools

AI Engineer

Dat Ngo at AI Engineer: LLM Observability & Evals with Arize

Arize AI Architect Dat Ngo lays out three pillars for production-ready agents — observability, evaluation, and experimentation — and previews automating the whole flywheel so engineers don't have to.^{[8]AI Engineer — Dat Ngo, Arize} His frame: "AI is just software reimagined."

~00:07 Working with Uber, Booking, and Reddit gives Arize a wide view of how teams build, break, and fix AI systems. ~03:08 Observability is OpenTelemetry-first: one auto-instrumentation line produces traces and spans, plus sessions (conversational state), distributional views (traffic per branch), and trajectory analysis (right components, right order). ~07:10 Evals come in five signal flavors (LLM-as-judge, human feedback, golden datasets, deterministic checks, business metrics) across four scopes (span, multi-span, trajectory, session).
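One of those combinations, a deterministic check at trajectory scope ("right components, right order"), is simple enough to sketch. The code below is an illustration only: `Span` is a plain dataclass rather than an OpenTelemetry or Arize type, and `first_out_of_order` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str            # component that produced the span, e.g. "retriever"
    duration_ms: float


def first_out_of_order(expected: List[str], spans: List[Span]) -> Optional[str]:
    """Return the first expected step that does not appear, in order,
    in the trace; return None when the trajectory is acceptable.
    Extra spans between expected steps are allowed."""
    observed = [span.name for span in spans]
    position = 0
    for step in expected:
        try:
            position = observed.index(step, position) + 1
        except ValueError:
            return step
    return None


good_trace = [Span("router", 3.0), Span("retriever", 41.0),
              Span("llm", 820.0), Span("formatter", 2.0)]
bad_trace = [Span("llm", 790.0), Span("retriever", 39.0)]
```

A check like this costs nothing to run on every trace, which fits Ngo's advice to keep the eval set minimal and reserve LLM-as-judge for what deterministic checks cannot cover.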

~12:11 Experimentation collects low-signal traces into datasets to test prompt changes, model swaps, and orchestration tweaks. ~14:12 He closes with Alex, Arize's built-in AI assistant that ingests all telemetry and can autonomously diagnose issues, propose evals, and drive the improvement loop.

"Just because you can eval something doesn't mean you always should. You want the minimal set of evals to understand if your application is working as intended."

Tools: Arize AX, Arize Phoenix (OSS), OpenTelemetry, Claude Code, Codex, Alex (Arize assistant)

Developer Tools

AI Engineer

Audry Hsu at AI Engineer: An LLM Endpoint in Under 5 Minutes

RunPod's Audry Hsu live-demos deploying a production-ready serverless LLM endpoint from a Hub listing in under five minutes — first click to first API response.^{[9]AI Engineer — Audry Hsu, RunPod} The pitch: developers should build, not manage GPU infrastructure.

~00:07 GPU access is slow and opaque amid a supply crunch; RunPod wants to abstract it away. ~02:07 Origin story: the founders started in 2022 with basement GPU rigs left over from a failed crypto-mining venture, offered free GPUs on Reddit for feedback, and now serve 500,000+ developers across 30+ data centers at $120M ARR. ~04:12 Four primitives: Pods, Serverless (zero cost when idle), Clusters, and the Hub (curated pre-configured AI repos).

~06:12 The demo picks a pre-vetted vLLM Hub listing, bumps the context window, and deploys to H100s (A100 backup) priced in fractions of a cent per second. ~11:22 A first query cold-starts in ~41 seconds (dominated by model download) then runs in ~1.5 seconds; always-on workers eliminate cold starts. A CLI, Python SDK, and agent-ready skills let the whole flow be scripted.

"You bring your code and we'll bring the rest."

Tools: RunPod Serverless, Hub, Pods, Clusters, vLLM, Hugging Face, RunPod CLI, RunPod Python SDK

Developer Tools

Simon Willison

datasette-agent-edit: Reusable Editing Primitives for Agents

Simon Willison released datasette-agent-edit 0.1a0, a foundational Datasette Agent plugin that provides storage-agnostic, reusable text-editing tools modeled on Anthropic's published Claude text-editor design.^{[10]Simon Willison — datasette-agent-edit 0.1a0} It gives downstream plugins three primitives so they don't reimplement editing logic.

The plugin abstracts agentic text editing into three standard tools: view (display file sections with line numbers), str_replace (locate and replace an exact, unique string — fails if ambiguous), and insert (add text after a line number). The design follows Anthropic's Claude text-editor pattern, which Willison calls his "favorite published design" for agentic editing. Because it's storage-agnostic, downstream Datasette Agent plugins can layer their own backends on top — Willison plans to build on it plugins for collaboratively editing Markdown docs, updating SQL queries, and modifying SVG files. The 0.1a0 tag signals early alpha aimed at plugin authors, not end users.
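The three primitives are easy to picture over an in-memory string. The sketch below is not the plugin's code or API: `TextBuffer` and its method signatures are assumptions made for illustration, following only the behavior described above.

```python
class TextBuffer:
    """In-memory stand-in for whatever storage backend a plugin supplies."""

    def __init__(self, text: str) -> None:
        self.text = text

    def view(self, start: int = 1, end: int = None) -> str:
        """Return lines start..end (1-indexed, inclusive) with line numbers."""
        lines = self.text.splitlines()
        end = len(lines) if end is None else end
        return "\n".join(
            f"{number}: {line}"
            for number, line in enumerate(lines[start - 1:end], start=start)
        )

    def str_replace(self, old: str, new: str) -> None:
        """Replace an exact string, refusing when it is missing or ambiguous."""
        if not old:
            raise ValueError("old string must not be empty")
        matches = self.text.count(old)
        if matches != 1:
            raise ValueError(f"expected exactly one match, found {matches}")
        self.text = self.text.replace(old, new, 1)

    def insert(self, after_line: int, new_text: str) -> None:
        """Insert text after the given line number (0 inserts at the top).
        A trailing newline in the buffer is not preserved in this sketch."""
        lines = self.text.splitlines()
        if after_line not in range(len(lines) + 1):
            raise ValueError("line number out of range")
        lines[after_line:after_line] = new_text.splitlines()
        self.text = "\n".join(lines)


buffer = TextBuffer("alpha\nbeta\nbeta")
buffer.str_replace("alpha", "ALPHA")   # unique match, so this succeeds
buffer.insert(1, "inserted")           # goes after line 1
```

The uniqueness check in `str_replace` is the important design choice: an agent that tries to replace an ambiguous string gets an error it can recover from, rather than a silent edit in the wrong place.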

"Agentic editing of text is a little tricky to get right. My favorite published design for this is for the Claude text editor."

Tools: Datasette Agent, datasette-agent-edit, Claude text editor (Anthropic)

AI Models

Sam Witteveen

NVIDIA Nemotron 3.5 ASR: One Model to Replace Your Speech Stack

NVIDIA's NeMo team shipped Nemotron 3.5 ASR, a 600M-parameter streaming speech-to-text model covering 40 languages from one checkpoint, which Sam Witteveen says can "replace your whole speech-to-text stack."^{[11]Sam Witteveen — Nemotron 3.5 ASR} Its cache-aware streaming reuses encoder state instead of reprocessing overlapping audio, for up to a 17x efficiency gain on an H100.

~00:00 19 of the 40 languages work out of the box with auto-detection, 13 are production-grade, and 8 are adaptation languages needing fine-tuning. Community MLX/quantized ports appeared almost immediately. ~02:00 Cache-aware streaming caches the encoder's self-attention and activations and reuses them as new audio arrives — like the KV-cache trick in LLM decoding — for up to 17x efficiency on an H100.
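A toy cost model shows why caching helps. This is not NeMo's implementation: it only counts how many audio frames an encoder would have to process under two strategies, and the chunk size and left-context values are made-up numbers unrelated to the 17x figure above.

```python
def buffered_cost(total_frames: int, chunk: int, left_context: int) -> int:
    """Frames encoded when every step re-encodes its left context
    together with the new chunk (no cache)."""
    cost = 0
    for start in range(0, total_frames, chunk):
        window_start = max(0, start - left_context)
        window_end = min(total_frames, start + chunk)
        cost += window_end - window_start
    return cost


def cached_cost(total_frames: int, chunk: int) -> int:
    """Frames encoded when earlier encoder state is kept in a cache,
    so each step only encodes the frames that just arrived."""
    cost = 0
    for start in range(0, total_frames, chunk):
        cost += min(chunk, total_frames - start)
    return cost


# Illustrative numbers only: 1,000 frames, 8-frame chunks, 70 frames of context.
without_cache = buffered_cost(1000, chunk=8, left_context=70)
with_cache = cached_cost(1000, chunk=8)
speedup = without_cache / with_cache
```

The ratio grows as chunks get smaller relative to the context window, which is why the saving matters most in the low-latency settings.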

~04:03 Chunk size (80ms to ~1s) is an inference-time setting with no retraining: smaller chunks give near-word-by-word low-latency output, larger chunks give phrase-level output, with little real accuracy difference in his testing. ~07:03 Word boosting is a decode-time "boosting tree" that biases the model toward custom vocab (product names, surnames); he demos it fixing "Qwen," "Witteveen," and "Nemotron" with no fine-tuning. ~18:11 Diarization via the NeMo framework works well in batch (podcasts) but struggled live; speaker embeddings can map names to speakers.

"This is basically a 600 million checkpoint that can probably replace your whole speech-to-text stack."

Tools: NVIDIA Nemotron 3.5 ASR, NeMo framework, Parakeet, MLX, H100, DGX, Whisper

AI Models

AI Search

MiniMax M3: Open-Source Frontier Model, 1M Context, $0.20/M

MiniMax previewed M3, an upcoming open-source multimodal model combining state-of-the-art agentic coding (beating GPT-5.5 on SWE-Bench Pro), a 1M-token context window, and image/video/desktop-action input — priced at $0.20 per million tokens.^{[5]AI Search — AI NEWS}

~30:58 M3 uses MiniMax sparse attention to handle ~700,000 words (a medium codebase) without proportional compute cost, and can verify its own code by viewing the built app's UI. On the Artificial Analysis leaderboard it leads open models by a point over Kimi K2.6 and Mimo; its pricing matches DeepSeek and Mimo and is 3x cheaper than Kimi K2. Results on the Arena blind-test leaderboard are more mixed (GLM 5.1 outranks it there). Weights are slated to drop within 1–2 weeks.

Tools: MiniMax M3, GPT-5.5, DeepSeek V4, Kimi K2, GLM 5.1

AI Models

AI Search

Qwen 3.7 Plus: An 11-Hour Autonomous Multimodal Agent

Alibaba's Qwen 3.7 Plus is a multimodal agent model that takes text, image, and video input and was demoed running autonomously for 11 hours to build a complete English-vocabulary app, including self-testing and bug fixes.^{[5]AI Search — AI NEWS}

~22:49 Designed for long-horizon agentic use — reasoning across steps, analyzing on-screen UIs, and iterating on its own output. A second demo replicates a stock-charting app from reference screenshots. Benchmarks show it beating DeepSeek V4 Pro, GLM 5.1, and Gemini 2.6 on average across agentic coding, reasoning, and world knowledge. It plugs into Claude Code, Open Claude, and Qwen Code — but is API-only via Alibaba Cloud Studio for now, with no open weights yet.

Tools: Qwen 3.7 Plus, Claude Code, Open Claude, Qwen Code, DeepSeek V4 Pro, GLM 5.1, Gemini 2.6

AI Models

AI Search

NVIDIA Nemotron 3 Ultra: 550B Open MoE

NVIDIA's largest open-source release, Nemotron 3 Ultra, is a 550B-parameter mixture-of-experts model with only 55B active params, a 1M-token context, and ~5x faster inference than other leading open models.^{[5]AI Search — AI NEWS}

~41:04 Targeted at agentic workflows (planning, tool use, file reading, subtask delegation, multi-turn execution), it leans on a hybrid Mamba-transformer architecture for long-context efficiency, NVFP4 quantization, and multi-token prediction. NVIDIA claims ~5x faster inference and ~30% lower cost per completed task than rival open models. At 350 GB it needs multiple GPUs; weights are on Hugging Face.

Tools: Nemotron 3 Ultra, Hugging Face, NVFP4

AI Models

AI Search

Microsoft Goes In-House: MAI Thinking 1 & MAI Image 2.5

Microsoft released its first in-house models: MAI Thinking 1 (a 1T-parameter MoE with 35B active) and MAI Image 2.5, which beats Google's Gemini image models on Arena but trails GPT Image 2.^{[5]AI Search — AI NEWS}

~46:09 MAI Thinking 1 is a medium-sized thinking model that outperforms Sonnet 4.6 on several benchmarks but trails top closed/open models overall — currently in private preview in Microsoft Foundry. MAI Image 2.5 handles text-to-image and natural-language editing, ranking just behind GPT Image 2 on Arena. Both are paid and closed-source, available only through Microsoft Foundry.

Tools: MAI Thinking 1, MAI Image 2.5, Microsoft Foundry, GPT Image 2

AI Models

AI Search

The Open 3D & World-Model Wave: Cosmos 3, OmniDreams, Deja View, Pager

A cluster of open-source 3D and "physical world" models landed: NVIDIA's Cosmos 3 (synthetic training data for robots and self-driving), OmniDreams (real-time editable driving video), the tiny 117M-param Deja View 3D reconstructor, and Google/Meta's Pager for 360° panorama geometry.^{[5]AI Search — AI NEWS}

~24:52 Cosmos 3 is a fully open foundation model for physical AI (text/image/video/audio/action in, photorealistic world-state video out) for autonomous-driving and robot-manipulation training. Two variants: nano (35 GB) and super (~130 GB), both on Hugging Face. ~40:04 OmniDreams generates photorealistic multi-camera driving footage that responds to the vehicle's actions, supporting edge cases like a mattress falling off a car, jaywalker insertion, and weather changes.

~03:05 Deja View reconstructs scenes into 3D Gaussian splats using just 117M params (~468 MB) by reusing one transformer block repeatedly, matching Depth Anything Three, a model roughly 10x larger. ~05:06 Pager (Google + Meta) predicts depth and surface normals from a single 360° panorama by treating its six cube faces as a multi-view set, and ships with the Pano InfiniGen and Zurich Pano datasets.
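The Deja View size figures follow from simple arithmetic, sketched below. The 4 bytes per parameter (32-bit floats) is an assumption, chosen because it reproduces the ~468 MB quoted; the block size and depth in the weight-sharing helper are hypothetical numbers, not the model's real configuration.

```python
def checkpoint_megabytes(params: int, bytes_per_param: int = 4) -> float:
    """Approximate checkpoint size; 4 bytes per parameter assumes fp32 weights."""
    return params * bytes_per_param / 1_000_000


def stack_params(block_params: int, depth: int, shared: bool) -> int:
    """Parameters in a stack of `depth` transformer blocks.
    With weight sharing the same block is applied repeatedly,
    so depth adds compute but no extra parameters."""
    return block_params if shared else block_params * depth


deja_view_mb = checkpoint_megabytes(117_000_000)

# Hypothetical numbers to show the effect of reusing one block 12 times.
shared = stack_params(10_000_000, depth=12, shared=True)
unshared = stack_params(10_000_000, depth=12, shared=False)
```

Reusing one block trades parameters for repeated computation: the model stays small on disk and in memory while still running as many layers deep as an unshared stack.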

Tools: Cosmos 3, OmniDreams, Deja View, Pager, Depth Anything Three, Pano InfiniGen, Hugging Face

AI Models

AI Search

Layout-First Image Models: Reeve 2 & Ideogram 4

Two image models doubled down on layout control via intermediate bounding boxes: Reeve 2 (closed, #2 on the Arena text-to-image leaderboard) and Ideogram 4 (open-weights, top open model on Design Arena), both generating each element as a separately editable layer.^{[5]AI Search — AI NEWS}

~12:44 Reeve 2 first produces a code-like layout with bounding boxes determining where objects, text, and elements go, then renders — giving strong compositional control for dense posters and infographics. It ranks #2 on Arena (behind GPT Image 2, ahead of Gemini). Paid and closed-source. ~15:46 Ideogram 4 takes JSON prompts with explicit bounding boxes (an LLM can convert plain text), each region an editable layer. It's the clear open-weights leader on Design Arena but #9 on the broader blind-test Arena, with aggressive built-in censorship as its main drawback. Weights and a ComfyUI workflow are on GitHub.

Tools: Reeve 2, Ideogram 4, ComfyUI, GPT Image 2

AI Tools

AI Search

Open Video & Layer Tools: Bernini, Neva, Stable Layers

Open releases pushed on video and image layers: ByteDance's Bernini (unified video editor), Baidu's Neva (video with natively synced audio that beats a model 3,000x its size on audio params), and Stability AI's Stable Layers (decomposing flat images into transparent RGBA layers).^{[5]AI Search — AI NEWS}

~01:01 Bernini is an open-source unified video editor (an open Gemini-Omni equivalent) accepting text, image, and video references to add characters, remove objects, change perspective, swap backgrounds, and insert clips — already on Hugging Face with ComfyUI integration underway (~84 GB combined, quantized versions expected). ~44:06 Neva (Baidu Ernie team) adds natively synchronized audio to Alibaba's Wan 1.2.2 in a single pass, using just 6.3M added params yet benchmarking above LTX2.3's 19B; 720p with dual-channel audio. ~27:55 Stable Layers learns to split images into clean transparent layers using a VLM reward and RL — no paired ground-truth data — beating Qwen Image Layered (paper only so far).

Tools: Bernini, Neva, Wan 1.2.2, LTX2.3, Stable Layers, Qwen Image Layered, ComfyUI, Hugging Face

AI Tools

AI Search

Real-Time Voice, Music & Avatars: Magenta, Wave TTS, Higgs, Stream Character

Real-time generative audio and avatars advanced on multiple fronts: Google's Magenta Real-Time 2 plays like an instrument at 200ms latency, ByteDance's Wave TTS clones a voice from 3–4 seconds, Higgs Audio V3 takes inline emotion tags, and Alibaba's Stream Character streams a talking avatar in near-real time.^{[5]AI Search — AI NEWS}

~07:08 Magenta Real-Time 2 (2.4B params, open-source) responds live to MIDI, audio, and text — latency dropped from ~3s to 200ms, runs on a MacBook without a GPU, and is aimed at being a DAW plugin. ~36:01 Wave TTS (open, ~10 GB, built on F5 TTS) does zero-shot voice cloning from a few seconds of audio. ~43:05 Higgs Audio V3 (open, 9.3 GB) embeds emotion, speed, pitch, pause, and sound-effect tags inline, leading open TTS benchmarks. ~37:02 Stream Character streams a realistic avatar speaking a transcript in real time on a single H100, with motion control and 5-minute-plus generation. ~11:43 Relatedly, Mama does markerless multi-person motion capture from ordinary multi-cam video, trained on a 2.5M-crop synthetic dataset.

Tools: Magenta Real-Time 2, Wave TTS, F5 TTS, Higgs Audio V3, Stream Character, Mama

AI Tools

AI Search

ChatGPT Learns to "Dream": Background Memory Synthesis

OpenAI upgraded ChatGPT's memory with a "dreaming" feature that synthesizes context from past conversations in the background — rather than relying on explicitly saved notes — to keep memories fresh and temporally accurate.^{[5]AI Search — AI NEWS}

~09:40 Dreaming fixes stale memory: instead of, say, still thinking you're in Singapore after a trip ends and suggesting local restaurants, the updated memory recognizes the trip is over and suggests takeout near home. OpenAI reports measurable gains in task accuracy, factual recall, and temporal consistency. It's rolling out to free users, with increased memory capacity for paid subscribers.

Tools: ChatGPT

Industry

AI Search

Hardware & Robots: RTX Spark, DR02, UBTech, a Quantum Leap

The hardware beat: NVIDIA's RTX Spark brings 128 GB unified memory and 1 petaFLOP FP4 to laptops, Deep Robotics' DR02 humanoid sprints and flips circuit breakers, UBTech teased full-body companion robots, and Microsoft used AI to co-design a quantum chip it claims is 1,000x more reliable.^{[5]AI Search — AI NEWS}

~26:55 RTX Spark pairs a Blackwell GPU (6,100+ cores), a 20-core efficient CPU, 1 petaFLOP FP4, and up to 128 GB unified memory — enough to run a 72 GB model like Qwen 3.6 35B locally. Laptops ship fall 2026. ~29:57 DR02 is an all-weather industrial humanoid that sprints, backflips, and flips a delicate circuit breaker, with dust/water protection. ~30:58 UBTech teased a full-body, emotionally expressive humanoid couple debuting later in June.

~34:00 Microsoft's Project Discovery AI managed workflows, automated measurements, and suggested designs to produce a new quantum-chip material stack it claims is 1,000x more reliable than its prior qubit generation, halving its timeline to a scalable quantum computer, now targeted for 2029. The headline is less the chip than AI accelerating frontier hardware R&D.

Tools: RTX Spark, Qwen 3.6 35B, DR02, UBTech, Microsoft Project Discovery

Hot Take AI Future

Nate B Jones

Nate B Jones: Why Uber's "AI Mistake" Is the Wrong Lesson

Across three shorts, Nate B Jones argues the AI-skeptic read of Uber's ROI troubles is backwards: the problem isn't that agents don't work, it's that companies measure AI by token usage and commit counts instead of end-to-end product outcomes.^{[13]Nate B Jones — Uber's massive AI mistake revealed} The deeper points: agents need to own the whole pipeline, and domain expertise is where AI hits a wall.

Rethink pipeline ownership

AI's value only lands when agents cover the full loop — customer signal → product decision → plan → code → testing → launch-risk review → rollout measurement → next decision.^{[12]Nate B Jones — Fix your AI pipeline: Rethink ownership} Keep those steps siloed and agents just optimize isolated tasks; fragmented pipelines produce fragmented gains.

"If those steps are still separate, the agents are stuck optimizing for individual tasks."

The Uber misread

Uber's COO admitted the company couldn't draw a clean line from heavy AI-coding-tool usage to measurable new customer features — which became fuel for "bubble" takes.^{[13]Nate B Jones — Uber's massive AI mistake revealed} Jones calls that the wrong lesson: the issue is measuring AI by tokens and commits, not outcomes — and public evidence shows Uber is already doing real agentic work.

"I think that's the wrong lesson, people. The important part of the Uber story is not that Uber ignored agents. Public evidence says the opposite."

Where AI hits a wall

AI produces output that "looks right" but isn't actually correct; only a domain expert can articulate the implicit constraints that were never written down.^{[14]Nate B Jones — Where AI hits a wall} His examples: a strategy partner rejecting a competitive analysis that any firm with the same model could produce, and a loan officer flagging that a debt-service-coverage ratio and a minimum-net-worth requirement have completely different monitoring triggers. AI commoditizes generic output; human expertise makes it defensible.

"Any firm with access to the same model could have produced the framing that I'm seeing here."

Developer Tools Podcast

Arjay McCandless Real Python Dwarkesh Patel

Quick Hits: Scraper Proxies, the One Page Every Project Needs, and a Stranger's Universe

Three short clips worth a minute: how web scrapers dodge IP bans with proxies, why every project needs a single canonical home page, and physicist Adam Brown on the one sentence that expanded a stranger's entire view of the universe.

How web scrapers avoid bans

Scrapers get caught when high-volume requests come from a single home IP, so they route through proxies.^{[15]Arjay McCandless — How web scrapers avoid bans} Three categories trade cost against detectability: datacenter proxies (cheap, easy to flag), residential proxies (expensive, nearly undetectable because traffic looks like real households), and cloud VM proxies on AWS/GCP/Azure (cheap and easy to spin up, but datacenter IP ranges are well-known and blockable).
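
The trade-off described above can be written down as a small lookup table. This is only an illustrative sketch: the `ProxyCategory` class, the `candidates` helper, and the coarse "low"/"high" labels are hypothetical simplifications of the clip's qualitative descriptions, not measured values or any real proxy provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyCategory:
    name: str
    cost: str           # "low" or "high" (qualitative, per the clip)
    detectability: str  # "low" or "high" (how easily traffic is flagged)


# Labels paraphrase the three categories summarized above.
CATEGORIES = [
    ProxyCategory("datacenter", cost="low", detectability="high"),
    ProxyCategory("residential", cost="high", detectability="low"),
    ProxyCategory("cloud_vm", cost="low", detectability="high"),
]

RANK = {"low": 0, "high": 1}


def candidates(max_cost: str, max_detectability: str) -> list[str]:
    """Return category names within both the cost and detectability ceilings."""
    return [
        c.name
        for c in CATEGORIES
        if RANK[c.cost] <= RANK[max_cost]
        and RANK[c.detectability] <= RANK[max_detectability]
    ]


print(candidates("low", "high"))   # cheap options, detection risk accepted
print(candidates("high", "low"))   # hard-to-detect options, cost accepted
print(candidates("low", "low"))    # cheap and hard to detect
```

The last query comes back empty, which is the point of the clip: none of the three categories is both cheap and hard to detect.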

The one page every project needs

Real Python makes a simple high-leverage argument: no matter how many tools a project uses (Trello, bug trackers, plans, requirements docs), there should be one canonical home page linking to all of them — including contributors and their roles.^{[16]Real Python — The One Page Every Project Needs} It reduces navigation overhead and onboarding friction with a single authoritative index.

"You should have one place that has the lead into all the other places."

A single fact that opened up the universe

In a clip from his Dwarkesh Patel conversation, physicist Adam Brown recalls hitchhiking with a prosperous rancher who, at 50, had never heard that the stars are simply distant suns.^{[17]Dwarkesh Patel — Adam Brown clip} The revelation instantly reshaped the rancher's picture of the cosmos: he called his wife, then bought Brown lunch. Brown's point: intellectual capability is widespread, but access to even basic scientific facts is not. One sentence was enough.

"He was totally intellectually capable of understanding it. He just never in his 50 years of existence up to that moment ever heard that fact."

Sources