Anthropic files for a $965B IPO

Industry

Tech Brew

Anthropic files a confidential S-1 at a $965B valuation

Anthropic confidentially submitted a draft S-1 to the SEC on June 1, signaling intent to go public as the most valuable private AI company, fresh off a $65B Series H that put its valuation near $965B.^{[1]Tech Brew — Anthropic's opening bid} The filing puts Anthropic ahead of OpenAI in the IPO race — though hedge language ("gives us the option") leaves room to delay if markets, or a looming SpaceX mega-listing, sour.

The draft is confidential, so terms aren't public, but Tech Brew pegs the offering as potentially the largest tech IPO in years.^{[1]Tech Brew — Anthropic's opening bid} Sam Altman downplayed the competitive timing in a CNBC hit, but the symbolism of Anthropic reaching the public markets first is hard to miss.

Analysts flagged headwinds: rising Claude pricing is generating enterprise resistance, with some customers testing cheaper open-source alternatives. Amazon — a major stakeholder — stands to see its stake appreciate sharply if the listing lands. The piece invokes the 2019 Lyft–Uber dynamic as a cautionary tale about going first in an IPO cluster.

"Some open source LLMs are as good without the price tag." — an analyst quoted by Tech Brew

AI Models

Simon Willison

Microsoft ships its own MAI models

Microsoft announced two in-house LLMs — MAI-Thinking-1, a 1T-total / 35B-active MoE reasoning model it claims is "preferred to Sonnet 4.6 in blind human side-by-side evaluations," and MAI-Code-1-Flash (137B total / 5B active), now rolling out to GitHub Copilot users in VS Code.^{[2]Simon Willison — Microsoft's new MAI models} Simon Willison initially flagged the "commercially licensed, no distillation" framing as a differentiator — then walked it back.

Microsoft's launch copy stressed that both models were "trained from the ground up on enterprise grade, clean and appropriately licensed data, without distillation from third-party models."^{[2]Simon Willison — Microsoft's new MAI models} Willison flagged that as the interesting part — but issued corrections after the technical paper revealed the corpus actually derives from large-scale web crawls (~1.2 trillion pages filtered to 794 billion, plus 24.2 billion from Common Crawl after dedup), i.e. standard industry practice rather than anything uniquely clean.

The notably low active-parameter counts (35B and 5B) point at Microsoft optimizing hard for inference cost — the Code-1-Flash branding underscores a "high performance at lower cost" pitch aimed squarely at Copilot economics.
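
As a rough check on those ratios, the sketch below computes the share of parameters active per token from the figures quoted above. The helper name `active_fraction` is illustrative, and the calculation deliberately ignores everything else that drives inference cost (attention, memory bandwidth, batching).

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a MoE model's parameters used per token (both in billions)."""
    return active_params_b / total_params_b


# Figures as quoted in the summary above (billions of parameters).
models = {
    "MAI-Thinking-1": (1000.0, 35.0),   # 1T total, 35B active
    "MAI-Code-1-Flash": (137.0, 5.0),   # 137B total, 5B active
}

fractions = {name: active_fraction(total, active)
             for name, (total, active) in models.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

On these numbers both models activate roughly 3.5% of their weights per token, which is consistent with the cost-focused reading.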

Tools: MAI-Thinking-1, MAI-Code-1-Flash, GitHub Copilot, Visual Studio Code

AI Models

Artificial Analysis

The speech-to-text race heats up

Microsoft AI also dropped MAI-Transcribe-1.5, a speech-to-text model hitting 2.4% WER (3rd on the AA-WER leaderboard) while running at ~276× real-time — more than double the next-fastest top-10 model.^{[3]Artificial Analysis — MAI-Transcribe-1.5} The same day, Artificial Analysis launched AA-WER Streaming, a new benchmark built for the latency-sensitive voice-agent era.^{[4]Artificial Analysis — AA-WER Streaming}

MAI-Transcribe-1.5

On the offline leaderboard it sits behind Alibaba's Fun-Realtime-ASR-preview (1.7%) and ElevenLabs Scribe v2 (2.2%), scoring 1.6% on VoxPopuli-Cleaned-AA, 4.0% on Earnings22, and 2.0% on AA-AgentTalk.^{[3]Artificial Analysis — MAI-Transcribe-1.5} The headline is throughput: ~276× real-time. It covers 43 languages, supports keyword biasing for names/medical terms, and runs $6 per 1,000 minutes via Microsoft Foundry.
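
For readers less familiar with the metrics, here is a minimal sketch of how word error rate is conventionally computed (word-level edit distance divided by reference length), plus the arithmetic behind the throughput and price figures. The function names are illustrative, and the AA-WER leaderboard may apply text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def seconds_to_transcribe(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock time at a given multiple of real-time."""
    return audio_seconds / speed_factor


def cost_usd(audio_minutes: float, price_per_1000_min: float) -> float:
    """Price for a given amount of audio at a per-1,000-minute rate."""
    return audio_minutes * price_per_1000_min / 1000


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
hour_latency = seconds_to_transcribe(3600, 276)   # one hour at ~276x real-time
hour_cost = cost_usd(60, 6.0)                     # one hour at $6 per 1,000 minutes
```

At the quoted figures, an hour of audio would take about 13 seconds to transcribe and cost about $0.36.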

AA-WER Streaming

The new benchmark tests ~8 hours of audio across three datasets and measures two latency points relative to Silero VAD-detected end of speech: first final transcription and first partial transcription.^{[4]Artificial Analysis — AA-WER Streaming} On final transcription, Cartesia Ink-2 (semantic endpoints) leads at 3.59% WER / 0.21s; ElevenLabs Scribe v2 Realtime leads partials at 3.65% / 0.13s; Deepgram Flux is fastest of all at ~0.02s but posts a noisier 7.36% WER. Final transcripts average only ~0.7 percentage points more accurate than first partials, and pricing spans $2–$17 per 1,000 minutes.

"Fast transcripts are especially important for keeping responses feeling natural and leaves more bandwidth for reasoning and tool execution." — Artificial Analysis

Tools: MAI-Transcribe-1.5, Cartesia Ink-2, ElevenLabs Scribe v2 Realtime, Deepgram Flux, AssemblyAI U3 Realtime Pro, Silero VAD, Microsoft Foundry

Industry

Sherwood News

Nvidia's RTX Spark PC superchip

At Computex 2026, Jensen Huang unveiled the RTX Spark, which fuses an Arm CPU, Blackwell GPU, and dedicated AI hardware into one "superchip" for running agentic AI locally on Windows PCs.^{[5]Sherwood News — Nvidia's PC superchip} It plants Nvidia squarely against Intel, AMD, Qualcomm, and Apple in the PC processor market. Microsoft is the launch partner with a Surface Laptop Ultra; Dell, HP, and Lenovo follow later in 2026.

Pricing wasn't disclosed. Markets reacted fast: Nvidia shares jumped over 6% on the announcement while Intel, Qualcomm, and AMD fell, and Arm Holdings gained on the CPU-architecture tie-in.^{[5]Sherwood News — Nvidia's PC superchip} Huang framed local agentic compute as a platform-level shift, not a spec bump.

"There is no question this reinvention of the computer is as big of a deal as the smartphone revolution." — Jensen Huang, Computex

Tools: RTX Spark, Blackwell, Arm, Surface Laptop Ultra

Industry

Anthropic

Anthropic expands Project Glasswing

Anthropic is scaling Project Glasswing, its software-security initiative built on the Claude Mythos Preview model, adding roughly 150 organizations across 15+ countries to its ~50 initial partners.^{[6]Anthropic — Expanding Project Glasswing} Early partners have already surfaced 10,000+ high- and critical-severity vulnerabilities. Alongside the expansion, Anthropic released Claude Security, a public codebase-scanning product built on Claude Opus 4.8.

The new cohort spans companies, nonprofits, and critical-infrastructure vendors in power, water, healthcare, communications, and hardware/software supply chains — partners whose codebases serve governments and millions of people. Anthropic estimates a major attack on any single partner could affect 100M+ people, and requires each to meet specific security bars before gaining access.^{[6]Anthropic — Expanding Project Glasswing}

Mythos Preview's capabilities now include patch writing, penetration testing, threat detection, and legacy-code rebuilding. The stated strategic goal is shifting the industry from mere vulnerability discovery toward disclosing, fixing, and deploying patched software at scale — while building safeguards against misuse of those same offensive capabilities.

Tools: Claude Mythos Preview, Claude Security, Claude Opus 4.8

Developer Tools

Simon Willison

Simon Willison ships a sandboxed-Python stack

Willison released micropython-wasm — MicroPython compiled to WASI and run via Wasmtime, giving each execution no default filesystem or network access plus configurable memory caps, CPU "fuel," and timeouts.^{[7]Simon Willison — micropython-wasm 0.1a1} He immediately put it to work in datasette-agent-micropython, letting Datasette Agent safely run model-generated code — and reports "GPT-5.5 has so far failed to break out of the sandbox."^{[8]Simon Willison — datasette-agent-micropython 0.1a0}

micropython-wasm offers two execution models: a one-shot run() with no state persistence, and stateful sessions (a live background interpreter, or a replay-based session that re-runs prior snippets to simulate persistence). Host Python functions can be exposed to guest code via JSON serialization, and a read-only /input mount allows controlled file access.^{[7]Simon Willison — micropython-wasm 0.1a1} He shipped 0.1a0 and 0.1a1 the same day — the patch fixing limitations found while building the Datasette integration — and flags the use of Binaryen's --translate-to-exnref postprocessing (instead of the full --spill-pointers pipeline) as needing stress-testing against hostile code.
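
The replay idea is easy to picture in isolation. The sketch below is not micropython-wasm's API: `run_once` and `ReplaySession` are hypothetical names, and plain `exec()` stands in for the stateless one-shot execution. It is emphatically not a sandbox; it only mimics the "no state between calls" property so the replay mechanism can be shown.

```python
import contextlib
import io


def run_once(code: str) -> str:
    """Stand-in for a stateless one-shot run: fresh namespace on every call.

    NOTE: exec() provides no isolation at all. It is used here only to
    imitate the lack of persistence, not the sandboxing.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


class ReplaySession:
    """Simulates persistence by re-running all earlier snippets each time."""

    def __init__(self):
        self.history = []        # snippets that ran successfully
        self._replayed_len = 0   # length of output produced by the history

    def run(self, snippet: str) -> str:
        output = run_once("\n".join(self.history + [snippet]))
        # Drop output that earlier snippets already produced. This assumes
        # earlier snippets print the same thing on every replay.
        new_output = output[self._replayed_len:]
        self.history.append(snippet)
        self._replayed_len = len(output)
        return new_output


session = ReplaySession()
first = session.run("x = 2\nprint(x)")
second = session.run("print(x * 21)")   # x is visible only because of replay
```

The trade-off is visible in the sketch: replay needs no live interpreter between calls, but its cost grows with history length and it is only faithful when earlier snippets are deterministic and free of side effects.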

Separately, his Pasted File Editor is a small browser tool inspired by how Claude's apps auto-convert large pastes (over 1,000 characters) into attachments rather than inline text, keeping the editor clean.^{[9]Simon Willison — Pasted File Editor} He had Codex desktop build the prototype from a plain-English description of the Claude feature and shared it as a GitHub Gist — a tidy example of using one AI tool to clone another's UX.

Tools: micropython-wasm, MicroPython, Wasmtime, WASI, Binaryen, datasette-agent-micropython, Datasette Agent, GPT-5.5, Codex desktop

Podcast

Latent Space

Latent Space × Kyle Daigle: GitHub's agent era

GitHub COO (now also CMO of Microsoft's Developer Division) Kyle Daigle joins Latent Space ahead of Microsoft Build to talk running a 3,000-person org on AI, the explosion of agent-driven activity (200M+ developers, on pace for ~14B commits this year), and the scaling outages that growth has caused.^{[10]Latent Space — GitHub's Agent Era, Kyle Daigle} His sharpest takes: the era of giant "mega skills" is over, human trust (not verification) is the unsolved bottleneck for agent-written code, and ambient context — not better coding agents alone — is the next frontier.

~00:00 Running a company on AI. Daigle's most valuable LLM use isn't "write me a blog post" but recursive, backward-looking retrospection — pulling from PRs, Slack, Obsidian notes, and Teams transcripts (via the "work IQ" MCP server) to reconstruct what worked this week. ~08:04 GitHub rolled AI out to non-engineers with one rule — nobody has to change how they work — distributing shared skills plus the CLI and the new Copilot desktop app, with read access across GitHub, Teams, email, and Slack. (Eight years post-acquisition, GitHub still runs chatops on Slack.)

~11:30 Death of mega-skills. His team now builds "incredibly micro skills" that each do one thing well — the Legos — and lets an instruction book stitch them together, because monolithic skills break weeks later "and you're screwed." Summarization is a matrix problem: the atomic verb is shared, but what it means for an analyst vs. a customer vs. marketing is a meaningful permutation. ~16:10 He pitches a "golden age for former developers now in leadership" — recounting building a full revenue-planning deck end-to-end with AI (including a SQLite app to inspect data and a skill to make the slides deliberately "humanly bad") and presenting it to the CRO and CFO without ever mentioning AI.

~22:18 GitHub history & security. From the first Actions launch (Universe 2018) through npm, CodeQL/Semmle, and Dependabot, the deepest risk has always been arbitrary code execution — the old "GitHub Services" ran user-submitted Ruby with no containers. He pushes back on the "just vendor source via AI" idea (floated by Vercel's Malte and Mitchell Hashimoto): vendoring shrinks dependencies but won't solve the core problem because agents can be convinced anything is safe.

~35:24 Codifying trust. The richest thread: reinventions of PR flow (Mitchell's "vouch," Peter's "prompt request") all struggle because "we're ultimately trying to codify trust" — a social problem, not a verification one. Stars and commit counts are passive, gameable signals; Sponsors (costs real money) is "hard trust." The endgame, he half-jokes, is some form of human digital ID.

~48:29 200M developers, 14× commits, and outages. ~1B commits in 2025 is now ~275M per week — on pace for ~14B this year, and still accelerating. That 14× has broken GitHub in novel ways: Actions needs far more CPU (expanding onto Azure), the permissioning layer still sits on a legacy DB internally called "MySQL 1," and the industry's swing back to giant monorepos stresses the git layer. He calls it a "diagonal scaling" problem and promises material (not incremental) reliability fixes over the next three months.
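
The run-rate arithmetic behind those figures can be checked in a few lines. This sketch assumes the quoted weekly rate simply holds flat for 52 weeks; since Daigle says growth is still accelerating, treat it as a back-of-envelope figure rather than a forecast.

```python
WEEKS_PER_YEAR = 52

commits_2025 = 1_000_000_000       # ~1B commits in 2025, as quoted
weekly_commits_now = 275_000_000   # ~275M commits per week, as quoted

# Flat extrapolation of the current weekly rate over a full year.
annualized_commits = weekly_commits_now * WEEKS_PER_YEAR
growth_multiple = annualized_commits / commits_2025

print(f"Annualized: {annualized_commits / 1e9:.1f}B commits")
print(f"Multiple of 2025 total: {growth_multiple:.1f}x")
```

That works out to about 14.3B commits, or roughly 14× the 2025 total, matching the "14×" shorthand.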

~62:39 Copilot's next act. After ~1.5 years of fine-tuning for next-edit suggestions, frontier models "sherlocked" that work — "what happened to Copilot." Now everything runs on a single SDK/harness powering the coding agent, CLI, desktop app, and cloud agents (bring-your-own-key). The unsolved frontier is context: he praises ambient AI and OpenClaw for connecting all the data a person cares about, frames Microsoft as the original OS company building "the new OS for AI," and previews a Build interview with Satya Nadella.

"We're ending the era of these massive, beautiful, perfect skills that are just like not any of those things."

"1 billion commits in 2025. Now it's 275 million per week, on pace for 14 billion this year."

"How I build on the weekend should be how I build at work."

Tools: GitHub Copilot (desktop app, CLI), GitHub Actions, GitHub Sponsors, work IQ MCP server, Foundry IQ, Azure dev compute, npm, CodeQL/Semmle, Dependabot, Vitess, OpenClaw, Obsidian

Podcast

AI Engineer

Kobie Crawford at AI Engineer: Task Fidelity Scaling Laws

Snorkel's Kobie Crawford presents empirical evidence that task quality, not architecture or harness, is the decisive factor in agentic RL outcomes.^{[11]AI Engineer — Task Fidelity Scaling Laws, Snorkel} Splitting terminal-bench-style tasks into "accepted" vs. "rejected" buckets via a four-criteria test, Snorkel found RL training on high-quality tasks yielded ~6% model improvement vs. only ~1% for low-quality ones, a gap Crawford describes as a 5× uplift from data quality alone, on identical compute.

~00:14 Crawford frames Snorkel as a "frontier AI data lab" producing datasets for foundation models "to hill climb on," tracing its origins to a Stanford lab and an open-source library. ~02:15 The thesis since founding: data quality is critical — and in the agentic space, "task quality and data quality are largely the same thing."

~03:15 Four criteria. A quality task is (1) containerized for reproducibility and parallel rollouts, (2) achievable but non-trivial, (3) functionally correct, and (4) backed by a reliable environment. Tasks passing all four are "accepted"; the rest are "rejected." ~05:17 Validated with Sonnet 4.5 and Codex (GPT-5.2/5.1), accepted tasks averaged 2× the tool calls, lower pass rates, and more output tokens — markers of genuine difficulty.
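
The accept/reject split described above amounts to an all-four-must-pass filter. The sketch below shows only that final gating step; the `TaskReview` type and its field names are hypothetical, and how Snorkel actually evaluates each criterion is not covered in the talk summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskReview:
    """Outcome of reviewing one task against the four criteria (hypothetical)."""
    task_id: str
    containerized: bool            # reproducible, supports parallel rollouts
    achievable_nontrivial: bool    # solvable, but not trivially so
    functionally_correct: bool     # the task and its checks are correct
    reliable_environment: bool     # the environment behaves consistently

    def accepted(self) -> bool:
        """A task is accepted only if it passes all four criteria."""
        return all((self.containerized, self.achievable_nontrivial,
                    self.functionally_correct, self.reliable_environment))


def split_tasks(reviews):
    """Partition reviews into (accepted, rejected) lists."""
    accepted = [r for r in reviews if r.accepted()]
    rejected = [r for r in reviews if not r.accepted()]
    return accepted, rejected


reviews = [
    TaskReview("task-001", True, True, True, True),
    TaskReview("task-002", True, True, True, False),  # flaky environment
]
accepted, rejected = split_tasks(reviews)
```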

~06:17 Cleaner failures. Breaking failures into meaningful (model can't reach a needed conclusion) vs. degenerate/environmental ones, accepted tasks produce "cleaner failures" — useful signal to learn from. ~08:18 The headline RL result: same model, same compute, same task count — low-quality tasks improved the base model ~1%, high-quality ~6%.

~10:19 Snorkel leans on expert-in-the-loop generation, with the platform enabling quality at scale. ~11:20 Q&A covers noise from uncompletable tasks (across terminal-bench and SWE-bench variants), under-specified tasks with hidden dependencies, the limits of one-shot "fail" labels for inherently iterative problems, and using rubrics plus LLM judges to hit high inter-annotator agreement.

"The low-quality tasks only improved the base model by about 1%... the higher-quality tasks [gave] about a 6% improvement. That 5x uplift based on just quality is really striking."

Tools: Snorkel, Harbor framework, Open Env, terminal-bench, SWE-bench, Sonnet 4.5, Codex (GPT-5.2/5.1)

Podcast

AI Engineer

Benjamin Verbeek at AI Engineer: How Lovable self-improves every hour

Lovable's Benjamin Verbeek details how the vibe-coding platform — now creating 200,000+ projects per day — chases "continuous learning at scale" so a mistake happens once and never again.^{[12]AI Engineer — How Lovable self-improves, Benjamin Verbeek} Two mechanisms power an hourly improvement loop: a "Lovable Stack Overflow" knowledge bank that detects when users get stuck and injects fixes for future users, and a "vent" tool that lets the agent itself complain directly to the team's Slack.

~00:15 The holy grail: continuous learning so users never re-explain the same thing. ~03:18 Lovable scaled from a few thousand users a year ago to 200K+ projects/day — early pains included GitHub banning them on Verbeek's first day for creating too many repos. The product problem is framed as "friction": technical users push past yellow friction and red blockers, but non-technical users (the 99% Lovable builds for) walk the moment they hit a block.

~05:19 Detecting "stuck." An LLM judge scans sessions for signals — repeated asks, complaints, explicit failure, abandonment — then splits stuck into solvable-with-better-prompting vs. not, and the latter into easy gaps vs. genuinely hard efforts. ~06:20 The "Lovable Stack Overflow" captures the transition from stuck→unstuck and asks what context should have been injected at the start to jump straight to the solution.

~09:24 New entries are clustered across similar issues, eval-checked by an agent (occasionally a human), then injected in production by a lightweight model — with a blank-injection A/B arm to rank fixes by project success. Verbeek stresses the bank goes stale "incredibly quickly" whenever a new model ships, so much old knowledge must be thrown away.
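
One way to read the blank-injection arm is as a control group: each candidate fix is scored by how much it lifts project success over sessions that received no injection. The sketch below illustrates that ranking under simplifying assumptions. The arm label `"blank"`, the `(arm, succeeded)` record shape, and the function name are all hypothetical, and a real system would also need enough traffic per arm and some form of significance testing.

```python
from collections import defaultdict

CONTROL_ARM = "blank"  # hypothetical label for the no-injection arm


def rank_fixes(sessions):
    """Rank fixes by success-rate uplift over the control arm.

    `sessions` is an iterable of (arm, succeeded) pairs.
    Returns a list of (arm, uplift) sorted from best to worst.
    """
    totals = defaultdict(lambda: [0, 0])  # arm -> [successes, trials]
    for arm, succeeded in sessions:
        totals[arm][0] += int(succeeded)
        totals[arm][1] += 1
    if CONTROL_ARM not in totals:
        raise ValueError("no control sessions to compare against")

    control_successes, control_trials = totals[CONTROL_ARM]
    baseline = control_successes / control_trials
    uplift = {arm: successes / trials - baseline
              for arm, (successes, trials) in totals.items()
              if arm != CONTROL_ARM}
    return sorted(uplift.items(), key=lambda item: item[1], reverse=True)


sessions = [
    ("blank", True), ("blank", False),
    ("fix_a", True), ("fix_a", True),
    ("fix_b", False), ("fix_b", True),
]
ranking = rank_fixes(sessions)
```

A fix whose uplift hovers around zero is a natural candidate for pruning, which fits Verbeek's point that the bank goes stale when a new model ships.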

~12:25 The vent tool. Reasoning that a frustrated human would vent to their boss, Lovable gave the agent a send-feedback tool that fires only when "really frustrated" — for missing tools, unclear schemas, conflicting docs, or broken platform behavior. One vent griped about Framer Motion's TypeScript cubic-bezier types; a copy tool silently failing on filenames with spaces (non-breaking spaces from WhatsApp/Mac screenshots, missed by their regex) generated ~20 complaints in the first hour. ~16:29 Vents now feed an agent that dedupes, investigates, and opens PRs continuously (devs still review and merge); vent spikes even double as incident detection.
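
The filename bug is a classic one and easy to reproduce. The sketch below is not Lovable's code: the "before" pattern is a guess at the kind of regex that would miss these characters, and the specific code points (U+00A0 and U+202F) are assumptions about which non-breaking spaces were involved.

```python
import re

# Hypothetical "before" check: matches only the ASCII space character.
ASCII_SPACE = re.compile(r" ")

# In a str pattern, \s matches Unicode whitespace, which includes
# U+00A0 (no-break space) and U+202F (narrow no-break space).
UNICODE_SPACE = re.compile(r"\s+")


def normalize_filename(name: str) -> str:
    """Replace every run of Unicode whitespace with one ASCII space."""
    return UNICODE_SPACE.sub(" ", name)


# A filename that looks like it contains ordinary spaces but does not.
tricky = "Screenshot\u00a0at\u202f9.41.png"

missed_by_ascii_check = ASCII_SPACE.search(tricky) is None
cleaned = normalize_filename(tricky)
```

Here the ASCII-only check reports no spaces at all, while the `\s`-based normalizer rewrites the name to use plain spaces that downstream tooling can handle.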

"We want to have a mistake happen once and then never again."

"It said it's too easy to send feedback and I can't pull it back. It was being ashamed of what it had sent to Slack."

Tools: Lovable, LLM judge, Slack, GitHub, Framer Motion, automated PR-opening agent (human review and merge)

Podcast

AI Engineer

Benjamin Cowen at AI Engineer: What Lies Beneath the API

Modal forward-deployed engineer Benjamin Cowen argues that as AI products mature and specialize, they inevitably cross into a custom domain where fine-tuning beats frontier APIs on cost, latency, and business-specific metrics.^{[13]AI Engineer — What Lies Beneath the API, Modal} His pitch: training your own model is now far cheaper than people assume — and if you've built an agent harness with evals and collected data, you probably already have what you need to train.

~01:08 The spectrum. At one end, frontier APIs unlock fast building but can't be customized beyond prompt engineering (his joke: "caveman mode," telling an LLM to talk like a caveman to cut tokens — doesn't scale at 100×). ~02:10 At the other end, full training historically meant clusters, isolation from production, and dedicated infra engineers. ~03:11 Modal's bet is a middle ground that preserves fast iteration.

~04:13 Customer results. Intercom beating their frontier API at one-fifth the cost; Pentress reporting order-of-magnitude gains; Decagon's framing that frontier labs "want their models to win on everything," while you only need to win at your business logic. ~05:14 Signals it's time to fine-tune: still paying more for your API than customers pay you, plateauing evals, or latency/throughput limits — but if you lack data or mature evals, collect first ("garbage in, garbage out").

~08:17 Training in ~300 lines. Supervised fine-tuning fits in roughly 300 lines of Python; serverless helps with hyperparameter sweeps (fan out, kill weak runs) and RL alike — one customer scaled to 50,000–100,000 sandboxes for RL rollouts. ~10:18 After training comes serving — vLLM, SGLang, Triton, or custom — auto-scaling to traffic. His close: you may be 6–12 months from training, not 10 years, so start collecting data and building evals now.
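
"Fan out, kill weak runs" can be sketched with successive halving, a common sweep strategy chosen here for illustration; the talk summary does not say which strategy customers use. Nothing below is Modal's API: `successive_halving` and `toy_score` are hypothetical, the objective is made up, and the evaluations run sequentially where a serverless setup would run them in parallel.

```python
import math


def successive_halving(configs, evaluate, rounds=3):
    """Repeatedly score all survivors, keep the top half, double the budget."""
    survivors = list(configs)
    budget = 1
    for _ in range(rounds):
        if len(survivors) <= 1:
            break
        # Fan out: score every surviving config at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Kill the weaker half; survivors get a larger budget next round.
        survivors = ranked[: max(1, len(ranked) // 2)]
        budget *= 2
    return survivors


def toy_score(learning_rate, budget):
    """Made-up objective that peaks at lr = 3e-4; budget is ignored here."""
    return -abs(math.log10(learning_rate) - math.log10(3e-4))


candidates = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 1e-1]
best = successive_halving(candidates, toy_score, rounds=3)
```

With eight candidates and three rounds the field narrows 8 → 4 → 2 → 1, so most of the compute goes to the runs that still look promising.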

"If you've built a product, you probably have at least touched all the things you need to train if you haven't already done it."

Tools: Modal, vLLM, SGLang, Triton Inference Server, Python

Podcast

Sequoia Capital

Alfred Wahlforss at Sequoia: AI-native customer research

Listen Labs founder Alfred Wahlforss explains how his AI-first research platform runs thousands of simultaneous voice/video interviews across a 30M-participant panel, then analyzes — and increasingly simulates — customer responses.^{[14]Sequoia Capital — Listen Labs' Alfred Wahlforss} Launched ~a year ago, it already serves 20% of the Fortune 500. The thesis: as AGI makes building easy, the valuable hard part becomes knowing what to build.

~00:00 A user types a research question ("how can we improve Cursor's onboarding"); Listen auto-generates an interview guide, dispatches AI interviewers across the panel, and returns analysis and recommendations. Customers include Microsoft, Anthropic, Sweetgreen, and NBC. ~02:02 Chubbies discovered chest hair interacted poorly with a shirt material and redesigned it; Manscaped changed a Super Bowl ad on Listen insights.

~05:06 Product mechanics. Each interview is "essentially a Zoom call with the agent," with emotion detection from eyes and tone to close the say/do gap; every data point links back to source video so the AI isn't "just hallucinating." Wahlforss counters survey-bias skepticism by noting voice interviews force reasoning and yield more consistent answers than multiple-choice. ~07:09 Origin: he and his cofounder built viral AI-avatar app "BeFake," then an internal AI interviewer to understand churn — which became Listen.

~11:11 The audience moat. 80% of engineering goes into the panel. Customer bases follow a power law (Sweetgreen's ~1% core "knows what seed oils are" and drives 80% of revenue); the goal is a billion-person panel stratified by what each person is an expert on. Legacy panels suffer ~10% incidence rates. ~15:13 On consultants: AI compresses margins and forces unbundling, but implementation work retains value — Listen even works with Bain.

~19:15 Market research 3.0. Generative agent simulation (launching in a couple of months): interview one person deeply, feed maximal context to an LLM, and predict their answers at ~95% accuracy on some questions. Audiences must be continually "hydrated" via the 1M+ interviews run (a data network effect), with explicit domain bounds and back-testing — even inserting nonsense ("the name of their dog") to confirm the model knows what it can't predict. Tellingly, plain ChatGPT picked the wrong talk title where Listen's simulation picked the winner, because base models are trained on "the average person." ~30:27 The eval is the moat: it climbed from 20% to 85%, then a harder eval reset them to 20% — and a new MCP lets you tell Claude to "run Listen in a loop" to generate and test concepts.

"As we get closer to AGI, it will be easier to build things, but the hard part will be knowing what to build."

"In ChatGPT it picked the wrong one, and in our simulation it picked the right one... the models are trained on the average person."

Tools: Listen Labs, Qualtrics, ChatGPT, Claude, MCP, augmented responses, Meta/LinkedIn ads

Podcast

Dwarkesh Patel

Dwarkesh × David Reich: the Neanderthal DNA puzzle

In a clip from his Dwarkesh Podcast appearance, geneticist David Reich highlights a result "not seen in any other species": Neanderthal mitochondrial DNA and Y chromosomes cluster with modern humans, while the rest of the Neanderthal genome clusters with Denisovans.^{[15]Dwarkesh Patel — The Neanderthal DNA Puzzle, David Reich}

~00:00 Reich connects the anomaly to male reproductive competition in traditional societies — where a subset of men dominate reproduction — suggesting archaic males would have been outcompeted for local females. Evidence from central African rainforest hunter-gatherers (differential treatment of children by parental group) supports the idea that matrilineal/patrilineal expansions could replace Y and mitochondrial lineages even while autosomal DNA reflects a different history.^{[15]Dwarkesh Patel — The Neanderthal DNA Puzzle, David Reich}

"This is like a crazy result that is not seen in any other species where you see this pattern."

AI Future, Developer Tools

AI Jason

Self-improving companies and closed-loop agents

AI Jason breaks down YC's "self-improving company" framing — agents handling internal ops and autonomously writing 45% of their own tools — as a control-systems problem: open-loop workflows (humans drive triggering) vs. closed-loop systems where agent status, decisions, and outcomes feed back into an intelligence layer.^{[16]AI Jason — proactive agents & self-improving company} YC's current batch reportedly hits 5× more revenue per employee than 18 months ago.

~00:00 The five elements of each AI loop: data ingestion, a policy layer encoding SOPs, tool/system access, quality gates (human or AI evaluators), and a learning feedback mechanism. The practical starting point is a memory layer plus cron jobs — agents log outcomes, extract procedural learnings into skills, and re-run on a cadence. An SEO example showed 3× traffic in 1–2 months.

~07:04 Memory architecture. JBrain (open-source Claude Code plugin) structures memory around entities (meetings, people, programs), stores markdown timeline logs, and auto-converts entries into a vector DB. The presenter's own "Loopony" plugin scaffolds company-scale loops via a copy-paste setup, and "Printing Press" tackles the problem that most APIs/MCPs/CLIs aren't agent-native — encoding 10 principles for agent-native CLI design.

~06:04 Live experiment. An autonomous ad loop (by "Gio") ran cron-driven skills for ad analysis, copywriting, image generation, and research. Week one it tested 10 formats (whiteboard sketches, notebook pages, cardboard signs, tweet screenshots) and learned that low-production "ugly" assets won; week two it autonomously settled on a whiteboard format and a free-skill-pack offer, generating 243 leads on a $1,500 budget.

"YC companies in the current batch are already hitting 5x more revenue per employee compared with 18 months ago."

"Agent learns that ugly ad assets... actually win better."

Tools: Claude Code, JBrain, Loopony, Printing Press, MCP, Google Analytics, Ahrefs, Google Search Console

AI Tools, Hot Take

AICodeKing

PewDiePie's Odysseus and the free-API loophole

AICodeKing reviews Odysseus, a self-hosted AI "super app" launched by PewDiePie that bundles agent/plain chat, a hardware-scanning model "cookbook," deep research (built on Tongyi Deep Research), blind model comparison, memory, email, notes, calendar, and a document editor — built on Open Code and designed not to monetize users.^{[17]AICodeKing — Odysseus + Gemma-4 26B & FREE APIs}

~00:02 Odysseus is local-first via Ollama (the presenter ran Gemma 26B with working web-search and shell tools in agent mode) but also supports OpenRouter and Nvidia NIM. Built on Open Code, it inherits strong default agent capabilities; setup on a Mac via Verdant was straightforward.

~04:04 The free-API angle. Although it's local-first, the presenter notes free model APIs — OpenRouter's free tier (including Kimi K2.6) and Nvidia NIM — make it viable without powerful local hardware, undercutting paid alternatives the title pegs as "RIP" (Hermes, OpenClaw). "Obviously, this is not the expected use case, but it's still good nonetheless."

"This project will obviously not monetize the users like some other open-source projects have done in the past."

Tools: Odysseus, Open Code, Ollama, Gemma 26B, OpenRouter, Nvidia NIM, Kimi K2.6, Tongyi Deep Research

AI Tools

AI Search

Nvidia's Pixel Diffusion: free, sub-second 4K upscaling

AI Search walks through Pixel Diffusion (PD), Nvidia's new open-source 4K upscaler that denoises directly in pixel space (not latent space) for sharper, fewer-artifact results — upscaling 1K→4K in under a second, up to 5.9× faster than SeedVR2.^{[18]AI Search — The BEST AI for 4K images} The upscaler weights are tiny: 2.7 GB (BF16) or 1.5 GB (MXFP8, which needs a 50-series Blackwell GPU).

~00:00 Side-by-side comparisons show PD more consistent and faithful to source textures than SeedVR2, with the presenter calling it "probably the best and fastest and most efficient upscaler you can use right now."

~04:00 ComfyUI workflows. Three downloadable workflows: standalone text-to-1K (least impressive vs. Z Image/Flux), image→2K/4K upscale, and a generate-then-upscale pipeline (Z Image Turbo, Flux 2, or SD3 → PD). Required models include a Gemma 2 2B text encoder and the AE.safetensors VAE. He recommends workflows 02 or 03 over standalone text-to-image. ~11:08 A sponsored segment covers Higgsfield's all-in-one creation platform (Seedance 2.0 video, GPT Image 2, a "Supercomputer" agent).

Tools: Pixel Diffusion (PD), ComfyUI, SeedVR2, Z Image Turbo, Flux 2, Stable Diffusion 3, Gemma 2 2B

Developer Tools

DeepLearningAI

Andrew Ng's vibe-coding course for non-coders

Andrew Ng pitches a no-code on-ramp: describe what you want in plain English, let an AI chat generate an HTML file, open it in a browser, and iterate by prompting — no IDE, no terminal, no manual code editing.^{[19]DeepLearningAI — Build Your Own App In 30 Minutes, Andrew Ng}

~03:04 Five prompt building blocks: goal, input, output, layout, and special features. Ng's food-truck analogy — "give me a sandwich" vs. "a vegetarian sandwich with hummus on multigrain bread" — drives home that specificity yields predictable results, and prompts can be built incrementally or all at once.

~19:12 Two sample apps. A birthday-card generator (fill-in-the-blank inputs, an "I'm Feeling Lucky" button, copy-to-clipboard) and a Pong game (difficulty levels, scoring, custom colors) — each a single dependency-free HTML file. On debugging, the fix is to describe the broken behavior in plain English and ask the AI to fix it, without understanding the underlying JS error.

"The easiest way to create software in the AI era is no longer to type out code yourself. Instead, you should tell AI to do it for you."

Tools: ChatGPT, Gemini, Claude, HTML, JavaScript

Developer Tools

marimo

marimo's new interactive widgets and molab

marimo shipped a run of updates: a paint widget that feeds canvas strokes into a Stable Diffusion image-to-image pipeline on a Modal GPU, updating ~once per second in near real-time;^{[20]marimo — Python Widgets for Stable Diffusion} a circular slider that wraps past its max for angles/times;^{[21]marimo — Inventing a new Slider} and a molab upgrade that runs shared notebooks on the server without forking.^{[22]marimo — molab got a bunch better}

~00:00 Paint widget. Because the widget partially lives in the frontend, a Python loop can update the image background asynchronously while you draw. A "random pixel" brush tells the model to re-imagine specific regions rather than read white pixels as blank, and a text-prompt field plus hyperparameter controls steer output live — pairing marimo's reactive widgets with on-demand Modal GPU compute.^{[20]marimo — Python Widgets for Stable Diffusion}

The circular slider (single-value variant included for dashboards) ships in the latest Wiggly Stuff release.^{[21]marimo — Inventing a new Slider} And molab now renders a LaTeX-aware preview by default when you open a shared link, with a toggle for WebAssembly, code view, or a new server-run mode that discards changes on close unless you hit "save a copy" — keeping personal workspaces clean.^{[22]marimo — molab got a bunch better}
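
The wrap-around behavior of a circular slider comes down to modular arithmetic. The sketch below shows that core idea only; it is not the widget's implementation, and the function name and defaults are illustrative.

```python
def wrap(value: float, minimum: float = 0.0, maximum: float = 360.0) -> float:
    """Map any value into [minimum, maximum) by wrapping around the ends."""
    span = maximum - minimum
    if span <= 0:
        raise ValueError("maximum must be greater than minimum")
    # Python's % returns a non-negative result for a positive divisor,
    # so values below the minimum wrap correctly too.
    return minimum + (value - minimum) % span


angle_past_max = wrap(370.0)                 # ten degrees past a full turn
angle_below_min = wrap(-30.0)                # thirty degrees before zero
hour_next_day = wrap(25.0, 0.0, 24.0)        # one hour into the next day
```

This is why the pattern suits angles and times of day: dragging past the end continues from the start instead of clamping.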

Tools: marimo, molab, Modal, Stable Diffusion, Wiggly Stuff

AI Models

Sequoia Capital

The hidden numerical bug in large-scale RL

In a Sequoia clip, a Cursor engineer explains a subtle, pervasive bug in asynchronous RL: when re-running forward passes to recompute log probabilities for previously generated tokens, floating-point non-determinism in sparse-model inference makes the values differ — sometimes significantly — even with the same model version.^{[23]Sequoia Capital — Cursor: The Hidden Bug in Large-Scale RL}

~00:00 Training large sparse models like Kimi with async RL means the generating model may lag the trainer checkpoint by a few steps, so the trainer must re-run forward passes. In theory the same version gives identical log probs; in practice "you get slightly or sometimes very different log probability values for the same tokens." The engineer also notes future Cursor Composer versions aim to use their own base model rather than an open-source one.^{[23]Sequoia Capital — Cursor: The Hidden Bug in Large-Scale RL}
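
One root cause of this kind of mismatch is that floating-point addition is not associative, so two kernels that reduce the same numbers in a different order can disagree in the last bits. The toy below is an illustration under that assumption only: it does not model sparse-model inference or any Cursor code, and the names `normalizer` and `token_logprob` are hypothetical.

```python
import math

def normalizer(weights, order):
    """Sum unnormalized token weights in the given index order."""
    total = 0.0
    for i in order:
        total += weights[i]
    return total

def token_logprob(weights, index, order):
    """Log-probability of one token, with the normalizer summed in `order`."""
    return math.log(weights[index]) - math.log(normalizer(weights, order))

weights = [0.1, 0.2, 0.3]
forward, backward = [0, 1, 2], [2, 1, 0]

z_fwd = normalizer(weights, forward)    # (0.1 + 0.2) + 0.3
z_bwd = normalizer(weights, backward)   # (0.3 + 0.2) + 0.1
same_normalizer = z_fwd == z_bwd        # False: the two sums differ in the last bit

# Gap between the two "recomputed" log-probs for the same token.
gap = token_logprob(weights, 2, forward) - token_logprob(weights, 2, backward)
```

Here the gap is at most a few units in the last place. The clip's point is that in large sparse models the discrepancy can be much larger, so a trainer cannot assume its recomputed log probs match the ones the generating model produced.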

Tools: Kimi, Cursor Composer

Hot Take, Productivity, AI Future

Nate B Jones, Lenny's Podcast

How AI rewrites team size and meetings

Nate B Jones argues meetings flip from worthwhile to net-negative as AI multiplies individual output: at ~$250K/person the coordination tax was worth paying, but at ~$2M/person "most of those meetings end up being net negative, destroying value at a rate that scales with how productive your people are."^{[26]Nate B Jones — Why your meetings are destroying your output} His prescription is the five-person AI "strike team."^{[24]Nate B Jones — Is your AI team actually efficient?}
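
The cost side of that argument is simple arithmetic. A rough sketch, assuming a 2,000-hour work year and that an hour in a meeting displaces an average hour of output; both assumptions, the function name `meeting_cost`, and the six-person, one-hour meeting are illustrative and not taken from Jones:

```python
HOURS_PER_YEAR = 2_000  # assumption: roughly 50 weeks x 40 hours

def meeting_cost(attendees, hours, annual_output_per_person):
    """Output displaced by a meeting, in dollars."""
    return attendees * hours * annual_output_per_person / HOURS_PER_YEAR

# A one-hour meeting with six attendees at the two productivity levels cited.
cost_at_250k = meeting_cost(6, 1, 250_000)    # 750.0
cost_at_2m = meeting_cost(6, 1, 2_000_000)    # 6000.0
multiple = cost_at_2m / cost_at_250k          # 8.0
```

The same meeting costs eight times as much at the higher output level. Whether it becomes net negative also depends on the value the meeting produces, which this sketch does not estimate.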

The five-person strike team. Five is the right size when correctness is the constraint — not for headcount, but because every AI-generated output passes through at least one peer with enough context to catch meaningful errors at the right level of abstraction.^{[24]Nate B Jones — Is your AI team actually efficient?} Five people can collectively span product, engineering, design, data, and domain — prioritizing error-catching fidelity over raw throughput.

Dunbar, validated by the military. Jones argues that Dunbar's layered limits (5 core, 15 deep trust, 50 meaningful, 150 stable) aren't soft guidelines: he says army mathematicians confirmed that communication effectiveness peaks at exactly these sizes.^{[25]Nate B Jones — Why you only have 150 friends}

And impact stays unpredictable. On Lenny's Podcast, a guest skewers job-exposure scoring, comparing it to the failed expert-systems approach to image recognition. Analysts in 1997 would not have predicted ride-hailing apps devastating taxi drivers, and by the same logic current frameworks cannot predict which jobs AI will transform.^{[27]Lenny's Podcast — We can't predict AI's impact}

"At $2 million per person, most of those meetings end up being net negative... destroying value at a rate that scales with how productive your people are."

"You can't look at a senior partner at a law firm and say, well, 17% of their work could be automated."

AI Models

Nate Herk

100 years of AI, explained

Nate Herk traces AI from Turing's 1950 Imitation Game and the 1956 Dartmouth conference through two AI winters, the 2012 AlexNet breakthrough, the 2017 transformer, and ChatGPT's record-breaking launch — ending on Claude Code's rise to $1B+ ARR within six months of launch.^{[28]Nate Herk — 100 Years of AI Explained}

~00:00 Turing to AlexNet. From Turing's Bombe and McCarthy coining "artificial intelligence," through the symbolic-vs-neural split and the 1987 collapse of the Lisp-machine expert-systems market, to Hinton's 1986 backprop revival — finally unblocked by Nvidia GPUs and Fei-Fei Li's ImageNet, letting AlexNet win the 2012 ImageNet competition at 15% error, 11 points below the prior year.

~13:03 Transformers and the gold rush. The 2017 transformer (built for translation) became GPT-1/2/3; ChatGPT hit 1M users in 5 days and 100M in 2 months. Microsoft put ~$13B into OpenAI; by April 2026, Amazon committed an additional $25B and Google $40B to Anthropic.

~16:03 Claude Code's dominance. While OpenAI chased consumers and Google embedded Gemini, Anthropic focused on developers — and by November 2025 Claude Code was generating over $1B/year, just six months after launch, "one of the fastest revenue jumps in software history."

"By November 2025, it was bringing in over a billion dollars a year. Only 6 months after their launch."

Tools: AlexNet, ImageNet, ChatGPT, GPT-1/2/3, Claude, Claude Code, Gemini, Artifacts

AI Future, Developer Tools, Industry

Two Minute Papers, OpenAI, Real Python, Acquired

Quick hits

Shorter items worth a look: a second Nobel for AlphaFold, OpenAI's new in-Codex app sharing, the social skill of code review, and an Acquired lesson on operating leverage.

A second Nobel for AlphaFold? A Two Minute Papers teaser floats the idea, noting 3M+ researchers now use AlphaFold for "incredibly important work" — making a repeat plausible.^{[29]Two Minute Papers — A Second Nobel Prize for AlphaFold?}

OpenAI: build and share apps in Codex. OpenAI posted a demo showing apps being built and shared directly within Codex — pushing the coding agent further toward a publish-and-distribute surface.^{[32]OpenAI — Build and share apps in Codex}

Code review is a social skill. A Real Python clip notes that reviewers who raise the same concern repeatedly without traction learn to let it go — knowing when to advocate vs. defer comes from reading team culture.^{[30]Real Python — When to Push Back, When to Let Go}

Operating leverage cuts both ways. On Acquired, the hosts recount the Wellington Fund's AUM falling from $2B to $480M by 1973 — mostly via redemptions — to illustrate how investment managers, like software companies, enjoy high operating leverage on the way up and suffer it brutally on the way down.^{[31]Acquired — The Wellington Fund's AUM collapse}
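
A worked example shows why the drop hurts more than proportionally. The AUM figures come from the item above; the 50-basis-point fee and the $4M fixed cost base are hypothetical numbers chosen only to illustrate the mechanism, and `profit` is an illustrative helper.

```python
FEE_BPS = 50              # hypothetical management fee: 0.50% of AUM
FIXED_COSTS = 4_000_000   # hypothetical annual cost base that does not shrink with AUM

def profit(aum):
    """Annual profit in dollars: fee revenue minus fixed costs."""
    revenue = aum * FEE_BPS // 10_000
    return revenue - FIXED_COSTS

aum_before, aum_after = 2_000_000_000, 480_000_000

aum_decline = (aum_before - aum_after) / aum_before   # 0.76
profit_before = profit(aum_before)                    # 6_000_000
profit_after = profit(aum_after)                      # -1_600_000
```

Under these assumptions a 76% fall in assets turns a $6M profit into a $1.6M loss, because revenue falls with AUM while costs stay put.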


Sources