April 16, 2026
Anthropic released claude-opus-4-7 at the same $5/M input, $25/M output pricing as 4.6, but with a new tokenizer that encodes the same prompt into up to 1.35x more tokens
[1]Anthropic — Introducing Claude Opus 4.7.
Vision leaps: it accepts images up to 2,576px on the long edge (~3.75 megapixels, 3x prior Claude), scores 98.5% on XBOW visual acuity vs 54.5% for 4.6, and Rakuten reports it resolves 3x more production SWE tasks
[1]Anthropic — Introducing Claude Opus 4.7.
A new xhigh effort level sits between high and max, and Claude Code's default effort was silently upgraded to xhigh, meaning identical workflows now burn roughly as many tokens as Opus 4.6's old max
[3]Better Stack — Opus 4.7 Is GREAT (except the token usage).
A new /ultrareview slash command triggers dedicated review sessions — Pro and Max users each get three free ultrareviews
[1]Anthropic — Introducing Claude Opus 4.7.
The default effort level jumps to xhigh across all plans, and auto mode is extended to Max subscribers as a safer alternative to bypass-permissions
[2]Developers Digest — Claude Opus 4.7 in 5 Minutes.
On the API side, task budgets are now in public beta and file-system-based memory is improved for long-running multi-session agents
[1]Anthropic — Introducing Claude Opus 4.7.
Better Stack tested 4.7 head-to-head with 4.6, GPT-5.4, and Gemini 3.1 on a personal-finance dashboard (~00:00). Opus 4.7 won with a clean dark-mode React/TypeScript/Vite app and a working in-memory Express backend — but skipped a persistent DB (Opus 4.6 had used SQLite) and oddly picked React 18 + React Router 6 despite a newer knowledge cutoff [3]Better Stack — Opus 4.7 Review.
The cost warning (~02:30): the new tokenizer can use up to 35% more tokens on the same input, and 4.7 thinks more at higher effort levels. The presenter notes 4.7's high actually outscores 4.6's max while using fewer tokens — so dropping from xhigh to high in Claude Code settings may preserve quality while cutting costs.
The same input prompt could now use up to 35% more tokens, and it also thinks more. So that's even more tokens to burn.
Anthropic bundled Claude Code into a redesigned desktop app next to Claude chat and Claude Co-work — positioned as an "agent orchestration command center" with multi-session sidebar, integrated terminal, and drag-and-drop workspaces [5]The AI Daily Brief — Vibe Coding Gets an Upgrade. Theo (t3.gg) spent an hour with it and found 40+ bugs — broken copy-paste, hotkeys targeting the wrong window, voice input bleeding into every textbox — and built his own open-source competitor T3 Code with Julius in one-fifth the time [4]Theo - t3.gg — Claude's new Cursor killer just dropped. Alongside the app, Anthropic introduced Claude Code routines: event-triggered agents (GitHub events, API calls) running on Anthropic-managed cloud infrastructure — "dynamic cron jobs" [5]The AI Daily Brief — Vibe Coding Gets an Upgrade.
Per AI Daily Brief (~02:01), the app is explicitly built for how agentic coding feels now: many sessions in flight, humans in the orchestrator seat. Features include a sidebar of active/recent sessions filterable by status/project/environment, an integrated terminal and file editor, and drag-and-drop workspace layout. Cursor 3 shipped an identical UI earlier this month and Codex is signaling the same — the three stacks are converging.
Cursor, Codex, and Claude Code desktop app look exactly the same now.
Also introduced (~05:03): Claude Code routines. A routine packages a prompt, repos, and MCP connectors, then runs on Anthropic's cloud infrastructure — keeping the agent working when your laptop is closed. Anthropic uses them internally for docs maintenance and backlog grooming. Enterprise vibe-coding hardening is the companion thread: Lovable added native payments ("you're not vibe coding PCI level one"), Superblocks 2.0 pitches vibe coding as an "enterprise attack vector," and Microsoft is testing Open Claw inside Copilot (~07:03).
Theo opens with begrudging approval — the new app uses less memory than the CLI ("a trash piece of software") and the Claude virtual machine service burns ~2.5 GB RAM but doesn't wreck the system [4]Theo - t3.gg — Claude's new Cursor killer. Then he hits an avalanche of UX failures inside minutes (~04:00):
Window-switching hotkeys (command-1/2/3) always target the first window regardless of focus, and .gitignore edits are among the broken flows.
AI is really good at building the happy path… but as soon as you hit an edge case… models do not find edge cases. Users find edge cases.
Theo and Julius shipped T3 Code, a free open-source desktop app, in roughly 1/5 the time (~06:40). Features he shows off: favicon auto-grab, multi-tab with scroll built in two prompts, diff view via diffs.com, snapshot-based before/after, properly bound hotkeys, and handling of multi-million-token threads (Julius scrolled 10 minutes to reach the top of one). The Codex CLI is Apache-2.0 and ships an app server any harness can plug into; only the Codex desktop is closed, which annoyed Theo enough to build T3 Code in the first place.
If the Codex app was open source like the CLI is, I probably wouldn't have made T3 Code.
He closes on strategic intent (~23:40): lab-built coding apps exist to lock users in and showcase models. On both fronts, Theo says Claude Code fails. He calls out Claude still demanding CLAUDE.md / .claude/ while Cursor and friends adopt the AGENTS.md / agents/ standard, and notes Anthropic's terms of service are "shitty" specifically to block switching.
The level of slop that is being shipped by Anthropic is unfathomable and y'all just grin and bear it.
OpenAI radically expanded Codex from a coding assistant into a general-purpose autonomous agent. A new "background computer use" mode lets Codex drive its own cursor across any Mac app in parallel with the user, spin up multiple simultaneous agents, and schedule future tasks via "Heartbeat Automations"
[6]OpenAI — Codex for (almost) everything.
Codex now supports 90+ plugins (Atlassian Rovo, CircleCI, GitLab Issues, Microsoft Suite, Neon by Databricks, Superpowers), ships image generation via gpt-image-1.5, and has cross-session memory
[7]OpenAI — Codex for (almost) everything (YouTube).
Usage is 6x since January, now over 2M weekly users
[7]OpenAI — Codex for (almost) everything (YouTube).
Codex now operates a dedicated desktop cursor that opens apps, clicks, types, and runs tasks without blocking or interfering with the user's current work [6]OpenAI — Codex for (almost) everything. An in-app browser lets users annotate webpages to give Codex context and instructions, and the browser is initially scoped to frontend/game dev tasks [7]OpenAI — Codex for (almost) everything (YouTube). Heartbeat Automations let Codex schedule future tasks and resume long-horizon work — the same "dynamic cron" pattern Anthropic just shipped with routines.
gpt-image-1.5 for mockups, product concepts, and in-game visuals. Cited enterprise pilots: Notion and Ramp for engineering automation [7]OpenAI — Codex for (almost) everything (YouTube). OpenAI's positioning is that Codex is now an agentic operating layer — not a coding chat — with the same "orchestration command center" framing as Anthropic's redesigned Claude Code.
Codex goes from a code-editor-centric tool to covering full-day agentic workflows similar in scope to Anthropic's computer-use offerings.
Nate Herk argues Opus 4.7 isn't as big a leap as it looks — Anthropic silently dropped 4.6's effort default to "medium" and disabled extended thinking on February 9 without announcing it, then released 4.7 as a "fix" [8]Nate Herk — Claude Opus 4.7 Just Dropped... Or Did It Really?. He cites an AMD senior director's analysis of ~7,000 Claude Code sessions finding thinking depth collapsed 73% and the model skipped reading files before editing them 33.7% of the time, with $200/month plans burned through in hours [8]Nate Herk — Claude Opus 4.7 Just Dropped... Or Did It Really?.
At ~00:00, Nate lays out the conspiracy: users reported hallucinated git commit hashes, fake package names, and premature task abandonment starting in February. The AMD engineer's 7,000-session audit is the core evidence — thinking depth down 73%, file-read-before-edit compliance down to 33.7%. Nate's framing: 4.7's gains partly restore deliberately degraded 4.6 behavior.
At ~04:02, Nate concedes the structural wins: the new xhigh effort tier, vision gains from architecture changes, a more expensive tokenizer (1–1.3x), SWE-bench Pro jumps, a 2x+ gain on biomolecular reasoning, and the new /ultrareview slash command. Anthropic also published a 232-page system card for 4.7. In hands-on tests Nate found 4.7 better at financial chart analysis, and it caught its own math errors in a SaaS model — but he rated the 4.6 interactive deliverable more polished.
At ~10:01, Nate piles on the desktop app critique: Theo found 40+ bugs within one hour — broken buttons, voice input bleeding into every textbox, layout issues. He asks how "one of the world's largest AI companies" shipped trivial one-prompt fixes as day-one bugs.
Simon Willison ran Qwen3.6-35B-A3B locally on a MacBook Pro M5 using a 20.9 GB quantized GGUF (Unsloth's Q4_K_S) through LM Studio — and found it outperformed Claude Opus 4.7 on the SVG pelican benchmark [9]Simon Willison — Qwen3.6 on my laptop drew a better pelican than Opus 4.7. On the pelican-on-bicycle test, Qwen produced correct bicycle geometry, clouds, and a detailed pouch; Opus 4.7 — even with thinking_level: max — generated a frame with an entirely wrong shape. Simon notes he doubts the quantized local model beats Opus on general capability — but for SVG illustration tasks specifically, the local model currently wins.
Tooling referenced: the llm-lmstudio plugin; thinking_level: max.
On "flamingo on unicycle," Qwen added creative SVG comments like <!-- Sunglasses on flamingo! --> and richer character details, while Opus produced a "competent if slightly dull vector illustration" lacking visual personality. Simon is careful to bound the claim: local Qwen is not beating Opus at general tasks — just at this specific aesthetic SVG task.
Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7.
Better Stack pitted Anthropic's cloud-based /ultraplan against the local Superpowers skill on a real release-pipeline feature. Superpowers asked double the clarifying questions (6 vs 3) and produced a plan 4x longer (833 vs 195 lines) with test-driven task breakdowns — but Ultraplan wins for async workflows: kick off on a laptop, continue on a phone
[10]Better Stack — Claude Ultraplan vs Superpowers.
The /ultraplan slash command spins up a cloud container, clones the GitHub repo, and produces a plan in ~2-3 minutes — suggesting sub-agent parallelism
[10]Better Stack — Ultraplan vs Superpowers. It consumed 33% of the creator's monthly Pro token quota for one plan and initially cloned the wrong repo (context issue; corrected after a comment).
Superpowers runs entirely locally, asks 6 clarifying questions (vs 3), and has two phases: a design plan and an implementation plan. Output was 833 lines with test cases and test-first task execution, and used ~75k tokens with prompt caching. The creator prefers it for the ~90% of work he does locally, because his skills, MCP tools, and the whole dev loop live there.
At ~06:09: device-agnostic async — start a feature on a laptop, close it, continue on a phone or tablet. Requires repo on GitHub and a Pro or Max sub. The creator forgot to install the Claude GitHub app on the demo repo, which would have auto-created PRs.
If I do choose to work locally, which I'm doing 90% of the time, then I'll probably use Superpowers because all my code is there locally, my skills, my MCP tools, everything.
Simon shipped llm-anthropic 0.25, adding support for claude-opus-4.7 (including thinking_effort: xhigh) and two new options: thinking_display (shows extended reasoning output as JSON) and thinking_adaptive. Default max_tokens raised to the per-model maximum; an obsolete beta header removed
[11]Simon Willison — llm-anthropic 0.25.
Separately, he built a Claude Artifact to edit the datasette.io news YAML with live preview and validation, flagging both markdown syntax and YAML formatting errors before they reach the live site
[12]Simon Willison — datasette.io news preview.
claude-opus-4.7, supports thinking_effort: xhigh
thinking_display: boolean for displaying extended reasoning (JSON output only)
thinking_adaptive: boolean for adaptive thinking behavior
max_tokens raised to per-model maximum
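A minimal sketch of driving those options from Python via the llm library. The model alias and option values come from the release notes above; whether your installed plugin exposes exactly these names is an assumption.

```python
import llm

# Assumes llm-anthropic 0.25+ is installed and an Anthropic API key is configured.
model = llm.get_model("claude-opus-4.7")

response = model.prompt(
    "Explain the trade-off between effort levels in two sentences.",
    thinking_effort="xhigh",   # new top effort tier, per the 0.25 release notes
    thinking_display=True,     # emit extended reasoning (JSON output only, per the notes)
)
print(response.text())

# CLI equivalent:
#   llm -m claude-opus-4.7 -o thinking_effort xhigh "your prompt"
```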
The datasette.io site maintains its news section as a YAML file in a GitHub repo, which Simon found error-prone to edit directly. He used Claude Artifacts' ability to analyze GitHub repos in conversation to build a custom preview UI that accepts YAML input and renders how entries will appear on the homepage, plus a validation badge surfacing markdown/YAML errors before commit.
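A rough local approximation of what that Artifact checks, assuming a hypothetical news.yaml layout where each entry carries a date and a markdown body (the real datasette.io schema may differ):

```python
import sys
import yaml        # pip install pyyaml
import markdown    # pip install markdown

def validate_news(path: str) -> list[str]:
    """Collect YAML and markdown problems before they reach the live site."""
    try:
        entries = yaml.safe_load(open(path, encoding="utf-8"))
    except yaml.YAMLError as exc:
        return [f"YAML parse error: {exc}"]
    if not isinstance(entries, list):
        return ["expected a top-level list of news entries"]
    problems = []
    for i, entry in enumerate(entries):
        for key in ("date", "body"):       # hypothetical required fields
            if key not in entry:
                problems.append(f"entry {i}: missing '{key}'")
        try:
            markdown.markdown(entry.get("body", ""))
        except Exception as exc:
            problems.append(f"entry {i}: markdown error: {exc}")
    return problems

if __name__ == "__main__":
    issues = validate_news(sys.argv[1])
    print("\n".join(issues) if issues else "OK")
```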
OpenAI released GPT-Rosalind, its first domain-specialized life sciences model — targeting biochemistry, genomics, and drug discovery workflows. Named after Rosalind Franklin, it ranked above the 95th percentile of human experts on prediction tasks, hit the 84th percentile for sequence generation, and posted the highest published score on the BixBench bioinformatics benchmark [13]OpenAI — Introducing GPT-Rosalind for life sciences research. Launch partners include Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific, via a limited "trusted access program."
GPT-Rosalind supports evidence synthesis, hypothesis generation, experimental planning, and multi-step research. It can query specialized databases, parse literature, call computational tools, and propose experimental pathways. A companion Life Sciences Codex plugin connects the model to 50+ scientific tools and data sources.
Initially a research preview for qualified enterprise customers — not general availability. Positions OpenAI directly against Google's life-sciences push; no public pricing was announced.
OpenAI framed the model as a tool to help scientists move faster through analytically demanding work, not as a replacement for researchers.
OpenAI's Joy Jiao (research lead) and Yun Wang (product lead) walk through the new biochemistry-focused model series that anchors GPT-Rosalind, a research plugin with 50+ templatized skills, the Ginkgo Bioworks collaboration where GPT-5 designed a biology experiment that produced a non-zero amount of protein on its first try, biosecurity safeguards, and a 10-year vision of autonomous research institutes [14]OpenAI Podcast — Ep 16: Building AI for Life Sciences.
At ~01:00, Joy and Yun describe a biochemistry-focused model series anchored on complex research workflows. Starts with genomics and protein understanding, focused on early discovery where more thinking time can break research bottlenecks. Ships across ChatGPT (literature synthesis) and Codex (long-trajectory agentic work). The life sciences research plugin exposes 50+ "templatized repeatable workflows" — cross-evidence match across papers, pathway analysis — as one-click deploys on top of the base models.
One of the taglines was to scale test-time compute to cure all disease. So that is like our team tagline.
At ~05:06: when GPT-5 finished training, the team wasn't sure it could do any biology (training data was mostly math and CS). The Ginkgo collaboration (July 2025) tested whether models could design experiments that actually produce the desired product. First set of designs came back with non-zero protein production — surprising even the team.
The future that me and Joy see is that it's no longer human bottlenecks but rather maybe compute bottlenecks.
At ~09:10: bio is one of the most severe rising-capability risks. Precursor steps to dangerous pathogens look benign and similar to legitimate research ("help me clone a gene" could be GFP or a toxin). Current stance is risk-averse self-refusal, frustrating professional scientists. Roadmap: differentiated access — verify users at legitimate research institutions or pharmas where reagents are tracked, then unlock fuller capability.
The safest model here would be a model that had no capability — it's very safe, but it's not very good.
At ~20:14, Joy frames two scaling axes: parameter growth (GPT-2 → GPT-3 produced emergent properties) and test-time compute on reasoning models. Models can now "think for days or effectively forever" about a problem — reframing data centers as complex reasoning infrastructure (hence Stargate). At ~24:16: she wants to see AI design a new drug or cure a disease within a few years. Nearer-term wins: drug repurposing via mechanistic understanding and personalized medicine via ASO/RNA design.
You have these autonomous labs — mostly robots, all hooked up to AI — autonomous research institutes that are constantly running and curing human disease.
OpenAI is scaling Trusted Access for Cyber (TAC) — originally launched February 2026 — to thousands of verified individual defenders and hundreds of teams. The centerpiece is GPT-5.4-Cyber, a fine-tuned variant of GPT-5.4 with lower refusal boundaries for legitimate defensive work, supporting binary reverse engineering for malware analysis and vulnerability detection [15]OpenAI — Accelerating the cyber defense ecosystem. OpenAI's Codex Security agent has contributed to 3,000+ critical and high-severity vulnerability fixes, and Codex for Open Source has scanned 1,000+ projects.
Availability: chatgpt.com/cyber; enterprise direct with OpenAI.
Physical Intelligence co-founder Quan Vuong lays out PI's thesis: one cross-embodiment foundation model trained on ~10 robot platforms beats specialists by 50% [16]YC — The GPT Moment for Robotics Is Here. Nearly all of PI's complex demos — coffee-making, laundry folding, mobile nav — run inference in a real remote data center via "real-time chunking." And he gives a full playbook for the "Cambrian explosion" of vertical robotics startups.
At ~05:04, Vuong walks through the Open X-Embodiment (RT-X) result: absorb data from ~10 heterogeneous platforms into a single high-capacity model, and the generalist beats every specialist by 50%. The model learns "how to control any robot," not one specific platform — which sidesteps the drift problem where single-platform data goes stale every 3 months.
There is this joke in robotic grad school that if you want to add two years to your PhD, just work on a new robot platform.
At ~23:13, the hot take: almost all of PI's robot evaluations — including the really complex demos — run inference on a model hosted in a real remote data center, not on-device. "Real-time chunking" queries the next chunk 50ms before the current ~100ms queue ends so chunks stitch together smoothly. Vuong says he has never physically seen the Weave or Ultra robots and doesn't know how their data is collected — an intentional decoupling.
Almost all of the robot evaluation that we run at PI today, including the really complicated demo… the model is actually hosted in the cloud. This is not a server in the office. It's a real cloud.
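A toy simulation of the timing idea only: the ~100ms chunk length and 50ms prefetch lead come from the talk, while the function names, the fake 40ms round trip, and everything else here are invented for illustration and are not PI's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_MS = 100          # each action chunk covers ~100 ms of motion (from the talk)
PREFETCH_LEAD_MS = 50   # ask for the next chunk 50 ms before the queue drains

def query_cloud_policy(observation):
    """Stand-in for the remotely hosted model (hypothetical); ~40 ms round trip."""
    time.sleep(0.04)
    return ["action"] * 10

def control_loop(num_chunks: int = 5):
    pool = ThreadPoolExecutor(max_workers=1)
    chunk = query_cloud_policy(None)
    deadline = time.time() + CHUNK_MS / 1000
    pending = None
    for _ in range(num_chunks):
        while True:
            remaining_ms = (deadline - time.time()) * 1000
            # Kick off the next request while the current chunk is still executing,
            # so the fresh chunk is ready (or nearly ready) when the queue runs dry.
            if pending is None and remaining_ms <= PREFETCH_LEAD_MS:
                pending = pool.submit(query_cloud_policy, None)
            if remaining_ms <= 0:
                break
            time.sleep(0.005)   # the robot would be executing `chunk` here
        chunk = pending.result()
        deadline = time.time() + CHUNK_MS / 1000
        pending = None

control_loop()
```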
At ~13:08: tasks that last year needed hundreds of hours of data collection are now doable zero-shot. Weave (YC, ex-Apple founders) folds diverse laundry in a real laundromat. Ultra (YC) runs in a real e-commerce warehouse packing real Amazon orders into narrow soft pouches — 100 minutes at 4x speed, minimal intervention, with lighting changes historically brutal for robotics vision.
It still blows my mind to see a robot actually folding laundry because until ChatGPT I didn't know if this would exist even in my entire lifetime.
At ~29:18, Vuong names the recipe for vertical robotics startups: (1) understand the existing workflow; (2) use cheap hardware — reactive models compensate for imprecise motion; (3) build data-collection + eval infra; (4) deploy mixed-autonomy with humans correcting failures; (5) reach break-even per robot before scaling.
The equation for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore.
At ~43:37: Vuong's dream side project is an automated robotics research scientist. PI already runs a Claude-based "pre-training on-call" agent babysitting training runs — delivering ~50% improvement in compute utilization. "An embarrassingly large amount of money on API queries."
The episode's framing: most post-fundraise hiring surges fail because companies add ammunition (people who need direction) without adding barrels (people who can independently drive an initiative from idea to done). Hiring more without expanding barrels just stacks people behind the same bottlenecks [17]Lenny's Podcast — Hire barrels, not ammunition.
Can they take an idea and make it happen? One way or the other, they're going to get your company across that hill. That's a barrel.
The failure mode: a CEO hires aggressively, burn rate spikes, output per unit of time barely moves. The real bottleneck is the small count of people who can take an initiative from inception to success. More bodies = more collaboration tax, same barrels.
The number of people that can independently drive an initiative from inception to success is very limited within any company.
If you hire more people without expanding the number of what I call barrels… all you're doing is stacking people behind the same initiatives.
Mario Zechner's AIE Miami talk is a three-act tragedy: why he left Claude Code, why he built Pi (a minimal self-modifying coding agent), and why the wider OSS world is drowning in agent-generated slop [18]Mario Zechner at AI Engineer — Building Pi in a World of Slop. Punchline: "Slow the fuck down. Learn to say no. Fewer features, but the ones that matter."
The harness kept mutating his context behind his back (changing system prompts, injecting "may or may not be relevant" system reminders), offered zero observability, zero model choice, and only shallow hooks spawning a new process per invocation. Alternatives had their own sins — OpenCode prunes tool output past a token threshold ("lobotomizes the model"), injects LSP errors into edit tool results, stores every message as an individual JSON file, and defaults CORS so any website can hit the local server.
My context wasn't my context. Cloud code is the thing that controls my context.
Terminal Bench's December 2025 leaderboard shows the minimalist Terminal harness — which only sends keystrokes to a tmux session — beats native harnesses across model families.
We are in the fuck around and find out phase of coding agents, and their current form is not their final form.
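The "just send keystrokes to tmux" pattern fits in a few lines; this is a generic sketch of that shape (the session name, polling, and example command are mine, not the actual Terminal Bench harness code):

```python
import subprocess
import time

SESSION = "agent"  # arbitrary tmux session name for this sketch

def start_session():
    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION], check=True)

def send_keys(text: str):
    # Type the command into the session and press Enter, exactly as a human would.
    subprocess.run(["tmux", "send-keys", "-t", SESSION, text, "Enter"], check=True)

def read_screen() -> str:
    # Capture whatever is currently visible in the pane; the model only ever sees this.
    out = subprocess.run(
        ["tmux", "capture-pane", "-t", SESSION, "-p"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout

start_session()
send_keys("ls -la")
time.sleep(0.5)
print(read_screen())
```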
Four packages (AI abstraction, agent core = while-loop + tool calling, bespoke TUI framework, the coding agent itself). Four tools (read, write, edit, bash). A tiny system prompt that hands the agent Pi's own handcrafted docs plus example extensions. Extensions are hot-reloaded TypeScript modules shipped via npm. YOLO-by-default on security because he thinks per-call approval dialogs aren't real security. User-built extensions include a 5-minute clone of Anthropic's /bashing, Nico's chatroom of agents talking to each other, and Pi playing NES / Doom. Pi placed sixth on Terminal Bench in October. Pi became the agentic core inside OpenCode; his issue tracker filled with LLM-generated garbage, so he built an auto-close PR filter asking humans to write issues "in your human voice, no longer than a screen of text" — clankers don't read comments. Mitchell turned it into vouch.
You don't need 10,000 tokens to tell them they're a coding agent. They know — because they are coding agents.
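For flavor, the "while-loop + tool calling" core that Pi's description implies looks roughly like the sketch below, with the read/write/edit/bash tool set from the talk; call_model is an explicit placeholder, and none of this is Pi's actual code.

```python
import subprocess
from pathlib import Path

# The four tools from the talk: read, write, edit, bash.
def read(path: str) -> str:
    return Path(path).read_text()

def write(path: str, text: str) -> str:
    Path(path).write_text(text)
    return "ok"

def edit(path: str, old: str, new: str) -> str:
    # Naive single-occurrence string replacement, the usual "edit tool" shape.
    src = Path(path).read_text()
    Path(path).write_text(src.replace(old, new, 1))
    return "ok"

def bash(cmd: str) -> str:
    done = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr

TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}

def call_model(messages):
    """Placeholder for the LLM call: returns {'tool': name, 'args': {...}} to request
    a tool, or {'tool': None, 'content': ...} when the task is finished."""
    raise NotImplementedError

def agent_loop(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:                              # the whole "agent core" is this loop
        reply = call_model(messages)
        if reply.get("tool") is None:
            return reply["content"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
```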
Agents are "compounding boo-boos with serial learning, no bottlenecks, and delayed pain." 90% of training data is "our old garbage," so every local agent decision pulls in over-abstraction, duplication, backward compatibility, and defense-in-depth — "enterprise-grade complexity within two weeks with just two humans and 10 agents." Review agents ("the ouroboros") don't catch it. 1M-token context windows are "a heck," agentic search also fails, so agents patch locally and break globally — and you can't trust your tests because the agent wrote them.
You know what we call a sufficiently detailed spec? It's a program.
The agent patches locally and fucks up globally. If you see this in your codebase, you're fucked.
Prescription: scope tasks tightly so the agent can find everything it needs, modularize, give it an eval function when possible, let it rip on non-critical and boring work — but for anything that matters, hand-write it.
If you do anything important, write it by hand. Friction is the thing that builds the understanding of the system in your head.
Slow the fuck down. Think about what you're building and why. Don't just build because your agent can do it.
Diego Carpentero argues a fine-tuned ModernBERT classifier is a cheap, self-hostable defensive layer against six LLM attack vectors: direct prompt injection, indirect/context injection, gibberish-suffix alignment breaks, RAG poisoning, MCP tool-description exploits, and agentic attacks. He hits ~85% accuracy at 35–40ms per classification on the InjecGuard dataset, running on commodity hardware for under $1 [19]Diego Carpentero at AI Engineer — $1 AI Guardrails with ModernBERT.
These attacks, they are no longer the exception, they are now the baseline.
Dataset: InjecGuard — 75k labeled examples from 20 open sources. Hugging Face Datasets with batched map tokenization, dynamic padding, a classification head on the CLS token (optionally mean pooling for long sequences), BF16 (cut training memory ~40%, enabling batch size 64), and StableAdamW. Switching base → large added nearly 6 accuracy points. Final: ~85% accuracy, 35–40ms latency, self-hosted for under $1. Demo on HuggingFace Space correctly classifies the Sydney prompt, Wikipedia redirect, ad-review override, GCG gibberish suffix, and MCP key-exfiltration as unsafe.
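The recipe translates into a fairly stock Hugging Face fine-tuning script. The sketch below assumes a hypothetical dataset path with text/label columns and a two-class safe/unsafe scheme, uses the real answerdotai/ModernBERT-large checkpoint, and keeps the Trainer's default AdamW where the talk used StableAdamW.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_ID = "answerdotai/ModernBERT-large"   # base -> large added ~6 accuracy points in the talk
DATASET = "path/to/injecguard"              # hypothetical path to the 75k-example InjecGuard set

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)  # safe / unsafe

ds = load_dataset(DATASET)                  # assumption: ships train/test splits

def tokenize(batch):
    # Batched map keeps tokenization fast; assumes a "text" column (adjust to the real fields).
    return tokenizer(batch["text"], truncation=True, max_length=512)

ds = ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-guardrail",
    per_device_train_batch_size=64,   # enabled by BF16's ~40% training-memory saving, per the talk
    bf16=True,
    num_train_epochs=2,
    learning_rate=3e-5,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    data_collator=DataCollatorWithPadding(tokenizer),  # dynamic padding per batch
)
trainer.train()
```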
In a knowledge database comprising 8 million documents, poisoning only five chunks was enough to be successful in this attack.
We have to build safety mechanisms that protect machines, humans and society.
Two multi-hour livestream sessions from AI Engineer Miami on Apr 16. Day 2 features talks from Cerebras, OpenCode, Cursor, Arize AI, and more [20]AI Engineer Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!. The keynote track headlines OpenCode, Google DeepMind, and OpenAI [21]AIE Miami Keynote & Talks ft. OpenCode, Google DeepMind, OpenAI, and more!.
Multi-speaker livestream featuring Cerebras (AI inference hardware), OpenCode (open-source coding tools), Cursor (AI code editor), and Arize AI (observability/eval). Transcript was not available on fetch, so specific speaker content and timestamps can't be confirmed beyond the title.
Main keynote-and-talks stream featuring OpenCode, Google DeepMind, and OpenAI. Likely covered high-level announcements, research highlights, and product direction from the largest AI labs present. Transcript also unavailable.
Standalone speaker talks (Mario Zechner, Diego Carpentero) are covered in their own topics above.
Google DeepMind's Gemma 4 (2B–31B, Apache 2.0) hit 10M downloads in its first week; the 31B dense model beats some models 10–20x larger on benchmarks, and the 2B runs on a first-gen Nintendo Switch [22]Two Minute Papers — Why DeepMind's New AI Broke The Internet. A community uncensored fine-tune (SuperGemma-4 26B, Jun Song on Hugging Face) runs at 46.2 tok/s on Apple Silicon via MLX and plugs straight into Hermes agent and Open Claw via an OpenAI-compatible endpoint [23]AICodeKing — SuperGemma-4 (26B) UNCENSORED + Hermes, Open Claw, OpenCode. On the image side, Baidu's Ernie Image (and Turbo variant) dethroned Z Image on benchmarks — Q2K GGUFs run in ~3 GB VRAM [24]AI Search — New BEST local AI image generator is here!.
At ~01:01: 2B runs on a first-gen Nintendo Switch and phones without internet; 31B is a dense (not MoE) model ranking third among open models, beating some 10x-larger models and remaining competitive with 20x-larger ones.
Architectural wins called out: context doubled to 256K, and the license upgraded from the restrictive Gemma License to Apache 2.0. At ~07:08, it's pitched as a drop-in for cloud LLMs inside agentic harnesses like Open Claw — "a frontier model just got locked down for a few select clients… that's all right, just plug in Gemma 4 and you're good to go for free."
A community uncensored fine-tune of Gemma 4 26B A4B (~25B total, ~3.8B active MoE). Native system prompt + function calling, 256K context. MLX 4-bit V2 release on Hugging Face; creator claims Quick Bench overall 95.8 vs 91.4 and 46.2 tok/s vs 42.5 baseline. Setup: pip install mlx-lm, mlx_lm.server with --port 8080, let MLX auto-detect the bundled chat template (forcing a path "can corrupt responses"). GGUF Q4_K_M (~16.8 GB) available for Windows/Linux. Plugs into Hermes agent or Open Claw through any OpenAI-compatible client path.
Anything that can talk to an OpenAI compatible endpoint can basically use it.
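Concretely, any OpenAI client pointed at the local server works. A minimal sketch: the port matches the video's --port 8080, and the model ID is whatever your server reports (both are assumptions about your setup).

```python
from openai import OpenAI

# Local mlx_lm.server started with: mlx_lm.server --port 8080 (per the video)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="supergemma-4-26b",   # assumption: use whatever model ID your local server exposes
    messages=[{"role": "user", "content": "Summarize why MoE models feel fast locally."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```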
Baidu's Ernie Image tops published benchmarks over Z Image, Qwen Image, and Flux 2 Klein, and comes close to the closed-source Nano Banana 2 on head-to-head prompt tests. Wins: photorealism without the plasticky Flux-era look, complex multi-element prompts, in-image text, infographic layout, manga/comic panels. Weaknesses: human anatomy, highly abstract spatial instructions. Base vs Turbo: Turbo is 3–5x faster with minimal quality sacrifice. Both ~16 GB; full stack with Mistral 3B text encoder + Flux 2 VAE is ~20 GB. Unsloth GGUF quantizations bring VRAM to as low as ~3 GB (Q2K). ComfyUI has built-in Ernie Image workflow templates; with the ComfyUI-GGUF extension (by city96), an 8 GB GPU can run Q6_1 (6.7 GB). CFG=1, steps=8 for Turbo, under 10 seconds per generation.
Ernie looks way more realistic and natural and imperfect.
Google's Gemini app now generates personalized images from simple prompts by pulling context and reference photos directly from the user's Google Photos library — powered by the Nano Banana 2 image model and a new "Personal Intelligence" layer [25]Google — Personalized images in Gemini with Nano Banana 2. Prompts like "Design my dream house" or "Create a claymation image of me and my family enjoying our favorite activity" pull real context instead of forcing a prompt-engineering step.
Users connect their Google Photos library via the "+" icon. People and pet labeling from existing Photos organization is used to identify individuals. A Sources button shows which images guided the generated output; users can swap reference photos or request style variations (watercolors, charcoal sketches, oil paintings). Google states the Gemini app does not directly train its models on users' private Google Photos libraries.
Rolling out to US Google AI Plus, Pro, and Ultra subscribers.
Ramp built a multi-tenant AI usage ingestion pipeline (LiteLLM/OpenRouter webhooks → Kafka → ClickHouse) after a LiteLLM upgrade silently introduced phantom Gemini tokens that inflated costs until token-level auditing caught it. ClickHouse ReplacingMergeTree ordered by (business_id, source, event_id) gives storage-level dedup, costs are stored as Decimal(20,10) to prevent drift, and the system handles tens of thousands of events per minute per customer
[26]Ramp Builders — Building a Unified Pipeline for AI Token Spend.
AI gateways (LiteLLM, OpenRouter) → idempotent webhook endpoints → Apache Kafka → ClickHouse → REST analytics API → dashboards. Webhook endpoints:
POST /developer/v1/ai-usage/litellm — LiteLLM Standard Logging format
POST /developer/v1/ai-usage/openrouter — OpenTelemetry OTLP trace with GenAI semantic conventions
ClickHouse uses ReplacingMergeTree ordered by (business_id, source, event_id) with created_at as the version column — replayed events converge to exactly-once without application logic. Decimal(20,10) prevents rounding drift across millions of monthly events. Multi-tenancy: Row-Level Security scoped to business_id. Kafka absorbs traffic spikes without backpressure. Monitoring: Datadog via StatsD.
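The dedup behavior lives entirely in the table definition. Here is a rough sketch of that DDL issued via clickhouse-connect, with column names guessed from the post rather than taken from Ramp's actual schema:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# ReplacingMergeTree keyed on (business_id, source, event_id): replayed webhook events
# collapse to a single row at merge time, with created_at as the version column.
client.command("""
CREATE TABLE IF NOT EXISTS ai_usage_events
(
    business_id   UInt64,
    source        LowCardinality(String),   -- 'litellm' or 'openrouter'
    event_id      String,
    model         String,
    input_tokens  UInt64,
    output_tokens UInt64,
    cost_usd      Decimal(20, 10),          -- avoids rounding drift across millions of events
    created_at    DateTime64(3)
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (business_id, source, event_id)
""")
```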
Without Ramp's internal per-product tagging and token-level visibility this mystery would have remained unsolved, and Ramp would have had to swallow the additional costs.
Structural finding: input tokens run ~10x output tokens in production — prompt efficiency is the primary cost lever. Context: nearly 50% of Ramp customers now pay for at least one AI provider, and average monthly AI token spend across customers grew 13x.
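Quick back-of-envelope on why that ratio matters, using the Opus-style $5/$25 per-million pricing from the top of this issue purely as an illustration (Ramp's traffic spans many providers and price points):

```python
input_price_per_m, output_price_per_m = 5.00, 25.00   # illustrative Opus-style pricing
output_tokens = 1_000_000
input_tokens = 10 * output_tokens                      # ~10x input vs output, per Ramp

input_cost = input_tokens / 1e6 * input_price_per_m    # $50
output_cost = output_tokens / 1e6 * output_price_per_m # $25
print(f"input share of spend: {input_cost / (input_cost + output_cost):.0%}")  # ~67%
```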
Allbirds ($BIRD) is reinventing itself as NewBird AI, a GPU-as-a-service rental business on long-term contracts. The company sold its sneaker brand to American Exchange Group for $39M in March 2026, then announced a $50M financing deal to buy GPUs in April. The pivot triggered a 600%+ stock surge (~$3 to over $20) despite a $22M market cap — a stark contrast to its $4B IPO-peak valuation in 2021 [27]The Rundown AI — Allbirds ditches sneakers for AI compute.
The Rundown frames it as an opportunistic rebrand riding the compute shortage — comparable to the blockchain-era name changes — not a genuine business thesis.
Nate's thesis: AI models run 10–50x faster than humans, but real-world productivity gains are capped at 2–3x because every tool, API, and file system was designed for human speed, not agent speed. Making models infinitely faster still yields only ~2–3x. The bottleneck isn't inference — it's the entire human-affordance stack wrapped around the model [28]Nate B Jones — Your AI Is 50x Faster. You're Getting 2x..
At ~03:00: every timeout, rate limit, auth flow, pagination scheme, and startup sequence was calibrated to human perception and hand-speed. Jeff Dean made the same point at GTC — if an agent is 50x faster, milliseconds lost to tool startup, context switches, and paginated APIs dominate cost. NVIDIA's Bill Dally said inference is now 90% of data-center power, heading to 10–20k tokens/sec/user — consumed by agents, not humans.
We spent a trillion dollars on these agents. We want them to think collectively. We got them to do it. We made the sand to think. Now we're bottlenecking them on tool calls that were designed for humans.
At ~06:06:
MCP has blinded us to where this needs to go — you can take a human-friendly API and stick an MCP over the top and the agent will make do, but that doesn't mean you don't eat wall clock time.
At ~14:10:
I think it's a promotion to the hardest and most valuable job in computing.
DHH argues the golden era of "learned guild" programmers has already peaked: companies treat dev as a cost center, and if AI cuts dev headcount 10x they'll simply take the savings [29]The Pragmatic Engineer — DHH: "We've seen peak programmer". Nate B Jones revisits Tobi Lütke's Red Queen memo — "stagnation is slow-motion failure" — as the defining document for 2026 workforce restructuring [30]Nate B Jones — How the Red Queen memo exposed who will actually survive. Real Python counters that AI code is great for one-off tasks but the trade-offs shift the moment you have to maintain the output [31]Real Python — AI Code Is Great Until You Have to Maintain It.
DHH splits software into two camps: unlimited-scope companies (like his own) that absorb productivity gains by building more, and cost-center shops that pocket the savings. The real constraint, and the real value, shifts to product management — figuring out what to build, who to talk to, where to focus — a role he admits he historically undervalued.
The Red Queen memo (Tobi Lütke, early 2025) forecast role dissolution, junior talent deprioritization, AI fluency as baseline, and dramatic compensation polarization. Nate says all of it is playing out in 2026.
Stagnation is almost certain… if we do nothing. And stagnation is slow-motion failure.
The volume is at 11, and this is happening faster and faster and faster.
If it's a one-way thing, great, cuz then you didn't have to write all that code and it's fantastic. But, as soon as you start maintaining these things, the questions change.
The argument: success stories people share (translating Bootstrap 3 → 4) tend to be tasks nobody touches again. The moment ongoing maintenance enters, readability and the ability to reason about the code become critical — and AI output often falls short.
Four separate security stories hit on the same day: a WordPress supply-chain attack via 31 Flippa-acquired plugins [32]Fireship — WordPress supply-chain attack via Flippa, a disgruntled researcher dropping two Windows Defender zero-days after a bug-bounty dispute [33]Low Level — Windows Defender Blue Hammer / Red Sun zero-days, Google DeepMind's "unhackable" SynthID watermark reverse-engineered via a phase-shift attack [34]Better Stack — SynthID watermark cracked, and a researcher showing how malware can still steal data from Windows Recall — with Microsoft declining to call it a vulnerability [35]Tech Brew — Is it time to recall Windows 11?.
At ~01:01: instead of exploiting a vulnerability, the attacker purchased a portfolio of 31 WordPress plugins on Flippa for a mid-six-figure sum, embedded dormant backdoors ~8 months ago, then activated them. Payloads pulled remote code and modified wp-config.php (containing DB creds + security keys). The C2 domain was resolved through an Ethereum smart contract for instant swapability. 96% of WordPress vulnerabilities originate in its plugin system — PHP scripts with full server privileges and no sandboxing.
The attacker didn't exploit a vulnerability. Instead, they legitimately acquired and took control of a portfolio of plugins by simply purchasing them for money from the original developer on Flippa.
At ~03:04: Cloudflare shipped Mdash, an MIT-licensed WordPress-compatible alternative built on Astro. Each plugin runs in its own sandboxed Cloudflare Worker with capability-based permissions declared in a manifest — directly addressing the full-privilege plugin problem.
A researcher calling themselves "Nightmare Eclipse" released working PoC code for two Windows Defender zero-days — Blue Hammer and Red Sun — after claiming MSRC violated a bug bounty agreement and left them homeless. Both abuse TOCTOU race conditions. Blue Hammer blocks Defender's cloud-file VDM signature update with a fake stub, replaces the VDM file with a symlink pointing to the SAM hive, then lets Defender (running as SYSTEM) snapshot the symlink into a VSS file — the attacker extracts the SAM from the snapshot and uses pass-the-hash to reach admin. Red Sun exploits Defender's behavior of rewriting cloud-tagged malicious files before quarantining; a target swap plus content swap causes Defender to write arbitrary code into System32 and install it as a service. Rust wouldn't have fixed either — these are logic/concurrency issues, not memory safety.
I was not bluffing Microsoft and I'll do it again.
Developer Alouch Denny released "reverse synth ID." By analyzing blank Gemini outputs (Gemini White and Black), they isolated the exact Fourier-transform coordinates where SynthID's spread-spectrum signal lives — and discovered the signal is unequal across channels (green 1.0, red 0.85, blue 0.7) and the phase template is near-identical across all images. A "phase-shift attack" targets specific frequency bins and shifts the watermark's phase just enough to destroy coherence — dropping Google's detector confidence 90%+ while preserving 43 dB PSNR (image looks perfect).
The moment you can see the signal in the math, you can basically delete it.
Windows Recall, which launched publicly in April 2025 after delays, periodically screenshots user activity and makes it AI-searchable. A researcher showed malware can trigger a legitimate Windows security prompt, wait for the user to authenticate, then intercept the vault's contents as they transfer to an unprotected display process. Microsoft's stance: inter-process communication is intentional system behavior, not a vulnerability. Security experts dispute that, noting the attack vector differs from typical short-lived credential exchanges. Windows 11 has faced sustained backlash since 2021 over forced Copilot integrations, Start menu ads, and undisableable AI features — earning the "Microslop" nickname.
Lighter dev-tooling and research grab-bag: Real Python's ChromaDB vector-math primer (magnitude, dot product, cosine similarity with spaCy embeddings) [36]Real Python — Vector Databases and Embeddings With ChromaDB, Better Stack's three Docker-build speedups that took a 10-minute build to under 3 minutes [37]Better Stack — Your Docker Builds Are Slow (and it's your fault), and Data Science Weekly #647's roundup of geospatial CLIP, Sebastian Raschka on coding-agent components, and Nathan Benaich's April 2026 State of AI [38]Data Science Weekly — Issue 647.
At ~00:00: vectors as ordered numerical arrays; three operations (Euclidean norm, dot product, cosine similarity) demonstrated in pure Python then NumPy (np.linalg.norm, np.dot, @). At ~12:08: spaCy's en_core_web_lg (300K+ embeddings, 300 dims); practical comparisons reveal "cat" vs "dog" = 0.80, "tasty" vs "delicious" = 0.92, "cat" vs "spaceship" = 0.13, "delicious" vs "spaceship" = 0.04.
Cosine similarity is the normalized dot product of two vectors. It isn't influenced by their scale, only their direction.
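The three operations collapse into a few NumPy lines; this sketch mirrors the article's approach and reproduces the word comparisons, assuming en_core_web_lg is downloaded:

```python
import numpy as np
import spacy

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Normalized dot product: scale-free, direction-only comparison.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Requires: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

for w1, w2 in [("cat", "dog"), ("tasty", "delicious"), ("cat", "spaceship")]:
    v1, v2 = nlp(w1).vector, nlp(w2).vector
    print(f"{w1!r} vs {w2!r}: {cosine_similarity(v1, v2):.2f}")
```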
Three practical fixes: (1) copy package files and install deps before copying source, to preserve the dep-install layer cache; (2) add a .dockerignore (author cut build context from 500 MB to 20 MB); (3) use BuildKit cache mounts (--mount=type=cache) — author's install step dropped from 3 min to 8 seconds.
Put this all together and your builds can drop from like 10 minutes or so to under 3 minutes. Same code, no new tools, just fixing what most people overlook.