Opus 4.7 ships; Codex eats the desktop

April 16, 2026

38 sources · 20 channels · blogs, newsletters, YouTube

AI Models Developer Tools
Anthropic Developers Digest Better Stack

Claude Opus 4.7 lands: $5/$25 pricing, 3x vision, xhigh effort

Anthropic released claude-opus-4-7 at the same $5/M input, $25/M output pricing as 4.6, but with a new tokenizer that encodes the same prompt into up to 1.35x as many tokens [1]Anthropic — Introducing Claude Opus 4.7. Vision leaps: it accepts images up to 2,576px on the long edge (~3.75 megapixels, 3x prior Claude), scores 98.5% on XBOW visual acuity vs 54.5% for 4.6, and Rakuten reports it resolves 3x more production SWE tasks [1]Anthropic — Introducing Claude Opus 4.7. A new xhigh effort level sits between high and max, and Claude Code's default effort was silently upgraded to xhigh, meaning identical workflows now burn roughly as many tokens as Opus 4.6's old max [3]Better Stack — Opus 4.7 Is GREAT (except the token usage).

Read more

Benchmarks vs 4.6

What's new in Claude Code

A new /ultrareview slash command triggers dedicated review sessions — Pro and Max users each get three free ultrareviews [1]Anthropic — Introducing Claude Opus 4.7. The default effort level jumps to xhigh across all plans, and auto mode is extended to Max subscribers as a safer alternative to bypass-permissions [2]Developers Digest — Claude Opus 4.7 in 5 Minutes. On the API side, task budgets are now in public beta and file-system-based memory is improved for long-running multi-session agents [1]Anthropic — Introducing Claude Opus 4.7.

Better Stack's hands-on review

Better Stack tested 4.7 head-to-head with 4.6, GPT-5.4, and Gemini 3.1 on a personal-finance dashboard (~00:00). Opus 4.7 won with a clean dark-mode React/TypeScript/Vite app and a working in-memory Express backend — but skipped a persistent DB (Opus 4.6 had used SQLite) and oddly picked React 18 + React Router 6 despite a newer cutoff [3]Better Stack — Opus 4.7 Review.

The cost warning (~02:30): the new tokenizer can use up to 35% more tokens on the same input, and 4.7 thinks more at higher effort levels. The presenter notes 4.7's high actually outscores 4.6's max while using fewer tokens — so dropping from xhigh to high in Claude Code settings may preserve quality while cutting costs.

The same input prompt could now use up to 35% more tokens, and it also thinks more. So that's even more tokens to burn.
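The squeeze is easy to see with back-of-envelope arithmetic. Only the $5/$25 pricing and the up-to-1.35x tokenizer multiplier come from the coverage; the per-request token counts below are illustrative:

```python
# Back-of-envelope cost impact of the tokenizer change. Prices are the
# published $5/M input, $25/M output; the 100k/8k token counts are made up.

def request_cost(input_tokens, output_tokens,
                 in_price_per_m=5.00, out_price_per_m=25.00,
                 tokenizer_multiplier=1.0):
    """Dollar cost of one request; the multiplier models tokenizer inflation."""
    in_cost = input_tokens * tokenizer_multiplier / 1_000_000 * in_price_per_m
    out_cost = output_tokens / 1_000_000 * out_price_per_m
    return in_cost + out_cost

old = request_cost(100_000, 8_000)                             # 4.6 tokenizer
new = request_cost(100_000, 8_000, tokenizer_multiplier=1.35)  # 4.7 worst case
print(f"4.6: ${old:.3f}  4.7: ${new:.3f}  (+{new / old - 1:.0%})")
```

On an input-heavy request like this, a 35% input inflation alone adds ~25% to the bill — before the extra thinking tokens at xhigh.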

Caveats

Tools: Claude Code, Claude API, Claude desktop app, Opus 4.6/4.7, GPT-5.4, Gemini 3.1, React, Vite, Express, Augment Code, SWE-bench Pro, XBOW, CursorBench, GDPval-AA.
AI Tools Hot Take
Theo - t3.gg The AI Daily Brief

Claude Code desktop app: the Cursor killer (or is it?)

Anthropic bundled Claude Code into a redesigned desktop app next to Claude chat and Claude Co-work — positioned as an "agent orchestration command center" with multi-session sidebar, integrated terminal, and drag-and-drop workspaces [5]The AI Daily Brief — Vibe Coding Gets an Upgrade. Theo (t3.gg) spent an hour with it and found 40+ bugs — broken copy-paste, hotkeys targeting the wrong window, voice input bleeding into every textbox — and built his own open-source competitor T3 Code with Julius in one-fifth the time [4]Theo - t3.gg — Claude's new Cursor killer just dropped. Alongside the app, Anthropic introduced Claude Code routines: event-triggered agents (GitHub events, API calls) running on Anthropic-managed cloud infrastructure — "dynamic cron jobs" [5]The AI Daily Brief — Vibe Coding Gets an Upgrade.

Read more

What shipped (the AI Daily Brief view)

Per AI Daily Brief (~02:01), the app is explicitly built for how agentic coding feels now: many sessions in flight, humans in the orchestrator seat. Features include a sidebar of active/recent sessions filterable by status/project/environment, an integrated terminal and file editor, and drag-and-drop workspace layout. Cursor 3 shipped an identical UI earlier this month and Codex is signaling the same — the three stacks are converging.

Cursor, Codex, and Claude Code desktop app look exactly the same now.

Also introduced (~05:03): Claude Code routines. A routine packages a prompt, repos, and MCP connectors, then runs on Anthropic's cloud infrastructure — keeping the agent working when your laptop is closed. Anthropic uses them internally for docs maintenance and backlog grooming. Enterprise vibe-coding hardening is the companion thread: Lovable added native payments ("you're not vibe coding PCI level one"), Superblocks 2.0 pitches vibe coding as an "enterprise attack vector," and Microsoft is testing Open Claw inside Copilot (~07:03).

Theo's roast

Theo opens with begrudging approval — the new app uses less memory than the CLI ("a trash piece of software") and the Claude virtual machine service burns ~2.5 GB RAM but doesn't wreck the system [4]Theo - t3.gg — Claude's new Cursor killer. Then he hits an avalanche of UX failures inside minutes (~04:00):

  • No copy buttons anywhere; copied text has bad newlines/word-wrap
  • Pasted screenshots attach to the previous message ("they don't even know how to use their SDK")
  • Bypass-permissions mode doesn't persist; resizing the window breaks layout
  • Hotkeys (including command-1/2/3) always target the first window regardless of focus
  • Voice button's Stop button stops the entire thread instead of the transcription
  • Worktrees are stored inside the project folder by default, forcing .gitignore edits

AI is really good at building the happy path… but as soon as you hit an edge case… models do not find edge cases. Users find edge cases.

T3 Code and the Codex contrast

Theo and Julius shipped T3 Code, a free open-source desktop app, in roughly 1/5 the time (~06:40). Features he shows off: favicon auto-grab, multi-tab with scroll built in two prompts, diff view via diffs.com, snapshot-based before/after, properly bound hotkeys, and handling of multi-million-token threads (Julius scrolled 10 minutes to reach the top of one). The Codex CLI is Apache-2.0 and ships an app server any harness can plug into; only the Codex desktop is closed, which annoyed Theo enough to build T3 Code in the first place.

If the Codex app was open source like the CLI is, I probably wouldn't have made T3 Code.

He closes on strategic intent (~23:40): lab-built coding apps exist to lock users in and showcase models. On both fronts, Theo says Claude Code fails. He calls out Claude still demanding CLAUDE.md / .claude/ while Cursor and friends adopt the AGENTS.md / agents/ standard, and notes Anthropic's terms of service are "shitty" specifically to block switching.

The level of slop that is being shipped by Anthropic is unfathomable and y'all just grin and bear it.
Tools: Claude Code desktop app, Claude Code CLI, Claude Co-work, Claude routines, Claude skills/connectors, T3 Code, Codex CLI/app, Cursor 3, Lovable, Superblocks 2.0, Microsoft Copilot + Open Claw, AGENTS.md, Google AI Studio, Stitch, Chrome Skills.
AI Tools Developer Tools
OpenAI News OpenAI (YouTube)

OpenAI Codex eats the desktop: background computer use, 90+ plugins

OpenAI radically expanded Codex from a coding assistant into a general-purpose autonomous agent. A new "background computer use" mode lets Codex drive its own cursor across any Mac app in parallel with the user, spin up multiple simultaneous agents, and schedule future tasks via "Heartbeat Automations" [6]OpenAI — Codex for (almost) everything. Codex now supports 90+ plugins (Atlassian Rovo, CircleCI, GitLab Issues, Microsoft Suite, Neon by Databricks, Superpowers), ships image generation via gpt-image-1.5, and has cross-session memory [7]OpenAI — Codex for (almost) everything (YouTube). Usage is 6x since January, now over 2M weekly users [7]OpenAI — Codex for (almost) everything (YouTube).

Read more

What "background computer use" actually means

Codex now operates a dedicated desktop cursor that opens apps, clicks, types, and runs tasks without blocking or interfering with the user's current work [6]OpenAI — Codex for (almost) everything. An in-app browser lets users annotate webpages to give Codex context and instructions, and the browser is initially scoped to frontend/game dev tasks [7]OpenAI — Codex for (almost) everything (YouTube). Heartbeat Automations let Codex schedule future tasks and resume long-horizon work — the same "dynamic cron" pattern Anthropic just shipped with routines.
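The "dynamic cron" pattern both vendors are converging on can be sketched as a priority queue of due times where tasks re-arm themselves. Every name below is illustrative — this is not an OpenAI or Anthropic API:

```python
import heapq
import time

# Minimal "dynamic cron" sketch: unlike static cron, tasks are scheduled at
# runtime and recurring tasks re-arm themselves after each run.

class Heartbeat:
    def __init__(self):
        self._queue = []  # heap of (due_time, seq, fn, repeat_seconds)
        self._seq = 0     # tiebreaker so equal due times never compare fns

    def schedule(self, fn, delay, every=None):
        heapq.heappush(self._queue, (time.monotonic() + delay, self._seq, fn, every))
        self._seq += 1

    def tick(self):
        """Run every task whose due time has passed; re-arm recurring ones."""
        now = time.monotonic()
        while self._queue and self._queue[0][0] <= now:
            _, _, fn, every = heapq.heappop(self._queue)
            fn()
            if every is not None:
                self.schedule(fn, every, every=every)

ran = []
hb = Heartbeat()
hb.schedule(lambda: ran.append("summarize-backlog"), delay=0)
hb.tick()
print(ran)  # ['summarize-backlog']
```

The real services presumably run the loop on managed cloud infrastructure so work continues with the laptop closed; the scheduling core is the same shape.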

Model, plugins, pricing

  • GPT-5.3-Codex: 25% faster than its predecessor, with real-time mid-task steering [6]OpenAI — Codex for (almost) everything. SWE-bench Verified ~80% (vs Claude Code Opus 4.6 ~80.9% — Codex is now within striking distance).
  • 90+ plugins including Atlassian Rovo, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon (Databricks), Remotion, Render, Superpowers [7]OpenAI — Codex for (almost) everything (YouTube).
  • Image generation via gpt-image-1.5 for mockups, product concepts, in-game visuals.
  • Memory preview: recalls past sessions and user preferences.
  • Official Codex plugin that runs inside Claude Code.
  • Pricing: ChatGPT Business/Enterprise workspaces can buy Codex-only seats at token-based pay-as-you-go with no rate limits; Business annual drops to $20/seat. Open-source maintainers get 6 months ChatGPT Pro + Codex + API credits [7]OpenAI — Codex for (almost) everything (YouTube).

Where it fits

Cited enterprise pilots: Notion and Ramp for engineering automation [7]OpenAI — Codex for (almost) everything (YouTube). OpenAI's positioning is that Codex is now an agentic operating layer — not a coding chat — with the same "orchestration command center" framing as Anthropic's redesigned Claude Code.

Codex goes from a code-editor-centric tool to covering full-day agentic workflows similar in scope to Anthropic's computer-use offerings.
Tools: Codex (macOS desktop, Web, API), GPT-5.3-Codex, gpt-image-1.5, 90+ plugins (Atlassian Rovo, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon by Databricks, Remotion, Render, Superpowers), Heartbeat Automations, in-app browser, cross-session memory.
Hot Take
Nate Herk | AI Automation

Nate Herk's hot take: Anthropic secretly throttled Opus 4.6

Nate Herk argues Opus 4.7 isn't as big a leap as it looks — Anthropic silently dropped 4.6's effort default to "medium" and disabled extended thinking on February 9 without announcing it, then released 4.7 as a "fix" [8]Nate Herk — Claude Opus 4.7 Just Dropped... Or Did It Really?. He cites an AMD senior director's analysis of ~7,000 Claude Code sessions finding thinking depth collapsed 73% and the model skipped reading files before editing them 33.7% of the time, with $200/month plans burned through in hours [8]Nate Herk — Claude Opus 4.7 Just Dropped... Or Did It Really?.

Read more

The throttling claim

At ~00:00, Nate lays out the conspiracy: users reported hallucinated git commit hashes, fake package names, and premature task abandonment starting in February. The AMD engineer's 7,000-session audit is the core evidence — thinking depth down 73%, file-read-before-edit compliance down to 33.7%. Nate's framing: 4.7's gains partly restore deliberately degraded 4.6 behavior.

What's genuinely new in 4.7

At ~04:02, Nate concedes the structural wins: the new xhigh effort tier, vision gains from architecture changes, a more expensive tokenizer (1–1.3x), SWE-bench Pro jumps, 2x+ biomolecular reasoning, and the new /ultrareview slash command. Anthropic also published a 232-page system card for 4.7. In hands-on tests Nate found 4.7 better at financial chart analysis, and it caught its own math errors in a SaaS model — but he rated the 4.6 interactive deliverable more polished.

Day-one app bugs

At ~10:01, Nate piles on the desktop app critique: Theo found 40+ bugs within one hour — broken buttons, voice input bleeding into every textbox, layout issues. He asks how "one of the world's largest AI companies" shipped day-one bugs whose fixes amount to trivial one-prompt changes.

Tools: Claude Code, Claude Code Desktop App, Claude (web), VS Code.
Hot Take AI Models
Simon Willison's Weblog

Simon Willison: local Qwen3.6 beat Opus 4.7 on the pelican benchmark

Simon Willison ran Qwen3.6-35B-A3B locally on a MacBook Pro M5 using a 20.9 GB quantized GGUF (Unsloth's Q4_K_S) through LM Studio — and found it outperformed Claude Opus 4.7 on the SVG pelican benchmark [9]Simon Willison — Qwen3.6 on my laptop drew a better pelican than Opus 4.7. On the pelican-on-bicycle test, Qwen produced correct bicycle geometry, clouds, and a detailed pouch; Opus 4.7 — even with thinking_level: max — generated a frame with entirely the wrong shape. Simon notes he doubts the quantized local model beats Opus on general capability — but for SVG illustration tasks specifically, the local model currently wins.

Read more

Setup

  • Model: Qwen3.6-35B-A3B-UD-Q4_K_S via Unsloth (20.9 GB GGUF)
  • Host: MacBook Pro M5 via LM Studio with the llm-lmstudio plugin
  • Comparison: Claude Opus 4.7 with thinking_level: max

Results

On "flamingo on unicycle," Qwen added creative SVG comments like <!-- Sunglasses on flamingo! --> and richer character details, while Opus produced a "competent if slightly dull vector illustration" lacking visual personality. Simon is careful to bound the claim: local Qwen is not beating Opus at general tasks — just at this specific aesthetic SVG task.

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7.
Tools: Qwen3.6-35B-A3B, Unsloth GGUF, LM Studio, llm-lmstudio plugin, Claude Opus 4.7.
AI Tools Developer Tools
Better Stack

Better Stack: Ultraplan vs Superpowers planning shootout

Better Stack pitted Anthropic's cloud-based /ultraplan against the local Superpowers skill on a real release-pipeline feature. Superpowers asked double the clarifying questions (6 vs 3) and produced a plan 4x longer (833 vs 195 lines) with test-driven task breakdowns — but Ultraplan wins for async workflows: kick off on a laptop, continue on a phone [10]Better Stack — Claude Ultraplan vs Superpowers.

Read more

Ultraplan (cloud)

The /ultraplan slash command spins up a cloud container, clones the GitHub repo, and produces a plan in ~2-3 minutes — suggesting sub-agent parallelism [10]Better Stack — Ultraplan vs Superpowers. It consumed 33% of the creator's monthly Pro token quota for one plan and initially cloned the wrong repo (context issue; corrected after a comment).

Superpowers (local)

Superpowers runs entirely locally, asks six clarifying questions (vs Ultraplan's three), and works in two phases: a design plan and an implementation plan. Output was 833 lines with test cases and test-first task execution, using ~75k tokens with prompt caching. The creator prefers it for the ~90% of his work that happens locally, because his skills, MCP tools, and the whole dev loop live there.

When Ultraplan wins

At ~06:09: device-agnostic async work — start a feature on a laptop, close it, continue on a phone or tablet. This requires the repo to be on GitHub and a Pro or Max subscription. The creator forgot to install the Claude GitHub app on the demo repo, which would have auto-created PRs.

If I do choose to work locally, which I'm doing 90% of the time, then I'll probably use Superpowers because all my code is there locally, my skills, my MCP tools, everything.
Tools: Claude Code, Ultraplan, Superpowers, Claude Code for Mac, GitHub Actions.
Developer Tools
Simon Willison's Weblog

Simon Willison's dev tools: llm-anthropic 0.25 + Datasette artifact

Simon shipped llm-anthropic 0.25, adding support for claude-opus-4.7 (including thinking_effort: xhigh) and two new options: thinking_display (shows extended reasoning output as JSON) and thinking_adaptive. Default max_tokens raised to the per-model maximum; an obsolete beta header removed [11]Simon Willison — llm-anthropic 0.25. Separately, he built a Claude Artifact to edit the datasette.io news YAML with live preview and validation, flagging both markdown syntax and YAML formatting errors before they reach the live site [12]Simon Willison — datasette.io news preview.

Read more

llm-anthropic 0.25

  • New model: claude-opus-4.7, supports thinking_effort: xhigh
  • thinking_display: boolean for displaying extended reasoning (JSON output only)
  • thinking_adaptive: boolean for adaptive thinking behavior
  • Default max_tokens raised to per-model maximum
  • Obsolete beta header removed for older models

Datasette news preview artifact

The datasette.io site maintains its news section as a YAML file in a GitHub repo, which Simon found error-prone to edit directly. He used Claude Artifacts' ability to analyze GitHub repos in conversation to build a custom preview UI that accepts YAML input and renders how entries will appear on the homepage, plus a validation badge surfacing markdown/YAML errors before commit.

Tools: LLM CLI, llm-anthropic plugin, Claude Artifacts, claude.ai, GitHub, Datasette.
AI Models Industry
OpenAI News

OpenAI launches GPT-Rosalind for life sciences research

OpenAI released GPT-Rosalind, its first domain-specialized life sciences model — targeting biochemistry, genomics, and drug discovery workflows. Named after Rosalind Franklin, it ranked above the 95th percentile of human experts on prediction tasks, hit the 84th percentile for sequence generation, and posted the highest published score on the BixBench bioinformatics benchmark [13]OpenAI — Introducing GPT-Rosalind for life sciences research. Launch partners include Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific, via a limited "trusted access program."

Read more

Capabilities

GPT-Rosalind supports evidence synthesis, hypothesis generation, experimental planning, and multi-step research. It can query specialized databases, parse literature, call computational tools, and propose experimental pathways. A companion Life Sciences Codex plugin connects the model to 50+ scientific tools and data sources.

Rollout

Initially a research preview for qualified enterprise customers — not general availability. The launch positions OpenAI directly against Google's life-sciences push; no public pricing was announced.

OpenAI framed the model as a tool to help scientists move faster through analytically demanding work, not as a replacement for researchers.
Tools: GPT-Rosalind, Codex, BixBench, Life Sciences Research Plugin.
Podcast AI Models
OpenAI

OpenAI Podcast Episode 16: Building AI for Life Sciences

OpenAI's Joy Jiao (research lead) and Yun Wang (product lead) walk through the new biochemistry-focused model series that anchors GPT-Rosalind, a research plugin with 50+ templatized skills, the Ginkgo Bioworks collaboration where GPT-5 designed a biology experiment that produced a non-zero amount of protein on its first try, biosecurity safeguards, and a 10-year vision of autonomous research institutes [14]OpenAI Podcast — Ep 16: Building AI for Life Sciences.

Read more

Life sciences model series & research plugin

At ~01:00, Joy and Yun describe a biochemistry-focused model series anchored on complex research workflows. Starts with genomics and protein understanding, focused on early discovery where more thinking time can break research bottlenecks. Ships across ChatGPT (literature synthesis) and Codex (long-trajectory agentic work). The life sciences research plugin exposes 50+ "templatized repeatable workflows" — cross-evidence match across papers, pathway analysis — as one-click deploys on top of the base models.

One of the taglines was to scale test-time compute to cure all disease. So that is like our team tagline.

Ginkgo Bioworks: GPT-5 designs working biology

At ~05:06: when GPT-5 finished training, the team wasn't sure it could do any biology (training data was mostly math and CS). The Ginkgo collaboration (July 2025) tested whether models could design experiments that actually produce the desired product. First set of designs came back with non-zero protein production — surprising even the team.

The future that me and Joy see is that it's no longer human bottlenecks but rather maybe compute bottlenecks.

Biosecurity and differentiated access

At ~09:10: bio is one of the most severe rising-capability risks. Precursor steps to dangerous pathogens look benign and similar to legitimate research ("help me clone a gene" could be GFP or a toxin). Current stance is risk-averse self-refusal, frustrating professional scientists. Roadmap: differentiated access — verify users at legitimate research institutions or pharmas where reagents are tracked, then unlock fuller capability.

The safest model here would be a model that had no capability — it's very safe, but it's not very good.

Scaling reasoning and the 10-year vision

At ~20:14, Joy frames two scaling axes: parameter growth (GPT-2 → GPT-3 produced emergent properties) and test-time compute on reasoning models. Models can now "think for days or effectively forever" about a problem — reframing data centers as complex reasoning infrastructure (hence Stargate). At ~24:16: she wants to see AI design a new drug or cure a disease within a few years. Nearer-term wins: drug repurposing via mechanistic understanding and personalized medicine via ASO/RNA design.

You have these autonomous labs — mostly robots, all hooked up to AI — autonomous research institutes that are constantly running and curing human disease.
Tools: GPT-Rosalind, GPT-5, Codex, Ginkgo Bioworks robotic lab, Life Sciences Research Plugin, single-cell RNA-seq (virtual cell), antibody binding evals, Stargate.
Industry
OpenAI News

OpenAI ships GPT-5.4-Cyber and scales Trusted Access for Cyber

OpenAI is scaling Trusted Access for Cyber (TAC) — originally launched February 2026 — to thousands of verified individual defenders and hundreds of teams. The centerpiece is GPT-5.4-Cyber, a fine-tuned variant of GPT-5.4 with lower refusal boundaries for legitimate defensive work, supporting binary reverse engineering for malware analysis and vulnerability detection [15]OpenAI — Accelerating the cyber defense ecosystem. OpenAI's Codex Security agent has contributed to 3,000+ critical and high-severity vulnerability fixes, and Codex for Open Source has scanned 1,000+ projects.

Read more

GPT-5.4-Cyber and TAC

  • Fine-tuned for defensive security with relaxed refusals for legitimate use cases
  • Binary reverse engineering for malware analysis and vulnerability detection
  • Safeguards against jailbreaks and adversarial prompt injections
  • Rolled out iteratively to vetted security vendors, orgs, researchers
  • Individual access at chatgpt.com/cyber; enterprise direct with OpenAI

Codex Security at ecosystem scale

  • 3,000+ critical and high-severity vulnerabilities fixed to date
  • Codex for Open Source: free scans for 1,000+ open-source projects
  • $10M in API credits committed to TAC participants
  • $1M Cybersecurity Grant Program for AI-defense research
  • Three-pillar framework: democratized access, iterative deployment, ecosystem resilience
Tools: GPT-5.4-Cyber, Codex Security, Codex for Open Source, ChatGPT (chatgpt.com/cyber), Cybersecurity Grant Program.
Podcast AI Future
Y Combinator

YC Interviews Quan Vuong: The GPT Moment for Robotics Is Here

Physical Intelligence co-founder Quan Vuong lays out PI's thesis: one cross-embodiment foundation model trained on ~10 robot platforms beats specialists by 50% [16]YC — The GPT Moment for Robotics Is Here. Nearly all of PI's complex demos — coffee-making, laundry folding, mobile nav — run inference in a real remote data center via "real-time chunking." He also gives a full playbook for the "Cambrian explosion" of vertical robotics startups.

Read more

Cross-embodiment as the foundation

At ~05:04, Vuong walks through the Open Cross-Embodiment (RT-X) result: absorb data from ~10 heterogeneous platforms into a single high-capacity model, and the generalist beats every specialist by 50%. The model learns "how to control any robot," not one specific platform — which sidesteps the drift problem where single-platform data goes stale every 3 months.

There is this joke in robotic grad school that if you want to add two years to your PhD, just work on a new robot platform.

Cloud inference for real-time robot control

At ~23:13, the hot take: almost all of PI's robot evaluations — including the really complex demos — run inference on a model hosted in a real remote data center, not on-device. "Real-time chunking" queries the next chunk 50ms before the current ~100ms queue ends so chunks stitch together smoothly. Vuong says he has never physically seen the Weave or Ultra robots and doesn't know how their data is collected — an intentional decoupling.
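Given the timing figures from the talk (a ~100ms action queue, prefetch 50ms early), the stitching condition reduces to "inference round-trip must fit inside the prefetch window." A toy simulation, with the round-trip times assumed for illustration:

```python
# Simulated "real-time chunking" timeline: request the next action chunk
# 50 ms before the current ~100 ms chunk drains. Chunk/prefetch sizes are
# from the talk; the inference round-trip values are assumptions.

CHUNK_MS = 100    # duration of actions buffered per chunk
PREFETCH_MS = 50  # how early the next chunk is requested

def stalls(n_chunks, infer_ms):
    """True if the robot ever runs out of actions before the next chunk lands."""
    chunk_end = CHUNK_MS
    for _ in range(n_chunks - 1):
        arrives = (chunk_end - PREFETCH_MS) + infer_ms  # request early, wait
        if arrives > chunk_end:   # gap between chunks: the robot would freeze
            return True
        chunk_end += CHUNK_MS     # next chunk stitches on seamlessly
    return False

print(stalls(50, infer_ms=40))  # False: 40 ms round-trip hides inside the window
print(stalls(50, infer_ms=60))  # True: 60 ms round-trip starves the queue
```

This is why a real remote data center works for control: the cloud round-trip only has to beat the 50ms prefetch budget, not the per-action control rate.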

Almost all of the robot evaluation that we run at PI today, including the really complicated demo… the model is actually hosted in the cloud. This is not a server in the office. It's a real cloud.

Emergent zero-shot tasks and real deployments

At ~13:08: tasks that last year needed hundreds of hours of data collection are now doable zero-shot. Weave (YC, ex-Apple founders) folds diverse laundry in a real laundromat. Ultra (YC) runs in a real e-commerce warehouse packing real Amazon orders into narrow soft pouches — 100 minutes at 4x speed, minimal intervention, with lighting changes historically brutal for robotics vision.

It still blows my mind to see a robot actually folding laundry because until ChatGPT I didn't know if this would exist even in my entire lifetime.

The Cambrian-explosion playbook

At ~29:18, Vuong names the recipe for vertical robotics startups: (1) understand the existing workflow; (2) use cheap hardware — reactive models compensate for imprecise motion; (3) build data-collection + eval infra; (4) deploy mixed-autonomy with humans correcting failures; (5) reach break-even per robot before scaling.

The equation for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore.

Automated research scientist + ML ops

At ~43:37: Vuong's dream side project is an automated robotics research scientist. PI already runs a Claude-based "pre-training on-call" agent babysitting training runs — delivering ~50% improvement in compute utilization. "An embarrassingly large amount of money on API queries."

Tools: Pi-0 / Pi-0.5 (open-sourced), Weave, Ultra, RT-X / Open Cross-Embodiment, PaLM-E, SayCan, real-time chunking, cloud inference API, Claude (pre-training on-call), Claude skills, Open Claw, Obsidian + markdown for agent orchestration, MCP.
Podcast Hot Take
Lenny's Podcast

Lenny's Podcast: Hire barrels, not ammunition

The episode's framing: most post-fundraise hiring surges fail because companies add ammunition (people who need direction) without adding barrels (people who can independently drive an initiative from idea to done). Hiring more without expanding barrels just stacks people behind the same bottlenecks [17]Lenny's Podcast — Hire barrels, not ammunition.

Read more

The definition

Can they take an idea and make it happen? One way or the other, they're going to get your company across that hill. That's a barrel.

The failure mode: a CEO hires aggressively, burn rate spikes, output per unit of time barely moves. The real bottleneck is the small count of people who can take an initiative from inception to success. More bodies = more collaboration tax, same barrels.

The number of people that can independently drive an initiative from inception to success is very limited within any company.
If you hire more people without expanding the number of what I call barrels… all you're doing is stacking people behind the same initiatives.
Developer Tools Hot Take
AI Engineer

Mario Zechner at AI Engineer: Building Pi in a World of Slop

Mario Zechner's AIE Miami talk is a three-act tragedy: why he left Claude Code, why he built Pi (a minimal self-modifying coding agent), and why the wider OSS world is drowning in agent-generated slop [18]Mario Zechner at AI Engineer — Building Pi in a World of Slop. Punchline: "Slow the fuck down. Learn to say no. Fewer features, but the ones that matter."

Read more

Act 1 — why he left Claude Code

The harness kept mutating his context behind his back (changing system prompts, injecting "may or may not be relevant" system reminders), offered zero observability and zero model choice, and exposed only shallow hooks that spawn a new process per invocation. Alternatives had their own sins — OpenCode prunes tool output past a token threshold ("lobotomizes the model"), injects LSP errors into edit tool results, stores every message as an individual JSON file, and defaults CORS wide open so any website can hit the local server.

My context wasn't my context. Claude Code is the thing that controls my context.

Terminal Bench's December 2025 leaderboard shows the minimalist Terminal harness — which only sends keystrokes to a tmux session — beats native harnesses across model families.

We are in the fuck around and find out phase of coding agents, and their current form is not their final form.

Act 2 — Pi

Pi is four packages (an AI abstraction, an agent core that is just a while-loop plus tool calling, a bespoke TUI framework, and the coding agent itself) and four tools (read, write, edit, bash), driven by a tiny system prompt that hands the agent Pi's own handcrafted docs plus example extensions. Extensions are hot-reloaded TypeScript modules shipped via npm, and security is YOLO-by-default because Zechner thinks per-call approval dialogs aren't real security. User-built extensions include a 5-minute clone of Anthropic's /bashing, Nico's chatroom of agents talking to each other, and Pi playing NES and Doom. Pi placed sixth on Terminal Bench in October and became the agentic core inside OpenCode; when his issue tracker filled with LLM-generated garbage, he built an auto-close PR filter asking humans to write issues "in your human voice, no longer than a screen of text" — clankers don't read comments. Mitchell turned it into vouch.
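The "agent core = while-loop + tool calling" shape fits in a screenful. The model below is a scripted stub, and everything here is an illustrative sketch — not Pi's actual (TypeScript) code:

```python
import os
import subprocess
import tempfile

# Minimal while-loop-plus-tool-calling agent core with the four tools the
# talk names (read, write, edit, bash). The "model" is a scripted stub.

def edit(path, old, new):
    text = open(path).read()          # read first, then rewrite in place
    with open(path, "w") as f:
        f.write(text.replace(old, new))

TOOLS = {
    "read":  lambda path: open(path).read(),
    "write": lambda path, text: open(path, "w").write(text),
    "edit":  edit,
    "bash":  lambda cmd: subprocess.run(cmd, shell=True,
                                        capture_output=True, text=True).stdout,
}

def run_agent(model):
    """Loop: ask the model for a step, execute the tool, feed the result back."""
    transcript = []
    while True:
        step = model(transcript)      # {"tool": ..., "args": ...} or {"done": ...}
        if "done" in step:
            return step["done"]
        result = TOOLS[step["tool"]](*step["args"])
        transcript.append((step["tool"], result))

path = os.path.join(tempfile.gettempdir(), "pi_demo.txt")
script = iter([                       # scripted stand-in for the LLM
    {"tool": "write", "args": (path, "hello world")},
    {"tool": "edit",  "args": (path, "world", "Pi")},
    {"tool": "read",  "args": (path,)},
    {"done": "edited and verified the file"},
])
summary = run_agent(lambda transcript: next(script))
print(summary, open(path).read())
```

Everything a real harness adds — context management, hooks, approval dialogs — layers on top of this loop, which is Zechner's point about keeping the core small.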

You don't need 10,000 tokens to tell them they're a coding agent. They know — because they are coding agents.

Act 3 — the polemic

Agents are "compounding boo-boos with serial learning, no bottlenecks, and delayed pain." 90% of training data is "our old garbage," so every local agent decision pulls in over-abstraction, duplication, backward compatibility, and defense-in-depth — "enterprise-grade complexity within two weeks with just two humans and 10 agents." Review agents ("the ouroboros") don't catch it. 1M-token context windows are "a heck," agentic search also fails, so agents patch locally and break globally — and you can't trust your tests because the agent wrote them.

You know what we call a sufficiently detailed spec? It's a program.
The agent patches locally and fucks up globally. If you see this in your codebase, you're fucked.

Prescription: scope tasks tightly so the agent can find everything it needs, modularize, give it an eval function when possible, let it rip on non-critical and boring work — but for anything that matters, hand-write it.

If you do anything important, write it by hand. Friction is the thing that builds the understanding of the system in your head.
Slow the fuck down. Think about what you're building and why. Don't just build because your agent can do it.
Tools: Pi (coding agent harness), Claude Code, OpenCode, AMP, Factory Droid, Terminal Bench, tmux, npm, TypeScript, LSP, vouch.
AI Tools Industry
AI Engineer

Diego Carpentero at AI Engineer: $1 AI Guardrails with ModernBERT

Diego Carpentero argues a fine-tuned ModernBERT classifier is a cheap, self-hostable defensive layer against six LLM attack vectors: direct prompt injection, indirect/context injection, gibberish-suffix alignment breaks, RAG poisoning, MCP tool-description exploits, and agentic attacks. He hits ~85% accuracy at 35–40ms per classification on the InjecGuard dataset, running on commodity hardware for under $1 [19]Diego Carpentero at AI Engineer — $1 AI Guardrails with ModernBERT.

Read more

Six attack surfaces he surveys

  1. Direct prompt injection — Sydney/Bing case; a Stanford student exfiltrating 40+ confidential rules via "ignore previous instructions"
  2. Indirect/context injection — an Einstein Wikipedia edit redirecting LLMs to malware; March 2026 reports of sites embedding hidden prompts to overrule AI ad review
  3. LLM internals attacks — gibberish suffix tokens from Greedy Coordinate Gradient search on open weights, transferring to black-box models
  4. RAG poisoning — PoisonedRAG paper: 5 malicious chunks in an 8M-document DB are enough
  5. MCP tool-description asymmetry — users see a one-liner, the LLM sees full hidden instructions exfiltrating keys/creds
  6. Agentic attacks — Subby AI clicking "support" links; February 2026 NPM supply-chain attack via GitHub issue titles affecting ~4-5k developers
These attacks, they are no longer the exception, they are now the baseline.

Why ModernBERT

  • Alternating attention — two local 128-token sliding-window layers (64/side) then one global 8192-token layer; ~70% memory reduction with Flash Attention
  • Unpadding + sequence packing — original BERT's Wikipedia training wasted up to 50% of compute on padding
  • Deep-and-narrow — base (~150M params, 22 layers / 768 hidden) or large (28 layers / 1024 hidden)
  • Rotary Positional Encoding (RoPE) with different rotation scales for local vs global attention
  • Flash Attention exploiting GPU on-chip (30+ TB/s) vs off-chip memory hierarchy

Training stack and results

Dataset: InjecGuard — 75k labeled examples from 20 open sources. Hugging Face Datasets with batched map tokenization, dynamic padding, a classification head on the CLS token (optionally mean pooling for long sequences), BF16 (cut training memory ~40%, enabling batch size 64), and StableAdamW. Switching base → large added nearly 6 accuracy points. Final: ~85% accuracy, 35–40ms latency, self-hosted for under $1. Demo on HuggingFace Space correctly classifies the Sydney prompt, Wikipedia redirect, ad-review override, GCG gibberish suffix, and MCP key-exfiltration as unsafe.
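The head design described above is simple enough to sketch: the encoder output is pooled (the CLS token, or a mean over tokens for long sequences) and fed to a linear classifier. A toy NumPy illustration with made-up shapes and random weights — not the actual trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(128, 768))   # one sequence: 128 tokens x 768 dims

def pool(hidden_states, mode="cls"):
    if mode == "cls":
        return hidden_states[0]        # the CLS token's embedding
    return hidden_states.mean(axis=0)  # mean pooling for long sequences

# Linear classification head: benign vs injection. Weights are random here;
# in the talk they are trained with StableAdamW on InjecGuard.
W = rng.normal(size=(768, 2)) * 0.02
b = np.zeros(2)

def classify(hidden_states, mode="cls"):
    logits = pool(hidden_states, mode) @ W + b
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

probs = classify(hidden)               # probability over the two classes
```

The CLS-vs-mean-pooling choice is the only moving part: CLS is cheap, mean pooling is more robust when the signal is buried deep in a long input.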

In a knowledge database comprising 8 million documents, poisoning only five chunks was enough to be successful in this attack.
We have to build safety mechanisms that protect machines, humans and society.
Tools: ModernBERT (base / large), InjecGuard dataset, Hugging Face Datasets, Flash Attention, RoPE, BF16, StableAdamW, Hugging Face Spaces, PoisonedRAG (paper), GCG attack.
Industry
AI Engineer

AI Engineer Miami: Livestreams from Day 2 & Keynote

Two multi-hour livestream sessions from AI Engineer Miami on Apr 16. Day 2 features talks from Cerebras, OpenCode, Cursor, Arize AI, and more [20]AI Engineer Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!. The keynote track headlines OpenCode, Google DeepMind, and OpenAI [21]AIE Miami Keynote & Talks ft. OpenCode, Google DeepMind, OpenAI, and more!.

Read more

Day 2

Multi-speaker livestream featuring Cerebras (AI inference hardware), OpenCode (open-source coding tools), Cursor (AI code editor), and Arize AI (observability/eval). Transcript was not available on fetch, so specific speaker content and timestamps can't be confirmed beyond the title.

Keynote track

Main keynote-and-talks stream featuring OpenCode, Google DeepMind, and OpenAI. Likely covered high-level announcements, research highlights, and product direction from the largest AI labs present. Transcript also unavailable.

Standalone speaker talks (Mario Zechner, Diego Carpentero) are covered in their own topics above.

Tools: OpenCode, Google DeepMind, OpenAI, Cerebras, Cursor, Arize AI.
AI Models AI Future
Two Minute Papers AICodeKing AI Search

Open-weight surge: Gemma 4, SuperGemma-4, and Ernie Image dethrones Z Image

Google DeepMind's Gemma 4 (2B–31B, Apache 2.0) hit 10M downloads in its first week; the 31B dense model beats some models 10–20x larger on benchmarks, and the 2B runs on a first-gen Nintendo Switch [22]Two Minute Papers — Why DeepMind's New AI Broke The Internet. A community uncensored fine-tune (SuperGemma-4 26B, Jun Song on Hugging Face) runs at 46.2 tok/s on Apple Silicon via MLX and plugs straight into Hermes agent and Open Claw via an OpenAI-compatible endpoint [23]AICodeKing — SuperGemma-4 (26B) UNCENSORED + Hermes, Open Claw, OpenCode. On the image side, Baidu's Ernie Image (and Turbo variant) dethroned Z Image on benchmarks — Q2K GGUFs run in ~3 GB VRAM [24]AI Search — New BEST local AI image generator is here!.

Read more

Gemma 4 (Two Minute Papers)

At ~01:01: the 2B runs on a first-gen Nintendo Switch and on phones without internet; the 31B is a dense (not MoE) model ranking as the third-best open model, beating some 10x-larger models and staying competitive with 20x-larger ones.

Four architectural wins:

  1. Highly curated training data with strict quality filters
  2. Hybrid attention: local sliding window + global attention
  3. Native-resolution image understanding (Gemma 3 squished images into a square; Gemma 4 processes as-is)
  4. Shared KV cache — layers reuse memory from earlier layers instead of recomputing

Context doubled to 256K. License upgraded from restrictive Gemma License to Apache 2.0. At ~07:08, it's pitched as a drop-in for cloud LLMs inside agentic harnesses like Open Claw — "a frontier model just got locked down for a few select clients… that's all right, just plug in Gemma 4 and you're good to go for free."
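The hybrid-attention idea (win 2 above) is easy to picture: most layers mask attention to a local sliding window, while periodic layers attend globally. A toy NumPy sketch with illustrative sizes, not Gemma 4's real configuration:

```python
import numpy as np

def attention_mask(seq_len, window, is_global):
    if is_global:
        # global layer: every token attends to every token
        return np.ones((seq_len, seq_len), dtype=bool)
    # local layer: token i attends only within `window` positions of i
    # (a causal variant would additionally mask the upper triangle)
    i = np.arange(seq_len)
    return np.abs(i[:, None] - i[None, :]) <= window

local = attention_mask(8, window=2, is_global=False)
glob = attention_mask(8, window=2, is_global=True)
print(int(local.sum()), int(glob.sum()))  # 34 vs 64 attended pairs
```

The saving compounds with sequence length: local layers cost O(n·window) instead of O(n²), which is why only a few global layers are needed to keep long-range information flowing.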

SuperGemma-4 26B (AICodeKing)

A community uncensored fine-tune of Gemma 4 26B A4B (~25B total, ~3.8B active MoE). Native system prompt + function calling, 256K context. MLX 4-bit V2 release on Hugging Face; creator claims Quick Bench overall 95.8 vs 91.4 and 46.2 tok/s vs 42.5 baseline. Setup: pip install mlx-lm, mlx_lm.server with --port 8080, let MLX auto-detect the bundled chat template (forcing a path "can corrupt responses"). GGUF Q4_K_M (~16.8 GB) available for Windows/Linux. Plugs into Hermes agent or Open Claw through any OpenAI-compatible client path.

Anything that can talk to an OpenAI compatible endpoint can basically use it.
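The serving path described above, as setup commands (the exact Hugging Face repo name is not given in the source, so a placeholder is used; flags are as shown in the video):

```shell
pip install mlx-lm

# Serve the MLX 4-bit build over an OpenAI-compatible HTTP API.
# Let MLX auto-detect the bundled chat template — forcing a template
# path "can corrupt responses," per the video.
mlx_lm.server --model <supergemma-4-26b-mlx-4bit-repo> --port 8080

# Any OpenAI-compatible client can then point at it:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```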

Ernie Image (AI Search)

Baidu's Ernie Image tops published benchmarks over Z Image, Qwen Image, and Flux 2 Klein, and comes close to the closed-source Nano Banana 2 on head-to-head prompt tests. Wins: photorealism without the plasticky Flux-era look, complex multi-element prompts, in-image text, infographic layout, manga/comic panels. Weaknesses: human anatomy, highly abstract spatial instructions. Base vs Turbo: Turbo is 3–5x faster with minimal quality sacrifice. Both ~16 GB; the full stack with the Mistral 3B text encoder + Flux 2 VAE is ~20 GB. Unsloth GGUF quantizations bring VRAM to as low as ~3 GB (Q2K). ComfyUI has built-in Ernie Image workflow templates; with the ComfyUI-GGUF extension (by city96), an 8 GB GPU can run Q6_1 (6.7 GB). CFG=1, steps=8 for Turbo, under 10 seconds per generation.

Ernie looks way more realistic and natural and imperfect.
Tools: Gemma 4 (2B–31B, Apache 2.0), SuperGemma-4 26B Uncensored MLX 4-bit V2 (Jun Song), Hermes agent, Open Claw, MLX-LM server, llama.cpp, LM Studio, Jan, Open Web UI, Ernie Image / Turbo, Unsloth GGUF, ComfyUI, ComfyUI-GGUF (city96), Mistral 3B text encoder, Flux 2 VAE, Nvidia Nemotron 3 Super.
AI Tools
Google — Gemini App blog

Gemini app gets "Personal Intelligence" and Nano Banana 2

Google's Gemini app now generates personalized images from simple prompts by pulling context and reference photos directly from the user's Google Photos library — powered by the Nano Banana 2 image model and a new "Personal Intelligence" layer [25]Google — Personalized images in Gemini with Nano Banana 2. Prompts like "Design my dream house" or "Create a claymation image of me and my family enjoying our favorite activity" pull real context instead of forcing a prompt-engineering step.

Read more

How it works

Users connect their Google Photos library via the "+" icon. People and pet labeling from existing Photos organization is used to identify individuals. A Sources button shows which images guided the generated output; users can swap reference photos or request style variations (watercolors, charcoal sketches, oil paintings). Google states the Gemini app does not directly train its models on users' private Google Photos libraries.

Rollout

Rolling out to US Google AI Plus, Pro, and Ultra subscribers.

Tools: Gemini app, Nano Banana 2, Personal Intelligence, Google Photos.
Developer Tools Industry
Ramp Builders

Ramp's unified pipeline for AI token spend: Kafka → ClickHouse

Ramp built a multi-tenant AI usage ingestion pipeline (LiteLLM/OpenRouter webhooks → Kafka → ClickHouse) after a LiteLLM upgrade silently introduced phantom Gemini tokens that inflated costs until token-level auditing caught it. ClickHouse ReplacingMergeTree ordered by (business_id, source, event_id) gives storage-level dedup, costs are stored as Decimal(20,10) to prevent drift, and the system handles tens of thousands of events per minute per customer [26]Ramp Builders — Building a Unified Pipeline for AI Token Spend.

Read more

Architecture

AI gateways (LiteLLM, OpenRouter) → idempotent webhook endpoints → Apache Kafka → ClickHouse → REST analytics API → dashboards. Webhook endpoints:

  • POST /developer/v1/ai-usage/litellm — LiteLLM Standard Logging format
  • POST /developer/v1/ai-usage/openrouter — OpenTelemetry OTLP trace with GenAI semantic conventions

ClickHouse uses ReplacingMergeTree ordered by (business_id, source, event_id) with created_at as the version column — replayed events converge to exactly-once without application logic. Decimal(20,10) prevents rounding drift across millions of monthly events. Multi-tenancy: Row-Level Security scoped to business_id. Kafka absorbs traffic spikes without backpressure. Monitoring: Datadog via StatsD.

Without Ramp's internal per-product tagging and token-level visibility this mystery would have remained unsolved, and Ramp would have had to swallow the additional costs.
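The storage-level dedup design can be sketched as ClickHouse DDL (column names and types are illustrative, not Ramp's actual schema):

```sql
-- Hedged sketch of the dedup design; columns are illustrative.
CREATE TABLE ai_usage_events
(
    business_id   UInt64,
    source        LowCardinality(String),  -- 'litellm' or 'openrouter'
    event_id      String,
    model         String,
    input_tokens  UInt64,
    output_tokens UInt64,
    cost          Decimal(20, 10),         -- avoids float rounding drift
    created_at    DateTime64(3)            -- version column
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (business_id, source, event_id);

-- Replayed webhook events share the ORDER BY key, so background merges
-- keep only the newest version; reads can force dedup with FINAL:
-- SELECT sum(cost) FROM ai_usage_events FINAL WHERE business_id = 42;
```

The trade-off is that ReplacingMergeTree dedups lazily at merge time, so queries that need exact counts before a merge must pay the cost of `FINAL`.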

Common AI cost leaks they found

  • Phantom reasoning tokens from package upgrades
  • Oversized models doing simple classification
  • Missing prompt caching on repeated inputs
  • Geo pricing premiums (Anthropic's 1.1x US-region multiplier)
  • Runaway autonomous agent loops
  • Abandoned dev API keys from completed projects
  • System prompt bloat accumulating to 8K tokens per request

Structural finding: input tokens run ~10x output tokens in production — prompt efficiency is the primary cost lever. Context: nearly 50% of Ramp customers now pay for at least one AI provider, and average monthly AI token spend across customers grew 13x.

Tools: Apache Kafka, ClickHouse (ReplacingMergeTree), LiteLLM, OpenRouter, OpenTelemetry OTLP, Datadog, StatsD, OAuth2, Row-Level Security, LiteLLM virtual keys, OpenRouter Broadcast.
Industry
The Rundown AI

Allbirds becomes "NewBird AI": from sneakers to GPU-as-a-service

Allbirds ($BIRD) is reinventing itself as NewBird AI, a GPU-as-a-service rental business on long-term contracts. The company sold its sneaker brand to American Exchange Group for $39M in March 2026, then announced a $50M financing deal to buy GPUs in April. The pivot triggered a 600%+ stock surge (from ~$3 to over $20) for a company whose market cap had closed at just $22M — a stark contrast to its $4B valuation at the 2021 IPO peak [27]The Rundown AI — Allbirds ditches sneakers for AI compute.

Read more

The deal

  • Brand asset sale: $39M to American Exchange Group (March 2026)
  • GPU financing: $50M deal announced April 2026
  • Stock surge: 600%+, from ~$3 to over $20
  • Market cap at close: $22M (vs $4B at 2021 IPO peak)
  • Shareholder vote next month: strip public-benefit-corporation status, removing the sustainable-footwear mission from the charter

The Rundown frames it as an opportunistic rebrand riding the compute shortage — comparable to the blockchain-era name changes — not a genuine business thesis.

Hot Take AI Future
AI News & Strategy Daily | Nate B Jones

Nate B Jones: your AI is 50x faster, you're only getting 2x

Nate's thesis: AI models run 10–50x faster than humans, but real-world productivity gains are capped at 2–3x because every tool, API, and file system was designed for human speed, not agent speed. Making models infinitely faster still yields only ~2–3x. The bottleneck isn't inference — it's the entire human-affordance stack wrapped around the model [28]Nate B Jones — Your AI Is 50x Faster. You're Getting 2x..

Read more

The bottleneck

At ~03:00: every timeout, rate limit, auth flow, pagination scheme, and startup sequence was calibrated to human perception and hand-speed. Jeff Dean made the same point at GTC — if an agent is 50x faster, milliseconds lost to tool startup, context switches, and paginated APIs dominate cost. NVIDIA's Bill Dally said inference is now 90% of data-center power, heading to 10–20k tokens/sec/user — consumed by agents, not humans.

We spent a trillion dollars on these agents. We want them to think collectively. We got them to do it. We made the sand to think. Now we're bottlenecking them on tool calls that were designed for humans.

Three layers of rebuild

At ~06:06:

  1. Faster tools in faster languages — Rust/Go/Zig migrations; TypeScript 7 rewritten in Go; Lee Robinson built a 38K-line Rust image compressor using only coding agents (the compiler acts as verification)
  2. Agent-native primitives — persistent containers (OpenAI shipped these in Feb so agents install deps once, never restart); branch FS (copy-on-write filesystem with sub-second branch creation); shared KV caches across agents (reducing latency 3–4x vs text-passing)
  3. Entire stack rebuild around agents as the primary consumer — "You are losing ground by standing still because every model improvement shifts the ratio of the model capability against your human effort to contain the model."
MCP has blinded us to where this needs to go — you can take a human-friendly API and stick an MCP over the top and the agent will make do, but that doesn't mean you don't eat wall clock time.

Four human roles that survive

At ~14:10:

  1. AI tool generalist — the spark who knows which AI to use and drives long-running agentic processes
  2. Pipeline/infrastructure engineer — builds and secures agentic infra
  3. Relationship closer — salespeople stay irreplaceable because "people like doing business with people"; agent-run companies will hire humans to close
  4. The grown-up in the room — someone with the maturity to put the brakes on
I think it's a promotion to the hardest and most valuable job in computing.
Tools: MCP, Rust, Go, Zig, TypeScript 7, OpenAI persistent containers, branch FS (copy-on-write), shared KV cache, Salesforce, SAP, SharePoint, Zendesk.
Hot Take
The Pragmatic Engineer AI News & Strategy Daily | Nate B Jones Real Python

AI and the workforce: DHH on peak programmer, the Red Queen memo, maintainability

DHH argues the golden era of "learned guild" programmers has already peaked: companies treat dev as a cost center, and if AI cuts dev headcount 10x they'll simply take the savings [29]The Pragmatic Engineer — DHH: "We've seen peak programmer". Nate B Jones revisits Tobi Lütke's Red Queen memo — "stagnation is slow-motion failure" — as the defining document for 2026 workforce restructuring [30]Nate B Jones — How the Red Queen memo exposed who will actually survive. Real Python counters that AI code is great for one-off tasks but the trade-offs shift the moment you have to maintain the output [31]Real Python — AI Code Is Great Until You Have to Maintain It.

Read more

DHH: peak programmer

DHH splits software into two camps: unlimited-scope companies (like his own) that absorb productivity gains by building more, and cost-center shops that pocket the savings. Real constraint value shifts to product management — figuring out what to build, who to talk to, where to focus — a role he admits he historically undervalued.

Nate B Jones: Red Queen

The Red Queen memo (Tobi Lütke, early 2025) forecast role dissolution, junior talent deprioritization, AI fluency as baseline, and dramatic compensation polarization. Nate says all of it is playing out in 2026.

Stagnation is almost certain… if we do nothing. And stagnation is slow-motion failure.
The volume is at 11, and this is happening faster and faster and faster.

Real Python: maintainability is the tell

If it's a one-way thing, great, cuz then you didn't have to write all that code and it's fantastic. But, as soon as you start maintaining these things, the questions change.

The argument: success stories people share (translating Bootstrap 3 → 4) tend to be tasks nobody touches again. The moment ongoing maintenance enters, readability and the ability to reason about the code become critical — and AI output often falls short.

Industry Hot Take
Fireship Low Level Better Stack Tech Brew

Security week: WordPress supply chain, Defender zero-days, SynthID, Recall

Four separate security stories hit on the same day: a WordPress supply-chain attack via 31 Flippa-acquired plugins [32]Fireship — WordPress supply-chain attack via Flippa, a disgruntled researcher dropping two Windows Defender zero-days after a bug-bounty dispute [33]Low Level — Windows Defender Blue Hammer / Red Sun zero-days, Google DeepMind's "unhackable" SynthID watermark reverse-engineered via a phase-shift attack [34]Better Stack — SynthID watermark cracked, and a researcher showing how malware can still steal data from Windows Recall — with Microsoft declining to call it a vulnerability [35]Tech Brew — Is it time to recall Windows 11?.

Read more

WordPress: 31 plugins backdoored via Flippa

At ~01:01: instead of exploiting a vulnerability, the attacker purchased a portfolio of 31 WordPress plugins on Flippa for a mid-six-figure sum, embedded dormant backdoors ~8 months ago, then activated them. Payloads pulled remote code and modified wp-config.php (containing DB creds + security keys). The C2 domain was resolved through an Ethereum smart contract for instant swapability. 96% of WordPress vulnerabilities originate in its plugin system — PHP scripts with full server privileges and no sandboxing.

The attacker didn't exploit a vulnerability. Instead, they legitimately acquired and took control of a portfolio of plugins by simply purchasing them for money from the original developer on Flippa.

At ~03:04: Cloudflare shipped Mdash, an MIT-licensed WordPress-compatible alternative built on Astro. Each plugin runs in its own sandboxed Cloudflare Worker with capability-based permissions declared in a manifest — directly addressing the full-privilege plugin problem.

Windows Defender: Blue Hammer + Red Sun

A researcher calling themselves "Nightmare Eclipse" released working PoC code for two Windows Defender zero-days — Blue Hammer and Red Sun — after claiming MSRC violated a bug-bounty agreement and left them homeless. Both abuse TOCTOU race conditions. Blue Hammer blocks Defender's cloud VDM signature update with a fake stub, replaces the VDM file with a symlink pointing at the SAM hive, then lets Defender (running as SYSTEM) snapshot the symlink into a VSS file; the attacker extracts the SAM from the snapshot and passes the hash to gain admin. Red Sun exploits Defender's habit of rewriting cloud-tagged malicious files before quarantining them: a target swap plus a content swap tricks Defender into writing arbitrary code into System32 and installing it as a service. Rust wouldn't have prevented either — these are logic/concurrency bugs, not memory-safety bugs.
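Both bugs are instances of the classic TOCTOU pattern: the state that gets checked is not the state that gets used. A minimal, self-contained Python illustration — a symlink retargeted between check and use, a toy analogue of the exploit rather than the actual PoC:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
safe = os.path.join(workdir, "safe.txt")
secret = os.path.join(workdir, "secret.txt")
link = os.path.join(workdir, "target")

with open(safe, "w") as f:
    f.write("harmless")
with open(secret, "w") as f:
    f.write("SAM hive contents")
os.symlink(safe, link)

# 1. Time of check: the privileged code verifies the target looks harmless.
with open(link) as f:
    checked = f.read()
assert checked == "harmless"

# 2. The attacker wins the race: retarget the symlink before use.
os.remove(link)
os.symlink(secret, link)

# 3. Time of use: the privileged code now operates on a different file
#    than the one it checked.
with open(link) as f:
    used = f.read()
```

The fix is never "check harder" but "check and use atomically" — e.g. open once and operate on the file descriptor, or use `O_NOFOLLOW`-style flags — which is why memory safety alone (Rust) doesn't help here.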

I was not bluffing Microsoft and I'll do it again.

SynthID watermark cracked

Developer Alouch Denny released "reverse synth ID." By analyzing blank Gemini outputs (pure-white and pure-black generated images), they isolated the exact Fourier-transform coordinates where SynthID's spread-spectrum signal lives — and discovered the signal is unequal across channels (green 1.0, red 0.85, blue 0.7) and the phase template is near-identical across all images. A "phase-shift attack" targets specific frequency bins and shifts the watermark's phase just enough to destroy coherence — dropping Google's detector confidence 90%+ while preserving 43 dB PSNR (the image looks perfect).
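The attack's core idea can be shown with a toy spread-spectrum watermark in NumPy: a handful of FFT bins carry a fixed phase template, and rotating just those phases zeroes the detector while barely touching the pixels. Bins, template, and detector below are invented for illustration — this is not SynthID's actual scheme:

```python
import numpy as np

N = 64
rng = np.random.default_rng(0)
img = rng.uniform(0, 1, (N, N))

bins = [(3, 5), (7, 2), (11, 9)]                   # invented watermark bins
template = np.exp(1j * np.array([0.3, 1.1, 2.0]))  # fixed phase template

def _add(F, u, v, value):
    # write a coefficient and its conjugate mirror so the image stays real
    F[u, v] += value
    F[-u % N, -v % N] += np.conj(value)

def embed(image, strength=2.0):
    F = np.fft.fft2(image)
    for (u, v), t in zip(bins, template):
        _add(F, u, v, strength * N * t)
    return np.real(np.fft.ifft2(F))

def detect(image):
    # detector confidence: how well phases at the known bins match the template
    F = np.fft.fft2(image)
    vals = np.array([F[u, v] for (u, v) in bins])
    return float(np.real(np.vdot(template, vals / np.abs(vals)))) / len(bins)

def phase_shift_attack(image, shift=np.pi):
    # rotate only the watermark bins' phases; all other content is untouched
    F = np.fft.fft2(image)
    rot = np.exp(1j * shift)
    for (u, v) in bins:
        F[u, v] *= rot
        F[-u % N, -v % N] *= np.conj(rot)
    return np.real(np.fft.ifft2(F))

marked = embed(img)
attacked = phase_shift_attack(marked)
```

Once the bins and phase template are known ("the moment you can see the signal in the math"), the attack touches only those coordinates, which is why image quality survives while detection collapses.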

The moment you can see the signal in the math, you can basically delete it.

Windows Recall: not a vulnerability, per Microsoft

Windows Recall, which launched publicly in April 2025 after delays, periodically screenshots user activity and makes it AI-searchable. A researcher showed malware can trigger a legitimate Windows security prompt, wait for the user to authenticate, then intercept the vault's contents as they transfer to an unprotected display process. Microsoft's stance: inter-process communication is intentional system behavior, not a vulnerability. Security experts dispute that, noting the attack vector differs from typical short-lived credential exchanges. Windows 11 has faced sustained backlash since 2021 over forced Copilot integrations, Start menu ads, and undisableable AI features — earning the "Microslop" nickname.

Tools: WordPress, Flippa, Mdash (Cloudflare), Astro, Cloudflare Workers, Blue Hammer / Red Sun PoCs, Windows Defender, VDM signature files, SAM hive, VSS, NTLM pass-the-hash, MSRC, SynthID, reverse synth ID, Windows Recall.
Developer Tools Industry
Real Python Better Stack Data Science Weekly

Dev & research notes: ChromaDB, Docker, Data Science Weekly #647

Lighter dev-tooling and research grab-bag: Real Python's ChromaDB vector-math primer (magnitude, dot product, cosine similarity with spaCy embeddings) [36]Real Python — Vector Databases and Embeddings With ChromaDB, Better Stack's three Docker-build speedups that took a 10-minute build to under 3 minutes [37]Better Stack — Your Docker Builds Are Slow (and it's your fault), and Data Science Weekly #647's roundup of geospatial CLIP, Sebastian Raschka on coding-agent components, and Nathan Benaich's April 2026 State of AI [38]Data Science Weekly — Issue 647.

Read more

Real Python: ChromaDB vector foundations

At ~00:00: vectors as ordered numerical arrays; three operations (Euclidean norm, dot product, cosine similarity) demonstrated in pure Python then NumPy (np.linalg.norm, np.dot, @). At ~12:08: spaCy's en_core_web_lg (300K+ embeddings, 300 dims); practical comparisons reveal "cat" vs "dog" = 0.80, "tasty" vs "delicious" = 0.92, "cat" vs "spaceship" = 0.13, "delicious" vs "spaceship" = 0.04.

Cosine similarity is the normalized dot product of two vectors. It isn't influenced by their scale, only their direction.
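The three operations can be sketched in a few lines of NumPy; the word vectors here are tiny made-up stand-ins for spaCy's 300-dimensional embeddings:

```python
import numpy as np

# tiny made-up 3-dim "embeddings"; en_core_web_lg uses 300 dims
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.3])
spaceship = np.array([0.0, 0.9, -0.4])

magnitude = np.linalg.norm(cat)  # Euclidean norm
dot = cat @ dog                  # dot product

def cosine(a, b):
    # normalized dot product: direction only, scale ignored
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(cosine(cat, dog), 2))        # similar directions score high
print(round(cosine(cat, spaceship), 2))  # near-orthogonal scores near zero
```

Scaling `cat` by any positive constant leaves `cosine(cat, dog)` unchanged — that scale-invariance is what makes cosine similarity the default metric for embeddings.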

Better Stack: Docker builds

Three practical fixes: (1) copy package files and install deps before copying source, to preserve the dep-install layer cache; (2) add a .dockerignore (author cut build context from 500 MB to 20 MB); (3) use BuildKit cache mounts (--mount=type=cache) — author's install step dropped from 3 min to 8 seconds.

Put this all together and your builds can drop from like 10 minutes or so to under 3 minutes. Same code, no new tools, just fixing what most people overlook.
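Fixes 1 and 3 can be sketched in a single Dockerfile (assuming a Node.js app; the base image and filenames are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:22-slim
WORKDIR /app

# Fix 1: copy manifests first so the dependency layer caches across code edits
COPY package.json package-lock.json ./

# Fix 3: BuildKit cache mount persists the package cache between builds
RUN --mount=type=cache,target=/root/.npm npm ci

# Source is copied last; changing it no longer invalidates the install layer
COPY . .
CMD ["node", "server.js"]
```

Fix 2 is a `.dockerignore` next to the Dockerfile (e.g. listing `node_modules`, `.git`, build output) so the build context stays small.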

Data Science Weekly #647

Links worth pulling: geospatial CLIP, Sebastian Raschka on the components of coding agents, and Nathan Benaich's April 2026 State of AI report.

Tools: ChromaDB, NumPy, spaCy (en_core_web_lg), Docker, BuildKit, .dockerignore, multi-stage builds.

Sources

  1. Blog Introducing Claude Opus 4.7 — Anthropic, Apr 16
  2. YouTube Claude Opus 4.7 in 5 Minutes — Developers Digest, Apr 16
  3. YouTube Opus 4.7 Is GREAT (except the token usage) — Better Stack, Apr 16
  4. YouTube Claude's new Cursor killer just dropped — Theo - t3.gg, Apr 16
  5. YouTube Vibe Coding Gets an Upgrade — The AI Daily Brief, Apr 16
  6. Blog Codex for (almost) everything — OpenAI News, Apr 16
  7. YouTube Codex for (almost) everything — OpenAI, Apr 16
  8. YouTube Claude Opus 4.7 Just Dropped... Or Did It Really? — Nate Herk | AI Automation, Apr 16
  9. Blog Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 — Simon Willison's Weblog, Apr 16
  10. YouTube Claude Ultraplan vs Superpowers: I Found a WINNER and It's Not Even Close — Better Stack, Apr 16
  11. Blog llm-anthropic 0.25 — Simon Willison's Weblog, Apr 16
  12. Blog datasette.io news preview — Simon Willison's Weblog, Apr 16
  13. Blog Introducing GPT-Rosalind for life sciences research — OpenAI News, Apr 16
  14. YouTube Episode 16: Building AI for Life Sciences — OpenAI, Apr 16
  15. Blog Accelerating the cyber defense ecosystem that protects us all — OpenAI News, Apr 16
  16. YouTube The GPT Moment for Robotics Is Here — Y Combinator, Apr 16
  17. YouTube Hire barrels, not ammunition — Lenny's Podcast, Apr 16
  18. YouTube Building pi in a World of Slop — Mario Zechner — AI Engineer, Apr 16
  19. YouTube $1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs – Diego Carpentero — AI Engineer, Apr 16
  20. YouTube AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more! — AI Engineer, Apr 16
  21. YouTube AIE Miami Keynote & Talks ft. OpenCode, Google DeepMind, OpenAI, and more! — AI Engineer, Apr 16
  22. YouTube Why DeepMind's New AI Broke The Internet — Two Minute Papers, Apr 16
  23. YouTube SuperGemma-4 (26B) UNCENSORED + Hermes, OpenClaw, OpenCode: THIS IS SO CRAZY!!! — AICodeKing, Apr 16
  24. YouTube New BEST local AI image generator is here! — AI Search, Apr 16
  25. Blog New ways to create personalized images in the Gemini app — Google — Gemini App blog, Apr 16
  26. Blog Building a Unified Pipeline for AI Token Spend — Ramp Builders, Apr 16
  27. Newsletter Allbirds ditches sneakers for AI compute — The Rundown AI, Apr 16
  28. YouTube Your AI Is 50x Faster. You're Getting 2x. You're Fixing the Wrong Thing. — AI News & Strategy Daily | Nate B Jones, Apr 16
  29. YouTube DHH: "We've seen peak programmer" — The Pragmatic Engineer, Apr 16
  30. YouTube How the Red Queen memo exposed who will actually survive — AI News & Strategy Daily | Nate B Jones, Apr 16
  31. YouTube AI Code Is Great Until You Have to Maintain It — Real Python, Apr 16
  32. YouTube Millions of WordPress sites just got hacked... again — Fireship, Apr 16
  33. YouTube He Leaked Windows Exploits For Revenge — Low Level, Apr 16
  34. YouTube Someone Just Reverse-Engineered Google's "Unhackable" AI Watermark — Better Stack, Apr 16
  35. Newsletter Is it time to recall Windows 11? — Tech Brew, Apr 16
  36. YouTube Vector Databases and Embeddings With ChromaDB: Vectors & Word Embeddings — Real Python, Apr 16
  37. YouTube Your Docker Builds Are Slow… And It's Your Fault — Better Stack, Apr 16
  38. Newsletter Data Science Weekly - Issue 647 — Data Science Weekly, Apr 16