April 27, 2026
OpenAI's April 22 Workspace Agents launch lets any business team describe a recurring workflow in plain English and get a Codex-powered agent that runs across Slack, Drive, Calendar, SharePoint and custom MCP servers — published, scheduled, and embedded inside Slack itself.[1]Nate B Jones — OpenAI's Workspace Agents Nate B Jones argues the company most squeezed isn't Anthropic or Perplexity — it's the lightweight automation layer (Zapier, Make, n8n, Copilot Studio). The catch: Codex executes in the cloud against personal authenticated app connections, so governance is now the product, and the free window closes May 6 before credit-based pricing kicks in.
~01:00 Workspace Agents shipped April 22 as a research preview for ChatGPT Business, Enterprise, Education and Teachers plans (not Plus). Users click "Agents" in the sidebar, describe a workflow, and ChatGPT drafts the profile, selects connected apps, generates skills, writes instructions and produces a preview before publish. Templates exist for product feedback routing, weekly metrics reporting, lead outreach and software review.
~02:00 Connectors at launch: Google Calendar, Drive, Slack, SharePoint, plus custom MCP servers. Agents can be published, scheduled, and run inside Slack via the ChatGPT agent app — they show up where work already happens. Enterprise admins must opt in; not available for customers using enterprise key management. Free until May 6, then credit-based pricing.[1]Nate B Jones — OpenAI's Workspace Agents
~05:03 "A custom GPT is basically a prompt in a suit." Quality lived or died by the prompt author, and most teams abandoned them within weeks. ~06:04 Projects (context-first) were a real improvement — shared workspace, files, instructions, memory — but "still assume a huge human lift." ~07:06 One team's RFP workflow moved into a Workspace Agent: the agent reads the RFP, pulls similar prior responses from SharePoint, drafts a first pass, flags unanswerable fields, and posts the draft + missing pieces to the AE's Slack DM — turning hours of assembly into ~20 minutes of editing.
Custom GPTs made the team carry the product. Projects made the team carry the context. Workspace agents… actually lift the load.
~07:06 Working pattern: recurring (weekly/daily/hourly), clear good vs. bad output, describable in a paragraph, crosses 2–3 tools. Reference build is OpenAI's launch example — a sales consultant at Rippling built an opportunity agent without engineering help; it researches accounts, summarizes Gong calls, and posts deal briefs to Slack, with a claimed 5–6 hours/week saved per rep. ~12:10 "If the path is known, it gets really interesting. If the path is unknown, you should be careful." It's the wrong product for novel research, one-off polished artifacts, or long-horizon autonomous work.
~14:11 "Most agent products don't fail because the demo is bad. They fail because the security and the governance story are thin." Workspace Agents ship with version history, run/user analytics, compliance API coverage, and admin controls over who can build/publish/use which apps. ~15:11 The landmine: a role-based control allows agents to be published with the creator's personal authenticated app connections, so anyone running the agent may act through the creator's account. Right posture: least privilege, service accounts, scoped access, audit regularly. ~16:11 Workspace Agents are powered by Codex in the cloud — they use tools, run code, remember, continue across steps. The value isn't "an agent can update the CRM" — it's "an agent can update the CRM inside a permission model the company can live with."
~16:11 The default question for automating recurring Slack/docs/calendar/call/ticket workflows is shifting from "go build a Zap" to "build the Workspace Agent first; only move to a dedicated automation platform if it hits a wall." ~20:16 OpenAI is going horizontal — turning Codex + Workspace Agents into "the default OS for the entire work of the corporation." Anthropic is going vertical, with Claude Design as a Figma killer and verticalised hires in finance and HR.
~21:17 Pick one weekly job that eats 5–6 hours, has clear output, crosses 2–3 tools, and has a human reviewer. "If you cannot describe the workflow really simply, the agent will not save you. You are probably asking it to solve ambiguity you have not resolved as a team." Eval by time saved vs. review burden — not by "is it impressive?"
Simon Willison traces the full history of the contractual AGI clause between OpenAI and Microsoft, which effectively ended on April 27, 2026 when the restructured deal made Microsoft's license non-exclusive and made revenue shares "independent of OpenAI's technology progress."[2]Simon Willison — The deceased AGI clause The provision had defined AGI as "highly autonomous systems that outperform humans at most economically valuable work" and tied Microsoft's IP rights to whether that threshold was ever verified. As of February 2026 OpenAI was still publicly defending the definition and verification process; by late April it was simply gone.
The clause originated in July 2019 when Microsoft became OpenAI's preferred commercialization partner for "pre-AGI technologies" — language implying that AGI achievement would fundamentally alter the partnership. OpenAI's April 2018 Charter defined AGI as "highly autonomous systems that outperform humans at most economically valuable work."[2]Simon Willison — The deceased AGI clause
In December 2024, The Information revealed that AGI had been financially defined as $100 billion in maximum profit potential for the earliest investors. As of October 2025, an independent expert panel was meant to verify any AGI achievement, with Microsoft retaining IP rights until verification or 2030, whichever came first. As late as February 27, 2026, OpenAI confirmed the definition and verification process were still unchanged.
The clause met its end on April 27, 2026 when the restructured Microsoft–OpenAI deal made Microsoft's license non-exclusive and revenue shares were decoupled from OpenAI's technology progress — effectively rendering the AGI verification mechanism moot. Willison stitches together reporting from The Information, TechCrunch, The Verge, and Bloomberg, with a sideways nod to a 2023 Matt Levine hypothetical that had imagined a beneficial superintelligence solving capitalism. The point: a once-consequential trigger quietly became a footnote.
Highly autonomous systems that outperform humans at most economically valuable work… independent of OpenAI's technology progress.
DeepSeek released preview versions of V4, with V4 Pro priced at $1.74/M input and $3.48/M output tokens — vs. GPT-5.5 at $5/$30 and Opus 4.7 at $5/$25 — while landing near GPT-5.4 and Gemini 3.1-Pro on reasoning benchmarks and topping Vals AI's Vibe Code Bench.[3]The Rundown AI — DeepSeek V4 The model ships with a 1M-token context window and runs on Huawei Ascend chips, positioning it as a non-Nvidia option for buyers diversifying infrastructure.[4]Sherwood Snacks — Chips Ahoy! DeepSeek itself acknowledged it still trails frontier reasoning capabilities by three to six months.
On Artificial Analysis's Intelligence Index, V4 Pro lands in the fourth tier alongside Meta's Muse Spark — competitive but not at the absolute frontier.[3]The Rundown AI — DeepSeek V4 What's strategically interesting is the Ascend support — DeepSeek is one of the first credible coding-grade open models with first-class deployment on non-Nvidia silicon, which matters more for Chinese enterprise buyers than for Western developers.
Sherwood notes the market reaction was muted compared to DeepSeek's earlier debut, which had crashed AI and chip stocks earlier in 2026.[4]Sherwood Snacks — Chips Ahoy! By self-reporting the gap rather than overclaiming, DeepSeek sidesteps a repeat of that disruption.
OpenRouter analyzed over a million real requests from users who switched from Opus 4.6 to 4.7 and found the new tokenizer generates 32–45% more tokens than 4.6 — at identical list prices ($5/M input, $25/M output). Real costs went up 12–27% across most prompt-size buckets, with short prompts (under 2K tokens) the lone exception (-1.6% net cost).[5]OpenRouter — Opus 4.7 tokenizer Separately, Better Stack flagged a confirmed Anthropic bug where any prompt containing the uppercase string HERMES.MD triggered overcharging via faulty third-party harness detection.[6]Better Stack — HERMES.MD bug
OpenRouter engineer Justin Summerville used OpenRouter's "QuadChars" tokenizer as a stable baseline to isolate the tokenizer change from other variables. Anthropic disclosed a 1.0–1.35x inflation range; OpenRouter's empirical data puts raw token inflation at 32–45% over Opus 4.6, with the worst inflation hitting shorter prompts (~45% under 2K tokens) and tapering for longer contexts (~32% at 50K–128K).[5]OpenRouter — Opus 4.7 tokenizer
Cost impact by bucket: +27.2% (2K–10K), +25.2% (10K–25K), +21.3% (25K–50K), +11.9–15.3% (very long). Short prompts get cheaper because completion length dropped 62% in that bucket. For 10K+ prompts, Opus 4.7 also generates 13–30% longer completions, partially explaining higher output costs. Prompt caching is the saving grace at scale: at 128K+ inputs, 93% of the extra tokens land in the cached tier, which carries a 90% discount.
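To make the caching effect concrete, here's a back-of-envelope sketch (ours, not OpenRouter's; the baseline is treated as fully uncached for simplicity) of how ~32% raw inflation at 128K+ inputs nets out once 93% of the extra tokens hit the discounted tier:

```python
base_tokens = 128_000
inflation = 0.32                 # ~32% more input tokens at this size
extra = base_tokens * inflation

full_rate = 5 / 1e6              # $/token at the $5/M list input price
cached_rate = full_rate * 0.10   # cached tier: 90% discount

# 93% of the extra tokens land in the cached tier, 7% at full price.
extra_cost = extra * (0.93 * cached_rate + 0.07 * full_rate)
base_cost = base_tokens * full_rate
print(f"input cost up {extra_cost / base_cost:.1%}")  # ~5.2%, not a naive +32%
```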
A Reddit user on Anthropic's $200 Max plan found their extra usage exhausted while the dashboard showed only 13% weekly usage. Through systematic testing they isolated the trigger: any prompt containing the uppercase string HERMES.MD — the file didn't even need to exist on disk, merely appearing in context (e.g., a git commit message) was sufficient. Anthropic confirmed the bug and attributed it to "third-party harness detection" — logic intended to identify when Claude is being used via an external harness — and offered the user a refund.[6]Better Stack — HERMES.MD bug
Sorry this was a bug in that third-party harness detection. — This only means the bug was in how it detected a third-party harness, not the fact that it will charge you more for one.
Microsoft released VibeVoice-ASR, a 17.3 GB Whisper-style speech-to-text model with built-in speaker diarization, under the MIT license. Simon Willison ran it locally and processed 99.8 minutes of audio in 524.79 seconds (~8.7 minutes) — peak memory 30.44 GB, throughput 38.585 tok/s.[7]Simon Willison — VibeVoice Native diarization makes it materially more useful than Whisper for meetings and podcasts.
Willison drove the model with the mlx-audio Python package (by Prince Canuma) via uv, and also tested the mlx-community/VibeVoice-ASR-4bit quantized version (5.71 GB) for a lighter footprint. Peak memory hit 30.44 GB during inference and 61.5 GB during prefill. The default max-tokens setting of 8192 only covers about 25 minutes of audio, so longer recordings need to be chunked into 60-minute segments.[7]Simon Willison — VibeVoice
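Since the default token budget covers only ~25 minutes, long recordings have to be split before transcription. A generic pre-chunking approach (not Willison's exact commands; file names are illustrative) uses ffmpeg's segment muxer:

```python
import subprocess

# Split a long recording into 60-minute segments without re-encoding.
subprocess.run(
    [
        "ffmpeg", "-i", "podcast.mp3",
        "-f", "segment", "-segment_time", "3600",  # 3600 s = 60 minutes
        "-c", "copy", "chunk_%03d.mp3",
    ],
    check=True,
)
```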
Google Meet is rolling out real-time speech translation to mobile after a desktop-only debut. Two participants can speak different languages in real time, with translated output synthesized in each speaker's own voice — what Willison calls "the ultimate sci-fi translation app." Six languages at launch (English, Spanish, French, German, Portuguese, Italian); reliability is alpha-grade — testing between an iPhone and an iPad failed for him.[8]Simon Willison — Google Meet translation
Anthropic appointed Theo Hourmouzis (formerly SVP for ANZ + ASEAN at Snowflake) as General Manager for Australia and New Zealand, and officially opened its Sydney office — following recent expansions into Tokyo, Bengaluru, and Seoul.[9]Anthropic — Sydney office The announcement coincided with a cluster of new partnerships: a multi-year Canva deal integrating Canva Design Engine into Claude Design by Anthropic Labs, a multi-year Xero platform collaboration, and a Claude for Nonprofits partnership with YMCA South Australia (~1,250 staff across 65+ locations).
Existing relationships with Commonwealth Bank and Quantium were highlighted alongside AI-for-Science research collaborations with Australian National University, Murdoch Children's Research Institute, the Garvan Institute of Medical Research, and Curtin University.[9]Anthropic — Sydney office
Dwarkesh argues Anthropic should not be reassured by the Pentagon's claim that AI red lines on mass surveillance are unnecessary because such surveillance "is already illegal." His core argument: under the third-party doctrine, mass surveillance is largely already legal — it's just impractical. AI removes the cost barrier, and the Snowden record shows the government routinely uses secret legal interpretations to do exactly what it publicly disavows.[10]Dwarkesh — Pentagon's AI promise
Americans have no Fourth Amendment protection over data shared with banks, ISPs, phone carriers, or email providers; the government can purchase and read it in bulk without a warrant. Dwarkesh cites ~100 million CCTV cameras in the US and a $30 billion estimate to process every one of them. Because a given level of AI capability gets ~10x cheaper each year, "by 2030 it would cost less to monitor every nook and cranny of the country than to remodel the White House." The infrastructure for mass surveillance already exists; AI removes the cost barrier.[10]Dwarkesh — Pentagon's AI promise
The 2013 Snowden revelations showed the NSA — part of the Department of War — used the 2001 Patriot Act to justify collecting every American phone record under a secret court order, on the theory that some subset might be relevant to a future investigation. Dwarkesh frames the current Department of War / Anthropic dispute over usage policies as "an early version of what will be the highest stakes negotiations in human history."
It would be incredibly naive to take that at face value. No government is going to call what they are doing mass surveillance. For them, it will always have a different euphemism.
Dwarkesh's "what I've been thinking" post threads five distinct arguments. The most provocative: the common definition of intelligence ("the ability to achieve goals across domains") is actually a definition of power — and AI's tight verification loops make it disproportionately bad at the kind of slow, century-long scientific revolutions that matter most.[11]Dwarkesh — Weekly post
Five hyperscalers own 70%+ of global AI compute, and much of that is reserved for OpenAI, Anthropic, and GDM. As inference demand scales, ordinary applications and smaller players may simply be priced out — concentration of capability becomes concentration of access.[11]Dwarkesh — Weekly post
What actually unlocked long-horizon coding agents — model change, scaffolding, RLHF signal? Are models becoming more sample-efficient, or just getting better data? Why does the memory/sample-efficiency tradeoff in attention exist? The KV cache figure for Llama 3 70B is 320 KB per token vs. 0.075 bits per token for weight information — a 35-million-fold difference.
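The arithmetic behind that ratio checks out:

```python
# 320 KB of KV cache per token, vs. ~0.075 bits of weight information
# absorbed per training token (Dwarkesh's figures).
kv_bits_per_token = 320 * 1024 * 8       # = 2,621,440 bits
weight_bits_per_token = 0.075
print(kv_bits_per_token / weight_bits_per_token)  # ≈ 3.5e7 — the 35-million-fold gap
```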
Power is more the product of having authority and trust to get people collaborating, rather than galaxy brain scheming capability.
Powerful leaders aren't necessarily the most cognitively capable; they have authority and social trust. Garrett Jones's national-IQ research shows IQ correlates with economic outcomes — but the mechanism is spillover through cooperation, not raw individual problem-solving. An AI that's "intelligent" in the abstract may still lack the social substrate that makes power actually work.
Big breakthroughs often can't be verified for decades. Copernicus's 1543 model was less accurate than Ptolemy's; stellar parallax wasn't measured until 1838 (295 years later). Mercury's 43-arcsec/century precession led scientists to postulate a fictitious "planet Vulcan" before Einstein resolved it in 1915. Prout's hypothesis (atomic weights as integer multiples of hydrogen) was pursued for a century despite contradictory chlorine measurements (35.5) — and turned out to be directionally right once isotopes were discovered. AI optimizing for verifiable results will systematically underinvest in correct-but-unverifiable ideas.
Darwin and Wallace independently arrived at natural selection because Lyell's 1830 Principles of Geology had just supplied the deep-time scaffolding the theory required — not because either was a unique genius. Evolution's evidence is "circumstantial, retrospective, cumulative." The next AI-enabled science breakthroughs may similarly depend on conceptual scaffolding we haven't built yet.
Google DeepMind researcher Cassidy Hardin walks through Gemma 4, the open-weights family that landed the prior week — four sizes (E2B, E4B, 26B MoE, 31B dense), Apache 2.0 licensing, native multimodality (text/vision/audio), and architectural advances in attention, mixture-of-experts, per-layer embeddings, and variable-resolution vision encoding.[12]Cassidy Hardin — Gemma 4 architecture The 31B dense model placed #3 globally on LM Arena, outperforming models 20× its size.
~00:14 Gemma 4 launched the prior week. The lineup spans four sizes: two compact "effective" models (E2B and E4B) optimized for on-device use on phones, iPads, and laptops; a 26B mixture-of-experts model (the first MoE in the Gemma family) requiring only 3.9B active parameters; and a 31B dense flagship.
~01:17 Apache 2.0 licensing, "deliberately done in order to make our models more accessible for the everyday developer." The 31B and 26B both rank top-six on LM Arena, with the 31B at #3 globally.
~04:19 Architecture: 5:1 ratio of local-to-global attention layers (4:1 in E2B), with sliding windows of 512 tokens (small) and 1,024 (large), final layer always global. ~05:19 Grouped-query attention groups 2 queries per KV head locally, 8 globally; global KV head dimension doubled to 512 to preserve quality. ~06:19 The MoE has one shared router expert (3× the size of regular experts) plus 128 small feed-forward experts, 8 activated per pass.
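A minimal sketch of that layer layout (our illustration, not Gemma's actual code):

```python
def attention_layout(n_layers: int, locals_per_global: int = 5) -> list[str]:
    # Five sliding-window ("local") layers per global layer; the final
    # layer is always global, per the ratios described in the talk.
    pattern = ["local"] * locals_per_global + ["global"]
    layout = [pattern[i % len(pattern)] for i in range(n_layers)]
    layout[-1] = "global"
    return layout

# Local layers slide over 512-token windows (1,024 in the larger models);
# grouped-query attention packs 2 queries per KV head locally, 8 globally.
print(attention_layout(12))  # 5 local, 1 global, repeated; ends global
```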
~08:20 Per-layer embeddings (PLE) — a dedicated embedding table per layer stored in flash memory rather than VRAM — unlock significant on-device gains. PLE embedding dimension is 256; E2B has 35 layers, E4B has 42.
~10:22 Multimodality: 31B and 26B use a 550M-parameter vision encoder; E2B/E4B use a smaller 150M one. Variable aspect ratios and resolutions (5 settings) replace Gemma 3's pan-and-scan. Images split into 16×16 patches, then 3×3 patch grids pool into single embeddings — a 280-token budget yields 2,520 patches. Audio (E2B/E4B only) uses a 35M-parameter conformer fed by a mel-spectrogram tokenizer.
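The 280-token figure is simple patch arithmetic:

```python
patches_per_embedding = 3 * 3   # a 3x3 grid of 16x16-pixel patches pools to one token
token_budget = 280
print(token_budget * patches_per_embedding)  # 2520 patches of image coverage
```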
~18:27 Distribution: self-host via Hugging Face, Kaggle, or Ollama, or use cloud hosting on AI Studio and Vertex for the larger 31B and 26B.
By requiring that we no longer need to store this additional embedding table in VRAM and we can store it in flash memory, we're able to get incredible improvements on the inference side.
GitHub's MCP lead Sam Morrow recounts a year of building the GitHub MCP server: cutting tools and tokens to fight context bloat (~49% input reduction, >75% output reduction on list_pull_requests), embracing OAuth 2.1 with PKCE, rejecting Dynamic Client Registration, running a stateless Redis-backed server, hitting ~7–8M tool calls/week, and learning that defaults always win, agents hallucinate scopes, and the "lethal trifecta" is unsolved.[13]Sam Morrow — GitHub MCP at scale
~01:08 The local MCP was open-sourced in April of the prior year, drew >100 contributed tools, and became the most-starred repo of the week.
~02:09 They quickly hit the LangChain-documented problem: more tools and more context make agents worse — confused, forgetful, blowing context windows. ~03:11 Morrow added "tool sets" (groupings users could toggle), dynamic tool selection, and an unreleased RAG-style semantic tool search. The punchline: nearly every user just stuck with defaults.
~05:12 Trimming tools to the general case cut initial context load ~49%; the default config now exposes ~40 tools. ~06:13 Output tokens were also cut — list_pull_requests dropped >75% just by tailoring the response shape. Tool-call success rate is now >95%; remaining failures are mostly agents hallucinating which repos they have write permission on.
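The response-shaping idea looks roughly like this (a sketch of the pattern, not GitHub's code; field names follow GitHub's public REST payloads):

```python
def slim_pull_request(pr: dict) -> dict:
    # Keep only what an agent typically needs; drop the dozens of URLs,
    # nested objects, and timestamps in the full REST payload.
    return {
        "number": pr["number"],
        "title": pr["title"],
        "state": pr["state"],
        "author": pr["user"]["login"],
        "url": pr["html_url"],
    }

def list_pull_requests(prs: list[dict]) -> list[dict]:
    return [slim_pull_request(pr) for pr in prs]
```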
~08:13 OAuth 2.1 with PKCE was contributed back to GitHub's authorization server. They explicitly rejected Dynamic Client Registration: "if you implement it properly, it's hard not to have unbounded growth of app databases" — and there's no reliable app identity. They expect MCP's emerging Client ID Metadata approach to be a better path. ~10:14 The Invariant Labs prompt-injection exfil attack? Real, but not unique to GitHub or MCP — it's the "lethal trifecta" problem (per Simon Willison) affecting all agent setups, and not yet solved.
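For reference, the PKCE handshake they adopted boils down to a hash commitment (RFC 7636):

```python
import base64
import hashlib
import secrets

# The client invents a random verifier, sends only its SHA-256 hash (the
# "challenge") with the authorization request, then reveals the verifier
# at token exchange — so an intercepted authorization code is useless alone.
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode()).digest()
).rstrip(b"=").decode()
```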
~13:16 The architecture is fully stateless — Redis for session storage, no session affinity, brand-new MCP server instance built on every single request. ~16:19 Outlook: thousands of tools per server will be normal, server discovery should be automatic, tool use will get more compositional (Cloudflare's code mode, Anthropic's tool-search API in Claude Code, OpenAI's similar API). ~17:20 Closing numbers: 11M+ Docker downloads, 126 contributors, 2,300+ issues/PRs (>7 per day for a year), 4,000 forks, ~30,000 stars, ~8M tool calls/week.
I fully expect that thousands of tools will be normal very soon. I'll probably reverse many of the fewer-tools decisions.
Anthropic forward-deployed engineer Karan Sampath argues that enterprise MCP adoption stalls until the community builds gateways — a middle layer between MCP clients and dozens-to-hundreds of internal MCP servers handling auth, RBAC, observability, secured tunnels, sub-registries, and CLI scaffolding. The pitch: "gateways are all you need." Bless one platform as the root of trust so any team can ship its own MCP server without a separate security review, and cleanly separate the agent harness from the data layer.[14]Karan Sampath — Enterprise MCP gateways
~02:09 The "three-headed hydra" of missing enterprise table stakes: observability (who's using which MCP and which tools), access control (scoping servers/tools to the right groups), and security (verifying servers, preventing exfiltration, letting untrusted clients touch private data). The official MCP registry has thousands of servers and is growing fast — but registries don't solve auth, observability, or credentials.
~04:11 Concrete failure mode: every team can now build MCP servers via coding agents, but security teams are overloaded and can't triage which to allow; C-suites ask why agents underperform. ~05:11 The fix: bless one platform as root of trust so MCP development is decentralized while governance stays centralized.
~07:13 Anatomy of a gateway: auth, RBAC, proxy/router (clients only see the gateway, servers only trust the gateway), secured tunnel, sub-registry of internal MCP servers, and a CLI an agent can use to scaffold new servers.
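A toy version of the proxy/router piece, under stated assumptions (FastAPI and httpx for illustration; the RBAC table, header, and hostnames are hypothetical):

```python
import httpx
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

# Which groups may reach which internal MCP servers, and where they live.
RBAC = {"finance": {"ledger-mcp"}, "eng": {"github-mcp"}}
UPSTREAMS = {
    "ledger-mcp": "http://ledger-mcp.internal:8080",
    "github-mcp": "http://github-mcp.internal:8080",
}

@app.post("/mcp/{server}")
async def route(server: str, request: Request):
    # Clients only ever see the gateway; servers only trust the gateway.
    group = request.headers.get("x-user-group", "")
    if server not in RBAC.get(group, set()):
        raise HTTPException(status_code=403, detail="not authorized for this server")
    async with httpx.AsyncClient() as client:
        upstream = await client.post(UPSTREAMS[server], content=await request.body())
    return upstream.json()
```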
~11:14 Free-lunch follow-ons: surface invariance (one gateway plugs into claude.ai, Claude Code, and Claude Cowork simultaneously instead of configuring 40 servers per client), encrypted connections, faster iteration without repeated security reviews, pluggable credentials (company-wide, team-wide, or service accounts swappable per server), scalability from tens to hundreds of thousands of agents. ~15:16 Future vision: separate the agent harness from where data lives — the same gateway can be wired to Anthropic's managed agents or used in-house with the Claude Agent SDK.
The goal for any security team is to bless one platform. We want to separate the agent harness from where your data lives.
Latent Space hosts Alessio and Swyx interview Applied Intuition co-founders Qasar Younis (CEO) and Peter Ludwig (CTO) about building physical AI for vehicles, machinery, and defense. Central thesis: physical machines today resemble the pre-Android phone market — fragmented operating systems prevent modern AI from running across devices, so Applied Intuition is consolidating the OS layer while building simulation, autonomy models, and neural sim to deploy intelligence onto cars, trucks, mining/construction/agriculture equipment, and defense systems.[15]Latent Space — Applied Intuition ~$15B valuation, ~1,000 engineers (83% of headcount), 40+ ex-founders on staff, 18 of the top 20 non-Chinese global automakers as customers.
~02:00 18 of the top 20 non-Chinese global automakers are customers; verticals span cars, trucks, construction, mining, agriculture, and defense (land, air, sea). ~04:01 They run L4 driverless trucks in Japan today, started in YC, did simulation/data infra first (often confused with Scale AI), then expanded to 30+ products. Stack has gone through ~4 complete evolutions on a 2-year cadence.
~09:04 (1) Simulation and simulation tooling (including RL infrastructure); (2) Operating systems — schedulers, memory management, middleware, message passing, reliable networking — built because nothing in market was good enough for safety-critical AI workloads; (3) Autonomy models including world models and the human-machine teaming layer (voice, cabin awareness, driver-state monitoring). They explicitly do not make sensors or chips.
~14:06 On lidar: hands-down useful in R&D for per-pixel depth, then often removed in production once the camera model has learned depth from lidar–camera pairs. C++ dominates with some Rust; assembly when needed.
~19:11 Larry Page got Google into Android because the ~50 phone OSes made it impossible to ship Google products everywhere. Physical machines are pre-Android today. Applied's OS handles real-time sensor latencies, redundancy/fail-safes (e.g., cosmic-ray bit flips), and reliable over-the-air updates so non-Tesla automakers can finally update safety-critical modules without dealership visits.
~29:30 End-to-end autonomy models take sensor data in and emit control signals out, but training/RL on them requires simulating sensor data — Applied's "neural simulation" is described as a hybrid of Gaussian splatting and diffusion methods. Performance is everything: "cheap and fast enough or you can't get worthwhile results." ~33:32 Verification has shifted from binary pass/fail (Euro NCAP) to statistical "how many nines." They cite Cruise as a regulator-communication failure piled on top of a tech incident, and credit Waymo with setting a responsible benchmark.
~45:36 Offboard is unconstrained (data center, large models, no latency); onboard is distillation-heavy with millisecond budgets and power/thermal limits. "In the physical AI world we're not really constrained right now by the intelligence of the models. It's actually deploying them on the hardware." Gemma-class ~2B models can run embedded; autonomy itself is 100% in-house. Legacy mining/agriculture autonomy used hand-coded systems with RTK GPS (1–2 cm accuracy) for 20 years.
~54:42 Interviews are "way harder than they've ever been" but allow selective AI-tool use; bimodal distribution where AI-fluent engineers have an enormous productivity gap. Cursor was the dominant tool, then Claude Code took over; internal leaderboard. ~58:43 Founder advice: pick a small, deep problem space; technology compounds (citing Waymo's $126B as proof the human brain underweights compounding); don't blindly copy mature-company strategies; first-principles thinking.
Physical machines today are more akin to the state of the phone market before Android and iOS existed.
Dan Shipper interviews Kieran (GM of Cora and creator of Compound Engineering) about a mental model for human-AI collaboration. Compound Engineering's original four steps — plan, work, review, compound — gave way to a six-step "sandwich": brainstorm/ideate at the top, plan/work/review automated in the middle, polish/elevate at the bottom. Humans are the bread; AI is the filling.[16]Every — The AI Sandwich[17]Every — Compound Engineering plugin
~02:03 Compound Engineering's original four steps: (1) plan — make requirements crisp; (2) work — agent implements; (3) review — catch issues, iterate; (4) compound — write learnings back into the repo so future agents avoid the same mistakes. The compound step is the "most powerful piece" — agents read accumulated learnings on future runs, so errors aren't repeated.[17]Every — Compound Engineering plugin
~05:08 Working with Trevin Chow (a product person), Kieran realized two more steps mattered most: a brainstorm step (problem-framing dialogue) and an ideate step (going wide with diverse perspectives) — both before planning. The realization: the human should think hard at the top of the stack and hand off planning entirely.
~07:09 The other endpoint is validation. Browser automation can confirm tests pass, but a human clicking around catches the "this doesn't feel good" polish work. Without that final pass, everything becomes slop.
~14:19 Any local problem ("my knee hurts") has a larger frame around it (take Advil → stretch IT band → stop running on hard surfaces). Humans are good at flipping frames; agents are not. Even if agents simulate millions of personas to automate ideation, the final yes/no needs to be yours if the output is to be yours.
Humans are the bread in the sandwich and the AI is in the middle. If you want it to be your own, you cannot fully automate everything. It's like art.
~22:30 Dan's bar for AGI: when it's economically profitable to run an agent 24/7 that never turns off, autonomously picking next tasks at varying time horizons (5 minutes vs. 4 days). "We're not even close — likely needs fundamental architecture changes for better online learning." ~25:33 All work sits on a rote-to-art spectrum; the rote stuff gets automated; humans move toward the more creative parts. The frame keeps moving.
Theo, a self-described long-time Markdown advocate, reacts to a blog post arguing Markdown is broken and concedes the point. Concretely: ambiguous syntax (multiple ways to write bold, italics, headings, horizontal rules, lists), real-world parser CVEs (a 6.9-severity ReDoS in markdown-it), inline HTML as an escape hatch that turns every Markdown parser into an HTML parser too, context-sensitive grammar (footnotes/refs require global resolution), and now LLMs and agents using Markdown as their lingua franca — so every parser bug becomes a potential prompt-injection / XSS / DoS vector at scale.[18]Theo — Markdown is broken
~05:03 For bold alone, CommonMark accepts double-star, double-underscore, and <b> HTML tags. Two ways to write headings (ATX vs. setext), two ways to write horizontal rules (one of which collides with setext headings), two ways to write unordered lists, ordered lists that don't care about your numbering. Theo agrees with the article: "literally the C++ of markup languages."
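The ambiguity is easy to demonstrate (Python-Markdown shown here; CommonMark parsers accept the same spellings):

```python
import markdown  # pip install markdown

# Three ways to say "bold" — two Markdown spellings plus the inline-HTML
# escape hatch — all accepted by the same parser.
print(markdown.markdown("**stars**, __underscores__, and <b>raw HTML</b>"))
# <p><strong>stars</strong>, <strong>underscores</strong>, and <b>raw HTML</b></p>
```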
~07:04 Markdown parsing has spawned its own class of vulnerabilities: ReDoS (regular-expression denial of service). Theo cites a CVSS 6.9 CVE in markdown-it where a string of 65,553 stars in a link can hang the parser. Inline HTML — Markdown's "inline assembly" — multiplies XSS attack surface. He audits his own blog live and is embarrassed: an iframe for a YouTube embed in the second paragraph, an <img> stuffed inside Markdown link brackets to make a clickable image.
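The CVE itself is patched, but the failure class is reproducible with any nested-quantifier regex — a sketch of the mechanism, not the markdown-it pattern:

```python
import re
import time

evil = re.compile(r"^(a+)+$")   # nested quantifiers: classic catastrophic backtracking
payload = "a" * 25 + "b"        # the trailing 'b' forces every grouping to be tried

start = time.time()
evil.match(payload)
print(f"{time.time() - start:.1f}s")  # already seconds; each extra 'a' doubles it
```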
~13:12 Reference-style links and footnotes mean a token's meaning depends on declarations elsewhere in the document. [^1] looks like a footnote token in isolation, but if a matching definition exists later, the same syntax renders as a link instead. "That breaks purely context-free parsing assumptions."
~17:16 Proposed alternatives all fail: plain text ("can't show it to somebody who doesn't know what a null pointer dereference is"), MDX ("so busy trying to be HTML, it forgets it needs to be legible"), reStructuredText ("wonderful if you only read it and never write it"). The article's prescription, which Theo endorses: a custom-built tool with a sane unambiguous syntax, a real build system, no inline HTML (replaced by well-defined shortcodes), formal compile-time hooks, and a trivially parseable grammar. ~20:19 Historical context: in 2012, Jeff Atwood begged John Gruber to bless a Markdown standard. That plea went nowhere; the fragmentation between flavors is the direct result.
~11:09 The AI angle: "I've seen a lot of attempts to rethink how we parse Markdown because all of the agents spit out Markdown. We want to make things that are better, faster, more reliable, and all of them are just full of exploits, edge cases, vulnerabilities, and more."
This language is literally the C++ of markup languages. — I did not realize how much I hated Markdown and I use it every day.
Y Combinator dropped three Requests for Startups in a single day. Headline thesis across all three: AI is collapsing the cost of services, diagnostics, and field-level intelligence — so the next wave isn't tooling for incumbents, it's replacing the work itself.[19]YC RFS — AI-Native Services[20]YC RFS — Personalized Medicine[21]YC RFS — Low-Pesticide Ag
YC's argument: services like accounting, tax, and administration followed a three-phase arc — manual labor → software → AI co-pilots. Most 2023–2025 startups built the co-pilot layer. YC's current interest is the next step: AI-native companies that skip the software model entirely and simply deliver the service outcome. Total spend on services is many times larger than spend on software, and much of services spend is already outsourced — so an AI-native product can replace it more easily than it could displace embedded in-house workflows. Target verticals: insurance brokerage, accounting, tax, audit, compliance, healthcare administration. Many will embed a human professional alongside the AI for edge cases and trust.[19]YC RFS — AI-Native Services
Instead of giving you a tool, they just do the work.
Three convergent trends: (1) intelligent agents can analyze patient-specific data — genomics, EHR, wearables, diagnostics — at individual scale; (2) genome sequencing has fallen faster than Moore's Law and new early-detection diagnostics are coming online; (3) the cost of printing N-of-1 genetic medicines (e.g., personalized mRNA) is collapsing, with the FDA showing greater openness to N-of-1 trials. YC wants founders across the full stack: data ingestion, agent harnesses, therapeutic design, delivery logistics.[20]YC RFS — Personalized Medicine
Modern farming is on a chemical treadmill — pests evolve resistance, new pipelines are slow and expensive, margins erode. YC argues four shifts make this solvable: (1) computer vision can identify individual weeds and pests at field scale; (2) cheap sensors and cameras enable broad deployment; (3) precision robotics can act per-plant; (4) microbes, peptides, and RNA-based solutions can replace whole classes of synthetic chemicals. The benchmark: cut pesticide use by 90% while increasing yield. "That's not just a good business, that's a generational company."[21]YC RFS — Low-Pesticide Ag
Nate Herk packs 32 Claude Code tactics into 16 minutes — context hygiene with /init, /context, /compact; sub-agents on Haiku for cost; agent teams that share task lists; Git-worktree parallelism; ultra think for max thinking budget; the Context7 MCP for live library docs.[22]Nate Herk — 32 Claude Code tricks Separately, AICodeKing walks through free-claude-code, a 14k-star GitHub proxy that swaps Anthropic for NVIDIA NIM (free, 40 req/min), OpenRouter, DeepSeek, LM Studio, Llama.cpp, or Ollama — same Claude Code CLI, no API key needed.[23]AICodeKing — free Claude Code proxy
~00:00 Session setup: /init on every project to auto-generate CLAUDE.md; /status for a persistent terminal status line; /context for token-share breakdown; /compact at ~60% context. Always start in plan mode (Shift+Tab).
~03:01 Prompting: treat Claude like a junior dev — give it problems, not commands. Build verification steps into the to-do list ("don't move on until 95% confident"). Use /rewind to roll back. Push back on mediocre output: "Scrap that. Do a more elegant version" — Claude often returns a dramatically better result.
~05:03 Skills, CLAUDE.md, MCP economics: build custom skills as Markdown files in .claude/skills/; cap CLAUDE.md at 150–200 lines and route to external files. When tight on tokens, replace MCP servers with hardcoded API endpoints — MCP loads its full tool-definition list into context.
~05:03 Sub-agents and agent teams: run sub-agents on cheaper Haiku for high-volume tasks (e.g. scraping hundreds of articles); use Git worktrees (claude --worktree <feature-name>) to run 3–5 parallel sessions on the same project.
~09:05 Visual capabilities + remote: build-screenshot-refine loops; Chrome DevTools integration for front-end QA; feed inspiration screenshots with HTML/CSS to copy designs faithfully. Host on a VPS for always-on sessions; new feature lets you remote-control local sessions from your phone with code never leaving the machine.
~09:05 Power user: /hooks for notification sounds; explicit allowlist + denylist instead of --dangerously-skip-permissions; ultra think for ~32K-token thinking budget; the Context7 MCP for version-specific live library docs.
Anything in the deny list is going to take priority over anything in the allow list.
The free-claude-code proxy is a drop-in: point Claude Code's ANTHROPIC_BASE_URL at localhost:8082 and route to NVIDIA NIM (free, 40 req/min), OpenRouter free models, DeepSeek (cheap), or fully local via LM Studio / Llama.cpp / Ollama. Setup needs UV + Python 3.14; install via uvx, run fc0-init, edit .env, launch the proxy. Per-model mapping lets Claude's Opus/Sonnet/Haiku tiers route to different providers; rolling-window throttling and 429 backoff smooth free-tier limits. Includes Discord/Telegram bot mode for remote sessions with voice-note transcription.[23]AICodeKing — free Claude Code proxy
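The throttling piece is a standard rolling-window rate limiter; a minimal sketch (ours, assuming the 40 req/min NIM tier) looks like:

```python
import time
from collections import deque

class RollingWindowLimiter:
    """Allow at most max_requests per window_s, sleeping when full."""

    def __init__(self, max_requests: int = 40, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.sent: deque[float] = deque()

    def wait_for_slot(self) -> None:
        now = time.monotonic()
        while self.sent and now - self.sent[0] > self.window_s:
            self.sent.popleft()          # forget requests older than the window
        if len(self.sent) >= self.max_requests:
            time.sleep(self.window_s - (now - self.sent[0]))  # wait out the oldest
        self.sent.append(time.monotonic())

# If the upstream still answers 429, the proxy backs off — sleep and retry.
```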
The catch: you're not using real Claude — quality depends entirely on the backend model. Free-tier providers are rate-limited; local models need adequate hardware; OpenRouter's free model list is unstable. If exposed on a network, set ANTHROPIC_AUTH_TOKEN or anyone can use the proxy.
It is not just free Claude. It is not real Claude for free. The real value is choice.
Github Awesome's HN Show #5 cycles through 34 new projects in 14 minutes. The dominant theme is the agent-era tooling stack: isolated microVM sandboxes, credential brokers, MCP-aware knowledge bases, AI-aware web frameworks, encrypted vector DBs, and memory systems modeled on the Ebbinghaus forgetting curve.[24]Github Awesome — HN Show #5
Other entries cover panic-lock-style Touch ID disablers, Linux keyboard backlight notifications, Lean 4 polynomial inequality tactics via CVXPY, a transformer trained in HyperTalk on a 1989 Mac SE/30, a privacy-preserving encrypted vector DB (Xtrace), a sub-second microVM (SmallVM), context-engineering org-knowledge enforcement, and a CLI to export the macOS SF Symbols set as bit-perfect SVG/PDF/PNG. Full set: 34 projects.
Two complementary signals on how agent-era teaching is evolving. Matt Pocock's agent-skills repo takes a minimalist approach — small, single-purpose skill files for TDD, GitHub triage, and vertical-slice architecture, plus a language.md that defines architectural terms and explicitly forbids synonym substitution.[25]Github Awesome — Matt Pocock skills Google and Kaggle relaunch their free 5-day GenAI Intensive Course (June 15–19, 2026) with a new "vibe coding" framing — natural language as the primary programming interface — after the November 2025 first run reached 1.5M learners.[26]Google — Kaggle Vibe Coding course
The repo prioritizes small, sharp, single-purpose skill files over large catch-all prompt files. Core skill areas: TDD workflow, GitHub issue triage, and breaking architecture into vertical slices. language.md acts as a controlled vocabulary — defines architectural terms and explicitly forbids synonym substitution, keeping the AI behaving as a precision tool rather than a free-ranging chatbot.[25]Github Awesome — Matt Pocock skills
Free, 5-day, fully online, June 15–19, 2026. Conceptual deep dives + hands-on coding + capstone project. Goal: design, build, and deploy production-ready AI agent systems ("10x agents"). Led by Anant Nawalgaria (Group PM, founder of the GenAI Intensive program) and Frank Guan (PM Head for AI Agents) at Google. Registration on Kaggle.[26]Google — Kaggle Vibe Coding course
China's National Development and Reform Commission ordered Meta to unwind its $2 billion acquisition of Manus, a Chinese-founded AI agent startup that had relocated to Singapore in summer 2025. Manus CEO Xiao Hong and Chief Scientist Ji Yichao were summoned to Beijing in March and barred from leaving the country. The decision signals that relocating to Singapore — a popular "China-shedding" strategy — no longer shields Chinese-founded startups from Beijing's oversight.[27]Tech Brew — China blocks Manus
Meta announced the Manus acquisition in December 2025. China's Ministry of Commerce opened a probe in January 2026 over potential violations of export-control and cross-border tech-transfer regulations; by March, the founders were under regulatory review. Meta's response: the deal "complied fully with applicable law." Whether Beijing can enforce the order against a company no longer headquartered in China remains legally unclear.[27]Tech Brew — China blocks Manus
The blockade fits broader US–China AI escalation: White House accusations of "industrial-scale" AI model theft via distillation, House bills tightening chip export controls, and Beijing's new requirement that government approve before its tech firms accept US investment. Separately, Tech Brew flagged a Meta deal to harness solar power beamed from space at night for AI data centers (no terms or partner disclosed).
Intel shares surged 23.6% — their best single day since October 29, 1987 — after Q1 earnings beat expectations and Q2 guidance came in at $13.8–$14.8B vs. $13B consensus, under CEO Lip-Bu Tan. The Philadelphia Semiconductor Index (SOX) closed up for an 18th consecutive day, with Arm, AMD, Qualcomm, and Nvidia all participating.[4]Sherwood Snacks — Chips Ahoy!
Three short rants from engineering veterans. The original author of Airflow XComs (now at Prefect) admits the feature was a weekend metadata hack that people ended up pushing gigabytes through, and the new Airflow major version is still built on it.[28]Prefect — XComs were a hack Martin Kleppmann cites a study that split junior engineers by AI use — the AI group produced output faster but learned little to nothing.[29]Pragmatic Eng — Kleppmann Real Python defends clean code as a managed spectrum, not a binary.[30]Real Python — Clean Code
The Prefect speaker designed XComs (cross-communication) as a lightweight way to pass small metadata — like file names — between Airflow tasks. Users started pushing dataframes and gigabytes through them anyway, and by then the pattern was entrenched. As Airflow prepares a new major version, it's still built on this same weekend-prototype protocol. The contrast: Prefect was designed from the start to handle data natively.[28]Prefect — XComs were a hack
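The intended use looks like this in modern Airflow (TaskFlow API; the DAG and path are illustrative) — return a pointer, not the payload:

```python
from airflow.decorators import dag, task

@dag(schedule=None)
def etl():
    @task
    def extract() -> str:
        # Write the heavy data to storage; return only its location.
        return "s3://bucket/raw/2026-04-27.parquet"  # travels via XCom — fine

    @task
    def load(path: str) -> None:
        print(f"loading {path}")  # fetch the big data from storage, not XCom

    load(extract())

etl()
```

The anti-pattern is returning the dataframe itself, which stuffs gigabytes through the metadata database.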
People would start moving dataframes, gigabytes of data from one task to another — but by then it was sort of, you know, the horse was out of the barn.
"We ask them to write essays because we want them to go through a thought process which helps them learn something." When AI short-circuits the process, the artifact looks correct but the learning is gone. The cited study on junior engineers found the AI group produced work faster but with little to no learning gain. The pedagogical implication: the right level of friction is what creates durable understanding; assessments and workflows need redesigning so students still do the cognitive work.[29]Pragmatic Eng — Kleppmann
Real Python pushes back on "nobody cares about your clean code": clean code and technical debt exist on a spectrum that requires active management. Businesses pressure devs to ship fast, but unchecked dirty code accumulates debt that eventually kills velocity. Pragmatism (ship now) vs. purism (do it right) is a balance, not a religion — neither abandon maintainability nor gold-plate for reuse that never happens.[30]Real Python — Clean Code
Short clips and demos worth a minute each.
Better Stack pitches TanStack Start as a client-first alternative to Next.js's server-first React Server Components. Instead of sprinkling use client to opt into the client, you fetch a server component the same way you fetch JSON — through a server function via a normal route loader. Three new functions to learn; "composite components" (slot-style) prevent the common Next.js pitfall of server components wrapping client components.[31]Better Stack — TanStack Start
Better Stack's pitch: zero-code-change instrumentation via eBPF and OpenTelemetry, 80x compression on stored telemetry, and an AI co-pilot for incident resolution.[32]Better Stack — Pitch Bonus history short: in 1988, Cornell grad student Robert Morris created the first internet worm to count connected machines; with no skip-already-infected logic, it reinfected hosts indefinitely and crashed the early internet — making Morris the first person convicted under the Computer Fraud and Abuse Act.[33]Better Stack — Morris Worm
Arjay walks through six DB types and when to reach for each: relational (Postgres/MySQL/SQLite — default), key-value (Redis/DynamoDB — fast lookups, hot-key risk), wide column (Cassandra/Bigtable — heavy writes), object storage (S3 — files), vector (Pinecone/pgvector — RAG semantic search), and graph (Neo4j — only when relationships are the entire product).[34]Arjay — 6 databases Companion short on slow writes: write-behind caches respond after writing to cache only and sync to DB asynchronously — lower latency in exchange for non-zero data-loss risk if the cache or background job fails.[35]Arjay — Slow writes
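A minimal write-behind sketch (ours; db_write stands in for any durable store) makes the latency/durability trade explicit:

```python
import queue
import threading

cache: dict[str, str] = {}
flush_q: queue.Queue[tuple[str, str]] = queue.Queue()

def db_write(key: str, value: str) -> None:
    ...  # e.g. an UPSERT against Postgres — the slow, durable path

def write(key: str, value: str) -> None:
    cache[key] = value         # fast path: caller is acknowledged here
    flush_q.put((key, value))  # durability is deferred to the flusher

def flusher() -> None:
    while True:
        key, value = flush_q.get()
        db_write(key, value)
        flush_q.task_done()

# Anything still sitting in flush_q when the process dies is lost —
# that queue is exactly the data-loss window the video warns about.
threading.Thread(target=flusher, daemon=True).start()
```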
marimo's "Eat Your Own Tokens" video walks through a recent paper claim — fine-tuning an LLM on its own unlabeled outputs (no RL, no labels) can still improve performance — and demonstrates the underlying intuition with a synthetic noisy-label classification experiment in a marimo notebook. Even with 30% true labels (70% noise), the model reaches ~70% accuracy on clean test data; with 10% signal it still hits ~70% given enough volume. The principle: random noise cancels at scale, leaving only signal as a learnable gradient.[36]marimo — self-distillation paradox Separately, marimo released a nested-table widget via the Wiggly Stuff library, with reactive cell updates on selection.[37]marimo — Nested tables
Brief promotional clip showing a legal professional using ChatGPT to clear a case-file backlog — a use case ad rather than substantive content.[38]OpenAI — Ritu Case Files