May 16, 2026
Simon Willison ran a custom Python tool over the OpenClaw README's Git history and discovered the project has cycled through six names in roughly six months — Warelay, CLAWDIS, CLAWDBOT, Clawdbot, Moltbot, and finally OpenClaw[1]Simon Willison: Warelay -> OpenClaw.
The script (first_line_history.py, built for a PyCon US 2026 lightning talk) walks every change to the README's H1. The name progression tracks the project's drift in scope: it started as "Warelay — WhatsApp Relay CLI" and ended as "OpenClaw — Personal AI Assistant." This is the same project Tejas Kumar, Ben from Zo Computer, and Vincent of the OpenClaw Foundation referenced at AI Engineer Singapore; Peter, its original creator, is now rebuilding it on the Codex foundation at OpenAI.
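The general technique fits in a few lines of Python. This is not Willison's first_line_history.py. It is a minimal stand-in that assumes:

- the file is README.md and was never renamed;
- the title is an ATX-style "# " heading.

The helper names (first_h1, h1_changes, readme_versions) are hypothetical.

```python
from __future__ import annotations

import subprocess


def first_h1(text: str) -> str | None:
    """Return the first ATX-style H1 ("# Title") in a Markdown document, or None."""
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return None


def h1_changes(versions: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Given (commit, readme_text) pairs oldest first, keep only commits where the H1 changed."""
    changes = []
    previous = None
    for commit, text in versions:
        title = first_h1(text)
        if title is not None and title != previous:
            changes.append((commit, title))
            previous = title
    return changes


def readme_versions(repo: str, path: str = "README.md") -> list[tuple[str, str]]:
    """Collect every committed version of path, oldest first, using plain git commands."""
    shas = subprocess.run(
        # --diff-filter=d skips commits that deleted the file, so `git show` below succeeds.
        ["git", "-C", repo, "log", "--reverse", "--diff-filter=d", "--format=%H", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return [
        (sha, subprocess.run(
            ["git", "-C", repo, "show", f"{sha}:{path}"],
            check=True, capture_output=True, text=True,
        ).stdout)
        for sha in shas
    ]
```

Inside a clone, `h1_changes(readme_versions("."))` would list one (commit, title) pair per rename.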
Simon Willison quotes Julia Evans making the case that most "CSS is broken" complaints stem from incomplete knowledge rather than flawed design — with Tailwind cited as the exemplar of dodging the underlying skill[2]Simon Willison: Quoting Julia Evans.
Evans' argument: many of the perceived limitations (centering is the canonical example) have already been solved in the language. The friction is that most developers stopped learning CSS once utility frameworks abstracted it away. The implicit point: mastering fundamentals beats stacking abstraction on top of abstraction.
CSS is hard because it's solving a hard problem!
Google DeepMind is prototyping an AI-powered mouse pointer that uses Gemini to understand what you're pointing at and act inline — point at an image to edit, a table to convert to a chart, a PDF to summarize — without switching apps or detouring into a chatbot window[3]Google DeepMind: Reimagining the mouse pointer for the AI era.
Adrien Baranes and Rob Marchant lay out four interaction principles: maintain workflow across apps, capture visual context automatically, support natural shorthand ("Fix this"), and transform raw pixels into actionable entities (dates, locations, structured data). Demos include scaling recipe ingredients, finding map locations, summarizing PDFs for an email, and turning data tables into pie charts. The underlying argument echoes Connor of Hyperspell at AIE Singapore Day 2: AI capabilities should be ambient and contextual, not gated behind a separate UI surface.
AI capabilities should work across all apps, not force users into 'AI detours' between them.
Nvidia's Jensen Huang told Carnegie Mellon grads to run, not walk, toward AI[4]Morning Brew: Graduation speakers can't stop talking about AI. Magic Johnson told Stillman and Tuskegee grads to grind hard and master it. Delta CEO Ed Bastian got applause at Emory by admitting he drafted his speech with AI, decided it lacked "soul or warmth," and rewrote it by hand. A real estate exec who called AI "the next industrial revolution" got booed at UCF's College of Arts and Humanities.
The vibe gap between technologists and humanities grads is the actual story. Young creatives at UCF told the New York Times that Gloria Caulfield's optimism fell flat because they're worried AI is replacing them. Country singer Eric Church bypassed AI entirely with a viral "six strings" speech at UNC Chapel Hill, using a guitar metaphor to talk about faith, family, marriage, community, ambition, resilience, and personal authenticity. Pairs neatly with yesterday's Morning Brew on Gen Z pivoting to the trades.
Day 1 of AI Engineer Singapore (Sherry's 65 Labs grassroots event, with OpenAI and ZAI as diamond sponsors) ran ~10 hours and packed in 30+ talks[5]AI Engineer: AIE Singapore Day 1. Singapore's Foreign Minister Vivian Balakrishnan opened by demoing a personal AI agent he built on a Raspberry Pi. Stripe revealed 91% of its engineers ship AI code daily. OpenAI's Tibo said Codex auto-review has cut internal approval prompts 20× and previewed the post-chat future. swyx closed by announcing Cognition's Singapore Asia HQ and declaring Singapore "the agentic nation in the making."
~40:09 Dr. Vivian Balakrishnan (Singapore Minister for Foreign Affairs). Calls himself a "retired eye surgeon" tinkerer. Built a personal second brain using NanoClaw on a Raspberry Pi with 8GB RAM, communicating via WhatsApp through Baileys, with Nemean for graph-based memory, Ollama for embeddings, Whisper for speech, and Obsidian over iCloud as the UX. Three takeaways: (1) you can outsource computation, memory, replication, and dissemination — but not personal understanding or accountability; (2) value is created at the ground level workflow-by-workflow (cites Neil Lawrence: "tools matter more than models"); (3) "you cannot govern a technology that you have only been briefed on." Hot takes: tokens are subsidized; sympathetic to Yann LeCun on LLMs alone not being the path; Singapore should be at the frontier of deployment at scale, not model development.
You cannot govern a technology that you have only been briefed on.
~70:01 Gabriel Cohen, NanoClaw creator. 30k stars in 3 months. Agents operate in the "red zone": they can be turned into double agents at any moment by prompt injection. Three principles:

- isolation (containerize);
- keep credentials outside the agent (proxy requests through a vault that swaps a literal placeholder in the "Authorization: Bearer" header for the real token);
- human-in-the-loop approval, where the tool call is made inside the agent but executed outside it.

Hot take: "instructions are not for security." Telling the agent "never run drop database" just signals it has done so before.
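A minimal sketch of the credential-swap idea, assuming a proxy sits between the agent and the outside world. The placeholder string, the Vault class and swap_credentials are hypothetical names, not NanoClaw's API. A real deployment would back this with an actual secrets manager.

```python
from __future__ import annotations

# Hypothetical placeholder the agent is allowed to see; the real secret never enters its context.
PLACEHOLDER = "AGENT_TOKEN_PLACEHOLDER"


class Vault:
    """Toy secret store keyed by destination host (stands in for a real vault)."""

    def __init__(self, secrets: dict[str, str]):
        self._secrets = secrets

    def token_for(self, host: str) -> str:
        return self._secrets[host]


def swap_credentials(host: str, headers: dict[str, str], vault: Vault) -> dict[str, str]:
    """Proxy-side step: replace the placeholder bearer token with the real one for this host."""
    outgoing = dict(headers)  # copy, so the agent's own request object is never modified
    auth = outgoing.get("Authorization", "")
    if auth == f"Bearer {PLACEHOLDER}":
        outgoing["Authorization"] = f"Bearer {vault.token_for(host)}"
    elif auth:
        # The agent tried to supply its own credential: refuse instead of forwarding it.
        raise PermissionError("agent-supplied credentials are not allowed")
    return outgoing
```

The design choice is that a fully compromised agent still never sees the token. At worst it can make authorized calls to hosts the vault already serves.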
~181:09 Vran Yukich (Daytona). OWASP and OpenAI both say prompt injection cannot be fully prevented. Findings on skill marketplaces:

- KO Security found 341 bad skills in the OpenClaw skills marketplace (a count that later grew to 800+).
- Snyk found 13% of skills had serious problems, 76 of them clearly malicious.
- An academic paper checked 98k skills and found 157 malicious.

A real sandbox keeps secrets outside the agent, restricts network access via an allow-list proxy, logs everything, and intercepts model calls via a gateway.
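The allow-list proxy can be sketched as a single egress check that logs every decision. The host list and function name are hypothetical. Matching is on exact hostnames, so a lookalike such as pypi.org.attacker.example is rejected rather than treated as a subdomain match.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical allow-list for a sandboxed coding agent.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "api.github.com"}


def check_egress(url: str, audit_log: list[dict], allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Decide whether an outbound request may leave the sandbox, and log the decision."""
    host = (urlsplit(url).hostname or "").lower()  # exact host match, no suffix matching
    permitted = host in allowed
    audit_log.append({"url": url, "host": host, "allowed": permitted})
    return permitted
```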
~84:12 Tibo (OpenAI Codex lead, remote). Singapore is top-5 globally for Codex adoption. Walked through GPT-5.1-codex-max → 5.5 evolution: end-to-end RL for compaction across context windows, Windows-native, 30% fewer thinking tokens, 1M context at 5.4, computer use and token efficiency at 5.5. 4M+ weekly active Codex users, 99.9% availability. Big new feature: auto review uses a second agent to verify the first against original intent — reduced approval prompts inside OpenAI by 20×. Sea Limited going all-in on Codex; Nvidia rolled out to 45k employees in 2 weeks. OpenAI rebuilt OpenClaw on the Codex foundation with original creator Peter. Roadmap: Chronicle (screen-following memory), massive multi-agent systems, and post-chat interfaces — "chat feels like a default we inherited from LLMs."
~107:31 Dr. Fang Yang (GovTech Singapore). Building a sovereign agentic harness for ~150k officers: MCP gateway, agent identity, memory, observability, and a skills platform. Car analogy: "AI models are like car engines — they need a harness to be truly useful and trustworthy."
~233:59 Mark Doyle (Stripe). Stripe processes ~2% of world GDP. 91% of Stripe engineers merge AI-written code daily; AI PRs up 500% YoY. "Minions" are Stripe's one-shot agents — Slack-native, run on remote 64–128GB devboxes against Stripe's 90GB, 300M-line monorepo. Architecture: analyzer → coder → LLM judge (only original prompt + git diff, no poisoned context) → diagnostic feedback. ~65% one-shot merge rate, 3,000 PRs/week merged. Takeaways: prefer deterministic instructions over "PLEASE PLEASE" prompts; dev tools (Sorbet, linters) are critical infra; build agents in Slack for visibility.
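The judge's isolation is the interesting design choice, and plain callables can stand in for the three agents. Everything here (Attempt, run_minion, the round limit) is a hypothetical illustration, not Stripe's code. It shows that the judge's input is built from the original prompt and the diff alone, so nothing the coder accumulated in its transcript reaches the verdict.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attempt:
    diff: str
    transcript: list[str] = field(default_factory=list)  # coder's working context, kept from the judge


def build_judge_input(original_prompt: str, attempt: Attempt) -> str:
    """The judge sees only the task and the diff, never the coder's transcript."""
    return f"TASK:\n{original_prompt}\n\nDIFF:\n{attempt.diff}"


def run_minion(
    prompt: str,
    analyze: Callable[[str], str],
    code: Callable[[str, str, str | None], Attempt],
    judge: Callable[[str], tuple[bool, str]],
    max_rounds: int = 3,
) -> Attempt | None:
    """analyzer -> coder -> judge, feeding the judge's diagnostics back to the coder."""
    plan = analyze(prompt)
    feedback = None
    for _ in range(max_rounds):
        attempt = code(prompt, plan, feedback)
        ok, diagnostics = judge(build_judge_input(prompt, attempt))
        if ok:
            return attempt
        feedback = diagnostics
    return None  # give up after max_rounds; a human takes over
```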
~190:30 Vishant & Rohan (Greptile). Analyzed 5M PRs. 27.6% of April 2026 PRs show strong evidence of being fully vibe-coded, and the share is climbing. Some bots (Claude, Codex) have lower revert rates than humans, especially on larger PRs. Bots produce fewer critical bugs on average but differently distributed: Cursor background agents produce more N+1 query errors, Claude agents more missing-tenant-check errors. Devin and Claude get merged in fewer review rounds than humans.
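For readers unfamiliar with the N+1 pattern, here is a self-contained illustration using Python's built-in sqlite3 module. The schema and data are made up for the example.

- The first function issues one query for the list plus one per row.
- The second fetches the same data with a single JOIN.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """In-memory database with a made-up authors/posts schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO posts VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c'), (4, 1, 'd');
        """
    )
    return conn


def titles_with_authors_n_plus_one(conn):
    """N+1 pattern: one query for the posts, then one extra query per post."""
    queries = 1
    rows = conn.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall()
    result = []
    for title, author_id in rows:
        (name,) = conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
        queries += 1
        result.append((title, name))
    return result, queries


def titles_with_authors_join(conn):
    """Same data in a single round trip."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1
```

With four posts the difference is 5 queries versus 1. In an ORM over a real network the gap grows linearly with the row count.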
~199:40 Yunong Zhang (Sonar). The CRAP benchmark turns human PR reviews into executable tests. Finding: current AI reviewers catch only 41.5% of what humans flag; AI is strong at robustness/testing and weak at maintainability/design. Recommends layered review: AI first, humans on specific categories.
~485:37 Jackman Eng (Prime Intellect). Recursive Language Models — pass context as a reference like a Jupyter DataFrame, let the model use Python loops and sub-agent delegation instead of autoregressive compaction. Ramp Labs trained a small Qwen RLM that beat Opus 4.6 on Excel retrieval at lower cost and latency.
~161:04 Jimmy Lai (Vercel, Next.js). Next.js went from 4M to 42M downloads/week. 60% of Next.js docs are now served as markdown: software is becoming the primary user of software. "Stop chatting with agents." Having 10 chats gives dopamine but bottlenecks you; invest in correct prompts, evals, and safeguards to scale to 100 background agents. On forking frameworks: "You can fork Next.js over a weekend with a bunch of tokens. Just because you can doesn't mean that you should." If you fork, you own React2Shell-level security responses forever. Closing line: "AI made creation really cheap, but ownership is much more expensive than you think."
~221:47 Max Buckley (xAI). Existential talk pinned to Nov 24, 2025, the day Claude Opus 4.5 was released. Proof-of-work signaling is collapsing: a well-written LinkedIn message now signals LLM use, not eloquence; typos signal intentionality. GitHub commits are up 14× YoY (2026 vs 2025), and 2025 was itself up 4× on 2024. Pre-Nov 2025 assumptions (code is hard, slow, expensive, every feature has opportunity cost) are dead. Now: build all 30 ideas, eval, revert the rest. "The question is no longer can you build it. The question is what should exist."
~211:11 Eugene Cheah (Featherless AI / RWKV). Live-demoed Qwen 2.7B and Gemma 3 1B running on a laptop, building a web game with Cline. "Open models on laptops have crossed GPT-4 on coding." By his account, both models he ran already surpass GPT-4. Featherless raised a $120M Series A. "The problem is not the models, it's us. Just build."
~332:03 Ryo Lu (Cursor, Head of Design). Opens with a Bill Atkinson and Alan Kay framing: design and code were once one craft. Cursor 3 was rewritten from scratch, moving from file-centric to agent-native. The "black box vs glass" choice landed firmly on glass: every action, artifact, and plan is visible and editable. The team built "Baby Cursor 3" as a prototyping environment in about a month, then rewrote the UI in React. Now they use Cursor to build Cursor, and designers go back to code instead of Figma. "No black boxes, stay glass."
~354:13 Ashen (Figma, Figma Weave). Most agentic tools have chat-left/artifact-right — an interface for convergence, not divergence. Demoed a multiplayer canvas where audience members joined to co-edit AI-generated mini-games via prompts. Argues for embodied presence of agents (cursors with body language) and multimodal input.
~398:08 Andrew Tan (Groq Cloud). Custom LPU silicon, 800k active devs, 7× token growth last year, 10 data centers, ~15 inference regions. Global load balancers share queue-time info every 100ms.
~410:56 Daria Soboleva (Cerebras). MoE 101. DeepSeek V3 (671B total / 37B active) showed MoE's promise, but all-to-all communication between GPUs kills the speedup. Cerebras CS-2: 44GB on-chip SRAM vs B200's 126MB. Their BTA (Batch Tiling on Attention) decouples the attention batch from the FFN batch. That restores the theoretical MoE speedup, so a 671B MoE runs at the speed of a 37B dense model. "MoE is the fastest way to AGI."
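A toy illustration of why only a fraction of an MoE model's weights do work per token: a router scores every expert, but only the top-k run. This is a generic softmax-over-top-k sketch with scalar "experts". It is not DeepSeek V3's actual router or Cerebras' BTA. For the quoted DeepSeek V3 figures, 37B active out of 671B total is roughly 5.5% of the parameters per token.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, k=2):
    """Route one token to its top-k experts; the other experts are never evaluated."""
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over the chosen experts only
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per token."""
    return active_params_b / total_params_b
```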
~450:39 Taska (ZAI / GLM). GLM 5.1 is close to Opus 4.6 and is used inside Claude Code, Cursor, Kilo Code, and OpenClaw. Hot takes on long-horizon tasks: long horizon ≠ long context (GLM 5.1 has 200k); what matters is the depth of meaningful improvements, not duration. Three failure modes on long runs:

- drift from the goal (use checklists);
- error accumulation (use checkpoints with self-verification);
- inability to pivot ("models never give up sometimes").

At a 48-hour hackathon, 7 of 9 winners ran GLM 4.1 overnight while they slept.
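The checklist and checkpoint advice combine naturally into one loop. This sketch assumes the caller supplies step and verify (for example, a model call and a test run). On repeated verification failure the loop stops and reports progress rather than building on a bad state. The names are hypothetical.

```python
from __future__ import annotations

import copy
from typing import Callable


def run_long_task(
    state: dict,
    checklist: list[str],
    step: Callable[[dict, str], dict],
    verify: Callable[[dict, str], bool],
    max_retries: int = 2,
) -> tuple[dict, list[str]]:
    """Work through a checklist; checkpoint after each verified item, discard failed attempts."""
    checkpoint = copy.deepcopy(state)
    done: list[str] = []
    for item in checklist:  # the checklist keeps the run anchored to the goal
        for _ in range(max_retries + 1):
            candidate = step(copy.deepcopy(checkpoint), item)  # always start from the last good state
            if verify(candidate, item):  # self-verification before committing
                checkpoint = candidate
                done.append(item)
                break
        else:
            # Retries exhausted: stop and report instead of piling errors on a bad state.
            break
    return checkpoint, done
```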
~366:34 Salem (Menlo Research). Open-source humanoid robot K-Scale "Asimov": 200+ factories worldwide reached out wanting to build it, and it drew $1M in preorders in 2 days. A custom robot processing unit runs Asimov's three laws as a functional safety model locally, with the git hash burnt to CRC. Safety policies are decided by the community through distributed consensus, "like Bitcoin."
~377:34 Alberto (Reactor, ex-Luma AI co-founder, ex-Apple Vision Pro). Demoed real-time world model generation at 30fps controllable from keyboard — including a live-generated video of "Jensen walking through Nvidia HQ." World models = long-term memory + real-time + causal thinking + interactivity = a state machine, not a passive generator.
~388:21 Jan Liphardt (Openmind). Building an Android-equivalent open OS for embodied AI. Wiener's Cybernetics framing: the first revolution devalued the human arm, the second devalues the brain; the ultimate frontier is caregiving. Long-term care residents in the US average 2 minutes of social interaction per day. Eval metric: "smiles, tears, trust, and memories."
~497:51 Michelle Julia (Blue Labs). Three research findings on AI relationships: models cooperate fine in self-interested games but fail at coordination games; Bayesian agents are aggressive (80% max surplus), humans are fair, LLMs are concessionary (accept every deal); ACL paper on state-vs-trait — who you are right now matters more than your generic personality.
~577:48 swyx / Shawn Wang (Cognition advisor, AI Engineer founder, Singaporean). AIE now serves 1.5M unique devs/month, with 9k on the livestream. Cognition made three non-obvious bets: choose code (the king modality), bridge sync and async, and focus on enterprise. Spicy Singapore/APAC reveals:

- a local bank with 2 million lines of undocumented COBOL and no engineers owning it;
- $100M AI budgets;
- 600-developer rollouts;
- loans run on spreadsheets by transient business analysts.

Cognition is making Singapore its Asia HQ and acquired Havana. Closing argument: Singapore has a 4× demand-to-supply gap for AI engineering talent, growing 40% YoY. "I've given up on the government… it is only when we the people of Singapore decide to take matters into our own hands."
Vincent of marimo and Isaac Flath demo Marimo Pair: a CLI coding agent (OpenCode with Kimi K2.5) driving a hosted Marimo notebook in a cloud sandbox[6]marimo: Python Variables: Now with 100% Agent and RLMs - with Isaac Flath. In the demo they:

- build algorithmic art;
- generate ad-hoc anywidget canvas widgets on the fly;
- summarize Winnie-the-Pooh across two parallel agents;
- explore a WoW bot-detection dataset.

The conversation widens into agentic coding hygiene. Notebooks counter the "ivory tower" drift of vibe coding. RLMs (Recursive Language Models) use a live Python REPL as durable agent memory. A worse/faster model keeps you honest.
~00:00 Marimo Pair demo. A local OpenCode CLI running Kimi K2.5 connects via uvx marimo to a remote notebook. The agent reads live notebook context (all globals, including a slider widget) and writes code into a scratchpad before pasting cells in. It reads the slider value (7) and moves it to 2 on command. npx skills installs the Marimo Pair skill, which is already outpacing the older notebook skill in usage.
~07:04 Sandbox caveats. OpenCode itself still touches the local FS to read skill files; for stricter isolation, use the pi agent with whitelisted read/write. When Vincent uploads a CSV to the cloud notebook, the agent initially hallucinates against a local path before realizing the file lives server-side.
~14:08 Algorithmic art + Exa-powered library discovery. The agent builds a Clifford attractor scatter with two sliders. Then Vincent asks for a 2D slider from wigglystuff, a niche library Kimi K2.5 has never seen. OpenCode's web fetch (powered by Exa) finds the LLM-friendly markdown docs, installs the library, and wires it up. Hot take: "A lot of people are sleeping on the fact that these widgets can be generated on the fly now."
~24:16 Custom anywidgets on demand. Vincent prompts the agent to build a black-and-white canvas drawing widget exposing the drawing as a PIL Image. Kimi writes the JS, installs deps, produces a working widget. For bespoke analyses where existing widget Lego bricks don't fit, the model can manufacture the brick itself in seconds.
~30:21 RLMs: why a live REPL beats one-shot scripts. Demo: load Winnie-the-Pooh from Project Gutenberg, split into 11 chapters via regex, then spawn a second agent in parallel that summarizes each chapter into a dict. The second agent's context window stays at ~20k while the first is at ~70k — concrete offloading. Vincent's reframing: "You don't have to imagine what the pipeline steps are. You just read from memory."
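The chapter-splitting and fan-out step can be sketched as follows. The regex assumes headings of the form "CHAPTER I", "CHAPTER II" on their own lines. summarize is a hypothetical stand-in for the second agent. Only the short summaries return to the caller, which is what keeps the first agent's context small.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed heading format: a line starting "CHAPTER" followed by a Roman numeral.
CHAPTER_RE = re.compile(r"^CHAPTER [IVXLC]+\b.*$", re.MULTILINE)


def split_chapters(book: str) -> list[str]:
    """Split on chapter headings; any front matter before the first heading is dropped."""
    starts = [m.start() for m in CHAPTER_RE.finditer(book)]
    bounds = starts + [len(book)]
    return [book[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def summarize_all(chapters: list[str], summarize: Callable[[str], str], workers: int = 4) -> dict[int, str]:
    """Fan chapters out in parallel; only the summaries come back, keyed by chapter number."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarize, chapters))  # map preserves chapter order
    return {i + 1: s for i, s in enumerate(summaries)}
```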
~45:33 When the agent runs too far ahead. On a WoW bot-detection dataset the agent goes wild producing charts. Isaac's framing of the failure mode: "It's really easy to go from very careful architected stuff to find out an hour later you're actually just kind of devolved into vibe coding." Vincent's life hack: switch the notebook to column layout and give each column a separate exploration theme (sessionization, ML-based bots, rule-based heuristics, guild effects).
~54:46 Hot takes on harnesses. Isaac uses Codex day-to-day and Pi for experimentation, and reads the Pi harness source when learning techniques. Strong opinion: "If things are not going well for you, I don't think your problem is that you weren't using the right CLI harness… it's not that you're using the wrong hammer, it's you're holding it wrong." Vincent's counter: he deliberately uses a worse model like Kimi K2.5 because Claude Code and Codex are so good they tempt blind trust. Pay-per-token also disciplines him out of "token maxing."
~59:48 Learning vs shipping. Isaac: "You can listen to someone speak Spanish for an hour, but it doesn't mean that you're going to be speaking it afterwards." For a commercial system, know enough to vet — and please learn auth/CSRF basics before letting an agent touch them. Karpathy quote referenced: "An LLM might be able to do the thinking but I want to be able to do the understanding."
Marlene Mhangami (Microsoft / GitHub) argues AI productivity gains hinge on clean codebases[7]AI Engineer: Beyond Code Coverage — Functionality Testing with Playwright. Cites the Stanford study of 120,000 developers from last year's AIE: under unchecked AI in messy codebases, PR throughput rose but rework consumed the gains — net ~1% effective improvement. The fix: behavior-focused TDD with Playwright and coding agents that test real functionality, not implementation details.
~00:15 GitHub Octoverse: 2025 saw ~1B commits (a record); 2026 is on pace for ~14B, with a growing share co-authored by agents. ~02:16 Stanford study finding: clean codebases amplify AI; messy codebases amplify entropy.
~04:18 Red/green/refactor in Simon Willison's flavor: write a failing test for a new feature, get it to pass quickly, then refactor. ~06:19 2014's DHH "TDD is dead" critique still applies: overindexing on unit tests means renaming a calculate method breaks tests even when behavior is intact. Test stable contracts (APIs, exported modules) and end-to-end behavior. In the AI era, agents also tend to generate self-affirming unit tests that pass without validating system behavior.
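Here is a small sketch of testing a stable contract rather than implementation details, using the standard library's unittest. The cart example is hypothetical. The tests call only the public cart_total function, so:

- renaming or inlining the private helper leaves them green;
- a change in behavior turns them red.

```python
from __future__ import annotations

import unittest


def _apply_discount(subtotal: float, code: str | None) -> float:
    # Private helper: free to rename or inline during a refactor.
    return subtotal * 0.9 if code == "SAVE10" else subtotal


def cart_total(prices: list[float], code: str | None = None) -> float:
    """Public contract: cart total after any discount, rounded to cents."""
    return round(_apply_discount(sum(prices), code), 2)


class CartTotalBehaviour(unittest.TestCase):
    # These tests only call cart_total, so refactoring _apply_discount cannot break them.
    def test_no_code(self):
        self.assertEqual(cart_total([10.0, 5.5]), 15.5)

    def test_discount_code(self):
        self.assertEqual(cart_total([10.0, 10.0], "SAVE10"), 18.0)

    def test_unknown_code_is_ignored(self):
        self.assertEqual(cart_total([3.0], "BOGUS"), 3.0)
```

Run it with `python -m unittest` from the file's directory.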
~07:20 Playwright + AI. End-to-end browser testing for Python/TypeScript/C#, headed or headless. Integration options: the Playwright MCP server, the CLI, and Playwright agents (planner, generator, healer). ~10:24 Live demo at the fictional Tail Spin Toys, all hands-off:

- GitHub Copilot CLI plus Microsoft's WorkIQ skill pulls the PM email from M365 into the terminal.
- The agent writes failing tests for a search bar and category/price filters.
- It writes the code to make them pass.
- Playwright drives the browser to verify.
~15:33 Best practices. Attach Playwright screenshots to PRs; commit before letting the agent fix tests so it can recover context; one Playwright test per feature; headless mode for background runs. Q&A: responsive testing works across viewports; native Mac/iOS apps not yet supported.
Chris Lovejoy (Notius Labs, formerly Anterior and Tandem) argues that winning in vertical AI is an organizational problem, not a model problem[8]AI Engineer: Chris Lovejoy — How to Leverage Domain Expertise. Three roles for domain experts — Oracle, Evaluator, Architect — with explicit decision criteria for choosing and evolving between them. Gartner says ~50% of generative AI projects were abandoned last year; most failures map to not hiring, mis-hiring, or mis-fitting domain expertise.
~03:09 The three mistakes. Not hiring domain experts (or hiring them too late); hiring the wrong kind; not fitting them into the org properly. Appraising AI quality requires judgment rooted in domain expertise — formal (doctors, lawyers) or informal.
~05:10 Oracle / Evaluator / Architect. Oracle embeds expertise directly — reviewing outputs, tweaking prompts, iterating manually. Evaluator defines metrics and builds quality measurement systems (user metrics, expert panels, LLM-as-judge), collaborates with engineers. Architect designs self-improving systems with minimal human-in-the-loop.
~08:11 Decision guide. Can you measure performance in objective metrics? If no → Oracle. If yes → is manual iteration fast enough? If yes → Evaluator. If no → Architect. Startups should typically start at Oracle and evolve.
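The two-question guide is small enough to write down directly. The enum and function names below are illustrative, not from the talk.

```python
from enum import Enum


class ExpertRole(Enum):
    ORACLE = "Oracle"
    EVALUATOR = "Evaluator"
    ARCHITECT = "Architect"


def choose_role(objective_metrics: bool, manual_iteration_fast_enough: bool) -> ExpertRole:
    """Encode the decision guide: metrics first, then iteration speed."""
    if not objective_metrics:
        return ExpertRole.ORACLE  # quality can only be judged by the expert directly
    if manual_iteration_fast_enough:
        return ExpertRole.EVALUATOR  # measure, then iterate by hand
    return ExpertRole.ARCHITECT  # measurement exists but humans can't keep up: automate the loop
```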
~10:11 Granola (Oracle). First employee Joe (writer/journalist) wrote all the prompts and remains the primary quality gatekeeper at billion-dollar scale. Works because there's no objectively perfect meeting note.
~12:11 Tandem (Decentralized Oracle). Started with one doctor reviewing notes; scaled to many doctors each owning specific customer relationships with thousands of prompt variations.
~14:13 Anterior (Oracle → Evaluator → Architect). Lovejoy walked this path himself. He started as Oracle, evolved to Evaluator (review dashboard + clinician panel), then moved to Architect because prior-auth rules vary so widely. Each progression compounded.
~16:15 Three leverage principles. Designate a single principal domain expert who is accountable — avoid consensus-by-committee. Give them ownership and seats at decision-making tables, not advisory-only roles (cites failure mode where two senior clinicians without clear ownership both left within 12–18 months). Hire for breadth so they can evolve through the Oracle-to-Architect progression rather than needing replacement.
Stephen Chin (Neo4j) argues agents today are trapped over siloed enterprise data and lack the context to make grounded decisions[9]AI Engineer: Stephen Chin — Connecting the Dots with Context Graphs. Context graphs — knowledge graphs + vector embeddings + short/long-term/reasoning memory + decision traces — are the structural fix. Gartner added context graphs to the AI hype cycle; Foundation Capital pegged it as a $3T startup opportunity.
~00:14 Agents review PRs and make decisions but operate over disparate, siloed data — Slack threads, customer notes, enterprise systems. The "red pill" alternative is a system of reasoning that connects enterprise data, prior decision traces, and tool-call reasoning into one consolidated view.
~04:16 Graphs vs vanilla RAG. Healthcare example "What was the care plan for Andre Jenkins' emphysema?" Baseline LLM: generic medical advice. Vector RAG: slightly more context, still generic. Graph-grounded: pulls patient history, prior diagnoses, operations — recommends specific interventions like smoking cessation and pulmonary rehab.
~06:17 Three memory layers:

- Short-term: current pipeline state + conversation.
- Long-term: entities, processes, customer history across tasks.
- Reasoning traces: why decisions were made (provenance, compliance, debugging).

Graphs are well suited to this. Relationships are first-class, multi-hop traversal is performant, and graph embeddings (FastRP) + community algorithms (Louvain) give explainable navigation.
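Multi-hop traversal with a kept reasoning path can be sketched with an in-memory adjacency list. This is not the Neo4j API or Cypher. It is a hypothetical toy graph loosely mirroring the emphysema example. Each reached node carries the chain of relationships that explains how it was reached, which is the kind of provenance a decision trace needs.

```python
from __future__ import annotations

from collections import deque

# Hypothetical toy context graph as (source, relationship, target) triples.
EDGES = [
    ("Andre Jenkins", "DIAGNOSED_WITH", "Emphysema"),
    ("Andre Jenkins", "HAS_HISTORY", "Smoker"),
    ("Emphysema", "TREATED_BY", "Pulmonary rehab"),
    ("Smoker", "ADDRESSED_BY", "Smoking cessation"),
]


def adjacency(edges):
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
    return adj


def multi_hop(start: str, edges, max_hops: int = 2) -> dict[str, list[str]]:
    """Breadth-first traversal; every reached node keeps the path that explains it."""
    adj = adjacency(edges)
    paths = {start: [start]}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, nxt in adj.get(node, []):
            if nxt not in paths:
                paths[nxt] = paths[node] + [f"-{rel}->", nxt]
                queue.append((nxt, depth + 1))
    return paths
```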
~09:17 Demo 1: Lenny's Podcast memory. Open-source Neo4j agent memory package loads Lenny Rachitsky's episodes; one query pulls all locations mentioned in an episode and renders them on a map.
~12:19 Demo 2: Financial services loan approval. Open-source demo models people, organizations, decisions, transactions, approvals, policies, risk factors. Pulls from support tickets, CRM, and business data via 10 MCP tools, generates OpenAI embeddings, populates a Neo4j context graph. Asking whether "Jessica Norris" should be approved surfaces her bank account, margin trades, the Cypher queries used, the graph traversed, a prior rejection, and an auditable recommendation.
David Reich on Dwarkesh: between 100,000 and 50,000 years ago there's a clear cultural explosion — representational art, bead necklaces, cave paintings — but his 2016 genome search found no selective sweep more recent than 400,000–500,000 years ago[10]Dwarkesh Patel: Why did humans suddenly start making art 50,000 years ago? — David Reich. Any biological adaptation was likely polygenic, not a single "art switch."
Reich's lab searched for a locus where all people alive today share a common ancestor more recently than the broader human divergence — exactly the signature you'd expect for a single key mutation behind the cultural revolution. They found nothing more recent than 400,000–500,000 years ago. Reich calls it "a crazy result." Most plausible explanation: many small mutations shifting in the same direction to move the population toward a new cognitive set point, rather than one high-frequency selective sweep.
It could be that there's biological adaptation in this period, but it's polygenic.
Two independent evaluators — XPODW and the UK AI Security Institute — benchmarked Claude Mythos preview against GPT-5.5 and other frontier models on a full attack chain (recon → credential theft → lateral movement → web app exploitation → privilege escalation → C2 persistence → infrastructure compromise → full network takeover)[11]Nate B Jones: Anthropic's Mythos Just Beat OpenAI's GPT-5.5 At Real Hacking. Mythos advanced further on the same token budget than any other model tested — including GPT-5.5, which itself blows past prior cyber benchmarks.
~11:05 The key implication is economic: cheaper vulnerability discovery scales both defenders and attackers. XPODW notes Mythos is strong at source-code audits, native code vulnerability discovery, and reverse engineering, but still needs validation infrastructure (mixed judgment, overstates relevance). Anthropic's Glass Wing and OpenAI's Daybreak are the respective industry responses — early access for trusted defenders.
Mythos winning here is not Mythos beating a weak baseline. It's actually outrunning an extremely strong model on a task where token spend is a metric that matters.
We need to prepare for a world where Mythos-like models will be out and loose by December and we should be hardening up our defenses in the meantime.
Dario Amodei has acknowledged Anthropic planned for 10× growth and got 80×[11]Nate B Jones: AI News & Strategy Daily. In April, Anthropic cut third-party tools (OpenClaw and others) off from consuming personal Claude subscriptions, pushing them to per-token API billing. The revised stance now allows some third-party agent use within a monthly cap. Ramp payment data has Anthropic ahead of OpenAI on verified business customers, with both approaching ~$30B annualized revenue, but the developer goodwill hit is real.
~05:00 After Claude Code's December 2025 breakout drove explosive agentic adoption, Anthropic has been forced to restrict usage. Simple all-you-can-eat messaging is what won Claude early developer loyalty; the "do the math" approach hurts goodwill. OpenAI gains accordingly.
Anthropic is out of computers.
They had planned for 10× growth in a year, and they're over 80×. And those aren't my words, that's actually straight from Dario Amodei.
Simple math wins. And most people don't like to do math — even most developers don't like to do math.
AWS announced AI agents can operate desktop applications inside managed Amazon WorkSpaces with centralized permissions, logging, auditing, screenshots, and metrics[11]Nate B Jones: AI News & Strategy Daily. Targets the vast amount of enterprise work trapped in ERPs, mainframe UIs, internal admin consoles, and legacy tools with no API — insurance, healthcare, finance, procurement, trade settlement.
~17:06 Caveat: desktop automation bypasses clean integration boundaries, so governance is critical. Start read-only or in draft mode — let agents collect, prepare, and draft before granting write access. Pairs with the AWS WorkSpaces / OpenAI computer-use direction.
The 'we have no API' excuse is getting weaker and weaker and weaker by the month.
Do you remember in 2025, AI was terrible at this. We laughed at it. We said it was bad — and it got good. And that is because of the scaling laws.
Interfaze is a new AI model with a hybrid architecture — task-specific encoders feed a transformer orchestrator instead of a monolithic transformer — designed for reliable, deterministic structured outputs[12]Better Stack: The BEST AI Tool for Reliable Deterministic Outputs (Interfaze). A specialized CNN handles vision/OCR; a deep stack handles audio/speech. On the Structured Output Benchmark (SOB, which tests content correctness inside the JSON, not just format validity) Interfaze beats Gemini 3 Flash and GPT-5.4 Mini.
~02:01 SOB goes beyond "valid JSON" to checking that the values are right. Examples: data from complex charts, multilingual transcription. ~03:02 Guardrails are configurable per use case rather than binary on/off, which avoids over-refusals. $1.50/M tokens, $20 free credits.
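The difference between format validity and content correctness can be shown with a tiny scorer. This is a hypothetical illustration of the idea, not the SOB benchmark's actual scoring. The field names and values are made up, and expected is assumed to be non-empty.

```python
import json


def score_structured_output(raw: str, expected: dict) -> dict:
    """Format check (does it parse as an object?) plus content check (are the values right?)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"valid_json": False, "field_accuracy": 0.0}
    if not isinstance(data, dict):
        return {"valid_json": True, "field_accuracy": 0.0}
    correct = sum(1 for key, value in expected.items() if data.get(key) == value)
    return {"valid_json": True, "field_accuracy": correct / len(expected)}
```

A format-only check would score a perfectly parseable answer with a wrong total as a pass; the field-level check does not.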
~05:03 OCR stress test. Better Stack ran Interfaze on declassified Pentagon UFO documents — including pages with white text on black backgrounds and handwritten notes. Returns bounding boxes and confidence scores. Results were mixed on the most degraded pages but pulled coherent handwritten phrases ("thought it was a balloon," "gradually ascending following a path," "similar to the trajectory of a bullet"). "I think this OCR did a better job than I as a human."
xAI shipped Grok Build, a terminal CLI coding agent with TUI, headless mode, sub-agents, plan mode, and ACP plugin support — a direct competitor to Claude Code, Codex CLI, and Gemini CLI[13]AICodeKing: Grok Build + FULLY Free Unlimited APIs. The unlock: custom OpenAI-compatible model providers via ~/.grok/config.toml let users swap in GLM Coding Plan, Nvidia NIM, or OpenRouter free models instead of the gated Super Grok subscription.
~01:03 Install: curl -fsSL x.ai/cli/install.sh | bash. Auth via browser; headless via GROK_CODE_XAI_API_KEY. Compatible with AGENTS.md and Claude Code instruction files. Reads Claude Code skills from .grok/skills or ~/.grok/skills. Plugins for skills, agents, hooks, MCP, and LSP.
~03:04 Plan mode blocks write tools until the agent produces a plan the user can review or rewrite. Always-approve via --always-approve or in config. Sub-agents spin parallel child sessions for concurrent investigation (e.g., checking deploys + ranking slow endpoints + pulling query plans).
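Plan-mode gating can be sketched as a session object that refuses write tools until a plan has been approved. The tool names and Session class are hypothetical and do not reflect Grok Build's internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field

WRITE_TOOLS = {"write_file", "edit_file", "run_shell"}  # hypothetical tool names


@dataclass
class Session:
    plan_mode: bool = True
    approved_plan: str | None = None
    log: list[str] = field(default_factory=list)

    def approve(self, plan: str) -> None:
        """The user reviews (and may rewrite) the plan; approval unlocks write tools."""
        self.approved_plan = plan

    def call_tool(self, name: str) -> bool:
        """Return True if the tool may run; read-only tools are always allowed."""
        blocked = name in WRITE_TOOLS and self.plan_mode and self.approved_plan is None
        self.log.append(f"{'BLOCKED' if blocked else 'ran'} {name}")
        return not blocked
```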
~07:05 Custom model setup. Define a model block in config.toml (internal name, model ID, base URL, env var for API key). Providers covered:

- GLM Coding Plan (Zhipu): GLM-5.1 / GLM-5 Turbo at api.z.ai/api/paas/v4; coding-plan subscribers must use the separate coding endpoint.
- Nvidia NIM at integrate.api.nvidia.com/v1: Qwen-3-72B-A35B-Instruct recommended for coding, Nemotron for reasoning; free dev/test access.
- OpenRouter free: :free suffix on model IDs, 50 free requests/day without credits, 1,000/day after a $10 deposit.
~13:11 Tiered strategy: GLM Coding Plan as cheap daily driver; Nvidia NIM for strong open models; OpenRouter free for benchmarking; Grok or premium only for hard debugging or architecture.
Artem builds a personal morning dashboard inside Obsidian using HTML/CSS/JS embedded via the DataView plugin, then has Claude read that dashboard as dynamic context every morning[16]Artem Zhutov: Turn Obsidian Into Your Claude Code Command Center. The result is a living command center that stays in sync with goals, experiments, sleep/energy data, and daily notes — eliminating the need to re-explain context each session.
~01:30 Plain markdown felt too limiting; HTML supports tables, custom components, images, rich layouts.
~03:04 Solution: embed a JS component file inside an Obsidian note. The component reads live DataView queries — edits to underlying daily notes update the dashboard in real time. Required: enable "Inline JavaScript Queries" and "JavaScript Queries" in DataView.
~05:06 For Claude: ask it to read the dashboard's JS file. Claude sees the same DataView-rendered tables you see, aligning both on the same information. Package as a Claude skill for one-shot context loading each day.
~06:40 Live prototype: Claude generates a new Obsidian note with injected CSS and a DataView JS block rendering a goals base — built in under 5 minutes. Goals marked done disappear automatically, keeping Claude's context fresh without manual updates.
OpenAI CFO Sarah Friar said an AI tool (transcribed as "Kodak" in the clip; the exact tool is unclear) was instrumental in managing what's described as the largest private fundraise in history. It is a fully non-technical use case, suggesting the binding constraint is creativity, not capability[17]Nate B Jones: Sarah Friar on the Tool That Scaled Her Fundraise.
It's a question of how creative are you? Because it's not capable of everything, but like almost everything.
Suno CEO Mikey Schulman tells Sequoia that 90% of Suno users create music on any given day — a dramatic flip from the pre-AI world where almost everyone was a pure consumer[18]Sequoia Capital: 'Creation is actually the entertaining bit.' Suno's Mikey Schulman. People aren't making music to share or perform — they're making it for the intrinsic satisfaction of being creative. Creation itself is the entertainment.
Y Combinator's India event drew over 25,000 applications, celebrating the next generation of Indian founders shaping the country's AI ecosystem[19]Y Combinator: Thanks for a great time, India. A signal worth tracking against AIE Singapore's "agentic nation" framing.
Quick Real Python clip: keep "how would I test this?" in mind while writing code to avoid the double-refactor trap — where you have to refactor the code just to enable writing tests, before you can do the actual intended refactor[20]Real Python: Write Python Code That's Testable.
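One common way to follow that advice is to pass dependencies such as the clock in as parameters instead of calling them inside the function. This greeting example is a hypothetical illustration, not code from the Real Python clip. Because now is injectable, a test can pin the time without patching or refactoring first.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Callable


def greeting(now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> str:
    """The clock is a parameter, so tests can fix the time without monkeypatching."""
    hour = now().hour
    if hour < 12:
        return "Good morning"
    if hour < 18:
        return "Good afternoon"
    return "Good evening"
```

Had the function called datetime.now() directly, testing the evening branch would first require exactly the refactor the clip warns about.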
There's nothing like having to maintain someone else's code to teach you these lessons.