April 30, 2026
GitHub's reliability has cratered to 86.75% real uptime according to independent trackers, with daily outages, ~2,800 merged PRs silently reverted in a single incident, and an RCE vulnerability that could access millions of repos.[1]Theo - t3.gg Mitchell Hashimoto — GitHub user #1299, creator of Vagrant and Terraform — announced he's moving his 50K-star Ghostty project off the platform after 18 years.[2]Fireship Meanwhile, a name-squatter on the tanstack NPM package went unaddressed for months despite creator Tanner Linsley's reports, and the squatted package is now shipping .env-stealing malware.[1]Theo - t3.gg
~02:00 Theo walks through an independent uptime tracker showing GitHub at 86.75% — meaning roughly 3+ hours of downtime per day. A catastrophic merge queue incident silently reverted ~2,800 PRs that had already been merged, with no notification to affected developers. GitHub COO Kyle's public response used inflated denominators to make 0.07% sound small, with zero apology.
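As a quick check of that figure, converting an uptime percentage into daily downtime is simple arithmetic. The sketch below assumes the 86.75% is spread evenly across a 24-hour day; the function name is illustrative and not taken from the tracker.

```python
def daily_downtime_hours(uptime_percent: float, hours_per_day: float = 24.0) -> float:
    """Convert an uptime percentage into average hours of downtime per day."""
    return (1.0 - uptime_percent / 100.0) * hours_per_day


downtime = daily_downtime_hours(86.75)
print(f"{downtime:.2f} hours of downtime per day")  # 3.18 hours of downtime per day
```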
"I would fire Kyle over this response. It's the most pathetic, disingenuous, minimizing non-apology I've ever seen from a company at this scale."
~10:00 GitHub has no CEO — the last one left and was never replaced. The platform reports to a Microsoft VP who simultaneously runs Azure DevOps and Copilot and sits on Atlassian's board. Product and engineering are completely siloed with no shared leadership. Theo argues this structural problem is the root cause of every reliability issue.
~03:00 A Wiz researcher discovered a remote code execution vulnerability where unsanitized git push options could break out of internal headers and execute arbitrary code. Fireship also covers a botnet attack on GitHub's code search and a broader pattern of AI agents hammering the infrastructure.[2]Fireship
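The general class of bug described here, user-controlled text escaping the header it is embedded in, can be sketched as follows. This is not GitHub's code: the header name X-Push-Option, the one-header-per-line format, and the choice of control characters as the escape vector are all assumptions made for illustration, since the summary does not describe the real internal format.

```python
import re

# Assumption: each option is written as one line of a text header block,
# so any control character (CR, LF, NUL, ...) could start a forged header.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def build_push_option_headers(options: list[str]) -> str:
    """Reject options containing control characters before embedding them."""
    for opt in options:
        if _CONTROL_CHARS.search(opt):
            raise ValueError(f"push option contains control characters: {opt!r}")
    # "X-Push-Option" is a hypothetical header name used only for this sketch.
    return "".join(f"X-Push-Option: {opt}\r\n" for opt in options)


print(build_push_option_headers(["ci.skip"]), end="")
try:
    build_push_option_headers(["ci.skip\r\nX-Internal-Role: admin"])
except ValueError as err:
    print("rejected:", err)
```

Validating at the point where the value is embedded, rather than trusting upstream checks, is the design choice this sketch is meant to show.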
Mitchell Hashimoto's emotional blog post — where he documented daily outages in a personal journal — signals a tipping point. When the creator of some of the most important infrastructure tools in the industry publicly breaks up with your platform, the crisis is real. A related kernel exploit (CVE-2026-31431) affecting every Linux distro since 2017 also dropped today, adding to the infrastructure security anxiety.[3]Better Stack — Linux Exploit
Four Magnificent Seven companies — Alphabet, Amazon, Meta, and Microsoft — all beat earnings expectations, but the headline is AI capex: Microsoft announced $190B for the year, Meta raised guidance to $125–145B, and Alphabet's capex jumped to $35.7B in a single quarter.[4]Morning Brew Google Cloud hit $20B revenue (vs. $18.4B expected), and Alphabet stock rose 7% after-hours on what Sundar Pichai called its strongest consumer AI quarter ever.[5]Sherwood Snacks
Alphabet's profit jumped 81%, driven by AI investments. Google Cloud's $20B quarter crushed the $18.4B estimate on surging Gemini and AI infrastructure adoption. Alphabet also booked $37.7B in equity gains from stakes in Anthropic and SpaceX.[5]Sherwood Snacks
Meta beat on revenue (ads up 33% to $55B), but its stock fell ~6% after-hours. Investors balked at Meta raising capex guidance to $125–145B, above the $122.6B consensus. When asked about the spending plan, Zuckerberg said he doesn't have "a very precise plan," only "a sense of the shape." Meta also warned that its youth safety lawsuits "may ultimately result in a material loss."[4]Morning Brew
Microsoft cited persistently high memory prices while announcing $190B in capex for the year. Copilot added 5M paid users (total: 20M), but investors worry about demand. Amazon's AWS revenue hit $37.6B (beating the $36.7B estimate) with 28% YoY growth, and Amazon's advertising revenue also beat at $17.24B.[4]Morning Brew
The Fed held rates steady at 3.5–3.75% with four dissenters, the most since 1992. Jerome Powell held his final press conference as Fed chair, saying inflation from the Iran war "hasn't even peaked yet." Powell will remain as a governor while under investigation, becoming the first chair since 1948 to step down from the chairmanship but stay on the board.[6]Morning Brew — Powell April is on track to be tech stocks' best month since the start of the Covid pandemic, with the Nasdaq up 14%.[4]Morning Brew
Venture-subsidized flat-fee AI pricing is giving way to usage-based billing as agentic token consumption explodes. GitHub Copilot announced model multipliers revealing ~6x price hikes for frontier models, Anthropic is rationing compute and pushing Claude Code users to API billing, and Goldman Sachs reports AI inference costs approaching 10% of headcount at some companies.[7]AI Daily Brief
~05:00 Anthropic is showing signs of severe infrastructure strain: outages, metered compute, withholding their largest model, accidental API charges, and org-level bans. The host argues this is a direct consequence of agentic usage — Claude Code sessions consume orders of magnitude more tokens than chat conversations.
~08:00 GitHub announced consumption-based billing for Copilot with model multipliers. Using frontier models for coding will cost roughly 6x the base rate. The flat $10/month era is effectively over for power users.
~15:00 Goldman Sachs reports inference costs nearing 10% of employee headcount costs at some organizations. This challenges the assumption that AI will be radically cheaper than human labor and suggests the transition will be slower and more expensive than bulls projected.
~20:00 Practical recommendations: (1) audit AI spending leaks, (2) run cheap model bake-offs to find the cheapest model that meets quality thresholds, (3) appoint a "model sommelier" role, (4) build escape hatch architectures to swap providers, (5) create AI cost scoreboards for visibility.
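Recommendation (2) can be sketched as a small selection rule: score each candidate on your own eval set, then take the cheapest one that clears the quality bar. The model labels, scores, and costs below are invented placeholders, not measurements from the episode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BakeoffResult:
    model: str            # hypothetical model label
    quality: float        # score on your own eval set, 0.0 to 1.0
    cost_per_task: float  # measured dollars per task


def cheapest_passing(results: list[BakeoffResult], min_quality: float) -> Optional[BakeoffResult]:
    """Return the lowest-cost model that meets the quality threshold, if any."""
    passing = [r for r in results if r.quality >= min_quality]
    return min(passing, key=lambda r: r.cost_per_task) if passing else None


results = [
    BakeoffResult("frontier-large", quality=0.93, cost_per_task=0.60),
    BakeoffResult("mid-tier", quality=0.88, cost_per_task=0.10),
    BakeoffResult("small-open", quality=0.71, cost_per_task=0.02),
]
choice = cheapest_passing(results, min_quality=0.85)
print(choice.model if choice else "no model meets the bar")  # mid-tier
```

Returning None when nothing passes keeps the decision explicit: the threshold is the requirement, and cost only breaks ties among models that meet it.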
Three open-weights models at trillion-parameter scale (Kimi K2.6, MiMo V2.5 Pro, and DeepSeek V4 Pro) now score 52–54 on the Artificial Analysis Intelligence Index, within striking distance of proprietary leaders GPT-5.5 (60), Gemini 3.1 Pro (57), and Claude Opus 4.7 (57).[8]Artificial Analysis — Open Weights A year ago the gap was 22 vs. 35. Separately, xAI launched Grok 4.3 with a 37.5% input price cut and a 321-point Elo gain on agentic benchmarks.[9]Artificial Analysis — Grok 4.3
All three leaders use trillion-parameter MoE architectures with permissive licenses: Kimi K2.6 (1T total / 32B active, 256K context), MiMo V2.5 Pro (1T / 42B active, 1M context), DeepSeek V4 Pro (1.6T / 49B active, 1M context). All originate from Chinese AI labs — in fact, all top-10 open-weights models are now Chinese. Open weights dominate the price-performance Pareto frontier: 9 of 13 frontier models are open-source, offering comparable performance at half to one-sixth the cost.[8]Artificial Analysis — Open Weights
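To make the total-versus-active distinction concrete, the share of parameters active per token can be computed from the figures quoted above. The sketch only restates those numbers (in billions of parameters); the helper name is illustrative.

```python
# (total parameters, active parameters) in billions, as quoted above.
models = {
    "Kimi K2.6": (1000, 32),
    "MiMo V2.5 Pro": (1000, 42),
    "DeepSeek V4 Pro": (1600, 49),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of a mixture-of-experts model's parameters used per token."""
    return active_b / total_b


for name, (total_b, active_b) in models.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} active")
```

In each case only about 3 to 4 percent of the weights are active for a given token, which is a large part of why these models can be served cheaply relative to their total size.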
Significant gaps remain in specialized tasks: HLE reasoning (open 34–36% vs. GPT-5.5 44%), CritPt physics (open 4–12% vs. GPT-5.5 27%), and TerminalBench Hard coding (open 43–46% vs. GPT-5.5 61%). Hallucination is also worse — DeepSeek V4 Pro scores -10 on Omniscience vs. +20 to +33 for proprietary competitors.[8]Artificial Analysis — Open Weights
xAI's new model scores 53 on the Intelligence Index with dramatic price cuts (input -37.5%, output -58.3%). The standout improvement is agentic task performance: GDPval-AA jumped 321 Elo points. Even so, it trails GPT-5.5 by 276 Elo points on the same benchmark.[9]Artificial Analysis — Grok 4.3
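For a sense of what a 276-point gap means, the standard Elo formula converts a rating difference into an expected head-to-head score. This sketch assumes GDPval-AA uses the conventional 400-point logistic scale, which the summary does not confirm.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score for a player rated `rating_gap` points above the
    opponent (pass a negative gap for the lower-rated player)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Expected score for the model that trails by 276 points.
print(f"{elo_expected_score(-276):.2f}")  # 0.17
```

Under that assumption, the trailing model would be expected to win roughly one comparison in six against the leader.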
Google launched TPU 8T (training) and 8I (inference) chips. Per-chip they lag Nvidia, but the 4x4x4 cube topology scales to a 9,600-chip super pod delivering 121 exaflops FP4 — exceeding Nvidia's Rubin pod — with the Virgo network extending to 1M chips.[10]Caleb Writes Code On the model side, Gemma 4 Coder now powers an official offline coding app for Mac[11]AICodeKing and Gemini 3.1 Pro topped benchmarks.[12]Nate B Jones
~01:00 The TPU 8 super pod uses a cube topology (4x4x4) that connects 9,600 chips without the switch overhead of Nvidia's NVLink/InfiniBand setup. Anthropic and Meta both have multi-billion dollar TPU deals, making Google's custom silicon an increasingly serious alternative to Nvidia's dominance.[10]Caleb Writes Code
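The pod figures can be broken down with a little arithmetic. The sketch assumes the 9,600-chip pod is assembled from whole 4x4x4 cubes and that the 121 exaflops is spread evenly across chips; both are simplifications, not details confirmed by the video.

```python
CUBE_DIMS = (4, 4, 4)    # cube topology quoted above
POD_CHIPS = 9_600        # chips per super pod
POD_EXAFLOPS_FP4 = 121   # quoted pod throughput at FP4

chips_per_cube = CUBE_DIMS[0] * CUBE_DIMS[1] * CUBE_DIMS[2]
cubes_per_pod = POD_CHIPS // chips_per_cube
# 1 exaflop = 1,000 petaflops
petaflops_per_chip = POD_EXAFLOPS_FP4 * 1_000 / POD_CHIPS

print(chips_per_cube)                # 64
print(cubes_per_pod)                 # 150
print(f"{petaflops_per_chip:.1f}")   # 12.6
```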
~01:00 Gemma Chat is an open-source Electron app for Apple Silicon Macs running Gemma 4 entirely locally via MLX. It offers a full offline coding experience with file generation, live preview, an agent loop, and local Whisper voice input — no API keys or cloud required. Built by a Google AI Studio employee.[11]AICodeKing
Nate B Jones argues that employees stuck with underperforming corporate AI defaults (primarily Copilot) absorb invisible productivity losses in 30-minute chunks that never appear as a company line item. The fix isn't "replace the default" — it's identifying the specific jobs where it fails and requesting a specialist tool only for those. He estimates 80%+ of traditional orgs operate on an interchangeability assumption that is increasingly wrong.[13]Nate B Jones
~06:00 Pick one recurring 30-minute job, run it through both the corporate default and a challenger tool with identical inputs, track time/rework/quality/audience over a week. This produces real evidence rather than vendor-demo metrics. Then translate the ask by org level: IC asks for a seat, manager asks for a pilot, director commissions a structured test, exec recognizes retention risk.
~15:00 For engineering teams, Claude and ChatGPT/Codex are the standout choices in 2026 — both ship fast and have strong harnesses around their models. Gemini has a capable model but lacks the surrounding tooling. Shipping cadence and surrounding harness matter as much as raw model capability. The broader point: Snap's CEO noted that "software is not a moat" — ecosystems and platforms are.[14]Lenny's Podcast
A cluster of videos dropped offering deep dives into Claude Code best practices. Matt Pocock shared five daily-driver skills — "Grill Me" (forces the agent to interview you before coding), "Write a PRD," "PRD to Issues" (vertical slicing), "TDD" (red-green-refactor loop), and "Improve Architecture" (identifies shallow modules).[15]Matt Pocock — 5 Skills He also argued strongly against running /init, calling auto-generated CLAUDE.md files a context budget trap.[16]Matt Pocock — Never /init
~01:00 The argument: /init generates bloated CLAUDE.md files that waste context, go stale quickly, and hurt agent performance. An LLM can hold roughly 500 meaningful instructions — every line in CLAUDE.md is a tradeoff. Rare or narrow guidance belongs in skills (progressive disclosure), not the root config. A separate video reviews a new experimental /init that attempts to address these concerns.[17]Matt Pocock — /init Review
~01:00 AI accelerates software entropy, turning codebases into "balls of mud" faster than ever. The "Improve Codebase Architecture" skill defines a shared vocabulary (modules, interfaces, seams, adapters, depth, locality, leverage), identifies shallow modules, then uses parallel sub-agents to design multiple radically different interfaces before recommending the strongest. Live demo on a 1,500-commit codebase identifies 6 deepening opportunities.[18]Matt Pocock — De-Slop
Companion explainers cover context window fundamentals (the "lost in the middle" problem, /clear vs /compact)[19]Matt Pocock — Context Windows and the agent vs. workflow distinction (agents are LLM-driven loops; workflows are predetermined code paths — most real systems mix both).[20]Matt Pocock — Agents
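The agent vs. workflow distinction can be sketched in a few lines. In the sketch below, choose_action stands in for an LLM call and scripted_model is a deterministic stub so the example runs offline; none of these names come from the videos.

```python
from typing import Callable


def workflow(text: str) -> str:
    """Workflow: the code fixes the order of steps in advance."""
    cleaned = text.strip()
    return cleaned.upper()


def agent(task: str, choose_action: Callable[[str, list[str]], str],
          max_steps: int = 5) -> list[str]:
    """Agent: a loop in which the model decides the next step each turn."""
    history: list[str] = []
    for _ in range(max_steps):
        action = choose_action(task, history)
        if action == "done":
            break
        history.append(action)
    return history


def scripted_model(task: str, history: list[str]) -> str:
    """Stub standing in for an LLM; it follows a fixed plan, then stops."""
    plan = ["read_file", "edit_file", "run_tests"]
    return plan[len(history)] if len(history) < len(plan) else "done"


print(workflow("  hello "))              # HELLO
print(agent("fix bug", scripted_model))  # ['read_file', 'edit_file', 'run_tests']
```

A mixed system would call something like agent from inside one step of a workflow, which matches the point that most real systems combine both.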
Real Python covered the "Ralph Loop" — a pattern for making Claude Code persist on tasks until completion rather than stopping at first failure. OpenAI's Codex CLI also adopted this concept with their /goal command.[21]Real Python — Ralph Loop
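The core of the pattern is a retry loop with a completion check and a budget. The sketch below is a generic illustration, not the Real Python or Codex implementation: run_attempt, is_done, and the token accounting are hypothetical stand-ins for a real agent invocation.

```python
from typing import Callable


def ralph_loop(run_attempt: Callable[[int], tuple[str, int]],
               is_done: Callable[[str], bool],
               token_budget: int) -> dict:
    """Re-run the agent until the completion check passes or the budget runs out."""
    spent = 0
    attempts = 0
    while spent < token_budget:
        output, tokens_used = run_attempt(attempts)
        spent += tokens_used
        attempts += 1
        if is_done(output):
            return {"status": "completed", "attempts": attempts, "tokens": spent}
    return {"status": "budget_exhausted", "attempts": attempts, "tokens": spent}


def flaky_agent(attempt_index: int) -> tuple[str, int]:
    """Stub agent: fails twice, then succeeds; each attempt costs 1,000 tokens."""
    return ("tests pass" if attempt_index >= 2 else "tests fail", 1_000)


result = ralph_loop(flaky_agent, lambda out: out == "tests pass", token_budget=10_000)
print(result)  # {'status': 'completed', 'attempts': 3, 'tokens': 3000}
```

The budget check is what keeps persistence from becoming an unbounded bill, which is the same tradeoff the usage-based pricing discussion raises.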
The Zig programming language maintains one of the strictest anti-LLM policies in open source: no AI-generated content in issues, PRs, or bug tracker comments. The rationale isn't productivity — it's relationship economics. Loris Cro (VP of Community, Zig Foundation) argues that maintainers should "bet on the contributor, not on the contents of their first PR."[22]Simon Willison — Zig Policy Separately, Zig creator Andrew Kelley claims LLM code is detectable: "It's like when a smoker walks into the room — everybody who doesn't smoke instantly knows it."[23]Simon Willison — Andrew Kelley
The Zig project's reasoning goes deeper than code quality. They invest significant review effort in mentoring new contributors, viewing each PR as an opportunity to grow the community. If a PR is primarily LLM-authored, the reviewer gains nothing — they could have used their own AI tools to solve the problem directly. The project cares about growing trusted, long-term contributors, not maximizing code throughput.
"We try our best to help new contributors to get their work in — because it's strategically beneficial, not merely ethical."
Notably, Bun — a JavaScript runtime written in Zig that was acquired by Anthropic — maintains its own Zig fork, creating an interesting tension between the ecosystem's anti-AI stance and one of its most prominent downstream users being an AI company.
OpenAI launched Advanced Account Security — an opt-in hardened mode for ChatGPT/Codex requiring passkeys or hardware security keys, disabling password login and email/SMS recovery, shortening sessions, and automatically excluding conversations from training.[24]OpenAI Codex CLI 0.128.0 shipped a /goal command that keeps the agent looping until completion or token budget exhaustion.[25]Simon Willison — Codex /goal The UK AI Security Institute evaluated GPT-5.5's cyber capabilities and found it comparable to Claude Mythos — but unlike Mythos, GPT-5.5 is generally available now.[26]Simon Willison — GPT-5.5 Cyber
Designed for journalists, officials, dissidents, and security-conscious users. Requires FIDO-compliant passkeys or physical security keys (partnered with Yubico for discounted YubiKey bundles). If you lose all recovery methods, OpenAI Support cannot help — that's the tradeoff. Members of Trusted Access for Cyber accessing permissive models must enable it by June 1, 2026.[24]OpenAI
An OpenAI case study shows Virgin Atlantic using Codex for database migrations, code refactoring, and test coverage. VP of Data Richard Masters claims 78–80% codebase reduction and says "things don't get delayed when we're using Codex."[27]OpenAI — Virgin Atlantic
Anthropic analyzed 1 million claude.ai conversations and found ~38,000 guidance-seeking interactions across nine domains. Over 75% concentrated in four areas: health/wellness (27%), professional/career (26%), relationships (12%), and personal finance (11%). Overall sycophancy rate was 9%, but spirituality (38%) and relationships (25%) showed dramatically higher rates.[28]Anthropic Research
Relationship guidance was the domain where users pushed back against Claude most frequently (21% of conversations). The research team used Claude Sonnet 4.5 as an automated classifier to identify sycophantic behavior — excessive agreement rather than honest feedback.
To combat sycophancy, Anthropic created synthetic relationship guidance training data and tested improvements using "stress-testing" — prefilling models with previous sycophantic conversations. The result: Opus 4.7 and Mythos Preview show approximately 50% reduction in relationship-domain sycophancy compared to Opus 4.6, with improvements generalizing across all domains.
The bigger question the paper raises: what constitutes "good" AI guidance, and how should safety work in high-stakes domains like legal, medical, and financial advice?
Netflix open-sourced VOID, a video AI framework that doesn't just erase actors from scenes — it rewrites the causal effects of their presence. Remove a person sitting on a couch and the cushion un-dents; remove someone holding a door and it closes. The model uses a two-pass architecture with SAM 3 for segmentation.[29]Better Stack — Netflix VOID
~02:00 VOID processes video in two passes: first identifying all objects and their interactions via SAM 3, then generating replacement frames that account for physical causality. The La La Land demo was near-flawless. Requires H100 GPU + SAM 3 + Gemini API. The implications for post-production are significant — reshoots caused by actor departures or contractual disputes could become optional.
Matt Pocock released Sand Castle (ai-hero/sand-castle), a TypeScript library for running Claude Code or Codex agents fully AFK in Docker sandboxes. GitHub Issues serve as a backlog, with a planner → parallel implementers → reviewer → merger multi-agent pipeline.[30]Matt Pocock — Sand Castle In adjacent developer tools news, Simon Willison implemented Matt Webb's proposal for RSS/Atom feeds as distribution for vibe-coded apps[31]Simon Willison — RSS Vibe Apps and Cursor published an official cookbook with copy-paste MCP templates for connecting to live databases and AWS infrastructure.[32]Github Awesome — Cursor Cookbook
~02:00 The pipeline: a planner agent reads GitHub Issues and breaks them into tasks, parallel implementer agents work in isolated Docker sandboxes, a reviewer agent checks the output, and a merger agent handles the PR. Agent-agnostic — works with Claude Code or Codex. Designed for running overnight while you're AFK.
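The shape of that pipeline can be sketched with plain functions and a thread pool. This is not Sand Castle's API (the library itself is TypeScript); every function below is a hypothetical stub, and a real implementer would launch an agent inside a Docker sandbox rather than return a string.

```python
from concurrent.futures import ThreadPoolExecutor


def planner(issues: list[str]) -> list[str]:
    """Turn backlog issues into tasks (stub: one task per issue)."""
    return [f"implement {issue}" for issue in issues]


def implementer(task: str) -> dict:
    """Stub for an agent run in an isolated sandbox."""
    return {"task": task, "patch": f"patch for {task}"}


def reviewer(result: dict) -> bool:
    """Stub review: approve any non-empty patch."""
    return bool(result["patch"])


def merger(approved: list[dict]) -> list[str]:
    """Stub merge: report which tasks would be merged."""
    return [r["task"] for r in approved]


def run_pipeline(issues: list[str], max_workers: int = 4) -> list[str]:
    tasks = planner(issues)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(implementer, tasks))  # map preserves task order
    approved = [r for r in results if reviewer(r)]
    return merger(approved)


print(run_pipeline(["#12", "#15"]))  # ['implement #12', 'implement #15']
```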
Matt Webb observed that "when vibe-coding accelerates app development, apps become more personal, more situated, and more frequent" — shipping tools now resembles posting on blogs rather than launching websites. Willison implemented the idea by adding an Atom feed to his tools page.[31]Simon Willison — RSS Vibe Apps
David Gomes from Cursor explains how they deleted 12–15K lines of TypeScript implementing git worktrees and replaced it with ~200 lines of markdown. The new approach uses two Cursor primitives — skills (markdown instructions) and sub-agents — to achieve nearly the same functionality with dramatically less maintenance burden.[33]AI Engineer — Cursor
~00:14 Git worktrees let Cursor users run multiple agents in parallel on isolated checkouts. The original implementation (shipped with Cursor 2.0 in October 2025) required complex code for worktree management, agent isolation, setup scripts, best-of-N judging, and cleanup.
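For readers unfamiliar with the underlying git feature, the sketch below builds the git worktree commands that give each agent its own checkout and branch. The directory layout and the agent/ branch prefix are assumptions for illustration and do not reflect how Cursor names things; the functions only construct the argument lists and leave execution to the caller.

```python
def worktree_add_cmd(repo_dir: str, agent_id: str, base: str = "HEAD") -> list[str]:
    """Command that creates an isolated checkout on a new branch for one agent.

    Assumes repo_dir is an absolute path; the sibling "-worktrees" directory
    and the "agent/" branch prefix are hypothetical naming choices.
    """
    branch = f"agent/{agent_id}"
    path = f"{repo_dir}-worktrees/{agent_id}"
    return ["git", "-C", repo_dir, "worktree", "add", "-b", branch, path, base]


def worktree_remove_cmd(repo_dir: str, agent_id: str) -> list[str]:
    """Command that cleans up the agent's checkout when it is finished."""
    path = f"{repo_dir}-worktrees/{agent_id}"
    return ["git", "-C", repo_dir, "worktree", "remove", path]


print(" ".join(worktree_add_cmd("/work/app", "a1")))
# git -C /work/app worktree add -b agent/a1 /work/app-worktrees/a1 HEAD
```

The returned lists can be passed to subprocess.run(cmd, check=True) to execute them against a real repository.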
~04:15 The team realized they could express the entire feature as markdown skill files that instruct agents how to manage worktrees. The deletion PR removed ~15K lines and replaced them with ~200 lines of markdown.
~12:22 The new approach has real downsides: agents can drift out of their assigned worktree during long sessions ("vibes-based" isolation), it feels slower, and discoverability is worse. They're addressing reliability through Braintrust evals and RL training on their Composer model.
"Markdown is basically the new code."
~17:27 Cursor 3.0 will have a more native worktree experience in its agent window, plus non-git parallelization primitives. Server-side prompt iteration means Cursor can update skills without shipping new client versions.
Danilo Campos from PostHog presents six failure modes of autonomous coding agents learned from running the PostHog Wizard at 15K integrations/month: model rot, bad architecture patterns, excessive improvisation, human error in config, security holes, and overengineering with code. His key insight: the wizard is 90% markdown files. "Code is a depreciating asset" — the value lives in well-written prose.[34]AI Engineer — PostHog
~02:15 Model rot: Solved by feeding fresh markdown docs into context on demand rather than relying on training data. ~05:16 Bad architecture: Solved with "model airplanes" — thin reference implementations the agent can follow. ~07:17 Excessive improvisation: Solved by breadcrumbing — feeding instructions one step at a time instead of dumping everything upfront.
~10:18 Human error: Solved by interrogating the agent after every run: "what could we have done better?" ~12:18 Security: Solved by locking .env access to key-presence checks and writes only. ~14:19 Overengineering: The wizard is 90% markdown, 8% markdown delivery tools, 2% harness code.
"If you write great prose today and tomorrow an even better model drops, it's going to be able to take that prose and do even more with it."
"An agent is an octopus. It can wriggle. It can squeeze into tight corners. You do not want to overconstrain the agent."
Thor Schaeff and Philipp Schmid from Google DeepMind lead a hands-on workshop building coding agents with the new Gemini Interactions API (replacing generateContent) and real-time conversational agents with the Live API. The Interactions API brings server-side state management, 2–3x better cache hit rates, and an interface closer to industry standards.[35]AI Engineer — Google DeepMind
~09:28 Launched in beta in December, the Interactions API is a unified replacement for generateContent. It moves away from Google's proto-specific, gRPC-heavy patterns toward something resembling OpenAI's chat completions. Server-side state management also improves implicit caching: startups using it today see 2–3x better cache hit rates.
~17:36 The workshop walks through building a coding agent with tool use (read/write files, bash commands) and an agentic loop. Key insight on skills: "The importance when creating skills is it should be either something the model cannot do reliably or if you have some personal preferences on how to do a certain workflow."
~48:16 Thor demos the Gemini 3.1 Flash Live API — a native audio model with real-time WebSocket/WebRTC streaming. The architecture uses ephemeral tokens for security and has production partners including LiveKit. The live demo has the AI singing and beatboxing with the audience.
"What I can guarantee you is that the Gemini 3 Flash has never seen any code of the Interactions API because the model was trained before we even released the API."
Arik Friedman, Senior Principal Data Scientist at Atlassian, joins the Analytics Power Hour to discuss how data professionals can avoid major mistakes by trusting their intuition, understanding the difference between accuracy and precision, and building verification habits. The episode opens with a cautionary tale about a too-good metric at Atlassian that turned out to be a measurement bug.[36]Analytics Power Hour
~03:07 A product team rolled out a UI change and the A/B test showed active users went up significantly. Arik had a nagging feeling the growth curves looked suspicious but found no smoking gun, so he kept quiet. Months later, another analyst discovered the increase was entirely due to a measurement bug — the new UI was double-counting interactions.
~06:08 Arik argues that data professionals should be more opinionated, even without hard evidence. "Our intuition is part of our expert opinion and we should sometimes just go with it." The key practice: always ask "does this make sense?" before presenting results.
~17:15 Arik cites John Tukey's maxim that an approximate answer to the right question beats an exact answer to the wrong one. The example: saying "about 8 million" is accurate even if not precise, and it's far more useful than a precise number answering the wrong question. Co-host Mo provides a counterpoint about contexts where precision signals credibility.
"It's far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." — John Tukey
~49:44 Recommendations: An Introduction to Statistical Learning (ISL) textbook, Hamel Husain's "Revenge of the Data Scientist" essay, Monarch budgeting app, Cassie Kozyrkov's vibe coding series, and Katie Milkman's cognitive bias checklist.