April 5, 2026
Google released Gemma 4, a four-size open-weights family (E2B, E4B, 24B MoE, 31B dense) under Apache 2.0, built from the same research as Gemini 3 but shrunk to run on phones, Raspberry Pis, and edge devices [1: AI Search — Gemma 4, Claude Code leak, Wan 2.7, Qwen]. It's natively multimodal (text, image, audio), ships with 128K context on the small models and 256K on the larger ones, and is trained on 140+ languages. On AI Search's plot of size vs. performance, the 31B dense model sits near Kimi K2.5 Thinking — which is roughly 35x larger.
AI Search highlights Google's own benchmark chart showing Gemma 4 at roughly Kimi K2.5 Thinking parity ~03:10 despite the 35x size gap. Missing from the comparison: GLM 5.1. Models are live on Hugging Face, GGML quants already exist, and Google AI Studio lets you pick the 24B or 31B model from a dropdown for free testing.
You can now run a very performant AI model just offline on your phone or your computer... It's an incredibly capable open model.
On March 31, Anthropic shipped an NPM package that accidentally included a source map — exposing ~500,000 readable lines of TypeScript across ~2,000 files for Claude Code, their agentic coding framework [1: AI Search — Claude Code leak segment]. Copies were mirrored across repos within hours; DMCA takedowns followed, but the internet had already done its thing. The interesting part isn't that it leaked — it's what the leak revealed about unreleased features and the prompt engineering behind the harness.
You are operating undercover in a public open-source repo. Your commit messages, PR titles, etc. must not contain any Anthropic internal information. Do not blow your cover.
AI Search's wry observation: undercover mode is probably what caused the leak in the first place — it was the mechanism trying to hide unreleased features from public commits, which made those features interesting enough to surface when the source map went public.
AI Search's read ~25:00: the model itself didn't leak — no weights, no training data. What leaked is the harness. And the harness is increasingly where competitive edge lives: how tasks get broken down, how tools are called, how memory and context get managed. The Theo “Opus jumps from 77% in Claude Code to 93% in Cursor on the same model” point from later in the month is the same phenomenon seen from the outside; this leak shows it from the inside.
AI Search cut the segment short to flag a real-world event — “as I record this, I just noticed a black car with black windows parked outside my house... I just noticed a red dot on my forehead right now” — probably a bit, maybe not, but either way, funny.
Alibaba shipped two major Qwen releases the same week. Qwen 3.5 Omni is a true omnimodal model (text, image, audio, video) with benchmarks that edge Gemini 3.1 Pro on some audio-visual tasks [1: AI Search — Qwen 3.5 Omni / 3.6 Plus]. Qwen 3.6 Plus gets a 1M-token context window (~700K words) and significantly improved agentic coding — though AI Search flags the comparison charts as cherry-picked: they omit GLM 5.1, Opus 4.6, and Gemini 3.1 Pro.
Two flavors — the higher-quality Plus and a lower-latency Realtime version. The demo that actually shows its omnimodal teeth ~28:55: feed it a video of someone playing Snake, and with no text prompt, it codes a playable clone. Then feed it a follow-up video with mouse-drawn regions on-screen saying “make this area a spring theme, this one summer” — and it edits the existing game correctly.
Same trick for a product-interface demo: draw boxes on a video, narrate what each box should show, and Qwen 3.5 Omni outputs working HTML.
A sibling release rather than a replacement. Key deltas from 3.5: the 1M-token context window (~700K words) and the agentic-coding improvements flagged above.
Both models are live on QwenChat. The model dropdown at top-left is where you select either release.
Alibaba dropped Wan 2.7 in two flavors: a video generator with native audio and customizable characters (up to 5 reference images), and Wan 2.7 Image — a unified generator/editor tuned for realistic faces, precise hex-code color control, and text rendering in 12+ languages [1: AI Search — Wan 2.7 video + image]. AI Search's honest take: the video quality still trails ByteDance's Seedance 2.0 and Kling 3, so if Wan 2.7 stays closed-source and paid, it's hard to recommend over the competition.
Multimodal conditioning (text, image, audio, video as reference inputs), up to 5 character reference images, custom voices, and native audio generation synced to the video. Alibaba claims improvements in fidelity, motion stability, and prompt adherence over 2.2 (the current open-source version). No announcement yet on whether 2.7 will be open-sourced — the previous open-source tier stops at 2.2.
Both live at wan.video. Select “image” then pick 2.7 or 2.7 Pro.
Two state-of-the-art voice cloners dropped the same week. Meituan's LongCat Audio DiT (yes, the food-delivery company) ships as a 3.5B parameter model with best-in-class error rate and similarity scores; a smaller 6GB version runs on consumer GPUs [1: AI Search — LongCat + OmniVoice segment]. OmniVoice then ups the ante: 600+ languages supported, cross-lingual voice cloning (clone an English voice, have it speak Japanese), and text-prompt voice synthesis (“female elderly British accent” → a voice you've never heard but sounds right). Total install size: ~3GB, works on Apple Silicon.
Demos cover fast accent replication — Aussie male (“fair dinkum”) and Indian female — with tone carried across to the generated speech, not just pitch ~12:50. Two model sizes: a 3.5B / 15.3GB flagship and a 6GB efficient variant. Both fit comfortably on a modern consumer GPU.
The more interesting capability is decoupling identity from language. AI Search's demo ~35:30 takes a 4-second English clip (“Hey, look, a flying pig”) and has the same voice speak Japanese, Korean, Russian, and French. Separately, OmniVoice supports prompt-only synthesis — no reference audio, just “male high-pitched Indian accent” or “female elderly British.” And inline tags ([laughter], [gasps], [surprise]) are followed faithfully.
It's really good at cloning the exact tone and style of someone's voice.
~3GB install, GitHub-available, runs on both Nvidia and Apple Silicon. This level of quality at this install size, running locally, is new this quarter.
ZAI released GLM-5V Turbo, a vision-enabled coding model that accepts images, videos, and documents as reference inputs — sketches, wireframes, or full video recordings of a website get turned into working code, including animations [1: AI Search — GLM-5V Turbo segment]. On design-specific benchmarks, it reportedly outperforms both Kimi K2.5 and Claude Opus 4.6 — notable because those are the presumed ceilings for frontier coding in early 2026.
Three concrete demos from the release page ~42:10:
GLM-5V Turbo is available via API, connects to OpenCode and Claude Code, and lives in the ZAI chat interface under the model dropdown.
This new GLM-5V Turbo even outperforms Kimi K2.5 as well as Claude Opus 4.6 for most of these benchmarks.
Caveat — these are ZAI's own benchmarks and the methodology for design-specific eval isn't as standardized as SWE-bench, so apples-to-apples is a stretch.
Five video/world-model research releases landed the same week, most with code. Netflix — unusually for them — open-sourced VOID, a prompt-based video object deletion model that physically re-renders the scene after removal [1: AI Search — video research roundup]. Add TokenDial (slider-based video editing), Hydra (memory-persistent world models), VGG-RPO (3D-aware video diffusion), a generative G-buffer renderer for AAA-game restyling, and Hand X (a fine-grained humanoid hand-motion dataset) — and this single week has more open-video-model releases than most full months.
Video Object and Interaction Deletion. Delete an object via prompt and VOID fills in the gap while preserving physical plausibility — remove the ball from a bowling video and the pins stay upright; remove a car and the other car doesn't crash. Two-pass model, ~22GB total, needs a high-end GPU. Code is open.
Video editing via slider controls, not just prompts. “Make the explosion smokier” with a strength slider; “make the campfire bluer”; “make this person older.” Also works for motion intensity — dial up dancing or driving speed. Hugging Face demo + training code + models are promised.
Solves the persistence problem in AI-generated video: pan the camera away, then back, and the character/scene stays consistent. Built on a new 59,000-clip dataset (HM World) designed with subjects constantly entering and leaving view. Model compresses scenes into memory tokens; generating new frames searches memory for relevant pieces. Code + training code released.
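A toy sketch of the retrieval idea described above: compress what the camera has seen into memory tokens, then pull back the most relevant ones when generating new frames. Everything here is a placeholder (the real encoder, token format, and generator are not specified above); it only illustrates the store-and-search pattern.

```python
import numpy as np

def encode_frame(frame: np.ndarray) -> np.ndarray:
    """Placeholder for a learned encoder: squash a frame into a fixed-size memory token."""
    flat = frame.astype(np.float32).reshape(-1)
    return flat[:256] if flat.size >= 256 else np.pad(flat, (0, 256 - flat.size))

class SceneMemory:
    """Toy memory bank: write one token per seen frame, read back the closest matches."""

    def __init__(self) -> None:
        self.tokens: list[np.ndarray] = []

    def write(self, frame: np.ndarray) -> None:
        self.tokens.append(encode_frame(frame))

    def read(self, query_frame: np.ndarray, k: int = 4) -> list[np.ndarray]:
        # Cosine similarity between the query token and every stored token;
        # the top-k results are what a generator would condition on when the
        # camera pans back to a previously seen part of the scene.
        q = encode_frame(query_frame)
        sims = [
            float(q @ t) / (float(np.linalg.norm(q)) * float(np.linalg.norm(t)) + 1e-8)
            for t in self.tokens
        ]
        top = np.argsort(sims)[::-1][:k]
        return [self.tokens[i] for i in top]
```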
Google research adding “latent geometry” to video diffusion models — inferring surface positions and 3D structure instead of just 2D pixels, so camera motion feels real and objects stop morphing. No open weights yet, technical paper only.
Pulls G-buffer data (RGB, depth, normals, albedo, metallic/roughness) from AAA games and restyles gameplay via text prompt — turn Wukong gameplay into a sand environment, or Cyberpunk neon, or add fireflies / fog / fire. Uses Nvidia Cosmos Transfer (7B, 29GB) + Wan 2.1 video. Code open.
A dataset, not a model. Fine-grained humanoid hand motion capture with richly annotated prompts specifying finger positions, palm/wrist orientation, and inter-finger contacts. Useful for training robots (e.g., Unitree G1) in simulation environments like Nvidia Isaac Gym. Open dataset.
On the image + 3D side, ByteDance shipped DreamLight — a 0.39B image generator + editor that runs locally on an iPhone 17 Pro in ~3 seconds for a 1024x1024 image [1: AI Search — image research roundup]. Add Gen-Searcher (web-grounded image generation), PS Designer (agentic Photoshop-file output with layers), SeeThrough (anime image decomposition into editable layers + depth), and LGTM (fewer-Gaussian 3D reconstruction at 4K) — and the shape of the week becomes clear: on-device inference + research code hitting the same week as the closed-source frontier releases.
0.39B parameters, iPhone 17 Pro, 3 seconds for 1024x1024 at both generation and editing ~18:35. Quality clearly lags frontier models on skin and fur detail, but outperforms same-size and even larger predecessors. Model release is planned, code button is live on the project page.
A model-agnostic framework that searches the web for reference images before generating — so you can ask for a specific anime character, a dated infographic with correct temperature ranges, or a named building with the right architect and completion year, and it'll actually be accurate instead of hallucinated. Outperforms base models on accuracy/visual-correctness benchmarks for science, pop culture, and news. Code is open.
Four-agent system (asset collector → graphic planner → tool executor → feedback loop). Output isn't a flat PNG but a layered Photoshop file you can edit. Visual quality beats direct image-model competitors on the posters + graphic-design comparison reel. Code + models promised.
Takes a single anime image and decomposes it into editable transparent layers (chair, table, hair, tail, arms, etc.) plus depth. Fills in occluded portions — if something is behind the character, SeeThrough reconstructs it. Output feeds into tools that can then animate or re-light the character. Works with 8GB VRAM using 4-bit quant. Code open.
4K 3D-scene reconstruction from a handful of images, solving the resolution-scaling problem for 3D Gaussian splatting. Instead of adding more Gaussians as resolution rises, LGTM uses fewer Gaussians each carrying a texture patch — fewer compute blow-ups, much better fine detail. Technical paper released; code “coming soon.”
Arjay McCandless's 60-second sketch: a junior engineer logs the entire database response on every request (“smaller than 6MB at least”) and the servers melt [2: Arjay McCandless — Bro added logging]. The payoff line crystallizes a real rule: log a request ID, a response code, a timestamp — maybe one or two sample rows — and never the entire output.
The full setup: production CPU load spikes because every request now re-serializes the database response into structured logs — and pays for it twice: once in CPU time to format the payload, and again in the logging service that has to store and search it. Junior defends the decision: “I thought you said logging was good. Makes debugging in production easier.”
It does help, but you still have to use it sparingly. Only use it for things you really need like a request ID, a response code, and a time stamp. It's fine to log some information about the data like the number of rows or maybe one or two rows as a sample, but you should never just be logging the entire output.
Why this matters in an AI-coding context: this exact failure mode is a default output pattern for coding agents. Tell a model “add logging” and it'll helpfully log the entire response body. Logging sparingly is learned taste; generating logging code without taste is a line of business.
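A minimal sketch of the rule in the quote, in Python. The handler, db.fetch_orders(), and the field names are hypothetical; the point is only the shape of what gets logged versus what doesn't.

```python
import logging
import time
import uuid

log = logging.getLogger("orders")

def handle_request(db) -> list[dict]:
    request_id = str(uuid.uuid4())
    start = time.monotonic()
    rows = db.fetch_orders()  # hypothetical query returning a list of dicts

    # The sketch's failure mode: serialize the whole payload into the logs on
    # every request, paying once in CPU and again in log storage/search.
    # log.info("orders response: %s", rows)

    # The rule from the video: a request ID, a response status, a row count,
    # and at most one or two sample rows; never the full payload. The log
    # record itself carries the timestamp.
    elapsed_ms = (time.monotonic() - start) * 1000
    log.info(
        "request_id=%s status=ok rows=%d elapsed_ms=%.1f sample=%s",
        request_id, len(rows), elapsed_ms, rows[:1],
    )
    return rows
```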
A 45-second teaser clip from The Pragmatic Engineer previews an upcoming episode about a candidate's 30-hour one-on-one interview with Travis Kalanick for the Uber CTO role [3: The Pragmatic Engineer — 30-hour Kalanick interview]. The interviewee describes Kalanick writing out a whiteboard agenda from memory: a long list covering hiring, firing, communications, org design; a shorter list of engineering-specific topics (code quality, QA, design); and a third list of “five things he wants to see in an engineering team and the culture of an engineering team.”
A short teaser clip, not the full episode. The substantive moment: Kalanick didn't arrive with a scripted interview; he whiteboarded his agenda live from memory. Three lists: general leadership topics (hiring, firing, communications, org design), engineering-specific topics (code quality, QA, design), and the five things he wants to see in an engineering team and its culture.
He committed over 30 hours interviewing me one-on-one.
Worth tracking for the full interview drop — 30-hour founder-CEO-to-CTO interview loops are unusual enough that the methodology alone is content.