April 5, 2026
Google released Gemma 4, a four-size open-weights family (E2B, E4B, 24B MoE, 31B dense) under Apache 2.0, built from the same research as Gemini 3 but shrunk to run on phones, Raspberry Pis, and edge devices [1: AI Search — Gemma 4, Claude Code leak, Wan 2.7, Qwen]. It's natively multimodal (text, image, audio), ships with 128K context on the small models and 256K on the larger ones, and is trained on 140+ languages. On AI Search's plot of size vs. performance, the 31B dense model sits near Kimi K2.5 Thinking — which is roughly 35x larger.
AI Search highlights Google's own benchmark chart showing Gemma 4 at roughly Kimi K2.5 Thinking parity ~03:10 despite the 35x size gap. Missing from the comparison: GLM 5.1. Models are live on Hugging Face, GGML quants already exist, and Google AI Studio lets you pick the 24B or 31B model from a dropdown for free testing.
You can now run a very performant AI model just offline on your phone or your computer... It's an incredibly capable open model.
On March 31, Anthropic shipped an NPM package that accidentally included a source map — exposing ~500,000 readable lines of TypeScript across ~2,000 files for Claude Code, their agentic coding framework [1: AI Search — Claude Code leak segment]. Copies were mirrored across repos within hours; DMCA takedowns followed, but the internet had already done its thing. The interesting part isn't that it leaked — it's what the leak revealed about unreleased features and the prompt engineering behind the harness.
You are operating undercover in a public open-source repo. Your commit messages, PR titles, etc. must not contain any Anthropic internal information. Do not blow your cover.
AI Search's wry observation: undercover mode is probably what caused the leak in the first place — it was the mechanism trying to hide unreleased features from public commits, which made those features interesting enough to surface when the source map went public.
AI Search's read ~25:00: the model itself didn't leak — no weights, no training data. What leaked is the harness. And the harness is increasingly where competitive edge lives: how tasks get broken down, how tools are called, how memory and context get managed. The Theo “Opus jumps from 77% in Claude Code to 93% in Cursor on the same model” point from later in the month is the same phenomenon seen from the outside; this leak shows it from the inside.
AI Search cut the segment short to flag a real-world event — “as I record this, I just noticed a black car with black windows parked outside my house... I just noticed a red dot on my forehead right now” — probably a bit, maybe not, but either way, funny.
Alibaba shipped two major Qwen releases the same week. Qwen 3.5 Omni is a true omnimodal model (text, image, audio, video) with benchmarks that edge Gemini 3.1 Pro on some audio-visual tasks [1: AI Search — Qwen 3.5 Omni / 3.6 Plus]. Qwen 3.6 Plus gets a 1M-token context window (~700K words) and significantly improved agentic coding — though AI Search flags the comparison charts as cherry-picked: they omit GLM 5.1, Opus 4.6, and Gemini 3.1 Pro.
Two flavors — the higher-quality Plus and a lower-latency Realtime version. The demo that actually shows its omnimodal teeth ~28:55: feed it a video of someone playing Snake, and with no text prompt, it codes a playable clone. Then feed it a follow-up video with mouse-drawn regions on-screen saying “make this area a spring theme, this one summer” — and it edits the existing game correctly.
Same trick for a product-interface demo: draw boxes on a video, narrate what each box should show, and Qwen 3.5 Omni outputs working HTML.
A sibling release rather than a replacement. Key deltas from 3.5: the 1M-token context window (~700K words) and the agentic-coding improvements flagged above.
Both models are live on QwenChat. The model dropdown at top-left is where you select either release.
Alibaba dropped Wan 2.7 in two flavors: a video generator with native audio and customizable characters (up to 5 reference images), and Wan 2.7 Image — a unified generator/editor tuned for realistic faces, precise hex-code color control, and text rendering in 12+ languages [1: AI Search — Wan 2.7 video + image]. AI Search's honest take: the video quality still trails ByteDance's Seedance 2.0 and Kling 3, so if Wan 2.7 stays closed-source and paid, it's hard to recommend over the competition.
Multimodal conditioning (text, image, audio, video as reference inputs), up to 5 character reference images, custom voices, and native audio generation synced to the video. Alibaba claims improvements in fidelity, motion stability, and prompt adherence over 2.2 (the current open-source version). No announcement yet on whether 2.7 will be open-sourced — the previous open-source tier stops at 2.2.
Both live at wan.video. Select “image” then pick 2.7 or 2.7 Pro.
Two state-of-the-art voice cloners dropped the same week. Meituan's LongCat Audio DiT (yes, the food-delivery company) ships as a 3.5B parameter model with best-in-class error rate and similarity scores; a smaller 6GB version runs on consumer GPUs [1: AI Search — LongCat + OmniVoice segment]. OmniVoice then ups the ante: 600+ languages supported, cross-lingual voice cloning (clone an English voice, have it speak Japanese), and text-prompt voice synthesis (“female elderly British accent” → a voice you've never heard but sounds right). Total install size: ~3GB, works on Apple Silicon.
Demos cover fast accent replication — Aussie male (“fair dinkum”) and Indian female — with tone carried across to the generated speech, not just pitch ~12:50. Two model sizes: a 3.5B / 15.3GB flagship and a 6GB efficient variant. Both fit comfortably on a modern consumer GPU.
The more interesting capability is decoupling identity from language. AI Search's demo ~35:30 takes a 4-second English clip (“Hey, look, a flying pig”) and has the same voice speak Japanese, Korean, Russian, and French. Separately, OmniVoice supports prompt-only synthesis — no reference audio, just “male high-pitched Indian accent” or “female elderly British.” And inline tags ([laughter], [gasps], [surprise]) are followed faithfully.
It's really good at cloning the exact tone and style of someone's voice.
~3GB install, GitHub-available, runs on both Nvidia and Apple Silicon. This level of quality at this install size, running locally, is new this quarter.
ZAI released GLM-5V Turbo, a vision-enabled coding model that accepts images, videos, and documents as reference inputs — sketches, wireframes, or full video recordings of a website get turned into working code, including animations [1: AI Search — GLM-5V Turbo segment]. On design-specific benchmarks, it reportedly outperforms both Kimi K2.5 and Claude Opus 4.6 — notable because those are the presumed ceilings for frontier coding in early 2026.
Three concrete demos from the release page ~42:10:
GLM-5V Turbo is available via API, connects to OpenCode and Claude Code, and lives in the ZAI chat interface under the model dropdown.
This new GLM-5V Turbo even outperforms Kimi K2.5 as well as Claude Opus 4.6 for most of these benchmarks.
Caveat — these are ZAI's own benchmarks and the methodology for design-specific eval isn't as standardized as SWE-bench, so apples-to-apples is a stretch.
Five video/world-model research releases landed the same week, most with code. Netflix — unusually for them — open-sourced VOID, a prompt-based video object deletion model that physically re-renders the scene after removal [1: AI Search — video research roundup]. Add TokenDial (slider-based video editing), Hydra (memory-persistent world models), VGG-RPO (3D-aware video diffusion), a generative G-buffer renderer for AAA-game restyling, and Hand X (a fine-grained humanoid hand-motion dataset) — and this single week has more open-video-model releases than most full months.
Video Object and Interaction Deletion. Delete an object via prompt and VOID fills in the gap while preserving physical plausibility — remove the ball from a bowling video and the pins stay upright; remove a car and the other car doesn't crash. Two-pass model, ~22GB total, needs a high-end GPU. Code is open.
Video editing via slider controls, not just prompts. “Make the explosion smokier” with a strength slider; “make the campfire bluer”; “make this person older.” Also works for motion intensity — dial up dancing or driving speed. Hugging Face demo + training code + models are promised.
Solves the persistence problem in AI-generated video: pan the camera away, then back, and the character/scene stays consistent. Built on a new 59,000-clip dataset (HM World) designed with subjects constantly entering and leaving view. Model compresses scenes into memory tokens; generating new frames searches memory for relevant pieces. Code + training code released.
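A toy sketch of the retrieval idea described above: compress what the camera has seen into memory tokens, then pull back the most relevant ones when generating new frames. Everything here is a placeholder (the real encoder, token format, and generator are not specified above); it only illustrates the store-and-search pattern.

```python
import numpy as np

def encode_frame(frame: np.ndarray) -> np.ndarray:
    """Placeholder for a learned encoder: squash a frame into a fixed-size memory token."""
    flat = frame.astype(np.float32).reshape(-1)
    return flat[:256] if flat.size >= 256 else np.pad(flat, (0, 256 - flat.size))

class SceneMemory:
    """Toy memory bank: write one token per seen frame, read back the closest matches."""

    def __init__(self) -> None:
        self.tokens: list[np.ndarray] = []

    def write(self, frame: np.ndarray) -> None:
        self.tokens.append(encode_frame(frame))

    def read(self, query_frame: np.ndarray, k: int = 4) -> list[np.ndarray]:
        # Cosine similarity between the query token and every stored token;
        # the top-k results are what a generator would condition on when the
        # camera pans back to a previously seen part of the scene.
        q = encode_frame(query_frame)
        sims = [
            float(q @ t) / (float(np.linalg.norm(q)) * float(np.linalg.norm(t)) + 1e-8)
            for t in self.tokens
        ]
        top = np.argsort(sims)[::-1][:k]
        return [self.tokens[i] for i in top]
```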
Google research adding “latent geometry” to video diffusion models — inferring surface positions and 3D structure instead of just 2D pixels, so camera motion feels real and objects stop morphing. No open weights yet, technical paper only.
Pulls G-buffer data (RGB, depth, normals, albedo, metallic/roughness) from AAA games and restyles gameplay via text prompt — turn Wukong gameplay into a sand environment, or Cyberpunk neon, or add fireflies / fog / fire. Uses Nvidia Cosmos Transfer (7B, 29GB) + Wan 2.1 video. Code open.
A dataset, not a model. Fine-grained humanoid hand motion capture with richly annotated prompts specifying finger positions, palm/wrist orientation, and inter-finger contacts. Useful for training robots (e.g., Unitree G1) in simulation environments like Nvidia Isaac Gym. Open dataset.
On the image + 3D side, ByteDance shipped DreamLight — a 0.39B image generator + editor that runs locally on an iPhone 17 Pro in ~3 seconds for a 1024x1024 image [1: AI Search — image research roundup]. Add Gen-Searcher (web-grounded image generation), PS Designer (agentic Photoshop-file output with layers), SeeThrough (anime image decomposition into editable layers + depth), and LGTM (fewer-Gaussian 3D reconstruction at 4K) — and the shape of the week becomes clear: on-device inference + research code hitting the same week as the closed-source frontier releases.
0.39B parameters, iPhone 17 Pro, 3 seconds for 1024x1024 at both generation and editing ~18:35. Quality clearly lags frontier models on skin and fur detail, but outperforms same-size and even larger predecessors. Model release is planned, code button is live on the project page.
A model-agnostic framework that searches the web for reference images before generating — so you can ask for a specific anime character, a dated infographic with correct temperature ranges, or a named building with the right architect and completion year, and it'll actually be accurate instead of hallucinated. Outperforms base models on accuracy/visual-correctness benchmarks for science, pop culture, and news. Code is open.
Four-agent system (asset collector → graphic planner → tool executor → feedback loop). Output isn't a flat PNG but a layered Photoshop file you can edit. Visual quality beats direct image-model competitors on the posters + graphic-design comparison reel. Code + models promised.
Takes a single anime image and decomposes it into editable transparent layers (chair, table, hair, tail, arms, etc.) plus depth. Fills in occluded portions — if something is behind the character, SeeThrough reconstructs it. Output feeds into tools that can then animate or re-light the character. Works with 8GB VRAM using 4-bit quant. Code open.
4K 3D-scene reconstruction from a handful of images, solving the resolution-scaling problem for 3D Gaussian splatting. Instead of adding more Gaussians as resolution rises, LGTM uses fewer Gaussians each carrying a texture patch — fewer compute blow-ups, much better fine detail. Technical paper released; code “coming soon.”
Arjay McCandless's 60-second sketch: a junior engineer logs the entire database response on every request (“smaller than 6MB at least”) and the servers melt [2: Arjay McCandless — Bro added logging]. The payoff line crystallizes a real rule: log a request ID, a response code, a timestamp — maybe one or two sample rows — and never the entire output.
The full setup: production CPU load spikes because every request now re-serializes the database response into structured logs — and pays for it twice: once in CPU time to format the payload, and again in the logging service that has to store and search it. Junior defends the decision: “I thought you said logging was good. Makes debugging in production easier.”
It does help, but you still have to use it sparingly. Only use it for things you really need like a request ID, a response code, and a time stamp. It's fine to log some information about the data like the number of rows or maybe one or two rows as a sample, but you should never just be logging the entire output.
Why this matters in an AI-coding context: this exact failure mode is a default output pattern for coding agents. Tell a model “add logging” and it'll helpfully log the entire response body. Logging sparingly is learned taste; generating logging code without taste is a line of business.
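A minimal sketch of the rule in the quote, in Python. The handler, db.fetch_orders(), and the field names are hypothetical; the point is only the shape of what gets logged versus what doesn't.

```python
import logging
import time
import uuid

log = logging.getLogger("orders")

def handle_request(db) -> list[dict]:
    request_id = str(uuid.uuid4())
    start = time.monotonic()
    rows = db.fetch_orders()  # hypothetical query returning a list of dicts

    # The sketch's failure mode: serialize the whole payload into the logs on
    # every request, paying once in CPU and again in log storage/search.
    # log.info("orders response: %s", rows)

    # The rule from the video: a request ID, a response status, a row count,
    # and at most one or two sample rows; never the full payload. The log
    # record itself carries the timestamp.
    elapsed_ms = (time.monotonic() - start) * 1000
    log.info(
        "request_id=%s status=ok rows=%d elapsed_ms=%.1f sample=%s",
        request_id, len(rows), elapsed_ms, rows[:1],
    )
    return rows
```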
A 45-second teaser clip from The Pragmatic Engineer previews an upcoming episode about a candidate's 30-hour one-on-one interview with Travis Kalanick for the Uber CTO role [3: The Pragmatic Engineer — 30-hour Kalanick interview]. The interviewee describes Kalanick writing out a whiteboard agenda from memory: a long list covering hiring, firing, communications, org design; a shorter list of engineering-specific topics (code quality, QA, design); and a third list of “five things he wants to see in an engineering team and the culture of an engineering team.”
A short teaser clip, not the full episode. The substantive moment: Kalanick didn't arrive with a scripted interview; he whiteboarded his agenda live from memory. Three lists: general leadership topics (hiring, firing, communications, org design), engineering-specific topics (code quality, QA, design), and the five things he wants to see in an engineering team and its culture.
He committed over 30 hours interviewing me one-on-one.
Worth tracking for the full interview drop — 30-hour founder-CEO-to-CTO interview loops are unusual enough that the methodology alone is content.