PikaStream 1.0 gives any AI agent a real-time face and voice for Google Meet. 24 FPS at 480p on a single H100 GPU, ~1.5 seconds of end-to-end latency, and a 9B Diffusion Transformer keeping identity stable across an entire conversation.
A new kind of AI video model, built for real-time conversation rather than finished clips.
PikaStream 1.0 is the real-time AI video chat model from Pika Labs, released in beta in April 2026. Unlike traditional generative video models that wait for a prompt, run for thirty seconds to several minutes, and return a finished MP4, PikaStream is engineered for the opposite kind of work: an open, ongoing video stream that holds up in a live back-and-forth conversation. Think FaceTime with an AI, not "export a cinematic sequence."
The product wraps the model in a developer-facing skill called pikastream-video-meeting, currently open-sourced under Apache 2.0 in the Pika-Labs/Pika-Skills repository on GitHub. Once installed in an AI agent (Pika Self, Claude, OpenClaw, or any other agent that can read SKILL.md files), the skill lets that agent join a Google Meet call as a fully rendered, dynamic AI avatar: visible to every other participant, speaking in real time, with persistent memory of who it is talking to and what has been discussed.
PikaStream's launch positioning matters. Pika Labs has long been known for one-off generative video, where a creator types a prompt and waits for a polished clip. PikaStream is the company's pivot into live, agent-powered video presence: the same underlying neural-network expertise applied to a fundamentally different problem. The bet behind the model is that the next era of AI-human collaboration won't happen in chat windows or asynchronous video exports. It will happen on calls, where the AI participant has a face, a voice, persistent memory, and the ability to execute real tasks while the meeting unfolds.
This is the launch product that pairs with Pika.me's broader AI Selves platform. A user designs their AI Self on Pika.me, gives it personality and identity files, and PikaStream is what lets that AI Self show up to a real Google Meet, not as a sterile chatbot tile, but as a recognizably human-feeling visual participant. For developers building on Pika's broader API, PikaStream is the missing primitive that bridges agentic chat into face-to-face collaboration. For organizations, it's a new infrastructure layer where AI participants can stand in for, alongside, or in support of human team members.
Pika published concrete numbers for frame rate, latency, decoding speed, lip-sync alignment, and identity consistency, a level of disclosure that signals a serious infrastructure release.
github.com/Pika-Labs/Pika-Skills · Apache 2.0 licensed
Three model components running in parallel produce a live, stable, lip-synced video stream of your AI agent.
FlashVAE is Pika's purpose-built variational autoencoder optimized for real-time video decoding. Most video diffusion models decode frames in big batches because it's efficient, but batching adds delay: the first frame of a batch cannot be shown until the whole batch is ready. FlashVAE trades raw throughput for low per-frame latency, keeping the response time inside the conversational threshold.
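The batching trade-off can be made concrete with a little arithmetic. The sketch below is not FlashVAE's implementation. It only assumes that frames become available at the quoted 24 FPS playback rate, and the function names are illustrative.

```python
FPS = 24  # frame rate quoted for PikaStream 1.0


def frame_interval_ms(fps: float = FPS) -> float:
    """Time between consecutive frames at the given frame rate."""
    return 1000.0 / fps


def batch_fill_delay_ms(batch_size: int, fps: float = FPS) -> float:
    """Extra wait imposed on the first frame of a batch.

    Assumption: frames arrive one per frame interval, so the first frame
    must wait for the remaining (batch_size - 1) frames before the batch
    can be decoded at all.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return (batch_size - 1) * frame_interval_ms(fps)


if __name__ == "__main__":
    for n in (1, 8, 24):
        print(f"batch={n:>2}  added wait = {batch_fill_delay_ms(n):6.1f} ms")
```

Under these assumptions, a 24-frame batch alone would consume close to a full second of a roughly 1.5-second budget, which is why per-frame decoding matters for conversation.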
A 9-billion-parameter Diffusion Transformer tuned for real-time inference. The transformer takes the agent's intended speech and emotional cues from the underlying language model and converts them into a sequence of latent frames that the FlashVAE then decodes into visible video, all within the roughly 1.5-second end-to-end budget.
The hardest problem in real-time video generation is identity drift: five minutes in, your character looks subtly different from when the call started. PikaStream addresses this by injecting a reference image of the AI agent's face at every frame, anchoring the generation to a consistent visual identity across the entire session.
These three components run in parallel rather than sequentially. As the conversation unfolds, audio comes in from the meeting, the agent's underlying LLM (Claude, GPT, Pika's own model) reasons about what to say next, voice synthesis generates the audio response, and the 9B Diffusion Transformer produces the matching video frames, all overlapping with each other in time. The result is the ~1.5s end-to-end latency that Pika quotes: roughly the same response delay as a human on a slow international video call, well inside the threshold where conversation feels natural rather than buffered.
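To see why overlapping the stages matters, here is a minimal timing model. The stage names follow the paragraph above, but the per-chunk timings and the chunk count are made-up illustrative values, not Pika measurements, and the model ignores queuing and network effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    seconds_per_chunk: float  # hypothetical time to process one chunk


# Illustrative numbers only; they are NOT published by Pika.
STAGES = [Stage("llm", 0.4), Stage("tts", 0.3), Stage("video", 0.5)]
CHUNKS = 5  # hypothetical number of chunks in one spoken response


def first_output_sequential(stages: list[Stage], chunks: int) -> float:
    """Each upstream stage finishes the whole response before the next starts."""
    *upstream, last = stages
    return sum(s.seconds_per_chunk * chunks for s in upstream) + last.seconds_per_chunk


def first_output_pipelined(stages: list[Stage]) -> float:
    """Each stage hands its first chunk downstream as soon as it is ready."""
    return sum(s.seconds_per_chunk for s in stages)


if __name__ == "__main__":
    print(f"sequential: {first_output_sequential(STAGES, CHUNKS):.1f} s to first frame")
    print(f"pipelined:  {first_output_pipelined(STAGES):.1f} s to first frame")
```

In the pipelined case, time to first visible output depends only on the first chunk passing through each stage, not on the length of the whole response.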
The other architectural detail worth understanding is that rendering happens on Pika's GPU clusters, not on your local machine. You're not running a 9B parameter model on your laptop. You're sending a meeting link and a few credentials to Pika's infrastructure, and the rendering, audio processing, and identity-stable video stream all happen in the cloud. This means the quality is identical whether you're on a high-end workstation or a five-year-old MacBook Air: your hardware doesn't gate the experience.
The capabilities that separate it from a typical video-generation model or a text-to-speech avatar tool.
Continuous video generation at conversational latency, not waiting for a finished clip. The output isn't a file; it's a session that lasts as long as the meeting does.
Clone an agent's voice from a 15-second audio sample, or use any of the platform's library voices. Synthesized speech is generated and matched to mouth movements in real time.
The AI agent remembers who's in the meeting, what was discussed previously, and what tasks are outstanding. Memory persists across separate calls with the same participants.
Generate an avatar on demand using a text description, upload your own brand mascot or character image, or use a Pika AI Self that's already designed.
Mid-meeting, the agent can draft a doc, update a CRM record, schedule a follow-up, or send a Slack message, all triggered by what's being said in the call.
After the call ends, PikaStream produces transcripts, summaries, action items, and chapter markers, emailed or pushed to your workspace automatically.
The pikastream-video-meeting skill works with Pika AI Selves, Claude, OpenClaw, and any agent that can read SKILL.md files. Not locked to one assistant.
Before joining a meeting, the skill checks your Pika Wallet balance. If you're low on tokens, it generates a top-up payment link automatically, so there are no mid-meeting interruptions.
Send your agent a Google Meet link in chat. The skill auto-detects it, checks credentials, and joins the call. No manual configuration needed.
If you haven't worked with the Model Context Protocol or real-time AI video before, this beginner-friendly walkthrough covers the conceptual foundation behind PikaStream and the broader Pika ecosystem.
PikaStream 1.0 was launched as part of a coordinated rollout that also included the broader Pika.me agent platform and the Pika MCP connector for Claude. The three pieces fit together: Pika.me is where you design your AI Self (face, voice, personality, style). Pika MCP is the connector that lets that AI Self show up inside Claude or any other MCP-compatible client. And PikaStream 1.0 is the real-time video layer that brings that AI Self to life on actual video calls, taking it from a chat surface to a face-to-face presence.
Pika Labs has also published demo materials directly on their site (pika.me) and on social channels. For up-to-date demonstration footage, including livestreams of the model being used in real Google Meet calls, the @pika_labs account on X and the official Pika.me homepage are the canonical sources.
The full install flow, from a fresh Pika.me account to your AI agent joining a real Google Meet call.
Go to pika.me/dev/login, create or sign in to your developer account, and generate a key prefixed with dk_.
Export the key in your shell, or add it to your shell profile for persistence:
export PIKA_DEV_KEY=dk_...
Pull the open-source Pika Skills repository from GitHub. The pikastream-video-meeting folder is the skill you'll install.
github.com/Pika-Labs/Pika-Skills
Point your AI agent to the skill folder and ask it to install. Then send a Google Meet link in chat, and the skill auto-activates.
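Before handing the skill to an agent, it can help to confirm that the key from the earlier step is actually visible to the process. The check below is a small sketch: the PIKA_DEV_KEY variable name and the dk_ prefix come from the steps above, while the helper name and error messages are illustrative and not part of the Pika Skills repository.

```python
import os
from typing import Mapping, Optional


def load_pika_dev_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the developer key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    source = os.environ if env is None else env
    key = source.get("PIKA_DEV_KEY", "").strip()
    if not key:
        raise RuntimeError("PIKA_DEV_KEY is not set; export it in your shell first")
    if not key.startswith("dk_"):
        raise RuntimeError("PIKA_DEV_KEY should start with 'dk_'")
    return key


if __name__ == "__main__":
    try:
        load_pika_dev_key()
        print("PIKA_DEV_KEY looks well-formed")
    except RuntimeError as err:
        print(f"Key check failed: {err}")
```

The check only validates the shape of the key locally. It does not contact Pika or confirm that the key is active.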
When you send your AI agent a Google Meet link in chat, several things happen in sequence, all automated, all transparent. First, the skill checks your Pika Wallet balance to make sure you have enough tokens for the call. If the balance is low, it generates a payment link and prompts you to top up. Second, it provisions a meeting bot identity: your AI agent's chosen avatar and voice are loaded into Pika's infrastructure. Third, it joins the meeting using its assigned avatar and voice profile, appearing to other participants as a recognizable visual presence rather than a tile labeled "Unknown User."
From that point on, the meeting unfolds like any normal Google Meet, except one of the participants is an AI. The agent listens to the conversation, processes what's being said, generates appropriate responses in real time, executes tasks if asked, and maintains context across the entire session. When the meeting ends, the skill automatically generates a summary, action items, and a transcript, then delivers them to your configured destination (Notion, email, Slack, etc.).
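The activation sequence described above (detect the link, check the balance, then either prompt for a top-up or join) can be sketched as a small decision function. Everything here is hypothetical: the regular expression assumes the common three-four-three letter Meet code, and the token amounts, class, and function names are illustrative rather than taken from the actual skill.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: Meet links look like https://meet.google.com/abc-defg-hij
MEET_LINK = re.compile(r"https?://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}\b")


def find_meet_link(message: str) -> Optional[str]:
    """Return the first Google Meet link in a chat message, if any."""
    match = MEET_LINK.search(message)
    return match.group(0) if match else None


@dataclass(frozen=True)
class JoinPlan:
    action: str  # "ignore", "top_up", or "join"
    meet_url: Optional[str] = None


def plan_join(message: str, balance_tokens: int, required_tokens: int) -> JoinPlan:
    """Decide the next step for a chat message, mirroring the order in the text."""
    url = find_meet_link(message)
    if url is None:
        return JoinPlan("ignore")
    if balance_tokens < required_tokens:
        return JoinPlan("top_up", url)  # the real skill generates a payment link here
    return JoinPlan("join", url)


if __name__ == "__main__":
    print(plan_join("Join us: https://meet.google.com/abc-defg-hij", 50, 20))
```

Keeping the decision separate from the side effects (payment link, avatar provisioning, joining) makes the ordering easy to test without touching any real service.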
The skill works with any agent that can read SKILL.md files: Pika Self, Claude (via the Pika MCP), Cursor, OpenClaw, or any custom agent built on the open standard. Once installed, no further configuration is needed; the skill is auto-detected and activated on demand.
Eight emerging applications for real-time AI video meeting agents, from customer support to creative collaboration.
Send your AI Self to a recurring sync you can't attend. They take notes, answer routine questions in your voice, and report back with a summary and action items afterward.
Deploy a branded AI representative on customer support video calls. Consistent identity, voice, and expertise at any scale, with no scheduling overhead.
A real-time AI tutor with persistent memory of a student's prior sessions. Adapts pacing, repeats concepts, and explains visually in a way that static help articles can't.
A real-time AI presenter who walks prospects through your product, answers questions, and adapts to their interests, replacing static demo videos with interactive sessions.
An AI participant who speaks every language. Reps on a call get real-time translation through the same avatar, eliminating awkward dual-track interpretation flows.
Solo founders who can't be in every meeting can send their AI Self instead: same face, same voice, persistent memory of the company context.
Companies with established mascots or characters can put them on real video calls. The brand identity carries from marketing materials into actual customer touchpoints.
An AI engineering teammate who joins standups, reads your tickets, summarizes the week's PRs, and answers status questions in real time using your codebase as context.
How PikaStream 1.0 stacks up against traditional video generation, talking-head avatars, and other real-time avatar platforms.
| Dimension | PikaStream 1.0 | Traditional AI Avatars |
|---|---|---|
| Output type | Continuous live session | Pre-rendered finished clips |
| Latency | ~1.5s end-to-end | 30s to several min per clip |
| Joins live video calls | Yes (Google Meet at launch) | No |
| Identity stability over time | Reference injection per frame | Drifts over long generations |
| Lip-sync | Real-time, generated speech | Post-hoc match to existing audio |
| Agentic task execution | During the call | None |
| Voice cloning | 15-second sample, integrated | Often a separate product |
| Persistent memory | Across calls and sessions | None (each clip is isolated) |
| Cross-agent compatibility | Any SKILL.md-aware agent | Usually platform-locked |
| Local hardware requirements | None (runs on Pika GPUs) | Often heavy local compute |
The category PikaStream most directly competes with is real-time conversational avatars: products like Runway Characters or HeyGen's interactive avatars. PikaStream differentiates on three dimensions: agent-agnostic skill design (any agent can adopt the skill, not just Pika's), integration with the broader Pika ecosystem (AI Selves, MCP, the Dev API), and agentic task execution during the call rather than just visual presence. The bet Pika is making is that real-time avatars only become genuinely useful when they're tied to an agent that can do things mid-conversation: draft documents, update systems, send messages, schedule follow-ups.
The other relevant comparison is "no avatar at all", meaning an agent that simply joins meetings as a transcription-and-summary bot. Several products do this well. PikaStream is betting that having a face changes the dynamic. The argument that roughly 70% of communication is non-verbal shows up frequently in PikaStream marketing: text and voice are efficient for information transfer, but trust, rapport, and presence require a visible counterpart. Whether that bet plays out commercially will depend on whether organizations decide an AI face is reassuring or unsettling in their specific contexts.
How developers can extend PikaStream beyond the default meeting bot, and the architecture that makes it possible.
For developers wanting to extend PikaStream beyond the default use case, the open-source Pika Skills architecture matters. A skill is a self-contained directory with three core components: a SKILL.md file describing the skill's purpose and triggers, a script or set of scripts implementing the behavior, and configuration for how the skill integrates with the host agent. The pikastream-video-meeting skill is the canonical reference implementation, but the same architecture lets you build custom skills that extend or replace its default behavior.
The SKILL.md protocol is what makes the skills agent-agnostic. Any agent that can parse SKILL.md (Pika Self, Claude with the MCP, Cursor, OpenClaw, any custom agent built on the open standard) can install and activate any Pika Skill. The skill self-describes its triggers ("when the user sends a Google Meet link, activate this"), its dependencies (the Pika Developer Key), and its operational logic (check balance → render avatar → join meeting → execute tasks → generate notes). The agent reads the SKILL.md and integrates the skill into its own tool-use loop automatically.
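As a rough picture of what "reading a SKILL.md" involves, the sketch below parses a simple key-value header from the top of a Markdown file. It assumes the file opens with a frontmatter block delimited by --- lines, which is a common convention for skill files. The exact fields in Pika's SKILL.md are not confirmed here, and the sample content is invented for illustration.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (metadata, body).

    Handles only flat `key: value` lines; returns ({}, text) when there is
    no well-formed frontmatter block.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter found
            return meta, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # never closed: treat the whole file as body


# Hypothetical file content, NOT copied from the Pika-Skills repository.
EXAMPLE_SKILL_MD = """\
---
name: pikastream-video-meeting
description: Join a Google Meet call as a live avatar when the user sends a Meet link.
---
# Hypothetical skill body
Check balance, render avatar, join meeting, execute tasks, generate notes.
"""

if __name__ == "__main__":
    metadata, body = parse_frontmatter(EXAMPLE_SKILL_MD)
    print(metadata["name"])
```

A real agent would typically hand the description to its model as a trigger hint and load the body only when the skill activates. A production parser would use a proper YAML library instead of this flat key-value reader.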
For more advanced customization, the Pika Developer API at pika.me/dev provides direct programmatic access to the underlying PikaStream model, including real-time streaming endpoints, voice cloning, avatar generation, and meeting notes retrieval. Developers building bespoke agent products can bypass the meeting-bot skill entirely and call PikaStream's primitives directly, embedding the model into entirely new surfaces (a Twitch streaming layer, an embedded support widget, a custom video conferencing product).
Real talk about the current beta: where the product is solid, and where it's still rough.
PikaStream 1.0 is explicitly a beta release. That framing matters when judging maturity. Pika has shipped real performance numbers (24 FPS, 1.5s latency, 9B Diffusion Transformer architecture), but as with any beta, expect occasional glitches, quality variance, and workflow friction. The official public materials emphasize technical capability and product vision more heavily than they emphasize enterprise documentation or commercial reliability guarantees.
Google Meet is the only first-party platform at launch. Zoom, Microsoft Teams, Webex, and other video conferencing tools aren't supported as launch partners. They're likely to come as new skills are built (potentially community-contributed via the open-source skills repo), but if your organization runs on Teams or Zoom, PikaStream isn't a fit until Pika or the community ships skills for those platforms.
The 480p resolution at 24 FPS is the upper bound at launch. That's plenty for a normal video meeting where you're seeing a Brady-Bunch-style tile grid of participants, but it falls short of 1080p or 4K hero-shot cinema. PikaStream is optimized for conversational presence, not Netflix-quality production. If you need ultra-high-fidelity output, a prompt-based video model (Sora, Veo 3, Kling, or Pika's own video generators) remains the right choice.
Independent benchmarking is still limited. The best performance numbers currently come from Pika itself. Third-party reviewers haven't run extensive comparisons against Runway Characters, HeyGen Interactive, or other real-time avatar competitors yet, partly because the product is so new, partly because rigorous head-to-head benchmarking of real-time AI video is genuinely hard. Take the headline numbers as Pika's best-case claims rather than independently verified benchmarks.
Quality variance during live calls. Because PikaStream is generating video in real time on shared infrastructure, there can be moments where the avatar quality drops, expressions look slightly off, or lip-sync briefly desyncs from speech. These are inherent to the streaming-generation paradigm. They tend to recover within seconds rather than persisting, but they're noticeable enough that PikaStream-based agents aren't yet appropriate for hyper-high-stakes settings (board meetings, legal depositions, life-critical communications).
The questions developers and creators ask most about the real-time AI video chat model.
PikaStream 1.0 is Pika Labs' real-time AI video chat model. It generates a continuous, low-latency video stream of an AI avatar that can join a live video conversation (typically a Google Meet call) with persistent identity, synchronized lip movements, voice cloning, and the ability to execute tasks mid-call. It's wrapped in a skill called pikastream-video-meeting that any compatible AI agent can install.
Regular Pika Video (the prompt-to-clip product) generates a finished video file from a text prompt: you wait 30 seconds to several minutes, then receive an MP4. PikaStream is the opposite paradigm: it generates an ongoing video session that lasts as long as a live conversation does, with ~1.5s latency between input and visible response. Different model, different optimization target, different use case.
Roughly $0.20 to $0.275 per active meeting minute. Tokens are consumed dynamically based on the duration of the active video session. Usage is billed from the same Pika Wallet that funds the rest of Pika.me's creative model stack, with no separate subscription and no per-seat pricing.
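For budgeting, the quoted per-minute range converts directly into a cost band per call. This is a back-of-the-envelope sketch using only the two rates above. The function name is illustrative, and actual billing is in wallet tokens, whose exchange rate is not covered here.

```python
LOW_USD_PER_MINUTE = 0.20    # lower bound quoted above
HIGH_USD_PER_MINUTE = 0.275  # upper bound quoted above


def estimate_cost_usd(active_minutes: float) -> tuple[float, float]:
    """Return the (low, high) estimated cost in USD for an active session."""
    if active_minutes < 0:
        raise ValueError("active_minutes cannot be negative")
    return (active_minutes * LOW_USD_PER_MINUTE,
            active_minutes * HIGH_USD_PER_MINUTE)


if __name__ == "__main__":
    for minutes in (15, 30, 60):
        low, high = estimate_cost_usd(minutes)
        print(f"{minutes:>3} min: ${low:.2f} to ${high:.2f}")
```

At these rates, a 30-minute call lands between about $6.00 and $8.25, and an hour-long call between about $12.00 and $16.50.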
Yes, provided your agent can read SKILL.md files. That includes Pika AI Selves, Claude (via the Pika MCP), Cursor, OpenClaw, and any custom agent built on the open SKILL.md standard. Pika Labs explicitly designed PikaStream to be agent-agnostic rather than locked to their own assistant.
Pika quotes ~1.5 seconds end-to-end from speech input to visible video response. That's roughly the latency of a slow international video call, comfortably inside the threshold where conversation feels natural rather than buffered. Latency can vary slightly based on Pika's infrastructure load and the complexity of the response being generated.
Three components in parallel: FlashVAE (a streaming-optimized variational autoencoder for low-latency video decode), a 9-billion-parameter Diffusion Transformer (tuned for real-time inference), and reference injection (anchoring identity to a fixed reference image at every frame to prevent drift across multi-minute calls).
No. All rendering happens on Pika's GPU clusters, not on your local machine. This is by design: it ensures consistent quality regardless of your hardware. You only need a Pika Developer Key (dk_...) and an internet connection. The 9B Diffusion Transformer wouldn't fit on a consumer machine anyway.
Google Meet at launch. Zoom, Microsoft Teams, Webex, and other platforms are likely to come as new skills are built, potentially via community contributions to the open-source Pika-Skills repo on GitHub. Until then, if your workflow doesn't include Google Meet, PikaStream isn't a fit yet.
Yes. This is one of PikaStream's central features. The agent isn't a passive participant. It can draft documents, update CRM records, send messages, schedule follow-ups, search the web, and trigger workflows in other tools, all in real time during the conversation, with the actions surfacing as natural parts of the dialogue.
Three paths. (1) Use a Pika AI Self that you've already designed on Pika.me; the same face shows up on the call. (2) Generate an avatar on demand from a text prompt. (3) Upload your own image (a brand mascot, a custom character, an existing portrait you have rights to use). All three produce stable identity across the entire session via the reference-injection technique.
PikaStream can clone a voice from a short audio sample; Pika quotes ~15 seconds of clean recording as sufficient. Once cloned, the voice synthesizes new speech in real time during the call, with mouth shapes matched to the generated audio. You can also choose from a library of preset voices if you don't want to clone a specific one.
Pika hasn't published a formal enterprise security certification yet (no SOC 2, no ISO 27001 listed publicly as of launch). For routine internal meetings, the standard Pika.me privacy practices apply: data is used for the service, not for training shared models. For genuinely sensitive contexts (legal, medical, financial), wait for Pika's enterprise-grade guarantees, or use a self-hosted alternative.
PikaStream automatically generates a transcript, summary, and action items after the meeting ends. These can be sent to your preferred destination: Notion, email, Slack, or a custom webhook. Transcripts are owned by you (full IP ownership per Pika.me's terms) and can be deleted from your Pika dashboard at any time.
Yes. The avatar's behavior is driven by the AI agent that owns it, so the agent's personality (defined in identity, soul, and style files on Pika.me) shapes how the avatar speaks, reacts, and emotes during the call. Expressions are generated automatically based on the conversation context, with no manual animation needed.
The PikaStream model itself isn't open source; it runs on Pika's proprietary infrastructure. But the pikastream-video-meeting skill is open source under Apache 2.0 in the Pika-Labs/Pika-Skills repo on GitHub. The skill is the integration layer that connects your AI agent to the underlying model. You can fork it, extend it, or build alternative skills using the same architecture.
Five minutes from a Pika Developer Key to your AI agent joining real Google Meet calls.