PikaStream 1.0 - The Real-Time AI Video Chat Model (2026)

The Overview

What Is PikaStream 1.0?

A new kind of AI video model — built for real-time conversation, not finished clips.

PikaStream 1.0 is the real-time AI video chat model from Pika Labs, released in beta in April 2026. Unlike traditional generative video models that wait for a prompt, run for thirty seconds to several minutes, and return a finished MP4, PikaStream is engineered for the opposite kind of work: an open, ongoing video stream that holds up in a live back-and-forth conversation. Think FaceTime with an AI, not "export a cinematic sequence."

The product wraps the model in a developer-facing skill called pikastream-video-meeting, currently open-sourced under Apache 2.0 in the Pika-Labs/Pika-Skills repository on GitHub. Once installed in an AI agent (Pika Self, Claude, OpenClaw, or any other agent that can read SKILL.md files), the skill lets that agent join a Google Meet call as a fully rendered, dynamic AI avatar: visible to every other participant, speaking in real time, and carrying persistent memory of who it is talking to and what has been discussed.

"It is not a passive listener. It can execute tasks in real time while the conversation is happening." — early PikaStream coverage

PikaStream's launch positioning matters. Pika Labs has long been known for one-off generative video, where a creator types a prompt and waits for a polished clip. PikaStream is the company's pivot into live, agent-powered video presence — the same underlying neural-network expertise applied to a fundamentally different problem. The bet behind the model is that the next era of AI-human collaboration won't happen in chat windows or asynchronous video exports. It will happen on calls, where the AI participant has a face, a voice, persistent memory, and the ability to execute real tasks while the meeting unfolds.

This is the launch product that pairs with Pika.me's broader AI Selves platform. A user designs their AI Self on Pika.me, gives it personality and identity files, and PikaStream is what lets that AI Self show up to a real Google Meet — not as a sterile chatbot tile, but as a recognizably human-feeling visual participant. For developers building on Pika's broader API, PikaStream is the missing primitive that bridges agentic chat into face-to-face collaboration. For organizations, it's a new infrastructure layer where AI participants can stand in for, alongside, or in support of human team members.

The new meeting paradigm — AI agents joining as real-time visual participants, not buffering bots. PikaStream 1.0 is Pika's foundational model for this shift.

The Specs

Technical Specifications

Pika published concrete numbers around frame rate, latency, decoding speed, lip-sync alignment, and identity consistency — the level of disclosure that signals this is a serious infrastructure release.

Specification	Value
Frame Rate	24 FPS at 480p on a single H100 GPU
End-to-End Latency	~1.5 seconds speech-to-video round-trip
Decoder	FlashVAE, optimized for streaming, low-latency video decode
Generator	9B Diffusion Transformer tuned for real-time inference
Identity Consistency	Reference image injection at every frame, stable across multi-minute calls
Lip-Sync	Mouth shapes synchronized to generated speech in real time
Audio Pipeline	Voice cloning from a 15-second sample, ElevenLabs-grade synthesis
Pricing	$0.20 – $0.275 per active meeting minute (billed from Pika Wallet)
Skill Repository	github.com/Pika-Labs/Pika-Skills · Apache 2.0 licensed
First-Party Surface	Google Meet (additional platforms on roadmap)
Release Status	Beta, broadly accessible without a formal waitlist

Under the Hood

The Architecture — FlashVAE, 9B Transformer, Reference Injection

Three model components running in parallel produce a live, stable, lip-synced video stream of your AI agent.

COMPONENT 01

⚡

FlashVAE

The Streaming Decoder

FlashVAE is Pika's purpose-built variational autoencoder optimized for real-time video decoding. Most video diffusion models decode frames in big batches because it's efficient — but that destroys latency. FlashVAE trades raw throughput for low per-frame latency, keeping the response time inside the conversational threshold.

COMPONENT 02

🧠

9B DiT

The Generator

A 9-billion-parameter Diffusion Transformer tuned for real-time inference. The transformer takes the agent's intended speech and emotional cues from the underlying language model and converts them into a sequence of latent frames, which FlashVAE then decodes into visible video. Both steps sit inside the roughly 1.5-second end-to-end latency that Pika quotes.

COMPONENT 03

📌

Reference Injection

The Identity Anchor

The hardest problem in real-time video generation is identity drift — five minutes in, your character looks subtly different from when the call started. PikaStream solves this by injecting a reference image of the AI agent's face at every frame, anchoring the generation to a consistent visual identity across the entire session.
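The batching trade-off described for FlashVAE can be made concrete with a little arithmetic. The sketch below is only an illustration: the batch sizes are made-up values, it assumes latent frames arrive at the published 24 FPS, and nothing here reflects how FlashVAE is actually implemented. It computes how long the first frame of a batch has to wait before a batch decoder can even begin.

```python
FPS = 24  # frame rate from the published specs


def frame_interval_ms(fps: float = FPS) -> float:
    """Time between consecutive frames at a given frame rate."""
    return 1000.0 / fps


def batching_delay_ms(batch_size: int, fps: float = FPS) -> float:
    """Wait imposed on the first frame while the rest of its batch arrives.

    A decoder that needs `batch_size` latent frames before it starts cannot
    touch frame 1 until frames 2..batch_size exist, so that frame waits
    (batch_size - 1) frame intervals. Decode time itself is not modelled.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return (batch_size - 1) * frame_interval_ms(fps)


if __name__ == "__main__":
    for size in (1, 8, 24):  # hypothetical batch sizes, for illustration only
        delay = batching_delay_ms(size)
        print(f"batch of {size:>2}: first frame waits {delay:.0f} ms")
```

Under these assumptions, a decoder that waits for a full second of frames spends most of the quoted latency budget before decoding starts, which is the cost a per-frame decoder avoids.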

These components, together with the language model and voice synthesis around them, overlap in time rather than running strictly one after another. As the conversation unfolds, audio comes in from the meeting, the agent's underlying LLM (Claude, GPT, Pika's own model) reasons about what to say next, voice synthesis generates the audio response, and the 9B Diffusion Transformer produces the matching video frames, with the stages overlapping each other in time. The result is the ~1.5s end-to-end latency that Pika quotes: roughly the response delay of a person on a slow international video call, and inside the range where conversation still feels natural rather than buffered.
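To see why overlapping the stages matters, here is a toy latency model. Everything in it is an assumption made for illustration: the stage list, the idea of splitting a response into equal "chunks", and the per-chunk timings, which are placeholder numbers picked only so that the overlapped total lands on the ~1.5s figure. They are not measurements from Pika.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    seconds_per_chunk: float  # placeholder value, not a Pika measurement


# Hypothetical per-chunk timings for one response.
STAGES = (
    Stage("LLM reasoning", 0.5),
    Stage("voice synthesis", 0.25),
    Stage("9B DiT latents", 0.5),
    Stage("FlashVAE decode", 0.25),
)


def first_frame_delay(stages: Sequence[Stage], chunks: int, overlapped: bool) -> float:
    """Seconds until the final stage has emitted its first chunk."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    if overlapped:
        # Every stage starts on chunk 1 as soon as the stage before it emits it.
        return sum(s.seconds_per_chunk for s in stages)
    # Strictly sequential: each stage finishes all chunks before handing off.
    upstream = sum(s.seconds_per_chunk * chunks for s in stages[:-1])
    return upstream + stages[-1].seconds_per_chunk


if __name__ == "__main__":
    print(first_frame_delay(STAGES, chunks=4, overlapped=True))   # 1.5
    print(first_frame_delay(STAGES, chunks=4, overlapped=False))  # 5.25
```

With a four-chunk response, the strictly sequential version shows its first frame after 5.25 seconds, while the overlapped version shows it after 1.5 seconds, because only the first chunk has to pass through every stage before something is visible.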

The other architectural detail worth understanding is that rendering happens on Pika's GPU clusters, not on your local machine. You're not running a 9B parameter model on your laptop. You're sending a meeting link and a few credentials to Pika's infrastructure, and the rendering, audio processing, and identity-stable video stream all happen in the cloud. This means the quality is identical whether you're on a high-end workstation or a five-year-old MacBook Air — your hardware doesn't gate the experience.

Key Features

What PikaStream 1.0 Can Actually Do

The capabilities that separate it from a typical video-generation model or a text-to-speech avatar tool.

🎥

Real-Time Video Streaming

Continuous video generation at conversational latency, not waiting for a finished clip. The output isn't a file — it's a session that lasts as long as the meeting does.

🎙

Voice Cloning & Synthesis

Clone an agent's voice from a 15-second audio sample, or use any of the platform's library voices. Synthesized speech is generated and matched to mouth movements in real time.

🧠

Persistent Memory

The AI agent remembers who's in the meeting, what was discussed previously, and what tasks are outstanding. Memory persists across separate calls with the same participants.

🎨

Custom Avatar Generation

Generate an avatar on demand using a text description, upload your own brand mascot or character image, or use a Pika AI Self that's already designed.

⚡

Agentic Task Execution

Mid-meeting, the agent can draft a doc, update a CRM record, schedule a follow-up, or send a Slack message — all triggered by what's being said in the call.

📝

Auto-Generated Meeting Notes

After the call ends, PikaStream produces transcripts, summaries, action items, and chapter markers — emailed or pushed to your workspace automatically.

🔌

Agent-Agnostic Skill

The pikastream-video-meeting skill works with Pika AI Selves, Claude, OpenClaw, and any agent that can read SKILL.md files. Not locked to one assistant.

💳

Automated Balance Checks

Before joining a meeting, the skill checks your Pika Wallet balance. If you're low on tokens, it generates a top-up payment link automatically — no mid-meeting interruptions.

📞

Drop-Link Activation

Send your agent a Google Meet link in chat. The skill auto-detects it, checks credentials, and joins the call. No manual configuration needed.

Watch & Learn

See PikaStream in Action

If you haven't worked with the Model Context Protocol or real-time AI video before, this beginner-friendly walkthrough covers the conceptual foundation behind PikaStream and the broader Pika ecosystem.

MCP Tutorial for Beginners — Connect Claude to Any Tool (2026) · Watch on YouTube
The Model Context Protocol is what lets PikaStream-equipped agents work across Claude, ChatGPT, Cursor, and other clients. Understanding MCP makes the broader Pika integration story click.

PikaStream 1.0 was launched as part of a coordinated rollout that also included the broader Pika.me agent platform and the Pika MCP connector for Claude. The three pieces fit together: Pika.me is where you design your AI Self (face, voice, personality, style). Pika MCP is the connector that lets that AI Self show up inside Claude or any other MCP-compatible client. And PikaStream 1.0 is the real-time video layer that brings that AI Self to life on actual video calls — taking it from a chat surface to a face-to-face presence.

Pika Labs has also published demo materials directly on their site (pika.me) and on social channels. For up-to-date demonstration footage — including livestreams of the model being used in real Google Meet calls — the @pika_labs account on X and the official Pika.me homepage are the canonical sources.

From video generation to real-time presence: Pika Labs' evolution from prompt-based clip generation to live AI participants in video meetings.

How to Install

Get PikaStream Running in Five Minutes

The full install flow — from a fresh Pika.me account to your AI agent joining a real Google Meet call.

01

Get a Developer Key

Go to pika.me/dev/login, create or sign in to your developer account, and generate a key prefixed with dk_.

02

Set the Environment Variable

Export the key in your shell, or add it to your shell profile for persistence:

export PIKA_DEV_KEY=dk_...

03

Clone the Skills Repo

Pull the open-source Pika Skills repository from GitHub. The pikastream-video-meeting folder is the skill you'll install.

github.com/Pika-Labs/Pika-Skills

04

Install & Drop a Meet Link

Point your AI agent to the skill folder and ask it to install. Then send a Google Meet link in chat — the skill auto-activates.
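Before handing the skill to an agent, it can help to confirm that the key from step 02 is actually visible to the process. The check below is a minimal sketch that relies only on what the steps above state: the variable is named PIKA_DEV_KEY and the key starts with dk_. The function name load_dev_key is hypothetical and is not part of the Pika skill.

```python
import os
from typing import Mapping

ENV_VAR = "PIKA_DEV_KEY"
KEY_PREFIX = "dk_"


def load_dev_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the developer key, or raise with a message that says what to fix."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it in your shell first")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{ENV_VAR} should start with {KEY_PREFIX!r}")
    return key


if __name__ == "__main__":
    try:
        load_dev_key()
        print("developer key found")
    except RuntimeError as err:
        print(f"setup problem: {err}")
```

The check never prints the key itself, so its output is safe to paste into a bug report.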

What happens after you drop the meet link

When you send your AI agent a Google Meet link in chat, several things happen in sequence — all automated, all transparent. First, the skill checks your Pika Wallet balance to make sure you have enough tokens for the call. If balance is low, it generates a payment link and prompts you to top up. Second, it provisions a meeting bot identity — your AI agent's chosen avatar and voice are loaded into Pika's infrastructure. Third, it joins the meeting using its assigned avatar and voice profile, appearing to other participants as a recognizable visual presence rather than a tile labeled "Unknown User."
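The pre-join sequence described above can be sketched as a small planning function. This is an illustration under stated assumptions, not the skill's actual code: the regular expression assumes the common 3-4-3 letter Meet code (for example abc-defg-hij), the balance check is reduced to a boolean, and the names find_meet_link and plan_join are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: standard Meet links carry a 3-4-3 letter code, e.g. abc-defg-hij.
MEET_LINK = re.compile(r"https://meet\.google\.com/[a-z]{3}-[a-z]{4}-[a-z]{3}\b")


def find_meet_link(message: str) -> Optional[str]:
    """Return the first Meet link in a chat message, if there is one."""
    match = MEET_LINK.search(message)
    return match.group(0) if match else None


def plan_join(message: str, balance_ok: bool) -> List[str]:
    """Return the ordered steps a skill would take for one chat message."""
    link = find_meet_link(message)
    if link is None:
        return []  # nothing to do: the message carries no Meet link
    if not balance_ok:
        # Low balance stops the flow before any avatar is provisioned.
        return ["check wallet balance", "send top-up payment link"]
    return [
        "check wallet balance",
        "provision avatar and voice",
        f"join {link}",
    ]


if __name__ == "__main__":
    message = "Can you cover this one? https://meet.google.com/abc-defg-hij"
    for step in plan_join(message, balance_ok=True):
        print(step)
```

Keeping the balance check first mirrors the order in the text: the top-up prompt appears before the agent ever shows up in the meeting.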

From that point on, the meeting unfolds like any normal Google Meet — except one of the participants is an AI. The agent listens to the conversation, processes what's being said, generates appropriate responses in real time, executes tasks if asked, and maintains context across the entire session. When the meeting ends, the skill automatically generates a summary, action items, and a transcript, then delivers them to your configured destination (Notion, email, Slack, etc.).

💡

Pro tip: The skill works with any AI agent that can read SKILL.md files, including Pika Self, Claude (via the Pika MCP), Cursor, OpenClaw, and any custom agent built on the open standard. Once installed, it needs no further configuration; the agent detects it and activates it on demand.

Use Cases

What People Are Actually Building With It

Eight emerging applications for real-time AI video meeting agents, from customer support to engineering standups.

🤝

Meeting Delegation

Send your AI Self to a recurring sync you can't attend. It takes notes, answers routine questions in your voice, and reports back afterward with a summary and action items.

💬

Customer Support Calls

Deploy a branded AI representative on customer support video calls. Consistent identity, voice, and expertise — at any scale, with no scheduling overhead.

🎓

Live Tutoring & Coaching

A real-time AI tutor with persistent memory of a student's prior sessions. Adapts pacing, repeats concepts, and explains visually in a way that static help articles can't.

🎬

Product Demos & Walkthroughs

A real-time AI presenter who walks prospects through your product, answers questions, and adapts to interest — replacing static demo videos with interactive sessions.

🌍

Multilingual Meetings

An AI participant who speaks every language. Reps on a call get real-time translation through the same avatar, eliminating awkward dual-track interpretation flows.

📞

Always-On Founder Presence

Solo founders who can't be in every meeting can send their AI Self instead — same face, same voice, persistent memory of the company context.

🎨

Branded Mascot Agents

Companies with established mascots or characters can put them on real video calls. The brand identity carries from marketing materials into actual customer touchpoints.

⚙️

Engineering Standups

An AI engineering teammate who joins standups, reads your tickets, summarizes the week's PRs, and answers status questions in real time using your codebase as context.

Honest Comparison

PikaStream vs. The Closest Alternatives

How PikaStream 1.0 stacks up against traditional video generation, talking-head avatars, and other real-time avatar platforms.

Dimension	PikaStream 1.0	Pre-Rendered AI Video and Avatar Tools
Output type	Continuous live session	Pre-rendered finished clips
Latency	~1.5s end-to-end	30s–several min per clip
Joins live video calls	Yes — Google Meet at launch	No
Identity stability over time	Reference injection per frame	Drifts over long generations
Lip-sync	Real-time, generated speech	Post-hoc match to existing audio
Agentic task execution	During the call	None
Voice cloning	15-second sample, integrated	Often a separate product
Persistent memory	Across calls and sessions	None — each clip is isolated
Cross-agent compatibility	Any SKILL.md-aware agent	Usually platform-locked
Local hardware requirements	None — runs on Pika GPUs	Often heavy local compute

The category PikaStream most directly competes with is real-time conversational avatars — products like Runway Characters or HeyGen's interactive avatars. Where PikaStream differentiates is on three dimensions: agent-agnostic skill design (any agent can adopt the skill, not just Pika's), integration with the broader Pika ecosystem (AI Selves, MCP, the Dev API), and agentic task execution during the call rather than just visual presence. The bet Pika is making is that real-time avatars only become genuinely useful when they're tied to an agent that can do things mid-conversation — draft documents, update systems, send messages, schedule follow-ups.

The other relevant comparison is "no avatar at all": an agent that simply joins meetings as a transcription-and-summary bot. Several products do this well. PikaStream is betting that having a face changes the dynamic. The popular (and contested) claim that roughly 70% of communication is non-verbal shows up frequently in PikaStream marketing; the argument is that text and voice are efficient for information transfer, but trust, rapport, and presence require a visible counterpart. Whether that bet plays out commercially will depend on whether organizations decide an AI face is reassuring or unsettling in their specific contexts.

For Developers

The Skill Architecture & Customization

How developers can extend PikaStream beyond the default meeting bot — and the architecture that makes it possible.

For developers wanting to extend PikaStream beyond the default use case, the open-source Pika Skills architecture matters. A skill is a self-contained directory with three core components: a SKILL.md file describing the skill's purpose and triggers, a script or set of scripts implementing the behavior, and configuration for how the skill integrates with the host agent. The pikastream-video-meeting skill is the canonical reference implementation, but the same architecture lets you build custom skills that extend or replace its default behavior.

The SKILL.md protocol is what makes the skills agent-agnostic. Any agent that can parse SKILL.md (Pika Self, Claude with the MCP, Cursor, OpenClaw, any custom agent built on the open standard) can install and activate any Pika Skill. The skill self-describes its triggers ("when the user sends a Google Meet link, activate this"), its dependencies (the Pika Developer Key), and its operational logic (check balance → render avatar → join meeting → execute tasks → generate notes). The agent reads the SKILL.md and integrates the skill into its own tool-use loop automatically.
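One way to picture this self-description is as a small record that an agent consults on every message. The sketch below is a hypothetical in-memory model, not the SKILL.md file format and not Pika's code: the SkillSpec fields, the trigger regular expression, and the runnable_skills helper are all assumptions made for illustration. Only the skill name, the PIKA_DEV_KEY dependency, and the five operational steps come from the text above.

```python
import re
from dataclasses import dataclass
from typing import List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class SkillSpec:
    name: str
    trigger: str                   # regex the agent tests against each message
    required_env: Tuple[str, ...]  # dependencies that must be configured
    steps: Tuple[str, ...]         # operational logic, in order


MEETING_SKILL = SkillSpec(
    name="pikastream-video-meeting",
    trigger=r"meet\.google\.com/",  # hypothetical trigger pattern
    required_env=("PIKA_DEV_KEY",),
    steps=(
        "check balance",
        "render avatar",
        "join meeting",
        "execute tasks",
        "generate notes",
    ),
)


def runnable_skills(
    message: str, env: Mapping[str, str], skills: Sequence[SkillSpec]
) -> List[SkillSpec]:
    """Return the skills whose trigger matches and whose dependencies are set."""
    ready = []
    for skill in skills:
        triggered = re.search(skill.trigger, message) is not None
        configured = all(env.get(name) for name in skill.required_env)
        if triggered and configured:
            ready.append(skill)
    return ready
```

Because the agent only needs the trigger, the dependencies, and the ordered steps, the same dispatch loop works for any skill that describes itself this way, which is the property the agent-agnostic design depends on.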

For more advanced customization, the Pika Developer API at pika.me/dev provides direct programmatic access to the underlying PikaStream model — including real-time streaming endpoints, voice cloning, avatar generation, and meeting notes retrieval. Developers building bespoke agent products can bypass the meeting-bot skill entirely and call PikaStream's primitives directly, embedding the model into entirely new surfaces (a Twitch streaming layer, an embedded support widget, a custom video conferencing product).

🛠

Open contribution welcome — Pika Labs is treating the skills marketplace as a community-built layer. New skills (a Zoom equivalent, a Microsoft Teams integration, a Discord voice-channel bot, an embedded support widget) can be added to the Pika-Labs/Pika-Skills GitHub repo via standard pull requests.

Honest Limitations

What PikaStream 1.0 Doesn't Do Yet

Real talk about the current beta — where the product is solid, and where it's still rough.

PikaStream 1.0 is explicitly a beta release, and that framing matters when judging maturity. Pika has published real performance numbers (24 FPS at 480p, ~1.5s end-to-end latency, a 9B Diffusion Transformer), but as with any beta, expect occasional glitches, quality variance, and workflow friction. The official public materials put more weight on technical capability and product vision than on enterprise documentation or commercial reliability guarantees.

Google Meet is the only first-party platform at launch. Zoom, Microsoft Teams, Webex, and other video conferencing tools aren't supported as launch partners. They're likely to come as new skills are built (potentially community-contributed via the open-source skills repo), but if your organization runs on Teams or Zoom, PikaStream isn't a fit until Pika or the community ships skills for those platforms.

The 480p resolution at 24 FPS is the upper bound at launch. That's plenty for a normal video meeting where you're seeing a Brady-Bunch-style tile grid of participants, but it falls well short of 1080p or 4K cinematic output. PikaStream is optimized for conversational presence, not Netflix-quality production. If you need ultra-high-fidelity output, a prompt-based clip generator (Sora, Veo 3, Kling, or Pika's own video models) remains the right choice.

Independent benchmarking is still limited. The best performance numbers currently come from Pika itself. Third-party reviewers haven't run extensive comparisons against Runway Characters, HeyGen Interactive, or other real-time avatar competitors yet — partly because the product is so new, partly because rigorous head-to-head benchmarking of real-time AI video is genuinely hard. Take the headline numbers as Pika's best-case claims rather than independently verified benchmarks.

Quality can vary during live calls. Because PikaStream generates video in real time on shared infrastructure, there can be moments where the avatar quality drops, expressions look slightly off, or lip-sync briefly drifts from speech. These are inherent to the streaming-generation paradigm. They tend to recover within seconds rather than persisting, but they're noticeable enough that PikaStream-based agents aren't yet appropriate for high-stakes settings (board meetings, legal depositions, life-critical communications).

Frequently Asked

PikaStream 1.0 — Common Questions

The questions developers and creators ask most about the real-time AI video chat model.

What exactly is PikaStream 1.0?

PikaStream 1.0 is Pika Labs' real-time AI video chat model. It generates a continuous, low-latency video stream of an AI avatar that can join a live video conversation — typically a Google Meet call — with persistent identity, synchronized lip movements, voice cloning, and the ability to execute tasks mid-call. It's wrapped in a skill called pikastream-video-meeting that any compatible AI agent can install.

How is PikaStream different from regular Pika Video generation?

Regular Pika Video (the prompt-to-clip product) generates a finished video file from a text prompt — you wait 30 seconds to several minutes, then receive an MP4. PikaStream is the opposite paradigm: it generates an ongoing video session that lasts as long as a live conversation does, with ~1.5s latency between input and visible response. Different model, different optimization target, different use case.

What does PikaStream cost?

Roughly $0.20 to $0.275 per active meeting minute. Tokens are consumed dynamically based on the duration of the active video session. Billed from the same Pika Wallet that funds the rest of Pika.me's creative model stack — no separate subscription, no per-seat pricing.
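For budgeting, the quoted range translates directly into a per-call estimate. The helper below is a minimal sketch: the two rates are the figures quoted above, while the function name and the choice to return both bounds are assumptions, since the source does not say what determines where in the range a given call lands. Wallet tokens are not modelled; the result is in dollars.

```python
from typing import Tuple

LOW_RATE_USD = 0.20    # lower bound per active meeting minute, as quoted
HIGH_RATE_USD = 0.275  # upper bound per active meeting minute, as quoted


def estimate_cost_usd(active_minutes: float) -> Tuple[float, float]:
    """Return (low, high) cost estimates in dollars, rounded to cents."""
    if active_minutes < 0:
        raise ValueError("active_minutes cannot be negative")
    return (
        round(active_minutes * LOW_RATE_USD, 2),
        round(active_minutes * HIGH_RATE_USD, 2),
    )


if __name__ == "__main__":
    low, high = estimate_cost_usd(30)
    print(f"30 active minutes: ${low:.2f} to ${high:.2f}")
```

At these rates a 30-minute call comes to between $6.00 and $8.25, and a full hour to between $12.00 and $16.50.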

Does PikaStream work with my existing AI agent?

Yes, provided your agent can read SKILL.md files. That includes Pika AI Selves, Claude (via the Pika MCP), Cursor, OpenClaw, and any custom agent built on the open SKILL.md standard. Pika Labs explicitly designed PikaStream to be agent-agnostic rather than locked to their own assistant.

What's the latency really like in practice?

Pika quotes ~1.5 seconds end-to-end from speech input to visible video response. That's roughly the latency of a slow international video call — comfortably inside the threshold where conversation feels natural rather than buffered. Latency can vary slightly based on Pika's infrastructure load and the complexity of the response being generated.

What model architecture does PikaStream use?

Three components in parallel: FlashVAE (a streaming-optimized variational autoencoder for low-latency video decode), a 9-billion-parameter Diffusion Transformer (tuned for real-time inference), and reference injection (anchoring identity to a fixed reference image at every frame to prevent drift across multi-minute calls).

Can I run PikaStream locally?

No. All rendering happens on Pika's GPU clusters, not on your local machine. This is by design: it keeps quality consistent regardless of your hardware. You only need a Pika Developer Key (dk_...) and an internet connection. Running the 9B Diffusion Transformer at real-time frame rates is beyond typical consumer hardware in any case; Pika's published figure of 24 FPS at 480p is for a single H100 GPU.

Which video conferencing platforms does it support?

Google Meet at launch. Zoom, Microsoft Teams, Webex, and other platforms are likely to come as new skills are built — potentially via community contributions to the open-source Pika-Skills repo on GitHub. Until then, if your workflow doesn't include Google Meet, PikaStream isn't a fit yet.

Can my AI agent execute tasks during the call?

Yes — this is one of PikaStream's central features. The agent isn't a passive participant. It can draft documents, update CRM records, send messages, schedule follow-ups, search the web, and trigger workflows in other tools — all in real time during the conversation, with the actions surfacing as natural parts of the dialogue.

What avatar options exist?

Three paths. (1) Use a Pika AI Self that you've already designed on Pika.me — the same face shows up on the call. (2) Generate an avatar on demand from a text prompt. (3) Upload your own image (a brand mascot, a custom character, an existing portrait you have rights to use). All three produce stable identity across the entire session via the reference-injection technique.

How does voice cloning work?

PikaStream can clone a voice from a short audio sample — Pika quotes ~15 seconds of clean recording as sufficient. Once cloned, the voice synthesizes new speech in real time during the call, with mouth shapes matched to the generated audio. You can also choose from a library of preset voices if you don't want to clone a specific one.

Is PikaStream secure for sensitive meetings?

Pika hasn't published a formal enterprise security certification yet (no SOC 2, no ISO 27001 listed publicly as of launch). For routine internal meetings, the standard Pika.me privacy practices apply — data is used for the service, not for training shared models. For genuinely sensitive contexts (legal, medical, financial), wait for Pika's enterprise-grade guarantees, or use a self-hosted alternative.

What happens to meeting transcripts after the call?

PikaStream automatically generates a transcript, summary, and action items after the meeting ends. These can be sent to your preferred destination — Notion, email, Slack, a custom webhook. Transcripts are owned by you (full IP ownership per Pika.me's terms) and can be deleted from your Pika dashboard at any time.

Can I customize the avatar's expressions and personality?

Yes. The avatar's behavior is driven by the AI agent that owns it, so the agent's personality (defined in identity, soul, and style files on Pika.me) shapes how the avatar speaks, reacts, and emotes during the call. Expressions are generated automatically based on the conversation context — no manual animation needed.

Is PikaStream open source?

The PikaStream model itself isn't open source — it runs on Pika's proprietary infrastructure. But the pikastream-video-meeting skill is open source under Apache 2.0 in the Pika-Labs/Pika-Skills repo on GitHub. The skill is the integration layer that connects your AI agent to the underlying model. You can fork it, extend it, or build alternative skills using the same architecture.