Note: This is the ninth installment in a twelve-part series exploring the future of entertainment through the convergence of five core fields—AI, Blockchain, XR, Neo Cinema, and Gaming. Together, they are reshaping how content is created, distributed, and experienced. This installment posits what a future entertainment studio could look like. Here’s Part 1.
What is a Hybrid Real-Time Studio?
The term “studio” is misleading because it conjures images of brick-and-mortar production houses—but this is something different. A Hybrid Real-Time Studio isn’t defined by physical space but by a process: a new way of creating, distributing, and expanding content (i.e., ‘stories’) using real-time engines, AI, blockchain, and interoperability-first design.
Traditional studios make products—a movie, a TV show, a game—and then move on to the next thing. Hybrid Real-Time Studios don’t make products; they build networks based on Infinite IP—a concept where stories exist in a state of permanent beta, evolving based on audience engagement, technology, and network effects, where the IP itself becomes the platform.
By merging creative storytelling with real-time technologies, hybrid studios unlock new monetization models, scale fan engagement, and de-risk production cycles.
Assumptions
This case study makes a few assumptions that are best articulated upfront:
A Real-Time Team building a hybrid studio N years from now will encounter fewer pain points than one starting today
AI is infused into every fiber of a real-time studio's efforts, allowing the team to continuously optimize processes & pipelines
Blockchain serves as the economic backbone of the project
The Immersive Web functions as the central distribution platform for the project, thus bypassing legacy gatekeepers
The Founding Team has sufficient expertise, talent network, and resources to successfully execute on all aspects of the project
The IP (‘Intellectual Property’), in this case, refers to a new story that is developed in-house by the creative team
Virtual Production (VProd), in this context, refers to 100% 3D world capture, not In-Camera VFX (ICVFX)
Note: A 100% 3D virtual production pipeline blends real-time technologies with traditional filmmaking techniques, allowing production teams to visualize live performances within completely virtual environments. By keeping all production assets native to the digital realm, hybrid real-time teams can seamlessly reuse 3D assets while better aligning cadences and reducing sunk costs.
Top-Down vs Bottom-Up
For the purpose of this case study, I am using a Top-Down model. Meaning, there would be a large initial Capital Expenditure (CapEx) that allows the real-time team to move forward with the largest, longest, and most complex outlay, which, in this hypothetical case, is the virtual world. The 3D assets used to develop the virtual world can then be reused and funneled to the studio's other initiatives while the world is being developed.
With a Bottom-Up approach there is little-to-no upfront CapEx investment. That means the studio would need to start at the smallest, quickest-to-market, and least complex outlay, which, in this context, is the short-form XR (e.g., VTubing) initiative, and expand through Total Addressable Markets (TAMs) via business development.
Note: XR is an umbrella term encompassing Augmented Reality (AR), Mixed Reality (MR), and Virtual Reality (VR) experiences. It’s helpful to think of Virtual Production (VProd) as a subset of XR and VTubing (Virtual YouTubing) as small-scale Virtual Production.
The logic behind a Top-Down approach is to align the natural cadences of different real-time initiatives with the economics of real-time 3D pipelines.
• AAA Game Development Cycle → 3-5 years
• TV/Film (100% Virtual) Production Cycle → 1-2 years
• Short-form (XR) Content Production Cycle → Weeks to months
Note: As AI continues to scale, expect cycle times to shorten
By developing the costliest, most time-intensive effort first, then repurposing production assets downstream, the studio begins monetization almost immediately, well before the game, TV series, or film launches.
“By aligning games, social, and film cadences, studios create a perpetual content cycle rather than a start-stop release strategy.”
This strategy transforms a costly, high-risk content pipeline into a modular, revenue-generating ecosystem. It accomplishes this by transforming the traditional horizontal production process used in film and TV into the more vertical process used in games, XR, and software development.
I can't really stress enough how important this is, because process is culture, and culture is arguably the greatest single barrier separating the game and film worlds. By aligning processes, a new culture can be formed that bridges the two worlds.
The Hybrid Real-Time Studio Framework: A Two-Part Process
The foundation of a Hybrid Real-Time Studio rests on two key phases:
1. Creative Development → Defining the story, technical parameters, and ecosystem strategy
2. Creative Execution → Deploying the story across platforms in a way that scales and monetizes over time
Each phase is built to be flexible, ensuring that assets are interoperable, audience engagement is continuous, and monetization begins ASAP.
Phase 1: Creative Development (Building Infinite IP)
Most stories are still treated like finite experiences, existing only within a single medium or release window. Even massive franchises like Marvel or Star Wars still follow a release-driven cycle, with long gaps between major content drops.
But what if stories were designed from the start to be open-ended, interactive, and platform-agnostic? That’s what Creative Development in a Hybrid Real-Time Studio does—it architects a world that isn’t limited to a single format.
At this stage, the initial founding team—led by a Chief Creative Officer (CCO) and a Chief Technology Officer (CTO) for example—maps out the story, characters, and technical foundation. The purpose here is not only to flesh out the creative aspects of the storyworld but also to identify both the soft and hard technical constraints that the storyworld must push past or adhere to.
Ideally, these are two very senior industry veterans:
The CCO would come from Hollywood, since this founder excels at telling world-class stories. The ideal founder would be equally adept at writing and directing, and have deep experience filming in 100% virtually produced 3D worlds
Examples: Jon Favreau, Wes Ball, Tim Miller, James Cameron
The CTO would come from the AAA game world, since this founder excels at developing fun interactive experiences. The ideal founder would have held the highest-ranking engineering role directly responsible for game technology, systems architecture, and technical execution across multiple large projects. This person would be a C++ and Python master so they can effectively integrate AI into all real-time processes
Knowledge of Assembly paired with CUDA/cuDNN/TensorRT, PyTorch/TensorFlow, and compiler tools like LLVM and NVCC is a bonus, as is knowledge of WebAssembly, WebGPU, WebXR, and RDMA
Assuming the founding team has aligned on the IP, the next two seeds required to plant the storyworld are characters and assets.
The Four-Corner Opposition Framework
Traditional storytelling pits a single protagonist against a single antagonist, with a clear arc leading to resolution. But static narratives aren't optimized for interactive worlds—they need room to grow and evolve, letting players/users take on different roles and incentivizing user-generated content (UGC).
That’s why real-time IP development follows a Four-Corner Opposition Model.
Instead of a single hero/villain dynamic, this structure creates four main characters who represent two conflicting philosophical viewpoints.
The Protagonist holds the positive position of Viewpoint 1.
The Antagonist holds the negative position of Viewpoint 2.
Main Character 3 reflects the negative position of Viewpoint 1.
Main Character 4 reflects the positive position of Viewpoint 2.

What’s interesting about this framework is that the Protagonist has to deal with three opposing forces rather than only one (FYI: not all opposing forces are villains). This creates a dynamic web of relationships in which allies can form around each main character, clash, shift allegiances, and/or evolve based on the interplay of these philosophical viewpoints.
By tripling the number of opposing forces arrayed against the Protagonist, the creative team compounds the total surface area of conflict: four main characters produce six pairwise relationships instead of one, before counting the factions that form around each corner.
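To make that surface area concrete, here is a minimal Python sketch (character names and fields are hypothetical) that enumerates the relationship axes the four corners create:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Character:
    name: str       # hypothetical label
    viewpoint: int  # which philosophical viewpoint (1 or 2)
    position: str   # "positive" or "negative" stance on that viewpoint

corners = [
    Character("Protagonist", 1, "positive"),
    Character("Antagonist", 2, "negative"),
    Character("Main Character 3", 1, "negative"),
    Character("Main Character 4", 2, "positive"),
]

# Every pair of corners is a potential axis of alliance or conflict.
for a, b in combinations(corners, 2):
    axis = "shared viewpoint" if a.viewpoint == b.viewpoint else "opposing viewpoints"
    print(f"{a.name} x {b.name} ({axis})")
# Four corners yield six pairwise relationships, versus one in a hero/villain model.
```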
Four-Corner Opposition incentivizes faction-based storytelling, community participation, and player-driven experiences, and works equally well in a game, a film, a short-form content drop, a livestream, or an Agentic AI-driven experience.
It allows the creative team to focus on the main story and characters that are necessary to drive discoverability, while ensuring there are sufficient factions surrounding those characters to allow the IP to scale by incentivizing UGC.
OpenUSD x glTF: A New Creative Infrastructure
For a Hybrid Real-Time Studio to function, it needs a unified asset pipeline that allows characters, props, and environments to move seamlessly between formats. That's where OpenUSD x glTF comes in—an emerging industry standard for interoperable 3D assets.
Note: In the world of 2D, we already have universal standards like JPEG and PNG. These file formats allow images to move effortlessly between devices, platforms, and tools. But in the world of 3D, studios often rely on proprietary formats or highly specialized workflows tied to specific tools. This fragmentation creates bottlenecks that slow down production and limit collaboration.
By building with an OpenUSD x glTF standard, studios can:
Create once, deploy anywhere (games, film, XR, mobile)
Reduce production costs and streamline iteration cycles
Let’s unpack this a bit.
Modular Asset Complexity
USD (Universal Scene Description)—developed by Pixar—is layered and highly modular by design. Its architecture supports:
Variants (e.g., low-res vs high-res meshes)
Payloads (external references to other files/assets)
Composition arcs (ways to combine, override, or specialize content)
Schema extensibility (you can define your own metadata and behaviors)
glTF, on the other hand, is optimized for runtime delivery, especially on the web and in games. It's lean, lightweight, and built for efficient parsing and GPU-ready buffer layouts. It doesn't offer the same authoring complexity USD does, but it's excellent for deployment.
Note: Autodesk has expressed a desire to sunset development of its FBX file format while ensuring that key features are integrated into more modern open standards, such as USD and glTF.
So, the USD x glTF pipeline that's being worked on by the Metaverse Standards Forum, including Autodesk, aims to combine these strengths: USD for authoring, glTF for delivery. That means you get authoring-time flexibility with runtime efficiency.
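As one concrete bridge that exists today, Blender's Python API can ingest USD and emit runtime glTF; a minimal sketch, assuming a recent Blender build with its bundled USD importer and glTF exporter (file paths are illustrative):

```python
# Run inside Blender: author/ingest in USD, deliver in glTF.
import bpy

# Import a USD asset authored upstream (hypothetical path).
bpy.ops.wm.usd_import(filepath="avatar.usda")

# Export a runtime-ready binary glTF for web/game delivery.
bpy.ops.export_scene.gltf(filepath="avatar.glb")
```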
Avatar Example: A Perfect Fit
Let’s say you have a character avatar, and you want to use it across film, games, MR headsets, and mobile devices.
You can set it up like this:
1. Core Asset Layer
Base mesh
Core skeleton
Primary textures
Universal topology
This is the neutral layer shared by all departments—film, game, XR, social media, etc.
2. Variant Layers
In USD, you can create variants to handle different use cases:
variantSet = "rig_complexity"
variant = "film"
→ full facial rig, 150+ blendshapes, 8K texturesvariant = "game"
→ simplified rig, 5–10 blendshapes, optimized 2K textures
variantSet = "LOD"
variant = "high"
→ dense meshvariant = "low"
→ decimated mesh for mobile
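As a minimal sketch of how this could be authored with OpenUSD's Python API (the pxr module), assuming a hypothetical /Avatar prim and placeholder rig layer paths:

```python
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("avatar.usda")
avatar = UsdGeom.Xform.Define(stage, "/Avatar").GetPrim()
# The core asset layer (base mesh, skeleton, textures) would be referenced here.

# Author the rig_complexity variant set with film and game variants.
rig = avatar.GetVariantSets().AddVariantSet("rig_complexity")
for name in ("film", "game"):
    rig.AddVariant(name)

rig.SetVariantSelection("film")
with rig.GetVariantEditContext():
    # Opinions authored here only apply when the "film" variant is selected.
    avatar.GetReferences().AddReference("rigs/film_facial_rig.usda")  # hypothetical path

rig.SetVariantSelection("game")
with rig.GetVariantEditContext():
    avatar.GetReferences().AddReference("rigs/game_rig.usda")  # hypothetical path

stage.GetRootLayer().Save()
```

The same pattern extends to the LOD variant set; each variant stores only its deltas, so the core asset layer is never duplicated.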
3. Composition Arcs
Using composition arcs, real-time teams can override or layer behaviors depending on the use case. So the film pipeline can pull in the base avatar and layer on additional complexity without touching the original asset. Meanwhile, the game team can do the opposite—strip away unnecessary complexity for performance.
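For instance, a film shot file could sublayer the shared avatar and record its opinions as an "over", never modifying the source asset; a rough sketch under the same hypothetical file names:

```python
from pxr import Usd

# A film shot stage that layers on top of the shared avatar asset.
shot = Usd.Stage.CreateNew("film_shot.usda")
shot.GetRootLayer().subLayerPaths.append("avatar.usda")  # hypothetical shared asset

# An "over" adds opinions on /Avatar without editing avatar.usda itself.
over = shot.OverridePrim("/Avatar")
over.GetVariantSets().GetVariantSet("rig_complexity").SetVariantSelection("film")
shot.GetRootLayer().Save()
```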
4. Switching On and Off
Because USD supports variant switching at runtime or load-time, DCC tools (Maya, Houdini, Blender), game engines (Unreal, Unity), or even web-based tools can toggle complexity on demand.
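On the consuming side, toggling is a one-line operation; a minimal sketch reusing the hypothetical asset above:

```python
from pxr import Usd

stage = Usd.Stage.Open("avatar.usda")
avatar = stage.GetPrimAtPath("/Avatar")

# A mobile or game build selects the lightweight options on demand.
# (A real pipeline would author this in a session or shot layer, not the asset itself.)
avatar.GetVariantSets().GetVariantSet("rig_complexity").SetVariantSelection("game")
```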
Summary: Interoperability Is The Point
The sunsetting of FBX is pushing the industry toward modular, open, and layered asset pipelines.
USD’s variant and composition system lets teams toggle complexity depending on need.
glTF serves as the optimized delivery target, but USD is the authoring superstructure.
Teams can collaborate without duplicating work or fragmenting their pipelines.
This approach is perfect for Hybrid Real-Time Studios, where assets flow across use cases: social media → film → games → XR → back again.
This is what allows Infinite IP to function—every digital asset becomes a persistent, evolving piece of the world, rather than a disposable asset tied to a single project.
Additionally, and arguably most importantly, this open standard allows fans to build on top of the project in the same way that JPEGs allowed early NFT projects to scale, but with actual utility (more on this in Part 11).
Of course, there are myriad other creative and technical decisions the founding team would need to flesh out at this stage, but those can be addressed on a case-by-case basis rather than detailed here.
What’s important is that the founding team aligns on the story, characters, production design, and technical components in Phase 1.
Phase 2: Creative Execution (Bonding, Scaling, and Monetizing the IP)
The execution phase splits into two sub-buckets, reflecting two primary ways people engage with stories:
1. Third-Person POV Narratives → Linear storytelling (TV, Film, Short-form Social Content, Livestreams, and Media Mix)
Purpose: Network bonding and scaling
2. First-Person POV Experiences → Interactive, player-driven experiences (Games, Virtual Worlds, Marketplaces)
Purpose: Fun + value capture
This structure ensures both economic sustainability and community growth, as each category feeds into the next.
It's helpful here to pause and ground our thinking in Disney's business model, where movies are passive (narrative) forms of entertainment and theme parks are interactive (immersive). In this model, Disney derives 3x more revenue from theme park admissions than from box office receipts, yet movies act as the delivery vehicles that monetize the characters and worlds in its theme parks. Remove the characters and stories from Disneyland and it's no longer a theme park; it's an amusement park, like Six Flags.
In this sense, Disney hooks the heart so fans stay to play, which is why its studio business sits at the center of its flywheel and forms its core brand.
Using Disney’s model as inspiration, here’s how hybrid real-time studios can create a flywheel for the digital realm.
1. Third-Person POV: Storytelling & Discovery Engine
This is the narrative expansion and network bonding layer—traditional media adapted for real-time evolution. This is where fans fall in love with the IP (“hook the heart”).
Short-Form Narratives (Social Media, VTubing, AI) → Continuous character-driven content that bonds fans to the IP and keeps them engaged
Short-form narrative social content (e.g., real-time 3D Webtoons) serves as the top of the marketing funnel
Character avatars (e.g., VTubers) engage audiences in real-time
Agentic AI enables continuous, interactive engagement across social platforms
Long-Form Narratives (TV, Film) → High-fidelity storytelling that scales the entire ecosystem
100% real-time virtual production (no LED walls, all 3D mocap)
Distribute on streaming platforms to expand the overall pie
All assets seamlessly transition into game & virtual world
Note: Distribution windows may be different in a hybrid studio model. For example, a first-run virtual film or TV series could be screened for the community ahead of wider release on streaming platforms and/or in theaters. All downstream windows could then follow a traditional strategy.
Media Mix (Live Events, XR Activations, Collabs, Physical Merch) → Connecting digital experiences to physical engagement, reinforcing the bond
Hybrid virtual/physical events drive deeper fan engagement
XR-enhanced concerts, meetups, interactive screenings
Ties digital experiences to real-world touchpoints
This short-form content strategy creates real-time feedback loops with fans, which the studio uses to “bond nodes to the network” by focusing on two key engagement drivers: Consistency and Authenticity.
This turns fans into members of a community (see graphic below).

The idea is to cultivate a test-and-learn mindset that is attuned to the community and open to experimentation, forming a sort of social glue that bonds creators, fans, and IP together, helping fuel UGC, scale network effects, and, ultimately, increase digital-native payment conversion rates.
These efforts also incentivize the community to spread word-of-mouth (WOM) for the long-form narrative component (film/TV), which adheres to a more traditional distribution model. In other words, hybrid real-time studios effectively monetize traditional marketing processes.
From a business perspective, long-form filmed content is no longer the thing; it's the thing that scales the thing. 100% virtually produced films and TV series act as the mass-market accelerant that expands the overall pie across the other layers and funnels fans into the interactive realm, where value is captured.
From a creative perspective, this narrative layer is arguably the most important. If the narrative layer is poorly executed, the IP turns into Six Flags in the interactive layer.
2. First-Person POV: The Economic Engine
This phase establishes persistent, interactive environments where fans can participate, transact, and create.
Virtual World (The Social Hub) → A persistent sandbox for exploration and commerce
Themed digital playground where fans interact, trade, and socialize
Tokenized economy with digital collectibles and UGC-driven content
Persistent world evolution based on community contributions
Virtual Backlot (The Creator Economy Marketplace) → A creator economy where assets can be bought, sold, and remixed
Digital asset store where users buy/sell skins, props, environments
Community-built mods & expansions drive network effects
Interoperability-first approach ensures assets work across platforms
Game Mode (The Fun Engine) → A playable, interactive experience driving engagement and retention
Core gameplay experience (e.g., a Battle Royale within the world)
Drives retention and engagement through competitive/social play
Creates the basis for future expansions
The virtual world and marketplace are the economic engines, while the game mode is the engagement loop.
Here’s why it’s strategically advantageous to develop a virtual world (social hub/sandbox) and a game (e.g., Battle Royale mode) in parallel (I’ll leave the marketplace for Part 11), especially within a Hybrid Real-Time Studio framework.
The key is to structure development around shared assets and modular systems, so the team isn’t building two separate products but two layers of the same platform.
Let’s break this down:
Shared Foundation: One World, Multiple Modes
Both the social hub and the game mode live within the same core virtual environment, meaning they can (and should) share:
Environment assets (terrain, structures, lighting, weather systems)
Character avatars and animations
Physics and gameplay systems
Networking and player state logic
Token economy infrastructure
Think of it like Disneyland: the park is the world, and each ride is a “game mode” or “themed area.” The social hub is the park itself. The Battle Royale mode is like Space Mountain—a high-intensity experience nested within a larger social environment.
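As a simple architectural sketch (all names hypothetical), the hub and the game mode are thin layers composed over one shared core rather than two forked codebases:

```python
# Hypothetical sketch: two "modes" composed over one shared world core.
class WorldCore:
    """Shared systems: environment assets, avatars, physics, networking, economy."""
    def __init__(self):
        self.assets = {}    # terrain, structures, lighting, weather
        self.players = {}   # player state / networking stub
        self.economy = {}   # token economy infrastructure stub

class SocialHub:
    """Persistent social sandbox layered on the shared core."""
    def __init__(self, core: WorldCore):
        self.core = core

class BattleRoyale:
    """Competitive game mode layered on the same core."""
    def __init__(self, core: WorldCore):
        self.core = core

core = WorldCore()
hub, match = SocialHub(core), BattleRoyale(core)
assert hub.core is match.core  # one platform, two layers
```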
Parallel Development Tracks
To avoid bottlenecks, teams often split development into concurrent tracks:
Worldbuilding Team (Social Sandbox):
Focuses on persistent spaces, social mechanics (emotes, chat, parties), modular architecture, and creator tools.
Gameplay Team (Battle Royale Mode):
Builds out the core loop—drop-in, scavenge, survive, eliminate—and tightly tunes balance, matchmaking, and netcode.
Platform/Infrastructure Team:
Handles backend systems (identity, inventory, wallets, analytics), asset loading, and real-time network architecture.
As long as all teams adhere to a shared asset and tech stack, the team can build both experiences simultaneously without duplicating effort.
Development Phases: Staged Rollout
No need to launch everything at once. A smart rollout might look like this:
Phase 1:
Launch the social hub with avatar creation, emotes, and public spaces; it’s even possible to start early and small with the equivalent of the IP’s “Diagon Alley” and expand out from there, iteratively
Begin testing player concurrency, item drops, and creator tools
Phase 2:
Soft-launch the Battle Royale mode within a portal in the hub
Use early player data to iterate on weapons, gameplay balance, and session flow
Phase 3:
Introduce persistent progression systems, tournaments, and UGC integration
Add cross-mode functionality (e.g., wear your Battle Royale skins in the hub)
In summary, by aligning games, social, and film cadences, studios create a perpetual content cycle rather than a start-stop release strategy.
This creates an AI-infused flywheel, built on a blockchain backbone, where hybrid real-time teams bond, engage, scale, and capture attention.
The Future is Now
For roughly half a century, Hollywood and gaming have existed in separate silos. But technology has caught up to the way fans think—stories aren’t single-format experiences anymore. They are ecosystems, economies, and interactive worlds.
Hybrid Real-Time Studios are the next evolution of storytelling, blending the best of games, film, XR, and AI to create IP that never stops growing.
I call this Story Stacking. Similar to how Disney's Theme Parks grow as more Themed Areas are added, 3D virtual worlds can scale as the founding team moves from one IP to the next, staying just out in front of the community like a leadout rider in a bike race, incentivizing users to create content in its wake by harnessing the underlying IP rights, storylines, digital assets, and fit-for-purpose software solutions developed by the team.
These networks have the potential to dwarf current media & entertainment modalities. The technology is here. The audience is ready.
The only question left is: who will build it first?