A Streaming Bestiary
By Kyle Wilson
Sunday, November 27, 2005
I've been doing some thinking about streaming recently. I'm going to discuss some options for streaming data off disk, some of the tradeoffs involved, and how game design interacts with streaming systems. In the examples that follow, I'm going to assume that
- we're developing a game for some console with 512 megs of memory
- we can stream resources off DVD at a rate of 8 megs/second
- we reserve 64 megs up front for executable code, render targets, and other fixed overhead
- we allocate 192 megs for shared assets--textures, animations and model data that don't get streamed because they're reused across levels
- we allocate the remaining 256 megs for streamed level data
So what are our streaming options?
Unstreamed loading is not a competitive option for titles on the coming generation of consoles. If we want to fill the 256 megs of per-level data I've outlined above, it will take 32 seconds, given the load speed I've assumed. I doubt most players are willing to wait that long to get into a game.
Maybe with compression and good data placement on DVD, we could load content twice as fast as I estimate. That still means that as a player moves through our world, he'll regularly be confronted with loading screens that take over fifteen seconds. The more detailed our world is--the more immersive it is--the more dense it will be, and the more often we'll yank the player out of that immersive experience with the interruption of a loading screen.
Quality titles on the Xbox 360 and the PS3 will stream data extensively to give players a continuous uninterrupted experience of the worlds they present.
Perhaps the simplest streaming implementation is to treat memory for model and texture levels-of-detail (LODs) like a cache. The renderer always renders the highest available LOD for a model. LODs are streamed into memory, nearest first. Low LODs (low-poly meshes for models and low mip-map levels for textures) are always kept in memory. In the worst case, a player sees ugly objects until higher-detail versions pop in, but something is always rendered.
This approach has related costs and benefits. The cost is that it's very difficult to tell at any time how much content must be streamed to present a given area at an acceptable level of detail. The lack of discrete streaming regions means that there can be no easy design budgets or metrics. The benefit of caching is that it is automatic, and doesn't require work from the designer to specifically partition a level for streaming.
Unfortunately, the consequences of cached streaming are likely to be more apparent to the player than they are to the designer. Consider Halo 2, which used cached streaming: players were frequently confronted with popping as nearby models streamed in appropriate levels of detail too late for graceful transitions to occur. To be sure that this will never happen, we need to forget about cached streaming and consider region-based streaming.
Unidirectional Region-Based Streaming
The easiest form of region-based streaming to implement is purely one-dimensional, and relies on the player's always moving forward. As the player advances through a region, the next region he will enter is streamed in. Levels must be designed so that:
- the player cannot see the next region until it has finished streaming
- the player cannot return to the region he has just exited
- unique content in any region is limited to some streamable amount
Play looks like:
- Player enters region
- Player sees exit
- Player exits region
Streaming of the next region happens between A and B. Because we've reserved 256 megs for streamed level data, and we need to be able to hold two regions in memory at any one time (the current region and the next region the player will enter), our budget for any single region is 128 megs of unique data. We can stream 128 megs in 16 seconds. It therefore must take at least 16 seconds for the player to move from point A to point B.
Bidirectional Region-Based Streaming
It's more difficult to stream a continuous linear world in which the player can move either forward or backward. This effectively halves the time available for streaming each region:
- Player enters region
- Player no longer can see entrance
- Player is equidistant in travel time between E and G
- Player sees exit
- Player exits region
Slightly less than half as much time is available to stream in upcoming content in a bidirectional streaming system as in a unidirectional streaming system. Streaming of the next region cannot begin until the player has advanced halfway through the current region; at any earlier point, we dare not discard the previous region because the player could turn around and go back into it.
A further complication is the possibility of environmental changes made by the player to the previous region. If the player can affect the region in any ways that will be visible when he returns to it, then those changes need to be saved off when the level is swapped out and need to be restored when the level is streamed back in. This further reduces the time available for streaming.
In our example, each region is again 128 megs and takes 16 seconds to load. Travel time from point F to either point E or point G must be at least 16 seconds. Spatially, F and G or E and F must be as far apart as A and B were in the unidirectional example.
N-directional Region-Based Streaming (Branching)
Both unidirectional and bidirectional streaming have branching variants that allow for some choice in destinations. In either case, though, the costs and benefits are similar. The only difference is that time available for streaming starts at the nearest branch point.
Omnidirectional Region-Based Streaming
Omnidirectional streaming permits a continuous streaming world that allows the player to move freely in any direction. The GTA games and Mercenaries use omnidirectional (presumably) region-based streaming algorithms.
Because the player can move in any direction, much more content, in terms of area, must be streamed in over less time than in a linear streaming game. This will result in a lower density of detail in the world. This is why Mercenaries, for example, is less detailed than Splinter Cell.
The first step is to break the world up into tiles. Assuming square tiles (a hex grid would allow more efficient use of space, but would be harder to index into), the player could at worst be standing at the corner where four tiles meet, and all four would need to be loaded. If the required viewing range is small enough, having only four tiles visible at a time might be sufficient. The caveat is that if the character moves along a diagonal at top speed, the system must be able to stream in three new tiles before the player can move from the center of the current tile to within viewing range of the edge. This can be seen in the image to the right. The dark gray tiles are loaded. The light gray tiles must be loaded before the player gets within the chosen viewing range.
Assume that we have a grid of city tiles 200 m on a side. We want to keep a 2x2 region in memory and insure that the player never gets within 100 m (one-half tile width) of the edge of the currently-loaded region.
At worst case, the player moves diagonally. We need to stream 3 tiles in the time it takes the player to travel sqrt(2*0.52) = 0.71 tile widths. We need to budget memory so that we can have 7 tiles in memory at once.
We've allocated 256 megs for streamed level data, which gives us a budget of 256/7 = 36.6 megs of unique content per 200 m city tile. We need to stream 3 tiles * 36.6 megs = 109.7 megs, which will take 13.7 seconds to stream. The player can't be allowed to travel 0.71 tile widths in less than 13.7 seconds. That works out to a maximum speed of 23.2 miles per hour.
If the viewing range is greater than a fraction of a tile, then more than four tiles will need to be loaded at a time. Define
Len(x) = sqrt(2x2),
the length of the diagonal between two edges of length x that meet at a right angle. If we try to keep an N tile x N tile section of the world loaded at any time, then at worst case we will need to stream 2N-1 tiles in the time it takes the player to travel Len(0.5N - view range). (And we need to allocate enough memory for N2 + 2N - 1 tiles to be held in memory at once.)
For example, assume that we have a grid of city tiles 100 m on a side. We want to keep a 5x5 tile region in memory and insure that the player never gets within 100 m (one tile width) of the edge of the currently-loaded region:
At worst case, the player moves diagonally. We need to stream 2N-1 = 2*5-1 = 9 tiles in the time it takes the player to travel Len(0.5N - view range) = Len(0.5*5 - 1) = Len(2.5 - 1) = Len(1.5) = 2.12 tile widths. We need to budget memory so that we can have N2 + 2N - 1 = 52 + 2*5 - 1 = 34 tiles in memory at once. (You can count them in the diagram above.)
We've allocated 256 megs for streamed level data, which gives us a budget of 256/34 = 7.5 megs of unique content per 100 m city tile. We need to stream 9 tiles * 7.5 megs = 67.5 megs, which will take 8.4 seconds to stream. The player can't be allowed to travel 2.12 tile widths in less than 8.4 seconds. That works out to a maximum speed of 56 miles per hour. If the game will have faster vehicles, then tile budgets will need to be reduced.
These examples give the misleading impression that more tiles are more efficient. That's because I made the simplifying assumption that DVD read speeds are constant. In fact, as we increase the number of tiles to be streamed, we increase the amount of overhead spent on seeks and reduce the amount of data to be read. For better efficiency, fewer tiles are better. It's good to have a sense of the balance between resource density, memory requirements, and player travel speed, though.
What Gets Streamed
I've been vague so far about exactly what content gets streamed. In theory, any content--textures, mesh data, or game object state--could be streamed under any of the systems mentioned. In fact, however, most engines are set up for streaming groups of game objects together, in the form of a level. Streaming meshes or textures alone require new systems that bind resources to a particular region.
As I mentioned before, if the player can return to regions that have been released from memory, we're going to be doing extra work regardless. In theory, we can either
- keep all game objects persistent in memory, but bind their model and texture resources to individual streaming regions, to be loaded and freed with the appropriate region, or
- free game objects when their region is freed, saving their state to disk to be reloaded when the player reenters the region.
That assumes that we have access to a hard drive or memory unit. This doesn't appear to be a safe assumption on next-generation consoles, so our choice is simpler. We can either keep all game objects in memory or we can free them and lose all information about their current state.
The different approaches to streaming that I describe aren't really competitors. They describe very different games, and the appropriate streaming solution is determined by its game's requirements. As is usually the case in life, there are tradeoffs in streaming. The more flexible our approach to streaming is, the less detailed our worlds will be. As in so many game technologies, the more we constrain the player, the more life-like we can be within the bounds of those constraints.
Any opinions expressed herein are in no way representative of those of my employers.