
Monday, May 17, 2010

Doing the Jitterbug: Lag, Latency, Jitter and Throughput and How They Affect Online Gaming (Part I)

Consistency: In an ideal online gaming world, every player in the game would see the same consistent game world, and every client of a game server would have access to exactly the same game state information at any given instant. No player would suffer from seeing actions in the game world at a later time than other players, or from losing a shooting duel because the opponent has a faster, lower-latency connection to the game server. In the real world, this is of course impossible to achieve with existing technology.

Perhaps the greatest impediment to reaching this goal is caused by the vagaries of the underlying infrastructure that provides the communication between the game clients and servers: the Internet. The path to the goal of perfection is washed away by two primary effects in game communication over the Internet:
  • Throughput limitations - the connection between server and client has some maximum throughput.
  • Network delays - even with unlimited throughput, server messages do not arrive instantly.
The first, throughput limitations (bandwidth), means that the game state transmitted to game clients is at best a sample, or approximation, of the actual complete game world. The bandwidth and processing power simply do not exist in typical environments to allow all of the information required to completely represent the game state to be communicated. Just as the typical music CD has less information than the master recording from which it is made, due to the sampling of the audio signal to match the constraints of the CD audio format, so too do the messages exchanged between game clients and servers carry only a subset of the full game state. By careful crafting of the structure of the messages used, game developers can provide an approximation that is sufficient for realistic game play, so long as the minimum throughput requirements of the game are met.

The second, network delays, means that regardless of the available bandwidth, there is always a delay between the server sending a message and the client receiving it, and vice versa. In effect, the information in the message is time-shifted to a later wall-clock time: an event may happen at actual time X, but by the time the client receives the message of the event, the actual time is perhaps X+2. When time-shifting is present, the possibility of causality violations rears its head: events may seem to occur in the wrong order for game clients. For example, a player may open a door, but that door isn't there in the game world because it was destroyed by a grenade (and seen as such by another online player), an event that the client component of the first player was not yet aware of due to the delays in message arrival. Other players might see the first player's avatar "opening" a door that isn't even there.

We will review the details of these problems, the effect they have for players of online games, and the techniques commonly used to minimize the unfairness between players that can result if these problems are left untreated. We will cover the definitions and details of the problems in this blog entry, with part II to cover the effects these have on players and the techniques used in games to minimize their impact.

Throughput and Latency: 

The throughput requirements for online PC games vary widely, but in general are far below the available bandwidth of the typical client (we will only be discussing the client side and its impact; obviously, running a server for a game dictates a much higher aggregate bandwidth requirement). Recent studies (Feng 2002 & 2005) using games such as Counter-Strike, Day of Defeat, Medal of Honor: Allied Assault, and Unreal Tournament 2003 showed a client load ranging from 6,400 bps to 8,000 bps for client-to-server packets and 20,800 bps to 36,000 bps for server-to-client communications. These are far below even the lower-tiered ISP services typically used by online gamers.
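As a rough sanity check, we can compare those per-client rates against the low-end service tier suggested below. A minimal Python sketch; the tier figures of 600 kbps down / 50 kbps up are the same illustrative values used later in this post:

    # Compare measured per-client game traffic rates (Feng 2002 & 2005)
    # against a low-end residential tier (600 kbps down / 50 kbps up).
    GAME_UP_BPS = 8_000       # worst-case client -> server rate from the studies
    GAME_DOWN_BPS = 36_000    # worst-case server -> client rate from the studies

    TIER_UP_BPS = 50_000      # illustrative low-end uplink
    TIER_DOWN_BPS = 600_000   # illustrative low-end downlink

    print(f"uplink headroom:   {TIER_UP_BPS / GAME_UP_BPS:.1f}x the game's requirement")
    print(f"downlink headroom: {TIER_DOWN_BPS / GAME_DOWN_BPS:.1f}x the game's requirement")

Even this modest tier leaves a healthy margin over what the game itself needs, which is why congestion, rather than the nominal service speed, is usually the culprit when throughput problems appear.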

Congestion of the network may cause throughput to drop to a level that is insufficient for smooth game play. Congestion typically occurs in one of three primary areas: the last mile near the user, the middle or Internet cloud, and the last mile on the server side.

In the case of user-side congestion, the player may simply have a service tier that does not provide sufficient bandwidth. This can of course be remedied with a service upgrade. At a minimum, a service with 600 kbps down and 50 kbps up should suffice. The faster downlink speed, while not strictly required to play online, will ensure faster downloads of game items such as server-hosted maps.

The gamer should also ensure that other activities on their local network are not causing congestion. Other users gaming, streaming audio or video, using services such as torrents, etc. can all adversely affect the overall available broadband bandwidth for the player.

Problems in the last mile on the server side can be caused by too many players joining a specific game server, causing a bottleneck on the network link to that server. Game servers typically employ a player count limit to avoid this occurrence. Any other congestion in this link of the game communication network (router congestion or failure modes, etc.) is likely to be out of the control of both the player and their server provider.

Congestion in the Internet cloud is usually temporary: perhaps a major sporting event is being watched by large numbers of people via streaming technologies. As with most last-mile issues on the server side, these are out of the control of the server provider and game player. In cases where Internet cloud congestion is the cause of game play issues, the only remedy is to wait until the problem "goes away".

Any kind of congestion, whatever the cause, can cause throughput degradation that may adversely affect the consistency of game play. If the game client is starved of message packets, whether due to raw throughput limits or congestion-related throughput loss, the synchronization between the client and server will be lost, resulting in "laggy" game play, "rubber-banding", and other temporal effects. Severe throughput problems can result in the game client "giving up" and disconnecting from the game server.

There is no accepted and commonly agreed upon definition for latency (Delaney 2006). The latency of a network is commonly measured using the ping command. This however measures not the one-way trip from client to server or vice versa, but instead measures the round-trip time. Since the routes from client to server and server to client are usually asymmetric, simply guessing at half the value arrived at from a ping measurement may be grossly inaccurate, and provide incorrect information for making client and server timing decisions. In addition, such a measurement does not account for processing and other delays at the client and server endpoints.
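To see how misleading the "half the ping" guess can be on an asymmetric route, consider a small sketch (the one-way delays here are invented purely for illustration):

    # Why one-way latency != RTT / 2 on an asymmetric route.
    # The delay values are invented for illustration.
    client_to_server_ms = 15.0   # actual one-way delay, client -> server
    server_to_client_ms = 45.0   # actual one-way delay, server -> client

    rtt_ms = client_to_server_ms + server_to_client_ms   # roughly what ping reports
    naive_one_way_ms = rtt_ms / 2.0                       # the "half the ping" guess

    print(f"ping RTT:              {rtt_ms:.0f} ms")
    print(f"naive one-way guess:   {naive_one_way_ms:.0f} ms")
    print(f"actual client->server: {client_to_server_ms:.0f} ms "
          f"(off by {naive_one_way_ms - client_to_server_ms:+.0f} ms)")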

A more useful measurement is the endpoint-to-endpoint measurement of latency that accounts for time needed for client-side processing, bi-directional network delay, and server-side processing (Stead 2008).
This is important: studies have found that much of the delay in the overall game processing loop is caused by the game client's own handling and processing of messages.

The sources of network delay fall into four basic categories (Kurose 2009):
  • Transmission delay: Packet time to physical layer.
  • Queuing delay: Packet time waiting to be sent to a link.
  • Processing delay: Packet time spent at routers along the route.
  • Propagation delay: Packet time in physical link (bounded by the speed of light).
Transmission delay occurs during the movement of the packet onto a physical link. For example, if you are using a 1 Mbps WAN connection, each bit takes 1 µs to send, and a 500 byte (4,000 bit) packet takes 4 ms.
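The same arithmetic generalizes to any packet size and link rate; a minimal Python sketch:

    def transmission_delay_ms(packet_bytes, link_bps):
        """Time to clock a packet onto the link, in milliseconds."""
        return (packet_bytes * 8) / link_bps * 1000.0

    # The example from the text: a 500 byte packet on a 1 Mbps link.
    print(transmission_delay_ms(500, 1_000_000))    # -> 4.0 ms
    # The same packet on a 20 Mbps link.
    print(transmission_delay_ms(500, 20_000_000))   # -> 0.2 ms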

Queuing delay can occur at routers along the path of the packets. If a router is under heavy utilization or the required outbound link is busy, the packet will be queued into a buffer until it can be sent.

Processing delay is also incurred at routers, since these must handle routing table lookups, possible firewall rule application, and packet checksum and error checking.

Lastly, even if the delays due to processing overhead, transmission, and queuing could be eliminated, we are still bound by the laws of physics. No signal can travel faster than light (2.998×10^8 m/s in vacuo). Speeds in actual transmission media will be lower (e.g., 1.949×10^8 m/s in typical optical fiber, and significantly lower in twisted-pair copper). This means we are bounded by an absolute minimum round-trip latency of roughly 2 ms from client endpoint to server endpoint and back for a client-to-server distance of 200 km.
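This physics-imposed floor is easy to compute for any distance; a small sketch using the fiber propagation speed quoted above:

    # Lower bound on round-trip latency from propagation delay alone,
    # assuming typical optical fiber (~1.949e8 m/s, as quoted above).
    FIBER_M_PER_S = 1.949e8

    def min_rtt_ms(distance_km):
        one_way_s = (distance_km * 1000) / FIBER_M_PER_S
        return 2 * one_way_s * 1000.0

    for km in (200, 2_000, 10_000):
        print(f"{km:>6} km: at least {min_rtt_ms(km):.1f} ms round trip")

No amount of hardware or netcode cleverness can get under this bound; it can only avoid adding to it.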

Jitter:

Jitter is the variation in network latency caused by changes in the state of the network. Packets that comprise the communication between the game client and server seldom follow the exact same route endpoint to endpoint. This can cause packets to have different latencies. In addition, network congestion can result in changes in the routing and router buffering behavior, changing the queuing delays for the affected routers.
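While there is no standard jitter metric for game traffic (more on that below), one widely used estimator comes from the RTP world (Perkins 2003, RFC 3550): an exponentially smoothed average of the change in packet transit time. A minimal sketch, assuming per-packet send and receive timestamps in milliseconds:

    # RTP-style interarrival jitter estimator (RFC 3550, cf. Perkins 2003).
    # send_ts / recv_ts are per-packet timestamps in milliseconds; the two clocks
    # need not be synchronized, since only differences of differences are used.
    def rtp_style_jitter(send_ts, recv_ts):
        jitter = 0.0
        for i in range(1, len(send_ts)):
            # Change in transit time between consecutive packets.
            d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
            # Exponentially smoothed estimate, with the 1/16 gain used by RFC 3550.
            jitter += (abs(d) - jitter) / 16.0
        return jitter

    # Example: packets sent every 50 ms, received with varying delay.
    send = [0, 50, 100, 150, 200]
    recv = [20, 72, 121, 185, 230]
    print(f"jitter estimate: {rtp_style_jitter(send, recv):.2f} ms")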

We can visualize this effect with the aid of a diagram.


In this diagram, packets are sent from the server represented by the lower solid line at regular intervals (time ticks) to the client represented by the upper solid line. If we were able to construct a network with none of the four causes of latency outlined, and in addition discovered a way to violate the laws of physics and send our packets with infinite speed, the green line results: there is no latency between the server sending a packet and the client receiving it.

The more realistic example is represented by the blue line, which shows the slight delay the packet experiences traversing the network from the server to the client. The orange line depicts the next packet in the sequence, which is delayed by the same amount as the packet of the blue line. In the ideal world, the latency from the server to client and vice versa would exhibit this constancy. This would simplify any "compensation" for latency the game developers might wish to utilize, and even without compensation, humans tend to have an easier time adapting to latency in a game when it is relatively constant, even when the latency is rather large (Claypool 2006).

More typically, the game packets experience changes in latency from routing and congestion problems. This is illustrated with the final train of three packets colored red, magenta, and dark brick red. For these packets, it is clear any semblance of packet arrival at relatively regular time ticks is completely lost. There is currently no standard measure for jitter in game traffic. Jitter in networks tends to exhibit randomness, but can be characterized by a Gaussian distribution for inter-packet arrival times (Perkins 2003). Since we are bounded by conditions such as some minimal amounts of processing, queuing, and transmission delay in addition to the absolute bound due to the propagation delay, the actual distribution is biased: there is some absolute minimum that can be realized, and network congestion and related issues can cause delays to be skewed. This is illustrated in the following graph.


Graph of Gaussian (Red) and skewed/biased distributions (Blue) for inter-packet arrival times.

The fit is sufficient that we can use this model for predicting the likelihood of specific inter-packet times for use in the design of compensatory mechanisms for games.
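As a sketch of how such a biased distribution might be used in practice, one could model inter-packet arrival times as a fixed floor (the unavoidable propagation, processing, and transmission delays) plus a right-skewed random component, and then estimate how often gaps exceed some budget. The gamma distribution and its parameters below are assumptions chosen only for illustration:

    # Illustrative model: inter-packet arrival time = fixed floor + right-skewed extra delay.
    # The gamma distribution and its parameters are assumptions, not measurements.
    import random

    FLOOR_MS = 45.0           # minimum achievable gap (tick interval plus fixed delays)
    SHAPE, SCALE = 2.0, 3.0   # gamma parameters controlling the skewed tail

    def sample_interarrival_ms():
        return FLOOR_MS + random.gammavariate(SHAPE, SCALE)

    samples = [sample_interarrival_ms() for _ in range(100_000)]
    late = sum(1 for s in samples if s > 60.0) / len(samples)
    print(f"fraction of gaps exceeding 60 ms: {late:.3%}")

A compensatory mechanism could then be budgeted around the estimated tail probability rather than around the average gap.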

In part II of Doing the Jitterbug, we will investigate what effects these issues have on game play, and what techniques can be used to minimize these effects.

Interested readers can find references for further study after the jump break.


References (informal: This is not an academic paper, enough is provided so that the reader can Google for PDF archives of papers, or purchase the textbooks):

Cited in the text:

(Claypool 2006). Latency and player actions in on-line games.
Communications of the ACM (49)

(Delaney 2006). On Consistency and Network Latency in Distributed Interactive Applications...
Presence: Teleoperators and Virtual Environments 15.

(Feng 2002). Provisioning on-line games: An analysis of a busy Counter-Strike server.
Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement.

(Feng 2005). A traffic characterization of popular on-line games.
IEEE/ACM Transactions on Networking (13).

(Kurose 2009). Computer Networking: A Top-Down Approach (5th Edition)

(Perkins 2003). RTP: A/V Transport for the Internet.

(Stead 2008). A simple method for estimating the latency of interactive real-time graphics simulations.
Proceedings of the 2008 symposium on virtual reality software...


Other references used but not cited in the text:

(Armitage). Networking and Online Games: Understanding and Engineering Multiplayer Internet Games

(Comer). Internetworking with TCP/IP Vols. 1-3

(Comer). Computer Networks & Internets

(Gregory). Game Engine Architecture 

(Kozierok). The TCP/IP Guide: A Comprehensive, Illustrated Internet Protocols Reference

(Smed). Algorithms and Networking for Computer Games

(Steed). Networked Graphics: Building Networked Games and Virtual Environments 
 
(Stevens).  TCP/IP Illustrated Vols. 1-3

(Tanenbaum). Computer Networks

7 comments:

  1. I can't wait! I've tried some of the tweaks posted at the BFBC2 forums. One setting in particular made it worse for me. I've seen the "invisible door being opened" before, and other strangeness. I never thought about the fact that packets could arrive in a different order than they were sent.

  2. I've seen some pretty funny ones: vehicles driving on their own, with the "driver" momentarily floating nearby, still in the driving position. I've seen players walk through doors that were "closed" for me. And of course we've all had the duel where we know we shot first and did so accurately, but somehow the other guy kills us. I'm sure that many that have been called hackers were simply the beneficiaries of some kind of time-warping (not saying that hackers aren't a problem, though!)

    I didn't specifically talk about the possibility of packets being disordered to the point they arrive out of sequence, because a case that pathological is usually handled transparently either by the network layer or by the netcode of the game, or is simply ignored. Depending on how the game netcode is built, you could consider such cases as having jitter greater than the tick rate of the server, I suppose.
    Thanks for the comment!

  3. Few questions:

    1. Isn't it true that having a higher bandwidth connection will decrease the transmission delay/overall latency for each packet sent (e.g. 50mbps compared to 1mbps)?

    2. Even if you have DSL (rather than Cable), there will still be congestion issues along the route, right? The only thing DSL helps with is the last mile as you put it, correct?

    3. When can we expect Doing The Jitterbug part II? Can't wait to learn more on the subject.

  4. Oh, and I forgot to ask you... Why is it that ever since Valve updated Counter Strike 1.5 to 1.6 (which was a long time ago) it's been so hard to get kills in the game? I've always had more than enough bandwidth to run the game and even when I shot people point blank, I still wouldn't get kills. When I changed ISPs and my internet to 15mbps down and 1.7mbps up which was a while back, I tested the game once again to see if maybe my previous ISP was throttling game traffic and there wasn't much difference. I'm shooting people and they don't die while the Pros get kills so easily. I'm not a new player to the FPS genre, I've been playing for a long time and can't figure this out. Does it have to do with the netcode they implemented in CS 1.6 such as adding lag compensation or is it my internet being overloaded or what? My ping to the servers I've tested this on is 15-25ms and seems like ping doesn't make a difference for me. I don't have any problems with registering shots in BF:BC2. Any idea as to what could be in effect here?

  5. @Anonymous 10/30/11 12:36 :
    1) No. Ceteris paribus, bandwidth is not related to latency. Think of a dump truck and a car, both traveling at 60 MPH over a mile. The latency for both is one minute, but the dump truck has higher bandwidth (it can carry more). That said, network infrastructures supporting higher speeds likely have newer / faster gear, so some decrease in overhead / computational latency could be seen, but it would be generally inconsequential (i.e., small fractions of a millisecond).

    2) Correct.

    3) As you might surmise, my postings tend toward random bursts of creativity. This topic is deep - I am working through three texts and ~100 research papers on the subject. Add to that I've been busy gaming, and progress has been slow. Perhaps by end of year.

    Thanks for the comments!

    Rob

  6. @Anonymous 10/30/11 14:30:

    Beats me! I don't play the game (too old and slow for it, I'm afraid), so I can't comment on changes to it, though if they did change hit logic / net code, one could certainly expect some change in one's hit ratio. I seriously doubt it's your net/machine that changed things: games (as a client) need surprisingly little bandwidth. An exception can be when a game has dramatic changes in rendering that stress older machines: this can cause jitter / lag in the client transport of packets - check your CPU utilization when playing.
    Might want to ask the same question on one of the big CS forums - perhaps others have noted the same.

    Rob

  7. I know Valve decreased the bullet damage but that's not the problem as it seems to work fine for others. I think it has to do with router congestion in the Internet cloud? I live in a pretty big community and everyone has Cable so maybe that's what's causing it. Once in a while there would be a game where the hit detection would be great, but in most games it's pretty terrible. Anyway, I guess it will be an unsolved mystery. Half-Life and Counter Strike are what got me into FPS gaming and I have been curious as to what might be causing the bad hit detection but never figured it out. My current machine is an i7-2600K with a GTX 570 and the game runs great, but the hit detection is awful as with all my other machines and setting configurations.

    Anyway, thanks for trying to help and for answering my questions. I'll keep an eye out for your article.
