Wednesday, October 27, 2010
Schannel Event 36888 ( 10 / 10 ) When Playing BFBC2 / MOH / Etc. - WTF?
Specifically, the event log entry in the Windows System log is:
Event 36888, Schannel
The following fatal alert was generated: 10. The internal error state is 10.
When I first saw the error myself, I recognized it from my network programming days as an informational error, indicating some kind of barf-o-rama on the server side of a secure connection handshake. Unlike most of the other Schannel event IDs, this particular one seems to remain undocumented. Nonetheless, the Info opcode and 10 / 10 Alert Description and Error State hint strongly at it being server side.
Since it seemed to have no material effect on the playability of the game(s), my interest in investigating it stopped there. A recent poster, however, indicated that disabling their AV (Trend) caused the apparently related game issues to be remedied. While it appears that the game itself runs correctly despite encountering the Schannel error, it may be that some A/V products that muck with everything on the wire take drastic action in the face of it. Strange if some do, but plausible.
In any case, barring some other application / utility causing problems (e.g., said A/V), the error itself can be safely ignored. If it really bothers you, you can change the logging level via a registry change by modifying (or adding if needed) the key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SecurityProviders\SCHANNEL
A DWORD value EventLogging with a value of 0 will eliminate such event log messages. Note that current versions of Windows seem to be more voluble for these errors - on older versions (e.g. XP), the error may occur without a log entry being generated.
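For those who prefer not to click through regedit, here is a minimal sketch that only builds the text of a .reg file for the change described above. The function name schannel_logging_reg is my own invention for illustration; the key path and the EventLogging value name are the ones given above. Importing the resulting file still requires administrative rights.

```python
# Sketch: generate .reg file text that sets the Schannel EventLogging DWORD.
# The function name is hypothetical; the key and value name come from the post.
KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SecurityProviders\SCHANNEL"

def schannel_logging_reg(level: int = 0) -> str:
    """Return .reg file text that sets EventLogging to `level` (0 = no logging)."""
    if not 0 <= level <= 0xFFFFFFFF:
        raise ValueError("EventLogging must fit in a 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"EventLogging"=dword:{level:08x}\n'
    )

print(schannel_logging_reg(0))
```

Save the printed text to a file with a .reg extension and review it before merging it into the registry.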
I became interested again in this event / error recently while tracing the traffic activity of the game debugging a separate issue. Both games are built on the same engine / network infrastructure, so it is not surprising they share the same frailties.
From an outsider's view (since I have no access to the game source code, nor the EA master or game servers, my view must be the result of probing and testing theories, using debuggers and traffic sniffers), the network infrastructure for these games is a bit of a wreck. In the same way one might surmise their neighbor is a slob from observations of the trash bags thrown on their front lawn, the mishmash of connections and traffic these games generate is appalling. The possibilities of problems due to one piece failing or being unavailable are surely a source of grief for many attempting to play these games online.
If this system was designed this way from scratch, someone should be publicly whipped with a length of Ethernet cable. If it is the result of 'evolution' of features and functionality by adding servers to the 'master' pool, the time has come perhaps for EA to rethink the infrastructure and rebuild it from scratch.
In any case, the Schannel error in these games appears to be generated by an improperly configured EA server that provides them with hardware information à la Steam's hardware survey.
Another way to eliminate the error (and stop spying by EA, if that's your stance), is to add the following to the file \windows\system32\drivers\etc\hosts:
127.0.0.1 bf1943hwtelemetry.ea.com
This prevents the game(s) from even starting the handshake process, short-circuiting the error path.
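If you would rather script the hosts change than edit the file by hand, the following is a rough sketch that works on the text of a hosts file and adds the entry only if the hostname is not already mapped. The helper name add_hosts_block is hypothetical, and the sketch deliberately does not touch the real file: reading and writing \windows\system32\drivers\etc\hosts needs administrative rights and is left to the reader.

```python
# Sketch: add a loopback mapping to hosts-file text if the name is not mapped yet.
def add_hosts_block(hosts_text: str, hostname: str, address: str = "127.0.0.1") -> str:
    """Return hosts_text with 'address hostname' appended, unless already present."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and one or more names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return hosts_text  # already mapped, leave the text alone
    separator = "" if (not hosts_text or hosts_text.endswith("\n")) else "\n"
    return f"{hosts_text}{separator}{address} {hostname}\n"

sample = "127.0.0.1 localhost\n"
print(add_hosts_block(sample, "bf1943hwtelemetry.ea.com"))
```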
In summary: The error is harmless, it is not the cause of crashes / etc. in the game itself per se though it appears it might throw programs such as A/V into a tizzy (when I feel like it, I may investigate this further.) You can just ignore it, or if it bothers you having it in your event log, take one or both of the steps outlined above.
Monday, May 17, 2010
Doing the Jitterbug: Lag, Latency, Jitter and Throughput and How They Affect Online Gaming (Part I)
Perhaps the greatest impediment to reaching this goal is caused by the vagaries of the underlying infrastructure that provides the communication between the game clients and servers: the Internet. The path to the goal of perfection is washed away by two primary effects in game communication over the Internet:
- Throughput limitations - the connection between server and client has some maximum throughput.
- Network delays - even with unlimited throughput, server messages do not arrive instantly.
The second, network delays, means that regardless of the available bandwidth, there is always a delay between the servers sending a message and the client receiving it and vice versa. In effect, the information in the message is time-shifted to a later wall-clock time: an event may happen at actual time X, but by the time the client receives the message of the event, the actual time is perhaps X+2. When the possibility of time-shifting is present, the possibility of causality violations rears its head: events may seem to occur in the wrong order for game clients. For example, a player may open a door, but that door isn't there in the game world because it was destroyed by a grenade (and seen as such by another online player), an event that the client component of the first player was not yet aware of due to the delays in message arrival. Other players might see the first player's avatar "opening" a door that isn't even there.
We will review the details of these problems, the effect they have for players of online games, and the techniques commonly used to minimize the unfairness between players that can result if these problems are left untreated. We will cover the definitions and details of the problems in this blog entry, with part II to cover the effects these have on players and the techniques used in games to minimize their impact.
Throughput and Latency:
The throughput requirements for online PC games vary widely, but in general are far below the available bandwidth of the typical client (we will only be discussing the client side and its impact; obviously, running a server for a game dictates a much higher aggregate bandwidth requirement). Recent studies (Feng 2002 & 2005) using games such as Counter-Strike, Day of Defeat, Medal of Honor: Allied Assault, and Unreal Tournament 2003 showed a client load ranging from 6400bps to 8000bps for client to server packets and 20800bps to 36000bps for server to client communications. These are far below even the lower-tiered ISP services typically used by online gamers.
Congestion of the network may cause throughput to drop to a level that is insufficient for smooth game play. Congestion typically occurs in one of three primary areas: the last mile near the user, the middle or Internet cloud, and the last mile on the server side.
In the case of the user-side congestion, they may simply have a service tier that does not provide sufficient bandwidth. This can of course be remedied with a service upgrade. At a minimum, a service with 600kbps down and 50kbps up should suffice. The faster downlink speed, while not strictly required to play online, will ensure faster downloads of game items such as server-hosted maps.
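To put the quoted numbers side by side, here is a quick back-of-the-envelope sketch comparing the highest game loads from the studies cited above against the suggested minimum service. It assumes 1 kbps means 1000 bps; the constant and function names are mine.

```python
# Sketch: how much of the suggested minimum service the measured game traffic uses.
GAME_DOWN_BPS = 36_000      # highest server-to-client load quoted above
GAME_UP_BPS = 8_000         # highest client-to-server load quoted above
SERVICE_DOWN_BPS = 600_000  # suggested minimum, 600kbps down
SERVICE_UP_BPS = 50_000     # suggested minimum, 50kbps up

def utilization(load_bps: float, capacity_bps: float) -> float:
    """Fraction of the link capacity consumed by the given load."""
    return load_bps / capacity_bps

down = utilization(GAME_DOWN_BPS, SERVICE_DOWN_BPS)
up = utilization(GAME_UP_BPS, SERVICE_UP_BPS)
print(f"downlink used: {down:.0%}, uplink used: {up:.0%}")
```

Even at the suggested minimum, the game traffic itself occupies only a small fraction of the link, which is why the extra headroom matters mostly for map downloads and other household traffic.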
The gamer should also ensure that other activities on their local network are not causing congestion. Other users gaming, streaming audio or video, using services such as torrents, etc. can all adversely affect the overall available broadband bandwidth for the player.
Problems in the last mile on the server side can be caused by too many players joining a specific game server, causing a bottleneck on the network link to that server. Game servers typically employ a player count limit to avoid this occurrence. Any other congestion in this link of the game communication network (router congestion or failure modes, etc.) is likely to be out of the control of both the player and their server provider.
Congestion in the Internet cloud is usually temporary: perhaps a sporting event is being watched by large numbers of viewers via streaming technologies. As with most last mile issues on the server side, these are out of the control of the server provider and game player. In cases where Internet cloud congestion is the cause of game play issues, the only remedy is to wait until the problem "goes away".
Any kind of congestion, whatever the cause, can cause throughput degradation that may adversely affect the consistency of game play. If the game client is starved of message packets due to actual throughput issues or congestion related throughput issues, the synchronization between the client and server will be lost, resulting in "laggy" game play, "rubber-banding", and other temporal effects. Severe throughput problems can result in the game client "giving up" and disconnecting from the game server.
There is no accepted and commonly agreed upon definition for latency (Delaney 2006). The latency of a network is commonly measured using the ping command. This however measures not the one-way trip from client to server or vice versa, but instead measures the round-trip time. Since the routes from client to server and server to client are usually asymmetric, simply guessing at half the value arrived at from a ping measurement may be grossly inaccurate, and provide incorrect information for making client and server timing decisions. In addition, such a measurement does not account for processing and other delays at the client and server endpoints.
A more useful measurement is the endpoint-to-endpoint measurement of latency that accounts for time needed for client-side processing, bi-directional network delay, and server-side processing (Stead 2008).
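A small sketch may make the difference between the two measurements concrete. The timings below are invented purely for illustration, and the class and field names are my own: the point is only that halving a ping result can misjudge an asymmetric path, and that ping sees none of the endpoint processing.

```python
# Sketch: ping round-trip time versus endpoint-to-endpoint latency.
# All timing values used below are made up for illustration.
from dataclasses import dataclass

@dataclass
class PathTimings:
    client_processing_ms: float
    uplink_ms: float            # client -> server network delay
    server_processing_ms: float
    downlink_ms: float          # server -> client network delay

    def ping_rtt_ms(self) -> float:
        """What ping reports: network delay in both directions only."""
        return self.uplink_ms + self.downlink_ms

    def half_rtt_guess_ms(self) -> float:
        """The naive one-way estimate obtained by halving the ping result."""
        return self.ping_rtt_ms() / 2.0

    def endpoint_to_endpoint_ms(self) -> float:
        """Network delay in both directions plus processing at both ends."""
        return (self.client_processing_ms + self.uplink_ms
                + self.server_processing_ms + self.downlink_ms)

path = PathTimings(client_processing_ms=15, uplink_ms=60,
                   server_processing_ms=5, downlink_ms=20)
print(path.ping_rtt_ms(), path.half_rtt_guess_ms(), path.endpoint_to_endpoint_ms())
```

With these made-up numbers the half-ping guess is 20 ms off in each direction, and the endpoint-to-endpoint figure is larger than the ping result by exactly the processing time at the two ends.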
This is important: It has been found in studies that much of the delay in the overall game processing loop is caused by the game client handling and processing of messages.
The sources of network delay fall into four basic categories (Kurose 2009):
- Transmission delay: Packet time to physical layer.
- Queuing delay: Packet time waiting to be sent to a link.
- Processing delay: Packet time spent at routers along the route.
- Propagation delay: Packet time in physical link (bounded by the speed of light).
Queuing delay can occur at routers along the path of the packets. If a router is under heavy utilization or the required outbound link is busy, the packet will be queued into a buffer until it can be sent.
Processing delay is also incurred at routers, since these must handle routing table checks, possible firewall rule application, packet checksum and error checking.
Lastly, even if delays in packet transmission due to processing overhead, transmission delays, and queuing delays could be eliminated, we are still bound by the laws of physics. No signal can travel faster than light (2.998x10^8 m/s in vacuo). Speeds in actual transmission media will be lower (e.g. 1.949x10^8 m/s in typical optical fiber, significantly lower for twisted-pair copper). This means that over fiber we are bounded by a minimum round-trip latency of roughly 2 ms, client endpoint to server endpoint and back, for a client to server distance of 200 km.
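The round-trip figure for 200 km is easy to check. This sketch uses only the two speeds quoted above; the function name is mine.

```python
# Sketch: lower bound on round-trip propagation delay for a given distance.
C_VACUUM_M_PER_S = 2.998e8   # speed of light in vacuo, as quoted above
C_FIBER_M_PER_S = 1.949e8    # typical optical fiber, as quoted above

def round_trip_propagation_ms(distance_m: float, speed_m_per_s: float) -> float:
    """Time in milliseconds for a signal to cover the distance there and back."""
    return 2.0 * distance_m / speed_m_per_s * 1000.0

distance_m = 200_000  # 200 km client to server
fiber_ms = round_trip_propagation_ms(distance_m, C_FIBER_M_PER_S)
vacuum_ms = round_trip_propagation_ms(distance_m, C_VACUUM_M_PER_S)
print(f"fiber: {fiber_ms:.2f} ms, vacuum: {vacuum_ms:.2f} ms")
```

The fiber result lands at a little over 2 ms, while the vacuum figure is the bound no medium could beat. Real paths are longer than the straight-line distance, so actual propagation delay will be higher still.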
Jitter:
Jitter is the variation in network latency caused by changes in the state of the network. Packets that comprise the communication between the game client and server seldom follow the exact same route endpoint to endpoint. This can cause packets to have different latencies. In addition, network congestion can result in changes in the routing and router buffering behavior, changing the queuing delays for the affected routers.
We can visualize this effect with the aid of a diagram.
In this diagram, packets are sent from the server represented by the lower solid line at regular intervals (time ticks) to the client represented by the upper solid line. If we were able to construct a network with none of the four causes of latency outlined, and in addition discovered a way to violate the laws of physics and send our packets with infinite speed, the green line results: there is no latency between the server sending a packet and the client receiving it.
The more realistic example is represented by the blue line, which shows the slight delay the packet experiences traversing the network from the server to the client. The orange line depicts the next packet in the sequence, which is delayed by the same amount as the packet of the blue line. In the ideal world, the latency from the server to client and vice versa would exhibit this constancy. This would simplify any "compensation" for latency the game developers might wish to utilize, and even without compensation, humans tend to have an easier time adapting to latency in a game when it is relatively constant, even when the latency is rather large (Claypool 2006).
More typically, the game packets experience changes in latency from routing and congestion problems. This is illustrated with the final train of three packets colored red, magenta, and dark brick red. For these packets, it is clear any semblance of packet arrival at relatively regular time ticks is completely lost. There is currently no standard measure for jitter in game traffic. Jitter in networks tends to exhibit randomness, but can be characterized by a Gaussian distribution for inter-packet arrival times (Perkins 2003). Since we are bounded by conditions such as some minimal amounts of processing, queuing, and transmission delay in addition to the absolute bound due to the propagation delay, the actual distribution is biased: there is some absolute minimum that can be realized, and network congestion and related issues can cause delays to be skewed. This is illustrated in the following graph.
The fit is sufficient that we can use this model for predicting the likelihood of specific inter-packet times for use in the design of compensatory mechanisms for games.
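Since there is no standard jitter measure for game traffic, any code here is necessarily a choice rather than the answer. As one option, this sketch borrows the smoothed interarrival jitter estimator that RTP defines in RFC 3550, applied to a list of per-packet transit times. It is not taken from the studies cited above, and the sample values are invented.

```python
# Sketch: RFC 3550 style smoothed jitter estimate (borrowed from RTP, an assumption
# on my part that it is a reasonable stand-in for game traffic).
def interarrival_jitter(transit_times_ms):
    """Running jitter estimate from consecutive per-packet transit times."""
    jitter = 0.0
    for previous, current in zip(transit_times_ms, transit_times_ms[1:]):
        difference = abs(current - previous)
        # Move the estimate 1/16 of the way toward the latest difference.
        jitter += (difference - jitter) / 16.0
    return jitter

steady = [50.0, 50.0, 50.0, 50.0, 50.0]   # constant latency, like the blue/orange packets
bumpy = [50.0, 62.0, 48.0, 75.0, 51.0]    # varying latency, like the final packet train
print(interarrival_jitter(steady), interarrival_jitter(bumpy))
```

The 1/16 gain makes the estimate react slowly to a single outlier while still tracking a sustained change in network conditions.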
In part II of Doing the Jitterbug, we will investigate what effects these issues have on game play, and what techniques can be used to minimize these effects.
Interested readers can find references for further study after the jump break.
Sunday, May 2, 2010
I P in the Pool.
At the core of the IP protocol is the concept of addresses. You've probably seen these on your gaming machine with references to things like 192.168.1.2, etc. The IP address consists of four Octets (less pedantically known as Bytes). So the same example address can be written as 0xC0.0xA8.0x01.0x02 in hexadecimal notation or 11000000.10101000.00000001.00000010 in binary (usually done when manually calculating subnet information).
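If you want to check the conversions yourself, here is a short sketch using Python's standard ipaddress module. The function name octet_views is mine.

```python
# Sketch: show an IPv4 address octet by octet in hexadecimal and binary.
import ipaddress

def octet_views(dotted: str):
    """Return (hex form, binary form) of a dotted-decimal IPv4 address."""
    octets = ipaddress.IPv4Address(dotted).packed  # the four raw bytes
    hex_form = ".".join(f"0x{octet:02X}" for octet in octets)
    binary_form = ".".join(f"{octet:08b}" for octet in octets)
    return hex_form, binary_form

hex_form, binary_form = octet_views("192.168.1.2")
print(hex_form)
print(binary_form)
```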
The IP address, since it consists of four octets, has a total of 2^32 available addresses. There are actually fewer available for actual use, since parts of the specification allow for specialized addresses for use by things like broadcast traffic, loopback, testing, etc. Even so, there are a bit over 4,000,000,000 unique addresses available for use.
Early on, the bodies (most notably the Internet Engineering Task Force, IETF) involved in the design and specification of Internet protocols and technologies realised that an address pool of this size would likely become insufficient at a future time, and designed mechanisms to effectively extend the size of the available address pool. The almost universally used mechanism is known as Network Address Translation, or NAT. NAT allows Local Area Networks (LAN) to reside on some segment of addresses defined in the protocol as private while providing bidirectional access to public WAN addresses. Your PC, using a 192.168.#.# address, is on one such private address. Among other characteristics of Private addresses is that they are non-routable, that is, once they reach any router, they are kept on the LAN - no information is leaked to the WAN side of the router, and packets with addresses confined to the LAN will not be routed via the WAN. This ensures such packets from private networks stay on the private network.
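The private ranges themselves are set out in RFC 1918. As a small sketch, the check below tests an address against those three blocks explicitly; the helper name is_rfc1918 is mine, and the public address used is the one from the ping example elsewhere on this blog.

```python
# Sketch: test whether an IPv4 address falls in one of the RFC 1918 private blocks.
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address: str) -> bool:
    """True if the address lies inside any of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)

for candidate in ("192.168.1.2", "10.20.30.40", "172.31.255.1", "74.125.19.103"):
    print(candidate, is_rfc1918(candidate))
```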
So how does a PC on a LAN with a private address gain access to the rest of the Internet WAN residing at public addresses if their own traffic must remain on their LAN? That is of course the primary function of the router via NAT. Any traffic from a LAN PC destined for a public WAN destination has the address information in the packet manipulated by the router to appear to have originated at the public address of the WAN side of the router. Any return traffic is directed toward the public WAN address of the router, whereby the router manipulates the address information of the packet and forwards it to the appropriate PC on the LAN. For an excellent overview, see Network Address Translation in an unusually well written Wikipedia entry.
Using NAT, we can have tens of thousands of PCs on a LAN residing on one single public WAN address. Other groups of PCs can be residing behind other routers with other public WAN addresses, where the PCs on different routers may have the same private addresses, but there will be no conflicts: the WAN outside of each router cannot even 'see' the LAN since it is composed of private addresses, so no source of conflict can exist.
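To make the bookkeeping concrete, here is a deliberately simplified sketch of the translation table a NAT router keeps. Real routers also track the protocol, the remote endpoint, and timeouts, none of which is modeled here. The class name NatRouter, the starting port 40000, and the two public addresses (taken from the ranges reserved for documentation) are all assumptions for illustration.

```python
# Sketch: a toy NAT translation table. Not how any particular router implements it.
import itertools

class NatRouter:
    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int):
        """Rewrite a LAN source endpoint to the router's public endpoint."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            public_port = next(self._next_port)
            self._outbound[key] = public_port
            self._inbound[public_port] = key
        return self.public_ip, self._outbound[key]

    def inbound(self, public_port: int):
        """Map return traffic to the LAN endpoint, or None if no mapping exists."""
        return self._inbound.get(public_port)

home_a = NatRouter("203.0.113.5")
home_b = NatRouter("198.51.100.7")
# Two different households can use the very same private address without conflict.
print(home_a.outbound("192.168.1.2", 5000))
print(home_b.outbound("192.168.1.2", 5000))
print(home_a.inbound(40000))
```

Note that inbound traffic to a port with no existing mapping has nowhere to go, which is the root of many of the connectivity headaches discussed later in this entry.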
Even with this 'multiplication factor' of addresses provided by NAT, there is a growing shortage of available addresses: there is a growing number of services that need public addresses, and an explosion of devices (phones, media devices, even refrigerators!) that use addresses. Technologists in the field predict we will run out of IPv4 addresses sometime in 2011-2012.
The IETF realized early on a better solution would be needed, and started work in earnest in 1991. By 1998, RFC 2460 resulted, defining IPv6. The biggest change in this new version of IP is an address space consisting of 16 octets, or 128 bits. This means, compared to the roughly four billion (4,000,000,000) addresses available in IPv4, IPv6 has over 340 undecillion. That's a huge number; written out in its full glory, it's:
340,000,000,000,000,000,000,000,000,000,000,000,000.
That's roughly seven IP addresses for each atom in each body of every human on the planet today.
Or about 3,120,000,000,000,000,000,000,000,000 (~3 Octillion) addresses for each man, woman and child that has ever lived on our planet (thanks, Population Reference Bureau). I think this will suffice for some time!
How long? To put this enormous number into perspective consider this: The age of the Earth is about 4.54 billion years. If we had been assigning IPv6 addresses for this entire period at the rate of 1,000,000,000 (one billion) addresses per second, we would have used a little over .00000000004% of the available address pool! It's BIG!
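The arithmetic is simple enough to verify. This sketch assumes a year of 365.25 days; everything else comes from the figures above.

```python
# Sketch: fraction of the IPv6 address space used after handing out one billion
# addresses per second for the age of the Earth.
IPV4_SPACE = 2 ** 32
IPV6_SPACE = 2 ** 128

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: Julian year
EARTH_AGE_YEARS = 4.54e9
ADDRESSES_PER_SECOND = 1e9

addresses_used = EARTH_AGE_YEARS * SECONDS_PER_YEAR * ADDRESSES_PER_SECOND
percent_used = addresses_used / IPV6_SPACE * 100

print(f"IPv4 addresses: {IPV4_SPACE:,}")
print(f"IPv6 addresses: {IPV6_SPACE:,}")
print(f"percent of IPv6 space used: {percent_used:.2e}%")
```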
How big? If you imagine all of the IPv4 addresses mapped into the area of a space the size of one typical pixel of the display you're reading this on, the IPv6 address space would need a square about 52,500,000 miles on each side (thanks, WolframAlpha), roughly the surface area of 14,000,000 Earths.
An address space that large ensures every device on our planet can have a unique address, and completely eliminates any need for troublesome mechanisms like NAT (though there are proposals to retain such functionality for specific purposes).
IPv6 offers many other improvements compared to IPv4, such as much simplified address configuration options, much improved and intrinsic multicast support, mandatory network security support (IPSec), vastly simplified and improved handling by routers (no fragmentation by routers is allowed, PMTUD is mandatory), no header checksum is utilized (reducing processing load on routers), TTL replaced with Hop Limit (routers no longer need to make time in queue calculations), and vastly improved mobility components (no proxy/triangular routing, devices and whole subnets can move between router points without renumbering), jumbogram support (packets up to 4,294,967,295 bytes long), and vastly improved extensibility (options are implemented as extension headers, effectively meaning unlimited protocol extensibility without requiring any changes in the basic protocol design).
Being the protocol newborn, adoption rates are in the early adopter stages: only a fraction of a percent of current Internet traffic at the time of writing is IPv6. Nonetheless, technologies to ease the transition to IPv6 such as Teredo which tunnels IPv6 over IPv4 NAT using UDP, point-to-point tunnel services, etc. ensure that the migration to IPv6 will happen. It is not a question of if IPv6 will be the predominant standard, but when. Behemoths such as Google are leading the charge, offering IPv6 interfaces to their search engine at ipv6.google.com and support for Google Services over IPv6 to compatible networks.
So why should we as gamers care about this? NAT is one of the primary culprits in connectivity issues for on-line gamers. Often, problems with the behavior or configuration of NAT in consumer routers result in spotty connectivity and sporadic connection problems. Additionally users that play peer-to-peer games or need to run a game server suffer all the slings and arrows of router configuration and port forwarding in attempts to attain proper game functionality.
IPv6 will eliminate the need for these machinations, providing a much more reliable, higher performance, more flexible infrastructure for on-line gaming.
Saturday, May 1, 2010
Black Holes Stephen Hawking Knows Not. But You Will.
Through the work of the brilliant physicist Stephen Hawking, master of all things "black hole", we know that we may even be surrounded by tiny, microscopic primordial black holes, and should the geometry of the universe be appropriate, we may find ourselves producing our own microscopic black holes at the rate of one a second through experiments at the Large Hadron Collider. Fortunately, these will evaporate nearly instantly. Recent work by astrophysicists points to the tiny primordial black holes as a likely culprit for the ultra-high energy cosmic rays occasionally observed on earth.
But there are black holes lurking here on earth already. You may be closer to one than you might imagine. They will gobble up your network packets as efficiently as their cosmic cousins quaff matter. I am of course referring to Black Hole Routers and their cohorts in the crime of packet theft, overly restrictive or misbehaving firewalls. What is a black hole router, and why should it matter to a gamer? After some preparatory material, we'll cover that in this blog entry.
In the typical network environment, IP provides the basic Internet layer mechanism on top of the nearly universal Ethernet V2 frame. The commonly used TCP and UDP protocols ride on top of the IP layer. Each Ethernet frame can have a data packet up to 1500 Bytes in size (we'll not go into exceptions for our examples here). The primary purpose of network devices is to route these protocols from senders to receivers, and back. Along any given path between a sender and receiver, there may be devices that are unable to process packets beyond a certain size limit. Usually, the device sending packets to such a device will simply break the packet into smaller packets, a process called fragmentation.
Fragmentation of packets has undesirable side effects, including a performance impact, so ideally we want to minimize this for a route. The same holds true for packets traveling the return path (which may differ from the send path) to the sender. A mechanism that will allow the sender to determine the ideal maximum packet size to avoid any fragmentation will maximize performance and minimize network overhead. The standard mechanism for achieving this goal is known as Path Maximum Transmission Unit Discovery, or PMTUD.
Devices along the path (usually exclusively routers) that must impose a limit on the packet size by fragmentation when the sender has explicitly disallowed fragmentation by using the IP "Do not fragment" (DF, the normal default behavior in modern systems) flag are expected to notify the sender of this limit. This is done through a standardized ICMP "Fragmentation needed and DF set" message, with the required limit encapsulated. A sender receiving such a message can adjust the outgoing packet size accordingly. All would be well if everything worked as it should, but in the real world there are routers that misbehave and do not respond with the needed ICMP message, and firewalls that block these needed messages. These are known as Black Holes.
The result is senders appearing to establish a proper connection (the three-way TCP handshake is successful), but having sporadic problems sending and/or receiving data. For a game, this can result in the appearance of lagging, connection loss, etc. (N.B., PMTUD is normally reserved for the TCP protocol. While it can be used for UDP, it is seldom utilized due to difficulties in determining proper sender behavior for packets dropped by black holes.) Since most games use a combination of UDP and TCP, it is quite possible that a black hole on the path between the game client and server(s) will adversely affect game play and connectivity.
Fortunately, there are steps the gamer can take to diagnose and remedy problems with black holes. The Ping utility provides a ready tool to aid us in our diagnosis. Using ping -f -l#, where # is the packet size, we can send probes toward the desired destination. The "f" flag denotes setting the "DF" flag in the IP header.
For example, the command:
ping -f -l 1450 74.125.19.103
will send packets of length 1450 to the destination IP of 74.125.19.103. The returned results will tell us much about the characteristics and behavior of devices along the path of the packets.
If the ping command returns with the expected results, we know that all is well, and no device in the path is requiring a smaller packet size than the length specified (with the addition of the IP overhead).
If the ping command returns with the ICMP "Fragmentation needed and DF set" message, we know that some device in the path needs a smaller packet size to avoid it fragmenting our packets. We can modulate the value of l until we find a length that passes the path with normal results.
If the ping command returns "Request timed out", we know that some device in the path requires a smaller packet size, but is not returning the needed ICMP message or that message is being blocked. As with the second case, we can modulate the value of l until we get the expected return results.
A good starting point for l to work down from is 1472, and 1464 for PPPoE connections. In particular, users of PPPoE should check their environment for proper MTU and PMTUD behavior: in some cases, when PPPoE is done on the router, the PC or responding server may only see the router MTU of 1500, causing dropped packets and connection issues. Check with your ISP to determine if your required MTU is other than the typical 1492 for PPPoE.
Once armed with the appropriate MTU value (l+28, 28 is the overhead for the IP and ICMP headers), we can configure our PC and router as required. Other more drastic measures may be used to diagnose or bypass black holes, such as enabling PMTU black hole detection, disabling PMTUD, and hard setting the MTU (MSS clamping) for the interface device. The details of these steps vary by device and OS, by router make/model, and are beyond the scope of a blog entry.
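Working l down by hand gets tedious, so here is a rough sketch of the same search done as a bisection. It does not run ping itself: the probe argument is a hypothetical callable that you would wire up to "ping -f -l <size>" and have return True when the reply comes back normally. The fake path used below simply pretends the narrowest link has the typical PPPoE MTU of 1492, and the search assumes that if a size passes, every smaller size passes too.

```python
# Sketch: find the largest ping payload that passes with DF set, then derive the MTU.
# `probe` is a stand-in for a real ping invocation; it is not part of any library.
from typing import Callable

ICMP_IP_OVERHEAD = 28  # 20 bytes of IP header plus 8 bytes of ICMP header

def largest_passing_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 1472) -> int:
    """Bisect for the largest payload in [low, high] for which probe() is True."""
    while low < high:
        middle = (low + high + 1) // 2
        if probe(middle):
            low = middle       # this size passes, try larger
        else:
            high = middle - 1  # too big (error or timeout), try smaller
    return low

def fake_pppoe_path(payload: int) -> bool:
    """Pretend path whose smallest link MTU is 1492 bytes."""
    return payload + ICMP_IP_OVERHEAD <= 1492

payload = largest_passing_payload(fake_pppoe_path)
mtu = payload + ICMP_IP_OVERHEAD
print(payload, mtu)
```

A timeout and an explicit "Fragmentation needed" reply are treated the same way by the search, which is what we want: either way the size is too large for the path.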
Consult the documentation for your hardware and OS environment for details, or use the web: there are many excellent detailed descriptions for configuring these settings. Or give Stephen Hawking a call. But you now probably know more than he does on the subject!
Thursday, April 29, 2010
I Scream, You Scream, We All Scream for DICE Scream!
Maple, for those not familiar with it, is an extremely sophisticated application for doing all things mathematical. The product has an amazing list of capabilities for mathematical analysis, graphing, and programming. It is however primarily a mathematical tool, competing with the likes of Mathematica and Matlab. I use all three, Mathematica being my personal favorite. For quick and dirty, however, I find the 'Document Mode' in Maple to be ideal for rapid exploration. I often do proof of concepts there, and when the ideas are fleshed out, move them to Matlab or Mathematica.
So how hard would it be to get to the Electronic Arts centralized server, using a tool completely out of its domain (kind of like using a champagne bottle for a baseball bat), without any of the raw socket nonsense that the game developers used? See for yourself - seven lines of Maple gets you the initial connection and response. A handful more lines would get you a complete server browser. Without the hassles the game introduces by using raw sockets. Pretty powerful tool, doing things out of its real domain. It makes me wonder even more: what were the game developers thinking when they chose to use raw sockets?
Maple Code:
with(Sockets);
sid := Open("159.153.235.12", 18395);
reply := Array(1 .. 65, datatype = integer[1]);
WriteBinary(sid, Array([67, 79, 78, 78, 64, 0, 0, 0, 0, 0, 0, 91, 80, 82, 79, 84, 61, 50, 10, 80, 82, 79, 68, 61, 98, 102, 98, 99, 50, 45, 112, 99, 10, 86, 69, 82, 83, 61, 49, 46, 48, 10, 80, 76, 65, 84, 61, 80, 67, 10, 76, 79, 67, 65, 76, 69, 61, 101, 110, 95, 85, 83, 10, 83, 68, 75, 86, 69, 82, 83, 73, 79, 78, 61, 53, 46, 49, 46, 50, 46, 48, 46, 48, 10, 84, 73, 68, 61, 49, 10, 0], datatype = integer[1]));
ReadBinary(reply, sid);
Close(sid);
convert(subs(0 = 32, 10 = 32, convert(reply, list)), bytes);
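For readers without Maple, the byte array in the WriteBinary call is less mysterious than it looks. The sketch below rebuilds it from readable text. The layout I use (a 4-byte tag, 4 bytes of 0x40 0 0 0, then a 4-byte big-endian total length, followed by newline-separated key=value pairs and a trailing zero byte) is only my reading of those bytes, not a documented format, and the function names are mine. Nothing here opens a connection.

```python
# Sketch: rebuild the request bytes from the Maple example and split up a reply.
# The header interpretation is inferred from the byte values, not from any spec.
import struct

def build_conn_packet(fields: dict) -> bytes:
    """Assemble a CONN request: 12-byte header, key=value lines, trailing NUL."""
    body = "".join(f"{key}={value}\n" for key, value in fields.items())
    body_bytes = body.encode("ascii") + b"\x00"
    header = b"CONN" + bytes([0x40, 0, 0, 0]) + struct.pack(">I", 12 + len(body_bytes))
    return header + body_bytes

def parse_reply(text: str):
    """Split a cleaned-up reply into its tag and a dict of key=value pairs."""
    tag, *pairs = text.split()
    return tag, dict(pair.split("=", 1) for pair in pairs)

request_fields = {
    "PROT": "2", "PROD": "bfbc2-pc", "VERS": "1.0", "PLAT": "PC",
    "LOCALE": "en_US", "SDKVERSION": "5.1.2.0.0", "TID": "1",
}
packet = build_conn_packet(request_fields)
print(len(packet), list(packet[:12]))
print(parse_reply("CONN ATIME=1272627041 TID=1 activityTimeoutSecs=240 PROT=2"))
```

The bytes produced match the array passed to WriteBinary above, which is some comfort that the length field really does count the whole packet.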
EA Server Response:
"CONN ATIME=1272627041 TID=1 activityTimeoutSecs=240 PROT=2"
Monday, April 26, 2010
Pings? We Don't Need No Stinking Pings!
Poppycock! The fix is to get rid of the poor coding choices that get this kind of result from my debugging traces of over a month ago:
socket: 0: Process 0xfffffa8005375060 (0x768 ), Endpoint 0xfffffa8005c49420, Family 2, Type SOCK_RAW, Protocol 1, Seq 1006, Status 0x0
(FAM: AF_Inet/IPv4) (SOCK: Raw) (PROTO: Stream)
The game is opening (or attempting to open) a Windows socket of type SOCK_RAW (3). The use of raw sockets has become increasingly restricted with each new version of Windows, and for good reason.
This is the reason the BFBC2 game executable must be run as an administrator, or have its privileges elevated, for the player to see pings properly in the server browser.
Readers having this issue, either run the game from a 'real' administrative account, or right-click on the game executable and mark it for compatibility mode "Run this program as an administrator". Do note, there can be other issues such as firewall, ISP, or PC configuration that may still prevent pings showing properly.
There is no good reason I can think of that an application like this game needs to use raw sockets, forcing the user to compromise the security and functionality of their system to make the application work the way it should. There are several proper solutions to accomplish the goal of getting the needed data that do not require unneeded privilege use or elevation by the user.
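As one illustration of an approach that needs no elevation, the sketch below times an ordinary TCP connect to estimate the round trip to a server. This is not what the game does and not necessarily what DICE should do: it assumes the server accepts TCP connections on some known port, and the figure includes a little connection setup overhead on top of the network round trip. The function name is mine.

```python
# Sketch: estimate round-trip time with a plain TCP connect, no raw sockets needed.
import socket
import time

def tcp_connect_time_ms(host: str, port: int, timeout: float = 2.0):
    """Milliseconds taken to complete a TCP connect, or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None
```

Calling tcp_connect_time_ms with a server address and port returns a number a server browser could display in place of a ping, with None standing in for servers that cannot be reached. Another option on Windows is the IcmpSendEcho function in the IP Helper API, which sends a real ICMP echo without the application opening a raw socket itself.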
That this information was handed to the devs, and nothing has been done to remedy it, despite a patch in the intervening time being released, is puzzling to say the least. A lunch hour's worth of coding changes should fix this amateur mistake that should never have been made. (How hard is it to get to the server without raw sockets? See "I Scream, You Scream, We All Scream for DICE Scream!" for an answer).
Fix this, DICE!
Tuesday, April 6, 2010
How could it possibly be my router causing connection issues?
Readers of game enthusiast forums have undoubtedly seen the sometimes heated exchanges between the It's all gotta be the GAME, crap developers! crowd, and the It could be a problem on your end gang.
I'm of the latter, if you've read any of my posts.
I produced a guide based on helping users in BFBC2 and other games, called Troubleshooting Multi-Player PC Game Connectivity Issues that covers many of the often subtle conditions that can cause connectivity issues for online games. This is rather long - there are many interacting factors that can cause these problems. Some may not have the time to read it, some just won't, sticking to their guns of It's all gotta be the GAME!
I've been attacked by a few, questioned by many. That's fine (the latter, at least), skepticism is good. Ignorance isn't!
I've compiled a list of references for those that want to take the time to see what experts in the field, and real users of many other games, have to say on this matter. In particular, some of these cover NAT and the issues involved in much more technical detail. It was not practical for me to include this level of detail in my document - it would have become a hardbound book!
If you're really interested in answering the question "How could it possibly be my router causing connection problems?", these will help you get there. They will also help you to understand how things can be fine in every other game yet broken for a certain few games, with different users having differing experiences.
I've also included links to forum posts for many games, and many platforms, clearly showing that this kind of issue happens all the time.
Perhaps with a clearer understanding, these may help you solve your own connection problems.
They should also help you to dispel the myth of 'port forwarding has to be used' often repeated in forums. A read of Port Forwarding: Slaying the Mythical Dragon of Online PC Gaming will clarify how NAT and port forwarding are related, and why forwarding ports blindly is unneeded and potentially problematic when used.
Good reading!
References for NAT technologies and issues involved:
Superb overview by Geoff Huston, published in Cisco's Internet Protocol Journal:
http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_7-3/anatomy.html
Nice overview by the author of RakNet, with success/failure charts:
http://www.jenkinssoftware.com/raknet/manual/nattypedetection.html
http://www.jenkinssoftware.com/raknet/manual/natpunchthrough.html
IETF recommendations for NAT behavior:
http://www.ietf.org/rfc/rfc4787.txt
A very good overview, with diagrams:
http://en.wikipedia.org/wiki/Network_address_translation
For the absolute bibles for NAT and other TCP/IP technical information:
http://preview.tinyurl.com/yefzpfv
http://www.amazon.com/TCP-Guide-Comprehensive-Illustrated-Protocols/dp/159327047X
AnalogX NAT and NAT traversal issues overview:
Interesting test of 100 consumer routers. DGL-4300 top rates, Apple /3com/US Robotics the worst.
Only 43% of routers properly supported full cone NAT.
http://www.analogx.com/contents/articles/nattraversal.htm
A game developer talks about problems with NAT of differing types by consumers:
"If it is a router, it's the user's problem to solve it."
http://forum.unity3d.com/viewtopic.php?p=233448
Other forums for games where users have the same issues some have with game "X".
These same users could well be playing game "Y" without issue.
http://forums.gamesforwindows.com/p/1860/23980.aspx
http://utforums.epicgames.com/showthread.php?t=665969
http://support.microsoft.com/kb/840420 (Microsoft? What would they know?)
http://utforums.epicgames.com/showthread.php?t=602500
http://forums.epicgames.com/showthread.php?t=616452
http://www.dslreports.com/forum/r19418084-Gaming-Mode-Why-does-Dlink-recommend-disabling
http://www.dslreports.com/forum/r20554921-Xtreme-NAT-help-with-Netopia-2241n006
http://support.microsoft.com/kb/941207 (More Microsoft. Maybe they know something about networking?)
http://www.gtaforums.com/index.php?showtopic=353023
http://forum.instantaction.com/smf/index.php?topic=3631.0
http://blogs.msdn.com/johnmil/archive/2006/10/29/nat-traversal.aspx
http://www.gtagaming.com/forums/archive/index.php/t-103945.html
http://www.xfire.com/nat_types/ (widely used Xfire, and problems NAT can cause in the wrong router)
http://www.ureadit.com/solutions/home-network/79-xbox-live-compatible-router.html
http://www.gtagaming.com/forums/archive/index.php/t-103841.html
http://forums.eu.atari.com/archive/index.php/t-59626.html
http://www.bing.com/search?q=full-cone+nat+games&first=51&FORM=PORE
http://forums.eu.atari.com/archive/index.php/t-62799.html
http://openarena.ws/board/index.php?topic=3261.0
http://www.poweredbygamespy.com/services/view/category/connect/ (Gamespy - they brag at only having 10% failure of NAT)
http://boardreader.com/thread/Port_Restricted_Cone_Nat_Router_l9xyXvexg.html
http://text.broadbandreports.com/forum/r22505721-HSI-Known-NAT-problem-with-Charter-HSI
http://computerhelpforum.org/forum/networking/f43/full_cone_nat/t18037.html
http://forums.gametrailers.com/thread/nat-type-1-question/1042731
http://forums.epicgames.com/showthread.php?t=665969
http://www.bgforums.com/forums/viewtopic.php?f=58&t=12277
There are myriad more; Google is your friend!
Thursday, March 18, 2010
Port Forwarding: Slaying the Mythical Dragon of Online PC Gaming.
Most blindly follow this advice, not even understanding what it means to 'forward a port', much less the ramifications of doing so. My intent is to set the record straight for the reader, so that they may better understand the how, what, when, why, and where of port forwarding. I have greatly simplified and generalized the terminology and examples, which may offend experts, but is appropriate for the intended audience.
To be clear, forwarding of ports is seldom if ever required to allow the client of the online PC game to function properly. Unnecessarily forwarding ports is not only undesirable, it may expose the gamer to security risks, and can interfere with proper functioning of their environment, including games.
The typical PC gamer has a pretty simple environment: their PC, a router, a modem, and...and that's it. The router serves the function of shepherding traffic from the gamer's local area network (LAN) to the wide area network (WAN), where the online game servers 'live'. The modem provides the electronic means for the gamer to access the WAN infrastructure. In some cases, these two functions (router and modem) are combined into a single unit, variously called a router or modem, depending on who you're asking. Often, gamers have a router in their environment without knowing it - they've been told 'that's your modem'.
Why these pieces of hardware are used comes down to the subject of addresses. Each PC in a network must be assigned a unique address. The gamer is probably familiar with these. They're the number sets like '192.168.1.123' you might see for your PC on your LAN, or the '74.125.19.106' you might see if you ping www.google.com. You've probably heard them called the 'IP address'. The important thing is that each PC must have a unique address. It works much like postal mail: if different households could have the same address, you can imagine the mess that would ensue.
Now early in the days of the 'net', the groups defining various standards and protocols decided it would be wise to have addresses that were 'public', that is, known to the world as the address to send to, and 'private', that is, addresses that the 'outside' (WAN) world can't even see. This was done for many reasons including reducing the need for public addresses to be used, and to allow enterprises to split up a 'public' address into one or more internal 'private' addresses.
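As a small illustration of the public/private split, the sketch below uses Python's standard ipaddress module to classify the two example addresses mentioned earlier. The describe() helper and its labels are my own, for illustration only. Note that is_private also covers a few special-purpose ranges beyond the usual home LAN blocks.

```python
import ipaddress


def describe(address):
    """Label an IPv4/IPv6 address as private (LAN side) or public (WAN side)."""
    ip = ipaddress.ip_address(address)
    return "private (LAN)" if ip.is_private else "public (WAN)"


for addr in ("192.168.1.123", "74.125.19.106"):
    print(addr, "->", describe(addr))
```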
The router's primary function is to manage, control, and manipulate the barrier between the 'private' LAN and the 'public' WAN.
In a typical environment, the modem provides the connection to the WAN, giving the user on the 'inside' of the modem connection some public IP address on the WAN assigned by their ISP. The router takes the traffic from the PCs on the LAN and passes it on through the modem to the destination server on the WAN. We'll call this the 'request' to the server. The server does whatever it needs to process the request, and responds to the WAN address of the gamer. We'll call this the 'reply' from the server.
The router will keep track of requests sent out to the WAN, and in general only allow traffic from the WAN to a PC on the LAN if it determines that traffic is an appropriate reply from a server to a request from a PC on the LAN. Now the router/modem usually has one, and only one, public WAN address assigned to it. What are we to do if we have several PCs on our LAN that all want to make requests to the same server on the WAN and get their respective replies? The router does this for us through a mechanism generically called Network Address Translation, or NAT for short. There are many details we won't delve into here; a good overview can be found at http://en.wikipedia.org/wiki/Network_address_translation, with some useful references. Readers who wish a more in-depth treatment might use the superb books by Comer at http://www.cs.purdue.edu/homes/dec/.
The problem NAT solves is analogous to sending mail between two apartment buildings. We know the street address where we want to send it (the IP address), and the apartment number. In the IP world, the apartment number is called the 'port'. For our PC game, the game client (what the gamer plays) needs to send requests to the game server(s), and it does so by sending requests to the IP address of the server, and including the port that request should go to on the server. The request needs to have a 'return address' so the server can reply, so the game will add the address of the game client, and the desired return port, to the request.
Now as we've said, the client is on a private address. The server can't see this or do anything with it. So the router changes the address information, replacing the private LAN address with its public WAN address, and remembers the return address port for the request. If more than one PC on the LAN makes a request to a server and specifies the same return port, the router notices this, and changes the return port along with the return IP address, keeping track of which PC corresponds to which requested return port the router sends in the request to the server. When the server replies, it uses the return address of the client, which will be the public IP (WAN) address of the gamer, and the return port, which may have been changed from the actual return port by the router.
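The 'return address' idea can be seen with two ordinary UDP sockets. The following is a minimal sketch over the loopback address (no router or NAT involved), with a hypothetical function name. In practice the operating system's network stack, rather than the game itself, stamps the source address and port on each packet, and the server simply replies to whatever it sees there.

```python
import socket


def request_reply_demo():
    """Send one UDP request over loopback and report the addresses involved."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        client.bind(("127.0.0.1", 0))  # this becomes the client's return address
        server.settimeout(2.0)
        client.settimeout(2.0)

        client.sendto(b"request", server.getsockname())

        # The server learns the return address from the packet itself.
        data, return_addr = server.recvfrom(1024)
        server.sendto(b"reply", return_addr)

        reply, reply_from = client.recvfrom(1024)
        return {
            "client": client.getsockname(),
            "server": server.getsockname(),
            "return_addr_seen_by_server": return_addr,
            "reply": reply,
            "reply_from": reply_from,
        }
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    for name, value in request_reply_demo().items():
        print(name, "=", value)
```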
When the router sees this traffic, it peeks into the packet and determines which PC belongs to the requested return port. The router changes the return port to the one originally in the PC's request, if needed, changes the return IP address to that of the correct PC, and passes the traffic onto the LAN, where the PC that made the request will receive its reply from the server.
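The bookkeeping described in the last two paragraphs can be sketched as a toy translation table. This is a greatly simplified model under my own assumptions: the NatRouter class, the 203.0.113.7 WAN address (a documentation-only address), and the port numbers are all hypothetical, and real routers also track the remote endpoint, the protocol, and timeouts.

```python
class NatRouter:
    """Toy NAT: maps (LAN address, LAN port) pairs to ports on one WAN address."""

    def __init__(self, wan_ip, first_spare_port=50000):
        self.wan_ip = wan_ip
        self.next_spare_port = first_spare_port
        self.out_map = {}  # (lan_ip, lan_port) -> wan_port
        self.in_map = {}   # wan_port -> (lan_ip, lan_port)

    def outbound(self, lan_ip, lan_port):
        """Rewrite the return address of a request leaving the LAN."""
        key = (lan_ip, lan_port)
        if key not in self.out_map:
            wan_port = lan_port  # keep the requested return port if it is free
            while wan_port in self.in_map:
                # Another PC already uses this port: substitute a spare one.
                wan_port = self.next_spare_port
                self.next_spare_port += 1
            self.out_map[key] = wan_port
            self.in_map[wan_port] = key
        return (self.wan_ip, self.out_map[key])

    def inbound(self, wan_port):
        """Return the LAN (address, port) a reply belongs to, or None to drop it."""
        return self.in_map.get(wan_port)


router = NatRouter("203.0.113.7")
first = router.outbound("192.168.1.10", 27015)
second = router.outbound("192.168.1.11", 27015)  # same return port, other PC
print(first, second)
print(router.inbound(first[1]), router.inbound(second[1]))
print(router.inbound(12345))  # no matching request was ever made
```

Both PCs asked for the same return port, so the second one is given a substitute port on the WAN side, and each reply is still steered back to the PC that made the request. Traffic arriving on a port with no matching request finds no entry in the table.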
In general, we don't want random traffic coming from the WAN into our LAN. Because the router peeks into traffic to determine if it even belongs on the LAN, random attempts to enter the LAN are thwarted. Unless the user specifically needs to have requests from the WAN enter the LAN (for a server of some sort on our LAN to reply to), this is precisely what we desire. Routers usually include some kind of 'firewall' capability, which considerably enhances the security of the client<->server interchanges, and provides even closer scrutiny of unsolicited traffic from the WAN. We will not detail firewall functionality.
What if the gamer needs to have a server on the LAN that can be accessed by others on the WAN? How might we accomplish this? This is where the feature of the router called 'port forwarding' comes into play. The user can configure their router, and set it to allow traffic from the WAN to its WAN address into the LAN. The user does this by specifying what PC is going to reply to traffic on which port(s). For example, if we wanted to run our own web server on our LAN (or game server, just change the nomenclature and numbers), it would need to get requests on port 80, the default port number for HTTP (browser) traffic. If the PC running our web server on our LAN had an address of say 192.168.1.2, we would configure the router to forward any traffic from the WAN to its WAN address with a destination port of 80 to the PC at 192.168.1.2. When the web server (or game server) replies to the request, it is sent through the router back to the WAN address of the original requester. The same kinds of manipulations to the address happen via NAT as with the game client example, just in reverse. So forwarding is for clients on the WAN to get to a server on your LAN. Pretty simple, no?
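The difference between a reply and forwarded traffic can be sketched as a simple decision the router makes for each packet arriving from the WAN. This is an illustrative model only: the route_inbound function and the two tables are hypothetical, using the web server example from above (port 80 forwarded to 192.168.1.2).

```python
def route_inbound(dest_port, nat_replies, port_forwards):
    """Decide where the router delivers a packet arriving from the WAN.

    nat_replies:   WAN port -> (lan_ip, lan_port), created by outbound requests
    port_forwards: WAN port -> (lan_ip, lan_port), configured by the user
    Returns the LAN destination, or None if the packet is dropped.
    """
    if dest_port in nat_replies:
        return nat_replies[dest_port]    # reply to a request made by a LAN PC
    if dest_port in port_forwards:
        return port_forwards[dest_port]  # unsolicited, but explicitly forwarded
    return None                          # unsolicited and not forwarded: dropped


nat_replies = {50000: ("192.168.1.10", 27015)}  # a game client's open request
port_forwards = {80: ("192.168.1.2", 80)}       # our web server on the LAN

print(route_inbound(50000, nat_replies, port_forwards))
print(route_inbound(80, nat_replies, port_forwards))
print(route_inbound(12345, nat_replies, port_forwards))
```

The forwarding table is the only path by which traffic nobody on the LAN asked for gets inside, which is why entries in it deserve to be kept to the minimum actually required.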
Now, to kill the dragon!
Modern PC games played online need the game client to make requests to the game server. The game server, and other game clients, do not make requests to the game client. There are exceptions to this, namely some peer-to-peer games, and cases where one of the clients is also running the game server on one of their PCs. Both fall into the generalized description of a server from earlier. But in general, modern games are client-server based, where the server is run by a provider on the WAN, and the gamer plays the client on the LAN. At no time do the servers try to make an 'inbound' request to the client. Hence, forwarding of any ports to play the game is completely unnecessary, and accomplishes nothing. Forwarding ports when not explicitly required poses a security risk to the user, and can in fact interfere with proper traffic flow for games.
The game's client makes the requests, the router handles the manipulation and shepherding of the traffic to the server on the WAN and the corresponding reply traffic from the server on the WAN to the game client on the LAN. Not the other way around!
Unfortunately, 'You need to forward your ports!' is one tough dragon to slay, and this myth is constantly perpetuated in forums, and even occasionally by misinformed game publisher support staff. There are even whole web sites devoted to the subject, with applications to automate this unnecessary and potentially security compromising router feature for the uninformed user.
Unless you are instructed that your game requires ports to be forwarded from an authoritative source (the game manual, the game developers, or in some cases the publisher with the caveat noted earlier), you are likely not required to do it. Abandon all hope ye that consider enthusiast forums to be an authoritative source!
To humanize it, think of it this way: You, in your household, act as the 'router' and 'firewall' in a way for traffic in and out of your house (your LAN). You, and others in the house are free to go out from the house to seek information (onto the WAN). When someone comes knocking at the door with the answer, you can peek through the peephole on your door and decide if you expected them, and let them into your house. If a stranger comes knocking, you're likely to decide they're uninvited, and not let them in. Port forwarding is giving a stranger the key to your door. In fact, it's giving the key to your door to everyone in the world that knows how to get to your door! The 'experienced gamers' and 'net experts' that tell you there's no danger in forwarding ports when it's not explicitly required are doing just that: telling you it's OK to give the key to your front door to everyone on the planet. I'd venture most intelligent readers would never subscribe to such nonsense.
How many readers in game enthusiast forums do you think blindly forward ports to 'fix' problems? How many of those same readers will download the latest coolest 'tweak tool' for the game when offered up on the forum? How hard do you think it would be to perhaps list some real ports for the game, and throw an extra one in that the later downloaded 'tweak tool' actually listens on, allowing a remote intruder into the victim's PC to run amok? If you don't know how to verify exactly why a game should need ports forwarded, exactly which ports should be forwarded, and know exactly how to do this, you probably shouldn't. Since port forwarding, with a properly configured and behaving modem/router, is not needed by any modern PC game client, you probably shouldn't anyway.
Before ending and in all fairness, it should be noted that some misbehaving or otherwise buggy routers can be 'worked around' by forwarding ports where this would not normally be required. Part of this 'You need to forward your ports!' malarkey is undoubtedly from uninformed users seeing this 'fix' an issue, not understanding that the problem is in fact elsewhere and that the 'fix' is a bandage that may cause other problems and security issues. Used properly, this can allow routers that restrict the user to NAT other than Type 1/Cone to mimic a properly behaving full-cone router for a game. This will of course be limited to only one PC on the LAN side and will not allow multiple players to simultaneously play from the LAN if this work-around is needed. See Troubleshooting Multi-Player PC Game Connectivity Issues for examples of this.
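Why a more restrictive NAT can break one game and not another can be sketched with the commonly used 'cone' terminology. The model below is a simplification under my own assumptions (the inbound_allowed function, the addresses, and the ports are hypothetical, and real routers do not always fit these categories neatly). It assumes the LAN client has already made an outbound request, so a mapping exists, and asks only which remote senders may use that mapping to reach the client.

```python
def inbound_allowed(nat_type, contacted, sender):
    """Would the router pass a packet from `sender` to an existing mapping?

    contacted: set of (ip, port) remote endpoints the LAN client has sent to
    sender:    (ip, port) of the packet arriving from the WAN
    """
    if nat_type == "full_cone":
        return True  # anyone may use the mapping once it exists
    if nat_type == "restricted_cone":
        return sender[0] in {ip for ip, _port in contacted}  # known IP only
    if nat_type == "port_restricted_cone":
        return sender in contacted  # known IP and port only
    raise ValueError("unknown NAT type: %r" % (nat_type,))


# The client has only talked to a (hypothetical) master server so far.
contacted = {("198.51.100.1", 18000)}
game_server = ("198.51.100.50", 19567)  # a different host tries to reach us

for nat_type in ("full_cone", "restricted_cone", "port_restricted_cone"):
    print(nat_type, inbound_allowed(nat_type, contacted, game_server))
```

A game whose traffic only ever comes back from the endpoint the client contacted works on all three. A game that expects a second host to reach the client through the same mapping works only behind the full-cone router, which is the situation the port forwarding work-around papers over.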
I hope after reading this, the reader has a clarified understanding of what port forwarding is, and when its use is appropriate.
Wednesday, March 17, 2010
Troubleshooting Multi-Player PC Game Connectivity Issues
The suggestions are presented in a form generalized enough to prevent the document from becoming a book (and it's already plenty long), but not so much so that they become useless.
Step-by-step details for low-level changes would have been impractical to provide, both for reasons of document length and because the steps vary wildly between differing equipment. Using these suggestions and the details provided, along with your hardware and software documentation, resources on the web, and Google, you should be able to resolve your connection problems.
Regardless, do note that this is not for the faint of heart: there is a lot of ground to cover and many subtleties regarding PC game connectivity and issues involved in troubleshooting problems.
You can view the current incarnation of this document at:
Please leave any comments and suggestions here, or via e-mail as shown in the document.