Extreme lag

Elsiekelsie · November 11, 2018, 3:57pm

Anyone else having insane amounts of lag and disconnects today?
I’ve hardly been able to play all day.

HiggsFoton · November 11, 2018, 3:58pm

Yeah, I needed to restart the game three times today. It’s usually around places with portals, I noticed.

Krollbar · November 11, 2018, 6:29pm

Which region do you guys live in, and which portals are you using? The main connection issues are USA–>Europe and Australia–> anywhere.

Krollbar · November 11, 2018, 6:30pm

Europe and Japan–>anywhere tends to be fine

Honourless · November 11, 2018, 6:58pm

I just got disconnected on Alnitans with “character still connected” message. Hope it fixes itself soon.
/It did fix.

Elsiekelsie · November 11, 2018, 7:33pm

I live in the UK. It was specifically worse around portals, although I was also struggling at my home in Xa Frant.
Seems to have calmed down mostly now, with only a spike every 10-15 mins.

Manabanana · November 11, 2018, 9:12pm

The problem is the uplink from the Boundless EU servers (EU->USA). It’s getting throttled somewhere. I was able to solve about 80% of my problems by changing the chunk download rate to 4mbit.

No idea why this is a problem. I regularly work remotely with clients in EU with >250mbit down/up, 0 packet loss, and about 150-200ms latency.

It also sucks for me cause I have 0 problems with a higher download rate on every single other boundless server as well. Ideally there would be a separate chunk download configuration for EU, USA, and AUS servers so we could set them accordingly.

I’m suspecting this is going un-detected on their end because they are looking at useless counts like packet drops and FIN,ACKs, as opposed to combing through the packet captures and looking at the window sizes.

Kamenik · November 12, 2018, 7:45am

Same here. My connection is OK, but in this game I get extreme lag spikes, FPS lag spikes, mirrors again, and CE errors on PS4 again.

lucadeltodecso · November 12, 2018, 7:59am

The clients don’t download chunks directly from the game servers, but from a CDN which on cache misses will then itself download from the game servers, so chunk downloads should not have an effect on the specific game server connection. I’m not an expert in this kind of networking (or any in particular) but it sounds to me like if throttling your chunk download rate improves things, then your ISP may be the one throttling your connection to the gameserver whenever you happen to also be downloading a lot of chunk data quickly??

Manabanana · November 12, 2018, 2:08pm

@lucadeltodecso

Hi Luca,

I’ve noticed a few behaviors that lead me to think this isn’t the case:

I have no issues with the ec2 instances running in Virginia.
I have no issues in off-peak hours to the EU servers. Usually when I get unplayable lag, it’s when there are 30 or so people on an EU server.
It won’t go away. I’ll just keep rubber banding forever, until I have to log out or change my skills to warp to a different planet. I think this might be important because I’d assume if it was issues getting the chunk data only, I’d eventually download it all at some point.
I’ve tried different ISPs (Comcast vs Verizon), same result.
My ISP claims the opposite policy, allowing fast bursts above my cap for the first part of a connection. (I know people don’t like to trust ISPs, so there is that).

My professional life has me dealing with network protocols, and the thing I keep coming back to is that TCP over a long fat network (like the one between me and the Amazon datacenter in Germany) tends to have very sinusoidal behavior. Meaning, you can get high throughput on average, but moment to moment, the bandwidth (and RTT) can vary drastically.
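To put rough numbers on the long-fat-network point, here is a minimal sketch of the bandwidth-delay product. The 250 Mbit/s and 170 ms figures are picked from the ranges mentioned in this thread for illustration only, they are not measurements, and the function names are made up for this example.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight (unacknowledged) to keep the path full."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# Illustrative figures only (assumed, not measured).
bdp = bandwidth_delay_product_bytes(250e6, 0.170)
print(f"In-flight data needed to fill the path: {bdp / 1e6:.2f} MB")

# A 64 KiB window (no window scaling) caps throughput far below the link rate.
capped = window_limited_throughput_bits_per_s(65535, 0.170)
print(f"Throughput with a 65535-byte window: {capped / 1e6:.2f} Mbit/s")
```

The larger the window needed to fill the path, the more a single loss-triggered window reduction shows up as a visible swing in throughput, which is one way to read the "sinusoidal" behavior described above.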

I fired up Wireshark this morning (no lag right now), and it looks mostly clean. I’ll take a capture if I can reproduce the problem. I have a few questions:

What size are the ec2 instances? Amazon can throttle as low as 250mbit per server depending on size.
What behavior causes the client to report : Unplayable connection?
What would cause me (my client) to rubber band?
I see UDP going to the server and TCP via Websockets coming back. I’m guessing part of this process is my client requesting chunk data and then getting it back. What is being used to throttle the data coming to me? (I know TCP is receiver paced, but what throttles the “chunk download” to 8mbit?) Does it depend on the client request rate?
Is there a TCP application level heartbeat with location coordinates coming from the server that is getting delayed due to the slow connection, thus causing my client to snap my character all over the place?
Are you seeing any UDP packet drops on some servers? I wonder in particular if you see any on 52.59.222.228 (Gloviathosa). (AWS instances are horrible for UDP traffic in my humble opinion)

Next time I see the issue I’ll also setup a proxy in Virginia, since I know I have no problems there. It should be Amazon backbone all the way from Virginia to Germany. It would be interesting to see if there is still a problem.

Apologies for all the questions. The packets are encrypted, otherwise I would probably be able to work much of it out myself.

lucadeltodecso · November 13, 2018, 12:19am

you sound more knowledgeable than me on networking but the chunks definitely do not come via websocket, they are independent http requests, and the throttling is roughly based on the client request rate yes.
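The thread does not say how the client paces its chunk requests beyond "roughly based on the client request rate". One common way to cap a download rate from the requesting side is a token bucket, sketched below. This is an assumption for illustration, not the game's actual code, and `ChunkRequestPacer` and its parameters are hypothetical names.

```python
class ChunkRequestPacer:
    """Token bucket: only issue a chunk request if the rate budget allows it.

    Hypothetical sketch; the real client's throttling is not documented here.
    """

    def __init__(self, rate_bits_per_s: float, burst_bits: float):
        self.rate = rate_bits_per_s    # long-term cap, e.g. 4 Mbit/s
        self.capacity = burst_bits     # how much may be requested in one burst
        self.tokens = burst_bits       # start with a full bucket
        self.last_s = 0.0              # time of the previous call

    def try_request(self, chunk_size_bits: float, now_s: float) -> bool:
        # Refill the bucket for the time elapsed, never above capacity.
        elapsed = now_s - self.last_s
        self.last_s = now_s
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if chunk_size_bits <= self.tokens:
            self.tokens -= chunk_size_bits
            return True   # budget available: send the HTTP request now
        return False      # over budget: caller should retry later


pacer = ChunkRequestPacer(rate_bits_per_s=4_000_000, burst_bits=1_000_000)
print(pacer.try_request(800_000, now_s=0.0))  # bucket starts full
print(pacer.try_request(800_000, now_s=0.1))  # too soon, budget not refilled
print(pacer.try_request(800_000, now_s=0.2))  # enough budget again
```

With this kind of scheme, lowering the configured rate simply spaces the requests further apart; the server side never needs to know about the cap.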

the rubberbanding is when your inputs to the server are just not arriving quickly enough and the server has no choice but to keep moving you based on last inputs (or if input-protection is turned on in the networking options, it will reset all your inputs whilst it hears nothing from you). The client can’t know about that and keeps predicting ahead until it gets physics state indicating that it mispredicted a second earlier and prediction has to rewrite its history based on the corrected past-state (and if you are really struggling to get inputs to the server that could happen multiple times per second)
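The prediction and correction loop described above can be sketched in one dimension. This is a simplified illustration under stated assumptions (a single position value, inputs as velocities, made-up names such as `PredictingClient`), not the game's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PredictingClient:
    position: float = 0.0
    next_seq: int = 0
    # Inputs the server has not yet confirmed: (sequence number, displacement).
    pending: list = field(default_factory=list)

    def apply_input(self, velocity: float, dt: float = 1.0) -> None:
        """Predict locally right away and remember the input for later replay."""
        displacement = velocity * dt
        self.pending.append((self.next_seq, displacement))
        self.next_seq += 1
        self.position += displacement

    def on_server_state(self, server_position: float, last_seq_processed: int) -> None:
        """Rewind to the authoritative state, then replay unconfirmed inputs."""
        self.pending = [(s, d) for s, d in self.pending if s > last_seq_processed]
        self.position = server_position + sum(d for _, d in self.pending)


client = PredictingClient()
for _ in range(3):
    client.apply_input(velocity=1.0)
print(client.position)            # 3.0, predicted ahead of the server

client.on_server_state(1.0, 0)    # server agrees with the prediction for input 0
print(client.position)            # 3.0, no visible correction

client.on_server_state(0.5, 0)    # server disagrees (e.g. inputs arrived late)
print(client.position)            # 2.5, the player visibly snaps back
```

When inputs reach the server late, the authoritative state keeps disagreeing with what the client predicted, so the replay step keeps moving the player, which is the rubber banding seen on screen.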

the inputs are sent over UDP (and it keeps sending the last N frames of input up until the last acknowledgement from the server to handle dropped packets) but if it detects that the UDP is falling over (has dropped packets, or the rough latency is higher than the measured websocket latency for whatever reason) it falls back to sending the inputs over websocket in parallel until it sees the UDP stabilise enough to only send over UDP again (I’ve never actually seen this happen though, the only times the UDP has practically speaking ever been dropped and started sending on websocket is when latency on both the websocket and udp is far too high for either to be good enough)
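The "keep sending the last N frames until acknowledged" idea can be sketched as follows. This is a minimal illustration, assuming each packet simply carries every unacknowledged frame up to a fixed limit; `InputSender` and its parameters are hypothetical, and the websocket fallback is left out.

```python
from collections import deque


class InputSender:
    """Builds input packets that repeat recent unacknowledged frames."""

    def __init__(self, max_frames_per_packet: int = 8):
        # Oldest frames fall off automatically once the limit is reached.
        self.unacked = deque(maxlen=max_frames_per_packet)
        self.next_frame = 0

    def build_packet(self, input_state: str) -> list:
        """Return the payload for the next UDP packet: new frame plus unacked ones."""
        self.unacked.append((self.next_frame, input_state))
        self.next_frame += 1
        return list(self.unacked)

    def on_ack(self, acked_frame: int) -> None:
        """Server confirmed everything up to acked_frame; stop resending it."""
        while self.unacked and self.unacked[0][0] <= acked_frame:
            self.unacked.popleft()


sender = InputSender(max_frames_per_packet=3)
p0 = sender.build_packet("forward")   # carries frame 0
p1 = sender.build_packet("jump")      # carries frames 0 and 1
sender.on_ack(0)                      # server has seen frame 0
p2 = sender.build_packet("left")      # carries frames 1 and 2
print(p2)
```

If the packet `p1` is dropped in transit, frame 1 still reaches the server inside `p2`, so a single lost datagram costs no retransmission round trip.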

the “unplayable connection” is based on the websocket latency (there are frequent ping/pongs to keep measuring the latency and keep the game clocks synchronised with the server too), there are some heuristics in it, but largely it’s just if it sees the latency ever spike above 450ms, noting that the latency measure includes overheads in the client/server too rather than trying to be a pure network latency (so may be say, 30ms more than the true one-way latency)
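A minimal sketch of that threshold check is below. Only the 450 ms figure comes from the post above; the sliding window, the class name `LatencyMonitor` and the omission of the other heuristics are assumptions made for illustration.

```python
from collections import deque

UNPLAYABLE_THRESHOLD_MS = 450.0  # figure quoted in the post above


class LatencyMonitor:
    """Flags the connection when any recent latency sample exceeds the threshold."""

    def __init__(self, threshold_ms: float = UNPLAYABLE_THRESHOLD_MS, window: int = 10):
        self.threshold_ms = threshold_ms
        # Assumed: only the most recent `window` ping/pong samples are considered.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms: float) -> None:
        """Store one latency sample from a ping/pong exchange."""
        self.samples.append(latency_ms)

    @property
    def unplayable(self) -> bool:
        return any(s > self.threshold_ms for s in self.samples)


monitor = LatencyMonitor(window=3)
for sample in (168.0, 170.0):
    monitor.record(sample)
print(monitor.unplayable)   # False: both samples are under the threshold

monitor.record(510.0)       # a single spike
print(monitor.unplayable)   # True until the spike leaves the window
```

A check like this reacts to a single outlier, so one delayed pong is enough to trigger the warning even when the average latency is well under the limit.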

I believe we are running on EC2 C5 instances, I don’t know which model.

Manabanana · November 13, 2018, 11:51pm

Unplayable lag on all the EU servers today at approximately 10:15 PM UTC.

I took a packet capture today… here is an excerpt of what I’m seeing:

There seems to be a whole lot of individual requests, maybe those are the HTTP requests you were talking about.

I believe the constant connection I see would be the websocket then (I’m not able to dissect the packets because of encryption and I don’t have the tools to capture the secrets). On this connection I am seeing packet loss with the server.

In this case I believe it’s Gloviathosa that is trying to send me some data. The flow is from Glovia to my PC.

I see out of order segments, which cause me to send duplicate acks to enable TCP fast retransmissions. This means a packet was lost in transit from Gloviathosa to me. I do eventually receive the missing packet(s) (in many cases as fast as possible… the very next packet). The missing packets are marked as retransmissions meaning packets are not getting delayed. They are getting dropped.

This is causing my RTT to spike though, which is especially bad because the server is located far away in Germany (I’m in the states).

So my RTT is on average very low, ~168ms (fast as heck for Europe). Anyway, it looks like 1 packet drop is causing a delay of up to 510 ms. I guess that’s not a great latency, but honestly… it seems the game network code should be able to handle that. It does explain why the “unplayable connection” screen is showing.

Does something special happen when it’s deemed unplayable? What I’m observing is the game is warping me back to a previous position (maybe due to some code related to “unplayable connection” it’s probably the last acknowledged position from the server). And then I’m warped forward to where the server “thinks I am” (which is where my client was before).

Honestly if #3 (the client moving me backwards) never happened, the problem would be resolved. My connection is otherwise fine. Dropping 1 packet in TCP is a pretty normal situation. It’s the signal TCP uses for congestion control.

Another option might be, don’t let the client snap the position around during “unplayable connection”. Even if my player was forced to stand still in that scenario it would feel much better than warping to and fro.

Manabanana · November 14, 2018, 12:15am

@lucadeltodecso As a follow up I’ll take some captures of what a normal play session looks like on a US server see what that looks like.

lucadeltodecso · November 14, 2018, 12:20am

“Another option might be, don’t let the client snap the position around during “unplayable connection”. Even if my player was forced to stand still in that scenario it would feel much better than warping to and fro.”

there is an option in the game settings under network for just that (pause inputs something something)

Manabanana · November 14, 2018, 12:26am

Interesting. I can’t do any tests right now because I can’t even connect to the server. The USA servers are perfect though.

I pinged the server and got a response. 167 ms RTT with 75% packet loss.

I should also mention that ~85% of the time I have no issues even on the EU servers. This is an exceptional event but it does seem to coincide with a lot of other folks having connection issues as well (judging from the forum).

Shadesmar · November 14, 2018, 12:30am

it’s because Steam is down right now

Manabanana · November 14, 2018, 12:30am

I just logged into Gyosha Ophin just fine. Is EU steam down?

Shadesmar · November 14, 2018, 12:32am

was for maintenance… it’s up again now

Manabanana · November 14, 2018, 12:35am

Not sure that’s related. I’ve been able to log in and log out of Boundless just fine several times in fact over the last hour, since I’m on the US server side. Glovia still “down” from my side:

Also saw the messages “Too many active connections”, “Timed out…” and “Game server down”.

lucadeltodecso · November 14, 2018, 12:38am

meanwhile pinging the same server here from Japan I get a nice stable 240ms and 0 packet loss over a few minutes, can also connect to the server fine in-game.