Stress test

Hello,
I have done a rough stress test. I have 1000 fully physical spheres, constantly moved on the server via forces. One GameLogic worker handles them all, since a multi-worker setup is not working correctly for me yet. The spheres are in local groups of various sizes, with a maximum of 500 units. The player sees only the nearest group, so you can check how different crowd sizes perform. Each sphere sends transform sync updates at 30 Hz max. The client interpolates these updates.

Testing various numbers, I found that 150 spheres visible to the player are fairly stable. With 250, movement is laggy and the player gets disconnected every time because it “fails to respond to enough pings”. Something is overloaded here, but I am unsure what exactly. All metrics seem to be within range for the template. Maybe it is related to the player heartbeat code in the FPS sample project.


sphere groups in the world


moving spheres, as observed by client

So I am wondering how exactly to detect overload: which value should I watch, and what is the critical range? Also, the “failing to respond to pings” disconnect reason seems strange, because the player receives updates from the server normally and then just drops. Maybe the signal from the player to the server is not getting through because of network overload?

Also, should a multi-worker setup work with the FPS starter project? I tried it with 4 workers (2 x 2 grid), but the player does not see the spheres at all in the multi-worker setup. Only some spheres near the spawn point are visible, and they lag badly; the groups are not visible.
The project is on “beta_don_glucose_565”.

Hi @stalker23b,

I think you may be hitting the limits of this deployment. See the pricing page for details. For the template you’re using (w4_r1000_e10), the “Max op units per second per worker connection” is 6,000. If you have 250 entities sending transform and position updates at 30 Hz, that amounts to (250 transforms + 250 positions) * 30 Hz = 15,000 ops per second, which is well over that limit.
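As a rough sketch of that arithmetic (assuming each transform or position update counts as exactly one op unit, which may not match how the runtime actually meters ops; the function names are made up for illustration):

```python
# Back-of-the-envelope op budget check.
# Assumption: one component update == one op unit.

def ops_per_second(entities: int, updates_per_entity: int, rate_hz: int) -> int:
    """Total component updates sent per second."""
    return entities * updates_per_entity * rate_hz


def max_entities(limit_ops: int, updates_per_entity: int, rate_hz: int) -> int:
    """Largest entity count that stays within the per-connection limit."""
    return limit_ops // (updates_per_entity * rate_hz)


PER_CONNECTION_LIMIT = 6_000  # w4_r1000_e10, per the pricing page

# 250 entities, 2 updates each (transform + position), 30 Hz
print(ops_per_second(250, 2, 30))                # 15000
print(max_entities(PER_CONNECTION_LIMIT, 2, 30)) # 100
```

Under those assumptions the budget is already used up at around 100 visible entities, so lowering the send rate or dropping one of the two updates per tick would be the first things to try.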

Now I am getting disconnected at 150 objects, but I still can’t see the exact reason in the metrics. Where should I look?

Here are some metrics:

There is a constant 35-40k ops per second from the GameLogic worker handling all the spheres on the map. Are these updates to the runtime? Are they subject to the 6k ops per worker connection limit?

If I understand correctly, I should have limits of 75k ops total, 6k ops per connection, and 10 GB egress. Which one is being violated?

Hey @stalker23b,

I believe the problem is that, with 35-40k ops per second all coming from that one GameLogic worker, you are over the 6k ops per worker connection limit there. I’m not sure how the limit is enforced, but from what I can tell that is the problem you are facing.

As for why your client seems to be getting updates normally while still getting disconnected, there are a few possibilities. One thing to consider is that not all ops are equal, so perhaps the client heartbeat is failing before transform updates fail when the server load is too high.

Edit: As for other metrics to look at, it’s also probably worth looking at CPU usage (for your GameLogic worker) and runtime-to-worker latency (SpatialOS Metrics). If CPU usage is averaging over 80%, you could start to see logic problems on the server. If the latency has spiked, it means your workers are falling far behind the ops the runtime is trying to send, which is also a bad sign. I like to use the 99th quantile latency metric since it will catch cases where just one kind of op is consistently being slowed down.
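If you export those metrics and want to check them offline, something like the following could work. This is only a sketch: the 80% CPU threshold is the rule of thumb above, the 100 ms latency threshold is an arbitrary placeholder rather than a documented limit, and the function names are made up.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile; q is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def overload_warnings(cpu_percent, latency_ms,
                      cpu_limit=80.0, p99_limit_ms=100.0):
    """Return human-readable warnings for the two signals discussed above.

    p99_limit_ms is a placeholder; pick a value that suits your game.
    """
    warnings = []
    avg_cpu = sum(cpu_percent) / len(cpu_percent)
    if avg_cpu > cpu_limit:
        warnings.append(f"average CPU {avg_cpu:.1f}% exceeds {cpu_limit}%")
    p99 = percentile(latency_ms, 99)
    if p99 > p99_limit_ms:
        warnings.append(f"p99 latency {p99} ms exceeds {p99_limit_ms} ms")
    return warnings


# Mostly fast ops, but 2% are very slow: the mean hides it, p99 does not.
print(overload_warnings([90, 85], [10] * 98 + [500] * 2))
```

The point of using the 99th quantile instead of the mean is visible in the example: the average latency is under 20 ms, yet the slow tail still trips the check.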

So, I tried to use multiple workers to balance the load between them, with a 3x3 grid. It seems to work OK in the inspector.
But there are errors spamming the logs, and the player can see entities only from one worker, the one closest to the spawn point. Also, when the client moves into another worker’s authority zone, the player’s entity cannot move anymore. The client can still move around in the 3D scene and see other entities’ updates, but the player’s entity in the inspector is frozen in place.

This is the error spamming the logs (worker 8 is the one closest to the player spawn):

[Worker: UnityGameLogic8] InvalidOperationException: Not a valid subscription
Improbable.Gdk.Subscriptions.RequiredSubscriptionsInjector+Handler.OnUnavailable () (at <646a4b04ee9847119123ca06a7010b70>:0)
Improbable.Gdk.Subscriptions.SubscriptionAggregate.HandleSubscriptionUnavailable () (at <646a4b04ee9847119123ca06a7010b70>:0)
Improbable.Gdk.Subscriptions.SubscriptionAggregate+Handler.OnUnavailable () (at <646a4b04ee9847119123ca06a7010b70>:0)
Improbable.Gdk.Subscriptions.Subscription1[T].SetUnavailable () (at <646a4b04ee9847119123ca06a7010b70>:0)
Improbable.Gdk.Subscriptions.EntitySubscriptionManager.<.ctor>b__2_1 (Improbable.Gdk.Core.EntityId entityId) (at <646a4b04ee9847119123ca06a7010b70>:0)
Improbable.Gdk.Subscriptions.Callbacks1+WrappedCallback[T].Invoke (T arg) (at <646a4b04ee9847119123ca06a7010b70>:0)
Improbable.Gdk.Subscriptions.Callbacks1[T].InvokeAllReverse (T op) (at <646a4b04ee9847119123ca06a7010b70>:0)
UnityEngine.Debug:LogException(Exception)
Improbable.Gdk.Subscriptions.Callbacks1:InvokeAllReverse(EntityId)
Improbable.Gdk.Subscriptions.EntityRemovedCallbackManager:InvokeCallbacks()
Improbable.Gdk.Subscriptions.ComponentConstraintsCallbackSystem:Invoke()
Improbable.Gdk.Subscriptions.RequireLifecycleSystem:OnUpdate()
Unity.Entities.ComponentSystem:InternalUpdate()
Unity.Entities.ScriptBehaviourManager:Update()
Unity.Entities.DummyDelagateWrapper:TriggerUpdate()
-[WorkerLogger:UnityGameLogic8]

So, is FPSSample intended to work in a multi-worker setup? Are there steps to make it multi-worker compatible?