Pro-tips on Commands

bestpractice
v9-0-0

#1

I would like to share some pitfalls that you may encounter when working with commands.

It can happen that your commands experience what is called a ‘local time-out’ or that you receive a message ‘DEADLINE_EXCEEDED’. As far as I have experienced, this can happen in the following situations:

  1. The worker containing the CommandReceiver that listens for this command is not yet ready. The solution to this is to try again. I recommend adding a retry facility to commands that are vital to your gameplay.
  2. If your CommandReceiver is on a MonoBehaviour: the MonoBehaviour may be disabled because the ACL prohibits that worker from handling the MonoBehaviour. This will cause your application to have no handlers registered for your command, causing the sender to wait until timeout for a reply.
  3. The method that handles the command does not call the ‘Respond’ method of the ResponseHandle object that is passed to your Command Receiver. Your sender will wait until it actively receives a response; if it doesn’t receive one, the handler’s actions may still have been executed, but the sender will time out because no response was given.

To illustrate the above, allow me to demonstrate it with a bit of example code:

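// Handles the ChangeName command for the Player component.
public class PlayerInfo : MonoBehaviour {
	// Requires write access: this MonoBehaviour is only enabled on the worker that may write to Player.
	[Require] private Player.Writer playerWriter;

	void OnEnable() {
		// Without this registration no handler exists and the sender times out.
		playerWriter.CommandReceiver.OnChangeName += OnChangeName;
	}

	void OnDisable() {
		playerWriter.CommandReceiver.OnChangeName -= OnChangeName;
	}

	void OnChangeName (ResponseHandle<Player.Commands.ChangeName, ChangeNameRequest, EmptyResponse> handler)
	{
		playerWriter.Send(new Player.Update().SetName(handler.Request.name));

		// Always respond; otherwise the sender waits until it times out.
		handler.Respond(new EmptyResponse());
	}
}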

The above MonoBehaviour is responsible for changing the name of my Player. If I were to omit the Respond method call, or omit the CommandReceiver registration in OnEnable, then the sender would be confronted with an error saying ‘local timeout’.

Also: if there is no worker that has write permission for the Player component then this MonoBehaviour will be disabled and as such OnEnable is never invoked; meaning no one will hear you change that name :wink:

P.S. Although I haven’t encountered this, I can imagine that a worker migration can cause a listener to be temporarily unavailable and a command to fail with a timeout. As such: ensure that vital commands are always checked and sent again at a later stage.
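A retry facility like the one recommended above could be sketched roughly as follows. This is only an illustration: RetryingCommand and its send delegate are hypothetical names and not part of the SpatialOS SDK. The delegate stands in for whatever code sends your command once and reports success or failure from the response callback. Because a timed-out command may still have been executed (see situation 3), this is only safe for commands that tolerate being applied more than once.

```csharp
using System;

// Hypothetical helper, NOT part of the SpatialOS SDK.
// 'send' is assumed to send the command once and invoke the supplied
// callback with true on success or false on failure (including timeouts).
public sealed class RetryingCommand
{
    private readonly Action<Action<bool>> send;
    private readonly int maxAttempts;

    public int Attempts { get; private set; }
    public bool Succeeded { get; private set; }

    public RetryingCommand(Action<Action<bool>> send, int maxAttempts)
    {
        this.send = send;
        this.maxAttempts = maxAttempts;
    }

    // Sends the command and keeps re-sending on failure until the limit is hit.
    public void Start()
    {
        Attempts++;
        send(OnResult);
    }

    private void OnResult(bool success)
    {
        if (success)
        {
            Succeeded = true;
            return;
        }
        if (Attempts < maxAttempts)
        {
            Start(); // a real game would probably wait a moment before retrying
        }
    }
}
```

You would then wrap the actual send call in the delegate, for example new RetryingCommand(done => SendChangeName(done), 3).Start(), where SendChangeName is again a placeholder for your own sending code.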


#2

Great post as always @draconigra
I’ll point @beth at this and see if we can get some of this written up in the docs.
Great stuff!
Cal


#3

Hi @draconigra,

1. The worker containing the CommandReceiver that listens for this command is not yet ready. The solution to this is to try again. I recommend adding a retry facility to commands that are vital to your gameplay.

How can the sender know that there is no command receiver?

From https://spatialos.improbable.io/docs/reference/9.0-alpha/workers/unity/commands:

If you try to invoke a command on an entity that does have the component that specifies this command, but you haven’t implemented any handlers to respond to incoming requests, then you will simply get back an error in your callback.

But how can you be sure because of:

If you receive StatusCode == StatusCode.Failure, then the command may or may not have been successful.

Is there a different error code received?


2. If your CommandReceiver is on a MonoBehaviour: the MonoBehaviour may be disabled because the ACL prohibits that worker from handling the MonoBehaviour. This will cause your application to have no handlers registered for your command, causing the sender to wait until timeout for a reply.

But the docs say:

If you try to invoke a command on an entity that does have the component that specifies this command, but you haven’t implemented any handlers to respond to incoming requests, then you will simply get back an error in your callback.

Shouldn’t that return an error message instead of running into a timeout?


3. The method that handles the command does not call the ‘Respond’ method of the ResponseHandle object that is passed to your Command Receiver. Your sender will wait until it actively receives a response; if it doesn’t receive one, the handler’s actions may still have been executed, but the sender will time out because no response was given.

I had some command receivers which didn’t respond (I just forgot the Respond call) but I haven’t observed this. Is this logged as an error?


I must state that I haven’t tried cases 1 and 2 myself yet.


#4

From the perspective of commands there is no difference between an error and a timeout; both are errors. So the status is FAILED, and the ErrorMessage of the response will mention that there was a timeout.

I think it is a best practice to make your commands idempotent so that you can retry them without side effects. This means that a command that is supposed to perform an action may be spammed but will still perform the action once.

In the case of the CreateEntity command you could attempt to retrieve the entities using a Query and see if the one that you just wanted to make is there or not.

I had some command receivers which didn’t respond (I just forgot the Respond call) but I haven’t observed this. Is this logged as an error?

The Sender will receive a FAILED statuscode and the error message will contain the message that a local timeout has occurred.


#5

Do you have an example of how you do this? In my opinion one can’t make commands idempotent in cases like TakeDamage(int amount), as this can’t be applied multiple times. And sending the resulting health for the target object would cause other problems.

One option would be that every command call gets a sequential number per sender so that the receiver can check whether it has already processed this call. In this case the receiver has to hold a list of senders and the last x sequential numbers used by each.

How does Improbable handle this?
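A minimal sketch of that receiver-side check could look like this. DuplicateFilter, senderId and the sequence number are hypothetical: it assumes the sender puts its own identifier and a counter into the command request, since nothing in this thread confirms that the SDK exposes the caller. The filter lives in memory on the receiving worker, so it would not survive an authority change unless the same data were stored in component state.

```csharp
using System.Collections.Generic;

// Hypothetical receiver-side filter, NOT part of the SpatialOS SDK.
// Remembers the last 'window' sequence numbers seen per sender.
public sealed class DuplicateFilter
{
    private readonly int window;
    private readonly Dictionary<string, Queue<long>> seen =
        new Dictionary<string, Queue<long>>();

    public DuplicateFilter(int window)
    {
        this.window = window;
    }

    // Returns true if this (sender, sequence) pair has not been seen recently
    // and should be processed; returns false for a remembered duplicate.
    public bool TryAccept(string senderId, long sequence)
    {
        Queue<long> recent;
        if (!seen.TryGetValue(senderId, out recent))
        {
            recent = new Queue<long>();
            seen[senderId] = recent;
        }
        if (recent.Contains(sequence))
        {
            return false;
        }
        recent.Enqueue(sequence);
        if (recent.Count > window)
        {
            recent.Dequeue(); // forget the oldest number for this sender
        }
        return true;
    }
}
```

The command handler would call TryAccept before applying the action and respond in both cases, so that a retried command is acknowledged without being applied twice. Duplicates that arrive after more than x newer calls from the same sender are no longer detected.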


#6

Do you have an example of how you do this? In my opinion one can’t make commands idempotent in cases like TakeDamage(int amount), as this can’t be applied multiple times. And sending the resulting health for the target object would cause other problems.

For one: I try not to create commands that send values from a client to a worker. If I do create such a command then that is by accident :wink: This is to prevent cheaters from gaining tools to cheat with; an example here would be that a cheater would be able to intercept the TakeDamage call using a proxy and change the amounts (this can be done using proxies and SSL doesn’t completely protect against it).

Do you have an example of how you do this?

Examples are highly case dependent; let’s take your TakeDamage as an example. This can be solved in various ways that are still resistant to cheating. One example could be that you have the following command:

Damage(int expectedHealth)

With this command, you will need to know the health of the entity that you want to damage. As long as the receiver verifies that the entity’s current health equals the value provided, you can send this command any number of times, since it is guaranteed to take effect at most once.

A downside to this method is that if another player damages the given entity in the meantime, your Damage command will no longer be valid and thus won’t do anything.
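The check on the receiving side could be sketched as below. The Health class and damagePerHit are assumptions for illustration only; the point is that the damage amount is decided by the receiver, and the request merely carries the health value the sender last saw.

```csharp
using System;

// Hypothetical receiver-side state, NOT part of the SpatialOS SDK.
public sealed class Health
{
    private readonly int damagePerHit; // decided by the receiver, never by the sender

    public int Current { get; private set; }

    public Health(int initial, int damagePerHit)
    {
        Current = initial;
        this.damagePerHit = damagePerHit;
    }

    // Applies damage only if the sender's view of the health is still accurate.
    // A repeated command carries a stale expectedHealth and is ignored.
    public bool Damage(int expectedHealth)
    {
        if (Current != expectedHealth)
        {
            return false;
        }
        Current = Math.Max(0, Current - damagePerHit);
        return true;
    }
}
```

With new Health(100, 10), the first Damage(100) lowers the health to 90; sending Damage(100) again changes nothing because the health no longer matches.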

Let’s try to think of another approach:

What you could indeed do is provide a UUID with a command and register the UUID to verify that your command has arrived and been processed. This will however increase traffic so I can recommend keeping the amount of verified traffic rather low.

Another approach

Another way of approaching your example is by changing how we think about the problem. You can also model this behaviour without commands by making the decision whether an entity takes damage entirely in the FSim, in OnCollider for example, and writing to the state directly. State writes are guaranteed IIRC and as such you don’t have to make your actions idempotent.

This all has to do with commands acting similarly to queued messages; there is no guarantee that a command arrives exactly once. A command may arrive zero times or even multiple times (the latter is more or less assumed based on how distributed systems work).


#7

For one: I try not to create commands that send values from a client to a worker.

Yep, that would be game design for cheaters :slight_smile:

With this command, you will need to know the health of the entity that you want to damage.
A downside to this method is that if another player damages the given entity in the meantime, your Damage command will no longer be valid and thus won’t do anything.

Yep

What you could indeed do is provide a UUID with a command and register the UUID to verify that your command has arrived and been processed. This will however increase traffic so I can recommend keeping the amount of verified traffic rather low.

Yep

Another way of approaching your example is by changing how we think about the problem. You can also model this behaviour without commands by making the decision whether an entity takes damage entirely in the FSim, in OnCollider for example, and writing to the state directly. State writes are guaranteed IIRC and as such you don’t have to make your actions idempotent.

First thought: I don’t agree, but this needs deeper discussion:

A SpatialOS simulation has more than one FSim, and you never know whether the target entity that collides or is discovered by a raycast is the authoritative version, as an entity can be checked out on multiple FSims. OK, one way to make sure the authoritative version is updated is to process the interaction (collision/raycast) on all FSims. In my implementation I do raycasts only on the authoritative FSim and therefore use commands to notify the target that it has been hit. This behaviour can be changed easily to achieve what you describe.

Result would be

  • for raycasts: a slight increase in performance cost (n raycasts instead of 1, where n is the number of FSims on which the object is checked out)
  • for collisions this should make no difference as they are processed on all nodes (unless you disable colliders)
  • guaranteed action results

Sounds good so far.

But what happens if the authority over an object changes while you call methods on it? Will there be a small timeslot where no version is authoritative? If yes, the problem would be almost the same… no damage would be applied.

In either case… Commands or State writes used as described… maybe the error rate is so low that it is negligible? Of course depends on game mechanics…

Maybe @Improbable can tell about their experiences…


#8

The edge case you’re talking about does exist so you’re absolutely right.
One worker will always be authoritative: but if the command is in-flight when the change happens then you will essentially “drop” that command and you’ll have to retry.

We’re working really hard at the moment on the concept of “best-practise” which is essentially what you’re talking about here. I’m hoping to get some new docs and examples out which will certainly help!

This is a fantastic discussion chaps. Keep it up. I know our product team is going to love this :smiley: <3


#9

One worker will always be authoritative: but if the command is in-flight when the change happens then you will essentially “drop” that command and you’ll have to retry.

Actually, in-flight commands currently block authority changes on the recipient.


#10

Actually, in-flight commands currently block authority changes on the recipient.

Does this also apply to state changes?

Assuming there is no network or implementation error… can there then be a case where commands/state changes are not executed successfully?

Looking forward to the best practises!


#11

Does this apply to components or the whole entity?


#12

Actually, in-flight commands currently block authority changes on the recipient.

Sorry, I’ve since learned this was based on an outdated design and is actually incorrect. Authority changes can definitely fail commands. My apologies for the misinformation. We might have some ideas about some ways to mitigate this on our end in the future.


#13

Thanks for the update @stu

@callumb When can we expect the best practise guide on commands?


#14

Beth did a great job on introducing them as a concept here and we’ll be working hard to get some more examples your way soon.


#15

@callumb please add some advanced examples, e.g. handling command failures in case of authority changes. For me, using commands properly is the only uncertain topic I have with SpatialOS. Currently I’m thinking about my game architecture and what needs to be changed to address this problem after @stu brought it back up again.


#16

I’m a little surprised there isn’t a message bus to allow resilient messaging… But anyway, is there a way to tell who is calling the command? I would hope you could check if the caller is a non-unity worker for example? Or do you have to build your own authority such as discussed above?

Thanks!