Java Worker API Sample Project

java-worker-sdk

#1

After seeing a few mentions of the Java API last weekend, I decided to give it a go myself.

The GitHub repository linked below is a small sample project in which I replaced the FSim with a Java worker that manages entities on a 2D grid.

Java Worker Github

It’s nothing fancy and far from polished, but I figured I’d make a little repository out of it just in case it’s useful for anyone looking to get started using the Java SDK themselves!


#3

Hey @cascaid, glad to see you got stuck in with one of our experimental APIs :slight_smile:
We’ve been building some workers with the Java SDK too, so I wanted to add some comments/suggestions to consider:

  • Log via the connection, as well as to the provided log file, using connection.SendLogMessage. This way your logs will appear (properly categorised) in the Cloud Log Viewer - indispensable for cloud deployments!
  • Report a load metric to inform the platform of your worker’s current load. This is crucial for dynamic load balancing, but also for your own information so you have some chance of diagnosing your workers if you have issues. Check your worker’s load in the Inspector - a high load metric is a big red flag!
    You can do this by registering a callback on dispatcher.OnMetrics. A decent load heuristic to start with is totalTimeSpentProcessingOpList/timeSinceLastMetricsCallback. Metrics are typically collected less often than the event loop completes, so accumulate the processing time across loop iterations between callbacks.
  • Consider blocking connection.getOpList calls, to avoid thrashing your event loop. Your current approach uses Thread.sleep to prevent this, but with a long enough sleep (2 seconds in this case), you can end up in trouble when a worker instance can’t handle a burst of traffic (so the message queue fills), or doesn’t respond to pings in time (so gets killed for latency).
    As an alternative, you can use connection.getOpList(timeoutMs) to block whilst fetching the OpList for up to timeoutMs milliseconds. getOpList will block until there is some message to fetch, or the timeout expires. In examples like this which don’t need regular ‘tick’ rates but should be responsive, this has some nice benefits.
  • Close resources that need it to avoid memory leaks and other funkiness. Many Java SDK classes implement Closeable, so it’s best to allocate them using try-with-resources (as you already do with the log file). Connection and OpList should be handled this way, for example.
  • Clean up registered callbacks. Dispatcher (and View) callbacks will return a long when registered, which can then be used to remove them. Consider doing this when your callback objects get destroyed, so that there’s no chance callbacks get invoked on destroyed objects. It’s unlikely but can happen.
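
To illustrate the blocking-fetch point above, here is a minimal sketch of the pattern using only the JDK. FakeOpSource, push and fetchOps are hypothetical names made up for this example and are not part of the Java SDK; the real call to use is connection.getOpList(timeoutMs) as described above. The sketch only shows why waiting with a timeout beats Thread.sleep: the loop wakes up as soon as a message arrives instead of sleeping through it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a worker connection. This is NOT the SDK's API;
// it only imitates "block until something arrives or the timeout expires".
final class FakeOpSource {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Called by whatever produces messages (the network side, in a real worker).
    void push(String op) {
        pending.add(op);
    }

    // Waits up to timeoutMs for the first op, then drains whatever else is queued.
    // Returns an empty list if nothing arrived in time.
    List<String> fetchOps(long timeoutMs) {
        List<String> ops = new ArrayList<>();
        try {
            String first = pending.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                ops.add(first);
                pending.drainTo(ops);
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt so the caller's loop can shut down.
            Thread.currentThread().interrupt();
        }
        return ops;
    }
}

final class BlockingLoopDemo {
    public static void main(String[] args) {
        FakeOpSource source = new FakeOpSource();
        source.push("AddEntity");
        source.push("ComponentUpdate");

        // Two iterations: the first returns immediately with both ops,
        // the second waits out the 100 ms timeout and returns nothing.
        for (int i = 0; i < 2; i++) {
            List<String> ops = source.fetchOps(100);
            System.out.println("iteration " + i + ": " + ops);
        }
    }
}
```

With a short timeout the loop still comes round regularly, so there is room for housekeeping even when no messages arrive.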

For a simple worker like your example, you can probably get away with not doing most of this. However, if you or others use this as a template for bigger and greater things, I hope you find some of the above useful.

Karl


#4

Hi Karl,

Thanks for taking a look. Some great points there, which I’ll make sure to incorporate, especially the information regarding the getOpList method.

Regarding the load metric, I notice the docs suggest using the current queue length to calculate this. Is this generally how other workers like the FSim calculate load, or is the number of entities a worker has authority over likely to be a more consistent benchmark?

One thing that would have been very useful when putting this together is a ‘-sources’ jar for the SDK.
Is this something that would be possible, or is the Java SDK closed source?


#5

Load metrics are hard - how can you know exactly how many commands your worker can handle, or how many entities? Depending on what exactly you’re doing, an entity may be very cheap to be authoritative over, or very expensive.
Ultimately it’s down to your application and logic.
The examples in the docs use a simple queue of tasks with a size limit, but you’d have to know the profile of the code, and the limits of a single process to set that limit intelligently.
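
As a rough idea of what a queue-based metric can look like, here is a sketch with load defined as queue length divided by the size limit. BoundedTaskQueue and its methods are names invented for this post, and this is my own approximation rather than the code from the docs. The limit is the number you’d have to pick by profiling.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical task queue with a fixed size limit; load = size / limit.
final class BoundedTaskQueue {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final int limit;

    BoundedTaskQueue(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    // Returns false (and drops the task) once the queue is full.
    boolean offer(Runnable task) {
        if (tasks.size() >= limit) {
            return false;
        }
        return tasks.add(task);
    }

    // Runs the oldest queued task, if there is one.
    void runNext() {
        Runnable task = tasks.poll();
        if (task != null) {
            task.run();
        }
    }

    // 0.0 means idle, 1.0 means the queue is full.
    double load() {
        return (double) tasks.size() / limit;
    }
}
```

The metric is only as good as the limit: set it too high and the worker looks idle while it is actually falling behind.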

Using a time-based metric attempts to ignore all that difficulty. If you spent 500ms of the last 1 second processing stuff, you’re probably at 50% load.
However, that’s only a reasonable approximation if your worker is single-threaded, and CPU-bound. If you’re multithreaded, or would run out of memory before CPU time, maybe time spent processing isn’t a good metric.
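
Here is a minimal sketch of the time-based version, assuming a single-threaded worker. TimeBasedLoad is a made-up helper, not an SDK class; timestamps are passed in explicitly so it is easy to test, and in a real worker you would pass System.nanoTime().

```java
// Hypothetical helper: accumulates busy time and reports busy / elapsed.
final class TimeBasedLoad {
    private long busyNanos;
    private long windowStartNanos;

    TimeBasedLoad(long nowNanos) {
        this.windowStartNanos = nowNanos;
    }

    // Call after each batch of work with how long the batch took.
    void recordBusy(long nanos) {
        busyNanos += nanos;
    }

    // Returns the fraction of the window spent busy, then starts a new window.
    double sampleAndReset(long nowNanos) {
        long elapsed = nowNanos - windowStartNanos;
        double load = elapsed > 0 ? (double) busyNanos / elapsed : 0.0;
        busyNanos = 0;
        windowStartNanos = nowNanos;
        return load;
    }
}
```

The idea is to call recordBusy around op list processing in the event loop and sampleAndReset from the metrics callback, so 500ms of work in a 1 second window comes out as 0.5.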

UnityWorkers (aka FSims before v10) are different as they have a target framerate, so each frame has a time ‘budget’. If the event loop uses half the budgeted time, that’s 50% load.
You can see how this is calculated in Assets\Improbable\Sdk\Src\Unity\Metrics\UnityFixedFrameLoadMetricProvider.cs
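
The frame-budget idea boils down to a small calculation, sketched here in Java. FrameBudgetLoad is a hypothetical class written for this post and is not a port of the Unity provider mentioned above.

```java
// Hypothetical helper: load = time spent in a frame / time budgeted per frame.
final class FrameBudgetLoad {
    private final double budgetSeconds;

    FrameBudgetLoad(double targetFramesPerSecond) {
        if (targetFramesPerSecond <= 0) {
            throw new IllegalArgumentException("target frame rate must be positive");
        }
        this.budgetSeconds = 1.0 / targetFramesPerSecond;
    }

    // frameSeconds is how long the event loop took for one frame.
    // Values above 1.0 mean the frame overran its budget.
    double loadFor(double frameSeconds) {
        return frameSeconds / budgetSeconds;
    }
}
```

For example, at a target of 50 frames per second the budget is 20ms per frame, so a frame that takes 10ms is at roughly 50% load.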

re: a -sources jar, that sounds interesting, but maybe worth a new topic to explain why :slight_smile: