Genvid Cluster Servers

Supervisor Servers

The supervisor servers are usually a small group of micro servers that act as a point of registration and authority for all the other services. A single server is enough to run the cluster, but 3 or more servers are usually preferred to make the cluster highly available. The group of servers keeps the state of the orchestration services consistent using a gossip protocol with a simple majority, so the service is unaffected as long as more than half of the supervisor servers remain up (for example, 1 of 3 servers or 2 of 5 can go down). The principal tasks run by the servers are:

  • Highly available key-value store for configuration.
  • Secure key-value store that secures, stores, and tightly controls access to tokens, passwords, certificates, API keys, and other secrets (see Vault).
  • Service registration exposed through both DNS and an HTTP API.
  • Monitoring all services and reporting the health of the different worker instances.
  • Scheduler service able to scale up and down the different services and restart them when they go down.

Given the criticality of the services running on them, the supervisor servers usually don’t run any other tasks. Instead, they delegate tasks to the worker servers.
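The simple-majority rule described for the supervisor servers can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not part of the Genvid tooling; the function names `quorum_size` and `tolerated_failures` are hypothetical and chosen only for illustration.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a simple majority."""
    if servers < 1:
        raise ValueError("a cluster needs at least one supervisor server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Number of servers that can go down while a majority remains up."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    # Compare a few cluster sizes (illustrative values only).
    for count in (1, 3, 4, 5):
        print(
            f"{count} server(s): quorum {quorum_size(count)}, "
            f"tolerates {tolerated_failures(count)} failure(s)"
        )
```

Under this rule, a single server tolerates no failure, and going from 3 to 4 servers does not increase the number of tolerated failures, which is one reason odd group sizes are a common choice.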

Public, Encoding and Internal Workers

All worker servers run a client version of the orchestration services, reporting to the supervisor servers. The worker servers are responsible for executing and monitoring the tasks scheduled by the supervisor servers. The differences between them are mostly minor:

  • The public worker is the only one with public ports open to everyone.
  • The encoding worker is dedicated to the composition and encoding tasks.
  • The internal worker is a cheaper machine, with more stable load.

The internal and encoding workers are only accessible from a limited set of pre-registered network ranges, as a security precaution.
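As an illustration of the access rule above, the sketch below checks whether a source address may reach a given kind of worker. It is an assumption-laden example rather than the actual Genvid mechanism, which is typically enforced by the cloud provider's firewall rules: the `ALLOWED_RANGES` values are documentation-reserved placeholder networks, and `is_allowed` is a hypothetical helper.

```python
import ipaddress

# Hypothetical pre-registered ranges (documentation-reserved networks,
# used here as placeholders only).
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(worker_kind: str, source_ip: str) -> bool:
    """Return True if source_ip may connect to a worker of worker_kind."""
    if worker_kind == "public":
        # Public workers expose their public ports to everyone.
        return True
    if worker_kind not in ("internal", "encoding"):
        raise ValueError(f"unknown worker kind: {worker_kind!r}")
    # Internal and encoding workers only accept pre-registered ranges.
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_RANGES)


if __name__ == "__main__":
    print(is_allowed("public", "203.0.113.7"))
    print(is_allowed("internal", "192.0.2.10"))
    print(is_allowed("encoding", "203.0.113.7"))
```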

Being on the edge of the network, the public workers usually require far more running instances than the internal workers. In the future, automation functionality will be added to the services to allow automatic provisioning and deallocation of servers as the load goes up and down. The internal workers have a far more stable load, usually related only to the needs of the game. However, internal workers also run the Messaging Bus, whose load is proportional to the number of clients, and they carry a portion of the load supported by the public workers. Finally, the encoding workers usually have the most stable load, given that the number of streams to encode is usually well known and fixed.