Production Deployment

A production-level Everyware Cloud deployment requires planning. You should take some time to think about several aspects of the deployment, such as securing communications, controlling access to services, and the availability and resiliency of the instance. This page covers some topics you should consider when setting up a production-ready Everyware Cloud instance.

Data Encryption

When the services are exposed to the public, we recommend disabling HTTP and switching to HTTPS, which guarantees a higher level of security. The Everyware Cloud Admin Console and API services do support plain HTTP connections; however, plain HTTP should be used only during the initial setup of the instance, or for instances used for self-evaluation or development. Moving to HTTPS requires creating and configuring proper certificates; please take a look at the Certificate section in the Installation with Helm Charts chapter.
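
As a reference, a certificate is typically provided to the cluster as a standard Kubernetes TLS Secret that the deployment can then reference. The minimal sketch below is illustrative only: the Secret name and namespace are placeholders, and the actual procedure and resource names used by the EC Helm Charts are described in the Certificate section mentioned above.

```yaml
# Illustrative sketch: a standard Kubernetes TLS Secret holding the
# certificate chain and private key for an HTTPS endpoint.
# Name and namespace are placeholders, not the actual EC resources.
apiVersion: v1
kind: Secret
metadata:
  name: ec-console-tls        # placeholder Secret name
  namespace: everyware-cloud  # placeholder namespace
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate chain>
  tls.key: <base64-encoded private key>
```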

The same applies to MQTT connections. The MQTT protocol is implemented on top of TCP/IP; use MQTT over TLS (also known as MQTTS) to secure communications between your devices and the EC instance.

Domain Names

While it is technically possible to access the EC front-end services through their IP addresses, we recommend using meaningful DNS Names such as mqtt-broker.example.com, console.example.com, api.example.com, and so on.
DNS Names work especially well when used in combination with the certificates mentioned in the Data Encryption section. Moreover, having a DNS Name for your services:

  • simplifies the management of the infrastructure in case the IP address needs to be changed for any reason
  • solves the issue of updating your (possibly unattended) clients when the IP address is changed
  • solves the issue of changing (and re-issuing) the TLS certificates every time the IP changes

Whether or not to use DNS Names depends on your application domain. If you are confident that your IP addresses will remain static over the lifetime of your instance, you can go with plain IPs. Even in this case, however, opting for DNS Names is a much cleaner solution, aligned with state-of-the-art practices, and can save you a lot of work later while running the instance.
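
To illustrate how DNS Names and certificates typically come together in a Kubernetes deployment, the sketch below shows a generic Ingress mapping the example hostnames to front-end services with TLS termination. Hostnames, service names, ports and the Secret name are placeholders, not the actual EC resources; the MQTT endpoint, being plain TCP, is normally exposed through a different mechanism (for example a LoadBalancer Service).

```yaml
# Generic illustration only: an Ingress that maps example DNS Names to
# HTTP(S) front-end services. All names and ports are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ec-frontend                   # placeholder name
spec:
  tls:
    - hosts:
        - console.example.com
        - api.example.com
      secretName: ec-console-tls      # placeholder TLS Secret
  rules:
    - host: console.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ec-console      # placeholder Service name
                port:
                  number: 443
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ec-api          # placeholder Service name
                port:
                  number: 443
```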

Resiliency and High Availability

Resiliency and High Availability in an Everyware Cloud instance are a combination of product and infrastructure features.

Everyware Cloud is a distributed application composed of several connected service components running in Docker containers (check the list of components here). Container orchestrators are used to support the configuration, deployment, scaling, and lifecycle management of the components. Resiliency and High Availability are achieved by leveraging the functionalities of the orchestrator and those of the underlying infrastructure.

Supported EC deployments are Kubernetes based; however, Kubernetes comes in different flavors and levels of integration with the underlying infrastructure (i.e. the data center technology). Check the list of supported orchestration platforms and versions here.

The Everyware Cloud software distribution comes with a set of Helm Charts that are used to deploy and configure the application on the supported orchestration platforms. Check the installation guide here.

Restart Policy

The first High Availability feature used is the Restart Policy. Each EC service component runs within a Kubernetes Pod. The pod is configured with the Always restart policy: when a pod fails for some reason, the Kubernetes control plane services restart it so that the service component is up and running again as soon as possible.
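
As a minimal sketch (not the actual EC manifests), this is what the restart policy looks like in a pod specification; the pod name and image are placeholders.

```yaml
# Minimal illustration of the Always restart policy in a Pod specification.
apiVersion: v1
kind: Pod
metadata:
  name: ec-example-service            # placeholder name
spec:
  restartPolicy: Always               # containers are restarted whenever they fail
  containers:
    - name: service
      image: example/ec-service:1.0   # placeholder image
```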

Replication

The second High Availability feature used is Pod Replication. With just the restart policy in place, when the node where a pod runs fails, the pod, and ultimately the service components in it, are no longer available. Pod replication works by setting a desired number of pod replicas across the cluster. If a node fails, the Kubernetes control plane starts its pods elsewhere in the cluster, wherever there are enough resources to run them. If too many replicas are present for some reason, the control plane kills the exceeding replicas in order to keep the overall number consistent with the desired target.
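
The sketch below illustrates the mechanism with a generic Deployment; names and images are placeholders and do not correspond to the actual EC manifests. The control plane continuously reconciles the number of running pods with the declared replicas value.

```yaml
# Generic illustration: a Deployment declaring two desired pod replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ec-example-service            # placeholder name
spec:
  replicas: 2                         # desired number of pod replicas
  selector:
    matchLabels:
      app: ec-example-service
  template:
    metadata:
      labels:
        app: ec-example-service
    spec:
      containers:
        - name: service
          image: example/ec-service:1.0   # placeholder image
```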

The default replica number for EC service components is one. Check the next section, Horizontal Scaling, for more details about which service components can be scaled to more than one replica.

Horizontal Scaling

The third High Availability feature used is Horizontal Scaling. The workload may grow beyond the capacity currently deployed, so that requests from EC clients cannot be fulfilled as desired. Pod Replication helps cope with these situations: the capacity of a service can be increased by raising the number of replicas of its pod across the cluster.

EC supports horizontal scaling for two of its service components: the Messaging Service (MQTT) and the RESTful API Service. Check how to replicate these two services here.

Increasing and decreasing the number of replicas is a manual operation. EC administrators should quantify the current workload and the workload profile in the short and medium term, plan the number of replicas required and, as a consequence, the number of nodes. Administrators should also take into account average and peak workloads in order to provide the desired service level.
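
As an illustration only, scaling is typically expressed as a Helm values override applied with a helm upgrade of the release. The value keys below are hypothetical; the actual keys for the Messaging Service and the RESTful API Service are documented in the Installation with Helm Charts chapter.

```yaml
# Hypothetical Helm values override; key names are assumptions, not the
# actual EC chart values.
broker:
  replicas: 3     # Messaging Service (MQTT), hypothetical key
api:
  replicas: 2     # RESTful API Service, hypothetical key
```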

Multiple Zones

The fourth High Availability feature is deployment across multiple failure zones. If a zone becomes totally or partially unavailable and the cluster is entirely located in that zone, the application may become unavailable as well. A possible way to avoid this is to spread the nodes of the same Kubernetes cluster across multiple failure zones. Some infrastructure providers support distribution of a cluster over multiple failure zones. EC doesn't provide this feature in its software distribution; however, if the feature is available, the EC deployment may benefit from it.
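
As a generic Kubernetes illustration (not something the EC charts configure out of the box), when the infrastructure provider labels nodes with their failure zone, a topology spread constraint in a pod template asks the scheduler to spread replicas across zones. The fragment below would sit inside a pod template spec; the label selector is a placeholder.

```yaml
# Pod template fragment, generic Kubernetes: spread replicas across the
# zones exposed by the topology.kubernetes.io/zone node label.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: ec-example-service       # placeholder label
```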

Infrastructure Notes

Resiliency and High Availability relate to the number of worker nodes. If the Kubernetes cluster has just one big worker node and that node crashes, there is no other node where the control plane services can replicate its pods. In order to leverage the replication feature, it is good practice to plan some degree of redundancy in the number of worker nodes. Administrators should plan the number of worker nodes in accordance with the required service level.

Pod restart and replication are features controlled by the services in the Kubernetes Control Plane, so the control plane has to be at least as available as the application. A highly available control plane requires multiple dedicated Control Plane nodes; check the documentation of the orchestration platform in use to verify how High Availability of the Control Plane is implemented.

Autoscaling

Autoscaling is a functionality that allows new nodes to be automatically provisioned to a Kubernetes cluster based on some conditions. For example, a worker node that failed can be replaced by a new one, allowing Pod Replication to start pods on it. Autoscaling is infrastructure specific; some orchestration platforms (e.g. Amazon EKS and Azure AKS) support autoscaling functionalities, others do not. EC doesn't provide this feature in its software distribution; however, if the feature is available, the EC deployment may benefit from it.
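
As an infrastructure-specific illustration, the sketch below assumes Amazon EKS managed with eksctl; cluster name, region, sizes and instance type are placeholders. The node group bounds define the minimum and maximum number of worker nodes an autoscaler is allowed to provision; other platforms offer equivalent concepts.

```yaml
# Infrastructure-specific sketch (assumed: Amazon EKS managed with eksctl).
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ec-cluster            # placeholder cluster name
  region: eu-west-1           # placeholder region
managedNodeGroups:
  - name: ec-workers          # placeholder node group name
    instanceType: m5.xlarge   # placeholder instance type
    minSize: 3
    maxSize: 6
    desiredCapacity: 3
```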

Databases and Cache

EC relies on MariaDB and optionally on Elasticsearch. The use of a Redis cache is also recommended to increase data access performance and reduce the workload on the relational database. All these components should be deployed in a configuration appropriate to the availability targets of the overall system.
All the named services natively support highly available configurations; however, the available options depend on the infrastructure provider. Check the supported configurations with the provider of your services.

Workload Partitioning

Everyware Cloud supports partitioning of workloads through the creation of multiple instances of a service. The services supporting this feature are:

  • Messaging Service (MQTT connections)
  • Remote Access Service (on-demand VPN connection)

The purpose of this feature is to support the creation of dedicated resource pools that can be assigned to distinct accounts, so that all the devices of an account interact only with the resource pool assigned to that account.

Workload partitioning works well in scenarios where, from a device perspective, the service level of an account needs to be decoupled from the service level of another account. Since the account has its own dedicated resource pools, even if the resource pools assigned to other accounts have an issue, the resource pool assigned to the account continues working (and vice versa).

📘

Resource pool assignment is only possible for the EC root account and the level-one accounts. All the child accounts that descend from the same level-one account also share the same resource pool.

For more info regarding how to set up multiple service instances of one type, see the Installation with Helm Charts section.

Administration Account

We recommend reserving the EC instance root account, ec-sys, for platform administration tasks only. Avoid connecting devices to it and apply strict control over the users that are allowed to access it, as well as over the permissions assigned to those users.

Data Access

We recommend using EC application integration tools and services to access EC data.

Avoid writing and reading data through direct connections to the data engines used by EC (i.e. the relational DB and the message store). By using application integration tools and services you ensure that the business logic is properly enforced on data, that performance can be monitored, and that security is centrally and uniformly managed.

Moreover, a new Everyware Cloud version may require database changes that in turn may break integrations built using direct connections to the data engines. The product team strives to minimize the impact of these changes on application integration tools and services in order to keep them as stable as possible, so that integrations built on top of them require less maintenance, or can rely on the EC documentation for details regarding the migration path.