This article describes the requirements and prerequisites for a K2cloud self-hosted deployment, which is based on Kubernetes (K8s) infrastructure and runs in your own cloud. The supported cloud providers are AWS, GCP, and Azure.
K2cloud is also available as a fully managed service (PaaS), where K2view manages the platform for you, including all relevant deployments and installations, in a segregated environment in the cloud.
A Terraform sample for the creation and installation of the infrastructure, as well as the Helm chart used during the deployment, can be found here.
The K2cloud platform's Orchestrator handles namespace creation and the ongoing namespace lifecycle.
A Kubernetes worker node is expected to meet the following requirements:
This CPU-to-memory ratio corresponds to a memory-optimized machine profile.
Determining the base number of required worker nodes, as well as the maximum number of nodes for the cluster's horizontal auto-scaling, depends on the K8s cluster's purpose, your project's needs, and the project's type. These factors determine which modules and PODs must be deployed, which in turn affects the node calculations.
Below are some use cases:
The recommended resources for the Fabric POD in Studio namespaces are 4 cores and 16GB RAM. (Several applications run on this POD: the Fabric runtime, Studio, and Neo4j.)
Additional PODs may be required, depending on the project and solution types:
Non-Studio namespaces, such as UAT, SIT, pre-production, and production, require a cluster of several Fabric PODs, using K8s auto-scaling capabilities.
On the other hand, PODs and resources that are required for a Studio namespace might not be needed here. For non-Studio namespaces, it is recommended to use managed services: buckets/blob storage for massive storage, managed databases such as managed Postgres or managed Cassandra, and managed Kafka rather than running Kafka on the Fabric POD. Accordingly, a namespace might contain only Fabric PODs, each requiring 2 cores and 8GB RAM. Note that the actual resources will vary according to your project's needs.
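To make the sizing concrete, below is a minimal sketch of what a 2-core/8GB request could look like in a raw manifest. In practice these values are set through the Helm chart; the namespace, pod, and image names here are placeholders, not actual K2cloud values.

```bash
# Illustrative only: resource requests/limits for a Fabric POD in a
# non-Studio namespace (2 cores / 8GB RAM, per the recommendation above).
# All names and the image reference are placeholders; the real values
# belong in the Helm chart's values file, not a hand-written manifest.
kubectl apply -n my-fabric-namespace -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: fabric-example
spec:
  containers:
  - name: fabric
    image: registry.example.com/fabric:latest   # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "2"
        memory: 8Gi
EOF
```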
Note: You may consider having several clusters, for example, a Dev cluster for Studio, and separate QA, pre-production, and Production clusters. This separation enables stricter enforcement of security and privacy policies (that is, which clusters are allowed to access which data platforms/DBs). It can also help with resource allocation: scaling in and out may differ between clusters, and you may wish to avoid any effect of Studio namespaces on production and vice versa.
In a POT, for Studio namespaces, a single 3-node K8s cluster is required.
While setting up a K8s cluster, you shall follow these guidelines:
The supported versions for a Kubernetes cluster are: 1.28 - 1.32
The supported versions for Helm are: 3.x
Verify that you have a client environment with the kubectl and Helm command-line tools, configured with a service account or a user that has admin access to a namespace on the target Kubernetes cluster.
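A quick sanity check of such a client environment might look like this (the namespace name is a placeholder):

```bash
# Verify the client tooling: a supported Kubernetes cluster (1.28-1.32)
# and Helm 3.x, plus admin access to the target namespace.
kubectl version            # client and server versions
helm version --short       # expect v3.x
# Check that the configured user/service account has admin rights on the
# namespace ("my-namespace" is a placeholder):
kubectl auth can-i '*' '*' --namespace my-namespace
```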
Prepare a domain name for this cluster that can be resolved by DNS. The domain should point to the load balancer that fronts the NGINX Ingress controller.
Provide the domain name to the K2view team.
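To verify the wiring, you could compare the Ingress controller's load-balancer address against the DNS record. The namespace and service names below are the common NGINX Ingress defaults and may differ in your installation; the domain is a placeholder.

```bash
# Find the load-balancer address fronting the NGINX Ingress controller:
kubectl get svc -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0]}'
# Compare against what the cluster domain resolves to (placeholder name):
dig +short k2cloud.example.com
```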
Ensure the following, according to the cloud provider:
The proposed Terraform sample defines several modules that are part of the cluster preparations. If your organization's needs require you to change parts of it, or to run your own Terraform, ensure the following:
The type of volume that shall be provisioned depends on the cloud provider:
AWS: The EFS storage class is used for Studio namespaces. Refer here for the EFS storage class sample.
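The linked sample is the authoritative definition; purely as an illustration, a storage class for the AWS EFS CSI driver generally takes the following shape (the file-system ID is a placeholder):

```bash
# Rough illustration of an EFS storage class (AWS EFS CSI driver).
# fs-0123456789abcdef0 is a placeholder; use the linked sample as the
# authoritative definition for K2cloud deployments.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
EOF
```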
These are the default names and UIDs that are used by K2cloud deployments. If different values have to be set, provide them to K2view.
The list below covers several storage classes, but not all of them are required for every project. Check with your team and with K2view about the project and solution that you are using. For example, for the TDM solution, usually only Fabric and PG are required.
GCP
Azure
The K2-agent is a module deployed in each cluster as a POD inside a dedicated namespace. It polls deployment instructions from the K2cloud platform Mailbox. This workflow eliminates the need for inbound connectivity from the K2cloud Orchestrator into the cluster; only outbound traffic from the agent to the K2cloud Orchestrator is required.
The k2-agent source code can be found here.
As part of cluster preparations, you shall deploy the K2-agent. It is deployed in a dedicated namespace (whose default name is "k2view-agent").
Refer here for the k2-agent Helm charts and their configuration values.
The cluster's dedicated Mailbox ID shall be obtained from K2view and applied to the agent's configuration values, as sketched below.
The kubeInterface should be accessible by the k2-agent.
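Putting these steps together, a deployment sketch might look as follows. The chart repository URL, chart name, and value key are assumptions made for illustration; take the actual chart, configuration values, and Mailbox ID from the linked repository and from K2view.

```bash
# Sketch of deploying the K2-agent into its dedicated namespace.
# The repository URL, chart name, and "mailboxId" key are placeholders;
# use the linked k2-agent Helm charts and the values supplied by K2view.
helm repo add k2view https://charts.example.com/k2view   # placeholder URL
helm repo update
helm install k2view-agent k2view/k2view-agent \
  --namespace k2view-agent --create-namespace \
  --set mailboxId="<MAILBOX_ID_FROM_K2VIEW>"             # placeholder key
```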
For simplicity, K2view suggests using its shared Nexus for the Fabric and k2-agent images. To consume them, you shall open an outbound connection to the Nexus host; refer to the Networking section.
You can also use your own OCI-based registry. For this, you shall:
The non-Fabric images (Postgres, Cassandra, and Neo4j) are not provided by K2view. Instead, you should use the images as published on Docker Hub. If you prefer to host them in your own registry, inform the K2view team so they can be configured in the K2cloud platform Orchestrator.
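As an illustration, mirroring one of these Docker Hub images into your own registry could look like this (the registry host and tag are placeholders); the same pattern applies to the Fabric and k2-agent images if you mirror them from Nexus:

```bash
# Mirror a public Docker Hub image into a private OCI registry.
# registry.example.com and the tag are placeholders.
docker pull postgres:16
docker tag postgres:16 registry.example.com/k2cloud/postgres:16
docker push registry.example.com/k2cloud/postgres:16
```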
The cluster interacts with external hosts, to which you shall open outbound network access, all on port 443:
Note: As mentioned, container images can be hosted in your OCI registry. Helm charts can also be copied into your Git repository and maintained there (it is your team's responsibility to synchronize with the official repository to ensure smooth operation). If you consume them from your own repositories, inform the K2view team so they can be configured in the K2cloud platform Orchestrator.
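One simple way to validate that such outbound access works from inside the cluster is a throwaway curl pod; the host below is a placeholder for each external endpoint in your list:

```bash
# Quick outbound connectivity check on port 443 from inside the cluster.
# nexus.example.com is a placeholder; repeat for each required host.
kubectl run netcheck --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sI https://nexus.example.com
```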
For a Fabric cluster namespace, such as production, where massive data is handled, it is recommended to use managed services (such as managed Postgres or bucket/blob storage). K2cloud creates the relevant managed resources on the fly during the namespace creation process. For this purpose, the k2-agent namespace needs credentials, which can be provided using K8s cloud-native credential mechanisms:
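For example, on AWS EKS this is commonly achieved with IAM Roles for Service Accounts (IRSA). The sketch below is illustrative only, with a placeholder role ARN and service-account name; GCP Workload Identity and Azure Workload Identity follow the same pattern with their own annotations.

```bash
# Sketch: granting the k2-agent namespace cloud credentials via
# IAM Roles for Service Accounts (IRSA) on AWS EKS.
# The service-account name and role ARN are placeholders.
kubectl annotate serviceaccount k2view-agent -n k2view-agent \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/k2cloud-agent-role
```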
The K2cloud fully-managed solution includes a monitoring mechanism for collecting and showing the Fabric's metrics and logs.
Since you likely have your own monitoring standards and regulations, monitoring is out of the scope of these self-hosting guidelines. Contact the K2view team for further explanations when required. Read here and here for examples of a Fabric (non-cloud) monitoring setup.