This article describes the requirements and prerequisites for the K2cloud self-hosted deployment, which is based on the Kubernetes (K8s) infrastructure, when deployed in your cloud. The supported cloud providers are AWS, GCP and Azure.
K2cloud is also available as a fully managed service (PaaS), where K2view manages the platform for you, with all relevant deployments and installations, in a segregated area in the cloud.
A Terraform sample for the creation and installation of the infrastructure, as well as the Helm chart used during the deployment, can be found here.
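As a rough illustration of how the sample might be consumed, the sketch below assumes a locally cloned copy of the Terraform sample linked above; the repository URL and directory name are placeholders, and variables should be reviewed and adapted to your organization's standards before applying.

```bash
# Illustrative only: clone the sample Terraform linked above and apply it.
# <terraform-sample-repo-url> and <terraform-sample-dir> are placeholders.
git clone <terraform-sample-repo-url>
cd <terraform-sample-dir>

# Review and adapt the variables before applying in your environment.
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```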
The creation of namespaces and their ongoing lifecycle are handled by the K2cloud platform's Orchestrator.
A Kubernetes worker node is expected to meet the following requirements:
This CPU-to-memory ratio fits memory-optimized machine profiles.
Determining the base number of required worker nodes, as well as the maximum number of nodes for a cluster's horizontal auto-scaling, depends on the K8s cluster's purpose, your project's needs and the project type. Based on these, different modules and PODs must be deployed, which affects the node calculations.
Below are some use cases:
The recommended resources for Studio namespaces, per Fabric POD, are 4 cores and 16GB RAM. (Several applications run on this POD: the Fabric runtime, Studio and Neo4j.)
Additional PODs may be required, depending on the project and solution types:
Non-Studio namespaces, such as UAT, SIT, pre-production and production, require a cluster of several Fabric PODs, using K8s auto-scaling capabilities.
On the other hand, PODs and resources that are required for a Studio namespace might not be needed here: for a non-Studio case, it is recommended to use managed services (buckets/blob storage for massive storage; managed DBs such as managed Postgres or managed Cassandra; managed Kafka rather than running it on a Fabric POD). Accordingly, a namespace might contain only Fabric PODs, each requiring 2 cores and 8GB RAM. Note that different resources may be required, per your project's needs.
Note: You may consider having several clusters, for example: a Dev cluster for Studio; QA and pre-production; Production. This separation enables stricter enforcement of security and privacy policies (that is, which clusters are allowed to access which data platforms/DBs). Additionally, it can help with resource allocation, since scaling in and out may differ, and you may wish to avoid the effect of Studio namespaces on production and vice versa.
In a POT - for Studio namespaces - a single 3-node K8s cluster is required.
While setting up a K8s cluster, you shall follow these guidelines:
The supported versions for a Kubernetes cluster are: 1.24 - 1.27
The supported versions for Helm chart are: 3.X
Verify that you have a client environment with the kubectl and Helm command-line tools, configured with a service account or a user that has admin access to a namespace on the target Kubernetes cluster (see the verification sketch after this list).
Prepare a domain name that will be used for this cluster and that can be resolved by DNS. The domain should point, using a domain wildcard, to the load balancer that routes to the NGINX Ingress controller.
Provide the domain name to the K2view team.
When creating a namespace, its name is associated as a subdomain of this domain name in the Ingress controller. For example, if the domain is "k2dev.company.com" and a created namespace is "test", then the URL of this namespace, which users will access, is "test.k2dev.company.com".
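A minimal verification sketch for the guidelines above, assuming a standard kubectl/Helm client and the example domain "k2dev.company.com"; the namespace name "fabric" is a placeholder.

```bash
# Check the client tools and the cluster version (Kubernetes 1.24-1.27, Helm 3.x).
kubectl version
helm version --short

# Confirm the current context points at the target cluster and that the
# service account / user has admin-level access to the target namespace
# ("fabric" is a placeholder namespace name).
kubectl config current-context
kubectl auth can-i '*' '*' --namespace fabric

# Verify that the wildcard DNS record resolves namespace subdomains to the
# load balancer in front of the NGINX Ingress controller (example domain).
nslookup test.k2dev.company.com
```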
Ensure the following, according to the cloud provider:
The proposed sample Terraform defines several modules that are part of the cluster preparations. If, according to your organization's regulations, you need to change parts of it or run your own Terraform, ensure the following:
The type of volume that shall be provisioned depends on the cloud provider:
AWS: An EFS storage class is used for Studio namespaces. Please refer here for an EFS storage class sample; an illustrative sketch also appears after this list.
These are the default names and UIDs that are used by K2cloud deployments. If different values have to be set - provide them to K2view.
The list below covers several storage classes; not all of them are required for every project. Please check with your team and with K2view about the project and the solution that you are using. For example, for the TDM solution, usually only Fabric and PG are required.
GCP
Azure
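For the AWS case mentioned above, a minimal EFS StorageClass sketch might look as follows; the file-system ID is a placeholder, and the linked sample remains the authoritative reference.

```bash
# Illustrative AWS EFS StorageClass for Studio namespaces (see the linked sample above).
# The fileSystemId is a placeholder; adjust the parameters to your environment.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
EOF
```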
The k2-agent is a module deployed in each cluster as a POD inside a dedicated namespace. It polls deployment instructions from the K2cloud platform Mailbox. This workflow eliminates the need for connectivity from the K2cloud orchestrator into the cluster, so that only outbound traffic from the agent to the K2cloud orchestrator is required.
The k2-agent source-code can be found here.
As part of cluster preparations, you shall deploy the k2-agent. It is deployed in a dedicated namespace (whose default name is "k2view-agent").
Refer here for the k2-agent Helm charts and their configuration values.
The cluster's dedicated Mailbox ID shall be obtained from K2view and applied in the agent's configuration values.
The kubeInterface should be accessible to the k2-agent.
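A minimal deployment sketch, assuming the k2-agent Helm chart linked above; the chart reference and the value key used for the Mailbox ID are illustrative placeholders, so use the names documented in the chart's configuration values.

```bash
# Illustrative only: deploy the k2-agent into its dedicated namespace.
# <chart-repo-or-path> and the mailbox value key are placeholders; use the
# chart's documented configuration values and the Mailbox ID obtained from K2view.
kubectl create namespace k2view-agent

helm upgrade --install k2view-agent <chart-repo-or-path>/k2view-agent \
  --namespace k2view-agent \
  --set mailboxId="<MAILBOX_ID_FROM_K2VIEW>"
```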
For simplicity, K2view suggests using its shared Nexus for the Fabric and k2-agent images. To consume them, you shall open an outbound connection to the Nexus host. Refer to the Networking section.
You can also use your own OCI-based registry. For this, you shall:
The non-Fabric images - Postgres, Cassandra and Neo4j - are not provided by K2view. Instead, you should use the images as published on Docker Hub. If you prefer to host them in your registry as well, inform the K2view team so they can be configured in the K2cloud platform orchestrator.
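If you host images in your own OCI-based registry, as described above, the mirroring step could look roughly like this; the registry host, repository path and image tag are placeholders.

```bash
# Illustrative mirroring of a non-Fabric image from Docker Hub into your own
# OCI-based registry (host, repository and tag are placeholders).
docker pull docker.io/library/postgres:15
docker tag docker.io/library/postgres:15 registry.example.com/k2cloud/postgres:15
docker push registry.example.com/k2cloud/postgres:15
```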
The cluster interacts with external hosts, to which you shall open outbound network access, all on port 443:
Note: As mentioned, container images can be hosted in your OCI registry. Helm charts can also be copied into your Git repository and maintained there (it is the responsibility of your team to synchronize with the official repository to ensure smooth operation). If you consume them from your repositories, inform the K2view team so they can be configured in the K2cloud platform orchestrator.
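As a quick way to confirm the required outbound access, a throwaway pod can probe each of the hosts listed above over port 443; the host below is a placeholder.

```bash
# Illustrative outbound connectivity check from inside the cluster.
# Replace <external-host> with each host listed above (all on port 443).
kubectl run net-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 10 https://<external-host> -o /dev/null
```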
For Fabric cluster namespaces, such as production, where massive data is handled, it is recommended to use managed services (like managed Postgres or bucket/blob storage). K2cloud creates the relevant managed resources on the fly during the namespace creation process. For this purpose, the k2-agent namespace needs credentials. This can be achieved by using K8s cloud-native credentials:
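One common pattern for this (AWS shown purely as an illustration) is to bind a cloud IAM role to the agent's service account via IRSA; the service-account name and role ARN below are hypothetical placeholders, and GCP Workload Identity or Azure Workload Identity can serve the same purpose on the other providers.

```bash
# Illustrative AWS example: grant the k2-agent namespace cloud-native credentials
# by annotating its service account with an IAM role (IRSA). The service-account
# name and role ARN are placeholders.
kubectl annotate serviceaccount k2view-agent \
  --namespace k2view-agent \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/k2cloud-namespace-provisioner
```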
The K2cloud fully managed solution includes a monitoring mechanism for collecting and displaying Fabric's metrics and logs.
Since you likely have your own standards and regulations regarding monitoring, monitoring is out of the scope of these self-hosted guidelines. Contact the K2view team when further explanation is required. Read here and here for more information on Fabric (non-cloud) monitoring setup examples.