07 Jun Is your Kubernetes cluster production ready?
Microservices, containers, Kubernetes and cloud native have become the de facto technology choice for almost all types of applications, such as 5G, IoT and blockchain. When organizations start this journey, they invariably begin with Kubernetes along with containerization of their workloads. The choice of Kubernetes is most likely to be a managed service on the cloud or a well-known distribution for on-prem deployments.
Having a managed service reduces the complexity of managing and operating master nodes, but a lot of work still needs to be done to make the cluster truly production-grade. I have observed that it takes about 4 to 6 months for most organizations to get close to a production-grade cluster that application teams can start using meaningfully.
At Tailwinds, we have compiled a list of aspects one needs to look at to make Kubernetes production-grade. Most of them apply to every deployment, while certain aspects are specific to particular deployments.
Fleet Management & Operations
We have observed that even small organizations run a minimum of 3-5 clusters, such as Dev, Test, Staging and Production. This is in line with what the CNCF survey reports (https://www.cncf.io/wp-content/uploads/2020/11/CNCF_Survey_Report_2020.pdf). Big organizations run a cluster per application team, and there are deployments with one or more clusters per customer. These fleets of clusters could span multiple clouds (including on-prem) based on deployment needs or for DR purposes. The Kubernetes part is mostly consistent across clouds, but other aspects such as security, underlying networking and costs vary depending on the cloud.
This begets a requirement for managing and operating a fleet of clusters across clouds in a consistent & repeatable way.
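As an illustration of what "consistent" can mean in practice, here is a minimal Python sketch that flags configuration drift across a fleet. The inventory, the cluster names and the keys (`k8s_version`, `network_policy`) are hypothetical; a real inventory would be pulled from each cloud or from a fleet management tool.

```python
from collections import Counter

# Hypothetical inventory: cluster name -> settings we want identical fleet-wide.
FLEET = {
    "dev":     {"cloud": "aws",     "k8s_version": "1.27", "network_policy": True},
    "test":    {"cloud": "aws",     "k8s_version": "1.27", "network_policy": True},
    "staging": {"cloud": "gcp",     "k8s_version": "1.27", "network_policy": False},
    "prod":    {"cloud": "on-prem", "k8s_version": "1.26", "network_policy": True},
}

# Settings expected to match everywhere; "cloud" is allowed to differ.
CHECKED_KEYS = ("k8s_version", "network_policy")


def find_drift(fleet, keys=CHECKED_KEYS):
    """Return {key: {cluster: value}} for clusters that differ from the most common value."""
    drift = {}
    for key in keys:
        values = {name: cfg.get(key) for name, cfg in fleet.items()}
        # The most common value across the fleet is treated as the baseline.
        baseline, _ = Counter(values.values()).most_common(1)[0]
        outliers = {name: v for name, v in values.items() if v != baseline}
        if outliers:
            drift[key] = outliers
    return drift


if __name__ == "__main__":
    for key, outliers in find_drift(FLEET).items():
        print(f"{key}: drift in {sorted(outliers)}")
```

Treating the most common value as the baseline is only one possible choice; a declared target configuration per environment would work just as well.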
Observability
We all know the famous quote “If you can’t measure, you cannot improve”, but when it comes to microservices, I would rephrase it to “If you cannot observe, you are not production-grade.” Events, logs and metrics are the three aspects of observability, and one needs the right tools to monitor and store them. To observe, one has to figure out which metrics, events and logs to focus on. Additionally, a prescriptive scheme, one that points to what should be done next, is much better than a descriptive one that only reports what happened.
Understanding the toolsets, configuring them the right way and keeping them updated is an art and takes a huge amount of time. Also, we are starting to observe that as organizations and products grow, the first call needs to go to the application team rather than to DevOps/SREs. This means all these metrics, logs and events need to be made developer-friendly.
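One way to picture "the first call goes to the application team" is a small routing rule, sketched below in Python. The ownership map, the team names and the shape of the alert (a dict with a `namespace` label) are assumptions made for illustration, not the format of any particular alerting tool.

```python
# Hypothetical ownership map: namespace -> owning application team.
OWNERS = {"checkout": "team-payments", "search": "team-discovery"}

# Hypothetical fallback when no application team owns the namespace.
FALLBACK = "sre-oncall"


def route_alert(alert, owners=OWNERS, fallback=FALLBACK):
    """Pick the first responder for an alert based on its namespace label."""
    namespace = alert.get("labels", {}).get("namespace")
    return owners.get(namespace, fallback)


if __name__ == "__main__":
    print(route_alert({"labels": {"namespace": "checkout"}}))
    print(route_alert({"labels": {"namespace": "kube-system"}}))
```

The point of the sketch is that the application team is the default responder and DevOps/SREs are the fallback, not the other way round.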
Security
Given that applications are highly distributed in a microservices architecture, security becomes more important than ever. It has many different aspects, but a few table-stakes guardrails are necessary: Are my applications and infrastructure configured with the right practices? Are images pulled from a secured artifactory? Are the images currently running on the clusters actually secure? Do users have only the access they require? Are network policies in place?
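A couple of these guardrails can be expressed as simple checks over a pod manifest. The Python sketch below works on a manifest loaded as a plain dict; the registry `registry.example.com/` and the sample pod are hypothetical, and the three rules are only examples of such checks, not a complete policy.

```python
TRUSTED_REGISTRIES = ("registry.example.com/",)  # hypothetical private registry

# Hypothetical pod manifest, as it would look after loading YAML into a dict.
SAMPLE_POD = {
    "spec": {
        "containers": [
            {"name": "api", "image": "registry.example.com/shop/api:1.4.2"},
            {
                "name": "sidecar",
                "image": "docker.io/library/busybox:latest",
                "securityContext": {"privileged": True},
            },
        ]
    }
}


def audit_pod(pod):
    """Return a list of guardrail violations for one pod manifest (a plain dict)."""
    findings = []
    for c in pod.get("spec", {}).get("containers", []):
        name, image = c.get("name", "?"), c.get("image", "")
        if not image.startswith(TRUSTED_REGISTRIES):
            findings.append(f"{name}: image '{image}' is not from a trusted registry")
        # An untagged image or ':latest' makes it unclear what is actually running.
        last_part = image.rsplit("/", 1)[-1]
        if "@" not in image and (":" not in last_part or last_part.endswith(":latest")):
            findings.append(f"{name}: image is not pinned to a version or digest")
        if c.get("securityContext", {}).get("privileged", False):
            findings.append(f"{name}: container runs privileged")
    return findings


if __name__ == "__main__":
    for finding in audit_pod(SAMPLE_POD):
        print(finding)
```

In a real cluster, checks like these are usually enforced at admission time rather than run by hand; the sketch only shows the kind of question being asked.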
High Availability
We have observed that in some organizations high availability has been an afterthought. It is necessary to strategize during the architectural phase, even if part of the implementation is done at a later point in time. Some key questions to ask: When a failure occurs at the application, node, cluster, zone or region level, are the necessary backups in place? Is my data getting backed up regularly? If you are managing the Kubernetes master yourself, is the master running in HA mode?
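The questions above can be turned into a small readiness report. The Python sketch below assumes a self-managed cluster whose nodes carry the well-known labels `topology.kubernetes.io/zone` and `node-role.kubernetes.io/control-plane`; the node list, the backup timestamp and the thresholds (3 control-plane nodes, 2 zones, 24 hours) are illustrative assumptions, not recommendations from this article.

```python
from datetime import datetime, timedelta, timezone

ZONE_LABEL = "topology.kubernetes.io/zone"
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"

# Hypothetical sample data for illustration.
SAMPLE_NOW = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
SAMPLE_LAST_BACKUP = SAMPLE_NOW - timedelta(hours=30)
SAMPLE_NODES = [
    {"name": "cp-1", "labels": {CONTROL_PLANE_LABEL: "", ZONE_LABEL: "zone-a"}},
    {"name": "w-1", "labels": {ZONE_LABEL: "zone-a"}},
    {"name": "w-2", "labels": {ZONE_LABEL: "zone-b"}},
]


def ha_report(nodes, last_backup, now, min_control_plane=3, min_zones=2,
              max_backup_age=timedelta(hours=24)):
    """Answer three yes/no questions about availability and backups."""
    control_plane = [n for n in nodes if CONTROL_PLANE_LABEL in n["labels"]]
    workers = [n for n in nodes if CONTROL_PLANE_LABEL not in n["labels"]]
    # Distinct zones that host at least one worker node.
    worker_zones = {n["labels"][ZONE_LABEL] for n in workers if ZONE_LABEL in n["labels"]}
    return {
        "control_plane_ha": len(control_plane) >= min_control_plane,
        "workers_span_zones": len(worker_zones) >= min_zones,
        "backup_fresh": now - last_backup <= max_backup_age,
    }


if __name__ == "__main__":
    print(ha_report(SAMPLE_NODES, SAMPLE_LAST_BACKUP, SAMPLE_NOW))
```

On a managed service the control-plane nodes are typically not visible, so the first check only applies when you run the master yourself.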
Cost Optimizations
Resource utilization and cost go hand-in-hand. If you have the right architecture and run your infrastructure and applications in an optimized way, then you have probably optimized for cost. This is a lot easier said than done. Being too tight on optimization in the early days would probably put a lot of brakes on innovation, but being completely blind would lead to high costs. The right balance needs to be found between these two extremes. The best approach is to have tools in place and take stock of things at regular intervals.
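"Taking stock" can start with something as simple as comparing what each workload requests with what it actually uses. The Python sketch below does this for CPU, in millicores. The workload names, the numbers and the 30% threshold are hypothetical; real figures would come from your metrics tooling.

```python
# Hypothetical usage snapshot: CPU requested vs. actually used, in millicores.
WORKLOADS = [
    {"name": "checkout", "cpu_requested_m": 2000, "cpu_used_m": 300},
    {"name": "search",   "cpu_requested_m": 1000, "cpu_used_m": 700},
    {"name": "batch",    "cpu_requested_m": 500,  "cpu_used_m": 100},
]


def overprovisioned(workloads, min_pct=30):
    """Return (name, utilization %) for workloads using less than min_pct of requested CPU."""
    flagged = []
    for w in workloads:
        requested = w["cpu_requested_m"]
        if requested <= 0:
            continue  # no request set; nothing to compare against
        pct = 100 * w["cpu_used_m"] // requested
        if pct < min_pct:
            flagged.append((w["name"], pct))
    return flagged


if __name__ == "__main__":
    for name, pct in overprovisioned(WORKLOADS):
        print(f"{name}: using {pct}% of requested CPU")
```

A low number is a prompt to look closer, not an automatic reason to cut resources; headroom for traffic spikes is often deliberate.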
Tailwinds
For each of the above aspects, you have different sets of tools, either from vendors or from open source, but you will need the right automation to create clusters consistently and repeatably and to provide a unified view.
Stay tuned to our website to read weekly articles on the topics that interest you.
We hope you got to learn something new and interesting.