This post collects notes, hints, tips, and traps from deploying a production-level Elasticsearch, Logstash and Kibana (ELK) stack, together with the free open-source security add-ons Search Guard and ElastAlert, on Azure Kubernetes Service (AKS).
Let me post some important references here and express my sincere thanks to the authors:
- Azure Kubernetes Service (AKS): Deploying Elasticsearch, Logstash and Kibana (ELK) and consume messages from Azure Event Hub - Part 1 - published on September 24, 2018
- Azure Kubernetes Service (AKS): Azure AD SAML based Single Sign on to secure Elasticsearch and Kibana and securing communications in ELK - Part 2 - published on October 9, 2018 - YES, they are pretty new as of now.
- High Performance ELK with Kubernetes: Part 1 - Set up Elasticsearch and Kibana in a Kubernetes cluster - published on August 7, 2018
- High Performance ELK with Kubernetes: Part 2 - Expose Logstash and Kibana to the outside world - published on August 7, 2018 - also quite new and good
Here are some of my notes:
- AKS nodes are not directly reachable. To access them, one has to use temporary pods as brokers, as described in this official document. I personally think this is a good practice, as the temporary pods are destroyed after their one-time usage.
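As a sketch, one modern way to do this is `kubectl debug` (available in recent kubectl versions; the node name below is hypothetical — list yours first with `kubectl get nodes`):

```shell
# Start a temporary pod on the target node; the node's filesystem
# is mounted under /host inside the pod.
kubectl debug node/aks-nodepool1-12345678-0 -it --image=busybox

# Inside the pod, chroot into the node's filesystem if needed:
#   chroot /host

# The debug pod remains after you exit; delete it when done:
#   kubectl delete pod <debug-pod-name>
```

This keeps with the spirit of the note above: the broker pod is disposable and leaves nothing behind on the node once deleted.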
- To let K8s pull from a private Docker registry, please DON’T manually connect to each K8s physical node to run `docker login` — that way is ugly, error-prone and difficult to maintain (think about what happens if some nodes crash and new ones are added when scaling out, or IP addresses change for some reason…)
- The correct way is to create a `Secret` in your K8s cluster and tell Docker to use it when pulling images, as described in the official doc Pull an Image from a Private Registry
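A minimal sketch of that approach (the secret name, registry URL and credential placeholders below are all assumptions for illustration):

```shell
# Create a docker-registry Secret once; kubelet uses it on whichever
# node the pod is scheduled to, so scaling out "just works".
kubectl create secret docker-registry my-registry-secret \
  --docker-server=myregistry.azurecr.io \
  --docker-username=<username> \
  --docker-password=<password>
```

Then reference it from the pod spec with `imagePullSecrets: [{name: my-registry-secret}]`, so no per-node `docker login` is ever needed.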
- `network.host: 0.0.0.0` is a must-have config. Otherwise DNS resolution will fail => master election will fail => the first data node will fail => no additional data nodes can come up.
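For illustration, the relevant fragment of `elasticsearch.yml` might look like this (the cluster name and headless-service hostname are assumptions, and the discovery setting name varies by Elasticsearch version):

```yaml
# elasticsearch.yml (fragment)
cluster.name: my-elk
# Bind to all interfaces so the pod is reachable via its cluster DNS name;
# without this, inter-node resolution and master election fail.
network.host: 0.0.0.0
# Point discovery at the headless service fronting the master-eligible nodes
# (setting name shown is for the Zen discovery used in ES 6.x).
discovery.zen.ping.unicast.hosts: elasticsearch-master-headless
```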
- it’s better to put configuration entries for Search Guard in `elasticsearch.yml`, define it in a `ConfigMap`, then mount it as a volume with `subPath`
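A sketch of the ConfigMap-plus-`subPath` approach (the names and the sample Search Guard setting are hypothetical — adapt them to your setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: es-config
data:
  elasticsearch.yml: |
    network.host: 0.0.0.0
    searchguard.enterprise_modules_enabled: false
---
# Fragment of the StatefulSet pod template showing the subPath mount:
#   containers:
#   - name: elasticsearch
#     volumeMounts:
#     - name: es-config
#       mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
#       subPath: elasticsearch.yml   # replaces only this one file
#   volumes:
#   - name: es-config
#     configMap:
#       name: es-config
```

The `subPath` detail matters: mounting the ConfigMap without it would shadow the entire `config` directory, wiping out the other files the image ships with.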
- the health-checking endpoint for data nodes has to be changed from
The command looks like:
- running it only on the first data node is sufficient; it is not needed on master nodes or the other data nodes.
to be continued…