K8s data collectors

The Splunk Operator for K8s deploys Splunk Enterprise custom resources across a single namespace or multiple namespaces. The helper scripts k8s-splunk-collector.sh and k8s-systeminfo-collector.sh in the tools directory collect data from a K8s cluster which runs the Splunk Operator for K8s.

Splunk Collector for K8s - k8s-splunk-collector.sh

The Splunk collector for K8s collects data from multiple get/describe commands, container logs and diags from the Splunk instances (if opted for) in the context of the current namespace.

The script:

  • Collects extensive data on your K8s cluster. If there is any data you’d like to keep private, either avoid using the script or modify it to suit your needs.
  • If diags are opted for, generates a Splunk diag on every Splunk instance running inside the Splunk Enterprise CR pods deployed by the Splunk Operator for K8s. The diag generated by the script on the Splunk instance is deleted after extraction. If any of the above is not desired, collect the data manually instead.

Requirements to run the script

  • Kubeconfig context set to the cluster/namespace running the Splunk Operator for K8s
  • Access to kubectl commands to get data
  • Access rights on the host file system to create/delete directories (at least within the directory where you want to run this script and store data)
  • Enough space to collect data in a target folder
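
The requirements above can be checked before a run with a short pre-flight script. This is a minimal sketch, not part of the shipped tools: the `TARGET_DIR` and `MIN_FREE_KB` variables and the three helper functions are hypothetical names, and the 1 GiB threshold is an arbitrary assumption that you should size to your own deployment.

```shell
#!/bin/sh
# Pre-flight checks before running k8s-splunk-collector.sh (illustrative sketch).

TARGET_DIR="${TARGET_DIR:-$PWD}"        # hypothetical: folder where data will be stored
MIN_FREE_KB="${MIN_FREE_KB:-1048576}"   # hypothetical threshold: 1 GiB, expressed in KiB

# Succeeds if a directory can be created and then deleted inside $1.
can_write_dir() {
    probe="$1/.collector-probe-$$"
    mkdir "$probe" 2>/dev/null && rmdir "$probe"
}

# Prints the free space, in KiB, of the file system that holds $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if the file system that holds $1 has at least $2 KiB free.
has_free_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Report the kubeconfig context that kubectl would use, if kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
    echo "kubectl context: $(kubectl config current-context 2>/dev/null || echo '<none set>')"
else
    echo "kubectl not found in PATH"
fi

if can_write_dir "$TARGET_DIR"; then
    echo "write access: ok"
else
    echo "write access: MISSING"
fi

if has_free_space "$TARGET_DIR" "$MIN_FREE_KB"; then
    echo "free space: ok"
else
    echo "free space: below ${MIN_FREE_KB} KiB"
fi
```

The checks only read local state: `kubectl config current-context` inspects the kubeconfig and does not contact the cluster, so it confirms which context is selected but not that the cluster is reachable.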

Script run instructions

  • Run the script using the following command - sh k8s-splunk-collector.sh -d <flag_to_collect_splunk_diags> -t <target_folder> -l <flag_to_limit_output_by_avoid_kubectl_describe> -s <flag_to_collect_secret_object_metadata>. There are four options which are configurable through the script:
    • The -d option is used to specify whether Splunk diags need to be collected. Splunk diags are not collected by default. Set to true if Splunk diags are to be collected.
    • The -t option is used to specify a <target folder>. This option is not mandatory. The script allows you to store the data collected in two different ways:
      • If the -t option is not used, a timestamped folder tmp-<timestamp> is created in the present working directory where the data will be written to.
          Eg. sh k8s-splunk-collector.sh -d true
        
      • If the -t option is used with a valid full path, a timestamped folder tmp-<timestamp> is created inside the full path where the data will be written to. Note: If the folder provided doesn’t exist, it is created provided at least one of the preceding paths exists. If none of the preceding paths exist, the script runs to completion without writing data to disk. Make sure you have enough space in the target folder in either case (for reference, see the Performance section).
    • The -l option is used to specify whether you want to limit the collection of data by avoiding kubectl describe commands. The kubectl describe command outputs are collected by default. There is an issue in K8s with creating too many clients for describe commands (https://github.com/kubernetes/kubernetes/issues/91913). In internal testing these messages have not caused any issues. However, to avoid the warning messages as well as to protect your network bandwidth if it is limited, you can set the -l option to true. Example of a warning message from the K8s cluster:

        W0419 14:46:10.239590   21927 exec.go:203] constructing many client instances from the same exec auth config can cause performance problems during cert rotation and can exhaust available network connections; 1478 clients constructed calling "aws-iam-authenticator"
      
    • The -s option is used to specify whether K8s secret object metadata needs to be collected. Secret object metadata is not collected by default. Set to true if you want the script to collect secret object metadata. Note: The sensitive secret data is NOT collected.
  • After you run the script, wait until you see the message All data required collected under folder <target_folder>
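
As a rough illustration of how the options combine, the sketch below assembles a command line from the four settings and creates the target folder up front, so that a missing path cannot lead to a run that writes nothing to disk. The helper names `build_collector_cmd` and `ensure_target` and the `TARGET` variable are hypothetical; the sketch only prints the command, it does not run the collector, and it passes a flag only when its value is true because that is the usage described above.

```shell
#!/bin/sh
# Sketch: assemble a k8s-splunk-collector.sh command line from the four options.

# Prints the command line for the given settings.
#   $1 = collect diags (-d)            "true" or anything else
#   $2 = target folder (-t)            empty means "use the present working directory"
#   $3 = skip kubectl describe (-l)    "true" or anything else
#   $4 = collect secret metadata (-s)  "true" or anything else
build_collector_cmd() {
    cmd="sh k8s-splunk-collector.sh"
    if [ "$1" = "true" ]; then cmd="$cmd -d true"; fi
    if [ -n "$2" ]; then cmd="$cmd -t $2"; fi
    if [ "$3" = "true" ]; then cmd="$cmd -l true"; fi
    if [ "$4" = "true" ]; then cmd="$cmd -s true"; fi
    echo "$cmd"
}

# Creates the target folder (including missing parents) when one is given.
# Fails if the folder cannot be created.
ensure_target() {
    [ -z "$1" ] || mkdir -p "$1"
}

TARGET="${TARGET:-}"   # hypothetical: full path of the target folder, may be empty

if ensure_target "$TARGET"; then
    build_collector_cmd true "$TARGET" false false
else
    echo "cannot create target folder: $TARGET" >&2
fi
```

With `TARGET` unset, the sketch prints `sh k8s-splunk-collector.sh -d true`, which matches the example given for the -t option. Paths that contain spaces would need quoting before the printed command is reused.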

Example script run:

bash#sh k8s-splunk-collector.sh -d "true"
Starting to collect data with diag true in folder /Users/akondur/Desktop/operator_training/Data_collection_debug/collect_data_k8s/tmp-2021-04-19-10-37 

Setting up directories 

Done setting up directories 

Started collecting logs and diags

Done collecting logs and diags

Started collecting cluster info

Done collecting cluster info

Started collecting kubectl get command outputs

Done collecting kubectl get command outputs

Started collecting kubectl describe command outputs

Done collecting kubectl describe command outputs 

All data required collected under folder /Users/akondur/Desktop/operator_training/Data_collection_debug/collect_data_k8s/tmp-2021-04-19-10-37

Target folder breakdown

  • **/k8s_data/get** - Contains outputs of kubectl get commands for K8s resources (deployments, statefulsets, configmaps, services, volumes, RBAC) including Splunk Enterprise CRs in the cluster.
  • **/k8s_data/describe** - Contains outputs of kubectl describe commands for K8s resources (deployments, statefulsets, configmaps, services, volumes, RBAC) including Splunk Enterprise CRs in the cluster. This folder is not created if describe commands are avoided.
  • **/pods_data/diags** - Contains diags from all Splunk instances running inside the Splunk Enterprise CR pods deployed by the Splunk Operator for K8s. This folder is created only when diags are opted for.
  • **/pods_data/logs** - Contains logs from all Splunk Enterprise CR pods deployed by the Splunk Operator for K8s and the Operator pod itself.

Note: All logs are appropriately named with proper prefixes. The folder k8s_data contains outputs of kubectl get and describe commands for multiple K8s resources. The folder pods_data contains logs for all pods in the K8s namespace and also diags from all Splunk pods if requested.
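
To get a quick overview of a finished run, the folders described above can be summarized with a few lines of shell. This is a sketch under assumptions: it assumes k8s_data and pods_data sit directly inside the tmp-<timestamp> folder and that logs are stored under pods_data, as the note above states. The function names and the `BASE_DIR` variable are hypothetical.

```shell
#!/bin/sh
# Sketch: find the most recent tmp-<timestamp> folder and count files per subfolder.

# Prints the newest tmp-* directory under $1; fails if there is none.
# Glob results are sorted, and the timestamp is written year-first
# (for example tmp-2021-04-19-10-37), so the last match is the newest.
latest_run_dir() {
    latest=""
    for d in "$1"/tmp-*/; do
        if [ -d "$d" ]; then latest="${d%/}"; fi
    done
    [ -n "$latest" ] && echo "$latest"
}

# Prints "<subfolder>: <file count>" for each documented subfolder of $1,
# or "<subfolder>: not present" when that subfolder was not created.
summarize_run_dir() {
    for sub in k8s_data/get k8s_data/describe pods_data/diags pods_data/logs; do
        if [ -d "$1/$sub" ]; then
            count=$(find "$1/$sub" -type f | wc -l | tr -d ' ')
            echo "$sub: $count"
        else
            echo "$sub: not present"
        fi
    done
}

# BASE_DIR is hypothetical: the folder in which the tmp-<timestamp> folder was created.
if run_dir=$(latest_run_dir "${BASE_DIR:-.}"); then
    echo "latest run: $run_dir"
    summarize_run_dir "$run_dir"
else
    echo "no tmp-<timestamp> folder found under ${BASE_DIR:-.}"
fi
```

A "not present" line for k8s_data/describe or pods_data/diags is expected when the -l option was set to true or diags were not requested.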

Performance

Splunk deployments on K8s Cluster: SHC (3 search heads, 1 deployer), IDXC (3 indexers, 1 cluster manager), 1 Standalone, 1 License manager

kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
splunk-default-monitoring-console-0   1/1     Running   0          17m
splunk-example-license-manager-0      1/1     Running   0          18m
splunk-operator-cb8d66765-tl6z2       1/1     Running   0          6h6m
splunk-test-cluster-manager-0         1/1     Running   0          19m
splunk-test-deployer-0                1/1     Running   0          6h3m
splunk-test-indexer-0                 1/1     Running   0          17m
splunk-test-indexer-1                 1/1     Running   0          17m
splunk-test-indexer-2                 1/1     Running   0          17m
splunk-test-search-head-0             1/1     Running   0          6h3m
splunk-test-search-head-1             1/1     Running   0          6h3m
splunk-test-search-head-2             1/1     Running   0          6h3m
splunk-test2-standalone-0             1/1     Running   0          6h5m

For performance testing, the script:

  • Collected diags, i.e., -d option set to true
  • Collected kubectl describe command outputs, i.e., -l option not supplied
  • Collected secret object metadata, i.e., -s option set to true

Performance Metrics:

Time taken - 6 mins 40 seconds
Memory - 774.4 MB

System info collector for K8s - k8s-systeminfo-collector.sh

The system info collector for K8s collects all the required K8s related system information. Only the K8s admin for the K8s cluster should be allowed to collect data via the script.

Requirements to run the script

  • The script has to be run on a K8s node, i.e., you need to ssh into the node to run the script. Setting up ssh access to the node is outside the scope of the script
  • Admin access to the K8s cluster

Script run instructions

  • Run the script using the following command - sh k8s-systeminfo-collector.sh --ignore_introspection <ignore_introspection_flag> --ignore_metrics <ignore_metrics_flag>. There are two options configurable through the script:
    • The ignore_introspection option is used to specify whether the script should ignore collecting introspection data. Set to true to ignore. By default, the script collects introspection data.
    • The ignore_metrics option is used to specify whether the script should ignore collecting metrics data. Set to true to ignore. By default, the script collects metrics data.

Example script run:

sh-4.2$ sudo tools/k8s-log-collector.sh

        This is version 0.0.1. New versions can be found at https://github.com/splunk/splunk-operator/tools

Trying to collect common operating system logs...
Trying to collect kernel logs...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
Trying to collect iptables information...
Trying to collect installed packages...
Trying to collect active system services...
Trying to Collect Containerd daemon information...
Trying to collect Docker daemon information...
Trying to collect kubelet information...
Trying to collect L-IPAMD introspection information... Trying to collect L-IPAMD prometheus metrics... Trying to collect L-IPAMD checkpoint... cp: cannot stat '/var/run/k8s-node/ipam.json': No such file or directory

Trying to collect sysctls information...
Trying to collect networking infomation... conntrack v1.4.4 (conntrack-tools): 253 flow entries have been shown.

Trying to collect CNI configuration information...
Trying to collect Docker daemon logs...
Trying to archive gathered information...

        Done... your bundled logs are located in /var/log/k8s__2022-03-10_1857-UTC_0.0.1.tar.gz

Target folder breakdown

The script creates a tar file in the present working directory, i.e., the folder from which the script is executed. After untarring it, the following information can be found:

  1. Kernel logs at **/kernel**
  2. Mount points and Volume Information at **/storage**
  3. SELinux status at **/system**
  4. IPtables at **/networking**
  5. Installed packages at **/system**
  6. System services at **/system**
  7. Containerd at **/containerd**
  8. Dockerd at **/docker**
  9. Kubelet at **/kubelet**
  10. Ipamd at **/ipamd**
  11. sysctls at **/sysctls**
  12. Networking (conntrack, ifconfig, routes etc..) at **/networking**
  13. CNI at **/cni**
  14. Docker logs for system at **/var_log**

Example target folder:

drwxr-xr-x 4 root root   4096 Mar 16 22:26 var_log
drwxr-xr-x 2 root root    137 Mar 16 22:26 system
drwxr-xr-x 2 root root     86 Mar 16 22:26 storage
drwxr-xr-x 2 root root     89 Mar 16 22:26 kernel
drwxr-xr-x 2 root root     61 Mar 16 22:26 containerd
drwxr-xr-x 2 root root     28 Mar 16 22:26 sysctls
drwxr-xr-x 2 root root    249 Mar 16 22:26 networking
drwxr-xr-x 2 root root     75 Mar 16 22:26 kubelet
drwxr-xr-x 2 root root    153 Mar 16 22:26 ipamd
drwxr-xr-x 2 root root    143 Mar 16 22:26 docker
drwxr-xr-x 2 root root     29 Mar 16 22:26 cni
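
A bundle can be unpacked and checked against the folder names in the example listing with a sketch like the one below. It assumes a gzip-compressed tar file (the example run reports a .tar.gz) and that the folders appear directly inside the directory passed to `check_bundle_dirs`; if the archive wraps them in a top-level directory, pass that directory instead. The function names and the `BUNDLE` and `OUT_DIR` variables are hypothetical.

```shell
#!/bin/sh
# Sketch: unpack a collector bundle and report which documented folders it contains.

# Extracts the gzip-compressed tar file $1 into directory $2 (created if needed).
extract_bundle() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Prints "<folder>: present" or "<folder>: missing" for each folder name
# taken from the example target folder listing.
check_bundle_dirs() {
    for name in kernel storage system networking containerd docker kubelet ipamd sysctls cni var_log; do
        if [ -d "$1/$name" ]; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}

BUNDLE="${BUNDLE:-}"             # hypothetical: path to the tar file created by the script
OUT_DIR="${OUT_DIR:-./bundle}"   # hypothetical: folder to extract into

# Nothing happens unless BUNDLE is set, so the sketch is safe to source.
if [ -n "$BUNDLE" ]; then
    if extract_bundle "$BUNDLE" "$OUT_DIR"; then
        check_bundle_dirs "$OUT_DIR"
    else
        echo "could not extract $BUNDLE" >&2
    fi
fi
```

A "missing" line is not necessarily an error: the example run shows that some collection steps can fail on a given node, and data skipped through --ignore_introspection or --ignore_metrics will not be in the bundle.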