by Gink · Mon, 22 Jul 2024
It turns out to be pretty easy to send log data into an OpenSearch cluster. All we need is a logging agent that keeps tracking the log sources and continuously ships new data to our cluster, plus the right permissions to write data into that cluster.
Let's log into your OpenSearch dashboard with the master account we created before.
Go to Security ⤏ Roles, then choose an existing role or create a new one with permission to write data. For simplicity, I'll take all_access, which is always available in OpenSearch.
Next, go to all_access ⤏ Mapped users ⤏ Manage mapping. You'll see 2 options here: Users and Backend roles.
Users are actually IAM users from AWS. You can put the full IAM ARN of any user here, then generate an AWS access key ID & secret for that user. Provide this key ID & secret to the logging agent so it can send data to the cluster without using a username & password.
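For example, assuming a hypothetical IAM user named log-shipper is the one you mapped, the access key pair can be generated with the AWS CLI:
# Generate an access key ID & secret for the mapped IAM user (user name is hypothetical)
aws iam create-access-key --user-name log-shipper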
Backend roles are commonly IAM roles for EC2, also known as EC2 instance profiles: the role you typically attach to the EC2 server where your services are deployed. With this option, any logging agent running inside that EC2 instance can send log data directly to the cluster, with no need for credentials such as the key ID and secret above.
In either case, make sure you put the right value (the full ARN) and save the mapping here.
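If you prefer the API over the dashboard, the same mapping can be done through the security plugin's REST API. This is a minimal sketch, assuming placeholder ARNs and the master account credentials (note that PUT replaces the whole mapping for the role):
# Map a backend role (EC2 instance role) and a user (IAM user ARN) to all_access
curl -u 'master-user:master-password' -X PUT \
  'https://opensearch-hostname.com/_plugins/_security/api/rolesmapping/all_access' \
  -H 'Content-Type: application/json' \
  -d '{
        "backend_roles": ["arn:aws:iam::123456789012:role/my-ec2-instance-role"],
        "users": ["arn:aws:iam::123456789012:user/log-shipper"]
      }'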
As for logging agents, there are many options out there, but I prefer fluent-bit and filebeat.
Fluent-bit is really simple and lightweight. It can read various kinds of data, parse them, and send them to OpenSearch directly. The only downside is that it doesn't support Kubernetes (K8s) very well.
Therefore, if you need to collect log data from a simple server, such as a service deployed on EC2, Fluent-bit will be the best fit.
Let's take a look at the official documentation to see how to install this tool. The steps depend on your platform. On Linux (Ubuntu), for example, it can be done via this script:
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
After a successful installation, the configuration of fluent-bit is located in /etc/fluent-bit/. In this directory, the main configuration file is fluent-bit.conf.
If you're not familiar with fluent-bit, I recommend taking a look at its documentation. For now, what we really care about are the [INPUT] and [OUTPUT] sections.
In my case, I just want to keep reading all *.log files located in /var/log/my-web-app/ and ship them to OpenSearch. So my configuration looks like this:
[SERVICE]
    Flush               1
    Daemon              Off
    Log_Level           info
    Parsers_File        /etc/fluent-bit/parsers.conf

# Tail all *.log files of the web app and keep the raw line under the "message" key
[INPUT]
    Name                tail
    Refresh_Interval    5
    Path                /var/log/my-web-app/*.log
    Read_From_Head      true
    Rotate_Wait         5
    Mem_Buf_Limit       5MB
    Skip_Long_Lines     On
    Key                 message

# Attach the server's hostname to every record
[FILTER]
    Name                record_modifier
    Match               *
    Record              host.hostname ${HOSTNAME}

# Ship everything to the OpenSearch cluster over TLS, signing requests with AWS credentials
[OUTPUT]
    Name                opensearch
    Match               *
    Logstash_Format     On
    Logstash_Prefix     some-index-prefix
    Suppress_Type_Name  On
    AWS_Auth            On
    AWS_Region          ap-southeast-1
    Host                opensearch-hostname.com
    tls                 On
    Port                443
What really matters here is the Host in [OUTPUT], which must be the correct endpoint of the OpenSearch cluster, and AWS_Auth being enabled, because I'm sending logs from an EC2 server that has the role we mapped under Backend roles attached.
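If you're not sure about the endpoint, it can be looked up with the AWS CLI. This is a sketch assuming a hypothetical domain name my-domain:
# Look up the OpenSearch domain endpoint (domain name is hypothetical;
# VPC domains expose it under DomainStatus.Endpoints instead)
aws opensearch describe-domain --domain-name my-domain --query 'DomainStatus.Endpoint'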
If you need to send log data from a server that's not an EC2 instance, such as an external VPS, you will need to configure AWS_Role_ARN (the IAM ARN that was granted access) in [OUTPUT], then export the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY into your environment variables.
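A minimal sketch of that last step, with placeholder credentials; the binary path below is where the official Linux packages install fluent-bit, and for a systemd-managed agent you'd put the same variables into the unit's environment instead of a login shell:
# Placeholder credentials for the IAM user mapped under Users
export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Run the agent in the foreground once to confirm the credentials and config work
/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf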
For services running in Kubernetes (K8s), Filebeat is a good choice because of its flexibility: it can filter which logs to send based on namespace or label, so the OpenSearch cluster won't be flooded with useless log data.
But Filebeat isn't perfect either, because it can't create different indices on OpenSearch based on the log data by itself. That's why we combine it with Logstash to do this.
In my case, I want log data from just a few namespaces inside the K8s cluster, shipped to OpenSearch under different indices. For example, each namespace should end up in its own index. That seems to be impossible with Filebeat alone, but Logstash can step in and solve it very well.
Here are the configurations for Logstash (Deployment) and Filebeat (DaemonSet) on K8s.
# Logstash deployment as an intermediary to forward log to OpenSearch
---
apiVersion: v1
data:
  logstash.conf: |-
    input {
      beats {
        port => 5044
      }
    }
    filter {
      date {
        target => "@timestamp"
        match => ["time", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"]
      }
    }
    output {
      opensearch {
        hosts => ["https://opensearch-hostname.com:443"]
        index => "k8s-%{[kubernetes][namespace]}-%{+YYYY.MM.dd}"
        auth_type => {
          type => "aws_iam"
          region => "ap-southeast-1"
        }
        ecs_compatibility => disabled
      }
    }
kind: ConfigMap
metadata:
  name: logstash-conf-cm
  namespace: audits
---
apiVersion: v1
data:
  logstash.yml: |-
    http.host: "0.0.0.0"
    pipeline.ecs_compatibility: disabled
kind: ConfigMap
metadata:
  name: logstash-yml-cm
  namespace: audits
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: audits
  name: logstash-center
  labels:
    app: logger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logger
      type: logstash
  template:
    metadata:
      labels:
        app: logger
        type: logstash
    spec:
      containers:
        - image: opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
          imagePullPolicy: IfNotPresent
          name: logstash
          command: ["bin/logstash", "-f", "config/logstash.conf"]
          ports:
            - containerPort: 5044
          envFrom:
            - secretRef:
                name: logstash-env
                # kubectl create secret generic logstash-env --from-literal=AWS_ACCESS_KEY_ID=xxx --from-literal=AWS_SECRET_ACCESS_KEY=xxx
          volumeMounts:
            - mountPath: /usr/share/logstash/config/logstash.conf
              name: logstash-conf
              subPath: logstash.conf
            - mountPath: /usr/share/logstash/config/logstash.yml
              name: logstash-yml
              subPath: logstash.yml
      volumes:
        - name: logstash-conf
          configMap:
            name: logstash-conf-cm
        - name: logstash-yml
          configMap:
            name: logstash-yml-cm
---
apiVersion: v1
kind: Service
metadata:
  name: logstash-svc
  namespace: audits
  labels:
    app: logger
spec:
  ports:
    - port: 5044
      protocol: TCP
      targetPort: 5044
  selector:
    app: logger
    type: logstash
  type: ClusterIP
# Filebeat daemonset to run on all k8s nodes, crawl log data and send to Logstash
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: audits
  labels:
    app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          templates:
            - condition:
                or:
                  - equals:
                      kubernetes.namespace: first-ns
                  - equals:
                      kubernetes.namespace: second-ns
              config:
                - type: container
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}.log
                  exclude_lines: ["vault", "Vault"]

    processors:
      - add_cloud_metadata:
      - add_host_metadata:

    output.logstash:
      hosts: ["logstash-svc:5044"]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: audits
  labels:
    app: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:7.16.2
          args: ["-c", "/etc/filebeat.yml", "-e"]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: filebeat-conf
              mountPath: /etc/filebeat.yml
              readOnly: true
              subPath: filebeat.yml
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: container-log
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: host-log
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: filebeat-conf
          configMap:
            defaultMode: 0640
            name: filebeat-config
        - name: container-log
          hostPath:
            path: /var/lib/docker/containers
        - name: host-log
          hostPath:
            path: /var/log
        # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
        - name: data
          hostPath:
            # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: audits
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: audits
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: audits
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: audits
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    app: filebeat
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - namespaces
      - pods
      - nodes
    verbs:
      - get
      - watch
      - list
  - apiGroups: ["apps"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: audits
  labels:
    app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
  labels:
    app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: audits
  labels:
    app: filebeat
---
This one looks intimidating, huh?
Actually, it just grants Filebeat the appropriate permissions so it can read log data from each node inside the cluster.
What we've done so far is create Logstash and Filebeat inside the audits namespace, and ship all log data belonging to the first-ns and second-ns namespaces to the OpenSearch cluster. 🤯
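To double-check that the pipeline works, here's a quick verification sketch. It assumes the audits namespace and labels from the manifests above, the opensearch-hostname.com endpoint used earlier, and curl 7.75+ for the --aws-sigv4 option:
# Check that the Logstash and Filebeat pods are running
kubectl -n audits get pods -l app=logger
kubectl -n audits get pods -l app=filebeat

# Confirm that the per-namespace indices are being created in OpenSearch,
# signing the request with the access key pair of the mapped IAM user
curl -s --aws-sigv4 "aws:amz:ap-southeast-1:es" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  "https://opensearch-hostname.com/_cat/indices/k8s-*?v"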
And that's it.