
Monitoring Rosetta log files with Elastic stack

Introduction

Keeping an eye on your Rosetta log files can be quite cumbersome. They contain plenty of information, and it is not always easy to find the lines you are interested in. If you are running Rosetta on multiple load-balanced servers, the problem escalates, as you may have to search multiple files on different hosts.

The method described in this post helped us ease the process of searching the log files as well as visualize log file activity over a longer period. Additionally, access to the log files is via a web browser, which makes them accessible to less tech-savvy people and to those who are not allowed to log in to the server’s operating system. This article is a follow-up to a presentation given at the 2019 Rosetta User Group meeting in Melbourne. The presentation slides can be found here. For some background information, please have a look at the slides.

Overview

The solution consists of four parts, all components of the Elastic stack that are included in the free Basic Elastic subscription:

  • One or more Elasticsearch nodes that maintain an index of the Rosetta log files
  • A Logstash node that interprets the raw log information, creates a JSON object for each log entry, and submits it to Elasticsearch for storage and indexing
  • A Filebeat node installed on each Rosetta server
  • A Kibana node for the front-end

Except for the Filebeat nodes, these can all be installed on a single server or on multiple nodes configured in clusters, depending on your needs. Our Rosetta installation is fairly small, with log files of a couple of MB per average day and per server (1 of 2). For us, all of the nodes are installed on a single machine that also runs our Rosetta test installation, and it copes with the extra tasks easily. The index is not critical if you keep the log files and can be rebuilt if needed, so there is probably no need for redundancy or regular backups, except maybe for the configuration of the nodes.

Below we go through the different steps required to set up the index for monitoring your Rosetta log files with the Elastic stack.

Installation

The installation of the Elastic stack is standard and can be done by following the guides on the Elastic web site. For easy deployment, or if you want to test this out first, there are Docker images on the Docker Hub that can be easily deployed and removed. Note that Docker does not run on Red Hat or CentOS versions earlier than 7.0. You can install the nodes on any machine, as long as there is network access between the nodes and the Filebeat node(s) have read access to the Rosetta log files. Running Filebeat on the same server as the Rosetta installation is recommended, but you can get away with remote access, via NFS for instance.
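
For a quick test, a single-node setup could be started like this. This is a minimal sketch: the version tag, port mappings and single-node discovery setting are assumptions you should adapt to your situation.

# Minimal single-node test setup using Elastic's official images.
docker run -d --name elasticsearch -p 9200:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.3.0
docker run -d --name kibana -p 5601:5601 \
  --link elasticsearch:elasticsearch \
  docker.elastic.co/kibana/kibana:7.3.0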

Configuration

Elasticsearch

The configuration of the Elasticsearch node does not require any customization. However, you may want to look at two parameters in the /etc/elasticsearch/elasticsearch.yml configuration file: ‘path.data’ and ‘path.logs’. As you accumulate more and more log files in the index, the storage required by Elasticsearch will grow. The default locations in /var/lib and /var/log, respectively, are probably not what you want in a production environment.
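
For example (the paths below are placeholders; choose locations that match your storage layout):

# /etc/elasticsearch/elasticsearch.yml
# Example paths only; adjust to your environment.
path.data: /data/elasticsearch
path.logs: /data/logs/elasticsearch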

Logstash

As with Elasticsearch, the default locations for the data and log files are not ideal. They can be changed in /etc/logstash/logstash.yml. Additionally, you need to tell Logstash how it should interpret the raw log entries that it will receive from the Filebeat nodes.
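
The equivalent settings in logstash.yml (again, example paths only):

# /etc/logstash/logstash.yml
# Example paths only; adjust to your environment.
path.data: /data/logstash
path.logs: /data/logs/logstash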

For Logstash to properly parse the Rosetta log files, a filter pipeline should be created. In the folder /etc/logstash/conf.d, create a new text file with the following content:

input {
  beats { port => 5044 }
}

filter {
  grok {
    match => [
      "message", "%{TIMESTAMP_ISO8601:timestamp_string}\s+%{LOGLEVEL:severity}\s+\[(?<service>[^\]]*)\]\s+\((?<process>[^)]*)\)\s+\[(?<field>[^\]]*)\]\s+(\|\s+%{USER:task}\s+(\|\s+%{USER:task_id}\s+)?\|\s+)?%{GREEDYDATA:text}"
    ]
  }

  date {
    match => ["timestamp_string", "ISO8601"]
  }

  mutate {
    remove_field => ["message", "timestamp_string"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "rosetta-logs-%{+YYYY.MM.dd}"
  }
}

You can choose any file name you want, but the extension should be .conf. Change the ‘hosts’ entry to reflect your installation’s Elasticsearch node name and port number. Slide 5 of the presentation hints at how we arrived at the regular expression in the grok filter (line 8 of the file).

Note that you can always test the Logstash filter by replacing the input part with input { stdin {} } and the output part with output { stdout { codec => rubydebug } }. You can then test the filter by running logstash -f <your .conf file>. Type any fictitious log entry and see how Logstash interprets and splits the entry.
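
A test session could then look like this. This is a sketch only: the binary path depends on your installation, the rosetta-test.conf name is hypothetical, and the log entry below is fictitious.

/usr/share/logstash/bin/logstash -f rosetta-test.conf
# Example fictitious entry to type at the prompt:
#   2019-05-01 12:34:56,789 INFO [ServerService] (Thread-12) [CMS] Connection established
# The rubydebug output should then show the fields split out by the grok filter,
# e.g. severity => INFO, service => ServerService, process => Thread-12,
# field => CMS and text => Connection established.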

Filebeat

The Elastic Beats are small programs that monitor local events and transmit the information to a Logstash node. The installation is lightweight and simple, but you do need to tell it which events to monitor and which Logstash node to send the information to. Edit the file /etc/filebeat/filebeat.yml and change the ‘filebeat.inputs’ section to the following:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - <Rosetta log folder>/server.log
  multiline.pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} [A-Z]+"
  multiline.negate: true
  multiline.match: after

The first part tells Filebeat which files to monitor for new lines, while the ‘multiline’ fields tell it how log entries may be spread over multiple lines. Replace <Rosetta log folder> with the path where your Rosetta log files are located.

While you are here, create a copy of this file called rosetta-backlog.yml, change the type key to - type: stdin, and remove the paths section under it. This configuration lets you (re-)index old log files with the following command:

zcat "<log file with .gz>" | /usr/share/filebeat/bin/filebeat -once -c /etc/filebeat/rosetta-backlog.yml
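
If you want to re-index a whole archive of rotated log files, you could wrap this in a simple shell loop. This is just a sketch: the server.log.*.gz file name pattern is an assumption based on a typical rotation scheme, so adapt it to yours.

# Hypothetical rotation pattern; adjust to your scheme.
for f in "<Rosetta log folder>"/server.log.*.gz; do
  zcat "$f" | /usr/share/filebeat/bin/filebeat -once -c /etc/filebeat/rosetta-backlog.yml
done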

Next, in the Outputs section, comment out the Elasticsearch output and set up the Logstash output. It should look like this:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
#  hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

Change the Logstash hostname and port in the ‘hosts’ entry (line 18 above) if required.

Kibana

Finally, configure Kibana by modifying the /etc/kibana/kibana.yml file. Make sure that you uncomment and correct ‘elasticsearch.hosts’ to match your configuration if needed. If you are running Kibana behind an Apache or NGINX reverse proxy (you probably should), change kibana.yml appropriately. Detailed information can be found on the Elastic web page, but in short: if you want to access the Kibana web site as ‘rosetta.mysite/logs’, you should set the following keys:

server.name: "rosetta.mysite"
server.basePath: "/logs"
server.rewriteBasePath: false
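
For reference, a matching NGINX fragment could look like the sketch below. It assumes the proxy strips the /logs prefix before forwarding (which is why server.rewriteBasePath is left at false); the Kibana host and port are assumptions.

# Hypothetical NGINX reverse proxy fragment for Kibana.
location /logs/ {
    proxy_pass http://localhost:5601/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}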

Starting and stopping

How you start and stop the Elastic stack services is highly dependent on your environment and preferences. Please see the Elastic documentation for more info.

In any case, the preferred order of startup is:

  1. Elasticsearch
  2. Logstash
  3. Filebeat
  4. Kibana

and the other way around for stopping.
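
If you installed the standard RPM or DEB packages, the services can be managed with systemd. A sketch, assuming the default unit names:

# Start in the preferred order; stop in reverse.
sudo systemctl start elasticsearch
sudo systemctl start logstash
sudo systemctl start filebeat    # on each Rosetta server
sudo systemctl start kibana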

Using Kibana

How to use Kibana and create queries and dashboards is beyond the scope of this blog post, but it is the one thing you should invest some time in learning. There are several good video tutorials for Kibana on the Elastic training site. I also recommend enrolling in the Elastic mailing list, where you will be notified of upcoming webinars.

As a quick-and-dirty tutorial, these are some of the things we have done:

Setup

All the Rosetta log entries will be indexed in a separate index per day, called rosetta-logs-<date in YYYY.MM.dd format>, at least when you use the same output.elasticsearch.index value as in the sample Logstash configuration above. We typically want to search all these indexes at the same time, so we need to create an index pattern in Kibana’s Management menu. We created the rosetta-logs-* index pattern, and in the ‘source filters’ tab we added the fields beat.*, host.*, agent.*, _*, input.*, log.*, ecs.* and tags to the source filter list. This index pattern is our default index.
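
To verify that the daily indexes are being created as expected, you can query Elasticsearch directly (assuming it listens on localhost:9200):

curl 'http://localhost:9200/_cat/indices/rosetta-logs-*?v'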

Visualizations

Here are some of the visualizations we created and placed on a dashboard:

Oracle errors

Type: Metric
Index: rosetta-logs-*
Filter: text.keyword : /ORA-.*/ (Lucene syntax)
Metric Aggregation: Count

Top 20 Errors

Type: Data Table
Index: rosetta-logs-*
Filter: NOT (text:at* or text:more) (KQL syntax)
Metric Aggregation: Count
Buckets Aggregation: Terms
Buckets Field: text.keyword
Buckets Order By: metric: Count
Buckets Order: Descending
Buckets Size: 20
Buckets Custom Label: Errors

Message count histogram

Type: Vertical Bar
Index: rosetta-logs-*
Filter: (none)
Metric Y-Axis Aggregation: Count
Buckets X-Axis Aggregation: Date Histogram
Buckets X-Axis Field: @timestamp
Buckets X-Axis Interval: Auto
Buckets Split Series Sub Aggregation: Terms
Buckets Split Series Field: severity.keyword
Buckets Split Series Order by: Alphabetical
Buckets Split Series Order: Ascending
Buckets Split Series Size: 3

Security

By default, the Kibana web site is public. Since version 7.1, the free license includes login and role-based protection for the Kibana web site. This requires setting up both Kibana and Elasticsearch to listen on HTTPS. I found the procedure explained here to be comprehensive and complete.
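
As a starting point, the key setting in elasticsearch.yml is sketched below; this only switches the security features on, and the full TLS setup from the linked procedure is still required.

# /etc/elasticsearch/elasticsearch.yml
xpack.security.enabled: true
# TLS on the transport layer is required once security is enabled
# in a multi-node cluster; see the linked procedure for the details.
xpack.security.transport.ssl.enabled: true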

Disclaimer

The method described above was created as a proof of concept at Libis and is still in use today. Be aware that in order to run this in production, one also has to consider index lifecycle management and backup strategies. By no means do I advocate this as the only or best way to monitor Rosetta log files. There are many tools out there that can be configured to perform the same or similar tasks, sometimes even better. Our choice of the Elastic stack was driven by our familiarity with the software and the possibility to use it for free on our own premises. Your motivation may vary significantly.
