Monitoring Rosetta log files with Elastic stack
Introduction
Keeping an eye on your Rosetta log files can be fairly cumbersome. There is plenty of information and it is not always easy to find the lines that you are interested in. If you are running Rosetta on multiple load-balanced servers, the problem escalates as you may have to search in multiple files on different hosts.
The method described in this post has made it easier for us to search the log files and to visualize log activity over longer periods. Additionally, the log files are accessed through a web browser, which makes them available to less tech-savvy colleagues and to people who are not allowed to log in to the server's operating system. This article is a follow-up to a presentation given at the 2019 Rosetta User Group meeting in Melbourne. The presentation slides can be found here; please have a look at them for some background information.
Overview
The solution consists of four parts, all components of the Elastic stack that are included in the free Basic Elastic subscription:
- One or more Elasticsearch nodes that maintain an index of the Rosetta log files
- A Logstash node that interprets the raw log information, creates a JSON object for each log entry and submits it to Elasticsearch for storage and indexing
- A Filebeat node installed on each Rosetta server
- A Kibana node for the front-end
Except for the Filebeat nodes, these components can all be installed on a single server or spread over multiple machines and configured in clusters, depending on your needs. Our Rosetta installation is fairly small, producing log files of a couple of MB per server (we have two) on an average day. In our case all nodes run on a single machine that also hosts our Rosetta test installation, and it copes with the extra load easily. As long as you keep the original log files, the index is not critical and can be rebuilt if needed, so there is probably no need for redundancy or regular backups, except perhaps for the configuration of the nodes.
Below we go through the different steps required to set up the index for monitoring your Rosetta log files with the Elastic stack.
Installation
The installation of the Elastic stack is standard and can be done by following the guides on the Elastic web site. For easy deployment, or if you want to test this out first, there are Docker images on Docker Hub that can be deployed and removed with little effort. Note that Docker does not run on Red Hat or CentOS versions earlier than 7.0. You can install the nodes on any machine, as long as there is network access between the nodes and the Filebeat node(s) have read access to the Rosetta log files. Running Filebeat on the same server as the Rosetta installation is recommended, but you can get away with remote access via NFS, for instance.
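For a quick test, a single-node setup could be spun up along these lines; the 7.6.2 version tag is just an example, pick whatever release you want to evaluate:
# user-defined network so the containers can find each other by name
docker network create elastic
# single-node Elasticsearch, reachable on localhost:9200
docker run -d --name elasticsearch --net elastic -p 9200:9200 \
  -e "discovery.type=single-node" elasticsearch:7.6.2
# Kibana, pointed at the Elasticsearch container above, reachable on localhost:5601
docker run -d --name kibana --net elastic -p 5601:5601 \
  -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:7.6.2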
Configuration
Elasticsearch
The configuration of the Elasticsearch node does not require any customization. However, you may want to look at two parameters in the /etc/elasticsearch/elasticsearch.yml configuration file: 'path.data' and 'path.logs'. As more and more log entries accumulate in the index, the storage required by Elasticsearch will grow, and the default locations in /var/lib and /var/log respectively are probably not what you want in a production environment.
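For example, to move both to a dedicated mount point (the /data/elasticsearch location below is only an illustration, pick whatever suits your environment):
# /etc/elasticsearch/elasticsearch.yml (excerpt)
path.data: /data/elasticsearch/data    # index storage, grows with the number of indexed log entries
path.logs: /data/elasticsearch/logs    # Elasticsearch's own log files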
Logstash
As with Elasticsearch, the default parameters for the data and log files are not ideal. They can be changed in /etc/logstash/logstash.yml. Additionally, you need to tell Logstash how it should interpret the raw log entries that it will get from the Filebeat nodes.
For Logstash to properly parse the Rosetta log files, a filter pipeline should be created. In the folder /etc/logstash/conf.d, create a new text file with the following content:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp_string}\s+%{LOGLEVEL:severity}\s+\[(?<service>[^\]]*)\]\s+\((?<process>[^)]*)\)\s+\[(?<field>[^\]]*)\]\s+(\|\s+%{USER:task}\s+(\|\s+%{USER:task_id}\s+)?\|\s+)?%{GREEDYDATA:text}" ]
  }
  date {
    match => ["timestamp_string", "ISO8601"]
  }
  mutate {
    remove_field => [message, timestamp_string]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "rosetta-logs-%{+YYYY.MM.dd}"
  }
}
You can choose any file name you want, but the extension should be .conf. Change the 'hosts' entry to reflect your installation's Elasticsearch node name and port number. Slide 5 of the presentation gives a hint on how we arrived at the regular expression on line 8.
Note that you can always test the Logstash filter by replacing the input part with input { stdin {} } and the output part with output { stdout { codec => rubydebug } }. You can then run logstash -f <your .conf file>, type any fictitious log entry and see how Logstash interprets and splits it.
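For example, pasting a fabricated entry like the one below (the service, thread and institution names are invented for illustration), the rubydebug output should look roughly like this, showing how the grok pattern splits the entry into separate fields (bookkeeping fields such as @version and host are omitted here):
2023-03-01 10:15:30,123 ERROR [RepositoryWebServices] (ajp-thread-42) [MyInstitution] ORA-00001: unique constraint violated

{
    "@timestamp" => 2023-03-01T10:15:30.123Z,
      "severity" => "ERROR",
       "service" => "RepositoryWebServices",
       "process" => "ajp-thread-42",
         "field" => "MyInstitution",
          "text" => "ORA-00001: unique constraint violated"
}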
Filebeat
The Elastic Beats are small programs that monitor local events and transmit the information to a Logstash node. The installation is lightweight and simple, but you do need to tell Filebeat which events to monitor and which Logstash node to send the information to. Edit the file /etc/filebeat/filebeat.yml and change the 'filebeat.inputs' section to the following:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - <Rosetta log folder>/server.log
  multiline.pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} [A-Z]+"
  multiline.negate: true
  multiline.match: after
The first part tells Filebeat which files to monitor for new lines, while the 'multiline' fields tell it how log entries may be spread over multiple lines: with 'negate: true' and 'match: after', any line that does not start with a timestamp is appended to the log entry above it. Replace <Rosetta log folder> with the path where your Rosetta log files are located.
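For example (again a fabricated entry), these three lines would be shipped to Logstash as a single event, because the second and third line do not start with a timestamp:
2023-03-01 10:15:31,005 ERROR [RepositoryWebServices] (ajp-thread-42) [MyInstitution] unexpected exception
java.lang.NullPointerException
        at com.example.SomeClass.someMethod(SomeClass.java:42)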
While you are at it, create a copy of this file called rosetta-backlog.yml and change the type key to:
- type: stdin
Then remove the paths section under it. This configuration lets you (re-)index old log files with the following command:
zcat "<log file with .gz>" | /usr/share/filebeat/bin/filebeat -once -c /etc/filebeat/reosetta-backlog.yml
Next, in the Outputs section, comment out the Elasticsearch output and set up the Logstash output. It should look like this:
#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  # hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
If required, change the Logstash host name and port on line 18.
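Before starting the service you can use Filebeat's built-in self-test to verify both the configuration file and the connection to the Logstash node:
# check the configuration file for syntax errors
filebeat test config -c /etc/filebeat/filebeat.yml
# check that the configured Logstash output can be reached
filebeat test output -c /etc/filebeat/filebeat.yml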
Kibana
Finally, configure Kibana by modifying the /etc/kibana/kibana.yml file. Make sure to uncomment and correct the 'elasticsearch.hosts' entry to match your configuration if needed. If you are running Kibana behind an Apache or NGINX reverse proxy (you probably should), change kibana.yml appropriately. Detailed information can be found on the Elastic web site, but in short: if you want to access the Kibana web site as 'rosetta.mysite/logs', you should set the following keys:
server.name: "rosetta.mysite"
server.basePath: "/logs"
server.rewriteBasePath: false
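As an illustration, a matching NGINX snippet could look like the sketch below; it assumes Kibana listens on its default port 5601 on the same host, and it strips the /logs prefix before forwarding, which is what 'server.rewriteBasePath: false' expects. Adapt it to your own proxy setup:
# hypothetical NGINX reverse proxy snippet
location /logs/ {
    proxy_pass http://localhost:5601/;    # the trailing slash strips the /logs prefix
    proxy_set_header Host $host;
}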
Starting and stopping
How you start and stop the Elastic stack services is highly dependent on your environment and preferences. Please see the Elastic documentation for more info.
In any case, the preferred order of startup is:
- Elasticsearch
- Logstash
- Filebeat
- Kibana
and the other way around for stopping.
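If you installed the components from the official packages on a systemd-based system, each of them comes with a service unit, so a start sequence could look like this (Filebeat is started on the Rosetta server(s), the other services on the Elastic host):
sudo systemctl start elasticsearch
sudo systemctl start logstash
sudo systemctl start filebeat
sudo systemctl start kibana
Use 'systemctl stop' in the reverse order to shut everything down.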
Using Kibana
How to use Kibana and create queries and dashboards is beyond the scope of this blog post. It is, however, the one thing you should invest some time in: learning Kibana. There are several good video tutorials for Kibana on the Elastic training site. I also recommend subscribing to the Elastic mailing list, where you will be notified of upcoming webinars.
As a quick-and-dirty tutorial, these are some of the things we have done:
Setup
All the Rosetta log entries will be indexed in a separate index per day, called rosetta-logs-<date in YYYY.MM.DD format>, at least if you used the same output.elasticsearch.index value as in the sample Logstash configuration above. We typically want to search all of these indices at the same time, so we need to create an index pattern in Kibana's Management menu. We created the rosetta-logs-* index pattern and, in the 'source filters' tab, added the fields beat.*, host.*, agent.*, _*, input.*, log.*, ecs.* and tags to the source filter list. This index pattern is our default.
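To verify that the daily indices are actually being created and filled, you can ask Elasticsearch directly; assuming the node runs on localhost:9200, the following lists all Rosetta log indices with their document count and size:
curl 'localhost:9200/_cat/indices/rosetta-logs-*?v'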
Visualizations
Here are some of the visualizations we created and placed on a dashboard:
Oracle errors
Type | Metric |
Index | rosetta-logs-* |
Filter | text.keyword : /ORA-.*/ (Lucene syntax) |
Metric Aggregation | Count |
Top 20 Errors
Type | Data Table |
Index | rosetta-logs-* |
Filter | NOT (text:at* or text:more) (KQL syntax) |
Metric Aggregation | Count |
Buckets Aggregation | Terms |
Buckets Fields | text.keyword |
Buckets Order By | metric: Count |
Buckets Order | Descending |
Buckets Size | 20 |
Buckets Custom Label | Errors |
Message count histogram
Type | Vertical Bar |
Index | rosetta-logs-* |
Filter | |
Metric Y-Axis Aggregation | Count |
Buckets X-Axis Aggregation | Date Histogram |
Buckets X-Axis Field | @timestamp |
Buckets X-Axis Interval | Auto |
Buckets Split Series Sub Aggregation | Terms |
Buckets Split Series Field | severity.keyword |
Buckets Split Series Order by | Alphabetical |
Buckets Split Series Order | Ascending |
Buckets Split Series Size | 3 |
Security
By default the Kibana web site is public. Since version 7.1, the free Basic license includes login and role-based protection for the Kibana web site. This requires setting up both Kibana and Elasticsearch to listen on HTTPS. I found the procedure explained here to be comprehensive and complete.
Disclaimer
The method described above was created as a proof of concept at Libis and is still in use today. Be aware that, in order to run this in production, you also have to consider index lifecycle management and backup strategies. By no means do I advocate this as the only or best way to monitor Rosetta log files. There are many tools out there that can be configured to perform the same or similar tasks, sometimes even better. Our choice for the Elastic stack was driven by our familiarity with the software and the possibility to use it for free on our own premises. Your motivation may vary significantly.