While I had the privilege of using ElasticSearch to implement media analytics in a past role, it is only recently that I have started looking at it for log analysis.
The ELK stack from Elastic.co provides three key components that make up the log analysis stack.
- (E) – ElasticSearch: The search engine which will index the log contents
- (L) – Logstash: A collector agent that runs on any machine where you have logs to transmit to ElasticSearch
- (K) – Kibana: A web front end that can be used to visualize the log data, query it and build dashboards.
All three are open source products that you can download and use. Enterprises would probably be interested in getting a support agreement with Elastic. Two key plugins that come with the support plans are Shield and Marvel. If you need your indexes locked down using LDAP or AD, then you need Shield. Marvel is a passive monitoring application that gives you a view into cluster health and even lets you run queries against your index.
Let's leave the commercial plugins aside for now. The general architecture for an ELK stack would look something like this…
- Install Logstash agents (aka collectors or shippers) on any machine you want to source logs from. Logstash supports many input plugins that can be configured to collect logs. In this architecture we write the logs out to a temporary broker such as Redis (you can use other brokers such as Kafka or RabbitMQ). At the ingestion point we can have Logstash break the logs into attributes, which can be indexed as fields within a document in ElasticSearch.
- The Logstash processing pipeline involves: Inputs -> Filters -> Outputs. In the example above we can point Logstash at, say, a log4j input source, apply some filters to break the log statement down into attributes, and use an output plugin to write the logs to a sink.
- If you are standing up a high velocity log analysis platform, it is recommended to use a broker (like Redis) between the log sources and the indexer components. This ensures that the architecture can scale under high volume.
- The Logstash indexer reads the messages from the broker and indexes them into ElasticSearch. The indexer performs batch updates into ElasticSearch for improved performance vs. indexing one log statement at a time. It uses a separate thread to index into ElasticSearch.
- The ElasticSearch index is the heart of our platform. It contains the log data indexed and ready to query.
- Finally Kibana can be configured to read the data from the index and present it to the end user.
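To make the Inputs -> Filters -> Outputs flow above a little more concrete, here is a minimal Python sketch. It is not Logstash itself, only an illustration under some assumptions: a `deque` stands in for the Redis broker, the log4j-style line layout and the index name `logs-2015.06.01` are made up for the example, and the helper names (`parse_line`, `drain_batch`, `to_bulk_body`) are hypothetical. The batch is rendered in the newline-delimited shape used by the Elasticsearch bulk API, but nothing is actually sent to a cluster.

```python
import json
import re
from collections import deque

# Assumed log4j-style layout (hypothetical):
# "<date> <time> <LEVEL> [<thread>] <logger> - <message>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Filter step: break one log statement into named fields."""
    match = LINE_PATTERN.match(line)
    if match is None:
        # Keep unparseable lines, tagged, rather than dropping them.
        return {"message": line, "tags": ["parse_failure"]}
    return match.groupdict()


def drain_batch(broker, batch_size):
    """Indexer step: pull up to batch_size raw lines off the broker."""
    batch = []
    while broker and len(batch) < batch_size:
        batch.append(broker.popleft())
    return batch


def to_bulk_body(docs, index_name):
    """Output step: one action line plus one document line per log event."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# A deque stands in for the broker (Redis in the architecture above).
broker = deque([
    "2015-06-01 10:15:30,123 INFO  [main] com.example.OrderService - Order 42 created",
    "2015-06-01 10:15:31,456 ERROR [worker-1] com.example.PaymentService - Payment declined",
    "not a log4j line",
])

raw = drain_batch(broker, batch_size=2)
docs = [parse_line(line) for line in raw]
bulk_body = to_bulk_body(docs, "logs-2015.06.01")
print(bulk_body)
```

Sending one bulk body per batch rather than one request per log statement is the same idea as the batch updates the indexer performs. In a real deployment Logstash's own filter and output plugins would do this work for you.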
A key question you have to answer for yourself is “Do I need a log analysis tool?”. The answer is: it depends. For example, if you have many logs that you need to make sense of, and you need answers asap without wasting time collecting individual log files, then you will need a log analysis solution. Or, if you have a use case that requires you to understand application behavior, then you might want to consider ELK.
Note: If you need Application Performance Monitoring (APM) then you are better off with something like NewRelic. Use the right tool for the right purpose (assuming you have the dollars to spend).
A few things to remember when implementing a log analysis solution (whether you decide to use ELK, Splunk, Loggly, etc.):
- Pay attention to your log formats, especially when you have control of them. Write out enough log data that can be used to query effectively later.
- Use standard log formats where possible.
- If you have multiple components then consider adding unique identifiers that can be used to correlate the logs.
- Start small and simple and then grow. Don’t try to boil the ocean on day one.
- Ensure you know exactly what use cases you want to use ELK for. Don’t repeat what another tool already does – in case you have other available tools in the enterprise.
- If you don’t like Logstash then evaluate other tools such as Fluentd to ingest the logs into ElasticSearch.
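As an illustration of the points above about log formats and unique identifiers, here is a small Python sketch using the standard `logging` module. It assumes you control the application's log format. The field names (`correlation_id`, `component`), the component names `orders` and `payments`, and the helper names `JsonFormatter` and `build_logger` are all hypothetical choices for this example, not something ELK requires.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "component": record.name,
            # Set by the caller through the `extra` argument; None if absent.
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        })


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


# An in-memory stream stands in for the log files a shipper would read.
stream = io.StringIO()
orders = build_logger("orders", stream)
payments = build_logger("payments", stream)

# One identifier travels with the request across both components.
correlation_id = str(uuid.uuid4())
orders.info("order received", extra={"correlation_id": correlation_id})
payments.info("payment authorised", extra={"correlation_id": correlation_id})

# Later, the shared identifier ties the two log statements together.
events = [json.loads(line) for line in stream.getvalue().splitlines()]
related = [e["component"] for e in events if e["correlation_id"] == correlation_id]
print(related)
```

Because every statement is already a set of named fields, there is little left for the ingestion filters to parse, and a query on the shared identifier pulls back the related statements from every component.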