AWS Beanstalk (running Spring Boot jar) and Log aggregation with ElasticSearch & Filebeat

Most serious applications (and distributed microservices style architectures) need to provide a log aggregation & analysis feature to their dev & operations teams. Reviewing log entries from 10s or 100s of server instances is not something to take lightly. Whether you choose a commercial product or an open source offering does not matter; just make sure you have one available.

Recently I have been deploying applications using AWS Beanstalk. You can definitely configure CloudWatch Logs to send log streams over to AWS ElasticSearch service. Log messages can be routed to a Lambda function which would break the log messages into individual attributes suitable for indexing. I wanted to try a slightly different route where I depend less on CloudWatch Logs and more on open source tools. Enter filebeat on Beanstalk.

In a standard ELK architecture one would use Logstash agents on each server instance to collect the logs, break (grok) the logs into attributes on a central set of Logstash instances and then ingest them into ElasticSearch (and finally serve them up using Kibana). In this blog I show how you can eliminate the use of Logstash completely. Filebeat will watch the logs and send them directly to an ElasticSearch ingest pipeline endpoint (pipelines were introduced in ElasticSearch 5).

Not having to manage a central set of Logstash instances to perform data prep (grok’ing) simplifies the architecture. Note that open source ElasticSearch supports many ingest processors. Unfortunately the AWS ElasticSearch service supports only a few of these. Hopefully this will change over time. The good news is that the Grok processor is supported, and that is what lets us eliminate Logstash.
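To make the idea of grok’ing concrete, here is a minimal Python sketch of what "breaking a log line into attributes" means. It is not how the Grok processor is implemented; it only mimics the effect with a plain regular expression. The regex, the field names and the sample line are my own assumptions, based on what Spring Boot's default log layout usually looks like, so adjust them to your actual log format.

```python
import re

# Hypothetical regex approximating Spring Boot's default log layout:
#   <date> <time>  <LEVEL> <pid> --- [<thread>] <logger> : <message>
# The named groups play the role of the attributes a Grok pattern would extract.
SPRING_BOOT_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<pid>\d+)\s+---\s+"
    r"\[\s*(?P<thread>[^\]]+?)\]\s+"
    r"(?P<logger>\S+)\s+:\s+"
    r"(?P<message>.*)$"
)

# Illustrative sample line (an assumption, not taken from the original post).
SAMPLE = (
    "2019-03-05 10:57:51.112  INFO 45469 --- [           main] "
    "o.s.b.w.e.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http)"
)


def parse_line(line):
    """Return a dict of attributes, or None if the line does not match."""
    match = SPRING_BOOT_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Each key in the resulting dict corresponds to a field you could search or aggregate on in Kibana once the real Grok processor has done the equivalent work inside ElasticSearch.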


  1. Install filebeat on the Beanstalk EC2 instances using ebextensions (the great backdoor provided by AWS to do anything and everything on the underlying servers :))
    • It is important that you never SSH into the individual servers and configure them individually. This is critical since we want to be prepared for scale up or down situations using Auto Scaling. Using ebextensions serves that purpose.
  2. Use the same ebextensions to create the /etc/filebeat/filebeat.yml file on the EC2 instance.
  3. Finally the ebextension will start the filebeat process.
  4. For this example I am interested in collecting application logs from Beanstalk instances (running a Spring Boot jar service) and nginx web server logs. The filebeat.yml focuses on just these two files. But the sample gives you a good idea of how to extend it to many more log files, each with a different format.
  5. Each log file is routed to a specific ElasticSearch ingest pipeline. Each of the pipelines uses Logstash Grok patterns to parse the log format into individual attributes.
  6. Go into the AWS Console and create yourself an ElasticSearch domain. A single node is enough for this exercise. With a single node the status of the ElasticSearch cluster will show as Yellow, which indicates that there is no second node to hold the replica shards. After all, there is no point in copying replicas to the same node that holds the primary shard. If you add a second node then the cluster state will move to Green.
  7. Each pipeline uses one processor (the Grok processor in this case). See the two definitions in pipeline_accesslogs.json and pipeline_applogs.json. Use Kibana (the Dev Tools feature) to create the two pipelines. Only after this is done can Filebeat ingest directly into ElasticSearch.
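As a rough illustration of what a single-processor pipeline looks like, the Python sketch below builds a pipeline definition containing one Grok processor and prints it as JSON. This is not the content of my pipeline_accesslogs.json; it is a minimal stand-in that assumes nginx writes the standard "combined" access log format, which the built-in `%{COMBINEDAPACHELOG}` Grok pattern is meant to match. The helper name `grok_pipeline` and the description text are hypothetical.

```python
import json


def grok_pipeline(description, patterns, field="message"):
    """Build an ingest pipeline body with a single Grok processor.

    `field` is the document field to parse; Filebeat puts the raw
    log line into "message" by default.
    """
    return {
        "description": description,
        "processors": [
            {"grok": {"field": field, "patterns": list(patterns)}}
        ],
    }


# Assumption: nginx uses the standard "combined" log format.
ACCESS_LOG_PIPELINE = grok_pipeline(
    "Parse nginx access logs (combined format)",
    ["%{COMBINEDAPACHELOG}"],
)

if __name__ == "__main__":
    # The printed JSON can be pasted as the request body after
    # `PUT _ingest/pipeline/<your-pipeline-name>` in Kibana Dev Tools.
    print(json.dumps(ACCESS_LOG_PIPELINE, indent=2))
```

If your nginx log format has extra fields, the built-in pattern will likely not match as-is and you would need to write your own pattern, which is why it is worth testing a pipeline with a few real log lines before pointing Filebeat at it.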

Here is the configuration code…

For filebeat.yml the ElasticSearch endpoint URL is fake, so don't bother trying it. Also, I do not have SSL turned on for this example (another day).


A list of common Grok ingest patterns can be found at

The code sample can be found on my GitHub account at . For the purposes of this blog you can ignore all the eureka stuff.

Final note. AWS makes it easy to set up and use ElasticSearch. In exchange for that ease it takes away the ability to use many common plugins or create your own custom plugins. If you can live within those limitations then this service is fine for you. But if you need more control over your cluster then you need to set up your own cluster from scratch or maybe consider another vendor's hosted solution. For serious production use you will need to turn on dedicated master nodes. For this blog don’t bother with that.