How to Integrate AWS CloudTrail Logs in Logstash


Amazon recently announced a service called AWS CloudTrail. It is basically a service provided by the AWS cloud that gives you logs containing all API calls made on your account. Some of the information provided by AWS CloudTrail is mentioned below.

  • API request parameters
  • Source address of the request
  • Service involved in the request
  • Request user agent
  • User identity and ARN from which the request originated
  • Time and region, and much more

 

AWS CloudTrail stores these log files in S3.

If you are new to AWS, S3 is simply a highly scalable object storage solution provided by the AWS cloud.

Read: What is Object Storage and how it differs from Block Storage

 

Now, if you want to view the log files created by AWS CloudTrail, you have to download them, uncompress them, and read them as raw text. The default format in which AWS CloudTrail stores these log messages is JSON.
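For example, inspecting a single log file by hand looks something like the below (the bucket and file names here are only illustrative; CloudTrail names files as account-id_CloudTrail_region_timestamp_unique-string.json.gz).

s3cmd get s3://my-s3-bucket/AWSLogs/123456789012/CloudTrail/us-east-1/2014/06/16/123456789012_CloudTrail_us-east-1_20140616T1030Z_example.json.gz
gunzip 123456789012_CloudTrail_us-east-1_20140616T1030Z_example.json.gz
cat 123456789012_CloudTrail_us-east-1_20140616T1030Z_example.json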

A central logging server like Loggly or Logstash is needed to store them for future analysis. Viewing logs through a central log server like Logstash also gives the end user analytical capabilities: it is easy to search and query what happened during a desired time interval, and so on.

If you are new to Logstash and its central logging features, I would recommend reading the article below to set that up, because a full Logstash setup needs special attention and cannot be covered here.

 

Read: How to Configure Logstash central Logging server with Kibana

 

In my particular use case, I wanted AWS CloudTrail logs to be viewable inside my Logstash Kibana web interface. At first it sounded easy to configure, because Logstash already has features to connect to an S3 bucket and pull logs from there.

Although it looked like a cakewalk in the beginning, nothing worked. So I started googling for a possible fix to integrate CloudTrail with Logstash, and could only find a couple of blog and forum posts on the subject.

 

https://groups.google.com/forum/#!topic/logstash-users/DjtAsS5Wi2o

 

I got a couple of ideas from the Google Groups thread above for working around my problem of integrating CloudTrail with Logstash. My workaround is heavily inspired by an idea shared by a user on that forum.

 

I found the below statement in the official AWS CloudTrail FAQ, and it was quite evident from the files I saw being created inside my S3 bucket.

 

CloudTrail delivers log files to your S3 bucket approximately every 5 minutes. - AWS CloudTrail FAQs

 

An important fact to note about CloudTrail log delivery to your S3 bucket: no log files are delivered if no API calls were made during the last 5 minutes.

In my case, I have a couple of scripts running in my environment that fetch data from the AWS API every second, so I can rest assured that there will be a log file delivery every five minutes (thanks to the API calls made every second by those environment-specific scripts).

 

Now for the logic. We will have to do the following to pull those logs from the S3 bucket and feed them to our Elasticsearch/Logstash.

 

  1. Fetch the latest CloudTrail log file delivered to my S3 bucket every 6 minutes.
  2. Uncompress that JSON file.
  3. Feed it to another file (from which Logstash will pick up log events).

 

It's a good thing that the CloudTrail logs are in JSON format... but in a way it's a bad thing as well when you integrate them with Logstash. This is because, by default, CloudTrail log files are a continuous series of events in an array. Each log file has multiple events inside a single array called Records.
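To make that concrete, a trimmed CloudTrail log file looks something like the below (the field values here are made up for illustration):

{
  "Records": [
    {"eventTime": "2014-06-16T10:00:01Z", "eventName": "DescribeInstances", "eventSource": "ec2.amazonaws.com", "awsRegion": "us-east-1", "sourceIPAddress": "203.0.113.10"},
    {"eventTime": "2014-06-16T10:00:02Z", "eventName": "StopInstances", "eventSource": "ec2.amazonaws.com", "awsRegion": "us-east-1", "sourceIPAddress": "203.0.113.10"}
  ]
}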

 

Now the problem is that Logstash does not handle this continuous array of JSON events nicely. I tried, but the events were not getting ingested into Logstash the way I wanted.

 

As I said before, it's a good thing that CloudTrail stores logs in JSON format, because Logstash can take JSON data from a file quite nicely and feed it to Elasticsearch. The nice thing about JSON events is that Logstash creates well-filtered fields for the user to view inside Kibana; since the data is already JSON, you do not have to write your own custom grok regex filters. The things I used to get CloudTrail working with Logstash are mentioned below.

 

  • s3cmd (a command-line tool for accessing S3 on Linux)
  • Bash scripting, to pull files from S3 every 6 minutes, decompress them, reshape them so that Logstash can read the JSON events as single-line events, and feed them to another file for Logstash to pick up.
  • jq, a tool for transforming JSON data to your requirements. It is something like grep, sed, and awk, but for JSON.

 

# cat /opt/scripts/pull_cloudtrail.sh
#!/bin/bash

# Date components used to build today's CloudTrail S3 prefix
year=$(date +%Y)
month=$(date +%m)
day=$(date +%d)

# Name of the newest log file delivered today (replace the account number placeholder)
filetopull=$(s3cmd ls "s3://my-s3-bucket/AWSLogs/<put your account number here>/CloudTrail/us-east-1/$year/$month/$day/*" | tail -1 | awk '{print $4}')

fileforgunzip=$(basename "$filetopull")
finalfile=$(basename "$filetopull" .gz)

# Download and decompress the newest log file
s3cmd get "$filetopull"
gunzip "$fileforgunzip"

# Split the Records array into single-line JSON events and append them for Logstash
/opt/scripts/jq -c -r -M '.Records[]' "$finalfile" >> /var/log/cloudtrail/cloudtrail.json

# Clean up the downloaded copy
rm -f "$finalfile"

 

The above script assumes that s3cmd is installed on your Logstash server (from where you will run the script as a cron job), configured with proper credentials and full read access to your CloudTrail S3 bucket.
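For example, the crontab entry on that server might look like the below (the 6-minute schedule matches the logic above; redirecting the script's own output to a log file is just a suggestion):

*/6 * * * * /opt/scripts/pull_cloudtrail.sh >> /var/log/cloudtrail/pull_cloudtrail.log 2>&1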

 

The AWS CloudTrail S3 bucket has the following directory structure: it is organized in an account number --> region --> year --> month --> day format. For example, the below URL will contain log files for 16 June 2014 for an AWS account with the imaginary account number 937635387538.

 

 s3://my-cloudtrail-bucket/AWSLogs/937635387538/CloudTrail/us-east-1/2014/06/16/

 

So our script fetches files from our CloudTrail bucket based on the current date. Once it reaches the correct folder, it fetches the name of the last created file, then pulls that file using s3cmd, as shown in the script, and unzips it.

 

Now we need the tool called jq, so that we can split the events inside the Records array into single-line events that Logstash can easily pick up. You can download jq from the link below.

 

Download JQ

 

Simply download jq from the above link and give it executable permission on Linux (chmod +x jq). We will use jq to break the Records array into single-line events, as in the line below from the script.

/opt/scripts/jq -c -r -M '.Records[]' "$finalfile" >> /var/log/cloudtrail/cloudtrail.json
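To see what jq is doing here, a quick made-up example: given a Records array with two events, it emits one compact JSON object per line.

echo '{"Records":[{"eventName":"DescribeInstances"},{"eventName":"StopInstances"}]}' | /opt/scripts/jq -c -r -M '.Records[]'
{"eventName":"DescribeInstances"}
{"eventName":"StopInstances"}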

 

Finally, the script removes the downloaded file from your server (as is evident from the script shown above).

 

Now you need to add an input with a JSON codec to your Logstash central configuration so that it picks up CloudTrail events from our final log file, /var/log/cloudtrail/cloudtrail.json. Add the below to the input section on your Logstash central server (where we are pulling and saving the CloudTrail logs from S3).

 

file {
  type => "cloudtrail"
  path => "/var/log/cloudtrail/cloudtrail.json"
  codec => "json"
}

 

 

You do not need to add anything other than the above input to your Logstash central configuration file; the json codec takes care of the rest.

 

Once you are done adding the above input config, restart your Logstash central server process, and you will see CloudTrail events coming into the Kibana interface.

 

In the Kibana search box, enter type:"cloudtrail" so that Kibana shows all events of type cloudtrail from Elasticsearch. As all of the events we direct to our cloudtrail.json file have been trimmed by jq into single-line JSON events, Kibana will show all the JSON fields provided by CloudTrail.

 

Things to note about this CloudTrail Logstash integration:

 

  1. Add a logrotate rule for our pulled logs in /var/log/cloudtrail/cloudtrail.json, so that the file does not grow above a certain size (a sample stanza is shown after this list).
  2. In my specific architecture, I am sure that logs will be created every five minutes, as I have a couple of scripts running all the time that query the AWS API. As per the AWS FAQs, if there are no API requests during a period, no logs are delivered to S3. In that case, our pulling down of the last created file from S3 can cause duplicate entries.
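A minimal logrotate sketch for the first point, assuming the standard /etc/logrotate.d/ layout (the size limit and rotation count are just placeholders):

# /etc/logrotate.d/cloudtrail
/var/log/cloudtrail/cloudtrail.json {
    size 100M
    rotate 4
    compress
    missingok
    notifempty
}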

 

So I would recommend adding another piece of logic to the script that compares the previously downloaded file with the file currently being downloaded; if the file names are the same, skip the download, otherwise you will add duplicate events.
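A minimal sketch of that check, assuming a hypothetical state file /var/run/cloudtrail_last_pull that records the name of the last file we downloaded. It would slot into the script right after $filetopull is set.

statefile=/var/run/cloudtrail_last_pull
lastpulled=$(cat "$statefile" 2>/dev/null)

# Skip this run if the latest file in S3 is the one we already pulled
if [ "$filetopull" = "$lastpulled" ]; then
    exit 0
fi
echo "$filetopull" > "$statefile"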

 

Hope this article was helpful in getting CloudTrail integrated with your Logstash central log server. There is another working method to achieve this, discussed in the link below (it involves a custom Logstash jar file). Let me know your reviews through comments.

http://techblog.mdsol.com/2014/01/27/parsing-amazon-cloudtrail-json-logs-with-a-customized-logstash-build.html


Comments

Hi, I am getting an error while executing the script:

s3://s195-cloudtrail/AWSLogs/858677348233/CloudTrail/us-east-1/2014/08/20/858677348233_CloudTrail_us-east-1_20140820T0645Z_H8JS8vTDcxJwAdB9.json.gz -> ./858677348233_CloudTrail_us-east-1_20140820T0645Z_H8JS8vTDcxJwAdB9.json.gz [1 of 1]
4734 of 4734 100% in 0s 172.05 kB/s done
./pull_cloudtrail.sh: line 11: /opt/scripts/jq: is a directory
cat: 858677348233_CloudTrail_us-east-1_20140820T0645Z_H8JS8vTDcxJwAdB9.json : No such file or directory

Please help.

We wrote a container called TrailDash (https://github.com/AppliedTrust/traildash) that does exactly this!

Hi,
I'm trying to achieve this using a sample CloudTrail file stored locally, but it's not currently working. I hope you can guide me on whether I still need to install any plugin?
