If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause requests made against a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.
But there’s a problem–AWS goes absolutely bonkers when it comes to writing logs in S3. Multiple files will be written per minute, each with as few as one event in them. It comes out looking like this:
2019-09-14 13:26:38 835 s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7
2019-09-14 13:26:46 333 s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A
2019-09-14 13:26:55 333 s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257
2019-09-14 13:26:56 333 s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9
At that rate, you will easily wind up with tens of thousands of logfiles per day. Yikes.
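If you want to see how bad things are in your own bucket, here is a minimal Python sketch that counts logfiles per day. It assumes you have saved the output of an `aws s3 ls --recursive` listing like the one above, and that each logfile name starts with a `YYYY-mm-DD` date, as in the sample. The function name `count_logfiles_per_day` is made up for illustration.

```python
from collections import Counter

def count_logfiles_per_day(listing_lines):
    """Count log objects per day, based on the date prefix of each filename."""
    counts = Counter()
    for line in listing_lines:
        parts = line.split(None, 3)  # modified date, modified time, size, key
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        filename = parts[3].strip().rsplit("/", 1)[-1]
        # Logfile names start with YYYY-mm-DD (first 10 characters)
        counts[filename[:10]] += 1
    return counts

# The four listing lines from the sample above
sample = [
    "2019-09-14 13:26:38 835 s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7",
    "2019-09-14 13:26:46 333 s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A",
    "2019-09-14 13:26:55 333 s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257",
    "2019-09-14 13:26:56 333 s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9",
]

for day, total in sorted(count_logfiles_per_day(sample).items()):
    print(day, total)
```

Feed it a full day's listing and the per-day totals should make the case for rollups pretty quickly.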
Dealing With So Many Logfiles
Wouldn’t it be nice if there were a way to roll up those files so they could be condensed into fewer, bigger files? There is. Start by grabbing the code and installing Serverless:
git clone git@github.com:dmuth/aws-s3-server-access-logging-rollup.git
npm install -g serverless
Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.
So once you have the code, here’s how to deploy it:
cp serverless.yml.example serverless.yml
vim serverless.yml # Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.
You’ll want to edit serverless.yml to list your source and destination buckets, as well as the level of rollup desired (hourly, daily, or monthly). I recommend starting with daily rollups.
Once the app is deployed, you can either wait for the first invocation of the script, or if you’re impatient and want to see results now, you can invoke the script manually:
serverless invoke -f rollup -l
If the app runs without any errors, you will see rolled up logfiles in the destination directory and the original logfiles will be removed. This will keep the number of logfiles to a minimum. Also, the app will recursively go through your source directory, and preserve that directory structure when writing the rolled up logfiles:
2019-09-10 20:02:56 3930277 rollup-day/www.pa-furry.org/2019-09-10
2019-09-11 20:02:56 4304119 rollup-day/www.pa-furry.org/2019-09-11
2019-09-12 20:02:56 3991237 rollup-day/www.pa-furry.org/2019-09-12
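To make the mapping from source logfiles to rolled-up files concrete, here is a rough Python sketch of the idea. It is not the app's actual code: the function names `rollup_key` and `group_for_rollup`, the prefix lengths, and the naming of hourly and monthly outputs are assumptions based on the filenames in the listings above (only the daily layout is shown in this post).

```python
from collections import defaultdict

# How much of the "YYYY-mm-DD-HH-MM-SS-..." filename to keep per rollup level.
# Assumed for illustration; only the daily naming appears in the listings above.
PREFIX_LENGTHS = {"hourly": 13, "daily": 10, "monthly": 7}

def rollup_key(source_key, source_prefix, dest_prefix, level="daily"):
    """Map one source logfile key to the rolled-up key it would land in."""
    if not source_key.startswith(source_prefix + "/"):
        raise ValueError(f"{source_key!r} is not under {source_prefix!r}")
    relative = source_key[len(source_prefix) + 1:]
    # Keep the directory structure, truncate the filename to the period
    subdir, _, filename = relative.rpartition("/")
    period = filename[:PREFIX_LENGTHS[level]]
    return "/".join(part for part in (dest_prefix, subdir, period) if part)

def group_for_rollup(source_keys, source_prefix, dest_prefix, level="daily"):
    """Group source keys by the rolled-up key they would be merged into."""
    groups = defaultdict(list)
    for key in source_keys:
        groups[rollup_key(key, source_prefix, dest_prefix, level)].append(key)
    return dict(groups)

keys = [
    "s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7",
    "s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A",
]
for dest, sources in group_for_rollup(keys, "s3", "rollup-day").items():
    print(dest, len(sources))
```

With daily rollups, both sample keys end up in `rollup-day/www.pa-furry.org/2019-09-14`, which matches the shape of the listing above. A real rollup would then concatenate the contents of each group, write the result, and delete the originals.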
Analyzing The Rolled Up Logfiles
Now that there are far fewer logfiles, how does one go about analyzing them? Guess what? I built an app for that too, because of course I did. The app in question is built on top of Splunk Lab and is fairly straightforward to install.
aws s3 sync s3://my-accesslogs/rollup-day/ logs
bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-aws-s3-server-accesslogs/master/go.sh)
The above commands will pull down your logs into a directory called logs/, and then start a Docker container for the Splunk image. The logs will be ingested automatically, and you’ll then be able to log into https://localhost:8000/ and view some pretty graphs.
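If you just want a quick sanity check on the synced files before firing up Splunk, a few lines of Python will do. This is a sketch only: it assumes the standard space-delimited S3 server access log layout (bucket owner, bucket, bracketed timestamp, remote IP, requester, request ID, operation, key, quoted request URI, HTTP status, error code, bytes sent, and so on), and the sample line is made up for illustration rather than taken from a real log.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log line; trailing fields are ignored.
LOG_LINE = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'(?P<request_uri>"[^"]*"|\S+) (?P<status>\S+) (?P<error_code>\S+) '
    r'(?P<bytes_sent>\S+)'
)

def parse_line(line):
    """Return a dict of the leading fields, or None if the line doesn't match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

def status_counts(lines):
    """Count requests per HTTP status code, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields:
            counts[fields["status"]] += 1
    return counts

# Illustrative line only (not real log data)
SAMPLE = (
    '79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be '
    'www.pa-furry.org [14/Sep/2019:17:26:37 +0000] 192.0.2.3 - '
    '3E57427F3EXAMPLE REST.GET.OBJECT index.html '
    '"GET /index.html HTTP/1.1" 200 - 2662 2662 7 6 "-" "curl/7.54.0" -'
)

fields = parse_line(SAMPLE)
print(fields["operation"], fields["key"], fields["status"])
print(status_counts([SAMPLE, "not a log line"]))
```

To run it against real data, open each file under logs/ and pass its lines to `status_counts`. For anything beyond quick counts, the Splunk dashboards are the better tool.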
So there you have it–a way to consolidate your AWS logfiles and then perform some analytics on them.
Do you have another way to consolidate or do analytics on your AWS logs? Let me know in the comments!
View on my blog at: https://www.dmuth.org/doing-rollups-of-aws-s3-server-access-logs/