Analyzing WAF logs at scale
September 12, 2021
We can push AWS WAF logs to S3 using Kinesis Data Firehose and then query them efficiently with Amazon Athena and AWS Glue.
We may need to run a Glue crawler once, up front, to identify the schema of the WAF logs, and then write a script that adds partitions manually. This approach avoids running crawlers periodically, which can be quite expensive!
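As a rough illustration of what such a partition script might look like, the sketch below only prints an Athena `ALTER TABLE ... ADD PARTITION` statement for one day of hourly partitions. Several things here are assumptions rather than details from this setup: the function name `partition_ddl`, the table name `waf_logs`, the bucket `example-bucket`, string partition columns named `year`/`month`/`day`/`hour`, and Firehose writing under its default UTC `YYYY/MM/DD/HH/` prefix layout.

```shell
#!/usr/bin/env bash
# Sketch only: prints DDL, does not talk to AWS.
# Assumed (hypothetical) names: table waf_logs, bucket example-bucket,
# partition columns year/month/day/hour, Firehose default prefix YYYY/MM/DD/HH/.

# Usage: partition_ddl <table> <s3-base-uri> <YYYY> <MM> <DD>
partition_ddl() {
  local table="$1" base="${2%/}" year="$3" month="$4" day="$5"
  local hour

  # One statement can register several partitions at once.
  printf 'ALTER TABLE %s ADD IF NOT EXISTS\n' "$table"
  for hour in 00 01 02 03 04 05 06 07 08 09 10 11 \
              12 13 14 15 16 17 18 19 20 21 22 23; do
    # Each partition points at the S3 folder holding that hour's log files.
    printf "  PARTITION (year='%s', month='%s', day='%s', hour='%s') LOCATION '%s/%s/%s/%s/%s/'\n" \
      "$year" "$month" "$day" "$hour" "$base" "$year" "$month" "$day" "$hour"
  done
  printf ';\n'
}

# Example call with placeholder values:
partition_ddl waf_logs "s3://example-bucket/waf-logs" 2021 09 12
```

The printed statement could then be pasted into the Athena query editor or submitted with `aws athena start-query-execution`. Running something like this on a schedule, a day ahead, is one way to keep partitions in place before the data arrives.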
Once we add partitions, we might see some unusual files being generated in S3. Since we are pre-creating partitions, Hive adds temporary folders with naming patterns such as <name>_$folder$. We can safely delete these folders once the partitions are created.
Command to delete those temporary files (the include pattern is single-quoted so the shell does not expand $folder as a variable):
aws s3 rm s3://<bucket-name/prefix.../> --recursive --exclude "*" --include '*$folder$'
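Before deleting anything, it can be worth previewing which keys would match. `aws s3 rm` accepts a `--dryrun` flag for this. As a local alternative, the sketch below filters a list of object keys down to those ending in the literal suffix `_$folder$`. The function name `only_folder_markers` and the sample keys are made up for illustration, and the function never contacts AWS.

```shell
#!/usr/bin/env bash
# Sketch only: reads S3 object keys on stdin (one per line) and prints the
# ones that end in the literal suffix _$folder$.
# The function name and the sample keys below are hypothetical.
only_folder_markers() {
  # Single quotes keep the shell from expanding $folder. Inside the regex,
  # \$ matches a literal dollar sign and the final $ anchors the match to the
  # end of the key. "|| true" avoids a failure status when nothing matches.
  grep -E '_\$folder\$$' || true
}

# Example with made-up keys; only the first and third lines are printed.
printf '%s\n' \
  'waf/2021/09/12/07_$folder$' \
  'waf/2021/09/12/07/part-0001.gz' \
  'waf/2021/09_$folder$' | only_folder_markers
```

Anchoring the match at the end of the key is deliberately narrow, so log files that merely contain a dollar sign somewhere in their name are left out of the list.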