Posts

Showing posts from October, 2017

AWS Athena and S3 Partitioning

Athena is a great tool for querying data stored in S3 buckets. To get the best performance and keep the files properly organized, I wanted to use partitioning. Although partitioning is a very common practice, I couldn't find a nice, simple tutorial explaining in detail how to store and configure the files in S3 so that I could take full advantage of Athena's partitioning features. After some testing, I managed to figure out how to set it up. This is what I did: Starting from a CSV file with a datetime column, I wanted to create an Athena table partitioned by date. A basic Google search led me to this page, but it was lacking in detail. The biggest catch was understanding how the partitioning works. Based on a datetime column (processeddate), I had to split the date into its year, month and day components to create new derived columns, which I would in turn use as the partition keys for my table. Example of d...
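
To make the date-splitting step concrete, here is a minimal Python sketch of deriving the year, month and day values from a processeddate string and turning them into an S3 key prefix. It is an illustration under a few assumptions, not the exact code from the post: the timestamp format, the zero-padded values, the Hive-style `key=value` folder names, the bucket name and the helper names `partition_columns` and `partition_prefix` are all hypothetical.

```python
from datetime import datetime


def partition_columns(processeddate: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> dict:
    """Split a datetime string into year/month/day partition values.

    The default format is an assumption; adjust it to match the CSV.
    """
    ts = datetime.strptime(processeddate, fmt)
    # Zero-padded strings keep the S3 prefixes sorted lexicographically.
    return {
        "year": f"{ts.year:04d}",
        "month": f"{ts.month:02d}",
        "day": f"{ts.day:02d}",
    }


def partition_prefix(base_prefix: str, processeddate: str) -> str:
    """Build a Hive-style prefix such as .../year=2017/month=10/day=05/."""
    parts = partition_columns(processeddate)
    folders = "/".join(f"{key}={value}" for key, value in parts.items())
    return f"{base_prefix.rstrip('/')}/{folders}/"


if __name__ == "__main__":
    # "example-bucket" is a placeholder, not a real bucket from the post.
    print(partition_prefix("s3://example-bucket/data", "2017-10-05 13:45:00"))
```

Rows sharing the same derived values would be written under the same prefix, so a query that filters on year, month and day only needs to scan the matching folders.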