Why doesn't my MSCK REPAIR TABLE query add partitions to the AWS Glue Data Catalog?

2 minute read
0

When I run my MSCK REPAIR TABLE query, Amazon Athena returns a list of partitions. However, Athena fails to add the partitions to the table in the AWS Glue Data Catalog.

Short description

The following common reasons cause Athena to fail to add partitions to tables in the Data Catalog:

  • The AWS Identity and Access Management (IAM) user or role doesn't have a policy that allows the glue:BatchCreatePartition action.
  • The Amazon Simple Storage Service (Amazon S3) path is in camel case instead of lower case.
  • The partitions aren't formatted as the Hive style format, year=2023/month=01/day=01. Instead, the partitions are in the non-Hive style format, /2023/01/01/.

Resolution

Allow glue:BatchCreatePartition in the IAM policy

Review the IAM policies that are attached to the user or role that's used to run MSCK REPAIR TABLE. When you use the Data Catalog with Athena, the IAM policy must allow the glue:BatchCreatePartition action. If the policy doesn't allow this action, then Athena can't add partitions to the metastore. For an example of an IAM policy that allows the glue:BatchCreatePartition action, see AWS managed policy: AmazonAthenaFullAccess.

Change the Amazon S3 path to lower case

For MSCK REPAIR TABLE to add the partitions to Data Catalog, the Amazon S3 path name must be in lower case. 

For example, if the Amazon S3 path is in camel case, userId, then the following partitions aren't added to the Data Catalog:

  • s3://awsdoc-example-bucket/path/userId=1/
  • s3://awsdoc-example-bucket/path/userId=2/
  • s3://awsdoc-example-bucket/path/userId=3/

To resolve this issue, use the lower case userid:

  • s3://awsdoc-example-bucket/path/userid=1/
  • s3://awsdoc-example-bucket/path/userid=2/
  • s3://awsdoc-example-bucket/path/userid=3/

Run the ALTER TABLE ADD PARTITION command

Run the ALTER TABLE ADD PARTITION command to add partitions to the glue table for both Hive style and non-Hive style partitioned data.

The LOCATION clause is required for non-Hive style partitioned data. Make sure to include the full Amazon S3 path for the prefix that contains the partition data:

ALTER TABLE orders ADD
 
PARTITION (year = '2023', month = '01', day ='01') 
LOCATION 's3://mystorage/path/to/test/'

PARTITION (year = '2023', month = '01', day ='02') 
LOCATION 's3://mystorage/path/to/test/';

Related information

Partitioning data in Athena

Actions, resources, and condition keys for Amazon Athena

Actions, resources, and condition keys for AWS Glue

AWS OFFICIAL
AWS OFFICIALUpdated 3 months ago