How do I resolve processing errors in Amazon Neptune Bulk Loader?

I'm trying to use Amazon Neptune Bulk Loader to load data from an Amazon Simple Storage Service (Amazon S3) bucket. However, some of the requests fail. How do I troubleshoot this?

Short description

To troubleshoot load requests that keep failing, check the status of each load job. Then, identify the failed jobs in one of the following ways:

  • Use the default Bulk Loader API for each individual load and check each job's status.
  • Use an automated admin script that reports the status of all load jobs in one run. You can create and run the script on a Linux or UNIX system.

Note these limitations:

  • The Neptune Bulk Loader API doesn't provide a snapshot view of all load operations.
  • If AWS Identity and Access Management (IAM) authorization is enabled on the Neptune cluster, then the requests to the Bulk Loader API must be signed with AWS Signature Version 4.
  • The Bulk Loader API keeps information about only the most recent 1,024 load jobs, and it stores error details for only the last 10,000 errors per job.
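For the IAM case, one way to sign a request is curl's built-in Signature Version 4 support. The following is a minimal sketch, not an official AWS method: it assumes curl 7.75.0 or later, credentials in the standard AWS_* environment variables, and neptune-db as the signing service name. The function name signed_loader_get and the CURL_BIN override are hypothetical and exist only for illustration. Older curl releases are known to mis-sign some URLs that carry query strings, so test it against your own cluster.

```shell
# Hypothetical helper: send a SigV4-signed GET to a Bulk Loader URL.
# $1 = full URL, $2 = AWS Region (assumed default: us-east-1).
# CURL_BIN is an illustrative override that allows a dry run (CURL_BIN=echo).
signed_loader_get() {
    url="$1"
    region="${2:-us-east-1}"
    # Build the curl arguments; "neptune-db" is the assumed signing service name.
    set -- --silent --aws-sigv4 "aws:amz:${region}:neptune-db" \
        --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}"
    # Temporary credentials also need the session token header.
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        set -- "$@" -H "x-amz-security-token: ${AWS_SESSION_TOKEN}"
    fi
    "${CURL_BIN:-curl}" "$@" "$url"
}
```

With credentials exported, signed_loader_get "https://cluster-endpoint:Port/loader" us-east-1 would take the place of the unsigned curl -G calls used in this article.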

Resolution

Use the default Bulk Loader API

1.    Retrieve the loader IDs:

$ curl -G  'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader'|jq
{
  "status": "200 OK",
  "payload": {
    "loadIds": [
      "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
      "6f6342fb-4ea3-452c-ac69-b4d117e37d5a",
      "647114a6-6ed4-4018-896c-e84a08fcf864",
      "521d33fa-7050-44d7-a961-b64ef4e2d1db",
      "d0d4714e-7cf8-415e-89f5-d07ed2732bf2"
    ]
  }
}

2.    Check each job's status, one by one, to verify that the job was successful:

curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader/c32bbd24-99a7-45ee-972c-21b7b9cab3e2?details=true&errors=true&page=1&errorsPerPage=3'|jq
{
  "status": "200 OK",
  "payload": {
    "feedCount": [
      {
        "LOAD_COMPLETED": 2
      }
    ],
    "overallStatus": {
      "fullUri": "s3://demodata/neptune/",
      "runNumber": 5,
      "retryNumber": 0,
      "status": "LOAD_COMPLETED",
      "totalTimeSpent": 3,
      "startTime": 1555574461,
      "totalRecords": 8,
      "totalDuplicates": 8,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 0
    },
    "errors": {
      "startIndex": 0,
      "endIndex": 0,
      "loadId": "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
      "errorLogs": []
    }
  }
}
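Checking jobs one by one gets tedious when there are many load IDs. The two steps above can be sketched as a pair of small jq filters: one turns the list response into status URLs, and the other reduces a status response to a single line. The function names status_urls and summarize_load are hypothetical, and the sketch assumes that jq is installed and that the responses have the shape shown above.

```shell
# Hypothetical helpers; each reads a Bulk Loader JSON response on stdin.

# Print one Get-Status URL per load ID found in the /loader response.
# $1 = loader endpoint, for example https://cluster-endpoint:Port/loader
status_urls() {
    jq -r --arg ep "$1" \
        '.payload.loadIds[] | "\($ep)/\(.)?details=true&errors=true&page=1&errorsPerPage=3"'
}

# Print "<status> errors=<n>", where n is the sum of the three error
# counters in payload.overallStatus.
summarize_load() {
    jq -r '.payload.overallStatus
           | "\(.status) errors=\(.parsingErrors + .datatypeMismatchErrors + .insertErrors)"'
}
```

For the status response shown in step 2, piping the curl output to summarize_load prints LOAD_COMPLETED errors=0.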

Use an admin script

You can use an admin script to identify failed Neptune Bulk Loader jobs in your production process. For every load job, the admin script generates one line of output in the following format:

Starttime-loadid:status,S3location,Errors

Note: The admin script can be used from any Linux system that has access to the Neptune cluster.

Create and run the automated script on a Linux or UNIX system

1.    Create the script using a text editor:

$ vi script

2.    Add the following code to the script. Be sure to replace cluster-endpoint:Port with your cluster endpoint and port:

#!/bin/bash
# Print one line per cached Bulk Loader job: start time, load ID, status, S3 location, errors.
cluster_ep="https://cluster-endpoint:Port/loader"

for loadId in $(curl --silent -G "${cluster_ep}?details=true" | jq '.payload.loadIds[]');
do
        # jq prints each ID as a JSON string, so strip the surrounding quotes.
        clean_loadId=$(echo -n ${loadId} | tr -d '"')
        # Convert the job's epoch start time to a readable date (GNU date syntax).
        time=$(date -d@$(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.startTime'))
        echo -n $time '-'
        echo -n ${clean_loadId}: $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.status')
        echo -n ',S3 LOCATION': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.fullUri')
        # Request the first page of error logs, three errors per page.
        echo -n ',ERRORS': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true&errors=true&page=1&errorsPerPage=3" | jq '.payload.errors.errorLogs')

        echo
done

3.    Save the script, and then provide permissions for the script to run:

chmod +x script

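4.    Install jq, which the script uses to parse the JSON responses. On Amazon Linux or another yum-based distribution, run the following command: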

sudo yum install jq

5.    Run the script:

$ ./script

This is example output:

Thu Apr 18 08:01:01 UTC 2019 -c32bbd24-99a7-45ee-972c-21b7b9cab3e2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:04:00 UTC 2019 -6f6342fb-4ea3-452c-ac69-b4d117e37d5a: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:01:30 UTC 2019 -647114a6-6ed4-4018-896c-e84a08fcf864: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:36:02 UTC 2019 -521d33fa-7050-44d7-a961-b64ef4e2d1db: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:35:45 UTC 2019 -d0d4714e-7cf8-415e-89f5-d07ed2732bf2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
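When most jobs succeed, the failed ones are easy to miss in this listing. A minimal sketch of a filter that hides completed jobs follows. The function name failed_only is hypothetical, and the pattern assumes the exact line format shown above. Jobs that are still running or queued also pass through, because their status is not "LOAD_COMPLETED".

```shell
# Hypothetical filter: read the script's output on stdin and keep only
# the lines whose status is not "LOAD_COMPLETED".
failed_only() {
    # grep -v exits with status 1 when every line is filtered out;
    # "|| true" keeps that case from looking like an error.
    grep -v ': "LOAD_COMPLETED",' || true
}
```

For example, ./script | failed_only prints nothing for the output above, because all five jobs completed.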

Related information

Example: Loading data into a Neptune DB instance

Neptune Loader Get-Status API

AWS OFFICIAL
Updated 3 years ago