AWS for Industries
Ingest Amazon Retail Data into a Serverless Modern Data Architecture
Consumer packaged goods (CPG) companies, which sell a vast number of products through Amazon.com, need scalable, cost-effective, and accessible data visibility into Amazon Seller and Vendor Central. That visibility lets them manage orders, keep product catalogs up to date, access inventory insights, and track sales, shipments, and payments.
Over the past several years, CPG companies have outsourced or built tools to scrape data from Amazon to get the necessary information. Recently, Amazon created the Selling Partner API (SP-API) to make Amazon retail data programmatically accessible for developers.
In this blog post, we provide an overview of a Retail and CPG client application that uses Amazon Web Services, Inc. (AWS) serverless and managed services, and we show how to integrate that application with a Modern Data Architecture using SP-APIs. An IDC study found that organizations using AWS serverless services were able to drive business value in four main areas: cost savings, staff productivity, operational resilience, and business agility. This solution adds to that value by pulling in Amazon retail data, which can help organizations make better data-driven decisions.
This solution provides a best practice approach to help CPG companies achieve their Amazon retail data analytics goals by replacing data scraping with a more modern and efficient API-based approach. This solution is relevant to sellers, vendors, and third-party Amazon sellers who manage multi-brand Amazon sites.
Overview of solution
Using this solution, customers can land their Amazon data in their own storage bucket or in their data lake on AWS. AWS works closely with Amazon SPDS (Selling Partner Development Services) and can stay ahead of any changes to the data APIs used to interface with and ingest data from Amazon Seller and Vendor Central. The AWS solution removes the complexity of tracking those API changes and speeds up delivery for customers.
Figure 1 – High-level overview of Amazon Seller and Vendor Central Data Producer
Walkthrough
The AWS solution consists of four main components:
- Authentication and Authorization
- Serverless Reports Application
- Serverless Catalog Items and Listing Items Applications
- Data Storage, Movement, and Insights
Figure 2 – High-level overview of Amazon Seller and Vendor Central Data Producer Authentication and Authorization
Authentication and Authorization
In order to interact with the Selling Partner APIs (SP-APIs), we must first register as a developer. Since this is either a private seller or vendor application, we follow the steps from the SP-API documentation entitled To register as a private developer for private seller applications or To register as a private developer for private vendor applications.
The core component of our application is AWS Step Functions, a serverless orchestration service that allows us to centrally manage a workflow. The steps of our AWS Step Functions workflow make API calls to the SP-API endpoint. To register our application with the SP-API, follow the steps from the SP-API documentation entitled Registering your Application.
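To make the orchestration idea concrete, here is a minimal sketch of an Amazon States Language definition that chains three AWS Lambda tasks, matching the authenticate, call the SP-API, and store sequence described later in this post. The state names, the Lambda ARNs, and the retry settings are hypothetical placeholders and are not taken from the original solution.

```python
import json

# Hypothetical Lambda ARNs; replace with the functions deployed in your account.
AUTH_FN = "arn:aws:lambda:us-east-1:111122223333:function:sp-api-auth"
PROCESS_FN = "arn:aws:lambda:us-east-1:111122223333:function:sp-api-process"
STORE_FN = "arn:aws:lambda:us-east-1:111122223333:function:sp-api-store"


def build_state_machine(auth_arn, process_arn, store_arn):
    """Return an Amazon States Language definition with three chained tasks."""
    return {
        "Comment": "Sketch: authenticate, call the SP-API, store the result",
        "StartAt": "Authenticate",
        "States": {
            "Authenticate": {
                "Type": "Task",
                "Resource": auth_arn,
                "Next": "CallSpApi",
            },
            "CallSpApi": {
                "Type": "Task",
                "Resource": process_arn,
                # Assumed retry policy for transient failures; tune as needed.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 5,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "Next": "StoreResult",
            },
            "StoreResult": {
                "Type": "Task",
                "Resource": store_arn,
                "End": True,
            },
        },
    }


# Serialize to the JSON document that a state machine definition expects.
definition_json = json.dumps(
    build_state_machine(AUTH_FN, PROCESS_FN, STORE_FN), indent=2
)
```

The resulting JSON string could be supplied as the definition when creating the state machine, for example through the console or an infrastructure-as-code template.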
The authorization model for the SP-API is based on Login with Amazon, Amazon’s implementation of OAuth 2.0. Since it is a private application, we use the self-authorization procedure from the SP-API documentation entitled To self-authorize your application (seller application) or To self-authorize your application (vendor application). Each time we choose “Authorize App,” a new Login with Amazon refresh token is generated and displayed. A Login with Amazon refresh token is a long-lived token that we will exchange later for an access token. Generating a new refresh token does not invalidate previous refresh tokens. If we have multiple seller accounts or vendor groups, we can save a refresh token for each one. To securely store our refresh tokens, we create a secret in AWS Secrets Manager, a secrets management service that enables us to rotate, manage, and retrieve our Login with Amazon refresh and access tokens.
For each API call we make to the SP-API, we must include a Login with Amazon access token. To do this, for each application we create, an AWS Lambda function will be used as an authentication function. The authentication function follows this workflow:
- Check with AWS Secrets Manager for a valid Login with Amazon access token
- If no valid Login with Amazon access token exists, call AWS Secrets Manager to retrieve a Login with Amazon refresh token
- Make a secure HTTP POST to the Login with Amazon authentication server to exchange the refresh token for an access token
- A successful response includes a Login with Amazon access token along with an expires_in value represented in seconds (a Login with Amazon access token expires in one hour after it is issued)
- The Login with Amazon access token is then cached in AWS Secrets Manager and can be used for additional calls before it expires to avoid having to retrieve a new access token before each call
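The workflow above can be sketched as follows. This is a minimal illustration, not the solution's actual function. The `store` argument is a hypothetical dict-like stand-in for the secret held in AWS Secrets Manager, the key names (`access_token`, `expires_at`, and so on) and the 60-second safety margin are assumptions, and the HTTP call and clock are injectable so the logic can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

LWA_TOKEN_URL = "https://api.amazon.com/auth/o2/token"
EXPIRY_MARGIN_SECONDS = 60  # assumed margin: refresh slightly before expiry


def post_form(url, fields):
    """POST form-encoded fields over HTTPS and return the decoded JSON body."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def get_access_token(store, http_post=post_form, now=time.time):
    """Return a cached access token, or exchange the refresh token for a new one.

    `store` is a dict-like placeholder for the secret; in the real application
    it would be read from and written back to AWS Secrets Manager.
    """
    cached = store.get("access_token")
    if cached and store.get("expires_at", 0) - EXPIRY_MARGIN_SECONDS > now():
        return cached  # still valid, so no call to Login with Amazon is needed

    response = http_post(LWA_TOKEN_URL, {
        "grant_type": "refresh_token",
        "refresh_token": store["refresh_token"],
        "client_id": store["client_id"],
        "client_secret": store["client_secret"],
    })
    # Cache the new token together with its absolute expiry time.
    store["access_token"] = response["access_token"]
    store["expires_at"] = now() + int(response["expires_in"])
    return store["access_token"]
```

Storing an absolute expiry time next to the token lets every invocation decide locally whether the cached token is still usable, which avoids a token request before each SP-API call.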
Figure 3 – High-level overview of Amazon Seller and Vendor Central Data Producer Serverless Reports Application
Serverless Reports Application
Now that we have discussed how to register and authorize an SP-API application, we will cover how to build out different applications, starting with the Reports Application. The Reports Application is a serverless application that interacts with the Reports API of the SP-API. For sellers, our Reports Application uses notifications to automatically retrieve and process the reports we create. However, vendor applications do not yet support the REPORT_PROCESSING_FINISHED notification type and must instead use a polling method to retrieve reports. This section covers building the automated notifications workflow for seller applications.
In order to receive these notifications, we must first subscribe to the notification type of interest. This architecture leverages an AWS Lambda function to subscribe our application to the REPORT_PROCESSING_FINISHED notification type. To automate this workflow, we leverage Amazon Simple Queue Service (Amazon SQS) by following this tutorial: Set up notifications (Amazon Simple Queue Service workflow).
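As a rough illustration of what the subscription function sends, the sketch below builds the two requests involved: one that registers the Amazon SQS queue as a destination and one that subscribes to the notification type using the returned destination ID. The request paths, body field names, and the default `payloadVersion` value reflect our reading of the Notifications API and should be checked against the current SP-API reference. The function names and example ARN are hypothetical, and authorization and the HTTP calls themselves are omitted.

```python
import json


def build_destination_request(endpoint, name, sqs_queue_arn):
    """Describe the request that registers an SQS queue as a destination."""
    return {
        "method": "POST",
        "url": f"{endpoint}/notifications/v1/destinations",
        "body": json.dumps({
            "name": name,
            "resourceSpecification": {"sqs": {"arn": sqs_queue_arn}},
        }),
    }


def build_subscription_request(endpoint, notification_type, destination_id,
                               payload_version="1.0"):
    """Describe the request that subscribes a destination to a notification type."""
    return {
        "method": "POST",
        "url": f"{endpoint}/notifications/v1/subscriptions/{notification_type}",
        "body": json.dumps({
            "payloadVersion": payload_version,  # assumed default value
            "destinationId": destination_id,
        }),
    }


# Example with placeholder values.
endpoint = "https://sellingpartnerapi-na.amazon.com"
destination = build_destination_request(
    endpoint, "reports-queue", "arn:aws:sqs:us-east-1:111122223333:reports"
)
subscription = build_subscription_request(
    endpoint, "REPORT_PROCESSING_FINISHED", "example-destination-id"
)
```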
Now that we have our notifications configured, we can create reports. AWS Step Functions again serves as our serverless orchestration service to centrally manage the workflow. Within the workflow, an AWS Lambda function gets a Login with Amazon access token (described in the Authentication and Authorization section above). The workflow uses this token to make the createReport API call to the Reports API, using the regional endpoint, marketplace ID, and report configuration data stored in AWS Systems Manager Parameter Store. The SP-API then creates the report. When report processing ends with a status of CANCELLED, DONE, or FATAL, a REPORT_PROCESSING_FINISHED notification event is sent to our Amazon SQS queue.
This notification event triggers an AWS Lambda function, which processes the notification. If the notification event has a status of DONE, it includes a reportDocumentId, which is passed to a data processing function in our AWS Step Functions workflow. The data processing function uses the reportDocumentId to make a getReportDocument call to the SP-API. The SP-API returns a pre-signed URL for the location of the report document and, if the report document contents have been compressed, the compression algorithm used. These values are then passed to our next AWS Lambda function, a storage function that downloads the report, decompresses it if needed, and stores the report document in an Amazon Simple Storage Service (Amazon S3) bucket.
Now that the report data is in Amazon S3, it can be consumed by downstream analytics applications, which we discuss later in this post. AWS Key Management Service (AWS KMS) is used throughout this architecture to provide secure encryption. AWS KMS allows us to centrally manage encryption keys, which can be used to encrypt our secrets in AWS Secrets Manager and our data stored in Amazon S3 and AWS Systems Manager Parameter Store.
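Two pieces of this flow lend themselves to a short sketch: picking the reportDocumentId out of a DONE notification delivered through Amazon SQS, and decompressing the downloaded document. The notification field names used here (`payload`, `reportProcessingFinishedNotification`, `processingStatus`) and the `GZIP` algorithm value follow our understanding of the SP-API schemas and should be verified against the reference. The function names are hypothetical, and the download from the pre-signed URL and the write to Amazon S3 are left out.

```python
import gzip
import json


def done_report_document_ids(sqs_event):
    """Collect reportDocumentId values from DONE notifications in an SQS event."""
    document_ids = []
    for record in sqs_event.get("Records", []):
        # Each SQS record carries the notification as a JSON string in "body".
        notification = json.loads(record["body"])
        if notification.get("notificationType") != "REPORT_PROCESSING_FINISHED":
            continue
        detail = notification["payload"]["reportProcessingFinishedNotification"]
        # CANCELLED and FATAL notifications carry no document to fetch.
        if detail.get("processingStatus") == "DONE":
            document_ids.append(detail["reportDocumentId"])
    return document_ids


def decode_report(raw_bytes, compression_algorithm=None):
    """Return report bytes, decompressing when an algorithm is reported."""
    if compression_algorithm is None:
        return raw_bytes
    if compression_algorithm == "GZIP":
        return gzip.decompress(raw_bytes)
    raise ValueError(f"Unsupported compression algorithm: {compression_algorithm}")
```

Raising on an unknown algorithm, rather than storing the bytes unchanged, keeps unreadable objects out of the bucket and surfaces the problem in the workflow.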
Figure 4 – High-level overview of Amazon Seller and Vendor Central Data Producer Serverless Catalog Items and Listing Items Applications
Serverless Catalog Items and Listing Items Applications
The Catalog Items and Listing Items applications differ slightly from the Reports application because the SP-API does not support notifications for these APIs. However, the same design principles were used to create these applications. AWS Step Functions is used as a serverless orchestration service to centrally manage our workflow. Within this AWS Step Functions workflow, an AWS Lambda authentication function obtains a Login with Amazon access token (described in the Authentication and Authorization section above). This token is passed to the data processing function, which makes an API call to the SP-API using regional endpoints and marketplace IDs stored in AWS Systems Manager Parameter Store, and Amazon Standard Identification Number (ASIN), Stock Keeping Unit (SKU), and Seller IDs stored in Amazon DynamoDB. When a response is returned, the data is passed to a storage function, which then stores the data in Amazon S3.
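As an illustration of the data processing step, the sketch below assembles a Catalog Items request from the values the paragraph describes: a regional endpoint and marketplace IDs (from AWS Systems Manager Parameter Store), an ASIN (from Amazon DynamoDB), and the access token from the authentication function. The path and API version shown are our assumption of the Catalog Items API layout and should be confirmed against the SP-API reference. The function name and example values are hypothetical, and throttling, retries, and error handling are omitted.

```python
from urllib.parse import quote, urlencode


def build_catalog_item_request(endpoint, asin, marketplace_ids, access_token):
    """Return the URL and headers for retrieving a single catalog item."""
    # Marketplace IDs are sent as one comma-separated query parameter.
    query = urlencode({"marketplaceIds": ",".join(marketplace_ids)})
    url = f"{endpoint}/catalog/2022-04-01/items/{quote(asin, safe='')}?{query}"
    headers = {
        "x-amz-access-token": access_token,  # Login with Amazon access token
        "Accept": "application/json",
    }
    return url, headers


# Example with placeholder values.
url, headers = build_catalog_item_request(
    "https://sellingpartnerapi-na.amazon.com",
    "B000000001",
    ["ATVPDKIKX0DER"],
    "example-access-token",
)
```

The returned pair can be handed to any HTTP client. Keeping request construction separate from the network call makes the function straightforward to unit test inside the workflow.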
Figure 5 – High-level overview of Amazon Seller and Vendor Central Data Producer Data Storage, Movement and Insights
Data Storage, Movement, and Insights
Now that our Amazon retail data has been ingested by way of the serverless applications we created, we can use AWS analytics services to structure, move, and gain insights from that data. Amazon S3 is the main storage service used for our data lake. Amazon S3 is an object storage service capable of storing and retrieving any amount of data from anywhere. AWS Lake Formation is used to create our scalable and secure data lake. With AWS Lake Formation we can ingest, clean, catalog, transform, and secure our data. Lake Formation provides a central location for us to configure granular data access policies, enabling us to protect our data regardless of which services are accessing the data.
For seamless data movement, AWS Glue and AWS Glue DataBrew are used. AWS Glue is a serverless integration service that makes it straightforward for data engineers and ETL (extract, transform, and load) developers to create, run, and monitor ETL workflows with AWS Glue Studio. AWS Glue DataBrew provides an interactive point-and-click visual interface that enables data to be enriched, cleaned, and normalized without writing code.
After processing and preparing our data, there is a whole suite of purpose-built AWS analytics services we can use to consume this data. To learn which purpose-built AWS analytics services may best fit your organization's use case, see the “Predictive analytics and Machine Learning” section under the AWS Analytics services section and the “Analytics and Data Warehousing” tab under Solutions Areas.
Conclusion
In this blog post, we demonstrated how CPG companies can build applications leveraging AWS serverless and managed services to securely integrate with the Selling Partner API. This solution shows how to design for authentication, authorization, and API integration in order to ingest your Amazon retail data into your AWS account. Once the data is ingested, it is then stored in a secure and scalable data lake. Purpose-built AWS analytics services can then be incorporated to move, process, and gain valuable insights from your Amazon retail data by using best practices for building a Modern Data Architecture on AWS.
This solution helps CPG companies become more data-driven organizations by bringing in Amazon retail data to inform their decisions. In addition, this AWS serverless-based modern architecture approach can help companies realize additional business value by way of cost savings, staff productivity, operational resilience, and business agility.
To learn more about AWS ecommerce solutions for CPG, contact an AWS Representative to get started today, or visit the AWS for Consumer Packaged Goods homepage.