This Guidance demonstrates how Consumer Packaged Goods (CPG) companies can ingest data into a modern data environment to enable advanced analytics. Companies can develop efficient and secure applications that integrate with Amazon Seller and Vendor Central and gain valuable insights into their Amazon Retail data, including product catalog updates, sales, shipments, and payments.
Architecture Diagram
-
Overview
-
Authentication and Authorization
-
Serverless Reports Application
-
Serverless Catalog Items and Listing Items Applications
-
Data Storage, Movement, and Insights
-
Please note: This is the overview architecture. Diagrams highlighting different aspects of this architecture are described in the sections that follow.
Overview
Consumer Packaged Goods (CPG) companies that sell on Amazon.com need access to Seller Central and Vendor Central data to manage orders, keep product catalogs updated, and keep track of sales, shipments, and payments.
With this architecture, CPG companies can ingest data into a modern data platform to enable advanced analytics. Each of the following diagrams highlights a different aspect of this work: authentication and authorization; the serverless reports application; the serverless catalog items and listing items applications; and data storage, movement, and insights.
-
Authentication and Authorization
-
Step 1
AWS Step Functions is used to create a serverless application to interact with the Selling Partner API (SP-API). The app is registered and authorized in Amazon Seller Central and Amazon Vendor Central.
Step 2
Once authorized, you get a Login with Amazon (LWA) refresh token. The LWA refresh token is a long-lived token which you can store in AWS Secrets Manager.
Step 3
To make API calls to the SP-API, your application needs an LWA access token. An AWS Lambda authentication function first checks Secrets Manager to see if a valid LWA access token exists. If no valid LWA access token exists, the function retrieves the LWA refresh token from Secrets Manager, then exchanges the LWA refresh token for an LWA access token from the SP-API authentication server.
Step 4
The LWA access token expires one hour after it is issued. To avoid having to retrieve an access token for each API call, you can cache the LWA access token in Secrets Manager, which can then be used for successive calls until expiry.
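The check-then-refresh logic in Steps 3 and 4 can be sketched as follows. This is a minimal illustration rather than the Guidance's sample code: the `cache` dictionary stands in for Secrets Manager, `exchange_refresh_token` stands in for the HTTPS call to the LWA token endpoint, and the function name, key names, and 60-second safety margin are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional

# Assumed safety margin: refresh slightly early so a token does not expire mid-call.
EXPIRY_MARGIN_SECONDS = 60


def get_access_token(
    cache: Dict[str, dict],
    exchange_refresh_token: Callable[[str], dict],
    now: Optional[float] = None,
) -> str:
    """Return a cached LWA access token, refreshing it when missing or expired.

    `cache` is a hypothetical stand-in for Secrets Manager and must already hold
    a "refresh_token" entry. `exchange_refresh_token` is a hypothetical callable
    that performs the token exchange and returns a dict containing
    "access_token" and "expires_in" (lifetime in seconds).
    """
    now = time.time() if now is None else now

    # Step 3: reuse the cached access token while it is still valid.
    cached = cache.get("access_token")
    if cached and cached["expires_at"] - EXPIRY_MARGIN_SECONDS > now:
        return cached["value"]

    # Otherwise exchange the long-lived refresh token for a new access token.
    refresh_token = cache["refresh_token"]["value"]
    response = exchange_refresh_token(refresh_token)

    # Step 4: cache the new token together with its absolute expiry time.
    cache["access_token"] = {
        "value": response["access_token"],
        "expires_at": now + response["expires_in"],
    }
    return response["access_token"]
```

Passing the store and the exchange call in as parameters keeps the caching decision separate from the AWS and LWA calls, so the logic can be exercised without network access. In a Lambda function, the two stand-ins would be replaced by real Secrets Manager reads and writes and a real token request.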
-
Serverless Reports Application
-
Please note: currently, the REPORT_PROCESSING_FINISHED notification type only works for seller applications. Vendor applications will have to use a polling method.
Step 1
Step Functions is used as a serverless orchestration service to centrally manage the workflow for integrating with the Selling Partner API (SP-API).
Step 2
The SP-API Reports API supports notifications to automate reports workflows. For this, a Lambda function is used to subscribe the application to the REPORT_PROCESSING_FINISHED notification type.
Step 3
In order to make calls to the SP-API, an authentication Lambda function is used to obtain a Login with Amazon (LWA) access token as described in the previous Authentication and Authorization diagram.
Step 4
The LWA access token from the authentication function is passed to a report creator Lambda function. This function uses regional endpoints, marketplace IDs, and report configuration data stored in Parameter Store, a capability of AWS Systems Manager, along with the LWA access token to make a createReport call to the SP-API.
Step 5
The SP-API then generates the report. Upon completion, a REPORT_PROCESSING_FINISHED notification event is sent to an Amazon Simple Queue Service (Amazon SQS) queue. The notification indicates whether report processing ended as CANCELLED, DONE, or FATAL. This triggers a Lambda function to process the event. If the notification event has a status of DONE, a reportDocumentId will be included.
Step 6
The notification event is then passed to a data processing Lambda function in our Step Functions workflow. The data processing function uses the reportDocumentId to make a getReportDocument call to the SP-API. The SP-API returns a pre-signed URL for the location of the report document and the compression algorithm used, if the report document contents have been compressed.
Step 7
This response is then passed to a storage Lambda function which downloads the report document, decompresses it if applicable, and stores the report document in Amazon Simple Storage Service (Amazon S3).
Step 8
AWS Key Management Service (AWS KMS) is used to centrally manage encryption keys, which can be used to encrypt our secrets in Secrets Manager and our data stored in Amazon S3 and Parameter Store.
Step 9
SP-API requests are limited using the token bucket algorithm, so an API client is recommended for rate limiting.
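Steps 5 through 7 above can be sketched with two small helpers: one that pulls the reportDocumentId out of a notification delivered through Amazon SQS, and one that decompresses the downloaded document. This is an illustrative sketch, not the Guidance's sample code. The function names are hypothetical, and the notification field layout (`payload.reportProcessingFinishedNotification`) and the `GZIP` compression value are assumptions that should be checked against the current SP-API documentation.

```python
import gzip
import json
from typing import Optional


def extract_report_document_id(sqs_record_body: str) -> Optional[str]:
    """Return the reportDocumentId from a notification, or None if not DONE.

    `sqs_record_body` is the JSON string found in the "body" field of an SQS
    record. The nested field names below are assumed, not taken from the source.
    """
    notification = json.loads(sqs_record_body)
    details = notification["payload"]["reportProcessingFinishedNotification"]
    # CANCELLED and FATAL notifications carry no document to download.
    if details["processingStatus"] != "DONE":
        return None
    return details["reportDocumentId"]


def decode_report_document(raw: bytes, compression_algorithm: Optional[str]) -> bytes:
    """Return the report contents, decompressing only when the API says to.

    `compression_algorithm` is the value returned by getReportDocument; it is
    absent (None here) when the document is not compressed.
    """
    if compression_algorithm is None:
        return raw
    if compression_algorithm == "GZIP":
        return gzip.decompress(raw)
    # Fail loudly rather than storing bytes that cannot be interpreted later.
    raise ValueError(f"Unsupported compression algorithm: {compression_algorithm}")
```

In the workflow described above, the first helper would run in the function triggered by the queue, and the second in the storage function after it downloads the bytes from the pre-signed URL and before it writes them to Amazon S3.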
Serverless Catalog Items and Listing Items Applications
-
Step 1
Step Functions is used as a serverless orchestration service to centrally manage the workflow for integrating with the Selling Partner API (SP-API).
Step 2
In order to make calls to the SP-API, an authentication Lambda function is used to obtain a Login with Amazon (LWA) access token as described in the previous Authentication and Authorization diagram.
Step 3
The LWA access token from the authentication function is passed to a data processing Lambda function. This function uses regional endpoints and marketplace IDs stored in Parameter Store, and ASINs, SKUs, and Seller IDs stored in Amazon DynamoDB along with the LWA access token to make an API call to the Catalog Items or Listing Items API of the SP-API.
Step 4
When a response is returned, it is then passed to a storage Lambda function which stores the data in Amazon S3.
Step 5
AWS KMS is used to centrally manage encryption keys, which can be used to encrypt our secrets in Secrets Manager and our data stored in Amazon S3, DynamoDB, and Parameter Store.
Step 6
SP-API requests are limited using the token bucket algorithm, so an API client is recommended for rate limiting.
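Because the SP-API throttles requests with a token bucket, a client can mirror the same algorithm locally to avoid sending calls that would be rejected. The following is a minimal sketch of a client-side token bucket. The class name, method names, and the rate and burst values used with it are illustrative assumptions; actual limits vary by SP-API operation.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket: `rate` tokens are added per second, up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self._clock = clock          # injectable so the bucket can be tested
        self._tokens = float(burst)  # start full, allowing an initial burst
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at burst.
        now = self._clock()
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the call should wait."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1 - self._tokens) / self.rate)
```

A data processing function could call `try_acquire()` before each Catalog Items or Listing Items request and sleep for `wait_time()` when it returns False. A bucket held in memory only limits a single running function instance, so concurrent invocations would need shared state or a retry strategy for throttled responses.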
-
Data Storage, Movement, and Insights
-
Step 1
AWS Lake Formation is used to build the scalable data lake, and Amazon S3 is used as the data lake storage.
Step 2
Lake Formation is also used to enable unified governance to centrally manage the security, access control, and audit trails.
Step 3
AWS Glue and AWS Glue DataBrew are used to catalog, transform, enrich, move, and replicate data across multiple data stores and the data lake.
Step 4
Amazon Athena enables interactive querying, analyzing, and processing capabilities.
Step 5
Amazon Redshift is used as a Cloud Data Warehouse.
Step 6
Amazon QuickSight provides machine learning-powered business intelligence.
Step 7
Amazon EMR provides the cloud big data platform for processing vast amounts of data using open source tools.
Step 8
Amazon OpenSearch Service can be used for operational analytics.
Step 9
Amazon SageMaker can be used to build, train, and deploy machine learning models, and add intelligence to your applications.
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
This Guidance consists of serverless services such as Lambda, Step Functions, and Amazon S3 that are loosely coupled and have built-in version control capabilities for implementing changes.
-
Security
This Guidance uses a self-authorization model with Amazon Vendor Central. Applications you create are registered in Vendor Central, where you receive a Login with Amazon (LWA) refresh token. These refresh tokens are securely stored in Secrets Manager. LWA refresh tokens are exchanged for LWA access tokens. The LWA access tokens, along with IAM and AWS STS, are used to securely make API calls to Amazon Vendor Central, leveraging well-defined user access permissions.
-
Reliability
This Guidance consists of serverless and fully managed services with built-in reliability due to a combination of a service-oriented architecture (like the use of Step Functions to create a serverless application) and microservices (where Step Functions uses AWS STS to execute the call). Selling Partner API requests are limited using the token bucket algorithm, so an API client is recommended for rate limiting.
-
Performance Efficiency
Scalable and highly available services such as Amazon S3, Lambda, DynamoDB, and Amazon SQS are used as core components to increase performance.
-
Cost Optimization
This architecture is designed with a serverless-first approach, leveraging services such as Step Functions, Lambda, DynamoDB, and Amazon S3 for cost efficiency.
-
Sustainability
Consisting of mostly serverless services, this Guidance reduces the number of resources consumed, contributing to greater sustainability.
Implementation Resources
A detailed guide is provided for you to experiment with and use within your AWS account. Each stage of building the Guidance, including deployment, usage, and cleanup, is examined to prepare it for deployment.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Ingest Amazon Retail Data into a Serverless Modern Data Architecture
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.