Amazon Comprehend provides natural language processing, Personally Identifiable Information (PII) detection and redaction, custom classification and entity detection, and topic modeling. These capabilities support a broad range of applications that analyze raw text and, with some APIs, document formats such as PDF and Word.
- Natural language processing: Amazon Comprehend APIs for entity recognition, sentiment analysis, syntax analysis, key phrase extraction, and language detection can be used to extract insights from natural language text. These requests are measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
- Personally Identifiable Information (PII): The Detect PII API finds the locations of chosen PII entities inside a document and can be used to create redacted versions of documents. The Contains PII API tells you whether a document contains the chosen PII. These requests are also measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
- Custom Comprehend: The Custom Classification and Entities APIs can train a custom NLP model to categorize text and extract custom entities. Asynchronous inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request. You are charged $3 per hour for model training (billed by the second) and $0.50 per month for custom model management. For synchronous Custom Classification and Entities inference requests, you provision an endpoint with the appropriate throughput. You are charged from the time that you start your endpoint until it is deleted.
- Topic Modeling: Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It will identify the most common topics in the collection and organize them in groups and then map which documents belong to which topic. You are charged based on the total size of documents processed per job. The first 100 MB is charged a flat rate. Above 100 MB, you are charged per MB.
- Trust and Safety (new): The Comprehend toxicity detection API can be used to detect toxic content in text. Similarly, the Comprehend prompt safety classification feature can be used to detect unsafe input prompts to large language models and applications. These requests are measured in units of 100 characters (1 unit = 100 characters), with a 3 unit (300 character) minimum charge per request.
- For Amazon Comprehend Medical pricing, see the Amazon Comprehend Medical pricing page.
- You can estimate your costs using the AWS Pricing Calculator.
With Amazon Comprehend APIs, you can process both unstructured, raw text and, with some APIs, other text files like PDF and Word documents.
Inference requests are measured in units of 100 characters, with a 3 unit (300 character) minimum charge per request.
Endpoints are billed in one-second increments, with a minimum of 60 seconds. Charges continue to accrue from the time you start the endpoint until it is deleted, even if no documents are analyzed.
One inference unit (IU) provides a throughput of 100 characters/second on your managed endpoint. You can provision additional IUs for more throughput. Each IU will incur $0.0005 per second.
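The endpoint billing rule above can be sketched in a few lines of Python. This is an illustrative sketch, not an official calculator: the function and constant names are hypothetical, and it assumes the 60-second minimum applies once per endpoint lifetime.

```python
from decimal import Decimal

PRICE_PER_IU_SECOND = Decimal("0.0005")  # price per inference unit per second, as stated above
MIN_BILLED_SECONDS = 60                  # minimum billed duration, as stated above


def endpoint_cost(inference_units: int, seconds_active: int) -> Decimal:
    """Cost of keeping an endpoint up, whether or not any documents are analyzed."""
    if inference_units < 1 or seconds_active < 0:
        raise ValueError("need at least 1 IU and a non-negative duration")
    # Assumption: an endpoint deleted before 60 seconds is still billed for 60 seconds.
    billed_seconds = max(MIN_BILLED_SECONDS, seconds_active)
    return inference_units * billed_seconds * PRICE_PER_IU_SECOND


if __name__ == "__main__":
    print(endpoint_cost(1, 10))    # short-lived endpoint, billed at the 60-second minimum
    print(endpoint_cost(2, 3600))  # two IUs for one hour
```

Decimal is used instead of float so that small per-second prices do not pick up rounding noise.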
*To extract text from scanned PDF documents, the Amazon Textract Detect Document Text API is called.
Topic modeling is priced in two tiers: a flat rate for the first 100 MB, and a per-MB rate for every MB above 100 MB.
You are charged based on the total size of documents processed per topic modeling job. The first 100 MB is charged a flat rate. Above 100 MB, you are charged per MB.
Amazon Comprehend offers a free tier covering 50K units of text (5M characters) per API per month.
Eligible APIs include Key Phrase Extraction, Sentiment, Targeted Sentiment, Entity Recognition, Language Detection, Event Detection, Syntax Analysis, Detect PII, Contains PII, and Prompt Safety Classification.
Note: Custom Comprehend (custom entities and custom classification) does not offer a free tier. This includes model training, inference, and model management.
The Amazon Comprehend free tier is available to both new and existing AWS customers for 12 months, starting from the date of their first Amazon Comprehend request.
Amazon Comprehend pricing examples
Example 1 - Analyzing customer comments
Let us assume you have built an application using Amazon Comprehend to analyze customer comments on your online store. You have received 10,000 customer comments that are 550 characters each, and you are in the second year of your use of the service.
Total charge calculation:
Size of each request = 550 characters
Number of units per request = 6
Total Units: 10,000 (requests) x 6 (units per request) = 60,000
Price per unit = $0.0001
Total cost = [No. of units] x [Cost per unit] = 60,000 x $0.0001 = $6.00
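The unit arithmetic in this example can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes that partial units round up to the next whole unit, which is what the example implies when 550 characters are billed as 6 units.

```python
from decimal import Decimal

UNIT_SIZE = 100  # characters per unit
MIN_UNITS = 3    # minimum charge per request (300 characters)


def billable_units(num_chars: int) -> int:
    """Units billed for one request: round up to whole units, with a 3-unit minimum."""
    # Integer ceiling division (assumed rounding rule, inferred from the example).
    units = (num_chars + UNIT_SIZE - 1) // UNIT_SIZE
    return max(MIN_UNITS, units)


def total_cost(num_requests: int, chars_per_request: int, price_per_unit: Decimal) -> Decimal:
    """Total charge for a batch of equally sized requests at a single per-unit price."""
    return num_requests * billable_units(chars_per_request) * price_per_unit


if __name__ == "__main__":
    # Example 1: 10,000 comments of 550 characters at $0.0001 per unit.
    print(billable_units(550))
    print(total_cost(10_000, 550, Decimal("0.0001")))
```

A request shorter than 300 characters, such as a 120-character comment, is still billed as 3 units under the minimum charge.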
Example 2 - Categorizing documents by topics
Let us say you have a set of research documents totaling 240 MB in size that you want to categorize by topic and recommend documents to your customers based on their area of interest. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.
Total charge calculation:
Total megabytes processed = 240
Megabytes billed at a flat rate of $1 = 100
Megabytes billed at $0.004/MB = 140 [240-100]
Total cost of the job = $1.00 + [140 x $0.004] = $1.00 + $0.56 = $1.56
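The flat-plus-per-MB calculation can be sketched as below, using the rates quoted in this example ($1 for the first 100 MB, $0.004 per MB after that). The function name is hypothetical. The sketch assumes that a job smaller than 100 MB still pays the full flat rate, and it takes the job size in whole megabytes because the text does not say how fractional megabytes are rounded.

```python
from decimal import Decimal

FLAT_RATE = Decimal("1.00")     # charge covering the first 100 MB (from Example 2)
PER_MB_RATE = Decimal("0.004")  # charge per MB above 100 MB (from Example 2)
FLAT_TIER_MB = 100


def topic_modeling_cost(total_mb: int) -> Decimal:
    """Cost of one topic modeling job for a collection of total_mb megabytes."""
    if total_mb <= 0:
        raise ValueError("job size must be positive")
    # Only the portion above 100 MB is billed per MB.
    extra_mb = max(0, total_mb - FLAT_TIER_MB)
    return FLAT_RATE + extra_mb * PER_MB_RATE


if __name__ == "__main__":
    print(topic_modeling_cost(240))  # the 240 MB job from Example 2
```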
Example 3 - Classifying customer feedback using the custom classification API
Let us say you want to train a classifier to automatically organize new customer feedback that comes in from your website. 10 customers enter feedback every minute, and each piece of feedback is 300 characters. It takes one hour to train the custom model, and you are planning to keep this model for a month. So, model training costs will be $3 and model storage costs will be $0.50 for the month. Let us also assume that you are in the second year of your use of the service and are not eligible for the free tier offering.
To classify the feedback asynchronously, you pay by the number of characters in your documents. To classify in real time, you provision an endpoint with enough throughput to handle your use case and pay for the time that the endpoint is up.
Inference cost calculation for asynchronous classification:
Total size of requests per day = 4,320,000 characters [300 characters * 10 docs * 1,440 minutes]
Number of units per day = 43,200 units [4,320,000 characters ÷ 100 characters per unit]
Price per unit = $0.0005
Total inference cost for units = $21.60 [43,200 units x $0.0005]
Total cost = $25.10 [$21.60 inference + $3 model training + $0.50 model storage]
Total charge calculation for synchronous classification:
First, let’s calculate the required throughput. Every minute we’re classifying 10 documents of 300 characters each. So that’s:
50 characters per second [300 characters x 10 documents ÷ 60 seconds]
So, you will need to provision an endpoint with 1 Inference Unit (IU), which gives a throughput of 100 characters/second.
Price for 1 IU = $0.0005 per second
You will incur costs depending on how long you’re keeping your real time classification endpoint active, regardless of how many inference calls are made.
If you’re running your real time classification endpoint for 12 hours per day:
Total inference cost = $21.60 [$0.0005 x 3600 seconds x 12 hours]
Total cost = $25.10 [$21.60 inference + $3 model training + $0.50 model storage]
Note that you incur cost for the throughput provisioned and for the amount of time the endpoint is active. If you needed to provision more throughput, the price would be:
Price for 2 IU = $0.001 per second [$0.0005 x 2]
Price for 3 IU = $0.0015 per second [$0.0005 x 3]
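The endpoint sizing in this example can be sketched as follows. The helper names are hypothetical, and the sketch sizes the endpoint for average throughput only. It leaves no headroom for bursts, which a real deployment would probably want.

```python
from decimal import Decimal

IU_THROUGHPUT = 100                      # characters per second provided by one IU
PRICE_PER_IU_SECOND = Decimal("0.0005")  # price per IU per second


def required_inference_units(docs_per_minute: int, chars_per_doc: int) -> int:
    """Smallest number of IUs whose throughput covers the average load."""
    chars_per_minute = docs_per_minute * chars_per_doc
    capacity_per_iu_per_minute = IU_THROUGHPUT * 60
    # Integer ceiling division; an endpoint needs at least 1 IU.
    return max(1, -(-chars_per_minute // capacity_per_iu_per_minute))


def price_per_second(inference_units: int) -> Decimal:
    """Per-second price of an endpoint with the given number of IUs."""
    return inference_units * PRICE_PER_IU_SECOND


def daily_endpoint_cost(inference_units: int, hours_per_day: int) -> Decimal:
    """Cost of running the endpoint for hours_per_day hours in one day."""
    return price_per_second(inference_units) * 3600 * hours_per_day


if __name__ == "__main__":
    ius = required_inference_units(10, 300)  # 50 characters per second on average
    print(ius, daily_endpoint_cost(ius, 12))
```

Because cost depends only on provisioned IUs and uptime, deleting the endpoint outside working hours is the main lever for reducing the synchronous bill.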
Example 4 - Analyzing customer comments using the custom entities API
Total charge calculation:
Size of each request = 5,500,000 characters
Number of units per request = 55,000 units [5,500,000 characters ÷ 100 characters per unit]
Price per unit = $0.0005
Total cost for units = $27.5 [55,000 units x $0.0005]
Total hours for model training = 1.5 hours
Price per hour = $3
Total cost for model training = $4.5 [1.5 hours x $3]
Number of months for model management = 1 month
Price per month = $0.50
Total cost for model management = $0.50 [1 month x $0.50]
Total cost = $32.50 [$27.50 + $4.50 + $0.50]
Example 5 – Extracting events and the associated information using event detection
Total charge calculation:
Number of characters processed = 1,500,000 characters [3,000 articles x 500 characters]
Number of units processed = 45,000 units [1,500,000 x 3 event types ÷ 100 characters per unit]
Price per unit = $0.003
Total cost for units = $135 [45,000 units x $0.003]
Example 6 – Identifying documents with PII using the contains PII API
Total charge calculation:
Size of each request = 550 characters
Number of units per request = 6
Total Units = 60,000 [10,000 requests x 6 units per request]
Price per unit = $0.000002
Total cost = $0.12 [60,000 units x $0.000002]
Example 7 – Redacting PII from documents using the detect PII API
Total charge calculation:
Size of each request = 550 characters
Number of units per request = 6
Total Units = 60,000 [10,000 requests x 6 units per request]
Price per unit = $0.0001
Total cost = $6 [60,000 units x $0.0001]
Example 8 – Extracting mortgage application entities using the custom entity API
Inference cost calculation for asynchronous classification:
Size of each request per day = 2,500,000 characters [100 applications/day * 10 docs * 2,500 characters]
Number of units per request = 25,000 units [2,500,000 characters ÷ 100 characters per unit]
Price per unit = $0.0005
Total inference cost for units = $12.50 [25,000 units x $0.0005]
Amazon Textract cost for Detect Document Text API= $1.50 [100 applications/day * 10 docs * $0.0015 price per page, up to 1M pages]
Total cost = $17.50 [$12.50 inference + $1.50 Textract + $3 model training + $0.50 model storage]
Example 9 – Analyzing employee survey responses
Total charge calculation:
Size of each request = 350 characters
Number of units per request = 4
Total Units: 100,000 (requests) x 4 (units per request) = 400,000
Price per unit = $0.0001 (from 0-10M units)
Total cost = [No. of units] x [Cost per unit] = 400,000 x $0.0001 = $40.00
Example 10 - Detecting toxicity in online comments on website
Total charge calculation:
Size of each request = 100 characters
Number of units per request = 1
Total units = 100M units [100M comments x 1 unit per request]
Price per unit = $0.0001 [from 0 - 10M units], $0.00005 [from 10M - 50M units], $0.000025 [from 50M - 100M units]
Total cost = [No. of units] x [Cost per unit]
= [10M x $0.0001] + [40M x $0.00005] + [50M x $0.000025]
= $1,000 + $2,000 + $1,250
= $4,250
Example 11 - Detecting unsafe prompts in generative AI application
Total charge calculation:
Size of each request = 500 characters
Number of units per request = 5
Total units = 50M units [10M prompts x 5 units per request]
Price per unit = $0.0001 [from 0 - 10M units], $0.00005 [from 10M - 50M units], $0.000025 [from 50M - 100M units]
Total cost = [No. of units] x [Cost per unit]
= [10M x $0.0001] + [40M x $0.00005]
= $1,000 + $2,000
= $3,000
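The tiered calculation used in Examples 10 and 11 can be sketched as below. The tier boundaries and prices are the ones quoted in those examples. The function name is hypothetical, and because the text gives no price above 100M units, the sketch refuses larger inputs instead of guessing one.

```python
from decimal import Decimal

# (upper bound of the tier in units, price per unit), as quoted in Examples 10 and 11
TIERS = [
    (10_000_000, Decimal("0.0001")),
    (50_000_000, Decimal("0.00005")),
    (100_000_000, Decimal("0.000025")),
]


def tiered_cost(total_units: int) -> Decimal:
    """Charge each slice of usage at the price of the tier it falls into."""
    if total_units < 0:
        raise ValueError("units must be non-negative")
    if total_units > TIERS[-1][0]:
        raise ValueError("no price is given here for usage above 100M units")
    cost = Decimal(0)
    lower = 0
    for upper, price in TIERS:
        if total_units <= lower:
            break
        # Units that fall inside this tier only.
        units_in_tier = min(total_units, upper) - lower
        cost += units_in_tier * price
        lower = upper
    return cost


if __name__ == "__main__":
    print(tiered_cost(100_000_000))  # Example 10: 100M units
    print(tiered_cost(50_000_000))   # Example 11: 50M units
```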