Amazon Neptune features

What is Neptune?

Amazon Neptune is a serverless graph database and analytics service that makes it easy to create and manage interactive graph applications at any scale.

Amazon Neptune Database

Neptune Database is a fully managed graph database that you can use to search and query billions of relationships in milliseconds across thousands of concurrent queries. It provides high-availability configurations, dynamic scalability with serverless, multi-Region support for increased resiliency, and integrations with other AWS services such as Amazon SageMaker and Amazon OpenSearch Service.

Neptune automatically scales storage, growing storage and rebalancing I/O operations to provide consistent performance without the need for overprovisioning. Neptune storage is fault-tolerant and self-healing, and disk failures are repaired in the background without loss of database availability. Neptune is designed to automatically detect database crashes and restart without the need for crash recovery or to rebuild the database cache. If the entire instance fails, Neptune will automatically fail over to one of up to 15 read replicas.

SQL queries for highly connected data are complex and hard to tune for performance. Instead, Neptune allows you to use the popular graph query languages Apache TinkerPop Gremlin and W3C’s SPARQL and openCypher to run powerful queries that are easy to write and perform well on connected data. This significantly reduces code complexity, and allows you to more quickly create applications that process relationships. You can quickly launch a Neptune database instance with a few steps in the Neptune console.
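
If you prefer to script the setup instead of using the console, the following sketch uses the AWS SDK for Python (boto3); the cluster and instance identifiers, instance class, and Region are placeholders you would replace with your own values.

```python
import boto3

# Management-plane client for Neptune (the API mirrors the RDS management API).
neptune = boto3.client("neptune", region_name="us-east-1")

# Create the cluster, which owns the storage volume, endpoints, and backups.
neptune.create_db_cluster(
    DBClusterIdentifier="my-graph-cluster",  # placeholder name
    Engine="neptune",
)

# Add a primary (writer) instance so the cluster can accept connections.
neptune.create_db_instance(
    DBInstanceIdentifier="my-graph-instance-1",  # placeholder name
    DBInstanceClass="db.r5.large",               # choose a class available in your Region
    Engine="neptune",
    DBClusterIdentifier="my-graph-cluster",
)
```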

Amazon Neptune Analytics

Neptune Analytics is an analytics database engine that supports graph analytics, graph algorithms, and vector search over graph data stored in Amazon Simple Storage Service (Amazon S3) buckets or in a Neptune database. You can analyze tens of billions of relationships in seconds. With Neptune Analytics, you can load data from an existing Neptune database or from Amazon S3 with a few simple API calls and satisfy the most demanding graph analytic workloads. If you select an existing Neptune database as the data source, its data is loaded into Neptune Analytics automatically; you can also have Neptune Analytics load graph data directly from Amazon S3 using CSV files in common graph export formats.
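
As a minimal sketch of querying an existing Neptune Analytics graph with the AWS SDK for Python (boto3), assuming the neptune-graph data API and a placeholder graph identifier:

```python
import boto3

# Client for Neptune Analytics (the "neptune-graph" service).
graph = boto3.client("neptune-graph", region_name="us-east-1")

# Run an openCypher query against an existing Neptune Analytics graph.
response = graph.execute_query(
    graphIdentifier="g-abcd1234",                     # placeholder graph ID
    queryString="MATCH (n) RETURN count(n) AS nodes",
    language="OPEN_CYPHER",
)
print(response["payload"].read())                     # results are returned as a JSON stream
```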

Amazon Neptune ML

Neptune ML is an integration between a Neptune database and SageMaker. Neptune ML trains graph neural networks (GNNs), a machine learning (ML) technique purpose built for graphs, to make fast and more accurate predictions using your graph data. Neptune ML supports real-time predictions on nodes, edges, and properties (entities) that were added to the graph after the ML model training process, giving you predictions on new data without retraining your ML models each time.

High performance and scalability

Amazon Neptune Serverless is an on-demand deployment option that automatically adjusts database capacity based on an application’s needs. Neptune Serverless can scale graph database workloads instantly to hundreds of thousands of queries per second. Neptune Serverless adjusts capacity to provide just the right amount of database resources that the application needs, and you pay only for the capacity you consume, saving up to 90% in database costs compared to provisioning for peak capacity.
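
A sketch of configuring the serverless capacity range on an existing cluster with boto3; the bounds are expressed in Neptune capacity units (NCUs), and the cluster name and values below are placeholders:

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# Set the serverless scaling range; Neptune adjusts capacity between these
# bounds based on the workload.
neptune.modify_db_cluster(
    DBClusterIdentifier="my-graph-cluster",  # placeholder cluster name
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 1.0,   # floor in NCUs, keeps the database responsive at idle
        "MaxCapacity": 32.0,  # ceiling in NCUs, caps cost during spikes
    },
)
```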

Neptune is a purpose-built, high-performance graph database. Neptune efficiently stores and navigates graph data and uses a scale-up, in-memory optimized architecture to allow for fast query evaluation over large graphs. With Neptune Database, you can use either Gremlin, openCypher, or SPARQL to run powerful queries that are easy to write and perform well. With Neptune Analytics, you can use openCypher.

With a few steps in the AWS Management Console, you can scale the compute and memory resources powering your production cluster up or down. With Neptune Database, you can scale by creating new replica instances of the desired size or by removing instances. Compute scaling operations typically complete in a few minutes.

Neptune Database uses a distributed and shared storage architecture that automatically grows as your database storage needs grow. Neptune data is stored in a cluster volume that has Multi-Availability Zone (Multi-AZ) high availability. When a Neptune DB cluster is created, it is allocated a single segment of 10 GiB. As the volume of data increases and exceeds the currently allocated storage, Neptune automatically expands the cluster volume by adding new segments. A Neptune cluster volume can grow to a maximum size of 128 TiB in all supported AWS Regions except China and AWS GovCloud (US). You don't need to provision excess storage for your database to handle future growth.

With Neptune Database, you can increase read throughput to support high-volume application requests by creating up to 15 database read replicas. Neptune replicas share the same underlying storage as the source instance, lowering costs and avoiding the need to perform writes at the replica nodes. This frees up more processing power to serve read requests and reduces the replica lag time—often down to single-digit milliseconds. Neptune also provides a single endpoint for read queries so the application can connect without having to keep track of replicas as they are added and removed.
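
For example, adding a replica and reading the cluster's reader endpoint might look like the following boto3 sketch (identifiers are placeholders):

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# A new instance in the same cluster becomes a read replica served from the
# shared cluster volume; no separate data copy is needed.
neptune.create_db_instance(
    DBInstanceIdentifier="my-graph-replica-1",  # placeholder name
    DBInstanceClass="db.r5.large",
    Engine="neptune",
    DBClusterIdentifier="my-graph-cluster",     # placeholder cluster name
)

# The reader endpoint load-balances read connections across replicas, so
# applications never need to track individual replica instances.
cluster = neptune.describe_db_clusters(
    DBClusterIdentifier="my-graph-cluster"
)["DBClusters"][0]
print(cluster["ReaderEndpoint"])
```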

High availability and durability

The health of your Neptune database and its underlying Amazon EC2 instance is continually monitored. If the instance powering your database fails, the database and associated processes are automatically restarted. Neptune recovery does not require the potentially lengthy replay of database redo logs, so instance restart times are typically 30 seconds or less. Neptune also isolates the database buffer cache from database processes, allowing the cache to survive a database restart.

On instance failure, Neptune automatically fails over to one of up to 15 Neptune replicas that you have created in any of three AZs. If no Neptune replicas have been provisioned, Neptune will attempt to automatically create a new database instance for you.
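
You can also rehearse failover yourself. A minimal boto3 sketch, with placeholder identifiers:

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# Manually trigger a failover, for example as part of a disaster-recovery
# drill; Neptune promotes the targeted replica to primary.
neptune.failover_db_cluster(
    DBClusterIdentifier="my-graph-cluster",            # placeholder names
    TargetDBInstanceIdentifier="my-graph-replica-1",
)
```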

For Neptune Database, each 10 GiB chunk of your database volume is made durable across three AZs. Neptune Database uses fault-tolerant storage that transparently handles the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability. Neptune Database storage is also self-healing—data blocks and disks are continually scanned for errors and automatically replaced.

Backup capability in Neptune Database enables point-in-time recovery for your instance. This allows you to restore your database to any second during your retention period, up until the last 5 minutes. Your automatic backup retention period can be configured up to 35 days. Automated backups are stored in Amazon S3, which is designed for 99.999999999% durability. Neptune backups are automatic, incremental, and continual and have no impact on database performance.
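
A point-in-time restore creates a new cluster from those continuous backups. A boto3 sketch, assuming placeholder identifiers and a timestamp inside the retention period:

```python
import boto3
from datetime import datetime, timezone

neptune = boto3.client("neptune", region_name="us-east-1")

# Restore a new cluster to a specific second within the retention period.
neptune.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="my-graph-restored",        # new cluster (placeholder)
    SourceDBClusterIdentifier="my-graph-cluster",   # existing cluster (placeholder)
    RestoreToTime=datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),  # illustrative time
)

# The restored cluster starts with no instances; add one before connecting.
neptune.create_db_instance(
    DBInstanceIdentifier="my-graph-restored-1",
    DBInstanceClass="db.r5.large",
    Engine="neptune",
    DBClusterIdentifier="my-graph-restored",
)
```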

Database snapshots are user-initiated backups of your instance stored in Amazon S3 that will be kept until you explicitly delete them. They use the automated incremental snapshots to reduce the time and storage required. You can create a new instance from a database snapshot whenever you desire.
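
Taking and restoring a snapshot looks roughly like this with boto3 (names are placeholders):

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# User-initiated snapshot; it is kept until you delete it explicitly.
neptune.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier="my-graph-snap-001",  # placeholder name
    DBClusterIdentifier="my-graph-cluster",           # placeholder name
)

# Later, create a brand-new cluster from that snapshot.
neptune.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="my-graph-from-snap",
    SnapshotIdentifier="my-graph-snap-001",
    Engine="neptune",
)
```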

Amazon Neptune Global Database is designed for globally distributed applications, allowing a single Neptune database to span multiple Regions. It replicates the graph data with little impact to database performance, enables fast local reads with low latency in each Region, and provides disaster recovery in case of Region-wide outages.

Highly secure

Neptune Database runs in Amazon Virtual Private Cloud (Amazon VPC), which allows you to isolate your database in your own virtual network and connect to your on-premises IT infrastructure using industry-standard, encrypted IPsec VPNs. In addition, by using the Neptune VPC configuration, you can configure firewall settings and control network access to your database instances.

Neptune is integrated with AWS Identity and Access Management (IAM) and provides you with the ability to control the actions that your IAM users and groups can take on specific Neptune resources, including database instances, database snapshots, database parameter groups, database event subscriptions, and database option groups. In addition, you can tag your Neptune resources and control the actions that your IAM users and groups can take on groups of resources that have the same tag (and tag value). For example, you can configure your IAM policies to ensure developers are able to modify "Development" database instances, but only database administrators can modify and delete "Production" database instances.

Neptune provides fine-grained access control with IAM for users calling Neptune data plane APIs, covering graph-data actions, such as reading, writing, and deleting data from the graph, and non-graph-data actions, such as starting and monitoring Neptune ML activities and checking the status of ongoing data plane activities. For example, you can create a policy with read-only access for data analysts who do not need to manipulate the graph data, a policy with read and write access for developers using the graph for their applications, and a policy for data scientists who need access to Neptune ML commands.
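
As an illustration, a read-only data-plane policy for analysts could be created like the sketch below; the neptune-db action name and the cluster resource ARN are examples to verify against the Neptune IAM reference for your account:

```python
import json
import boto3

iam = boto3.client("iam")

# Read-only graph-data access: allows queries that read data but not writes,
# deletes, or ML administration. The action name and ARN are illustrative.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["neptune-db:ReadDataViaQuery"],
        "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-ABC123DEF456/*",
    }],
}

iam.create_policy(
    PolicyName="NeptuneAnalystReadOnly",         # placeholder policy name
    PolicyDocument=json.dumps(read_only_policy),
)
```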

Neptune supports encryption in transit with TLS version 1.2. Neptune allows you to encrypt your databases using keys you create and control through AWS Key Management Service (AWS KMS). On a database instance running with Neptune encryption, data stored at rest in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster.

Neptune allows you to log database events with minimal impact on database performance. Logs can later be analyzed for database management, security, governance, regulatory compliance, and other purposes. You can also monitor activity by sending audit logs to Amazon CloudWatch.
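
A sketch of turning on audit log delivery for an existing cluster with boto3; the cluster name is a placeholder, and audit logging itself is switched on through the cluster parameter group:

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# Export the audit log stream to CloudWatch Logs for analysis and retention.
neptune.modify_db_cluster(
    DBClusterIdentifier="my-graph-cluster",  # placeholder cluster name
    CloudwatchLogsExportConfiguration={
        "EnableLogTypes": ["audit"],
    },
)
```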

Fully managed

You can get started with Neptune by launching a new Neptune database instance or Neptune Analytics graph using the AWS Management Console. Neptune database instances are preconfigured with parameters and settings appropriate for the database instance class you have selected. You can launch a database instance and connect your application within minutes without additional configuration. Database parameter groups provide granular control and fine-tuning of your database.

Neptune makes it easier to operate a high-performance graph database. With Neptune, you do not need to create custom indexes over your graph data. Neptune provides timeout and memory usage limitations to reduce the impact of queries that consume too many resources.
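
For example, a cluster parameter group can cap query execution time. In the boto3 sketch below, the group name is a placeholder and the parameter group family depends on your engine version, so check both against your cluster:

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# Create a cluster parameter group to hold workload-specific settings.
neptune.create_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-graph-params",  # placeholder name
    DBParameterGroupFamily="neptune1.3",            # depends on engine version
    Description="Tuning for my graph workload",
)

# Cap query execution time (neptune_query_timeout is in milliseconds).
neptune.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-graph-params",
    Parameters=[{
        "ParameterName": "neptune_query_timeout",
        "ParameterValue": "60000",                  # 60 seconds
        "ApplyMethod": "pending-reboot",
    }],
)
```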

Neptune provides CloudWatch metrics for your database instances. You can use the console to view over 20 key operational metrics for your database instances, including compute resources, memory, storage, query throughput, and active connections.
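
A sketch of pulling one of those metrics with boto3; the instance identifier is a placeholder:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Average CPU utilization for one Neptune instance over the past hour,
# sampled in five-minute periods from the AWS/Neptune namespace.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Neptune",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-graph-instance-1"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```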

Neptune will keep your database up to date with the latest patches. You can control if and when your instance is patched through database engine version management.

Neptune can notify you by email or SMS of important database events such as automated failover. You can use the console to subscribe to different database events associated with your Neptune databases.

Neptune supports quick, efficient cloning operations, where entire multiterabyte database clusters can be cloned in minutes. Cloning is useful for a number of purposes including application development, testing, database updates, and running analytical queries. Immediate availability of data can significantly accelerate your software development and upgrade projects and make analytics more accurate.

You can clone a Neptune database with just a few steps in the console, without impacting the production environment. The clone is distributed and replicated across three AZs.
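
Under the hood, a clone is a copy-on-write restore from the source cluster, so it appears quickly and only changed pages consume new storage. A minimal boto3 sketch with placeholder names:

```python
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

# Clone the cluster: copy-on-write shares the source volume at the latest
# restorable time instead of copying the data.
neptune.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="my-graph-clone",          # placeholder names
    SourceDBClusterIdentifier="my-graph-cluster",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)
```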

ML and generative AI

Neptune ML is powered by SageMaker, which uses GNNs, an ML technique purpose-built for graphs, to make fast and more accurate predictions using graph data. With Neptune ML, you can improve the accuracy of most predictions for graphs by over 50% when compared to making predictions using nongraph methods.

Making accurate predictions on graphs with billions of relationships can be difficult and time-consuming. Existing ML approaches such as XGBoost can’t operate effectively on graphs because they are designed for tabular data. As a result, using these methods on graphs can take time, require specialized skills from developers, and produce suboptimal predictions.

Vector search makes it easy for you to build ML-augmented search experiences and generative AI applications. Use vector search if you want to build generative AI applications that combine data in an application domain and similarity search on vector embeddings. Vector search over graph data gives you an overall lower total cost of ownership and simpler management overhead because you do not need to manage separate data stores, build pipelines, or worry about keeping the data stores in sync.

Customers building generative AI applications can use vector search to augment their large language models (LLMs) by integrating graph queries for domain-specific context with the results from low-latency, nearest-neighbor similarity search on embeddings imported from LLMs hosted in Amazon Bedrock, GNNs in GraphStorm, or other sources. Neptune is integrated with LangChain, an open-source Python framework that makes it easier to develop generative AI applications using LLMs.

Developer productivity

Property graphs are popular because they are familiar to developers who are used to relational models. The Gremlin traversal language provides a way to quickly traverse property graphs. Neptune supports the property graph model using the open source Apache TinkerPop Gremlin traversal language and provides a Gremlin WebSocket server that supports TinkerPop version 3.3. With Neptune, you can quickly build fast Gremlin traversals over property graphs. Existing Gremlin applications can easily use Neptune by changing the Gremlin service configuration to point to a Neptune instance.
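
For instance, a small traversal could be sent through the boto3 neptunedata client, which talks directly to your cluster endpoint; the endpoint URL below is a placeholder, and the response shape should be checked against the SDK documentation:

```python
import boto3

# The data-plane client is pointed at the cluster endpoint (port 8182).
neptune_data = boto3.client(
    "neptunedata",
    endpoint_url="https://my-graph-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182",
)

# A small Gremlin traversal: the first five vertices with labels and properties.
result = neptune_data.execute_gremlin_query(
    gremlinQuery="g.V().limit(5).valueMap(true)"
)
print(result["result"])
```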

Resource Description Framework (RDF) is popular because it provides flexibility for modeling complex information domains. There are a number of existing free or public datasets available in RDF including Wikidata and PubChem, a database of chemical molecules. Neptune supports the W3C’s Semantic Web standards of RDF 1.1 and SPARQL 1.1 (Query and Update), and provides an HTTP REST endpoint that implements the SPARQL Protocol 1.1. With Neptune, you can easily use the SPARQL endpoint for both existing and new graph applications.
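
A comparable SPARQL sketch against the same cluster endpoint (placeholder URL, parameter names per the neptunedata API):

```python
import boto3

neptune_data = boto3.client(
    "neptunedata",
    endpoint_url="https://my-graph-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182",
)

# Return ten arbitrary triples from the RDF graph.
result = neptune_data.execute_sparql_query(
    sparqlQuery="SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
)
print(result)
```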

Neptune supports building graph applications using openCypher, currently one of the most popular query languages for developers working with graph databases. Developers, business analysts, and data scientists like openCypher’s SQL-inspired syntax because it provides a familiar structure to compose queries for graph applications. For Neptune Database, the openCypher and Gremlin query languages can be used together over the same property graph data. Neptune’s openCypher support is compatible with the Bolt protocol, so applications that use Bolt to connect to Neptune can continue to run unchanged.
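
And the openCypher equivalent over the same property graph data, again a sketch with a placeholder endpoint:

```python
import boto3

neptune_data = boto3.client(
    "neptunedata",
    endpoint_url="https://my-graph-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182",
)

# Count nodes per label with an openCypher read query.
result = neptune_data.execute_open_cypher_query(
    openCypherQuery="MATCH (n) RETURN labels(n) AS label, count(*) AS c LIMIT 10"
)
print(result["results"])
```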

Neptune supports fast, parallel bulk loading for property graph data that is stored in Amazon S3. You can use a REST interface to specify the Amazon S3 location for the data. The loader uses a CSV format to load data into nodes and edges. See the Neptune property graph bulk loading documentation for more details.

Neptune Database supports fast, parallel bulk loading for RDF data that is stored in Amazon S3. You can use a REST interface to specify the Amazon S3 location for the data. The N-Triples (NT), N-Quads (NQ), RDF/XML, and Turtle RDF 1.1 serializations are supported. See the Neptune RDF bulk loading documentation for more details.
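
A sketch of kicking off and checking a bulk load through the same boto3 data-plane client; the bucket, IAM role, and endpoint are placeholders, and the response fields shown follow the loader API but should be verified:

```python
import boto3

neptune_data = boto3.client(
    "neptunedata",
    endpoint_url="https://my-graph-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182",
)

# Start a parallel bulk load from S3. Use "csv" for property graph files or an
# RDF serialization such as "turtle"; the role must let Neptune read the bucket.
job = neptune_data.start_loader_job(
    source="s3://my-bucket/graph-export/",                          # placeholder bucket
    format="csv",
    s3BucketRegion="us-east-1",
    iamRoleArn="arn:aws:iam::123456789012:role/NeptuneLoadFromS3",  # placeholder role
)
load_id = job["payload"]["loadId"]

# Check progress; poll this call until the loader reports completion.
status = neptune_data.get_loader_job_status(loadId=load_id)
print(status["payload"]["overallStatus"])
```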

Neptune Analytics supports algorithms for path finding, detecting communities (clustering), identifying important data (centrality), and quantifying similarity. Path finding algorithms efficiently determine the shortest or most optimal route between two nodes. Path finding algorithms allow you to model real-world situations, such as road networks or social networks, as interconnected nodes and edges. Finding the shortest or most optimal paths between various points is crucial in applications such as route planning for GPS systems, logistics optimization, and even in solving complex problems in fields like biology or engineering.

Community detection algorithms calculate meaningful groups or clusters of nodes within a network, revealing hidden patterns and structures that can provide insights into the organization and dynamics of complex systems. This is valuable in fields such as social network analysis, biology (for identifying functional modules in protein-protein interaction networks), and even in understanding information flow and influence propagation in various domains.

Centrality algorithms help identify the most influential or important nodes within a network, providing insights into key players or critical points of interaction. This is valuable in fields such as social network analysis, where it helps pinpoint influential individuals, or in transportation networks, where it aids in identifying crucial hubs for efficient routing and resource allocation.

Graph similarity algorithms allow you to compare and analyze the structural similarities or dissimilarities between different graph structures, enabling insights into relationships, patterns, and commonalities across diverse datasets. This is invaluable in various fields such as biology (for comparing molecular structures), social networks (for identifying similar communities), and recommendation systems (for suggesting similar items based on user preferences).
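
Algorithms in Neptune Analytics are invoked from openCypher queries. The sketch below shows the general shape using boto3; the procedure name, arguments, and yielded fields are illustrative, so check the Neptune Analytics algorithm reference for the exact signatures:

```python
import boto3

graph = boto3.client("neptune-graph", region_name="us-east-1")

# Rank nodes with a centrality algorithm and return the top ten.
# The CALL syntax below is illustrative of how algorithms are invoked.
response = graph.execute_query(
    graphIdentifier="g-abcd1234",  # placeholder graph ID
    queryString=(
        "MATCH (n) "
        "CALL neptune.algo.pageRank(n) YIELD rank "
        "RETURN id(n) AS node, rank ORDER BY rank DESC LIMIT 10"
    ),
    language="OPEN_CYPHER",
)
print(response["payload"].read())
```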

Compliance programs

Neptune is in scope for over 20 international compliance standards ranging from FedRAMP (Moderate and High) to SOC (1, 2, 3), and it is also HIPAA eligible. The full list of standards that Neptune is compliant with can be found in the AWS Services in Scope by Compliance Program list.

Cost-effective

There is no upfront commitment with Neptune; you pay an hourly charge for each instance that you launch or the database resources you consume for serverless. When you’re finished with a Neptune database instance, you can delete it. You do not need to overprovision storage as a safety margin, and you only pay for the storage you actually consume. To see more details, visit the Neptune Pricing page.