Amazon DataZone: Automate Data Discovery
Overview
Cut the time spent manually entering data attributes in the data catalog, a process that also introduces potential errors. Generate business context and recommended analyses for datasets to improve data discovery results. Understand where your data came from and which sources will be impacted by changes. More and richer metadata in the business data catalog also improves the search experience. Reduce the time you spend searching for and using data from weeks to days.
FAQs
What kind of information is in the Amazon DataZone business data catalog?
In the Amazon DataZone business data catalog, business metadata provides information authored or used by business people and gives context to organizational data. This could include the following information:
- Ownership: Modern data-centric organizations employ a distributed data stewardship process where lines of business (LOBs) are responsible for managing their own data. A catalog tracks that ownership so interested parties can find and request access to data as part of their business tasks.
- Classification: Data discovery is a key task that business metadata can support. Data discovery uses centrally defined corporate ontologies and taxonomies to classify data sources and helps you find relevant data objects.
- Relationships: You can use the Amazon DataZone business data catalog to add relationship information as metadata. As with a technical dataset schema, the business data catalog shows relationships between objects in the catalog, such as those between databases, datasets, and their columns.
- Schema: AI recommendations can use the technical and business schema to generate recommended descriptions and usage for data.
- Origin and consumption: The business data catalog links to data lineage and impact analysis, as well as custom mappings from OpenLineage.
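To make the categories above concrete, here is a minimal Python sketch of how a business metadata record and a simple impact analysis might be modeled. It is an illustration only: `CatalogEntry`, `impacted_by`, and the sample dataset names are hypothetical and are not part of the Amazon DataZone API or its data model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical business-metadata record (not the DataZone data model)."""
    name: str
    owner: str                                               # Ownership: responsible line of business
    glossary_terms: list[str] = field(default_factory=list)  # Classification: taxonomy terms
    columns: dict[str, str] = field(default_factory=dict)    # Schema: column -> business description
    upstream: list[str] = field(default_factory=list)        # Origin: names of source entries


def impacted_by(changed: str, catalog: dict[str, CatalogEntry]) -> set[str]:
    """Return every entry that directly or indirectly consumes `changed`."""
    impacted: set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for entry in catalog.values():
            # An entry is impacted if it lists the current asset as a source.
            if current in entry.upstream and entry.name not in impacted:
                impacted.add(entry.name)
                frontier.append(entry.name)
    return impacted


# Assumed sample data: three datasets owned by two lines of business.
catalog = {
    e.name: e
    for e in [
        CatalogEntry("raw_orders", owner="Sales"),
        CatalogEntry("clean_orders", owner="Sales", upstream=["raw_orders"]),
        CatalogEntry(
            "revenue_report",
            owner="Finance",
            glossary_terms=["Revenue"],
            upstream=["clean_orders"],
        ),
    ]
}

print(sorted(impacted_by("raw_orders", catalog)))
```

In this sketch, a change to `raw_orders` flags both `clean_orders` and `revenue_report` as impacted, which is the kind of question lineage and impact analysis are meant to answer.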
What can I catalog with Amazon DataZone?
Amazon DataZone supports data assets published directly from the AWS Glue Data Catalog and Amazon Redshift. These two sources can be used to catalog data in the following locations:
- Amazon Simple Storage Service (Amazon S3) data lakes
- Many AWS purpose-built databases, such as Amazon Relational Database Service (Amazon RDS), through an AWS Glue crawler
- More than 100 Amazon AppFlow connectors, which bring in data from third-party applications such as Snowflake, Salesforce, and Google Analytics