What is data sharing?

Data sharing is the process of making the same data resources available to multiple applications, users, or organizations. It includes technologies, practices, legal frameworks, and cultural elements that facilitate secure data access for multiple entities without compromising data integrity. Data sharing improves efficiency within an organization and fosters collaboration with vendors and partners. Awareness of the risks and opportunities of shared data is integral to the process.

Why is data sharing important for enterprises?

Organizations have been sharing data since before the advent of the internet. However, progress in digital literacy, technology, and cloud adoption has resulted in data sharing on an unprecedented scale. Here are three key factors that contributed to the growth of data sharing:

  • Data storage, processing, and transfer technologies are increasingly available and affordable
  • A new industry mindset treats data as a resource and an asset
  • Policies and regulations have changed and aim to reduce the risks of data sharing

Modern enterprises understand data sharing is vital for improved community relations and new business opportunities. We outline some of the benefits below.

Better value to customers

Combining information from different data sources has the potential to increase both the value and performance of services. This approach fosters better research and product development. For example, WB Games, the video game division of Warner Bros., uses data sharing to inform the creative process of its game development. It captures, ingests, and analyzes data, then acts on the resulting insights to help its developers become more opportunistic and agile in their storytelling.

Data-driven decision-making

By sharing information transparently, teams break down data silos and contribute to improved analytics. Business intelligence improves, and stakeholders make impactful long-term decisions. For example, GE Renewable Energy has over 49,000 wind turbines installed and generating electricity across the globe. GE turbines are equipped with sensors and connected to advanced networks that collect data on temperature, wind speeds, electricity, and other factors related to turbine performance. The GE data analytics system facilitates decision-making for turbine maintenance and productivity.

Positive social impact

Public authorities and organizations can share more data in a secure, lawful, and respectful manner. This creates new opportunities for collaboration that benefit the broader community. For example, data-sharing efforts in the health sector contribute positively to medical research and have supported significant progress in genomic research.

What are the risks of data sharing?

Data disclosure has potential regulatory, competitive, financial, and security risks. We outline some critical threats below.

Privacy disclosure

Every organization has legal and ethical obligations to safeguard the privacy of the customer data it holds. It has to take appropriate measures to share data without compromising privacy. Privacy-preserving technologies like encryption and redaction allow for safe data sharing.
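As a rough illustration of redaction before sharing, the Python sketch below drops a direct identifier, replaces a customer ID with a salted hash, and masks email addresses in free text. The record layout, field names, and salt value are hypothetical, and salted hashing is pseudonymization rather than full anonymization, so treat this as a starting point and not a complete privacy control.

```python
import hashlib
import re

# Simple pattern for email addresses in free text (illustrative, not exhaustive).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 digest, truncated to 16 hex characters."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def redact_record(record: dict, salt: str) -> dict:
    """Return a copy of the record that is safer to share; the input is not modified."""
    shared = dict(record)
    shared["customer_id"] = pseudonymize(record["customer_id"], salt)
    shared.pop("name", None)  # drop the direct identifier entirely
    shared["notes"] = EMAIL_PATTERN.sub("[REDACTED]", record.get("notes", ""))
    return shared


# Hypothetical record used only for demonstration.
record = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "notes": "Contact at jane@example.com about order 42",
    "order_total": 99.5,
}
shared = redact_record(record, salt="demo-salt")
print(shared)
```

Because the same salt always maps a customer ID to the same digest, consumers can still join records that belong to one customer without seeing the original ID.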

Data misinterpretation

Lack of communication between data producers and consumers can result in analytical misinterpretation. Analysts may make incorrect assumptions when explaining reports and outcomes. For example, a reduction in customer orders in a particular month may be attributed to a lower marketing budget, although the real reason could be a delay in product availability.

Low data quality

Data consumers may have limited control over the quality and availability of data. They may have to deal with missing or duplicate data, questions about validity, poor documentation, and similar issues. Hidden biases against a particular gender, race, religion, or ethnic group may also be present in the dataset.
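A consumer can at least measure some of these problems before relying on a shared dataset. The Python sketch below counts missing values and finds duplicate keys; the row layout and field names are made up for the example, and checks for validity or bias would need domain-specific logic that is not shown here.

```python
from collections import Counter


def quality_report(rows, required_fields, key_field):
    """Count missing required values and list keys that appear more than once."""
    missing = Counter()
    for row in rows:
        for field in required_fields:
            # Treat absent fields, None, and empty strings as missing.
            if row.get(field) in (None, ""):
                missing[field] += 1
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    return {
        "row_count": len(rows),
        "missing": dict(missing),
        "duplicate_keys": duplicates,
    }


# Hypothetical shared dataset with one empty value, one None, and one duplicate key.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "", "amount": 35.5},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "US", "amount": None},
]
report = quality_report(rows, required_fields=["region", "amount"], key_field="order_id")
print(report)
```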

What are some data sharing technologies?

There are many technologies that reduce friction between producers and consumers, mitigate risks, and enhance the value of data sharing. We give some examples below.

Data warehousing

A data warehouse is a central repository to store data from multiple business units. Data warehouse architecture is made up of tiers. The top tier is the frontend client that presents results through reporting, analysis, and data mining tools. The middle tier consists of the analytics engine that is used to access and analyze the data. The bottom tier of the architecture is the database server, where data is loaded and stored. Top- and middle-tier applications can share common datasets stored in the bottom tier.

Data warehouses are useful for internal data sharing. Workloads accessing shared data can be isolated from each other.

APIs

An API is a mechanism that allows two software components to communicate with each other using a set of definitions and protocols. The interface can be thought of as a contract of service between two applications. This contract defines how the two communicate using requests and responses. Data sharing APIs support fine-grained access controls and specify exactly what data consumers can request.
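One way to picture fine-grained access control is a request handler that checks which fields each consumer is allowed to read. The Python sketch below is framework-free and purely illustrative: the roles, field names, dataset, and response shape are assumptions for the example, not part of any specific API product.

```python
# Hypothetical dataset owned by the data producer.
DATASET = [
    {"order_id": 1, "region": "EU", "amount": 20.0, "customer_email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 35.5, "customer_email": "b@example.com"},
]

# The "contract": which fields each consumer role may request (assumed roles).
ALLOWED_FIELDS = {
    "partner": {"order_id", "region"},
    "finance": {"order_id", "region", "amount"},
}


def handle_request(role, requested_fields):
    """Return only the requested fields, or a 403-style response if any field is not permitted."""
    allowed = ALLOWED_FIELDS.get(role)
    if allowed is None:
        return {"status": 403, "error": "unknown consumer role"}
    denied = sorted(set(requested_fields) - allowed)
    if denied:
        return {"status": 403, "error": f"fields not permitted: {denied}"}
    rows = [{field: row[field] for field in requested_fields} for row in DATASET]
    return {"status": 200, "data": rows}


ok_response = handle_request("partner", ["order_id", "region"])
denied_response = handle_request("partner", ["amount"])
print(ok_response)
print(denied_response)
```

Rejecting the whole request when any field is not permitted, instead of silently dropping fields, keeps the contract explicit for the consumer.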

Federated learning

Federated learning is a machine learning (ML) technique that allows artificial intelligence systems to train on distributed datasets without moving the data to a central location. Data producers retain control of their data while contributing to collaborative technological advances. For example, ML algorithms that detect cancer can train on tissue images held by various medical institutions.
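The core idea can be sketched with federated averaging on a toy model: each participant trains locally on its own data, and only the resulting model weight is sent to a coordinator, which averages the weights by dataset size. The one-parameter model, learning rate, and client data below are assumptions chosen to keep the Python example small; real systems train far larger models and often add protections such as secure aggregation.

```python
def local_update(weight, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps for the model y = weight * x on one client's private data."""
    w = weight
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_weight, clients):
    """Average the clients' locally trained weights, weighted by dataset size."""
    total = sum(len(data) for data in clients)
    return sum(local_update(global_weight, data) * len(data) for data in clients) / total


# Two hypothetical institutions; their raw (x, y) pairs never leave this list entry.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

weight = 0.0
for _ in range(50):
    weight = federated_round(weight, clients)
print(round(weight, 4))
```

Both toy datasets follow y = 3x, so the shared weight moves toward 3 even though neither client ever sees the other's data.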

Blockchain technology

Blockchain technology is an advanced database mechanism that allows transparent information sharing within a business network. A blockchain database stores data in blocks linked together in a chain. The data is chronologically consistent because you cannot delete or modify the chain without consensus from the network. As a result, you can use blockchain technology to create an unalterable or immutable ledger for tracking orders, payments, accounts, and other transactions. In addition, the system has built-in mechanisms that both prevent unauthorized transaction entries and create consistency in the shared view of these transactions.
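The linking of blocks can be illustrated with a minimal hash chain in Python. Each block stores the hash of the previous block, so changing an earlier entry breaks every later link and is easy to detect. This sketch deliberately leaves out the network, consensus, and authorization mechanisms described above, and the block layout and function names are illustrative only.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of the previous block."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a block that points at the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check that each block links to the one before it."""
    expected_prev = "0" * 64
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        expected_prev = block["hash"]
    return True


ledger = []
append_block(ledger, {"order": 1, "amount": 20.0})
append_block(ledger, {"payment": 1, "amount": 20.0})
valid_before = is_valid(ledger)

# Tamper with an earlier transaction without recomputing the hashes.
ledger[0]["data"]["amount"] = 2000.0
valid_after = is_valid(ledger)
print(valid_before, valid_after)
```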

Data exchange platforms

Open data platforms allow different entities to register their datasets for public consumption; you only have to prepare and submit the data. The platform provides the infrastructure for storage and access. Anyone can access your data.

How can AWS support your data sharing effort?

When data is shared on AWS, anyone can analyze and build services on top of it using a broad range of compute and data analytics products. These include Amazon Elastic Compute Cloud (Amazon EC2), Amazon Athena, AWS Lambda, and Amazon EMR. Cloud data sharing lets your users spend more time on data analysis than data acquisition. We give some example technologies below.

  • Amazon Redshift is a data warehousing technology that enables instant, granular, and fast data access without the need to copy or move it. Your users always see the most up-to-date and consistent information as it’s updated in the data warehouse.
  • Amazon Managed Blockchain is a fully managed service that makes it easy to create and manage scalable blockchain networks and distributed ledger technology.
  • AWS Data Exchange allows you to easily find datasets made publicly available through AWS services.

Get started with cloud data sharing on AWS by creating a free account today.
