Genentech Maximizes the Value of Clinical Biomarker Data Using AWS
2022
When most people think of translational research, they think of bench-to-bedside translation – insights that start in the laboratory and eventually make their way into the clinic as new therapeutics or treatment strategies. But in the age of big data in the life sciences, researchers can also think about “reverse translation,” where information gathered in the clinic leads to new discoveries in the laboratory. However, reaping these insights requires the clinical data to be secure, accessible, stable, and searchable – often easier said than done.
“You have to have the right high-quality data. If you put in a bunch of garbage data into a sophisticated analytical algorithm, you're still going to get garbage out,” said Christina Lu, head of data management and engineering in development sciences informatics at Genentech.
Genentech, a member of the Roche Group, is a leading biotechnology and pharmaceutical company. It has amassed a wealth of real-world biological data from years of clinical trials and research. In 2017, the development sciences group within Genentech implemented a strategy to optimize that data for research and development, aiming to answer key questions such as “What is our next drug target?” and “How can data from completed clinical trials inform future trial designs?” This strategy involved building a data ecosystem on AWS to retrospectively curate existing data into searchable repositories and to put tools and processes in place for prospective data management.
“Clinical data analyses that used to take weeks now only require a few hours for researchers. This is how we make every data point count to deliver the right drug to the right patient at the right time.”
Christina Lu
Head of Data Management and Engineering, Development Sciences Informatics, Genentech
Data Curation Unlocks Clinical Insights
“We’re at an inflection point in clinical research – now, if you don’t have your data in place, you’re actually losing out on significant opportunities to develop new treatments and improve patient care,” said Lu. In Genentech’s case, many of those opportunities center around analyzing biomarkers – measurable molecules in the human body associated with specific biological states – to understand disease processes and molecular mechanisms.
“Having data in place” means meeting the FAIR standards: Findable, Accessible, Interoperable, and Reusable. A 2018 report estimated that the European Union economy alone forfeits €10.2 billion per year by using non-FAIR research data. Meanwhile, data scientists spend up to 80% of their time gathering, cleaning, and organizing data by hand, time that would be better spent building models or performing other specialized tasks.
“Legacy data curation is costly, time consuming, and not scalable,” said Lu. To save costs and achieve scalability and efficiency, Genentech consolidated existing biomarker data from its network of contract research organizations (CROs) and transferred it into a well-managed, centralized repository hosted on Amazon Simple Storage Service (Amazon S3), an object storage service. This gives Genentech scientists and external researchers streamlined access to more data, which in turn adds statistical power to studies that could reveal new gene therapy or cancer drug targets.
“If we can apply these data curation strategies prospectively, we can accomplish a lot more to impact human health,” said Lu.
Building a Data Ecosystem on AWS
“The biomarker data repository that we have built on AWS houses petabytes of exploratory biomarker data and provides an interface where scientists can easily find the data they need for a specific study,” said Lu.
The biomarker repository is stored on Amazon S3. Amazon Relational Database Service (Amazon RDS), a managed relational database service, handles the associated metadata, and Amazon Elasticsearch Service provides indexing and fast search. Genentech uses Amazon API Gateway to create APIs that give researchers secure, study-specific access as needed.
“Thanks to this streamlined architecture, clinical data analyses that used to take weeks now only require a few hours for researchers,” said Lu. “This is how we make every data point count to deliver the right drug to the right patient at the right time.”
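As a rough illustration of the flow described above (search the metadata, then hand back only the files a researcher is cleared to read), here is a minimal in-memory sketch. It is not Genentech's implementation and makes no AWS calls. The record schema, study IDs, user names, object keys, and the `search` function are all hypothetical stand-ins for the metadata store, search index, and API-layer access checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiomarkerRecord:
    """Metadata for one data file (hypothetical schema, for illustration only)."""
    study_id: str
    assay: str
    object_key: str  # where the file would live in object storage


# Stand-in for the metadata database and search index (made-up entries).
CATALOG = [
    BiomarkerRecord("STUDY-A", "rna-seq", "study-a/rna-seq/batch1.parquet"),
    BiomarkerRecord("STUDY-A", "flow-cytometry", "study-a/flow/batch1.parquet"),
    BiomarkerRecord("STUDY-B", "rna-seq", "study-b/rna-seq/batch1.parquet"),
]

# Stand-in for study-specific permissions enforced at the API layer (made-up users).
GRANTS = {"alice": {"STUDY-A"}, "bob": {"STUDY-A", "STUDY-B"}}


def search(user: str, assay: str) -> list[str]:
    """Return object keys for the given assay, limited to studies the user may read."""
    allowed = GRANTS.get(user, set())
    return [
        record.object_key
        for record in CATALOG
        if record.assay == assay and record.study_id in allowed
    ]


if __name__ == "__main__":
    print(search("alice", "rna-seq"))
    print(search("bob", "rna-seq"))
```

In this sketch the access check is applied during the metadata lookup, so a user with no grant never sees the storage location of a file from another study.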
About Genentech
Genentech, a member of the Roche Group, is a biotechnology company dedicated to developing new treatments for serious and life-threatening diseases.
Benefits of AWS
- Securely stored petabytes of exploratory biomarker data
- Curated data to be Findable, Accessible, Interoperable, and Reusable (FAIR), both retrospectively and prospectively
AWS Services Used
Amazon RDS
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
Amazon API Gateway
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.
AWS Lambda
AWS Lambda is a compute service that lets you build applications that respond quickly to new information and events.
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 500 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.
Amazon Elasticsearch Service
Amazon Elasticsearch Service is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost effectively at scale.
Get Started
Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.