R Shiny: Downstream Processing Dashboard
Purpose
The goal of this project was to build a dashboard for my current employer's Downstream Processing Team (DSP) to enable tracking of Key Performance Indicators (KPI) and querying of experimental data without writing SQL commands. Although developed specifically for my employer, this dashboard could be applied to other biotechnology or biopharmaceutical companies' DSP teams, which often quantify similar metrics. This code can be easily tweaked to pull from a relational database rather than a CSV file, enabling real-time dashboarding.
Checklist for this tool:
- Expedite analysis and visualization for individual experiments
- Enable characterization of unit operations via aggregation of historical data
- Increase data transparency across organization
- Enable querying of the organization's relational databases without SQL
Introduction
My current employer is a biotech company that leverages synthetic biology, a cross-functional field that uses genetically modified microorganisms (e.g., bacteria, yeast) to create bio-manufactured products for drugs, materials, food, and other consumer goods. At a high level, synthetic biology companies' tech stacks generally comprise three teams:
- Molecular Biology: responsible for modifying a microorganism's genome to produce products that are foreign to that microbe
- Upstream Processing (USP): responsible for scaling-up the fermentation of that modified organism, building biomass and generating high quantities of that bio-manufactured product of interest
- Downstream Processing (DSP): responsible for the purification of that bio-manufactured product, removing impurities generated during the fermentation process
Together, these teams are capable of generating large quantities of pure bio-manufactured products, which can then be used as the raw ingredients in drugs, materials, food and other goods. This project's dashboard was focused on the Downstream Processing team and its various data sources.
In order to create a purification process, DSP teams daisy-chain multiple unit operations that will result in a pure, final product. There are many types of unit operations (e.g., centrifugation, distillation, precipitation, filtration, drying), but their implementation depends on the application, scalability, and cost. Furthermore, each unit operation has its own set of parameters that can impact its performance based on the application and metric involved. Due to the volume of experimentation and investment poured into choosing not only the appropriate unit operations but also the specific parameters for each one, developed purification processes can be highly proprietary. To protect the company's intellectual property, the data used in these proprietary processes had to be manipulated.
Data and Preprocessing
Within DSP, there are various types of data generated:
- Offline data: measurements taken after an experiment (e.g., submitting a sample to an Analytics team or Contract Research Organization (CRO))
- In-process data: measurements generated on various streams during the experiment (e.g., measuring a solution's pH with a probe)
- Instrument data: measurements captured during the running of laboratory instrumentation, often time-series data
- Unit operation meta-data: descriptive data about how a unit operation was executed
This dashboard is built on each of these data types. Consequently, it was necessary to query data tables from multiple schemas in my organization's two relational databases. Rather than pulling the entire data table from those relational databases, I designed the dashboard to insert the user's data selection into an SQL command, reducing the size and cost of querying data. As a result, the dashboard communicates with our relational databases in real-time.
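As an illustration of this pattern, a minimal sketch using DBI and glue is shown below; the connection, table, and column names are hypothetical, not the production schema:

```r
# Sketch: build a parameterized query from the user's selections so that only
# the selected rows are pulled from the warehouse. Names below are hypothetical.
library(DBI)
library(glue)

get_unit_op_data <- function(con, selected_ids) {
  query <- glue_sql(
    "SELECT * FROM dsp.unit_operations WHERE unit_op_id IN ({ids*})",
    ids = selected_ids,
    .con = con
  )
  dbGetQuery(con, query)
}
```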
However, as mentioned earlier, the specific parameters and data generated from the DSP process are sensitive information. Therefore, the app deployed for my company had to be altered so that it could be shared for this project's presentation and on GitHub. The following changes were applied:
- Data used:
- Employer: entire datasets available
- NYCDSA: trimmed/redacted/modified datasets in order to protect the organization's IP
- Data source:
- Employer: data was pulled directly from warehouses through user's inputs + SQL commands to minimize size of query (lighter weight/cheaper)
- NYCDSA: data was imported through CSV into global data frames
Despite the differences in data and how it was sourced, the two apps' functionality and appearance are exactly the same.
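For the public (NYCDSA) version, the swap to CSV imports might look roughly like this in global.R; the file names are placeholders for the redacted snapshots:

```r
# global.R sketch for the NYCDSA version: load redacted CSV snapshots once at
# startup into global data frames instead of querying the warehouse live.
library(readr)

unit_op_df    <- read_csv("data/unit_operations_redacted.csv")
in_process_df <- read_csv("data/in_process_data_redacted.csv")
membrane_df   <- read_csv("data/membrane_data_redacted.csv")
```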
App Buildout
The following steps were implemented to build both the employer and NYCDSA apps:
For my employer:
- I consulted with the DSP team to understand their needs.
- I created a connection to the organization's two relational databases and queried the schemas of interest.
- I developed prototype visualizations and confirmed them with the DSP team.
- I built the app.
- I presented a draft to the DSP team.
- I made the necessary adjustments and final tweaks.
- I launched the app on the company's server.
For the NYCDSA project:
- I trimmed, redacted, and modified the datasets to protect the company's IP and downloaded them as CSV files.
- I changed the app's script to import downloaded CSV files rather than directly querying from the data warehouse.
- I modified the script to protect the company's intellectual property.
- I launched the app.
App Terminology
Step recovery (%) = percent of product mass recovered after a unit operation.
- Formula: Product Stream's g of product ÷ Feed Stream's g of product × 100 (a worked example follows this list)
Mass balance (%) = percent of product accounted for after a unit operation.
- Formula: (Product Stream's g of product + Waste Stream's g of product) ÷ Feed Stream's g of product × 100
pH = measure of a solution's acidity/basicity.
Conductivity (mS/cm) = measure of a solution's ability to conduct an electric charge.
OD600 = optical density (OD) measured at a wavelength of 600 nm; a measure of the turbidity of a solution.
Total solids (g/kg) = grams of solids per kilogram of solution.
NWP (LMH) = normalized water permeability; its units, LMH, stand for liters of permeated water per square meter of membrane per hour. NWP is measured under standard temperature and pressure to confirm membrane integrity prior to running.
Flux (LMH) = rate at which a solution passes through a membrane. LMH stands for liters of permeated solution per square meter of membrane per hour.
TMP (psi) = transmembrane pressure, calculated as the difference between the average feed pressure and the permeate pressure; this is the driving force for a solution's flux through a membrane.
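To make the step recovery, mass balance, and TMP formulas above concrete, here is a small worked sketch in R; all values are made up, and averaging the feed inlet and outlet pressures is one common convention for "average feed pressure."

```r
# Worked example with made-up values to illustrate the formulas above.
feed_g    <- 100  # g of product entering the unit operation (feed stream)
product_g <- 92   # g of product leaving in the product stream
waste_g   <- 5    # g of product leaving in the waste stream

step_recovery <- product_g / feed_g * 100              # 92%
mass_balance  <- (product_g + waste_g) / feed_g * 100  # 97%

# TMP (psi): average of feed inlet/outlet pressures minus permeate pressure
feed_in_psi  <- 20
feed_out_psi <- 14
permeate_psi <- 2
tmp_psi <- (feed_in_psi + feed_out_psi) / 2 - permeate_psi  # 15 psi
```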
App Description
The app consists of multiple tabs shown in the left-hand panel, each of which contains different data types and/or visualizations. Throughout the app, each plot has a βDownload Plotβ button below it, so a user can easily transfer the visualization to a slide deck or external document. Because some data is missing from this dataset, plots may sometimes show empty or missing results depending on the metric and the selected rows.
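As a rough illustration of how such a button can be wired in Shiny (the output name and the reactive it saves are placeholders, not the app's actual code):

```r
# Inside the Shiny server function: a "Download Plot" button that saves the
# currently displayed ggplot. `step_recovery_plot()` is a placeholder reactive.
library(shiny)

output$download_plot <- downloadHandler(
  filename = function() paste0("dsp_plot_", Sys.Date(), ".png"),
  content  = function(file) {
    ggplot2::ggsave(file, plot = step_recovery_plot(), width = 8, height = 5)
  }
)
```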
βDSP KPIβ Tab
The landing page and first tab, βDSP KPIβ, enables the user to select specific unit operation IDs to visualize their step recoveries and mass balances. After hitting the βLoad dataβ button and selecting unit operation IDs, one can scroll to the bottom of the page, select a tab (e.g., βMass Balance - Unit Op IDβ), and visualize the following plots:
- Barplot: Unit Operation ID vs. Protein Step Recovery (%)
- Barplot: Unit Operation ID vs. Protein Mass Balance (%)
- Barplot: Unit Operation ID vs. Protein Step Recovery/Mass Balance (%)
- Barplot: Unit Operation Type vs. Protein Step Recovery (%)
- Barplot: Unit Operation Type vs. Protein Mass Balance (%)
Within the βStep Recovery - Unit Op Typeβ and βMass Balance - Unit Op Typeβ tabs, the user will find not only the respective plots but also a summary statistics table that includes the metric's mean, standard deviation, %CV, minimum value, maximum value, upper outlier level, and lower outlier level. This information is vital to characterizing a unit operation and identifying steps to further optimize.
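A minimal dplyr sketch of how such a summary table could be assembled; the data frame and column names are placeholders, and the 1.5 × IQR fences are an assumption about how the upper and lower outlier levels are defined:

```r
# Sketch: summary statistics of step recovery per unit operation type.
# `kpi_df`, `unit_op_type`, and `step_recovery` are placeholder names.
library(dplyr)

summary_tbl <- kpi_df %>%
  group_by(unit_op_type) %>%
  summarise(
    mean_val  = mean(step_recovery, na.rm = TRUE),
    sd_val    = sd(step_recovery, na.rm = TRUE),
    cv_pct    = sd_val / mean_val * 100,
    min_val   = min(step_recovery, na.rm = TRUE),
    max_val   = max(step_recovery, na.rm = TRUE),
    upper_out = quantile(step_recovery, 0.75, na.rm = TRUE) + 1.5 * IQR(step_recovery, na.rm = TRUE),
    lower_out = quantile(step_recovery, 0.25, na.rm = TRUE) - 1.5 * IQR(step_recovery, na.rm = TRUE)
  )
```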
βIn-Process Data - Expt Viewβ Tab
The βIn-Process Data - Expt Viewβ tab enables the user to select a specific condition ID and visualize how the measured in-process data changed throughout the purification. The in-process measurements included in this analysis are pH, conductivity, OD600, and total solids. It's important to note that the user cannot multi-select in this tab. As not all conditions have the same unit operations, an x-axis for one condition might not work for another.
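As a small sketch of that design choice, the condition selector can simply be limited to a single value; the input ID and choices vector below are illustrative:

```r
# Sketch: single-select condition picker; `condition_ids` is a placeholder vector.
library(shiny)

selectInput("condition_id", "Condition ID",
            choices = condition_ids, multiple = FALSE)
```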
βIn-Process Data - Stream Viewβ Tab
The βIn-Process Data - Stream Viewβ tab uses the same data as the βIn-Process Data - Expt Viewβ tab, though its visualizations are different. Rather than selecting for a unique condition ID, this tab enables the user to select specific stream IDs and characterize a stream type's in-process measurement. After selecting stream IDs, one can scroll to the bottom of the page, select a tab like βConductivityβ and visualize the following plots:
- Barplot: Stream Type vs. pH
- Barplot: Stream Type vs. Conductivity
- Barplot: Stream Type vs. OD600
- Barplot: Stream Type vs. Total Solids
Within each tab, the user will find not only the respective plots but also a summary statistics table that includes the metric's mean, standard deviation, %CV, minimum value, maximum value, upper outlier level, and lower outlier level.
βMembrane Dataβ Tab
The βMembrane Dataβ tab enables the user to select specific membrane IDs to visualize their normalized water permeability (NWP). NWP is a measure of the liters of water that permeate through a membrane's surface area over time, a metric used to characterize the integrity of a membrane. If the NWP is too high or low, this could be a sign that the membrane has been damaged and should not be used for the next experiment.
After hitting the βLoad dataβ button and selecting membrane IDs, one can scroll to the bottom of the page, select a tab (e.g., βNWP - Membrane IDβ), and visualize the following plots:
- Boxplot: NWP vs. Membrane Part Number
- Boxplot: NWP vs. Membrane Serial Number
- Boxplot: NWP vs. Membrane ID/Experiment
- Boxplot: NWP vs. Membrane Surface Area
- Barplot: Number of Uses vs. Membrane Part Number
Within most tabs, the user will find not only the respective plots but also a summary statistics table that includes the metric's mean, standard deviation, %CV, minimum value, maximum value, upper outlier level, and lower outlier level.
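A rough ggplot2 sketch of one of these boxplots; the data frame and column names are placeholders for the real schema:

```r
# Sketch: NWP by membrane part number; `membrane_df` and its columns are placeholders.
library(ggplot2)

ggplot(membrane_df, aes(x = part_number, y = nwp_lmh)) +
  geom_boxplot() +
  labs(x = "Membrane Part Number", y = "NWP (LMH)")
```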
βUnit Op Time-Seriesβ Tab
The βUnit Op Time-Seriesβ tab enables the user to select and visualize time-series data captured during filtration unit operations. After the user selects a run, plots will automatically populate, showing the relationship between flux and TMP vs. time, concentration factor, and diavolumes. Users can even select multiple runs at once, overlaying the results to see how different batches perform.
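A rough ggplot2 sketch of overlaying flux vs. time for several selected runs; the data frame and column names are placeholders:

```r
# Sketch: overlay flux vs. time for the selected runs, colored by run ID.
library(ggplot2)

ggplot(ts_df, aes(x = run_time_min, y = flux_lmh, color = run_id)) +
  geom_line() +
  labs(x = "Time (min)", y = "Flux (LMH)", color = "Run ID")
```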
βProcess Flow Diagramsβ Tab
The βProcess Flow Diagramsβ tab enables the user to select a specific experiment ID and create a process flow diagram that depicts the unit operations within that experiment. This is extremely helpful for building presentations and allowing data analysts to recall the executed steps within an experiment.
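The package used in the app isn't specified here, but as one possible approach, a Graphviz-based sketch with DiagrammeR could render a simple flow of unit operations; the steps shown are illustrative, not a real process:

```r
# Sketch: a simple process flow diagram rendered with DiagrammeR/Graphviz.
# The unit operations shown are illustrative only.
library(DiagrammeR)

grViz("
  digraph pfd {
    rankdir = LR;
    node [shape = box];
    Broth -> Centrifugation -> Filtration -> Drying;
  }
")
```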
βData Downloadβ Tab
The final tab, βData Download,β enables the user to download any of the datasets used in the app without having to know SQL. The analyst can select a specific experiment, membrane, or run ID for a targeted query. It's even possible to pull all the data if needed.
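A minimal sketch of how a targeted download might be implemented; the input, data frame, and column names are placeholders:

```r
# Inside the Shiny server function: download only the rows matching the
# selected experiment ID. `unit_op_df` and `experiment_id` are placeholders.
output$download_data <- downloadHandler(
  filename = function() paste0("dsp_data_", input$experiment_id, ".csv"),
  content  = function(file) {
    selected <- dplyr::filter(unit_op_df, experiment_id == input$experiment_id)
    readr::write_csv(selected, file)
  }
)
```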
Conclusions
The original goals of this DSP Dashboard were the following:
- Expedite analysis and visualization for individual experiments
- Enable characterization of unit operations via aggregation of historical data
- Increase data transparency across the organization
- Enable querying of the organization's relational databases without SQL
Overall, this tool has accomplished each of these objectives. It has been deployed on my company's server and is currently being beta-tested by the DSP team. I've found it extremely rewarding to apply the skills gained in this bootcamp to a problem at my current employer.
Next Steps
- Continue to consult DSP team during beta-testing
- Expand the DSP dashboard as the team's technology and metrics develop
- Build dashboards for the remaining teams