This report provides Quality Assessment (QA) of the HMEC240L_SS1 MEP-LINCS Microenvironment Microarray (MEMA) data. After QA filtering, there are 2688 Microenvironment Perturbations (MEPs) that are pairwise combinations of 48 ECM proteins and 56 ligands or growth factors.
The MEP-LINCS HMEC240L SS1Layout1Set1 experiment was performed with cells grown in eight 8-well plates. The SS1Layout1Set1 staining set channels are DAPI, 488, 555, 647, and 750. The endpoints are DAPI, phalloidin, CellMask, MitoTracker, and NA. Color images of the cells at each spot were gathered on a Nikon automated microscope.
Intensity, position and morphology data are gathered for the nucleus and cytoplasm of each cell, merged with the experiment metadata, normalized, filtered and summarized. The dataset is organized into the four LINCS imaging categories as follows:
Level 1 - Raw annotated cell level data
Level 2 - Raw annotated spot level data
Level 3 - Raw and normalized annotated spot level data
Level 4 - Raw and normalized replicate (MEP) level data
The data merging and analysis are done in R using open source software. The code is available at the GitHub repository https://github.com/MEP-LINCS/MEP_LINCS.
Each well is scored for even cell seeding according to the count of the DAPI-stained nuclei. A detailed explanation of the QA method is in the supplemental material. In brief, cell counts per spot and locally averaged neighborhoods are used to score the wells and filter the dataset. QA scores range from 0 to 1 and represent the proportion of the spots that have at least one cell and are not in low cell count neighborhoods.
The following plots are pseudo images of each MEMA’s spot cell count along with histograms of the loess model with its QA score.
The following histograms show the raw spot cell counts for each array.
Boxplots of the spot cell counts show the values stratified by plate, well and ligand. The blue lines show the dataset median. Each boxplot summarizes data from ~700 spots.
DAPI stains the DNA of each nucleus and is used to identify the nuclear regions and quantify the amount of DNA. Boxplots of the DAPI intensities show the values stratified by plate and well. Each boxplot summarizes data from ~700 spots.
This staining set includes MitoTracker which stains mitochondria.
Each signal of the dataset was first independently normalized between arrays using the RUV method with the array as the unit of study and spatial residuals as the negative controls. Then each array was loess normalized using the row and column positions and the residuals from the median of each set of replicates.
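The residual step mentioned above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R pipeline's code. It covers just the "residuals from the median of each set of replicates" part, not RUV or the loess fit, and the record fields `mep` and `value` are hypothetical names.

```python
from collections import defaultdict
from statistics import median

def replicate_residuals(spots):
    """Return each spot's value minus the median of its replicate set.

    spots is a list of dicts with hypothetical keys:
      'mep'   - label identifying the replicate set (ECM protein + ligand)
      'value' - the signal measured at that spot
    """
    groups = defaultdict(list)
    for spot in spots:
        groups[spot["mep"]].append(spot["value"])
    medians = {mep: median(values) for mep, values in groups.items()}
    return [spot["value"] - medians[spot["mep"]] for spot in spots]

# Toy data, invented for illustration.
example_spots = [
    {"mep": "A", "value": 1.0},
    {"mep": "A", "value": 2.0},
    {"mep": "A", "value": 6.0},
    {"mep": "B", "value": 10.0},
    {"mep": "B", "value": 20.0},
]
residuals = replicate_residuals(example_spots)
print(residuals)
```

In a sketch like this, the residuals carry what is left after the replicate-level response is removed, which is the kind of signal a spatial model over row and column positions could then be fitted to.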
The following figures show the effects of the normalization. The data are organized first by barcode and then by ligand. The barcode is replaced by the label “Multiple” when the values come from replicates that are in more than one plate.
Depending on the staining set, selected signals are stratified first by ligand and then by ECM protein. Each ligand boxplot contains data from pairing with all ECM proteins while each ECM protein boxplot contains data from pairing with all ligands.
All cells are stained with DAPI and classified as DNA 2n or 4n. The proportions of 2n and 4n cells at each spot are calculated and always sum to 1. The proportions are plotted below and detailed in the data tables.
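As a small illustration of why the two proportions sum to 1, here is a hedged Python sketch. The input format (a list of "2n"/"4n" labels per spot) and the function name are assumptions made for this example; how the pipeline assigns the labels is not shown.

```python
from collections import Counter

def dna_proportions(cell_states):
    """Return the proportions of 2n and 4n cells at one spot.

    cell_states: list of "2n"/"4n" labels, one per cell (hypothetical format).
    Returns None for a spot with no cells, where proportions are undefined.
    """
    if not cell_states:
        return None
    counts = Counter(cell_states)
    total = len(cell_states)
    return {"2n": counts["2n"] / total, "4n": counts["4n"] / total}

# Toy spot with four cells, invented for illustration.
props = dna_proportions(["2n", "2n", "4n", "2n"])
print(props)
```

Because every cell is assigned exactly one of the two classes, the two proportions are complementary. A spot with no cells has no defined proportions, which this sketch reports as None.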
A Principal Component Analysis can be used to evaluate the normalization process for plate and well effects.
All MEMAs in the experiment are in separate wells and have the same design of 48 ECM proteins spotted in 35 rows and 20 columns. The proteins are randomly assigned to spots in the top 30 rows. Rows 31-35 are replicates of rows 1-5. There is a higher number of COL1 spots (shown as black triangles) throughout the array that are control spots used to evaluate spatial variations. The upper left and bottom right corners of each MEMA are image fiducials in the 488 nm channel and there are four blank spots for checking orientation in all channels.
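The layout arithmetic can be sketched as follows. This Python fragment is illustrative only. It assumes 1-based row numbers and that row 31 repeats row 1, row 32 repeats row 2, and so on; the text above states only that rows 31-35 replicate rows 1-5, so the ordering is an assumption.

```python
ROWS, COLS = 35, 20          # array dimensions stated in the report
TOTAL_SPOTS = ROWS * COLS    # 700 spots per MEMA

def source_row(row):
    """Map a 1-based array row to the row whose design it carries.

    Rows 1-30 are their own source. Rows 31-35 are assumed to repeat
    rows 1-5 in the same order (an assumption, see the lead-in).
    """
    if not 1 <= row <= ROWS:
        raise ValueError(f"row must be between 1 and {ROWS}")
    return row - 30 if row > 30 else row

print(TOTAL_SPOTS, source_row(31), source_row(35), source_row(12))
```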
The median replicate count is 13 and the replicate counts range from 8 to 99.
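A summary of this kind can be computed by counting spots per MEP. The Python sketch below uses invented toy annotations with placeholder names, so its numbers are not the values reported above.

```python
from collections import Counter
from statistics import median

# Hypothetical annotations: one (ECM protein, ligand) pair per spot kept after QA.
spot_meps = [
    ("ECM_1", "LIG_1"), ("ECM_1", "LIG_1"),
    ("ECM_2", "LIG_1"), ("ECM_2", "LIG_1"), ("ECM_2", "LIG_1"),
    ("ECM_1", "LIG_2"), ("ECM_1", "LIG_2"), ("ECM_1", "LIG_2"),
    ("ECM_1", "LIG_2"), ("ECM_1", "LIG_2"),
]

# Number of replicate spots for each MEP.
replicate_counts = Counter(spot_meps)
counts = list(replicate_counts.values())

summary = {"median": median(counts), "min": min(counts), "max": max(counts)}
print(summary)
```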
The variance of the signal in MEMA data comes from biological and technical factors. The technical factors create regions of low cell counts per spot and uneven staining across the array. The goal of the QA pipeline is to quantify the technical factors to identify wells or plates that need to be removed from downstream processing and/or be replaced by wells from a new experiment.
The hypothesis for the MEMA QA process is that the biological signal comes from individual spots while the technical variations come from regions of low signal. A bivariate loess model can be used to quantify the number of spots in low signal regions, leading to a MEMA QA score.
The loess model of a MEMA is the mean value of a weighted version of each spot’s region or neighborhood. In a 700-spot array, a loess span value of 0.1 sets the size of the neighborhood to be the nearest 70 points (within approximately 5 spots in all directions). The weights are a tricube function of the Euclidean distance between the spot being modeled and the neighborhood spots. These weights vary from 1 to 0 as distances increase from the nearest to the farthest neighbor. In other words, each spot in the model takes on the mean value of its 70 nearest neighbors with the closest neighbors having the largest impact. Therefore, the loess model is dominated by the technical regional factors as opposed to individual biological responses.
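The neighborhood weighting described above can be sketched in a few lines of Python. This is a simplified illustration rather than the pipeline's R code: it computes a tricube-weighted mean of the nearest neighbors, whereas R's `loess` fits a local polynomial (degree 2 by default) within each neighborhood. The function names and the (row, column, value) tuple format are assumptions.

```python
import math

def tricube(u):
    """Tricube kernel: 1 at u = 0, decreasing to 0 at u = 1."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0

def local_weighted_mean(spots, target, span=0.1):
    """Tricube-weighted mean of the spots nearest to target.

    spots:  list of (row, col, value) tuples (hypothetical format)
    target: (row, col) position being modeled
    span:   fraction of all spots used as the neighborhood
    """
    k = max(1, round(span * len(spots)))  # 0.1 * 700 spots -> 70 neighbors
    by_distance = sorted(
        (math.hypot(row - target[0], col - target[1]), value)
        for row, col, value in spots
    )
    nearest = by_distance[:k]
    d_max = nearest[-1][0]  # distance to the farthest neighbor
    weights = [tricube(d / d_max) if d_max > 0 else 1.0 for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:  # every neighbor sits at the maximum distance
        return sum(v for _, v in nearest) / len(nearest)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total

# Toy 35 x 20 array with a constant signal, invented for illustration.
grid = [(r, c, 10.0) for r in range(1, 36) for c in range(1, 21)]
center_value = local_weighted_mean(grid, (18, 10))
print(center_value)
```

With the distances scaled by the farthest neighbor, the closest spots receive weights near 1 and the farthest receives 0, so a single unusual spot has little effect on the modeled value while a region of consistently low spots pulls it down.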
A MEMA’s QA score is derived from the loess model of the control-well-normalized values by calculating the proportion of spots in low signal regions (LSR). A threshold for classifying spots as LSR is based on the median of each plate’s control well. To have higher scores reflect increasing quality, the MEMA QA score is defined as the proportion of non-LSR spots to total spots. This value will be 1 for MEMAs with no low signal regions and approach 0 as the number of LSR spots increases.
In the plots below, the LSR spots are those to the left of the blue vertical line at the threshold value of 0.7 in the histogram.
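The score definition can be written out as a short Python sketch. It assumes the loess-modeled, control-normalized values are already available as a list, and that a value exactly at the threshold counts as non-LSR. The boundary handling and the function name are assumptions.

```python
def mema_qa_score(loess_values, threshold=0.7):
    """Proportion of spots that are not in a low signal region (LSR).

    loess_values: loess-modeled, control-normalized value for each spot
    threshold:    spots with a modeled value below this are classed as LSR
    """
    if not loess_values:
        raise ValueError("at least one spot is required")
    non_lsr = sum(1 for value in loess_values if value >= threshold)
    return non_lsr / len(loess_values)

# Toy values, invented for illustration: two of the five fall below 0.7.
score = mema_qa_score([1.0, 0.9, 0.5, 0.69, 0.7])
print(score)
```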
The pseudo images of each well’s raw signals are shown in the plots below. Wells that could not be successfully imaged due to focus issues are missing from the pseudo images.