################################################################
Dataset Information
################################################################

Dataset Name:
Breast Cancer Profiling Project, Gene Expression 1: Baseline mRNA sequencing on 35 breast cell lines

Dataset Description:
We measured the baseline mRNA-seq profiles of two non-malignant breast cell lines and 33 breast cancer cell lines, of which twenty were triple negative, six were hormone receptor positive, four were HER2 amplified, and three were established from triple-negative patient-derived xenografts.

--Data in Package:
20348.csv

--Metadata in Package:
Cell_Line_Metadata.txt

################################################################
Center-specific Information
################################################################

Center-specific Name:
HMS_LINCS

Center-specific Dataset ID:
20348

Center-specific Dataset Link:
http://lincs.hms.harvard.edu/db/datasets/20348/

################################################################
Assay Information
################################################################

Assay Protocol:
1. Cells in mid-log phase of the growth cycle from 35 breast cell lines (33 cancer and two non-malignant) were plated in their recommended growth media in 10 cm plates, at densities chosen to achieve ~40% confluence at the time of harvest. The growth conditions used are detailed in this file download: Growth Conditions (http://lincs.hms.harvard.edu/data/HMS_Dataset_20343-20344_GrowthConditions.csv). Three technical replicates (cell lines PDX1258, PDX1328, PDX HCCI002) and one biological replicate (cell line MCF 10A) were included, for a total of 39 samples.
2. The cells were grown for 24-48 hours at 37 °C in the presence of 5% CO2 (all MDAMB lines except MDAMB231 were grown in the absence of CO2).
3. Plates were placed on ice and washed twice with 10 ml cold PBS. All residual PBS was aspirated following the second wash.
4. 600 µl of RLT buffer (Qiagen) + bME (1:100 (v/v)) was added to each plate, and cells were scraped off using cell lifters (Corning). Cell suspensions were flash frozen and stored at -80 °C until further processing.
5. Once all samples were collected, cell suspensions were thawed and homogenized using QiaShredder columns according to the manufacturer's instructions.
6. RNA extraction was completed with an RNeasy kit according to the manufacturer's instructions, with a 20 min on-column DNase digestion at room temperature.
7. RNA concentrations were measured with a NanoDrop (Thermo), and extract quality was evaluated by Bioanalyzer (Agilent). Samples with RNA integrity numbers > 9 were deemed to be of sufficient quality.
8. 500 ng of each sample, diluted with ERCC spike-in mix 2 (Ambion), was used for library preparation with a High Throughput TruSeq Stranded mRNA Library Prep Kit (Illumina) following the manufacturer's protocol at 1/3 reaction volume. The final amplification was run for 10 cycles.
9. Libraries were quantified using the Qubit dsDNA HS assay (Thermo Fisher Scientific).
10. Library size and quality were spot-checked for a subset of samples by Bioanalyzer (Agilent). The average size of cDNA fragments in the libraries was 360 base pairs.
11. Libraries were pooled at equimolar concentrations and then quantitated using the KAPA library quantification kit (KAPA Biosystems) at the Harvard University Bauer Core Facility.
12. Paired-end 75 base pair reads were sequenced on a NextSeq instrument (Illumina) at the Harvard University Bauer Core Facility.
13. The STAR aligner in the bcbio-nextgen toolkit was used to map high-quality reads to human genome build GRCh37. Refer to http://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#rna-seq for details.
14. The resulting counts table was normalized to reads per kilobase of transcript per million mapped reads (RPKM). Where replication was performed (technical replication for cell lines PDX1258, PDX1328, and PDX HCCI002; biological replication for cell line MCF 10A), the mean of the two replicates is reported.
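
For readers who want to relate the reported values to raw counts, the normalization and replicate averaging in step 14 can be sketched roughly as follows. This is a minimal illustration of the standard RPKM formula, not the pipeline that produced this dataset. The gene names, counts, and transcript lengths are hypothetical toy values, and two points are assumptions: that "mapped reads" means the sum of counts in the table, and that replicates are averaged after RPKM conversion rather than before.

```python
from statistics import mean


def rpkm(counts, lengths_bp):
    """Convert raw read counts for one sample to RPKM.

    counts:     dict mapping gene -> number of mapped reads
    lengths_bp: dict mapping gene -> transcript length in base pairs
    """
    # Assumption: total mapped reads is the sum over the counts table.
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        raise ValueError("sample has no mapped reads")
    # RPKM = reads * 1e9 / (length in bp * total mapped reads)
    return {
        gene: n * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene, n in counts.items()
    }


def mean_of_replicates(*replicates):
    """Average per-gene values across replicate samples."""
    genes = replicates[0].keys()
    return {gene: mean(rep[gene] for rep in replicates) for gene in genes}


# Hypothetical toy data: two replicates of one cell line, two genes.
lengths_bp = {"GENE_A": 2000, "GENE_B": 500}
rep1 = rpkm({"GENE_A": 600, "GENE_B": 400}, lengths_bp)
rep2 = rpkm({"GENE_A": 200, "GENE_B": 800}, lengths_bp)

# Assumption: the mean is taken over RPKM values, not over raw counts.
reported = mean_of_replicates(rep1, rep2)
print(reported)
```

Averaging after normalization keeps a replicate with greater sequencing depth from dominating the reported value. Whether the original analysis used this order is not stated in the protocol.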

Date Updated:
2018-07-06

Date Retrieved from Center:
2018-08-24

################################################################
Metadata Information
################################################################

Metadata for the entities used in the experiments are included in the accompanying metadata files, with one metadata file per entity category. For example, the metadata for all the cell lines used in the dataset are included in the Cell_Line_Metadata.txt file.
Descriptions for each metadata field can be found here: http://www.lincsproject.org/data/data-standards/
