Documentation for automated cell cycle gating (cell_cycle_gating
)¶
The Deep Dye Drop assay provides detailed readouts on the cell cycle in addition to the live/dead counts. Cells are treated with Hoechst (to stain nuclei) and LIVE/DEAD Red (LDR, to stain dead cells) dyes in multi-well plates. In addition, cells can be treated with EdU (to stain S-phase cells) and an antibody for phospho-histone H3 (to stain M-phase cells). Live cells are assigned to a phase of the cell cycle based on their DNA content, and EdU and pH3 intensities. Subsequently, cells are imaged at the single-cell level. The cell_cycle_gating script described in this documentation takes the resultant single cell imaging data as input and automatically classifies cells into live or dead and further classifies live cells into G1, G2, S, or M phase of the cell cycle.
License and funding¶
Deep Dye Drop’s automated gating package is currenlty available under the MIT license .The package was developed with funding from U54 grant HL127365, “The Library of Integrated Network-Based Cellular Signatures” under the NIH Common Fund program, and NCI U54 grant CA225088 for the Harvard Medical School (HMS) Center for Cancer Systems Pharmacology (CCSP).
Installation¶
Deep Dye Drop’s automated gating script requires Python 3.
In order to install, cd into a local folder of your chocice using the command line. As shown below, clone the repository to the local folder and use pip to install the cell cycle gating scripts.
git clone https://github.com/datarail/DrugResponse.git
pip install -e DrugResponse/python
Getting started¶
Working with data generated from Columbus¶
- Object level data for a given well is saved as a
.txt
file. The data for all wells in a given plate is saved in a folder. The name of the folder typically takes the formabcdef[123]
. Theabcdef
prefix in the folder name should correspond to the barcode assigned to the plate.cd
into the directory that contains the object level data folders.- Start a Jupyter notebook or Ipython session and execute the lines of code below:
from cell_cycle_gating import run_cell_cycle_gating as rccg
obj_folder = 'abcdef[123]' # Name of object level folder
# Map user defined channel names to standarized names required by the script
ndict = {'Nuclei Selected - EdUINT': 'edu',
'Nuclei Selected - DNAcontent': 'dna',
'Nuclei Selected - LDRTXT SER Spot 8 px' : 'ldr',
'Nuclei Selected - pH3INT': 'ph3'}
# Run gating script
dfs = rccg.run(obj_folder, ndict)
- The dataframe
dfs
returns well-level summary of number of live/dead cells and fraction of cells in each phase of the cell cycle. - The script saves a pdf showing the gating on each DNA v EDU scatter plot for review. By default the pdf file uses the name of the folder as the file name i.e
summary_abcdef[123].pdf
- The dataframe df is also saved as a .csv file with the same name as the object level folder i.e
summary_abdef[123].csv
Working with data generated from IXM¶
- The standard column names generated in Metaexpress differ from the Operetta. Please update
ndict
as below and add an additional argument torccg.run()
. Further, the entire dataset for a given plate is saved as a single .txt file instead of seperate files per well in a folder. See below:
from cell_cycle_gating import run_cell_cycle_gating as rccg
obj_file = 'filename.txt' # Name of object level file
# Map user defined channel names to standarized names required by the script
ndict = {'Well Name' : 'well',
'Cell: EdUrawINT (DDD-bckgrnd)' : 'edu',
'Cell: LDRrawINT (DDD-bckgrnd)' : 'ldr',
'Cell: DNAcontent (DDD-bckgrnd)' : 'dna'}
# Run gating script
dfs = rccg.run(obj_file, ndict, system='ixm', header=7)
Merging metadata information to output¶
If you have well level metadata that maps each well to sample conditions, then the above code block can be modified as follows:
# Load metadata file
import pandas as pd
dfm = pd.read_csv('metadata.csv')
# Run gating script, this time passing dfm as an additional argument.
dfs = rccg.run(obj, ndict, dfm)
- Note that the metadata file should contain the following header columns:
barcode
,well
,cell_line
,agent
,concentration
.- Fields in the
barcode
column should match the prefix in the folder name. i.eabdcdef
Not really Deep Dye Drop¶
By default, the gating code expects that you have all 4 channels i.e DNA, EdU, LDR, pH3. However, if you do not have LDR and/or pH3 channels, modify the main line of the code as shown below:
dfs = rccg.run(obj, ndict, dfm,
ph3_channel=False, # If no pH3 channel
ldr_channel=False # If no LDR channel
)
Gating corrections using control wells¶
Automated gating does not always work well, in which case you can apply the automated gating from control wells for a given cell line across corresponding treatment wells. In the first line of code below, by setting control_based_gating=True
, automated gating is run only on the control plates. The results are saved in .csv and .pdf files with the prefix control_summary_
. The function also returns a second dataframe dfg
that contains information on the gates in the control wells. In the second line, the script is run a second time with the arguement control_gates=dfg
so that errors in automated gating are corrected based on control gating.
dfs, dfg = rccg.run(obj, ndict, dfm,
control_based_gating=True)
dfs2 = rccg.run(obj, ndict, dfm, control_gates=dfg)
If you want to manually adjust gates across all wells, you can provide a list of fudge factors i.e. by what magnitude and in which direction you want to change the DNA gates. There are 4 gates you can adjust; G1-left, G1-right, G2-left, and G2-right. For instance, if you want to move G2-left (3rd gate) furrther left by a magnitude of 0.05, set fudge_gates=[0, 0, -0.05, 0]
. If you want to move G2-right (4th gate) by a magnitue of 0.2 to the right, set fudge_gates=[0, 0, 0, 0.2]
. In the code example below, we have applied gates from the control but also decided to move the 3rd and 4th gates to the left by 0.05 and 0.1 units in log(DNA) scale.
dfs2 = rccg.run(obj, ndict, dfm, control_gates=dfg,
fudge_gates=[0, 0, -0.05, -0.1])
Additional plotting functions¶
- To plot cell cycle fractions:
import pandas as pd
from cell_cycle_gating import plot_fractions
dfs = pd.read_csv('summary_abcdef[123].csv')
plot_fractions(dfs)
Run cell cycle gating (automated)¶
Manual gating¶
Cell cycle phases (based on DNA and EdU staining)¶
Dead cell filter (based on DNA and LDR staining)¶
pH3 filter¶
-
cell_cycle_gating.ph3_filter.
compute_log_ph3
(ph3, x_ph3=None)[source]¶ Compute log of pH3 intensities
Parameters: - ph3 (1d array) – ph3 intensities across all cells in a well
- x_ph3 (1d array) – uniformly distributed 1d grid based on expected range of pH3 intensities
Returns: log_ph3 – log pf pH3 intensities across all cells in a well
Return type: 1d array
-
cell_cycle_gating.ph3_filter.
evaluate_Mphase
(log_ph3, ph3_cutoff, cell_identity, ax=None)[source]¶ Reassigns membership of each cell based on M phase identified by pH3
Parameters: - log_ph3 (1d array) – log of ph3 intensities across all cells in a well
- ph3_cutoff (1d array) – pH3 gating on kernel density minima
- cell_identity (1d array) – membership of each cell in cell cycle phase (1=G1, 2=S, 3=G2)
- ax (subplot object) – relative positional reference of subplot in master summary plot
Returns: fractions – keys are cell cycle phases (G1, G2, S, M) and values are fractions of cells in each phase
Return type:
-
cell_cycle_gating.ph3_filter.
get_ph3_gates
(ph3, cell_identity, x_ph3=None, ph3_cutoff=None)[source]¶ Gating based on pH3 intensities
Parameters: - ph3 (1d array) – ph3 intensities across all cells in a well
- cell_identitity (1d array) – membership of each cell in cell cycle phase (1=G1, 2=S, 3=G2)
- x_ph3 (1d array) – uniformly distributed 1d grid based on expected range of pH3 intensities
- ph3_cutoff (Optional[numpy float]) – User defined pH3 gating
Returns: - f_ph3 (1d array) – kernel density estimate of pH3 distribution
- ph3_cutoff (numpy float) – pH3 gating on kernel density minima
- ph3_lims (list of floats) – bounds on pH3 intensities used as x_lim for plots
-
cell_cycle_gating.ph3_filter.
plot_summary
(ph3, cell_identity, x_ph3=None, ph3_cutoff=None, well=None)[source]¶ Summary plot of pH3 based gating
Parameters: - ph3 (1d array) – ph3 intensities across all cells in a well
- cell_identity (1d array) – membership of each cell in cell cycle phase (1=G1, 2=S, 3=G2)
- x_ph3 (1d array) – uniformly distributed 1d grid based on expected range of pH3 intensities
- ph3_cutoff (1d array) – (optional) USER defined pH3 gating
- well (str) – name of well on 96/384 well plate
Returns: fractions – keys are cell cycle phases (G1, G2, S, M) and values are fractions of cells in each phase
Return type:
Peak identification¶
-
cell_cycle_gating.findpeaks.
findpeaks
(signal, npeaks=None, thresh=0.25)[source]¶ Returns the amplitude , location and half-prominence width of peaks from the input signal
Parameters: Returns: - peak_amp (array of float) – amplitude of peaks
- peak_loc (array of float) – location of peaks
- width (array of float) – list of widths at half-prominence
-
cell_cycle_gating.findpeaks.
get_prominence_reference_level
(signal, peak, peak_loc)[source]¶ Returns the amplitude and location of the lower reference level of a peaks prominence. Note that prominence is the the length from the reference level upto the peak
Parameters: Returns: - reference_loc (float) – location of X-axis of peak whose prominence is to be computed. Should equal peak_loc
- reference_level (float) – lower reference level of peak prominence.