Processing data from real-time PCR analysis

Multivariate and multiway analysis of quantitative real-time PCR (qPCR) data using the R statistical programming language

 

Our analytical capabilities build on our staff's many years of experience in producing and processing data from quantitative and qualitative real-time PCR. They are complemented by the tools of the R statistical language, by further tools we mastered in renowned hands-on courses on qPCR data processing, and by high-quality computing equipment that lets us process even larger datasets from long-term clinical studies.

 

  • Data import, initial quality control, basic analyses and some advanced analyses are performed with the software packages "qpcR", "HTqPCR", "ddCt" and "qpcrNorm":

The basic methods in the qpcR package focus primarily on selecting and evaluating a suitable model for fitting qPCR amplification curves before Ct values and other basic parameters are calculated. The package also contains basic functions and methods for evaluating differential gene expression.

Fit sigmoidal (three-, four-, five- and six-parameter) models to the raw fluorescence data and display the curves with various options.

Calculate essential PCR parameters (efficiency, threshold cycles, initial template fluorescence F0) from the sigmoidal fits and display comprehensive graphics.
Conduct a model selection process in which the best sigmoidal model is chosen by nested F-tests on the residual variance or other criteria such as Akaike weights.
Derive values from more classical quantitation methods, such as the ‘window-of-linearity’ method, exponential fitting of the identified exponential region or a calibration curve from diluted samples.
In calibration curve analysis, find the threshold fluorescence value that maximizes the linearity of the threshold cycles along the dilution curve.
Further optimize the fitting process by eliminating cycles in the ground and plateau phase, using all possible combinations.
Calculate many measures for the goodness-of-fit, such as the residual variance, R-squared, adjusted R-squared, Akaike Information Criterion (AIC), corrected AIC (AICc), Bayesian Information Criterion (BIC), root-mean-squared-error (RMSE) and Allen's PRESS statistic. 
Make goodness-of-fit tests such as 'lack-of-fit' or Neill's test for nonreplicates.
Do a batch analysis of many runs with all methods (this often reveals dramatic differences in the estimated parameters!).
Predict either fluorescence or cycle values from data.
Calculate the goodness-of-fit (by means of RMSE) of all different sigmoidal models within the exponential region of the qPCR curve.
Conduct Gaussian error propagation with Monte Carlo simulation using multivariate normal distributions if a covariance matrix is given.
Calculate ratios and their propagated errors for qPCR runs, using single or replicated data. If reference PCRs are supplied, the ratios are normalized against these.
Calculate ratios with a permutation approach such as in the popular REST software.
Build an averaged model from several housekeeping PCRs.
Calculate model selection measures such as Likelihood Ratios (nested) or Akaike weights (non-nested).
Calculate the Cy0 value as described in Guescini et al. and do a maxRatio analysis as in Shain et al.
Bootstrap qPCR data and obtain confidence intervals for all estimated parameters, including those from efficiency and threshold cycle analysis.
Simulate qPCR curves starting from a fitted curve and including defined homo/heteroscedastic noise.
Do automatic plotting of large-scale batch PCRs by using 3D-plots or plot matrices.
Identify deviating qPCR runs within a group of replicates by Kinetic Outlier Detection and non-replicated runs by Sigmoidal Outlier Detection.
Conduct batch ratio analysis from 96- or 384-well plates that contain different numbers of control/treatment samples or genes of interest/reference genes with automatic sample recognition from the column headers.
Do a complete melting curve analysis of qPCR runs, including graphical display of melt curves and automatic Tm identification of the products.
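The idea behind the sigmoidal fitting listed above can be sketched in a few lines. The example below is plain Python, not the qpcR package itself, and it assumes a four-parameter log-logistic curve whose parameters (b, c, d, e) have already been estimated; the parameter values and the function names are hypothetical. It only shows how a fitted curve can be evaluated and inverted to obtain a fractional threshold cycle.

```python
import math

def l4(cycle, b, c, d, e):
    """Four-parameter log-logistic curve: fluorescence at a given cycle.

    c = lower asymptote (ground phase), d = upper asymptote (plateau),
    e = cycle at which fluorescence is halfway between c and d,
    b = slope parameter (negative for a rising amplification curve).
    """
    return c + (d - c) / (1.0 + math.exp(b * (math.log(cycle) - math.log(e))))

def threshold_cycle(threshold, b, c, d, e):
    """Invert the curve: fractional cycle at which fluorescence equals threshold."""
    if not c < threshold < d:
        raise ValueError("threshold must lie between the two asymptotes")
    return e * ((d - c) / (threshold - c) - 1.0) ** (1.0 / b)

# Hypothetical parameters, as if they came from a nonlinear fit.
params = dict(b=-12.0, c=0.1, d=10.0, e=20.0)
ct = threshold_cycle(1.0, **params)
print(f"Ct at threshold 1.0: {ct:.2f}")
```

Because the threshold cycle comes from the fitted curve rather than from raw readings, it is not restricted to whole cycle numbers.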
 
 
 
  • List of the most important functions in all of the packages mentioned above:
Package ‘qpcR’:
Calculation of qPCR efficiency by dilution curve analysis
Calculation of qPCR efficiency by dilution curve bootstrapping
Building a model which averages a batch of qPCR curves
Cy0 alternative to threshold cycles as in Guescini et al. (2008)
The amplification efficiency curve of a fitted object
Calculation of qPCR efficiency and other important qPCR parameters
Calculation of PCR efficiency by fitting an exponential model
Calculation of qPCR efficiency by the ’linear regression of efficiency’ method
The chi-square goodness-of-fit
The maxRatio method as in Shain et al. (2008)
Amalgamation of single data models into an averaged model
Melting curve analysis with (iterative) Tm identification and peak area calculation/cutoff
Calculation of the ’midpoint’ region
Sigmoidal model selection by different criteria
Neill’s lack-of-fit test when replicates are lacking
Parameters that can be changed to tweak the kinetic outlier methods
Batch calculation of qPCR efficiency and other qPCR parameters
Bootstrapping and jackknifing qPCR data
Summarize measures for the goodness-of-fit
Advanced qPCR data import function
Combinatorial elimination of plateau and ground phase cycles
Elimination of qPCR cycles with low/high impact on fitted parameters
Simulation of sigmoidal qPCR data with goodness-of-fit analysis
Plotting qPCR data with fitted curves/confidence bands/error bars
Allen’s PRESS (Prediction Sum-Of-Squares) statistic, aka P-square
General error analysis function using different methods (Monte-Carlo simulation, Permutation approach, Error propagation)
The nonlinear/mechanistic models
Calculation of ratios in a batch format for multiple genes/samples
Calculation of ratios from qPCR runs with/without reference genes
Calculation of ratios in a batch format from external PCR parameters
Averaging of multiple reference genes
Amalgamation of single data models into a model containing replicates
An (overlayed) residuals barplot
Residual variance of a fitted model
Root-mean-squared-error of a fitted model
R-square value of a fitted model
Adjusted R-square value of a fitted model
Residual sum-of-squares of a fitted model
Calculation of qPCR efficiency by the ’window-of-linearity’ method
Calculation of the qPCR takeoff point
Updating and refitting a qPCR model
Batch calculation of qPCR fit parameters/efficiencies/threshold cycles with simple output, especially tailored to high-throughput data
Hannan-Quinn Information Criterion
Outlier summary
(K)inetic (O)utlier (D)etection using several methods
Calculation of likelihood ratios for nested models
Formal lack-Of-Fit test of a nonlinear model against a one-way ANOVA model
Evidence ratio for model comparisons with AIC, AICc or BIC
Comparison of all sigmoidal models within the exponential region
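Several entries in this list concern model selection by information criteria. As a minimal sketch in plain Python (not the qpcR functions), Akaike weights can be derived from a set of AIC values as shown below; the model labels and AIC values are made up for illustration.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model compared with the best one.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

# Hypothetical AIC values for three candidate sigmoidal models.
aics = {"model_4par": 100.0, "model_5par": 102.0, "model_6par": 110.0}
weights = dict(zip(aics, akaike_weights(list(aics.values()))))
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

The model with the lowest AIC receives the largest weight, and the ratio of two weights corresponds to the evidence ratio between the two models.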
 
 
Package ‘qpcrNorm’:
Calculates the Average Gene-Specific Coefficient of Variation
Function for Housekeeping Gene Normalization of qPCR Data
Function for Quantile Normalization of qPCR Data.
Function for Rank-Invariant Set Normalization for qPCR Data.
Constructs scatter plot to compare the effects of two normalization algorithms on a qPCR dataset.
Data Input Function for a Batch of qPCR Experiments.
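To illustrate what quantile normalization does to Ct values, here is a simplified sketch in plain Python. It is not the qpcrNorm implementation: it assumes every sample has the same genes in the same order, no missing values, and it ignores ties. The sample names and Ct values are hypothetical.

```python
def quantile_normalize(ct_by_sample):
    """Give every sample the same Ct distribution (the mean of the sorted columns).

    ct_by_sample: dict mapping sample name -> list of Ct values,
    with identical gene order and length in every sample.
    """
    samples = list(ct_by_sample)
    n = len(ct_by_sample[samples[0]])
    if any(len(ct_by_sample[s]) != n for s in samples):
        raise ValueError("all samples must contain the same number of genes")

    # Mean Ct at each rank across samples.
    sorted_cols = [sorted(ct_by_sample[s]) for s in samples]
    rank_means = [sum(col[r] for col in sorted_cols) / len(samples) for r in range(n)]

    normalized = {}
    for s in samples:
        values = ct_by_sample[s]
        order = sorted(range(n), key=lambda i: values[i])  # gene indices by rank
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized[s] = out
    return normalized

raw = {"s1": [25.0, 20.0, 30.0], "s2": [22.0, 32.0, 24.0]}
norm = quantile_normalize(raw)
print(norm)
```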
 
Package ‘HTqPCR’:
Combine qPCR set objects
Clustering of qPCR Ct values
Filter Ct values based on their feature categories.
Filter out features (genes) from qPCR data
Heatmap of deltadeltaCt values from qPCR data.
Differentially expressed features with qPCR: limma
Differentially expressed features with qPCR: Mann-Whitney
Normalization of Ct values from qPCR data.
Image plot of qPCR Ct values from an array format
Boxplots for qPCR Ct values.
Image plot of qPCR Ct values from a card format
Summarising the feature categories for Ct values.
Correlation between Ct values from qPCR data
Distribution plot for qPCR Ct values.
Heatmap of qPCR Ct values
Histogram of Ct values from qPCR experiments
Plotting Ct values from qPCR across multiple samples.
Overview plot of qPCR Ct values across multiple conditions
Pairwise scatterplot of multiple sets of Ct values from qPCR data
PCA for qPCR Ct values
Scatter plot of features analysed twice during each qPCR experiment.
Plot the relative quantification of Ct values from qPCR experiments.
Scatterplot of two sets of Ct values from qPCR data.
Barplot with Ct values between genes from qPCR.
Plot variation in Ct values across replicates
Boxplots of CV for qPCR Ct values
Example processed qPCR data
Example raw qPCR data.
Reading Ct values from qPCR experiments data into a qPCRset
Assign categories to Ct values from qPCR data.
Differentially expressed features with qPCR: t-test
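Assigning categories to Ct values, as mentioned in this list, can be pictured with the following plain-Python sketch. The category labels and the cut-offs of 10 and 35 cycles are assumptions chosen for illustration, and the function is hypothetical rather than part of HTqPCR.

```python
from collections import Counter

def categorize_ct(ct, ct_min=10.0, ct_max=35.0):
    """Label a single Ct value; ct_min and ct_max are illustrative cut-offs."""
    if ct is None or ct != ct:      # missing value or NaN
        return "Undetermined"
    if ct > ct_max:                 # amplification too late to trust
        return "Undetermined"
    if ct < ct_min:                 # suspiciously early signal
        return "Unreliable"
    return "OK"

# Hypothetical Ct values from one sample.
cts = [25.3, 40.0, 5.1, None, 31.8]
categories = [categorize_ct(ct) for ct in cts]
summary = Counter(categories)
print(summary)
```

Summarising the categories per sample in this way gives a quick quality overview before any values are filtered out.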
 
Package ‘ddCt’:
Barplot with error bars.
Absolute quantification for Taqman data
ddCt Expression
Apply the ddCt algorithm for a given data set
Draw barchart of relative expression level with error-bars
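The calculation at the core of the ddCt approach is short enough to show directly. The sketch below is plain Python, not the ddCt package; it assumes perfect doubling per cycle (amplification factor 2), and the Ct values are invented.

```python
def ddct_ratio(ct_target_sample, ct_ref_sample, ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) calculation, assuming 100 % efficiency."""
    dct_sample = ct_target_sample - ct_ref_sample              # normalize to reference gene
    dct_calibrator = ct_target_calibrator - ct_ref_calibrator
    ddct = dct_sample - dct_calibrator                         # compare with calibrator sample
    return 2.0 ** (-ddct)

# Hypothetical Ct values: the target gene appears 3 cycles earlier in the treated sample.
ratio = ddct_ratio(ct_target_sample=22.0, ct_ref_sample=18.0,
                   ct_target_calibrator=25.0, ct_ref_calibrator=18.0)
print(ratio)
```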
 
 
 
  • Systematic overview of related analyses:

• Determining the minimum number of samples needed to achieve sufficient statistical power

Power analysis
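As a rough illustration of such a power calculation, the plain-Python sketch below uses the normal approximation for comparing two group means. The function name and the input values are hypothetical. A calculation based on the t-distribution gives slightly larger numbers for small groups.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison.

    delta: smallest difference in mean (delta) Ct worth detecting
    sd:    expected standard deviation within each group
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    n = 2.0 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# Detecting a one-cycle difference (a two-fold change at 100 % efficiency).
print(samples_per_group(delta=1.0, sd=1.0))
print(samples_per_group(delta=0.5, sd=1.0))
```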

• Data preprocessing
 
Interplate calibration
PCR efficiency correction
Normalize to sample amount
Normalize to reference genes/samples
Normalize to spike
Logarithmic transformation to fold differences
Missing data handling and primer dimer correction (in progress)
Relative quantities and fold changes
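PCR efficiency correction and normalization to a reference gene can be combined in a single efficiency-corrected ratio. The plain-Python sketch below shows one common form of that calculation. Efficiencies are expressed as amplification factors (2.0 means perfect doubling), and all names and numbers are hypothetical.

```python
def efficiency_corrected_ratio(e_target, e_ref,
                               ct_target_control, ct_target_sample,
                               ct_ref_control, ct_ref_sample):
    """Relative expression of the target gene, corrected for per-gene efficiency."""
    target_change = e_target ** (ct_target_control - ct_target_sample)
    ref_change = e_ref ** (ct_ref_control - ct_ref_sample)
    return target_change / ref_change

# With perfect efficiency this reduces to the familiar 2^(-ddCt) result.
ideal = efficiency_corrected_ratio(2.0, 2.0, 25.0, 22.0, 18.0, 18.0)
# A slightly less efficient target assay gives a smaller fold change.
corrected = efficiency_corrected_ratio(1.9, 2.0, 25.0, 22.0, 18.0, 18.0)
print(ideal, corrected)
```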
 
• Finding the optimal reference gene
 
geNorm
NormFinder
Plots
Scatterplots
Line plots
Bar plots
Box and whiskers plot
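The stability measure used when ranking reference genes in the geNorm style can be sketched as follows. This is a simplified plain-Python illustration, not the geNorm software: it works directly on Ct values, assumes 100 % efficiency for all assays, and uses invented data.

```python
from statistics import mean, stdev

def stability_m(ct_by_gene):
    """Average pairwise variation of each candidate gene against all others.

    ct_by_gene: dict mapping gene name -> list of Ct values, same sample order.
    A lower value means the gene is more stably expressed relative to the others.
    """
    genes = list(ct_by_gene)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            # Ct difference per sample is proportional to the log2 expression ratio.
            diffs = [a - b for a, b in zip(ct_by_gene[gene], ct_by_gene[other])]
            pairwise_sd.append(stdev(diffs))
        m_values[gene] = mean(pairwise_sd)
    return m_values

# Hypothetical Ct values for three candidate reference genes in three samples.
cts = {"geneA": [20.0, 21.0, 22.0], "geneB": [22.0, 23.0, 24.0], "geneC": [25.0, 25.0, 28.0]}
m = stability_m(cts)
print(m)
```

In this toy data geneA and geneB move together across samples and therefore score as more stable than geneC.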
 
• Principal component analysis
 
PCA
P-curve
 
• Cluster analysis
 
Hierarchical clustering/agglomerative hierarchical clustering (nesting)/dendrogram
Heatmap analysis
 
• Networks
 
Self-organizing map (SOM)
Artificial neural networks (ANN)
Support vector machine (SVM)
 
• Regression analysis
 
Standard curve
Reverse calibration
Limit of detection (LOD)
Partial least squares (PLS)
Passing-Bablok regression
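For the standard curve and reverse calibration items, a minimal plain-Python sketch is given below. It fits an ordinary least-squares line to Ct versus log10 of the starting quantity, derives the amplification efficiency from the slope, and predicts an unknown quantity from its Ct. The dilution series and Ct values are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def efficiency_from_slope(slope):
    """Efficiency as a fraction: 1.0 corresponds to perfect doubling per cycle."""
    return 10.0 ** (-1.0 / slope) - 1.0

def quantity_from_ct(ct, slope, intercept):
    """Reverse calibration: starting quantity predicted from a measured Ct."""
    return 10.0 ** ((ct - intercept) / slope)

# Hypothetical ten-fold dilution series and measured Ct values.
quantities = [1e5, 1e4, 1e3, 1e2, 1e1]
cts = [15.0, 18.32, 21.64, 24.97, 28.29]

slope, intercept = fit_line([math.log10(q) for q in quantities], cts)
efficiency = efficiency_from_slope(slope)
unknown = quantity_from_ct(21.64, slope, intercept)
print(f"slope={slope:.3f}, efficiency={efficiency:.3f}, quantity={unknown:.0f}")
```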
 
Non-linear regression models for bioassay data:
Three-parameter Gompertz
Four-parameter Gompertz
Three-parameter logistic
Four-parameter logistic (default)
Five-parameter logistic
Brain-Cousens three-parameter logistic
Brain-Cousens four-parameter logistic
 
Fitting: 
Single dose response curve
Simultaneous fitting
Fitting user-defined function
 
• Relative quantification and analysis of significance
 
Expression ratio
Permutation-based error propagation 
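A permutation approach to significance can be illustrated with an exact test on normalized Ct values. The plain-Python sketch below enumerates every possible reassignment of samples to the two groups and counts how often the difference in means is at least as large as the observed one. It is a simplified stand-in, not the algorithm of any particular program, and the data are invented.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), size_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so the observed split itself is always counted.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical delta-Ct values (target minus reference) for two groups of four samples.
control = [5.0, 5.2, 4.9, 5.1]
treated = [3.0, 3.1, 2.9, 3.2]
p = permutation_p_value(control, treated)
print(f"p = {p:.4f}")
```

With four samples per group there are only 70 possible splits, so the smallest attainable two-sided p-value is 2/70. For larger groups, random resampling replaces full enumeration.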
 
• Three-way analysis
 
Trilinear decomposition
 
• Statistics
 
Descriptive statistics
Parametric t-test
Non-parametric tests
One-way ANOVA
Two-way ANOVA
Nested ANOVA
 
• Multivariate analysis
 
• Correlation
 
Spearman rank correlation coefficient
Pearson correlation coefficient
 
• Experimental design
 
Sample size
Experimental design optimization

 

• Expression profiling / Identification of candidate genes or gene signatures / Survival-related gene selection

Supervised and unsupervised methods
Pre-selection of survival-related clusters and genes by clustering
 
Methods for selection of strongest classifiers/predictors:
Receiver operating characteristic (ROC)
Cox model fitting (Cox proportional-hazards regression for survival data)
Concordance index
Random (survival) forests
Support vector machine algorithms
Multivariate variable selection and maximum likelihood (MLHD)
Recurrence Score
Two-Gene Ratio
Literature mining (Literature Gene Selection)
Combinatorial gene selection methods
Risk score algorithm with Naive Bayes
Log-rank test for the disease-free survival (DFS) curve
Stepwise discriminant analysis
Leave-one-out cross-validation
R packages GALGO, GeneNet, klaR, randomForest...
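Of the measures listed above, the concordance index is easy to state in code. The plain-Python sketch below computes it for right-censored survival data in the usual pairwise way, counting tied risk scores as half concordant. The function name and the data are hypothetical, and ties in survival time are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ordered correctly by the risk score.

    times:       follow-up time per patient
    events:      1 if the event (e.g. relapse) was observed, 0 if censored
    risk_scores: higher score = predicted earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the patient with the shorter time had an event.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable

# Hypothetical data: three patients, the last one censored.
times = [5, 8, 12]
events = [1, 1, 0]
perfect = concordance_index(times, events, [0.9, 0.5, 0.1])
partial = concordance_index(times, events, [0.5, 0.9, 0.1])
print(perfect, partial)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which makes the index a convenient way to compare candidate gene signatures.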