Differential Gene Expression


The main goal of differential expression tools is to identify genes that are differentially expressed between conditions while reducing the effect of unwanted variance (for example, variance from experimental conditions, systematic biases, and batch effects).

limma, edgeR, and DESeq2 are all popular R packages for the analysis of differential gene expression from RNA-seq data. They serve a similar purpose, but they differ in their underlying statistical methods, assumptions, and use cases. Which one to use depends on your research context and the characteristics of your data.

| Package | Advantages | Use Cases and Data Characteristics |
|---------|------------|------------------------------------|
| limma   | Robust with small sample sizes; efficient statistical methodology; works well with limited replicates | Small sample sizes; stable variances |
| edgeR   | Accommodates complex experimental designs; robust dispersion estimation; suitable for both small and large datasets | Moderate to large sample sizes; complex experimental designs; moderate to high biological variability |
| DESeq2  | Rigorous normalization; detects subtle expression changes; suitable for larger-scale studies | Larger number of replicates; subtle changes in expression |

Here’s a high-level overview of how DESeq2 works:

Normalization: RNA-seq data can have systematic biases between samples, such as differences in library size (sequencing depth) and library composition. DESeq2 applies a method called “size factor normalization” (median of ratios) to account for these differences. This step puts the read counts of each sample on a common scale, allowing valid comparisons between samples.
Statistical Modeling: DESeq2 uses a negative binomial distribution to model the variability in the count data. It estimates the dispersion parameter, which describes how the variance of the counts changes with the mean. This modeling approach is well-suited for count data, as it accounts for the inherent overdispersion often observed in RNA-seq data.
Differential Expression Analysis: DESeq2 performs hypothesis testing to identify genes that are differentially expressed between experimental conditions. It calculates a “log2 fold change” for each gene, which represents how much the expression of the gene changes between the conditions. It also computes a p-value to quantify the significance of the observed differences.
Multiple Testing Correction: Since you’re usually testing thousands of genes simultaneously, there’s a risk of observing false positives (genes that appear differentially expressed due to chance). DESeq2 applies multiple testing correction methods, such as the Benjamini-Hochberg procedure, to control the false discovery rate (FDR) and maintain a reasonable balance between true positives and false positives.
Results Interpretation: DESeq2 generates various outputs, including tables of differentially expressed genes, their fold changes, p-values, and adjusted p-values. Researchers typically focus on genes with significant changes (adjusted p-value < 0.05 or another chosen threshold) and substantial fold changes (e.g., at least 2-fold up or down, that is, an absolute log2 fold change greater than 1) for further analysis and interpretation.
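The multiple testing correction and the final filtering step can be sketched in a few lines of Python. This is a generic illustration of the Benjamini-Hochberg procedure, not DESeq2's internal code (DESeq2 also applies steps such as independent filtering that are not shown here). The function name `benjamini_hochberg` and the p-values and fold changes below are invented for this sketch.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene results, invented for illustration only.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.60]
log2_fold_changes = [2.5, -1.8, 0.4, 1.2, 0.1]

padj = benjamini_hochberg(pvalues)

# Keep genes with adjusted p-value < 0.05 and at least a 2-fold change
# in either direction (absolute log2 fold change > 1).
significant = [
    p < 0.05 and abs(lfc) > 1.0
    for p, lfc in zip(padj, log2_fold_changes)
]
```

Note that the fourth gene has a raw p-value below 0.05 but does not pass after adjustment, which is the kind of false positive the correction is meant to guard against.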

Library Correction and Library Composition Normalization

In some cases, a gene is expressed in one sample type (for example, normal) but not in the other. In such cases we have to normalize for library composition.

Suppose we have count data like the table below.

Original Count Values Table:

Gene	Cancer_1	Cancer_2	Normal_1	Normal_2
Gene1	0	15	7	9
Gene2	8	7	20	18
Gene3	5	3	6	5

First, we log2-transform the counts. This helps to detect outliers and library composition effects (a zero count becomes -Inf), so that they can be excluded from the downstream steps.

Log2-Transformed Values Table:

Gene	log_Cancer_1	log_Cancer_2	log_Normal_1	log_Normal_2
Gene1	-Inf	3.906890596	2.807354922	3.169925001
Gene2	3.000000000	2.807354922	4.321928095	4.169925001
Gene3	2.321928095	1.584962501	2.584962501	2.321928095

Gene1 is a case of a library composition effect: its count is zero in Cancer_1, so its log2 value is -Inf. We exclude such genes from the normalization steps that follow, which preserves the cell type effect.
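The transformation and the exclusion of Gene1 can be sketched as follows. This is a minimal illustration using only the Python standard library; the dictionary layout and the helper name `log2_or_neg_inf` are assumptions made for this sketch.

```python
import math

# Example counts; columns are Cancer_1, Cancer_2, Normal_1, Normal_2.
counts = {
    "Gene1": [0, 15, 7, 9],
    "Gene2": [8, 7, 20, 18],
    "Gene3": [5, 3, 6, 5],
}


def log2_or_neg_inf(count):
    # math.log2(0) raises ValueError, so map zero counts to -inf explicitly.
    return math.log2(count) if count > 0 else float("-inf")


log_counts = {
    gene: [log2_or_neg_inf(c) for c in row]
    for gene, row in counts.items()
}

# Genes with a -inf value (a zero count in some sample) are left out
# of the later normalization steps.
kept = {
    gene: row
    for gene, row in log_counts.items()
    if all(math.isfinite(value) for value in row)
}
```

After this step `kept` holds only Gene2 and Gene3, while `log_counts` still records the -inf entry for Gene1.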

Next, we average the log2 values of each gene across the samples. This average is the log2 of the gene's geometric mean count. With Gene1 excluded, the two remaining genes form the library-composition-corrected table used in the remaining steps.

Log2-Transformed Values Table with Mean (log2 of the Geometric Mean):

Gene	log_Cancer_1	log_Cancer_2	log_Normal_1	log_Normal_2	Mean log2
Gene2	3.000000000	2.807354922	4.321928095	4.169925001	3.5748
Gene3	2.321928095	1.584962501	2.584962501	2.321928095	2.2034

We subtract each gene's mean from its log2 values. Subtracting in log space is the same as dividing each count by the gene's geometric mean. We then take the median of each sample's column (with two genes, the median is their average).

Log2 Ratios to the Geometric Mean, with Median Row:

Gene	log_Cancer_1	log_Cancer_2	log_Normal_1	log_Normal_2
Gene2	-0.5748	-0.7674	0.7471	0.5951
Gene3	0.1185	-0.6185	0.3815	0.1185
Median	-0.2282	-0.6930	0.5643	0.3568

We convert each sample's median back from the log2 scale (2 raised to the median) to get its size factor.

Size Factors Table:

Measurement	Cancer_1	Cancer_2	Normal_1	Normal_2
Median (log2)	-0.2282	-0.6930	0.5643	0.3568
Size Factor	0.8537	0.6186	1.4787	1.2806

Divide the original count values by the size factors to get the normalized counts table. Gene1 was excluded only from the size factor estimation; its counts are normalized like those of the other genes.

        Original Count Values Table:

        | Gene  | Cancer_1 | Cancer_2 | Normal_1 | Normal_2 |
        |-------|----------|----------|----------|----------|
        | Gene1 | 0        | 15       | 7        | 9        |
        | Gene2 | 8        | 7        | 20       | 18       |
        | Gene3 | 5        | 3        | 6        | 5        |

        Normalized Counts Table:

        | Gene  | Cancer_1 | Cancer_2 | Normal_1 | Normal_2 |
        |-------|----------|----------|----------|----------|
        | Gene1 | 0.0000   | 24.2490  | 4.7339   | 7.0280   |
        | Gene2 | 9.3707   | 11.3162  | 13.5255  | 14.0561  |
        | Gene3 | 5.8567   | 4.8498   | 4.0576   | 3.9045   |
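The normalization steps above can be sketched in Python as follows. This is a minimal illustration of the median-of-ratios idea on the example counts, not DESeq2's own code; the function name `size_factors` and the dictionary layout are assumptions made for this sketch. It works in log2 space, where dividing a count by the gene's geometric mean becomes a subtraction, and it skips genes with a zero count when estimating the size factors.

```python
import math
from statistics import median

# Example counts; columns are Cancer_1, Cancer_2, Normal_1, Normal_2.
counts = {
    "Gene1": [0, 15, 7, 9],
    "Gene2": [8, 7, 20, 18],
    "Gene3": [5, 3, 6, 5],
}


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios idea."""
    # Use only genes without zero counts; their log2 values are finite.
    log_rows = [
        [math.log2(c) for c in row]
        for row in counts.values()
        if all(c > 0 for c in row)
    ]
    n_samples = len(next(iter(counts.values())))
    factors = []
    for j in range(n_samples):
        # log2(count / geometric mean) = log2(count) - mean of the gene's log2 values
        log_ratios = [row[j] - sum(row) / len(row) for row in log_rows]
        # Median across genes, converted back from the log2 scale.
        factors.append(2 ** median(log_ratios))
    return factors


factors = size_factors(counts)

# Divide every original count (including Gene1) by its sample's size factor.
normalized = {
    gene: [c / f for c, f in zip(row, factors)]
    for gene, row in counts.items()
}
```

Taking the median across genes, rather than the mean, is what keeps a few strongly changed genes from dominating a sample's size factor.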
