EBI Summary page | Google Scholar | PubMed | ORCID: 0000-0001-8681-9110 | Impactstory

Integrative analysis of multiple data sets has the potential of fully leveraging the vast amount of high throughput biological data being generated. In particular such analysis will be powerful in making inference from publicly available collections of genetic, transcriptomic and epigenetic data sets which are designed to study shared biological processes, but which vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets with biological and technological relationships that can be structured into the decomposition. The consistency of the proposed method is established and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets.


BACKGROUND: The cell cycle is a highly conserved, continuous process which controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle with high-resolution from single-cell RNA-seq data. RESULTS: Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the use of transfer learning. We estimate a cell-cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls which are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species, and even sequencing assays. CONCLUSIONS: Tricycle generalizes across datasets and is highly scalable and applicable to atlas-level single-cell RNA-seq data.
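
The projection step described above can be illustrated with a minimal Python sketch. This is not the tricycle package or its API: the function name `cell_cycle_angle`, the two-column `ref_loadings` matrix, and the choice to center each gene within the new dataset are all assumptions made for illustration. The idea is that new cells are projected onto two fixed reference axes and the position in the cycle is read off as the polar angle of each projected cell.

```python
import numpy as np

def cell_cycle_angle(expr, ref_loadings):
    """Project cells into a fixed 2-D reference embedding and return polar angles.

    expr         : genes x cells matrix of (log) expression for the new dataset
    ref_loadings : genes x 2 matrix of reference weights (hypothetical input,
                   assumed to have been learned once on a reference dataset)
    """
    expr = np.asarray(expr, dtype=float)
    ref_loadings = np.asarray(ref_loadings, dtype=float)
    # Center each gene across the cells of the new dataset (an assumption).
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Projection: every cell becomes a point in the 2-D reference space.
    embedding = ref_loadings.T @ centered          # shape: 2 x cells
    # Position in the cycle as an angle, wrapped into [0, 2*pi).
    theta = np.arctan2(embedding[1], embedding[0]) % (2 * np.pi)
    return embedding, theta

# Toy example: two genes whose expression is a cosine and a sine of cell position.
true_position = np.linspace(0, 2 * np.pi, 8, endpoint=False)
toy_expr = np.vstack([np.cos(true_position), np.sin(true_position)])
toy_embedding, toy_theta = cell_cycle_angle(toy_expr, np.eye(2))
```

Because the reference axes are fixed, the angle has the same meaning for every dataset projected this way, which is the property the abstract contrasts with a dataset-dependent embedding.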


A potentially curative hepatic resection is the optimal treatment for hepatocellular carcinoma (HCC), but most patients are not candidates for resection and most resected HCCs eventually recur. Until recently, neoadjuvant systemic therapy for HCC has been limited by a lack of effective systemic agents. Here, in a single arm phase 1b study, we evaluated the feasibility of neoadjuvant cabozantinib and nivolumab in patients with HCC including patients outside of traditional resection criteria (NCT03299946). Of 15 patients enrolled, 12 (80%) underwent successful margin negative resection, and 5/12 (42%) patients had major pathologic responses. In-depth biospecimen profiling demonstrated an enrichment in T effector cells, as well as tertiary lymphoid structures, CD138+ plasma cells, and a distinct spatial arrangement of B cells in responders as compared to non-responders, indicating an orchestrated B-cell contribution to antitumor immunity in HCC.


As the single cell field races to characterize each cell type, state, and behavior, the complexity of the computational analysis approaches the complexity of the biological systems. Single cell and imaging technologies now enable unprecedented measurements of state transitions in biological systems, providing high-throughput data that capture tens-of-thousands of measurements on hundreds-of-thousands of samples. Thus, the definition of cell type and state is evolving to encompass the broad range of biological questions now attainable. To answer these questions requires the development of computational tools for integrated multi-omics analysis. Merged with mathematical models, these algorithms will be able to forecast future states of biological systems, going from statistical inferences of phenotypes to time course predictions of the biological systems with dynamic maps analogous to weather systems. Thus, systems biology for forecasting biological system dynamics from multi-omic data represents the future of cell biology empowering a new generation of technology-driven predictive medicine.


Single-cell technologies are emerging as powerful tools for cancer research. These technologies characterize the molecular state of each cell within a tumor, enabling new exploration of tumor heterogeneity, microenvironment cell-type composition, and cell state transitions that affect therapeutic response, particularly in the context of immunotherapy. Analyzing clinical samples has great promise for precision medicine but is technically challenging. Successfully identifying predictors of response requires well-coordinated, multi-disciplinary teams to ensure adequate sample processing for high-quality data generation and computational analysis for data interpretation. Here, we review current approaches to sample processing and computational analysis regarding their application to translational cancer immunotherapy research.


BACKGROUND: The majority of pancreatic ductal adenocarcinomas (PDAC) are diagnosed at the metastatic stage, and standard therapies have limited activity with a dismal 5-year survival rate of only 8%. The liver and lung are the most common sites of PDAC metastasis, and each has been differentially associated with prognoses and responses to systemic therapies. A deeper understanding of the molecular and cellular landscape within the tumor microenvironment (TME) of metastases at these different sites is critical to informing future therapeutic strategies against metastatic PDAC. RESULTS: By leveraging combined mass cytometry, immunohistochemistry, and RNA sequencing, we identify key regulatory pathways that distinguish the liver and lung TMEs in a preclinical mouse model of metastatic PDAC. We demonstrate that the lung TME generally exhibits higher levels of immune infiltration, immune activation, and pro-immune signaling pathways, whereas multiple immune-suppressive pathways are emphasized in the liver TME. We then perform further validation of these preclinical findings in paired human lung and liver metastatic samples using immunohistochemistry from PDAC rapid autopsy specimens. Finally, in silico validation with transfer learning between our mouse model and TCGA datasets further demonstrates that many of the site-associated features are detectable even in the context of different primary tumors. CONCLUSIONS: Determining the distinctive immune-suppressive features in multiple liver and lung TME datasets provides further insight into the tissue specificity of molecular and cellular pathways, suggesting a potential mechanism underlying the discordant clinical responses that are often observed in metastatic diseases.


Hepatocellular carcinoma (HCC) is the fourth leading cause of cancer death worldwide with a minority of patients being diagnosed early enough for curative-intent interventions. We report the first use of preoperative cabozantinib plus nivolumab to successfully downstage what presented as unresectable HCC as part of an ongoing phase 1b study. Preoperative treatment with cabozantinib and nivolumab led to a >99% reduction in alpha-fetoprotein, a 37.3% radiographic reduction by RECIST 1.1 and a near-complete pathologic response (80% to 100% necrosis). An integrated immunological analysis was performed on the post-treatment surgical tumor sample and matched pre-treatment and post-treatment peripheral blood samples with high-dimensional imaging and cytometry techniques. Bayesian non-negative matrix factorization (CoGAPS, Coordinated Gene Activity in Pattern Sets) and self-organizing map (FlowSOM) algorithms were used to distinguish changes in functional markers across cellular neighborhoods in the single cell data sets. Brisk immunological infiltration into the tumor microenvironment was observed in non-random, organized cellular neighborhoods. Systemically, combination therapy led to marked promotion of effector cytotoxic T cells and effector memory helper T cells. Natural killer cells also increased with therapy. The patient remains without disease recurrence and with a normal alpha-fetoprotein approximately 2 years from presentation. Our study provides proof-of-concept that borderline resectable or locally advanced HCC warrants consideration of downstaging with effective neoadjuvant systemic therapy for subsequent curative resection.


Parallel processing circuits are thought to dramatically expand the network capabilities of the nervous system. Magnocellular and parvocellular oxytocin neurons have been proposed to subserve two parallel streams of social information processing, which allow a single molecule to encode a diverse array of ethologically distinct behaviors. Here we provide the first comprehensive characterization of magnocellular and parvocellular oxytocin neurons in male mice, validated across anatomical, projection target, electrophysiological, and transcriptional criteria. We next use novel multiple feature selection tools in Fmr1-KO mice to provide direct evidence that normal functioning of the parvocellular but not magnocellular oxytocin pathway is required for autism-relevant social reward behavior. Finally, we demonstrate that autism risk genes are enriched in parvocellular compared with magnocellular oxytocin neurons. Taken together, these results provide the first evidence that oxytocin-pathway-specific pathogenic mechanisms account for social impairments across a broad range of autism etiologies.


The development of single-cell RNA sequencing (scRNA-seq) has allowed high-resolution analysis of cell-type diversity and transcriptional networks controlling cell-fate specification. To identify the transcriptional networks governing human retinal development, we performed scRNA-seq analysis on 16 time points from developing retina as well as four early stages of retinal organoid differentiation. We identified evolutionarily conserved patterns of gene expression during retinal progenitor maturation and specification of all seven major retinal cell types. Furthermore, we identified gene-expression differences between developing macula and periphery and between distinct populations of horizontal cells. We also identified species-specific patterns of gene expression during human and mouse retinal development. Finally, we identified an unexpected role for ATOH7 expression in regulation of photoreceptor specification during late retinogenesis. These results provide a roadmap to future studies of human retinal development and may help guide the design of cell-based therapies for treating retinal dystrophies.


Better understanding of the progression of neural stem cells (NSCs) in the developing cerebral cortex is important for modeling neurogenesis and defining the pathogenesis of neuropsychiatric disorders. Here, we use RNA sequencing, cell imaging, and lineage tracing of mouse and human in vitro NSCs and monkey brain sections to model the generation of cortical neuronal fates. We show that conserved signaling mechanisms regulate the acute transition from proliferative NSCs to committed glutamatergic excitatory neurons. As human telencephalic NSCs develop from pluripotency in vitro, they transition through organizer states that spatially pattern the cortex before generating glutamatergic precursor fates. NSCs derived from multiple human pluripotent lines vary in these early patterning states, leading differentially to dorsal or ventral telencephalic fates. This work furthers systematic analyses of the earliest patterning events that generate the major neuronal trajectories of the human telencephalon.


MOTIVATION: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. RESULTS: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. AVAILABILITY AND IMPLEMENTATION: projectR is available on Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
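
As a rough illustration of transfer learning by projection, the following Python sketch relates features learned on a source dataset to a target dataset through a least-squares fit on shared genes. It is not the projectR API (projectR is an R package): the function name `project_onto_features` and its arguments are hypothetical, and ordinary least squares is assumed here as one simple way to carry out the projection for factorization-style features.

```python
import numpy as np

def project_onto_features(target, target_genes, loadings, loading_genes):
    """Estimate feature usage in target samples from source-learned loadings.

    target        : genes x samples matrix for the independent target dataset
    target_genes  : gene names for the rows of `target`
    loadings      : genes x features matrix learned on the source dataset
    loading_genes : gene names for the rows of `loadings`
    Returns (features x samples matrix, list of shared genes used in the fit).
    """
    target = np.asarray(target, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    t_index = {g: i for i, g in enumerate(target_genes)}
    l_index = {g: i for i, g in enumerate(loading_genes)}
    # Only genes measured in both datasets can inform the projection.
    shared = [g for g in loading_genes if g in t_index]
    if not shared:
        raise ValueError("no genes shared between source features and target data")
    X = target[[t_index[g] for g in shared], :]     # shared genes x samples
    A = loadings[[l_index[g] for g in shared], :]   # shared genes x features
    # Solve A @ P ~= X in the least-squares sense for P (features x samples).
    P, *_ = np.linalg.lstsq(A, X, rcond=None)
    return P, shared
```

The projected values can then be compared against sample annotations in the target dataset, which is the validation step the abstract describes.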


Autoimmune uveoretinitis is a significant cause of visual loss, and mouse models offer unique opportunities to study its disease mechanisms. Aire(-/-) mice fail to express self-antigens in the thymus, exhibit reduced central tolerance, and develop a spontaneous, chronic, and progressive uveoretinitis. Using single-cell RNA sequencing (scRNA-seq), we characterized wild-type and Aire(-/-) retinas to define, in a comprehensive and unbiased manner, the cell populations and gene expression patterns associated with disease. Based on scRNA-seq, immunostaining, and in situ hybridization, we infer that 1) the dominant effector response in Aire(-/-) retinas is Th1-driven, 2) a subset of monocytes convert to either a macrophage/microglia state or a dendritic cell state, 3) the development of tertiary lymphoid structures constitutes part of the Aire(-/-) retinal phenotype, 4) all major resident retinal cell types respond to interferon gamma (IFNG) by changing their patterns of gene expression, and 5) Muller glia up-regulate specific genes in response to IFN gamma and may act as antigen-presenting cells.


Tumor heterogeneity provides a complex challenge to cancer treatment and is a critical component of therapeutic response, disease recurrence, and patient survival. Single-cell RNA-sequencing (scRNA-seq) technologies have revealed the prevalence of intratumor and intertumor heterogeneity. Computational techniques are essential to quantify the differences in variation of these profiles between distinct cell types, tumor subtypes, and patients to fully characterize intratumor and intertumor molecular heterogeneity. In this study, we adapted our algorithm for pathway dysregulation, Expression Variation Analysis (EVA), to perform multivariate statistical analyses of differential variation of expression in gene sets for scRNA-seq. EVA has high sensitivity and specificity to detect pathways with true differential heterogeneity in simulated data. EVA was applied to several public domain scRNA-seq tumor datasets to quantify the landscape of tumor heterogeneity in several key applications in cancer genomics such as immunogenicity, metastasis, and cancer subtypes. Immune pathway heterogeneity of hematopoietic cell populations in breast tumors corresponded to the amount of diversity present in the T-cell repertoire of each individual. Cells from head and neck squamous cell carcinoma (HNSCC) primary tumors had significantly more heterogeneity across pathways than cells from metastases, consistent with a model of clonal outgrowth. Moreover, there were dramatic differences in pathway dysregulation across HNSCC basal primary tumors. Within the basal primary tumors, there was increased immune dysregulation in individuals with a high proportion of fibroblasts present in the tumor microenvironment. These results demonstrate the broad utility of EVA to quantify intertumor and intratumor heterogeneity from scRNA-seq data without reliance on low-dimensional visualization. 
SIGNIFICANCE: This study presents a robust statistical algorithm for evaluating gene expression heterogeneity within pathways or gene sets in single-cell RNA-seq data.


Precise temporal control of gene expression in neuronal progenitors is necessary for correct regulation of neurogenesis and cell fate specification. However, the cellular heterogeneity of the developing CNS has posed a major obstacle to identifying the gene regulatory networks that control these processes. To address this, we used single-cell RNA sequencing to profile ten developmental stages encompassing the full course of retinal neurogenesis. This allowed us to comprehensively characterize changes in gene expression that occur during initiation of neurogenesis, changes in developmental competence, and specification and differentiation of each major retinal cell type. We identify the NFI transcription factors (Nfia, Nfib, and Nfix) as selectively expressed in late retinal progenitor cells and show that they control bipolar interneuron and Muller glia cell fate specification and promote proliferative quiescence.


Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remains a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Applying these tools to scRNA-seq data from developing mouse retina reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time-course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine cell-type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity.


Bioinformatics techniques to analyze time course bulk and single cell omics data are advancing. The absence of a known ground truth of the dynamics of molecular changes challenges benchmarking their performance on real data. Realistic simulated time-course datasets are essential to assess the performance of time course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. This package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use this package to simulate a gene expression data set and consequently benchmark analysis methods on this data set with a known ground truth. The package is freely available via Bioconductor.


The mammalian CNS is capable of tolerating chronic hypoxia, but cell type-specific responses to this stress have not been systematically characterized. In the Norrin KO (Ndp(KO)) mouse, a model of familial exudative vitreoretinopathy (FEVR), developmental hypovascularization of the retina produces chronic hypoxia of inner nuclear-layer (INL) neurons and Muller glia. We used single-cell RNA sequencing, untargeted metabolomics, and metabolite labeling from (13)C-glucose to compare WT and Ndp(KO) retinas. In Ndp(KO) retinas, we observe gene expression responses consistent with hypoxia in Muller glia and retinal neurons, and we find a metabolic shift that combines reduced flux through the TCA cycle with increased synthesis of serine, glycine, and glutathione. We also used single-cell RNA sequencing to compare the responses of individual cell types in Ndp(KO) retinas with those in the hypoxic cerebral cortex of mice that were housed for 1 week in a reduced oxygen environment (7.5% oxygen). In the hypoxic cerebral cortex, glial transcriptome responses most closely resemble the response of Muller glia in the Ndp(KO) retina. In both retina and brain, vascular endothelial cells activate a previously dormant tip cell gene expression program, which likely underlies the adaptive neoangiogenic response to chronic hypoxia. These analyses of retina and brain transcriptomes at single-cell resolution reveal both shared and cell type-specific changes in gene expression in response to chronic hypoxia, implying both shared and distinct cell type-specific physiologic responses.


We addressed the precursor role of aging-like spontaneous promoter DNA hypermethylation in initiating tumorigenesis. Using mouse colon-derived organoids, we show that promoter hypermethylation spontaneously arises in cells mimicking the human aging-like phenotype. The silenced genes activate the Wnt pathway, causing a stem-like state and differentiation defects. These changes render aged organoids profoundly more sensitive than young ones to transformation by Braf(V600E), producing the typical human proximal BRAF(V600E)-driven colon adenocarcinomas characterized by extensive, abnormal gene-promoter CpG-island methylation, or the methylator phenotype (CIMP). Conversely, CRISPR-mediated simultaneous inactivation of a panel of the silenced genes markedly sensitizes to Braf(V600E)-induced transformation. Our studies tightly link aging-like epigenetic abnormalities to intestinal cell fate changes and predisposition to oncogene-driven colon tumorigenesis.


Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.
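
To make the idea of matrix factorization concrete, here is a minimal Python sketch of one common variant, non-negative matrix factorization with the standard multiplicative updates for squared error. It is offered only as an illustration under stated assumptions: the review covers many MF methods, and this sketch is not the algorithm of any specific package discussed there. The function name `nmf` and its parameters are chosen for this example.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (features x samples) as W @ H.

    W (features x k) holds feature weights for each of the k patterns;
    H (k x samples) holds the activity of each pattern in each sample.
    Uses multiplicative updates that keep both factors non-negative.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update pattern activities with the feature weights held fixed.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update feature weights with the pattern activities held fixed.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 20 features, 15 samples, generated from 2 hidden patterns.
rng = np.random.default_rng(42)
toy = rng.random((20, 2)) @ rng.random((2, 15))
W_hat, H_hat = nmf(toy, k=2)
relative_error = np.linalg.norm(toy - W_hat @ H_hat) / np.linalg.norm(toy)
```

The rows of `H_hat` play the role of the low-dimensional structure mentioned above, and the columns of `W_hat` indicate which features drive each pattern. Results depend on the random start and on the chosen `k`, which is one of the limitations such analyses need to account for.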


BACKGROUND: Targeted therapies specifically act by blocking the activity of proteins that are encoded by genes critical for tumorigenesis. However, most cancers acquire resistance and long-term disease remission is rarely observed. Understanding the time course of molecular changes responsible for the development of acquired resistance could enable optimization of patients' treatment options. Clinically, acquired therapeutic resistance can only be studied at a single time point in resistant tumors. METHODS: To determine the dynamics of these molecular changes, we obtained high throughput omics data (RNA-sequencing and DNA methylation) weekly during the development of cetuximab resistance in a head and neck cancer in vitro model. The CoGAPS unsupervised algorithm was used to determine the dynamics of the molecular changes associated with resistance during the time course of resistance development. RESULTS: CoGAPS was used to quantify the evolving transcriptional and epigenetic changes. Applying a PatternMarker statistic to the results from CoGAPS enabled novel heatmap-based visualization of the dynamics in these time course omics data. We demonstrate that transcriptional changes result from immediate therapeutic response or resistance, whereas epigenetic alterations only occur with resistance. Integrated analysis demonstrates delayed onset of changes in DNA methylation relative to transcription, suggesting that resistance is stabilized epigenetically. CONCLUSIONS: Genes with epigenetic alterations associated with resistance that have concordant expression changes are hypothesized to stabilize the resistant phenotype. These genes include FGFR1, which was associated with EGFR inhibitors resistance previously. Thus, integrated omics analysis distinguishes the timing of molecular drivers of resistance. 
This understanding of the time course progression of molecular changes in acquired resistance is important for the development of alternative treatment strategies that would introduce appropriate selection of new drugs to treat cancer before the resistant phenotype develops.


Motivation: Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches. Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data. Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg. Supplementary information: Supplementary data are available at Bioinformatics online.


Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs, and posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for the malignant phenotype maintenance. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies.


SUMMARY: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse, MCMC algorithm, CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. AVAILABILITY AND IMPLEMENTATION: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
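
The abstract does not spell out how the statistic is computed, so the following Python sketch should be read as one plausible formulation rather than the CoGAPS implementation. It assumes that each gene's weights across patterns are rescaled by the gene's maximum weight, and that a gene is scored for a pattern by its Euclidean distance to the ideal profile of a gene used by that pattern alone. The function names `pattern_marker_scores` and `assign_markers` are hypothetical.

```python
import numpy as np

def pattern_marker_scores(A):
    """Distance of each gene's scaled weight profile to each single-pattern ideal.

    A : genes x patterns matrix of non-negative gene weights from an NMF.
    Returns a genes x patterns matrix; smaller values mean the gene is closer
    to being used by that pattern only (assumed scoring rule, see lead-in).
    """
    A = np.asarray(A, dtype=float)
    row_max = A.max(axis=1, keepdims=True)
    # Rescale every gene so its largest weight is 1; all-zero genes stay zero.
    scaled = np.divide(A, row_max, out=np.zeros_like(A), where=row_max > 0)
    k = A.shape[1]
    # Ideal profiles are the unit vectors: weight 1 in one pattern, 0 elsewhere.
    diffs = scaled[:, None, :] - np.eye(k)[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))

def assign_markers(A):
    """Assign each gene to its closest pattern and return the distance as a rank score."""
    dist = pattern_marker_scores(A)
    return dist.argmin(axis=1), dist.min(axis=1)

# Toy weights: gene 0 is specific to pattern 0, gene 1 to pattern 1, gene 2 is shared.
weights = np.array([[5.0, 0.0], [0.0, 3.0], [2.0, 2.0]])
closest_pattern, score = assign_markers(weights)
```

Under this scoring, genes with a small distance to one pattern are candidate markers for it, while a gene shared evenly across patterns scores poorly for all of them regardless of how large its raw weights are.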


Patients with oncogene driven tumors are treated with targeted therapeutics including EGFR inhibitors. Genomic data from The Cancer Genome Atlas (TCGA) demonstrates molecular alterations to EGFR, MAPK, and PI3K pathways in previously untreated tumors. Therefore, this study uses bioinformatics algorithms to delineate interactions resulting from EGFR inhibitor use in cancer cells with these genetic alterations. We modify the HaCaT keratinocyte cell line model to simulate cancer cells with constitutive activation of EGFR, HRAS, and PI3K in a controlled genetic background. We then measure gene expression after treating modified HaCaT cells with gefitinib, afatinib, and cetuximab. The CoGAPS algorithm distinguishes a gene expression signature associated with the anticipated silencing of the EGFR network. It also infers a feedback signature with EGFR gene expression itself increasing in cells that are responsive to EGFR inhibitors. This feedback signature has increased expression of several growth factor receptors regulated by the AP-2 family of transcription factors. The gene expression signatures for AP-2alpha are further correlated with sensitivity to cetuximab treatment in HNSCC cell lines and changes in EGFR expression in HNSCC tumors with low CDKN2A gene expression. In addition, the AP-2alpha gene expression signatures are also associated with inhibition of MEK, PI3K, and mTOR pathways in the Library of Integrated Network-Based Cellular Signatures (LINCS) data. These results suggest that AP-2 transcription factors are activated as feedback from EGFR network inhibition and may mediate EGFR inhibitor resistance.


Endogenous neural stem cells (NSCs) in the adult hippocampus are considered to be bi-potent, as they only produce neurons and astrocytes in vivo. In mouse, we found that inactivation of neurofibromin 1 (Nf1), a gene mutated in neurofibromatosis type 1, unlocked a latent oligodendrocyte lineage potential to produce all three lineages from NSCs in vivo. Our results suggest an avenue for promoting stem cell plasticity by targeting barriers of latent lineage potential.


Patterns in time-course gene expression data can represent the biological processes that are active over the measured time period. However, the orthogonality constraint in standard pattern-finding algorithms, including notably principal components analysis (PCA), confounds expression changes resulting from simultaneous, non-orthogonal biological processes. Previously, we have shown that Markov chain Monte Carlo nonnegative matrix factorization algorithms are particularly adept at distinguishing such concurrent patterns. One such matrix factorization is implemented in the software package CoGAPS. We describe the application of this software and several technical considerations for identification of age-related patterns in a public, prefrontal cortex gene expression dataset.
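
The orthogonality issue can be seen in a small synthetic example. The Python sketch below is an illustration only, using made-up data rather than the prefrontal cortex dataset: two overlapping temporal processes are mixed into simulated genes, and PCA (computed here via the singular value decomposition) returns components that are orthogonal by construction even though the generating processes are not.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two simulated processes that overlap in time (synthetic, for illustration).
t = np.linspace(0.0, 1.0, 50)
p1 = np.exp(-((t - 0.4) / 0.2) ** 2)
p2 = np.exp(-((t - 0.6) / 0.2) ** 2)

# 30 simulated genes, each a non-negative mixture of the two processes.
rng = np.random.default_rng(0)
gene_weights = rng.random((30, 2))
data = gene_weights @ np.vstack([p1, p2])       # genes x time points

# PCA via SVD of the gene-centered data; rows of Vt are the temporal components.
centered = data - data.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]

overlap_true = cosine(p1, p2)    # clearly positive: the processes are concurrent
overlap_pca = cosine(pc1, pc2)   # numerically zero: PCA forces orthogonality
```

Since no pair of orthogonal components can reproduce two positively correlated time courses, each principal component here is necessarily a blend of both processes, which is the confounding the abstract refers to.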