Integration and Visualization of Pharmacogenomic Data of Human Cancer Cell Lines and Tissues
- Technical advances in biology have led to the generation of an enormous number of multiomics datasets. Integrating these datasets with biological databases is essential for biologists because it allows them to uncover hidden connections between biological entities. However, integration is challenging because the data are diverse and heterogeneous. Each biological database and omics dataset is developed and generated independently to cover a specific biological or omics domain, so their structures (how the data are organized) differ from one another. This heterogeneity has made the integration of omics databases one of the more challenging tasks for omics data scientists.
The Resource Description Framework (RDF) is a de facto standard that enables linking heterogeneous resources by providing a unified mechanism for publishing data in the form of triples. A database containing triples is known as a triple store. The EBI-RDF platform enabled interpretable and integrated access to six independent biological databases by publishing their triple stores using RDF technology. However, querying these triple stores requires in-depth knowledge of their schemas and of the SPARQL query language. To overcome this limitation, the first part of this dissertation presents cMapper, a gene-centric platform that visualizes integrated biological databases in a biologist-friendly fashion. cMapper allows biologists to query six biological databases in an integrated fashion, without technical knowledge of RDF or SPARQL: (1) UniProt, (2) Expression Atlas, (3) REACTOME, (4) ChEMBL, (5) BioModels and (6) BioSamples.
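To make the triple model concrete, the following is a minimal sketch in plain Python, not the EBI-RDF platform or cMapper itself. It stores facts as (subject, predicate, object) tuples and answers a gene-to-pathway question by joining two triple patterns, which is roughly what a SPARQL query does. All identifiers (`gene:G1`, `encodes`, `participatesIn`, and so on) and function names are hypothetical and chosen only for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# Every identifier below is made up for illustration.
TRIPLES = {
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "participatesIn", "pathway:W1"),
    ("compound:C1", "targets", "protein:P1"),
    ("gene:G2", "encodes", "protein:P2"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def pathways_for_gene(triples, gene):
    """Follow gene -> protein -> pathway by joining two triple patterns."""
    proteins = [o for _, _, o in match(triples, s=gene, p="encodes")]
    return sorted(
        o
        for protein in proteins
        for _, _, o in match(triples, s=protein, p="participatesIn")
    )
```

A gene-centric front end can hide this kind of join behind a search box, so the user supplies only the gene name and never writes the pattern query.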
The second part of the dissertation presents IPCT, an extended version of cMapper that integrates pharmacogenomic data with other biological databases. IPCT integrates genomic aberrations of cancer cell lines from CCLE, drug response data from CTRP, genomic aberrations of cancer tissues from cBioPortal, experimental conditions of differentially expressed genes from Expression Atlas, and biological pathways from REACTOME. IPCT allows biologists to search for genomic aberrations of cancer cell lines that are sensitive to a drug of interest. Conversely, they can search for drugs to which cell lines of interest are sensitive. Furthermore, IPCT allows users to compare genomic aberrations in cancer cell lines and tissues by integrating
cMapper and IPCT allow users to apply filters to entities of interest. If users enter more than one gene, small molecule or cell line, they can select options to find the common biological objects connected with their input. Furthermore, both platforms allow users to visualize the resulting graph on screen or download it in PNG or GraphML format. IPCT additionally allows users to download data in CSV and JSON formats for further analysis. In conclusion, the research in this dissertation addresses the problem of data integration in biology and demonstrates how modern computational methods can present integrated biological data in a biologist-friendly way, so that biologists can build hypotheses by identifying potential hidden relationships between biological entities.
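The "common biological objects" option can be read as a set intersection over the objects linked to each input. The sketch below illustrates that reading together with CSV and JSON export using only the Python standard library. It is an assumption about the behaviour, not the actual cMapper or IPCT implementation, and the data, field names and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical adjacency data: each input entity maps to the objects linked to it.
LINKS = {
    "gene:G1": {"pathway:W1", "pathway:W2", "compound:C1"},
    "gene:G2": {"pathway:W2", "compound:C1", "compound:C2"},
}


def common_neighbours(links, inputs):
    """Return the objects connected to every input, in sorted order."""
    sets = [links.get(name, set()) for name in inputs]
    return sorted(set.intersection(*sets)) if sets else []


def to_json(links, inputs):
    """Serialise the inputs and their shared objects as a JSON string."""
    return json.dumps(
        {"inputs": list(inputs), "common": common_neighbours(links, inputs)}
    )


def to_csv(links, inputs):
    """Write one row per (input, shared object) edge as CSV text."""
    shared = common_neighbours(links, inputs)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["input", "connected_object"])
    for name in inputs:
        for obj in shared:
            writer.writerow([name, obj])
    return buffer.getvalue()
```

An input that has no recorded links yields an empty intersection, so the shared-object list is empty whenever any one of the inputs is unknown.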
- 소하입 무함마드