Free Biology Documents

Download free Biology study materials

NMF-mGPU: Non-negative matrix factorization on multi-GPU systems

In the last few years, the Non-negative Matrix Factorization (NMF) technique has attracted great interest in the bioinformatics community, since it can extract interpretable parts from high-dimensional datasets.

12/29/2020 6:53:23 AM +00:00

Family-based association analysis: A fast and efficient method of multivariate association analysis with multiple variants

Many disease phenotypes are outcomes of the complicated interplay between multiple genes, and multiple phenotypes are affected by a single or multiple genotypes. Therefore, joint analysis of multiple phenotypes and multiple markers has been considered an efficient strategy for genome-wide association analysis. In this work we propose an omnibus family-based association test for the joint analysis of multiple genotypes and multiple phenotypes.

12/29/2020 6:53:14 AM +00:00

Identification of indels in next-generation sequencing data

The discovery and mapping of genomic variants is an essential step in most analyses performed using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance.

12/29/2020 6:53:07 AM +00:00

Insertion and deletion correcting DNA barcodes based on watermarks

Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: synthetic DNA tags, called barcodes, are attached to natural DNA fragments during library preparation. Different libraries can be labeled individually with barcodes for a joint sequencing procedure.

12/29/2020 6:52:59 AM +00:00
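To illustrate the demultiplexing idea in the barcode entry above, here is a minimal sketch that assigns an observed tag to the nearest known barcode by edit distance, which tolerates a single insertion, deletion, or substitution. This is a generic nearest-barcode approach, not the watermark-based code the paper proposes. The barcode sequences, library names, and the `max_distance` threshold are hypothetical values chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def assign_barcode(tag: str, barcodes: dict, max_distance: int = 1):
    """Return the library whose barcode is uniquely closest to tag, else None."""
    scored = sorted((edit_distance(tag, seq), name) for name, seq in barcodes.items())
    best_distance, best_name = scored[0]
    if best_distance > max_distance:
        return None  # too many errors to trust any assignment
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # tie between two barcodes: ambiguous
    return best_name


# Hypothetical barcode set, for illustration only.
BARCODES = {"lib_A": "ACGTAC", "lib_B": "TTGCAA"}

print(assign_barcode("ACTAC", BARCODES))   # tag with one deleted base
print(assign_barcode("GGGGGG", BARCODES))  # tag far from every barcode
```

The sketch assumes the tag has already been cut out of the read. With real insertions and deletions the tag boundary itself shifts, which is one reason dedicated indel-correcting codes are studied.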

Mutations and CpG islands among hepatitis B virus genotypes in Europe

Hepatitis B virus (HBV) genotypes have a distinct geographical distribution and influence disease progression and treatment outcomes. The purpose of this study was to investigate the distribution of HBV genotypes in Europe, the impact of mutation of different genotypes on HBV gene abnormalities, the features of CpG islands in each genotype and their potential role in epigenetic regulation.

12/29/2020 6:52:51 AM +00:00

A multiobjective approach to the genetic code adaptability problem

The organization of the canonical code has intrigued researchers since it was first described. If we consider all codes mapping the 64 codons into 20 amino acids and one stop signal, there are more than 1.51 × 10^84 possible genetic codes. The main question related to the organization of the genetic code is why exactly the canonical code was selected among this huge number of possible genetic codes.

12/29/2020 6:52:41 AM +00:00
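The count quoted in the genetic code entry above can be checked with a short calculation. The sketch below assumes that a "possible genetic code" means an assignment of the 64 codons to 21 labels (20 amino acids plus stop) in which every label is used at least once. That interpretation is an assumption made here, not something the abstract states. Under it, the count is the number of surjections from 64 items onto 21 labels, obtained by inclusion-exclusion.

```python
from math import comb


def count_surjections(n_items: int, n_labels: int) -> int:
    """Count assignments of n_items to n_labels that use every label at least once.

    Inclusion-exclusion: sum over k of (-1)^k * C(n_labels, k) * (n_labels - k)^n_items.
    """
    return sum(
        (-1) ** k * comb(n_labels, k) * (n_labels - k) ** n_items
        for k in range(n_labels + 1)
    )


# 64 codons onto 21 labels (20 amino acids + stop), under the assumption above.
n_codes = count_surjections(64, 21)
print(f"{n_codes:.3e}")
```

Python integers are arbitrary precision, so the sum is exact. The result is on the order of 1.5 × 10^84, compared with 21^64 (about 4.2 × 10^84) when labels are allowed to go unused.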

A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium

Allele-specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals.

12/29/2020 6:52:33 AM +00:00
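The binomial test mentioned in the ASE entry above can be sketched as follows. This is a minimal exact two-sided test on the read counts of the two alleles at one heterozygous cSNP, assuming a null hypothesis of balanced expression (each read comes from the reference allele with probability 0.5). The function name and the example counts are hypothetical, and this is not the paper's hidden Markov method or a beta-binomial model.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference-read count under the null proportion p_ref.
    """
    n = ref_reads + alt_reads

    def pmf(k: int) -> float:
        return comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Small relative tolerance so that ties are not lost to floating-point rounding.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 8 reference-allele reads and 2 alternative-allele reads.
print(binomial_two_sided_p(8, 2))
```

A small p-value suggests imbalance between the alleles. Real read counts are often overdispersed, which is why the abstract also mentions beta-binomial tests.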

Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat

Gene expression profiling (GEP) via microarray analysis is a widely used tool for assessing risk and other patient diagnostics in clinical settings. However, non-biological factors such as systematic changes in sample preparation, differences in scanners, and other potential batch effects are often unavoidable in long-term studies and meta-analysis.

12/29/2020 6:52:25 AM +00:00

BACA: Bubble chArt to compare annotations

DAVID is the most popular tool for interpreting the large lists of genes/proteins typically produced in high-throughput experiments. However, the DAVID website becomes difficult to use when analyzing multiple gene lists, because it does not provide an adequate visualization tool to show and compare multiple enrichment results in a concise and informative manner.

12/29/2020 6:52:17 AM +00:00

Sampling with poling-based flux balance analysis: Optimal versus sub-optimal flux space analysis of Actinobacillus succinogenes

Flux balance analysis is traditionally implemented to identify the maximum theoretical flux for some specified reaction and a single distribution of flux values for all the reactions present which achieve this maximum value.

12/29/2020 6:52:09 AM +00:00

Pheno2Geno - High-throughput generation of genetic markers and maps from molecular phenotypes for crosses between inbred strains

Genetic markers and maps are instrumental in quantitative trait locus (QTL) mapping in segregating populations. The resolution of QTL localization depends on the number of informative recombinations in the population and how well they are tagged by markers.

12/29/2020 6:52:02 AM +00:00

Measuring semantic similarities by combining gene ontology annotations and gene co-function networks

Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms.

12/29/2020 6:51:54 AM +00:00

MBBC: An efficient approach for metagenomic binning based on clustering

Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomic units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement.

12/29/2020 6:51:46 AM +00:00

Extraction of relations between genes and diseases from text and large-scale data analysis: Implications for translational research

Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories.

12/29/2020 6:51:38 AM +00:00

PCalign: A method to quantify physicochemical similarity of protein-protein interfaces

Structural comparison of protein-protein interfaces provides valuable insights into the functional relationship between proteins, which may not solely arise from shared evolutionary origin. The few methods that exist for such comparative studies have focused on structural models determined at atomic resolution, and may miss interesting patterns present in large macromolecular complexes that are typically solved by low-resolution techniques.

12/29/2020 6:51:29 AM +00:00

Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data

PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites.

12/29/2020 6:51:22 AM +00:00

An evidence-based approach to identify aging-related genes in Caenorhabditis elegans

Extensive studies have been carried out on Caenorhabditis elegans as a model organism to elucidate mechanisms of aging and the effects of perturbing known aging-related genes on lifespan and behavior.

12/29/2020 6:51:14 AM +00:00

The acquisition of novel N-glycosylation sites in conserved proteins during human evolution

N-linked protein glycosylation plays an important role in various biological processes, including protein folding and trafficking, and cell adhesion and signaling. The acquisition of a novel N-glycosylation site may have a significant effect on protein structure and function, and therefore on the phenotype.

12/29/2020 6:51:05 AM +00:00

A systematic evaluation of high-dimensional, ensemble-based regression for exploring large model spaces in microbiome analyses

Microbiome studies incorporate next-generation sequencing to obtain profiles of microbial communities. Data generated from these experiments are high-dimensional with a rich correlation structure but modest sample sizes.

12/29/2020 6:50:57 AM +00:00

Identifying restrictions in the order of accumulation of mutations during tumor progression: Effects of passengers, evolutionary models, and sampling

Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers.

12/29/2020 6:50:49 AM +00:00

Effective alignment of RNA pseudoknot structures using partition function posterior log-odds scores

RNA pseudoknots play important roles in many biological processes. Previous methods for comparative pseudoknot analysis mainly focus on simultaneous folding and alignment of RNA sequences. Little work has been done to align two known RNA secondary structures with pseudoknots taking into account both sequence and structure information of the two RNAs.

12/29/2020 6:50:41 AM +00:00

Discovery of prognostic biomarkers for predicting lung cancer metastasis using microarray and survival data

Few studies have investigated prognostic biomarkers of distant metastases of lung cancer. One of the central difficulties in identifying biomarkers from microarray data is the availability of only a small number of samples, which results in overtraining.

12/29/2020 6:50:33 AM +00:00

Metabolome searcher: A high throughput tool for metabolite identification and metabolic pathway mapping directly from mass spectrometry and using genome restriction

Mass spectrometric analysis of microbial metabolism provides a long list of possible compounds. Restricting the identification of the possible compounds to those produced by the specific organism would benefit the identification process.

12/29/2020 6:50:25 AM +00:00

Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data

Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions.

12/29/2020 6:50:16 AM +00:00

ViVaMBC: Estimating viral sequence variation in complex populations from Illumina deep-sequencing data using model-based clustering

Deep sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores, which for Illumina sequencing are derived from a quadruplet of intensities, one channel for each nucleotide type.

12/29/2020 6:50:09 AM +00:00

Aber-OWL: A framework for ontology-based data access in biology

Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning.

12/29/2020 6:50:01 AM +00:00

Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets

Two-dimensional differential gel electrophoresis (2D-DIGE) provides a powerful technique to separate proteins by their isoelectric point and apparent molecular mass and to quantify changes in protein expression. Abundant proteins in spots can be identified using mass spectrometry-based approaches.

12/29/2020 6:49:53 AM +00:00

TagDust2: A generic method to extract reads from sequencing data

Arguably the most basic step in the analysis of next-generation sequencing (NGS) data involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial.

12/29/2020 6:49:46 AM +00:00

MeSH ORA framework: R/Bioconductor packages to support MeSH over-representation analysis

In genome-wide studies, over-representation analysis (ORA) against a set of genes is an essential step for biological interpretation. Many gene annotation resources and software platforms for ORA have been proposed.

12/29/2020 6:49:36 AM +00:00

Data-intensive analysis of HIV mutations

In this study, clustering was performed using a bitmap representation of HIV reverse transcriptase and protease sequences, to produce an unsupervised classification of HIV sequences. The classification will aid our understanding of the interactions between mutations and drug resistance.

12/29/2020 6:49:28 AM +00:00