Free Biology Documents

Download free Biology study materials

An algorithm to enumerate all possible protein conformations verifying a set of distance constraints

The determination of protein structures satisfying distance constraints is an important problem in structural biology. Whereas the most common method currently employed is simulated annealing, there have been other methods previously proposed in the literature. Most of them, however, are designed to find one solution only.

12/29/2020 6:49:19 AM +00:00

Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach

Familial binding profiles (FBPs) represent the average binding specificity for a group of structurally related DNA-binding proteins. The construction of such profiles allows the classification of novel motifs based on similarity to known families, can help to reduce redundancy in motif databases and de novo prediction algorithms, and can provide valuable insights into the evolution of binding sites.

12/29/2020 6:49:11 AM +00:00

Exploring possible DNA structures in real-time polymerase kinetics using Pacific Biosciences sequencer data

Pausing of DNA polymerase can indicate the presence of a DNA structure that differs from the canonical double-helix. Here we detail a method to investigate how polymerase pausing in the Pacific Biosciences sequencer reads can be related to DNA sequences.

12/29/2020 6:49:03 AM +00:00

Semi-supervised adaptive-height snipping of the hierarchical clustering tree

In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set.

12/29/2020 6:48:55 AM +00:00

PathwayBooster: A tool to support the curation of metabolic pathways

Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task.

12/29/2020 6:48:48 AM +00:00

MARZ: An algorithm to combinatorially analyze gapped n-mer models of transcription factor binding

A key challenge in understanding the molecular mechanisms that control gene regulation is the characterization of the specificity with which transcription factor proteins bind to specific DNA sequences. A number of computational approaches have been developed to examine these interactions, including simple mononucleotide and dinucleotide position weight matrix models.

12/29/2020 6:48:40 AM +00:00

Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

An important problem in computational biology is the automatic detection of protein families (groups of homologous sequences). Clustering sequences into families is at the heart of most comparative studies dealing with protein evolution, structure, and function.

12/29/2020 6:48:33 AM +00:00

ERD: A fast and reliable tool for RNA design including constraints

The function of an RNA in cellular processes is directly related to its structure. The free energy of an RNA structure is another important key to its function, as only some structures with a specific level of free energy can take part in cellular reactions.

12/29/2020 6:48:25 AM +00:00

The application of sparse estimation of covariance matrix to quadratic discriminant analysis

Although Linear Discriminant Analysis (LDA) is commonly used for classification, it may not be directly applied in genomics studies due to the large p, small n problem in these studies. Different versions of sparse LDA have been proposed to address this significant challenge.

12/29/2020 6:48:18 AM +00:00

MDAT: Aligning multiple domain arrangements

Proteins are composed of domains, protein segments that fold independently from the rest of the protein and have a specific function. During evolution the arrangement of domains can change: domains are gained, lost or their order is rearranged. To facilitate the analysis of these changes we propose the use of multiple domain alignments.

12/29/2020 6:48:11 AM +00:00

APP: An Automated Proteomics Pipeline for the analysis of mass spectrometry data based on multiple open access tools

Mass spectrometry analyses of complex protein samples yield large amounts of data, and specific expertise is needed for data analysis, in addition to a dedicated computer infrastructure. Furthermore, the identification of proteins and their specific properties requires the use of multiple independent bioinformatics tools and several database search algorithms to process the same datasets.

12/29/2020 6:48:03 AM +00:00

Identifying tandem Ankyrin repeats in protein structures

Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases.

12/29/2020 6:47:54 AM +00:00

Nonparametric Bayesian clustering to detect bipolar methylated genomic loci

With recent developments in sequencing technology, a large number of genome-wide DNA methylation studies have generated massive amounts of bisulfite sequencing data. The analysis of DNA methylation patterns helps researchers understand epigenetic regulatory mechanisms.

12/29/2020 6:47:47 AM +00:00

Fast inexact mapping using advanced tree exploration on backward search methods

Short sequence mapping methods for Next Generation Sequencing consist of a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows-Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays.

12/29/2020 6:47:39 AM +00:00

Decomposing the space of protein quaternary structures with the interface fragment pair library

The physical interactions between proteins constitute the basis of protein quaternary structures. They dominate many biological processes in living cells. Deciphering the structural features of interacting proteins is essential to understand their cellular functions.

12/29/2020 6:47:31 AM +00:00

Rational selection of experimental readout and intervention sites for reducing uncertainties in computational model predictions

Understanding the dynamics of biological processes can be substantially supported by computational models in the form of nonlinear ordinary differential equations (ODE). Typically, this model class contains many unknown parameters, which are estimated from inadequate and noisy data.

12/29/2020 6:47:24 AM +00:00

Pollux: Platform independent error correction of single and mixed genomes

Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and projects which benefit from error-free reads.

12/29/2020 6:47:17 AM +00:00

Structure and sequence analyses of Bacteroides proteins BVU_4064 and BF1687 reveal presence of two novel predominantly-beta domains, predicted to be involved in lipid and cell surface interactions

The N-terminal domains of the BVU_4064 and BF1687 proteins, from Bacteroides vulgatus and Bacteroides fragilis respectively, are members of the Pfam family PF12985 (DUF3869). Proteins containing a domain from this family can be found in most Bacteroides species and, in large numbers, in all human gut microbiome samples.

12/29/2020 6:47:09 AM +00:00

HyLiTE: Accurate and flexible analysis of gene expression in hybrid and allopolyploid species

Forming a new species through the merger of two or more divergent parent species is increasingly seen as a key phenomenon in the evolution of many biological systems. However, little is known about how expression of parental gene copies (homeologs) responds following genome merger.

12/29/2020 6:47:01 AM +00:00

Amyloid precursor protein interaction network in human testis: Sentinel proteins for male reproduction

Amyloid precursor protein (APP) is widely recognized for playing a central role in Alzheimer's disease pathogenesis. Although APP is expressed in several tissues outside the human central nervous system, the functions of APP and its family members in other tissues are still poorly understood.

12/29/2020 6:46:53 AM +00:00

FogBank: A single cell segmentation across multiple cell lines and image modalities

Many cell lines currently used in medical research, such as cancer cells or stem cells, grow in confluent sheets or colonies. The biology of individual cells provides valuable information, so the separation of touching cells in these microscopy images is critical for counting, identification and measurement of individual cells.

12/29/2020 6:46:45 AM +00:00

Predicting protein functions using incomplete hierarchical labels

Protein function prediction aims to assign biological or biochemical functions to proteins. It is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete.

12/29/2020 6:46:37 AM +00:00

Single-molecule dataset (SMD): A generalized storage format for raw and processed single-molecule data

Here we propose a standardized single-molecule dataset (SMD) file format. SMD is designed to accommodate a wide variety of computer programming languages, single-molecule techniques, and analysis strategies. To facilitate adoption of this format we have made two existing data analysis packages that are used for single-molecule analysis compatible with this format.

12/29/2020 6:46:30 AM +00:00

Quantitative analysis of differences in copy numbers using read depth obtained from PCR-enriched samples and controls

Next-generation sequencing (NGS) is rapidly becoming common practice in clinical diagnostics and cancer research. In addition to the detection of single nucleotide variants (SNVs), information on copy number variants (CNVs) is of great interest.

12/29/2020 6:46:20 AM +00:00

SubPatCNV: Approximate subspace pattern mining for mapping copy-number variations

Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. However, CNVs are often common in only a small number of samples in the studied population or patient cohort.

12/29/2020 6:46:12 AM +00:00

A framework for feature extraction from hospital medical data with applications in risk prediction

Feature engineering is a time consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features to baselines generated from the Elixhauser comorbidities.

12/29/2020 6:46:04 AM +00:00

BitTorious: Global controlled genomics data publication, research and archiving via BitTorrent extensions

This article demonstrates the adaptation of BitTorrent to private collaboration networks in an authenticated, authorized and encrypted manner while retaining the same characteristics of standard BitTorrent.

12/29/2020 6:45:57 AM +00:00

Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine

MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcripts or suppressing translation.

12/29/2020 6:45:49 AM +00:00

YersiniaBase: A genomic resource and analysis platform for comparative analysis of Yersinia

Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using mechanisms distinct from those of the more pathogenic species.

12/29/2020 6:45:41 AM +00:00

Fast and robust group-wise eQTL mapping using sparse graphical models

Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits.

12/29/2020 6:45:33 AM +00:00