DATA MINING BIG
DMB (Data Mining Big) is a collection of data analysis tools.
DMB contains a collection of software tools that extract knowledge from data. The methods are based on several models and algorithms developed by a team of researchers, most of whom are members of the computational and systems biology research group. The methods are described in the papers listed on the publications page, and the code can be executed remotely from this website on the servers of IASI-CNR, an institute of the Italian National Research Council (CNR). Results are sent by email or displayed in the web interface.
LOGIC RECOGNITION OF ALZHEIMER GENETIC FACTORS BASED ON MICROARRAY ANALYSIS
EBRI (European Brain Research Institute) is a private non-profit institute whose objective is to investigate fundamental questions about the functional organization of the brain and to translate basic brain science into possible cures for diseases of the nervous system. IASI is actively collaborating with the Neurotrophic Factors and Neurodegenerative Diseases Laboratory of EBRI, using the logic mining system DMB (developed at IASI) to characterize the gene expression profiles of AD11 mice in different brain areas as the disease progresses over time, with the aim of explaining the onset of Alzheimer's disease and identifying early biomarkers of the pathology.
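The actual DMB algorithms are described in the papers on the publications page and are not reproduced here. As a rough illustration of the general idea behind logic mining of expression data, the following minimal sketch assumes that expression values are binarized with a fixed per-gene threshold and then searches for the shortest conjunction of gene conditions that is true for every AD11 sample and false for every control sample. The gene names, thresholds and values are invented for illustration, and `binarize` and `find_pattern` are hypothetical helpers, not part of DMB.

```python
from itertools import combinations


def binarize(samples, thresholds):
    """Turn raw expression values into booleans: True if the value reaches
    the gene's (assumed) threshold, False otherwise."""
    return [{g: s[g] >= t for g, t in thresholds.items()} for s in samples]


def find_pattern(positives, negatives, genes, max_len=3):
    """Return the shortest conjunction of literals (gene, value) satisfied by
    every positive sample and by no negative sample, or None if none exists
    with at most max_len literals."""
    for k in range(1, max_len + 1):
        for subset in combinations(genes, k):
            # Any conjunction covering all positives must agree with the first one.
            literals = tuple((g, positives[0][g]) for g in subset)
            covers_all_pos = all(all(p[g] == v for g, v in literals) for p in positives)
            hits_a_neg = any(all(n[g] == v for g, v in literals) for n in negatives)
            if covers_all_pos and not hits_a_neg:
                return literals
    return None


genes = ["g1", "g2", "g3"]            # hypothetical gene identifiers
thresholds = {g: 1.0 for g in genes}  # assumed cut-off for every gene
ad11 = binarize([{"g1": 2.0, "g2": 0.5, "g3": 1.5},
                 {"g1": 1.8, "g2": 0.2, "g3": 0.4}], thresholds)
control = binarize([{"g1": 2.1, "g2": 1.5, "g3": 1.2},
                    {"g1": 0.3, "g2": 0.1, "g3": 1.1}], thresholds)
print(find_pattern(ad11, control, genes))  # (('g1', True), ('g2', False))
```

The resulting formula reads "g1 over-expressed AND g2 not over-expressed". Exhaustive search is only practical for a handful of genes; real microarray data requires feature selection first.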
PROTEIN STRUCTURE ALIGNMENT
Structural alignment of two or more proteins has been one of the most studied problems in bioinformatics and computational biology over the last decade. The problem consists of establishing equivalences between two or more polymer structures (proteins or RNAs) based on their sequence of residues, shape and three-dimensional conformation. Depending on which information one chooses to consider, different types of alignment arise. Algorithms that consider only the primary sequence of the proteins are usually referred to as sequence alignment algorithms. In contrast, three-dimensional structural alignment fully exploits knowledge of the tertiary structure of the proteins, that is, the complete information on the coordinates of the atoms composing them. Structural alignment is therefore a valuable tool, especially for comparing proteins with low sequence similarity, where evolutionary relationships cannot easily be detected by standard sequence alignment techniques. Structural alignment tools are also of paramount importance for assessing the reliability of structure prediction methods: evaluating a prediction often requires a structural alignment between the model and the known true structure to assess the model's quality. Structural alignments are especially useful for analyzing data from structural genomics and proteomics efforts, and they can serve as reference points for evaluating alignments produced by purely sequence-based bioinformatics methods.
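Once equivalences between residues have been fixed (for example, pairs of C-alpha atoms), the quality of a structural superposition is commonly measured by the root-mean-square deviation (RMSD) after the best rigid-body fit. The minimal sketch below uses the standard Kabsch algorithm with NumPy and assumes the atom correspondences are already known; finding those correspondences is the hard part of structural alignment and is not shown. It is a generic illustration, not the IASI method, and `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition.
    Row i of P is assumed to be equivalent to row i of Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both point sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


P = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Q is P rotated 90 degrees about the z axis and then translated by (5, 5, 5).
Q = [[5, 5, 5], [5, 6, 5], [4, 5, 5], [5, 5, 6]]
print(kabsch_rmsd(P, Q))  # close to 0.0, up to floating-point error
```

Because rotation and translation are factored out, two identical shapes placed differently in space give an RMSD near zero, which is exactly what a purely sequence-based comparison cannot capture.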
Among the codes and methods specifically designed for protein structural alignment, those concerned with the binding and active sites of proteins are of particular interest, since the identification, classification and analysis of protein binding sites are important for drug design and the treatment of diseases. Binding-site recognition is generally based on geometry, often combined with the physico-chemical properties of the site, because the conformation, size and chemical composition of the protein surface are all relevant to the interaction with a specific ligand. At IASI we developed a new structural alignment method (CO) in collaboration with the University of Padova. The method reformulates the structural alignment problem as a continuous global optimization problem, and it compares favorably with well-known structural alignment tools such as Multibind and MolLoc.
SINGLE NUCLEOTIDE POLYMORPHISMS
Single Nucleotide Polymorphisms (SNPs) are single positions (loci) in the DNA at which the nucleotide varies among individuals. At these loci, at least 5% of the individuals of a given population carry a nucleotide value that differs from the one carried by the rest of the population. The most frequent nucleotide value is called the major allele. SNPs are the most common form of genetic variation among individuals. SNP analysis of DNA sequences is the main purpose of the HapMap Project of the Division of Extramural Research of the National Human Genome Research Institute, in which two problems are studied:
- Tag SNP selection: finding a subset of SNPs (called tag SNPs) whose values identify the haplotypes, thus reducing the number of SNPs required to examine the entire genome for association with a phenotype from the 10 million SNPs that exist to roughly 500,000 tag SNPs.
- Haplotype inference: reconstructing each individual's haplotypes from its genotype, in order to produce information for studying the genetic factors that contribute to variation in response to environmental factors, in susceptibility to infection, and in the effectiveness of and adverse responses to drugs and vaccines.
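Tag SNP selection as described above can be cast as a set-cover problem over pairs of haplotypes: each SNP "covers" the pairs of haplotypes that it distinguishes. The minimal sketch below applies the classical greedy set-cover heuristic, assuming haplotypes are given as equal-length strings of 0/1 alleles. It is a generic illustration, not the method used by HapMap or by IASI, it does not guarantee a minimum tag set, and `greedy_tag_snps` is a hypothetical name.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Greedily choose SNP positions until every pair of distinct haplotypes
    differs at some chosen position (greedy set cover, not guaranteed optimal)."""
    n_snps = len(haplotypes[0])
    # Pairs of haplotypes that the chosen SNPs must still tell apart.
    unresolved = {(i, j) for i, j in combinations(range(len(haplotypes)), 2)
                  if haplotypes[i] != haplotypes[j]}
    tags = []
    while unresolved:
        # Pick the SNP that separates the largest number of unresolved pairs.
        best = max(range(n_snps),
                   key=lambda s: sum(haplotypes[i][s] != haplotypes[j][s]
                                     for i, j in unresolved))
        tags.append(best)
        # Keep only the pairs that still look identical at the chosen SNP.
        unresolved = {(i, j) for i, j in unresolved
                      if haplotypes[i][best] == haplotypes[j][best]}
    return sorted(tags)


haps = ["00110", "01100", "10101", "11011"]  # invented haplotypes over 5 SNPs
print(greedy_tag_snps(haps))  # [0, 1]
```

Here two of the five SNPs suffice: restricted to positions 0 and 1, the four haplotypes become "00", "01", "10" and "11", which are all distinct.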
At IASI, optimization techniques have been applied to both problems. In particular, various integer programming formulations of the haplotype inference problem with parsimony have been studied. This research produced CollHaps (revision 2.0), a software tool for haplotyping.
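The parsimony criterion mentioned above asks for the smallest set of distinct haplotypes that explains all observed genotypes. The integer programming formulations studied at IASI, and CollHaps itself, are not reproduced here. Instead, the minimal sketch below states the same objective by brute force, assuming genotypes are encoded as strings over {0, 1, 2}, where 0 and 1 are homozygous sites and 2 marks a heterozygous site. The search is exponential and only suitable for tiny inputs; `resolutions` and `pure_parsimony` are hypothetical names.

```python
from itertools import product


def resolutions(genotype):
    """All unordered haplotype pairs (h1, h2) that explain a genotype over
    {'0', '1', '2'}, where '2' marks a heterozygous site."""
    het = [i for i, c in enumerate(genotype) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        # Homozygous sites are copied into both haplotypes unchanged.
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def pure_parsimony(genotypes):
    """Pick one resolution per genotype so that the total number of distinct
    haplotypes is minimal; returns (haplotype set, chosen pairs)."""
    best = None
    for choice in product(*(resolutions(g) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, choice)
    return best


haplotypes, choice = pure_parsimony(["02", "20", "22"])
print(sorted(haplotypes))  # ['00', '01', '10']
```

The genotype "22" could be explained either by {00, 11} or by {01, 10}; the parsimony criterion prefers the second, because 01 and 10 are already needed for the other two genotypes.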
SPECIES CLASSIFICATION THROUGH BARCODE AND LOGIC PROGRAMMING
The Consortium for the Barcode of Life (CBOL) is an international initiative devoted to developing DNA barcoding as a global standard for the identification of biological species. DNA barcoding is a new technique that uses a short DNA sequence from a standardized, agreed-upon position in the genome as a molecular diagnostic for species-level identification. IASI participates in the Data Analysis Working Group of the Consortium, developing ad hoc logic classification algorithms that identify the species from the barcode with high precision.
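As a rough illustration of what a logic classifier for barcodes can look like, the minimal sketch below assumes aligned barcodes of equal length. For each species it extracts single-position rules of the form "nucleotide X at position i" that hold for every training barcode of that species and for no barcode of any other species, then assigns a query barcode to the species whose rules it satisfies. This is not the algorithm developed at IASI. The sequences and species names are invented and far shorter than real barcodes, and `diagnostic_rules` and `classify` are hypothetical names.

```python
def diagnostic_rules(barcodes):
    """For each species, list the rules (position, nucleotide) that hold for all
    of its training barcodes and for none of the other species' barcodes."""
    rules = {}
    for species, seqs in barcodes.items():
        # Barcodes belonging to all the other species.
        others = [s for sp, ss in barcodes.items() if sp != species for s in ss]
        ref = seqs[0]
        rules[species] = [
            (i, ref[i]) for i in range(len(ref))
            # Conserved within the species ...
            if all(s[i] == ref[i] for s in seqs)
            # ... and never seen at that position in any other species.
            and all(o[i] != ref[i] for o in others)
        ]
    return rules


def classify(query, rules):
    """Return the species whose diagnostic rules are all satisfied by the query.
    Species without any diagnostic rule are never returned."""
    return [sp for sp, rs in rules.items()
            if rs and all(query[i] == nt for i, nt in rs)]


training = {
    "species_A": ["AAGTCC", "AAGTCT"],
    "species_B": ["AGGACC", "AGGACC"],
    "species_C": ["TGCTCC"],
}
rules = diagnostic_rules(training)
print(rules)  # {'species_A': [(1, 'A')], 'species_B': [(3, 'A')], 'species_C': [(0, 'T'), (2, 'C')]}
print(classify("AAGTCA", rules))  # ['species_A']
```

A query may satisfy the rules of several species, or of none, so a practical classifier needs a tie-breaking or abstention policy on top of such rules.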