Christ University Institutional Repository

SOFT COMPUTING TECHNIQUES FOR HUB SEQUENCE ANALYSIS

S., ATHMAJA (2010) SOFT COMPUTING TECHNIQUES FOR HUB SEQUENCE ANALYSIS. Masters thesis, Christ University.


Abstract

Bioinformatics, the combination of biology and information technology, has become a pioneering industry that is booming worldwide. One of the grand challenges in biology is to understand the organizing principles of biomolecular networks, and there is a deliberate effort towards uncovering new laws of biological complexity. One of the most pressing needs in this area is an understanding of protein-protein interaction networks and their complexity. Hub proteins, the network elements with high connectivity, literally 'hold the networks together'. Though several experimental methods have been developed to identify hub proteins, it is important to supplement them with pattern recognition procedures that classify and predict hub protein sequences.

This research work aims at the classification and prediction of hub proteins of two model organisms, Homo sapiens and Escherichia coli, using different computational approaches of pattern recognition: Principal Component Analysis (PCA), Artificial Neural Network (ANN) and Linear Discriminant Analysis using (i) Class Dependent Approach (LDACD), (ii) Class Independent Approach (LDACIND), and (iii) Normal Bayes Classification (LDANB). The protein sequences are collected from biological databases such as APID and UniProt, and are encoded based on the physicochemical, structural and thermodynamic properties of amino acids. Four different classification/prediction algorithms have been implemented in MATLAB.

i. The first algorithm combines the techniques of PCA and ANN. Feature selection is done by PCA and classification by ANN.

ii. The second algorithm is implemented using the LDA class-dependent approach. Feature selection is done by combining LDA and PCA (i.e. HAD), and classification is performed using the Euclidean distance formula.

iii. The third algorithm is a variant of the second: instead of setting two criterion functions, the LDA class-independent approach uses only one criterion function. All other procedures are the same as in the second approach.

iv. The last algorithm combines the principles of LDA and Normal Bayes classification. Classification is done according to Bayes' rule, i.e. each object is assigned to the group with the highest conditional probability.

The sensitivity, specificity and accuracy measures are calculated for each algorithm on different test data sets, and the performance of the algorithms is compared in terms of their accuracies. Among all the methods, LDANB gave the best accuracy: 93.8% for Escherichia coli and 97% for Homo sapiens.
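The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's MATLAB code; the function name `binary_metrics`, the label convention (1 = hub, 0 = non-hub) and the sample labels are assumptions made purely for illustration, and the standard confusion-matrix definitions are assumed to be the ones the thesis uses.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for a two-class problem.

    Assumed convention: `positive` marks a hub protein, anything else a non-hub.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # hubs found
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # hubs missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # non-hubs kept
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms

    # Sensitivity: fraction of true hubs that were predicted as hubs.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true non-hubs that were predicted as non-hubs.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Made-up labels for ten test proteins (not data from the thesis).
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, acc = binary_metrics(true_labels, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With these made-up labels, three of four hubs and five of six non-hubs are classified correctly, so the sketch reports a sensitivity of 0.75, a specificity of about 0.833 and an accuracy of 0.8.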

Item Type:Thesis (Masters)
Subjects:Thesis > MPhil > Computer Science
Divisions:M Phil
ID Code:1805
Deposited By:Knowledge Center Christ University
Deposited On:14 Dec 2011 15:30
Last Modified:30 Jan 2014 20:01
