
INFORMATICA
International Journal


INFORMATICA, 2010, Vol. 21, No. 1, 13-30
© Institute of Mathematics and Informatics,

ISSN 0868-4952

Complexity Estimation of Genetic Sequences Using Information-Theoretic and Frequency Analysis Methods

Robertas DAMASEVICIUS

Software Engineering Department, Kaunas University of Technology, Studentu 50-415, LT-51368 Kaunas, Lithuania
E-mail: robertas.damasevicius@ktu.lt

Abstract

The genetic information in cells is stored in DNA sequences, which are represented as strings of four letters, each corresponding to a specific type of nucleotide. Genomic DNA sequences are rich in periodic patterns, which play important biological roles. The complexity of genetic sequences can be estimated using information-theoretic methods. Low-complexity regions are of particular interest to genome researchers because they indicate sequence repeats and patterns. In this paper, the complexity of genetic sequences is estimated using Shannon entropy, Rényi entropy and relative Kolmogorov complexity. Structural complexity based on periodicities is analyzed using the autocorrelation function and time-delayed mutual information. As a case study, we analyze human chromosome 22 and identify periodicities of 3 bp and 49 bp.

Keywords:

genetic sequence, DNA analysis, entropy, complexity, frequency analysis, bioinformatics
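The abstract names its complexity and periodicity measures without giving formulas. As a hedged illustration, the Python sketch below implements common textbook forms of each on a plain ACGT string; it is not the author's implementation. Assumptions to note: relative Kolmogorov complexity, which is uncomputable, is approximated here by a zlib compression ratio (a common proxy that may differ from the estimator used in the paper); the autocorrelation is a symbolic match rate rather than a numeric-mapping variant; and all function names are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

ALPHABET = "ACGT"


def symbol_probs(seq: str) -> list[float]:
    """Relative frequencies of the nucleotides that occur in seq."""
    n = len(seq)
    return [c / n for c in Counter(seq).values()]


def shannon_entropy(seq: str) -> float:
    """H = -sum p_i log2 p_i, in bits per nucleotide (at most 2 for ACGT)."""
    return sum(-p * math.log2(p) for p in symbol_probs(seq))


def renyi_entropy(seq: str, alpha: float) -> float:
    """H_alpha = log2(sum p_i^alpha) / (1 - alpha); equals Shannon entropy at alpha = 1."""
    if alpha == 1:
        return shannon_entropy(seq)
    return math.log2(sum(p ** alpha for p in symbol_probs(seq))) / (1 - alpha)


def compression_complexity(seq: str) -> float:
    """ASSUMED proxy for relative Kolmogorov complexity: zlib-compressed size
    in bits divided by the 2 bits per nucleotide of a raw ACGT encoding."""
    compressed = zlib.compress(seq.encode("ascii"), 9)
    return 8 * len(compressed) / (2 * len(seq))


def _check_lag(seq: str, lag: int) -> None:
    if not 1 <= lag < len(seq):
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")


def autocorrelation(seq: str, lag: int) -> float:
    """Symbolic autocorrelation: fraction of positions i with seq[i] == seq[i + lag]."""
    _check_lag(seq, lag)
    pairs = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(pairs)) / pairs


def delayed_mutual_information(seq: str, lag: int) -> float:
    """I(lag) = sum p(a,b) log2[p(a,b) / (p(a) p(b))] over pairs (seq[i], seq[i + lag])."""
    _check_lag(seq, lag)
    x, y = seq[:-lag], seq[lag:]
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_x(a) * count_y(b))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice(ALPHABET) for _ in range(3000))
    periodic_seq = "ATG" * 1000  # toy sequence with a 3 bp period
    for name, s in [("random", random_seq), ("periodic", periodic_seq)]:
        print(
            name,
            round(shannon_entropy(s), 3),
            round(renyi_entropy(s, 2.0), 3),
            round(compression_complexity(s), 3),
            round(autocorrelation(s, 3), 3),
            round(delayed_mutual_information(s, 3), 3),
        )
```

The measures complement each other on the toy sequence `"ATG" * 1000`: its Shannon entropy is log2 3 ≈ 1.585 bits per nucleotide, not far from a random sequence, yet it compresses to a small fraction of its raw size and its autocorrelation at lag 3 equals 1, exposing the 3 bp period that symbol frequencies alone do not reveal.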



Copyright © INFORMATICA, Vilnius University Institute of Mathematics and Informatics, 2010