We propose a distance measure for DNA sequences based on the Discrete Fourier Transform (DFT).
We apply the Discrete Fourier Transform distance to the clustering of DNA sequences. Multiple sequence alignment (MSA) is widely used to analyze DNA sequences, yet it is hampered by inherent limitations in computational complexity. Alignment-free methods have been developed over the past decade for more efficient comparison and classification of DNA sequences than MSA. However, most alignment-free methods may lose structural and functional information of DNA sequences because they are based on feature extraction. Therefore, they may not fully reflect the actual differences among DNA sequences.
Alignment-free methods that conserve information are needed for more accurate comparison and classification of DNA sequences. In this method, we map each DNA sequence into four binary indicator sequences and apply the Discrete Fourier Transform (DFT) to the indicator sequences to transform them into the frequency domain. The Euclidean distance between the full DFT power spectra of the DNA sequences is used as the similarity distance metric. To compare the DFT power spectra of DNA sequences with different lengths, we propose an even scaling method that extends the shorter DFT power spectra to the length of the longest sequence compared. After the DFT power spectra are evenly scaled, the DNA sequences are compared in a DFT frequency space of the same dimensionality.
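The pipeline described above (indicator sequences, DFT, power spectrum, even scaling, Euclidean distance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so summing the power spectra of the four indicator sequences and using linear interpolation for the even scaling are both assumptions, and the function names (indicator_sequences, power_spectrum, even_scale, dft_distance) are hypothetical. A naive O(n^2) DFT is used to keep the sketch dependency-free; a real analysis would use an FFT. Sequences are assumed to be non-empty.

```python
import cmath
import math


def indicator_sequences(seq):
    """Map a DNA string to four binary indicator sequences, one per nucleotide."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for short examples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(seq):
    """Power spectrum of a DNA sequence.

    Assumption: the spectra of the four indicator sequences are summed
    per frequency; the abstract does not state how they are combined.
    """
    ps = [0.0] * len(seq)
    for u in indicator_sequences(seq).values():
        for k, coeff in enumerate(dft(u)):
            ps[k] += abs(coeff) ** 2
    return ps


def even_scale(ps, m):
    """Stretch a spectrum of length n to length m >= n.

    Assumption: linear interpolation between neighbouring points;
    the abstract does not give the scaling formula.
    """
    n = len(ps)
    if m < n:
        raise ValueError("target length must not be shorter than the spectrum")
    if m == n:
        return list(ps)
    out = []
    for k in range(m):
        q = k * (n - 1) / (m - 1)  # fractional position in the original spectrum
        r = math.floor(q)
        if r >= n - 1:
            out.append(ps[n - 1])
        else:
            out.append(ps[r] + (q - r) * (ps[r + 1] - ps[r]))
    return out


def dft_distance(a, b):
    """Euclidean distance between evenly scaled power spectra of two sequences."""
    m = max(len(a), len(b))
    pa = even_scale(power_spectrum(a), m)
    pb = even_scale(power_spectrum(b), m)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

Calling dft_distance on every pair of sequences in a set yields a symmetric distance matrix, which is the usual input to a hierarchical clustering routine.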
We assess the accuracy of the similarity metric in hierarchical clustering using simulated DNA and virus sequences. The results demonstrate that the DFT-based method is an effective and accurate measure of DNA sequence similarity.
This paper presents methods for rational sampling rate conversion performed in the discrete Fourier transform and discrete cosine transform domains. Several important issues, such as the conversion error performance and the required computational complexity based on the proposed fast transform algorithms, are discussed in detail.