Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Comparative genomics beyond sequence-based alignments : RNA structures in the ENCODE regions. / Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.; Bramsen, Jesper Bertram; Hansen, Claus; Kjems, Jørgen; Tommerup, Niels; Ruzzo, Walter L.; Gorodkin, Jan.

In: Genome Research, Vol. 18, No. 2, 2008, pp. 242-251.

Harvard

Þórarinsson, E, Yao, Z, Wiklund, ED, Bramsen, JB, Hansen, C, Kjems, J, Tommerup, N, Ruzzo, WL & Gorodkin, J 2008, 'Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions', Genome Research, vol. 18, no. 2, pp. 242-251.

APA

Þórarinsson, E., Yao, Z., Wiklund, E. D., Bramsen, J. B., Hansen, C., Kjems, J., Tommerup, N., Ruzzo, W. L., & Gorodkin, J. (2008). Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Research, 18(2), 242-251.

Vancouver

Þórarinsson E, Yao Z, Wiklund ED, Bramsen JB, Hansen C, Kjems J et al. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Research. 2008;18(2):242-251.

Author

Þórarinsson, Elfar ; Yao, Zizhen ; Wiklund, Eric D. ; Bramsen, Jesper Bertram ; Hansen, Claus ; Kjems, Jørgen ; Tommerup, Niels ; Ruzzo, Walter L. ; Gorodkin, Jan. / Comparative genomics beyond sequence-based alignments : RNA structures in the ENCODE regions. In: Genome Research. 2008 ; Vol. 18, No. 2. pp. 242-251.

Bibtex

@article{3774f130e88811ddbf70000ea68e967b,
title = "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions",
abstract = "Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches--84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of 11 ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any searches for these elements.",
author = "Elfar {\TH}{\'o}rarinsson and Zizhen Yao and Wiklund, {Eric D.} and Bramsen, {Jesper Bertram} and Claus Hansen and J{\o}rgen Kjems and Niels Tommerup and Ruzzo, {Walter L.} and Jan Gorodkin",
year = "2008",
language = "English",
volume = "18",
pages = "242--251",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "2",
}

RIS

TY - JOUR

T1 - Comparative genomics beyond sequence-based alignments

T2 - RNA structures in the ENCODE regions

AU - Þórarinsson, Elfar

AU - Yao, Zizhen

AU - Wiklund, Eric D.

AU - Bramsen, Jesper Bertram

AU - Hansen, Claus

AU - Kjems, Jørgen

AU - Tommerup, Niels

AU - Ruzzo, Walter L.

AU - Gorodkin, Jan

PY - 2008

Y1 - 2008

N2 - Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches--84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of 11 ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any searches for these elements.

AB - Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches--84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of 11 ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any searches for these elements.

M3 - Journal article

VL - 18

SP - 242

EP - 251

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 2

ER -

ID: 9905718