Describing Phylogenetic Trees with the Newick Standard

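 The Newick Standard is a notation for describing phylogenetic trees with parentheses and commas. It was formerly called the New Hampshire Format. The taxa (OTUs) that are joined together are enclosed in parentheses, and nesting these groups expresses the tree. A branch length is written as a number following a colon (:), placed after the name of a taxon or after the closing parenthesis of a group. The outermost pair of parentheses usually contains three elements, because this form represents an unrooted tree. As an exception, the outermost pair may contain two elements, which corresponds to a rooted tree. A semicolon (;) follows the outermost closing parenthesis. For example: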

((Human:0.3, Chimpanzee:0.2):0.1, Gorilla:0.3, (Mouse:0.6, Rat:0.5):0.2);
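
 The rules above are enough to read such a string mechanically. The following is a minimal sketch in Python, not the parser of any particular program: the names `Node`, `parse_newick`, and `leaf_names` are hypothetical, and the sketch assumes unquoted names without blanks and ignores all whitespace, so it covers only the subset described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the tree: a taxon (leaf) or a group in parentheses."""
    name: str = ""
    length: Optional[float] = None          # branch length after ":", if any
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a Newick string (assumed subset: no quoted names, no comments)."""
    s = "".join(text.split())               # drop blanks and line breaks
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":                    # a group: "(" item {"," item} ")"
            pos += 1
            node.children.append(parse_node())
            while peek() == ",":
                pos += 1
                node.children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.name = read_until("(),:;")      # optional name (empty for groups here)
        if peek() == ":":                    # optional branch length
            pos += 1
            node.length = float(read_until("(),:;"))
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("the tree must end with a semicolon")
    return root


def leaf_names(node: Node) -> List[str]:
    """Collect taxon names from left to right."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


tree = parse_newick(
    "((Human:0.3, Chimpanzee:0.2):0.1, Gorilla:0.3, (Mouse:0.6, Rat:0.5):0.2);"
)
print(len(tree.children))   # 3 elements in the outermost parentheses
print(leaf_names(tree))
```

 The outermost group has three children here, which matches the unrooted form described above.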

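 This format has not been fully standardized, but in practice it is used by many analysis programs. Its details are explained in the documentation (main.doc) distributed with PHYLIP (ver. 3.572) by Dr. Joe Felsenstein and colleagues. The following text is quoted from that document.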

(Begin quotation)


The Tree File
--- ---- ----
     In output from most programs, a representation of the tree is also written
into  the  tree  file (usually named "treefile").  The tree is specified by the
nested pairs of parentheses, enclosing names and separated by commas.  If there
are any blanks in the names, these must be replaced by the underscore character
"_".  Trailing blanks  in  the  name  may  be  omitted.   The  pattern  of  the
parentheses  indicates  the  pattern  of  the  tree  by  having  each  pair  of
parentheses enclose all the members of a monophyletic group.  The tree file for
the above tree would have its first line look like this:

((Mouse,Bovine),((Orang,(Gorilla,(Chimp,Human))),Gibbon));

In the above tree the first fork separates the lineage  leading  to  Mouse  and
Bovine  from the lineage leading to the rest.  Within the latter group there is
a fork separating Gibbon from the rest, and so on.  The entire tree is enclosed
in  an outermost pair of parentheses.  The tree ends with a semicolon.  In some
programs such as DNAML, FITCH, and CONTML, the tree will be completely unrooted
and  specified  by  a  bottommost  fork  with  a  three-way  split,  with three
"monophyletic" groups separated by two commas:

(A,(B,(C,D)),(E,F));

The three "monophyletic" groups here are A, (B,C,D),  and  (E,F).   The  single
three-way  split  corresponds to one of the interior nodes of the unrooted tree
(it can be any interior node).  The remaining forks are encountered as you move
out from that first node, and each then appears as a two-way split.  You should
check the documentation files for the particular programs you are using to  see
in  which of these forms you can expect the user tree to be in.  Note that many
of the programs that estimate an unrooted tree produce trees in the treefile in
rooted  form!  This is done for reasons of arbitrary internal bookkeeping.  The
placement of the root is arbitrary.

     For programs estimating branch lengths, these are given in  the  trees  in
the  tree  file as real numbers following a colon, and placed immediately after
the group descended from that branch.  Here  is  a  typical  tree  with  branch
lengths:

((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,
bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,
seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931);

Note that the tree may continue to a new line at any time except in the  middle
of  a  name  or the middle of a branch length, although in trees written to the
tree file this will only be done after a comma.

     These representations of trees are a subset of  the  standard  adopted  on
June  24, 1986 at the annual meetings of the Society for the Study of Evolution
at a meeting (the final session in Newick's lobster restaurant  --  hence  its
name  --  the  Newick  standard)  of  an informal committee consisting of Wayne
Maddison (MacClade), David Swofford (PAUP), F. James  Rohlf  (NTSYS-PC),  Chris
Meacham  (COMPROB  and  plotting  programs),  James  Archie  (character  coding
program), William H.E. Day, and me.   This  standard  is  a  generalization  of
PHYLIP's  format, itself based on a well-known representation of trees in terms
of parenthesis patterns which has  been  around  for  almost  a  century.   The
standard  is now employed by most phylogeny computer programs but unfortunately
has yet to be described in a formal published description.

(End quotation)
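
 The quoted passage distinguishes trees whose bottommost fork is a two-way split (rooted form) from those with a three-way split (unrooted form). As a rough illustration, the sketch below counts the elements inside the outermost parentheses. The function names `top_level_groups` and `basal_split_form` are hypothetical, and the code assumes unquoted names. As the quotation notes, some programs write unrooted trees in rooted form, so the result describes only how the string is written.

```python
from typing import List


def top_level_groups(newick: str) -> List[str]:
    """Return the elements directly inside the outermost parentheses."""
    s = "".join(newick.split())      # a tree may continue onto new lines
    if not s.endswith(";"):
        raise ValueError("the tree must end with a semicolon")
    if not s.startswith("("):
        raise ValueError("the tree must start with '('")
    groups: List[str] = []
    depth = 0
    start = 1                        # first character after the opening "("
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:           # the outermost pair is closed
                groups.append(s[start:i])
                return groups
        elif ch == "," and depth == 1:
            groups.append(s[start:i])
            start = i + 1
    raise ValueError("unbalanced parentheses")


def basal_split_form(newick: str) -> str:
    """Describe the written form of the bottommost fork."""
    n = len(top_level_groups(newick))
    if n == 2:
        return "two-way split (rooted form)"
    if n == 3:
        return "three-way split (unrooted form)"
    return f"{n}-way split"


print(top_level_groups("(A,(B,(C,D)),(E,F));"))
# ['A', '(B,(C,D))', '(E,F)']
print(basal_split_form("((Mouse,Bovine),((Orang,(Gorilla,(Chimp,Human))),Gibbon));"))
# two-way split (rooted form)
```

 Because whitespace is removed first, a tree that wraps across several lines, such as the example with branch lengths in the quotation, is handled in the same way.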