About the Newick Standard (New Hampshire Format)


The Newick standard is a way of describing trees with parentheses and commas. It was previously called the New Hampshire format. A tree is written by grouping OTUs (operational taxonomic units) in nested parentheses. A branch length is written as a number after a colon (:), placed right after the OTU name, or after the closing parenthesis of the group, at the end of that branch. The outermost parentheses usually contain three elements, because the tree is unrooted. Sometimes, however, the outermost parentheses contain only two elements, which corresponds to a rooted tree. A semicolon (;) is required after the outermost closing parenthesis. Below is an example.

((Human:0.3, Chimpanzee:0.2):0.1, Gorilla:0.3, (Mouse:0.6, Rat:0.5):0.2);
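To show how a program can read these rules, here is a minimal recursive-descent parser sketched in Python. It assumes plain Newick only: no quoted labels, no bracketed comments, and no blanks inside names (blanks are written as underscores), so all whitespace can be discarded. The `Node` class and the function names are illustrative choices, not taken from any particular package.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                  # OTU name; empty for unlabeled internal nodes
    length: Optional[float] = None  # branch length written after ':', if any
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a plain Newick string (assumes no quoted labels or [comments])."""
    s = "".join(text.split())  # drop all whitespace, including line breaks
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""  # "" marks end of input

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def parse_subtree() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1                            # consume '('
            node.children.append(parse_subtree())
            while peek() == ",":                # commas separate sibling groups
                pos += 1
                node.children.append(parse_subtree())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                            # consume ')'
        node.name = read_label()                # OTU name (or internal label)
        if peek() == ":":                       # optional branch length
            pos += 1
            node.length = float(read_label())
        return node

    root = parse_subtree()
    if peek() != ";":
        raise ValueError("a Newick tree must end with ';'")
    return root


def leaf_names(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def outer_form(root: Node) -> str:
    """Two outermost groups suggest a rooted tree, three an unrooted one."""
    n = len(root.children)
    return {2: "rooted", 3: "unrooted"}.get(n, f"{n} outermost groups")


tree = parse_newick(
    "((Human:0.3, Chimpanzee:0.2):0.1, Gorilla:0.3, (Mouse:0.6, Rat:0.5):0.2);"
)
print(leaf_names(tree))  # ['Human', 'Chimpanzee', 'Gorilla', 'Mouse', 'Rat']
print(outer_form(tree))  # unrooted
```

Counting the groups directly inside the outermost parentheses is the check described above: the example tree has three (the Human/Chimpanzee group, Gorilla, and the Mouse/Rat group), so it is written in unrooted form.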

Although it is not a formally published standard, this style has in practice been adopted by many programs for sequence data analysis. The details of the style are explained in a document distributed with PHYLIP (ver. 3.572), developed by Dr. Joe Felsenstein and colleagues. Below is a quotation from it.

(Citation begins)


The Tree File
--- ---- ----
     In output from most programs, a representation of the tree is also written
into  the  tree  file (usually named "treefile").  The tree is specified by the
nested pairs of parentheses, enclosing names and separated by commas.  If there
are any blanks in the names, these must be replaced by the underscore character
"_".  Trailing blanks  in  the  name  may  be  omitted.   The  pattern  of  the
parentheses  indicates  the  pattern  of  the  tree  by  having  each  pair  of
parentheses enclose all the members of a monophyletic group.  The tree file for
the above tree would have its first line look like this:

((Mouse,Bovine),((Orang,(Gorilla,(Chimp,Human))),Gibbon));

In the above tree the first fork separates the lineage  leading  to  Mouse  and
Bovine  from the lineage leading to the rest.  Within the latter group there is
a fork separating Gibbon from the rest, and so on.  The entire tree is enclosed
in  an outermost pair of parentheses.  The tree ends with a semicolon.  In some
programs such as DNAML, FITCH, and CONTML, the tree will be completely unrooted
and  specified  by  a  bottommost  fork  with  a  three-way  split,  with three
"monophyletic" groups separated by two commas:

(A,(B,(C,D)),(E,F));

The three "monophyletic" groups here are A, (B,C,D),  and  (E,F).   The  single
three-way  split  corresponds to one of the interior nodes of the unrooted tree
(it can be any interior node).  The remaining forks are encountered as you move
out from that first node, and each then appears as a two-way split.  You should
check the documentation files for the particular programs you are using to  see
in  which of these forms you can expect the user tree to be in.  Note that many
of the programs that estimate an unrooted tree produce trees in the treefile in
rooted  form!  This is done for reasons of arbitrary internal bookkeeping.  The
placement of the root is arbitrary.

     For programs estimating branch lengths, these are given in  the  trees  in
the  tree  file as real numbers following a colon, and placed immediately after
the group descended from that branch.  Here  is  a  typical  tree  with  branch
lengths:

((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,
bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,
seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931);

Note that the tree may continue to a new line at any time except in the  middle
of  a  name  or the middle of a branch length, although in trees written to the
tree file this will only be done after a comma.

     These representations of trees are a subset of  the  standard  adopted  on
June  24, 1986 at the annual meetings of the Society for the Study of Evolution
at a meeting (the final session in Newick's lobster restaurant  --  hence  its
name  --  the  Newick  standard)  of  an informal committee consisting of Wayne
Maddison (MacClade), David Swofford (PAUP), F. James  Rohlf  (NTSYS-PC),  Chris
Meacham  (COMPROB  and  plotting  programs),  James  Archie  (character  coding
program), William H.E. Day, and me.   This  standard  is  a  generalization  of
PHYLIP's  format, itself based on a well-known representation of trees in terms
of parenthesis patterns which has  been  around  for  almost  a  century.   The
standard  is now employed by most phylogeny computer programs but unfortunately
has yet to be described in a formal published description.

(Citation ends)
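The points in the PHYLIP text quoted above (underscores standing for blanks, each pair of parentheses enclosing a monophyletic group, a three-way bottommost split for unrooted trees, branch lengths after colons, and trees continuing across line breaks) can be illustrated with a short Python sketch. It is a hypothetical helper, not part of PHYLIP: it stores each node as a `(name, length, children)` tuple, treats a missing branch length as 0, converts underscores back to blanks, and assumes no quoted labels or bracketed comments. Note that root-to-tip distances depend on where the root is written, which the PHYLIP text says may be arbitrary.

```python
import re
from typing import Dict, List, Tuple

STRUCTURAL = set("(),:;")
# A token is one structural character or a run of other non-blank characters.
TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")

Tree = Tuple[str, float, list]  # (name, branch length, children)


def parse(text: str) -> Tree:
    """Parse a PHYLIP-style tree; assumes no quoted labels or [comments]."""
    tokens = TOKEN.findall(text)  # blanks and line breaks between tokens are skipped
    i = 0

    def tok() -> str:
        return tokens[i] if i < len(tokens) else ""  # "" marks end of input

    def subtree() -> Tree:
        nonlocal i
        children: List[Tree] = []
        if tok() == "(":
            i += 1
            children.append(subtree())
            while tok() == ",":
                i += 1
                children.append(subtree())
            if tok() != ")":
                raise ValueError("unbalanced parentheses")
            i += 1
        name = ""
        if tok() and tok() not in STRUCTURAL:
            name = tok().replace("_", " ")  # underscores stand for blanks in names
            i += 1
        length = 0.0  # assumption: a missing branch length counts as 0
        if tok() == ":":
            i += 1
            length = float(tok())  # raises ValueError if the number is missing
            i += 1
        return (name, length, children)

    tree = subtree()
    if tok() != ";":
        raise ValueError("the tree must end with a semicolon")
    return tree


def leaves(node: Tree) -> List[str]:
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


def groups(node: Tree) -> List[List[str]]:
    """Leaf sets enclosed by each pair of parentheses, outermost first."""
    _, _, children = node
    if not children:
        return []
    result = [leaves(node)]
    for child in children:
        result.extend(groups(child))
    return result


def root_to_tip(node: Tree, acc: float = 0.0) -> Dict[str, float]:
    """Sum branch lengths from the written root down to each OTU."""
    name, length, children = node
    total = acc + length
    if not children:
        return {name: total}
    out: Dict[str, float] = {}
    for child in children:
        out.update(root_to_tip(child, total))
    return out


apes = parse("((Mouse,Bovine),((Orang,(Gorilla,(Chimp,Human))),Gibbon));")
for group in groups(apes)[1:]:  # skip the outermost pair (all OTUs)
    print(group)

unrooted = parse("(A,(B,(C,D)),(E,F));")
print(len(unrooted[2]))  # 3: the three-way split at the bottommost fork

# The tree may continue on a new line after a comma.
mammals = parse("""((cat:47.14069,(weasel:18.87953,((dog:25.46154,(raccoon:19.19959,
bear:6.80041):0.84600):3.87382,(sea_lion:11.99700,
seal:12.00300):7.52973):2.09461):20.59201):25.0,monkey:75.85931);""")
for name, dist in root_to_tip(mammals).items():
    print(f"{name}: {dist:.5f}")
```

Using a regular-expression tokenizer keeps line breaks and spacing out of the parsing logic, which matches the rule that a tree may be broken anywhere except inside a name or a number.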