Syngraph: An application for graphic display and interactive use of synonym lists

Multiple names that refer to a single species (synonyms) and more than one species being referred to by the same name (homonyms) bedevil taxonomy. They produce ambiguity about the entity under discussion. Syngraph is a computer application that organizes information about synonyms and homonyms. It can track different names that potentially have been applied to the same species, or identical names that have been applied to different species. It can create a list of synonyms in conventional format for use in publication, as for a taxonomic monograph. It can also display and print names so they are linked, thereby providing information on the conceptual basis of a name and the action taken in a publication. In the display, each name is imposed on a color-coded rectangle; all names on rectangles of the same color refer to records that stem from a single description. This allows quick visualization of the taxonomic history. When linked to a geographical information system application, the color can be used for points plotted on a map that displays the geographical locality of specimens referred to in each record. This visualization of the geographic distribution of the nominal species can provide tests of the hypothesis that the names are, indeed, synonyms. Syngraph is available for download; help files accompany the executable files.


Introduction
A list of synonyms and homonyms is an essential element of most species treatments, be it part of a monograph (Berta et al. 1995) or the description of a new species (Winston 1999). Linking synonymous names and differentiating between uses of the same name for organisms that really belong to different taxa are basic to biological research. Tracking synonyms (and, to a lesser degree, homonyms) is a monumental bookkeeping task (Dubois 2000). It rarely suffices to make a simple statement that one name equals another (signifying that two names have been applied to a single species). A usage may apply to multiple species, in which case the synonymy is only in part (pro parte), some usages are subsequently disagreed with by taxonomists, some specimens are thought to have been misidentified, etc. Table 7.1 of Winston (1999), which is entitled "Terms and abbreviation used in synonymies and taxonomic description," runs to nearly three pages.
We have developed Syngraph, a computer application that uses a relational database to allow linkage of any published name to any other using a lexicon of modifiers that are conventional in taxonomy, anchoring each use to its bibliographic reference. Such an instance is what is referred to as a "Taxon Name Usage" (TNU) in the Global Names Architecture (http://gnapartnership.org). Through the use of color and other symbology, the graphical synonym list generated by the report function of our application provides more information than conventional lists do about the relationship of the name to other members of the list and about confidence in that usage. Syngraph also allows access to all published uses of a name, whereas print lists commonly provide only uses relevant to the taxon of concern. For example, a name that has been used as a synonym of two other names would be a member of the list of synonyms for all three names; all can easily be traced by means of Syngraph. Syngraph is not bound to a particular classification: a name can appear in multiple synonym lists that reflect differences of opinion among taxonomists. Syngraph is a means of inventorying and organizing information from such varied opinions. For opinions too incompatible to be presented together (e.g. disagreement about the senior synonym, questionable synonymies that require merging large lists), competing taxonomic hypotheses can be developed as separate synonym lists.
The tracking and display of names in our application has three functions. One is to create the historical record of name use. This function is relevant to taxonomic publications such as descriptions and monographs. Another is to allow the testing of hypotheses about the uses of names. This function has broad relevance in systematics and biogeography. We discuss it after explaining the display and functioning of the program. The third relates to assembling information about a taxon, by compiling all references referring to a taxonomic entity, regardless of the name(s) used to refer to it. For example, in a biogeographic analysis that defines a species' range based on occurrence records, the number of records (and thereby the confidence in defining the species' range) can be increased not only by obtaining more records for the nominal species but by including records that use its synonymous names, and it can be further refined by excluding from consideration records using the same name that actually refer to other species (Fautin in press).
Throughout this manuscript we use the term "synonym" to encompass all uses of names that refer to a single taxonomic entity, not just those that formally purport to make a taxonomic change. Thus the list assembled is central to formulating a taxon concept, and is what Smith and Smith (1972) referred to as a chresonymy, what Winston (1999) referred to as a full synonymy, or what Dubois (2000) referred to as a holonymy. Dubois (2000) exhaustively distinguished among the types of information taxon lists contain; terms he proposed could be applied in Syngraph as adjectives and the list named according to his terminology. Of course, a synonymy in the strict sense can be compiled using Syngraph if the user enters only those elements considered properly to belong in such a list. Alternatively, from a comprehensive list, only some elements can be shown, as is done for each species in the display termed "Strict Synonymy" on the website "Hexacorallians of the World" (http://hercules.kgs.ku.edu/Hexacoral/Anemone2/index.cfm).
A synonym list generated by Syngraph is compatible with the symbols used in open nomenclature (e.g. ?, cf., and aff.). We include such symbols in our citation of a name to preserve the sense in which the cited authority used the name (for details, see Matthews 1973). These symbols indicate problems and uncertainties in the synonym list, and highlight aspects subject to debate and in need of improvement.
We describe the basic attributes of Syngraph. The examples presented here and on the website deal with species names of sea anemones and their allies, but Syngraph can manage taxonomic names of any rank or taxon. Syngraph may be downloaded at http://hercules.kgs.ku.edu/Hexacoral/Anemone2/syngraph.cfm; detailed instructions for use are part of the help file in the download, available also at http://web.nhm.ku.edu/inverts/syngraph/beta/index.htm.
We are aware of no other publicly available applications that function like and create the output of Syngraph, which is a tool for research and data storage by practicing systematists. We infer that conceptually similar programs must have been developed to allow assembly of taxonomic inventories such as CLEMAM (http://www.somali.asso.fr/clemam/biotaxis.php). Applications such as Linnaeus II, which "facilitates biodiversity documentation and species identification" (http://www.eti.uva.nl/products/linnaeus.php), are designed more for outreach to people who are not specialists in a taxon. Applications such as those described by Graham and Kennedy (2005) and Craig and Kennedy (2008) deal with synonymy between taxa in alternative classifications rather than whether two names refer to a single species.

Implementation
Syngraph works with a relational database that provides data storage. A Syngraph user can link the application to an existing taxonomic database or build one as the data relevant to the synonymies are input.
The relational database platforms with which Syngraph can currently function are Access, MySQL, and SQL Server. An Oracle database can be used by mirroring it through links in an Access database file using an ODBC driver; such linkage can be done within Access or through third-party software. A Database Wizard guides a user through the process of linking an existing database to Syngraph, table by table and field by field.
The data needed to generate a Syngraph list are derived from sixteen tables (Fig. 1).
1) Seven tables concern bibliographic data. The three that store bibliographic details are defined by the nature of the publication: Books, Chapters, and Articles. The table References compiles, in a less bibliographically complete manner, information from the Books, Chapters, and Articles tables. The Journals table, a master list of all the periodical publications, is referred to by entries in the Articles table. The Authors table is the master list of authors of the references. The Authors-Reference-Relationships table links each reference to its author(s).
2) Three tables concern the names that are the focus of Syngraph. The Names table contains every usage of every scientific name. There are separate fields for genus and above, subgenus, species, and subspecies (the application was developed primarily to track species names). By contrast with the Authors and Journals tables, there is no master list of names; a name (as a TNU) is listed separately for each publication in which it appears. Two other tables, Species Spellings and Supra-Specific Spellings, contain data that allow a name that was misspelled in publication to be rendered correctly so it is not considered a distinct entity. Each table consists of one column containing the entry as it appeared in publication and another column with the name spelled correctly; in most cases the entries in the two columns are identical. Erroneous spellings have varied sources, such as a lapsus, a printer's error, or a rendering that is no longer compliant with the relevant nomenclatural code (e.g. using numerals in the name of an animal). The correct spelling is either the original spelling or the orthography dictated by application of the relevant nomenclatural code (although in some cases the correct spelling may be open to debate, for purposes of linking records, one is selected for use in Syngraph; it may be changed as additional information is adduced).
3) Six tables, peculiar to Syngraph, must be created even if Syngraph is used with an existing relational database.
• Synsynonyms: list of synonyms for species names, each linked by the key name for a species, as well as color codes for the synonym units and indexes for sorting the names.
• Synrelation: the nature of relationships between two linked records.
• Synname_adjectives: links an adjective to a record.
• Synadjective_list: inventory of adjectives available for use in Syngraph.
• Valid_species: inventory of species names considered valid.
Syngraph will create any of the tables it needs that are not in a database to which it is connected. Due to the choice of Microsoft Visual Studio as the development environment, Syngraph is available only for the Windows operating system. Syngraph can be run using the Macintosh operating system through virtual machine software, such as Parallels, VMWare Fusion, or Apple Bootcamp; Sun's VirtualBox, which is available for free, runs on Windows, Linux, Mac, and OpenSolaris. Syngraph requires at least 1 GB of RAM to run properly.
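The role of the spelling tables described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than any of the platforms Syngraph supports, and the table names, column names, and taxon names (Exemplum rubra and its misspelling) are hypothetical placeholders, not the actual Syngraph schema or real records.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical Names and spelling tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE names (
            name_id   INTEGER PRIMARY KEY,
            genus     TEXT NOT NULL,
            species   TEXT NOT NULL,   -- exactly as rendered in the publication
            reference TEXT NOT NULL    -- one row per usage per reference
        );
        CREATE TABLE species_spellings (
            as_published TEXT PRIMARY KEY,
            correct      TEXT NOT NULL -- usually identical to as_published
        );
        """
    )
    conn.executemany(
        "INSERT INTO names (genus, species, reference) VALUES (?, ?, ?)",
        [
            ("Exemplum", "rubrra", "Author A 1900"),  # misspelled in publication
            ("Exemplum", "rubra", "Author B 1949"),
            ("Exemplum", "alba", "Author B 1949"),
        ],
    )
    conn.executemany(
        "INSERT INTO species_spellings VALUES (?, ?)",
        [("rubrra", "rubra"), ("rubra", "rubra"), ("alba", "alba")],
    )
    return conn


def usages_by_correct_spelling(conn, correct):
    """Return all usages whose published spelling maps to the given correct spelling."""
    return conn.execute(
        """
        SELECT n.genus, n.species, n.reference
        FROM names AS n
        JOIN species_spellings AS s ON s.as_published = n.species
        WHERE s.correct = ?
        ORDER BY n.name_id
        """,
        (correct,),
    ).fetchall()


conn = build_demo_db()
rubra_usages = usages_by_correct_spelling(conn, "rubra")
```

In this sketch the published spelling stays untouched in the names table, so a search for the misspelled form still succeeds, while the join lets both spellings be treated as the same entity.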

The display
A core feature of Syngraph is its intuitive graphical display, designed to provide more information about the relationships between species names than is available with conventional text-based lists through the use of colors and annotations. The graphical list generated by Syngraph is composed of five blocks of information (Fig. 2). The two main blocks are the list of names (B) and authorship and reference information (E). Each line of an entry consists of the name of a taxon rendered precisely as given in the reference cited at the right end of the line. The reference consists of author, date, and page(s) for an original description; for a subsequent citation, the reference consists of two parts, the author and date of the name (which we term the verbatim authority), precisely as given in the subsequent citation, followed by the author, date, and page(s) of the subsequent citation itself. The verbatim authority is linked to the citation rather than the name part because it varies (far more than the rendering of the name itself) and may not even be cited (in which case we enter the word NONE).
Displayed in chronological order, these are the entries in a conventional list of synonyms. The display, however, links the usage to the reference in which the name is cited rather than to the name itself. Thus, unlike journal titles and author names, there is no master list of species names; each appears in the Names table as many times as there are references containing it. We do this because one name can be used in different senses by different authors or by a single author at different times (e.g. Michener et al. 2007); hence its function as a TNU.
An original description and all subsequent references to that species stemming from that original description form a synonym unit (an entity termed by Dubois [2000] a morphonym); all the names of a synonym unit are imposed on rectangles of one color. A rectangle outlined in black represents the particular usage instance in which that species name was made available: the original description. Thus, every combination derived from a single original description has the same color, regardless of the genus with which it is associated (e.g. Fig. 3), regardless of any change in rendering of the species name dictated by gender agreement, or regardless of other spelling variations (such as misspellings [also shown in Fig. 3] or changes to conform to the provisions of codes of nomenclature).

FIGURE 2.
The Syngraph display is composed of five blocks: A synonym relations; B list of names; C problems; D adjectives; E authorship and reference information. Each name in B is rendered precisely as given in the reference at the right end of the line in E; the verbatim author (the left-hand portion of E) is stored as part of the reference.

FIGURE 3.
Three entries of a synonym unit. The rectangle on which the first use of a name is imposed (the original description) is outlined in black. The genus name is annotated as having been misspelled; it was correctly rendered in the second citation. This example also illustrates an adjective, which is rendered as a two-branch relation using a solid green line to the right of the names. The name to which the adjective applies is linked to the name considered valid by the authority, indicated by a green box. Thus, the name listed third is a new combination, and the relation should be read: Carlgren (1949) placed the species described by Carlgren (1900) as Actinoides africana in the genus Anthopleura.

This color coding provides a quick and intuitive understanding of how the names in a list relate to each other and of the complexity of a species' taxonomic history. The color used for a synonym unit has no significance other than to distinguish among synonym units. Not all uses of a particular name are necessarily members of the same synonym unit, so they may not bear the same color (e.g. homonyms, misidentifications). Syngraph is programmed to assign colors in a particular sequence, but colors may be manually changed; their appearance may differ between print and screen, and people with color blindness may want to use colors in other than the default sequence.
To the left and right of the list of names are vertical lines that link related entries. A box on the lead line to the vertical lines indicates which of the linked names was considered valid by the authority who published the relation. On the left side of the name block (Fig. 2, column A) are displayed synonymy, homonymy, and determination relations. A synonym relation is three-branched; the solid black lines link the name considered valid by the authority for the relation with the two names considered related (Fig. 4). A homonym relation is two-branched; the dashed black lines link the name considered valid by the authority for the relation with the name considered homonymous (Fig. 5). A determination is depicted with three branches; the solid gray lines link the name used for specimens that had been incompletely identified with the name subsequently attributed to them by the authority for the relation and the original description of that taxon (Fig. 6). Problems or mistakes are displayed on the right (Fig. 2, column C). A misidentification relation is three-branched; the solid gray lines link the name considered by the authority for the relation to have been misapplied with the name considered correct by the authority and the original description of that taxon (Fig. 7). A non relation is two-branched; the solid red lines link the name considered valid by the authority for the relation with the name considered not to be part of the synonymy (Fig. 7). A pro parte relation is three-branched (Fig. 8); the dashed red lines link a record referring to specimens erroneously identified as belonging to a single species with the bibliographic citation to the source of the information and to the original description of that taxon. Because a pro parte name correctly applies to only some of the specimens identified as such, the specimens to which the name was incorrectly applied belong in the list of a different species where the name also appears as pro parte.
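The drawing conventions for the six relation types described above can be summarized as a lookup table. The following is a minimal sketch for orientation only; the class name, field names, and dictionary keys are hypothetical and do not reflect how Syngraph stores relations internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationStyle:
    branches: int  # number of entries linked by the relation
    line: str      # "solid" or "dashed"
    color: str     # line color
    side: str      # side of the name block on which the relation is drawn


# Values transcribed from the description of the display; keys are illustrative.
RELATION_STYLES = {
    "synonym":           RelationStyle(3, "solid",  "black", "left"),
    "homonym":           RelationStyle(2, "dashed", "black", "left"),
    "determination":     RelationStyle(3, "solid",  "gray",  "left"),
    "misidentification": RelationStyle(3, "solid",  "gray",  "right"),
    "non":               RelationStyle(2, "solid",  "red",   "right"),
    "pro parte":         RelationStyle(3, "dashed", "red",   "right"),
}


def describe(relation: str) -> str:
    """Return a one-line description of how a relation type is drawn."""
    s = RELATION_STYLES[relation]
    return f"{relation}: {s.branches}-branch {s.line} {s.color} line, {s.side} of names"


summary = [describe(name) for name in RELATION_STYLES]
```

Note that determination and misidentification share the same line style and differ only in the side on which they are drawn, so the side must be part of any key used to tell them apart.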

FIGURE 5.
A homonymy is represented as a two-branch relation using a dashed black line to the left of the names. The name the authority considered homonymous (on black) is linked to the name the authority considered valid; the original description of that species is indicated by a rectangle of same color outlined in black. The status of the homonymous name is indicated on the green line to its right. The relation shown here should be read: The status of Actinia aurora of Gosse, 1854, as a junior homonym of Actinia aurora of Quoy and Gaimard, 1833, was recognized by Dunn (1981), who used the name Heteractis aurora (Quoy & Gaimard, 1833) for the latter.

FIGURE 6.
A determination is represented as a three-branch relation using a solid gray line to the left of the names. The name initially used for the specimen(s) is linked to the first mention of the name and the name the authority considered valid, indicated by a gray box, which is followed, at the end of the line, by the name of the authority and the bibliographic citation to the source of the information. The relation shown here should be read: Dunn (1981) determined that Condylactis sp. of Saville-Kent (1897) is Heteractis aurora (Quoy & Gaimard, 1833).

FIGURE 7.
Misidentification and non relations. A misidentification is represented as a three-branch solid gray line to the right of the names. The name that was misapplied (imposed on light gray) is linked to the first mention of the name the authority considered valid and to the name used by the authority, which is indicated by a gray box and is followed, at the end of the line, by the name of the authority and the bibliographic citation to the source of the information. The relation shown here should be read: Carlgren (1938) found that Bolocera longicornis Carlgren, of Stephenson (1918) is Bolocera capensis Carlgren, 1928. The two-branch relation in red to the right of the names illustrates the use of non. The name the authority considered non (imposed on black) is linked to the name considered valid by the authority, which is indicated by a red box and is followed, at the end of the line, by the name of the authority and the bibliographic citation to the source of the information. The relation shown here should be read: Dunn (1983) considered Bolocera longicornis Carlgren, 1891, not to be a synonym of Bolocera kerguelensis Studer, 1879.

FIGURE 8.
Pro parte is represented as a three-branch relation using a dashed red line to the right of the names. The middle name is the one the authority considered to have been erroneously applied to some specimens. It is linked above to the name considered by the authority to be the senior synonym, and below (indicated by a red box) to the name considered valid, followed, at the end of the line, by the name of the authority and the bibliographic citation to the source of the information. The relation shown here should be read: Carlgren (1928) found that some of the specimens described by Pax (1922) as Rhytidactis antarctica are, in fact, Halianthella kerguelensis (Stud.).

Names in a synonymy imposed on black, gray, and white rectangles are relevant to the species in question but are not certainly part of any synonym unit. A name that is not a synonym of the species in question is on a black rectangle: Figure 5 illustrates this for a homonym, and Figure 7 for non. A name that doubtfully refers to the species in question is on a gray rectangle: a name that is questionably synonymous with the species that is the subject of the list is on a dark gray rectangle (Fig. 9); a name that correctly belongs to specimens originally attributed to another species (misidentification) is on a light gray rectangle (Figs. 6, 7). A name on a white rectangle comes from a reference that treats a species other than the species in question but has relevance to the entries in the synonym list (for example, it may make invalid the name of the genus to which the species has been assigned) (Fig. 9).
To add adjectives to a synonym list, a user can choose from terms commonly used in taxonomic and nomenclatural publications displayed in the adjective list; the user may add to the pick-list. An adjective may be rendered as a two-branch relation using a solid green line to the right of the names. The name to which the adjective applies is linked to the name discussed by the authority, indicated by a green box (Figs. 3, 5). Alternatively, an adjective may be added without citation of an authority, such as labeling a misspelling (Fig. 3).
Syngraph allows a user to indicate an author's degree of confidence in a relation by use of a symbol imposed over the box on the lead line preceding the valid name of the species in question. The lack of such a symbol denotes that the author indicated no degree of confidence. Full confidence in a relation is indicated by an asterisk (*). Probability is indicated by the letter p, and possibility by a question mark (?) (Fig. 9); addition of a plus symbol (+) indicates a higher value and addition of a minus symbol (-) indicates a lower value, thus providing five levels of certainty/uncertainty (there cannot be certainty +). Symbols for levels of confidence are rendered in parentheses if inferred from a statement by the author and without parentheses if explicitly stated by the author. Such a symbol, entered as part of a synonymy in Syngraph, should not be confused with a symbol that is part of a taxonomic name (Matthews 1973) (Fig. 9).

FIGURE 9.
… lloydii, but is listed in Syngraph because the name C. lloydii has questionably been applied to C. vermicularis. Note that the line for Cerianthus vermicularis (E. Forbes) Gosse, 1860, contains two question marks that should not be confounded: that after the name vermicularis was introduced by the authority in the cited publication to signify the authority's uncertainty about the name, whereas that on the box of the lead line is the convention in Syngraph for such uncertainty (which is not as obvious in some citations as it is in this one).
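The confidence notation described in this section lends itself to a small parser. The sketch below is an illustration under stated assumptions: the function name, the labels "full", "probable", and "possible", and the choice to treat an empty string as "no confidence indicated" are all hypothetical, and it makes no claim about how Syngraph itself stores or validates these symbols.

```python
import re
from typing import NamedTuple, Optional


class Confidence(NamedTuple):
    level: str      # "full" (*), "probable" (p), or "possible" (?)
    modifier: str   # "+", "-", or "" when no modifier is given
    inferred: bool  # True when the symbol is written in parentheses


_LEVELS = {"*": "full", "p": "probable", "?": "possible"}
# Optional "(", one base symbol, optional "+" or "-", optional ")".
_PATTERN = re.compile(r"^(\()?([*p?])([+-]?)(\))?$")


def parse_confidence(symbol: str) -> Optional[Confidence]:
    """Parse a confidence symbol such as '*', 'p+', or '(?-)'.

    An empty string means the author indicated no degree of confidence.
    """
    if symbol == "":
        return None
    match = _PATTERN.match(symbol)
    # Parentheses must be balanced: both present or both absent.
    if match is None or bool(match.group(1)) != bool(match.group(4)):
        raise ValueError(f"not a confidence symbol: {symbol!r}")
    base, modifier = match.group(2), match.group(3)
    if base == "*" and modifier == "+":
        raise ValueError("full confidence cannot take a plus modifier")
    return Confidence(_LEVELS[base], modifier, inferred=bool(match.group(1)))


example = parse_confidence("(p+)")
```

Keeping the parenthesized (inferred) flag separate from the level makes it easy to filter a list for relations whose confidence was explicitly stated by the cited author.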

Creating and modifying a Syngraph synonym list
Details for using Syngraph are in files at http://web.nhm.ku.edu/inverts/syngraph/beta/index.htm; we provide an overview of the process to explain some aspects of the underlying philosophy. A knowledgeable user may affect the display by modifying the content of the tables directly rather than through the Syngraph interface; this option may be easier for some operations, such as changing the sequence in which names are listed (as when there is more than one publication in a year).
If a database is being built as part of implementing Syngraph, entering author and journal names first is suggested, although this information can be added at any time in association with entering a reference. Once bibliographic details are captured, the species names in a reference can be entered into the Names table, along with associated information needed for the display including whether the name is an original description, and the numbers of the pages on which that name can be found in the reference. The name should be entered precisely as rendered in the publication, even if incorrectly spelled (so a user who does not realize it is incorrect can find it). A misspelling can be indicated by choosing "Correct Spelling" from the Database menu. This way, the name remains unchanged in the Names table, but can be rendered correctly in other displays through the function of the Species Spellings and Supra-Specific Spellings tables.
To begin a synonym list, the user selects "Create new synonymy list" from the Database menu, then selects a name to enter from the inventory in the Names table. This name will be entered as the key name, so it is good to select what may be the senior name, but that is not necessary (for example, after a synonym list is well advanced, it may be discovered there is an earlier name for the species). As a list is built, if an entry is for an original description, the name will be imposed on a rectangle of a color not used in that synonym list (the user may change a color that has been automatically assigned). An entry that is not an original description is coded with the color used for an identically spelled species name in the synonym list; if the name is a homonym, the user can manually change the color. If no match is found, the name is imposed on a block of red. Possible reasons for lack of a match are that the original description for that species had not yet been entered, that the species name was misspelled, or that the name is not a synonym but was added to the list for information; the user may manually change to the appropriate color, including to black, gray, or white, depending on the reason for inclusion of the name in the list.

FIGURE 10.
Geographical distribution of the sea anemone Heteractis aurora, its documented occurrences coded by the color used for the synonym unit applied to it in the Syngraph synonym list for the species (this image was captured when there were 73 records for the species; currently there are 84: http://hercules.kgs.ku.edu/Hexacoral/anemone2/distribution.cfm?xmlsource=http%3A%2F%2Fhercules.kgs.ku.edu%2Fhexacoral%2Fdev%2Fxml%2Fhexmlscript.cfm%3Fseniorid%3D15%26type%3D&callingpage=species&speciessearched=Heteractis%20aurora).

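The default color assignment described for building a synonym list can be sketched as follows. This is a simplified illustration, not Syngraph's code: the palette and its order are hypothetical, red is assumed to be reserved for unmatched names, and when two original descriptions share a spelling (homonyms) the sketch simply reuses the first color for later non-original entries, leaving the manual correction to the user as the text describes.

```python
PALETTE = ["yellow", "blue", "green", "orange", "purple"]  # hypothetical default sequence
NO_MATCH = "red"  # color for an entry with no identically spelled name in the list


def assign_colors(entries):
    """Assign a default color to each entry of a synonym list.

    entries: list of (species_name_as_published, is_original_description),
             in the order in which they are added to the list.
    Returns a list of color names, one per entry.
    """
    unused = iter(PALETTE)
    color_of = {}  # species spelling -> color of its synonym unit
    colors = []
    for species, is_original in entries:
        if is_original:
            # An original description starts a synonym unit with an unused color.
            color = next(unused, None)
            if color is None:
                raise ValueError("palette exhausted; extend PALETTE")
            color_of.setdefault(species, color)
        else:
            # A later usage inherits the color of an identically spelled name.
            color = color_of.get(species, NO_MATCH)
        colors.append(color)
    return colors


demo = assign_colors([
    ("aurora", True),        # original description
    ("aurora", False),       # later combination, same species name
    ("kerguelensis", True),  # second original description in the list
    ("sp.", False),          # no identically spelled name: flagged in red
])
```

Because matching is on the spelling as published, a misspelled name falls through to red, which is one of the three reasons for a failed match given above.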
Once a name is entered in a synonym list, the user can set relations between it and other entries, and insert adjectives. The types of relations and adjectives are selected from a pick-list. In the case of a three-branch relation, the user links the names that an authority considered related by 1) selecting the use of the name by the authority who proposed the synonymy, then 2) selecting the original description of the junior synonym, and finally 3) selecting the original description of the senior synonym. The result, in the illustrated example (Fig. 4), is read as "Carlgren (1928) considered Dimyactis duplicata Pax, 1922, to be a junior synonym of Edwardsia kerguelensis Studer, 1879, under the new combination Halianthella kerguelensis (Stud.)." For a two-branch relation or an adjective, the user links the source of the information to the name addressed.
From among the names in a synonym list, the user can select the name to be displayed as the valid name for all species in the list. If the precise name appears in the list as an original description, it is rendered as the valid name with the author and date of that description not in parentheses. If the name selected has a genus other than that used in the original description, the author and date appear in parentheses.
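The parenthesis rule for rendering the valid name can be expressed in a few lines. The function below is a minimal sketch with hypothetical parameter names; it ignores complications such as gender agreement of the species name and is not taken from Syngraph itself.

```python
def format_valid_name(genus, species, original_genus, author, year):
    """Render a valid name with its authority.

    The author and date are placed in parentheses when the genus differs
    from the genus used in the original description.
    """
    authority = f"{author}, {year}"
    if genus != original_genus:
        authority = f"({authority})"
    return f"{genus} {species} {authority}"


# Same genus as the original description: no parentheses.
original = format_valid_name("Actinia", "aurora", "Actinia", "Quoy & Gaimard", 1833)
# Species later placed in a different genus: parentheses.
combined = format_valid_name("Heteractis", "aurora", "Actinia", "Quoy & Gaimard", 1833)
```

The two calls reproduce the forms used in the homonymy example of Figure 5: Actinia aurora Quoy & Gaimard, 1833, and Heteractis aurora (Quoy & Gaimard, 1833).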