Part I. Getting Started

Introduction

Welcome to CATE! CATE stands for "Creating a Taxonomic eScience" - a research grant of the same name funded the development of the CATE system. Later parts of this guide will show you how to install this application on a webserver, populate it with data, and customize your own version of CATE. To begin with, we'll take a look at the application and how you can use it to explore and publish taxonomic information online.

What is CATE?

Figure 1. The front page of a CATE site. A CATE web revision consists of a taxonomic checklist (arranged hierarchically on the left) and a set of species pages (accessed by clicking on the species names). The checklist, names, literature, specimens, and other types of data can be queried and browsed in isolation (by following the links in the "Explore" sub-menu, on the left). Users can log in to the revision, allowing them to annotate and tag pages, edit pages, and administer the web revision if they have appropriate permissions.

Species Pages

Species pages bring together taxonomic, nomenclatural, and other information about a species (or a taxon of any rank) into a single page. As you might imagine, this can be a lot of information, so only the most important information is given on the species page and the rest is provided in pages linked from the species page. Species pages are shown when you click on a link in the taxon menu or if you search in the list of taxa.

Figure 2. Species pages consist of four major parts: a section with diagnostic and distributional data at the top; a section containing links to structured information about the taxon, such as the subordinate taxa and synonyms; a section containing links to structured information about the nomenclature of the taxon (objective / nomenclatural synonyms, type specimens and hybrids); and finally a section containing discussion of the nomenclature and typification of the taxon.

Species pages contain:

  • The taxonomic name, including the sense in which the name is being used

  • The Life Science Identifier (LSID) for the taxon concept, if assigned

  • The protolog or nomenclatural reference where the name was published

  • Information describing the taxon concept, including sections of text, images, and references.

  • Links to any number of further descriptive sections.

  • The number of subordinate taxa. Click on this link to display a paged list of subordinate taxa

  • The number of synonyms. Click on this link to display a paged list of synonyms

  • A link to the data page that represents the Taxon Concept

  • The number of nomenclatural / objective synonyms, homonyms etc. Click on this link to display a paged list of nomenclatural / objective names, homonyms, etc.

  • The number of hybrids. Click on this link to display a paged list of hybrids

  • A link to the data page that represents the Taxonomic Name

  • Discursive information about the taxonomic name, including sections of text, images, and references.

Species pages are made available using "pretty URLs" under the taxonomy path, as follows:

Example 1. Species page pretty URLs

    http://{hostname}/taxonomy/{genusOrUninomial}                                           // Family-group and generic taxa
    http://{hostname}/taxonomy/{genusOrUninomial}/{specificEpithet}                         // Species
    http://{hostname}/taxonomy/{genusOrUninomial}/{specificEpithet}/{infraspecificEpithet}  // Infraspecific taxa

Pretty URLs are less precise than Globally Unique Identifiers (GUIDs) for referencing particular taxonomic concepts, because there is no guarantee that the sense in which the taxonomic name is used will stay the same. For example, if Xus yus L. were synonymized, the species page displayed at http://www.cate-project.org/taxonomy/Xus/yus would resolve to a different taxonomic concept, whereas the LSID urn:lsid:cate-project.org:taxon:1234 would always resolve to the same taxon concept.

Pretty URLs are intended to be used when you wish to link to whatever taxonomic concept has a particular name in the current version of the revision. If it is important to link to a particular object or to a specific version of that object, you should use one of the GUIDs provided instead.
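
The two kinds of link can be sketched in a few lines of Python. This is an illustration only: the helper names pretty_url and parse_lsid are hypothetical and are not part of CATE. The URL pattern is the one given in Example 1, and the LSID layout assumed here is urn:lsid:authority:namespace:object with an optional trailing revision.

```python
from urllib.parse import quote


def pretty_url(hostname, genus_or_uninomial, specific_epithet=None,
               infraspecific_epithet=None):
    """Build a species page URL following the pattern in Example 1."""
    if infraspecific_epithet and not specific_epithet:
        raise ValueError("an infraspecific epithet needs a specific epithet")
    parts = [genus_or_uninomial, specific_epithet, infraspecific_epithet]
    path = "/".join(quote(part) for part in parts if part)
    return f"http://{hostname}/taxonomy/{path}"


def parse_lsid(lsid):
    """Split an LSID into its components (the revision is optional)."""
    fields = lsid.split(":")
    prefix = [f.lower() for f in fields[:2]]
    if len(fields) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    authority, namespace, object_id = fields[2:5]
    revision = fields[5] if len(fields) == 6 else None
    return {"authority": authority, "namespace": namespace,
            "object": object_id, "revision": revision}


# Name-based link: follows whatever concept currently carries the name.
print(pretty_url("www.cate-project.org", "Xus", "yus"))

# Identifier-based link: always refers to the same taxon concept.
print(parse_lsid("urn:lsid:cate-project.org:taxon:1234"))
```

A link built with pretty_url is convenient to type and read, while storing the LSID is the safer choice when the reference must survive later changes to the classification.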

Data Pages

Figure 3. Data pages consist of at least one page, with more depending upon the type (or category) of data being displayed. Each page displays a title, the LSID of the object if assigned, and a menu of tabs linking to other information about the object. The core data is displayed in the middle of the page. Links to metadata including information about the copyright & licensing, provenance, citation of the page, change history and alternate formats are displayed at the bottom of the page.

The species pages described above are composed of several different types of information. CATE stores this information in a structured format, meaning that for each taxonomic name, concept, reference, image, specimen (and so on), there is a single database record that can be linked to and included in multiple species pages. When a record is referenced by another record, it is usually displayed as a link in that page. Selecting that link will display the data page for the related object.

The fact that there is a single page for each distinct specimen, article, name, and so on makes curation of the web revision much easier, since any changes made to an object will be reflected in all of the species pages which that object appears in. Also, because objects are distinct from the pages which they appear in, they can be searched, browsed, edited, and used on their own. For example, the list of publications that are referenced from species pages doubles as the bibliography for the web revision. Likewise, images, specimens and other data can be searched, viewed, and re-used outside the context of the species pages that they appear in.
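
The idea of one record shared by many pages can be illustrated with a small Python sketch. The class names Reference and SpeciesPage, and the taxon "Xus zus", are hypothetical and chosen only for this example; they do not describe how CATE stores its data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """A single literature record, stored once."""
    title: str


@dataclass
class SpeciesPage:
    """A page that links to shared records rather than copying them."""
    name: str
    references: list = field(default_factory=list)


article = Reference(title="A revision of Xus")
page_a = SpeciesPage("Xus yus", [article])
page_b = SpeciesPage("Xus zus", [article])

# Correcting the record once updates every page that links to it.
article.title = "A revision of the genus Xus"
titles = [page.references[0].title for page in (page_a, page_b)]
print(titles)

# The same records, de-duplicated by identity, double as the bibliography.
bibliography = {id(ref): ref
                for page in (page_a, page_b)
                for ref in page.references}
print(len(bibliography))
```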

There are nine distinct categories of data in CATE:

  • Taxonomic Concepts

  • Taxonomic Names

  • Descriptions of Taxonomic Names and Concepts

  • Literature

  • Specimens & Observations

  • Images, Identification Keys & Phylogenetic Trees

  • Biological Collections (such as Herbaria, Museums, and Botanic Gardens)

  • People (who are Authorities, Authors, Artists, Collectors, Determiners and so on)

  • Controlled Terms & Vocabularies

Searching & Browsing

Figure 4. Each type of data in CATE can be searched using 'Google'-style free-text searching. Type a term or terms into the search box to filter the results that are returned.

CATE can be used to manage checklists with thousands of species, images and references. Each type of data can be browsed by selecting the data type in the left navigation menu. You can increase the size of the list returned, filter and sort the records, and page backwards and forwards through the results.

CATE uses Apache Lucene, a powerful free-text search engine, to index its data. By default, if you type some search terms into the form, CATE will perform a 'Google'-like query that searches multiple fields in each object. You can restrict the query to consider only certain fields, or combine terms using AND or quotes. The query syntax is described in more detail later on.
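
As a rough illustration of fielded queries, the Python sketch below assembles a query string using standard Lucene syntax (field:term, quoted phrases, and AND). The helper functions and the field names "genus" and "author" are hypothetical; the fields actually indexed by a CATE site may be named differently.

```python
def phrase(text):
    """Quote a multi-word term so it is matched as an exact phrase."""
    return '"' + text.replace('"', '\\"') + '"'


def field_query(field_name, term):
    """Restrict a term to one field, using Lucene's field:term form."""
    value = phrase(term) if " " in term else term
    return f"{field_name}:{value}"


def all_of(*clauses):
    """Require every clause to match by joining them with AND."""
    return " AND ".join(clauses)


# "genus" and "author" are assumed field names, used for illustration only.
query = all_of(field_query("genus", "Xus"),
               field_query("author", "Carl Linnaeus"))
print(query)
```

The resulting string can be typed or pasted into the search box in place of plain search terms.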

In addition to searching through the different types of data, you can use the 'summary' tab to get an overview of that data type, including the changes to that data over time. This data is displayed in both graphical and tabular form.

Figure 5. In addition to searching the data, you can summarize it by selecting the 'summary' tab.

Editing Pages

Figure 6. Data pages can be edited online provided you have been given appropriate permissions. In addition to editing individual pages, there are taxonomy-specific workflows for creating new child taxa, synonymising accepted names and raising names out of synonymy. CATE tries to validate objects prior to saving them to the database.

One of the main functions of CATE is to allow you to manage your revision online and in collaboration with other taxonomists working around the world. You can edit all of the pages online. CATE uses a sophisticated page-flow framework for its editing pages, which allows you to perform a number of steps within the same editing process rather than needing to remember to perform several steps in a specific sequence.

Apart from adding, editing and deleting individual pages, there are taxonomy-specific workflows that guide you through the process of adding new taxa, synonymising accepted taxa or raising synonyms, recombining taxa and so on. Whilst CATE is by no means foolproof, it does try to detect problems when you edit data, and it will give you an opportunity to correct any errors it detects.

CATE also provides the facility for bulk upload and import of certain types of data.

Developer API

Figure 7. CATE provides an application programming interface (API) that allows you to store data in the CATE site programmatically, or to access data in the CATE site from another software application. This is based on a simple RESTful protocol that exposes data as XML or JSON.

CATE is designed to help you disseminate and publish your web revision to the widest possible audience, and it provides access to your data in a number of ways. To integrate your web revision with other biodiversity informatics systems, it provides a developer API that allows access to much of the data in CATE. This is a RESTful API based on the EDIT (European Distributed Institute of Taxonomy) CDM (Common Data Model). Individual objects can be accessed using HTTP GET and retrieved in a number of formats, including XML and JSON. Objects can be manipulated using HTTP POST and HTTP DELETE. You can also use the RESTful Java DAO implementations in your CDM applications.
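
A client request for a single object might look something like the Python sketch below, which uses only the standard library. Several details are assumptions made for illustration: the host and the "/taxon/1234" path are hypothetical, and selecting the format with an HTTP Accept header is one common REST convention that may not match how a given CATE site chooses between XML and JSON.

```python
import urllib.request


def build_request(base_url, path, fmt="json", method="GET"):
    """Prepare (but do not send) a request for one object."""
    accept = {"json": "application/json", "xml": "application/xml"}[fmt]
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{path.lstrip('/')}",
        headers={"Accept": accept},
        method=method,
    )


# Hypothetical host and object path; check your own site for the real ones.
request = build_request("http://example.org/cate", "/taxon/1234", fmt="xml")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually fetch the object, pass the request to urlopen:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```

Passing method="DELETE" or method="POST" to the same helper prepares the corresponding write request, which would normally also need credentials for a user with edit permissions.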

In addition to implementing its own API, CATE implements a number of other standard APIs, including the Life Science Identifier (LSID) Resolution Service, which provides Globally Unique, Persistent, Actionable identifiers for your data. It also implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to allow other systems to re-use your data. Full details of the external APIs implemented are given in a later section.
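
For readers unfamiliar with metadata harvesting, the Python sketch below builds an OAI-PMH request URL. The verbs and the metadataPrefix argument come from the OAI-PMH standard; the endpoint address and the helper name oai_pmh_url are hypothetical, so substitute the address published by your own CATE site.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for one of the six protocol verbs."""
    verbs = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}
    if verb not in verbs:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


# Hypothetical endpoint; oai_dc is the Dublin Core format that every
# OAI-PMH repository is required to support.
url = oai_pmh_url("http://example.org/cate/oai", "ListRecords",
                  metadataPrefix="oai_dc")
print(url)
```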

What's new in CATE 1.3?

The main improvement in CATE 1.3 is the addition of a bulk upload option for images. Users with the appropriate permissions are able to upload multiple images and a spreadsheet of metadata relating to those images (the caption, artist, copyright and so on). Behind the scenes, we've upgraded a number of core libraries to address performance and stability issues, including Spring, Hibernate, and the Apache Axis web services library.