Sharkipedia is an open source research initiative that aims to make all observations and measurements of chondrichthyans (sharks, rays, and chimaeras) accessible in order to more rapidly advance research. The framework behind the database is heavily based on the open source Coral Trait Database. Anyone collecting chondrichthyan life history data can join and contribute to the growing data compilation. Contributors have control over the privacy of their data, but will benefit from being able to download complementary public data from the database in a standard format for use in their analyses. We hope that private data will become public once the contributor has published them and a citation is available. The reference citation system here has been designed to ensure full transparency about the origin of each individual data point.
Contact the Administrators for any inquiries about the database at admin@sharkipedia.org.
Here, we describe the format of the data that can be entered in the template. Below, we describe Shark Traits data in detail.
The database currently accepts species-level measurements of chondrichthyans. Data must be associated with a species name (i.e., not genus- or family-level data) and public data must be associated with a published reference (e.g., paper, monograph, technical report, or book).
Data accepted:
Data not accepted:
Unpublished data can also be imported into the database if it is kept private. Private data can be made public once associated with a published reference. Benefits of submitting unpublished data include:
To contribute data, you need to email the database Administrator to become a contributor.
Having a primary, peer-reviewed reference is essential for maintaining data quality, contributor recognition and scientific rigor.
Even if only entering small amounts of data, use of the spreadsheet submission template is recommended. The fillable template can be accessed here. The spreadsheet import is the fastest way to get data into the database. The import function accepts csv-formatted spreadsheets and runs a number of tests to make sure your data fit the database formats correctly (note that you can export csv-formatted files from Excel using “Save as…”). Any error will cause the entire import to be rejected; the system will generate a log and contact the user by e-mail to indicate where the errors occur. If necessary, you will be able to fix these errors and try the import again. Errors may include incorrect formatting of the spreadsheet (e.g., not all required columns present) or missing values for required fields.
The spreadsheet you import must have a header with at least the following column names:
observation_id, access, user_id, reference_id, species_name, location_id, trait_name, standard_name, methodology_name, value, value_type, precision, precision_type, precision_upper, sample_size, notes
`species_id`, `trait_id`, `method_id`, and `model_id` can be included instead of `species_name`, `trait_name`, `method_name`, and `model_name`, respectively (or you can include both ids and names). Having the names can be useful for navigating large spreadsheets. These names must exactly reflect the names in the database, so it is best to copy and paste names directly from the database.
`reference_id` is reserved for the original data reference (i.e., the paper that reports the original collection of the measurement). You can credit papers that compiled large datasets from the literature by adding a column named `reference_secondary_id`. `reference_id` and `reference_secondary_id` may be substituted with `reference_doi` and `reference_secondary_doi`, respectively (the doi should start with “10.”, not “doi:”). The reference will automatically be added using Crossref if the doi is not already in the database.
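The DOI rule above (start with “10.”, not “doi:”) can be checked before import. The following is a minimal sketch, not the database's own validation; the function name `normalize_doi` and the list of prefixes it strips are assumptions made for illustration, and `10.1000/xyz123` is a placeholder DOI.

```python
def normalize_doi(raw: str) -> str:
    """Return a bare DOI that starts with '10.', or raise ValueError.

    Hypothetical helper: strips a few common prefixes so the value
    matches the "starts with 10." rule described in the text.
    """
    doi = raw.strip()
    lowered = doi.lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not doi.startswith("10."):
        raise ValueError(f"DOI must start with '10.': {raw!r}")
    return doi


# Placeholder DOI, not a real reference.
print(normalize_doi("doi:10.1000/xyz123"))
```

Running this over the `reference_doi` column before upload would catch the most common formatting slip without contacting Crossref.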
We have collated all the references from shark-references.com and generated a unique `reference_id` for each reference, so the easiest way to populate this field is by searching for the `reference_id` in the reference list provided.
`user_id` must be your own database user id (i.e., you cannot import data for other people). You can find your user id by clicking on your name in the top right corner and selecting “My Observations”.
Copy and paste the above header into a text file and save as `import_trait_author_year.csv`, where author and year correspond with the reference (paper). Alternatively, download a CSV or Excel template.
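Before uploading, it can help to confirm that the file's header contains the required column names. The sketch below is an illustration only and is not the import function itself: `REQUIRED_COLUMNS` is copied from the header listed above, `missing_columns` is a hypothetical helper, and the check deliberately ignores the permitted substitutions (ids for names, DOIs for reference ids).

```python
import csv
import io

# Column names copied from the header listed in the text.
REQUIRED_COLUMNS = [
    "observation_id", "access", "user_id", "reference_id", "species_name",
    "location_id", "trait_name", "standard_name", "methodology_name",
    "value", "value_type", "precision", "precision_type",
    "precision_upper", "sample_size", "notes",
]


def missing_columns(csv_text: str) -> list:
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    return [name for name in REQUIRED_COLUMNS if name not in header]


# A header with only two columns reports the other fourteen as missing.
print(missing_columns("observation_id,access\n"))
```

To check a file on disk, read it with `open(path, newline="").read()` and pass the text to `missing_columns`.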
The first six required columns are associated with the observation.
`observation_id` is a unique integer that groups a set of measurements into one observation. In the example below, the first two rows belong to the same observation of a shark.
`access` is a boolean value indicating whether the observation should be accessible (0 denotes private and 1 denotes public). In the example below, the data are public.
`user_id` is the unique ID (name or integer) of the person entering the data.
`species_name` and/or `species_id` is the unique name or ID of the species of which the observation was taken. IDs occur in grey to the left of species or at the top of a given species’ observation page.
`marine_province` is the large marine region (e.g., Longhurst province) where the study was conducted. You can see a global map of provinces and associated names here.
`location_id` is the unique ID of the location where the observation took place.
`reference_id` is the unique ID of the reference (paper) where the observation was published. `reference_id` can be empty for unpublished data, in which case `access` must be private (0) until the data are published and the published reference is entered.
The remaining columns are associated with measurement-level data.
Warning: All measurements corresponding to the same observation should have exactly the same observation-level data. Use copy and paste or fill down to avoid making errors.
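The warning above can be turned into a quick pre-import check. This is a minimal sketch under stated assumptions: rows are dictionaries keyed by column name (as produced by `csv.DictReader`), and the tuple `OBSERVATION_FIELDS` is an illustrative choice of observation-level columns rather than the database's definitive list.

```python
# Assumed observation-level columns; adjust to the columns in your file.
OBSERVATION_FIELDS = (
    "access", "user_id", "species_name", "location_id", "reference_id",
)


def inconsistent_observations(rows):
    """Return observation_ids whose rows disagree on observation-level data."""
    first_seen = {}
    bad = set()
    for row in rows:
        obs_id = row["observation_id"]
        signature = tuple(row.get(name, "") for name in OBSERVATION_FIELDS)
        # Remember the first signature per observation, compare later rows to it.
        if first_seen.setdefault(obs_id, signature) != signature:
            bad.add(obs_id)
    return sorted(bad)


rows = [
    {"observation_id": "1", "access": "1", "user_id": "7",
     "species_name": "A", "location_id": "3", "reference_id": "10"},
    {"observation_id": "1", "access": "1", "user_id": "7",
     "species_name": "A", "location_id": "4", "reference_id": "10"},
]
print(inconsistent_observations(rows))  # observation 1 has two location ids
```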
`trait_name` and/or `trait_id` is the unique name or ID (integer) of the species-level characteristic that was measured.
`standard_name` is the unique ID of the standard (measurement unit) that was used to measure the trait.
`method_name` is the unique ID of the methodology used to measure the trait.
`model_name` is the unique name of the model being used for the given methodology to measure the trait.
`value` is the actual measured value (number, text, true/false, etc.). If the value is an option of a categorical trait (e.g., growth form), then the value must exactly match the value options for the trait (e.g., massive).
`value_type` describes the type of value. Current options are:
- `raw_value` for a direct measurement,
- `mean` if the value represents the mean of more than one value,
- `median` if the value represents the median of more than one value,
- `maximum` if the value represents the maximum of more than one value,
- `minimum` if the value represents the minimum of more than one value,
- `model_derived` if the value is derived from a model,
- `expert_opinion` if the actual value has not been measured directly, but an expert feels confident of the value, perhaps based on phylogenetic relatedness or an indirect observation,
- `group_opinion` if the actual value has not been measured directly, but a group of experts feel confident of the value.
`precision` is the level of uncertainty associated with the value if it is made up from more than one measurement (e.g., mean).
`precision_type` is the kind of uncertainty that the precision estimate (above) corresponds with. Current options are:
- `standard_error`
- `standard_deviation`
- `95_ci`
- `range`
`precision_upper` is used to capture the maximum (upper) value if `range` is used (above).
`sample_size` is the number of data points that were used to calculate the value. Leave this field blank if equal to one (e.g., a `raw_value`).
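The controlled vocabularies above lend themselves to a simple row check. The sketch below is illustrative only: the option sets are copied from the lists above, `measurement_problems` is a hypothetical helper, and treating `precision_upper` as mandatory whenever `precision_type` is `range` is an assumption, since the text only says the field is used in that case.

```python
VALUE_TYPES = {
    "raw_value", "mean", "median", "maximum", "minimum",
    "model_derived", "expert_opinion", "group_opinion",
}
PRECISION_TYPES = {"standard_error", "standard_deviation", "95_ci", "range"}


def measurement_problems(row: dict) -> list:
    """Return human-readable problems found in one measurement row."""
    problems = []
    value_type = row.get("value_type") or ""
    if value_type not in VALUE_TYPES:
        problems.append(f"unknown value_type: {value_type!r}")
    precision_type = row.get("precision_type") or ""
    if precision_type and precision_type not in PRECISION_TYPES:
        problems.append(f"unknown precision_type: {precision_type!r}")
    # Assumption: a range needs its upper bound to be interpretable.
    if precision_type == "range" and not row.get("precision_upper"):
        problems.append("precision_type is 'range' but precision_upper is empty")
    return problems


print(measurement_problems({"value_type": "average", "precision_type": "range"}))
```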
Optional fields include:
`notes` is an optional field for reporting useful information about how the measurement was made.
`dubious` is an optional boolean field noting whether the data being reported are classified as dubious (1) or not (0 or blank) by the user entering the data.
`validated` is a boolean field to be used only for Growth Trait Class measurements, indicating whether any validation of the growth model has been attempted (1) or not (0).
`validation_type` denotes the method used for validating the growth model. At the moment, this is a text field with pre-determined values.
If your data is well-managed, you can ask a database programmer to upload it for you. The data will be associated with your name and made private. You are required to make the data public yourself (if desired).
Entering published data not already in the database is strongly encouraged to improve the data’s usability and augment data analysis. A case in which this might occur is a meta-analysis. The data contributor can keep the data they submit private until their study is published.
The key objective is to extract data from references in such a way that nobody ever needs to go back to that reference again.
For example, extracting only the mean value of a trait measurement from a paper, without extracting any measure of variation or the context in which the trait was measured, will mean that the data may not be useful for other purposes. Someone else might need to go back and extract the information again, and there is a chance your initial efforts won’t be cited.
Primary references only. Often people enter data from summary tables in papers that come from other (primary) references. It is important to enter the data from the primary reference for two reasons: (1) so that the primary reference’s author is credited for their work, and (2) to avoid data duplication, where the same data are entered from both the primary and secondary reference. Secondary references, such as meta-analyses, can be credited for large data compilations.
Careful extraction. Copy values from tables carefully and double check. Extracting data from figures can be done with software like ImageJ or DataThief, where a scale can be set based on axis values and measurements of plotted data made, including error bars.
There are three levels of data review.
- Contributor-level review at time of submission. Once submitted, data are tagged as pending.
- Editor-level review. The relevant Editor/s for traits in your submission are automatically notified by email. The contributor may be contacted by the Editor if there are any issues with the submission. The Editor will approve the submission once satisfied.
- User-level review. Anyone signed in as a database user can report an issue with an observation record, and the submitter and the Editor will be notified by email.
The import step involves a series of checks by the computer. This basic error checking will ensure data submissions fit into the database format. Measurement records with the same species, location, reference and value will be flagged as potential duplicates. We expect error checking at this step will evolve and improve as different issues arise. Once past this step, uploaded data contributions will be reviewed by a curator to ensure the database contains high-quality, usable information.
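The duplicate rule described for the import step (same species, location, reference and value) can be approximated locally before submitting. This is a sketch of the idea rather than the database's actual check; the function name `potential_duplicates` and the choice of column names for the key are assumptions.

```python
from collections import defaultdict


def potential_duplicates(rows):
    """Group row indices that share species, location, reference and value."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = (
            row["species_name"],
            row["location_id"],
            row["reference_id"],
            row["value"],
        )
        groups[key].append(index)
    # Only keys seen more than once are potential duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


rows = [
    {"species_name": "A", "location_id": "3", "reference_id": "10", "value": "120"},
    {"species_name": "A", "location_id": "3", "reference_id": "10", "value": "120"},
    {"species_name": "A", "location_id": "3", "reference_id": "10", "value": "95"},
]
print(potential_duplicates(rows))
```

A flagged pair is not necessarily an error, for example when two different traits happen to share a value, so the output is best treated as a list of rows to review.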
The Shark Traits Database is a research tool, not a meta-data repository.
A meta-data repository captures dataset-level information about your data set, so that people can easily find it. Examples of meta-data repositories include DRYAD, Ecological Archives and Figshare. You are encouraged to submit data sets to meta-data repositories to help ensure their longevity and the reproducibility of the results for which the data were originally collected.
The Shark Traits Database captures data-level information so that measurements from multiple data sets can be integrated, extracted, compared and analyzed.
One way to think about The Shark Traits Database is as a very large data-set being cobbled together by the chondrichthyan research community for everyone to use, avoiding redundant efforts, identifying knowledge gaps, informing management, and speeding up science.
The data are organized by observation. Observations index related trait measurements of the same population. For example, estimates of age at maturity for males and females of the same population result in one observation with two measurements (each corresponding to a different trait of the population).
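The observation/measurement relationship can be pictured as a small nested structure. The sketch below is an illustration and not the database schema: the class names, the chosen fields, the species name and all values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    # Measurement-level data (hypothetical subset of fields).
    trait_name: str
    sex: str
    value: float
    standard_name: str


@dataclass
class Observation:
    # Observation-level data shared by every measurement it groups.
    observation_id: int
    species_name: str
    location_id: int
    reference_id: int
    measurements: List[Measurement] = field(default_factory=list)


# One observation of a population with two age-at-maturity measurements.
# All values are placeholders, not real estimates.
obs = Observation(observation_id=1, species_name="Example species",
                  location_id=1, reference_id=1)
obs.measurements.append(Measurement("Amat50", "male", 5.0, "Year"))
obs.measurements.append(Measurement("Amat50", "female", 6.0, "Year"))
print(len(obs.measurements))
```

In the import spreadsheet the same structure is flattened: each measurement becomes one row, and the observation-level columns are repeated on every row that shares the `observation_id`.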
Observation-level data include the chondrichthyan species, location and reference identity (i.e., reference). These data are the same for all measurements corresponding to the observation. When entering or importing trait data, the following observation-level data are minimally required:
1. All observations must be associated with a valid species name. We are using the taxonomy of Weigmann 2016. Note that this taxonomy does not currently include all the taxonomic nomenclatural revisions in Naylor et al 2016. The list of valid species names is built into the spreadsheet template and can be accessed as a dropdown menu. First select the appropriate `species_superorder` (Chimaeriformes, Batoidea, Squalimorph Sharks, or Galeomorph Sharks) and a reduced taxa set to select from will be provided in the `species_name` column. If you cannot find the species name, then you can select `Recently described/Novel species` from the dropdown menu and include the Latin binomial name in the notes column.
2. Location information can be entered in several formats/columns: (1) `marine_province` is restricted to the designated marine Longhurst Provinces, (2) `location_name` is for general locations provided in the study methods (e.g., Southern California Bight, Northern Gulf of Mexico), and (3) `lat` and `long` coordinates if available.
3. The reference ID should be taken from the unique ID found in the references database. This is a web application that contains all materials compiled on Shark-References. You can filter the reference database by year and/or search for author, keyword, species, journal, etc. The reference can be left blank for unpublished data, but those contributions must be kept private.
Measurement-level data include the sex, trait, value, standard (unit), methodology, estimates of precision (if applicable), and model used. When entering or importing trait data, the following information is minimally required:
The database was designed to contain `population-level` characteristic measurements, which are trait estimates of a species at a given location. These traits are grouped into six trait-classes:
Trait | Description |
---|---|
Lmat50 | Length at 50% maturity |
Lmat95 | Length at 95% maturity |
Length at first maturity | Length of smallest mature individual based on gonadal observation (production of gametes) or clasper calcification. |
Length of largest immature | Length of largest immature individuals based on gonadal observation (production of gametes) or clasper calcification. |
Length at maternity | Length at first observation of pregnancy |
Lmax-observed | Maximum observed length |
Lmax-estimated | Maximum length estimated from model |
Lbirth | Length at birth |
Standard | Description |
---|---|
cm TL | Total length in centimeters |
cm FL | Fork length in centimeters |
cm DW | Disc Width in centimeters |
cm CL | Chimaera length in centimeters |
cm PCL | Pre-caudal length in centimeters |
cm BDL | Body length in centimeters |
cm PSCFL | Pre-supra-caudal fin length in centimeters |
Methods | Description |
---|---|
Back-calculated | Estimated from a model output (Model should be specified) |
Directly observed | Length directly measured from individuals |
Histologically | Length observed from tissue sections |
Macro-dissection | Length observed during dissection (e.g. near term pups) |
Smallest free-swimming individual | Observation of smallest free-swimming individual |
Largest Embryo | Observation of largest embryo in utero |
Free + Embryo | Based on combined observations of largest embryo and smallest free-swimming individual |
Embryo growth curve | Estimated from a model output (Model should be specified) |
L95 | 95th percentile of observed size distribution |
L99 | 99th percentile of observed size distribution |
Max observed size | Observation of largest individual |
Inverse mortality | Estimated from a model parameter (Model should be specified) |
Models | Description |
---|---|
Back-calculated | Estimated from a model output (Model should be specified) |
Directly from growth curves | Estimated from growth curve (Model should be specified) |
Observational | Directly Observed |
Logistic | Estimated from logistic model or ogive |
Trait | Description |
---|---|
Amat50 | Age at 50% maturity |
Amat95 | Age at 95% maturity |
Age at first maturity | Age at first maturity based on gonadal observation (production of gametes) or clasper calcification |
Age of largest immature | Age of largest immature individual based on gonadal observation (production of gametes) or clasper calcification |
Age at maternity | Age at first observation of pregnancy |
Amax-observed | Maximum observed age |
Amax-estimated | Maximum age estimated from model |
Standard | Description |
---|---|
Year | Age in years |
Methods | Description |
---|---|
Back-calculated | Estimated from a model output (Model should be specified) |
Directly observed | Observed from known age individual (typically in captive environment or recapture) |
Histologically | |
Macro-dissection | |
Smallest free-swimming individual | |
Largest Embryo | |
Free + Embryo | Based on combined observations of largest embryo and smallest free-swimming individual |
Embryo growth curve | |
L95 | 95th percentile of observed size distribution |
L99 | 99th percentile of observed size distribution |
Max observed size | Age estimated from maximum observed size and growth curves |
Inverse mortality | Estimated by the inverse of natural mortality |
Mark-recapture | From mark-recapture data |
Models | Description |
---|---|
Back-calculated | |
Directly observed | |
Observational | |
Logistic | |
Given that this trait class deals with model outputs, there is only a single placeholder Standard as the units of each Trait are implicit.
Trait | Description |
---|---|
k | von Bertalanffy growth parameter |
Linf | von Bertalanffy asymptotic size parameter |
L0 | von Bertalanffy length-at-age zero parameter |
t0 | von Bertalanffy age-at-length zero parameter |
t1 | Lester model |
Tmat | Lester model |
H | Lester model |
G | Lester model |
sigma-growth | error estimate of the growth model |
cv_readers | Coefficient of variation between readers |
APE | Average Percent Error used to estimate variability in age determination between readers |
PA | Percent agreement between readers |
Standards | Description |
---|---|
Coefficient | Coefficient value from model output (Model must be specified) |
Method | Description |
---|---|
Mark-recapture | Age estimated from recapture of individuals of known age and time at liberty |
Length-frequency | Growth determined by length-frequency distribution of population |
Vertebrae | Growth determined using age calculated from vertebral growth bands |
Spines | Growth determined using age calculated from spine growth bands |
Tooth Plates | Growth determined from age calculated from tooth plates |
Direct Age | Estimates from individuals of known ages |
Captive | Data from captive individuals |
Model | Description |
---|---|
VBFG2 Strong Bayes | Bayesian implementation of 2 parameter von Bertalanffy growth curve with strong priors |
VBFG2 Uninformed Bayes | Bayesian implementation of 2 parameter von Bertalanffy growth curve with weak/uninformative priors |
VBFG2 ML | Maximum Likelihood implementation of 2 parameter von Bertalanffy growth curve |
VBFG3 Strong Bayes | Bayesian implementation of 3 parameter von Bertalanffy growth curve with strong priors |
VBFG3 Uninformed Bayes | Bayesian implementation of 3 parameter von Bertalanffy growth curve with weak/uninformative priors |
VBFG3 ML | Maximum Likelihood implementation of 3 parameter von Bertalanffy growth curve |
Gomp Strong Bayes | Bayesian implementation of Gompertz growth curve with strong priors |
Gomp Uninformed Bayes | Bayesian implementation of Gompertz growth curve with weak/uninformative priors |
Gomp ML | Maximum Likelihood implementation of Gompertz growth curve |
Fabens | |
Francis | |
Other | |
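For orientation, the von Bertalanffy parameters listed above (`k`, `Linf`, `L0`, `t0`) relate to one another as follows. This is a sketch of the standard textbook form of the curve, L(t) = Linf * (1 - exp(-k * (t - t0))), and is not necessarily the exact parameterization behind any particular model name in the table; the function names and the numbers used are illustrative placeholders.

```python
import math


def vbgf_length(age, linf, k, t0):
    """Length at a given age under the standard von Bertalanffy curve."""
    return linf * (1.0 - math.exp(-k * (age - t0)))


def l0_from_t0(linf, k, t0):
    """Length at age zero implied by Linf, k and t0 (set age = 0 above)."""
    return linf * (1.0 - math.exp(k * t0))


# Placeholder parameter values, not estimates for any species.
linf, k, t0 = 300.0, 0.1, -2.0
print(vbgf_length(0.0, linf, k, t0), l0_from_t0(linf, k, t0))
```

The two printed numbers agree, which shows why a study may report either `L0` or `t0`: given `Linf` and `k`, each determines the other.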
Trait | Description |
---|---|
Ovarian fecundity | Maximum number of visible ovarian follicles (generally used in egg-laying species (e.g. skates)) |
Uterine fecundity | Maximum number of visible ovulated eggs or developing embryos in both uteri |
Annual reproductive output | The total number of offspring or biomass a mother produces per year |
Litter size (unspecified) | Number of ova or developing embryos |
Breeding | Observed timing of mating |
Ovulation | Observed timing of ovulation |
Parturition | Observed timing of parturition |
Incubation length | The length of incubation in months for egg-laying species |
Gestation length | The length of gestation in months for live-bearing species |
Breeding interval | Biannual, Annual, Biennial, Triennial |
Ovum size | Maximum observed ova diameter (uterine or recently ovulated) |
Offspring mass | The body mass of offspring at the time of birth or hatching |
Max oviducal width | Maximum width of the oviducal gland during the reproductive cycle |
Max uterine width | Maximum uterus width during the reproductive cycle |
Single uterus | Do females have only a single functioning uterus? Typically both are functional (Logical: Yes/No) |
Seasonal | Does reproduction occur seasonally (logical:Yes/No) |
Peak parturition | The month of peak parturition (hatching or live-birth) |
Embryonic sex ratio | In utero ratio of Males:Females within a litter |
Standard | Description |
---|---|
Number | Number |
mm | millimetres |
g | grams |
Month | Number of months for length traits (e.g. Incubation or Gestation Length) or Calendar month for phenological traits (e.g. breeding, ovulation, parturition) |
Year | Years for breeding interval (0.5 for biannual, 1 for annual, 2 for biennial, 3 for triennial) |
Logical | Yes/No traits |
Method | Description |
---|---|
Captive Observation | Observations from captive individuals |
Modeled | Estimated from model output (Model must be specified) |
Proportion Gravid Females | Breeding interval determined by the proportion of pregnant females in a population |
Macroscopic observation | Observation through macroscopic dissection or observation of free swimming individuals |
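The `Year` standard for breeding interval encodes the categories as numbers (0.5 for biannual, 1 for annual, 2 for biennial, 3 for triennial). A minimal sketch of that conversion, with a hypothetical helper name, is:

```python
# Mapping taken from the Year standard described in the table above.
BREEDING_INTERVAL_YEARS = {
    "biannual": 0.5,
    "annual": 1.0,
    "biennial": 2.0,
    "triennial": 3.0,
}


def breeding_interval_years(label: str) -> float:
    """Convert a breeding-interval category to its value in years."""
    key = label.strip().lower()
    if key not in BREEDING_INTERVAL_YEARS:
        raise ValueError(f"Unknown breeding interval: {label!r}")
    return BREEDING_INTERVAL_YEARS[key]


print(breeding_interval_years("Biennial"))
```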
Trait | Description |
---|---|
Natural mortality | M |
Total mortality | Z |
Fishing mortality | F |
rmax | Maximum intrinsic rate of population increase |
λ | Population growth rate |
Standards | Description |
---|---|
Coefficient | Coefficient value from model output (Model must be specified) |
Method | Description |
---|---|
Empirical | Estimated directly from abundance/catch data (e.g. stock assessments, catch curves) |
Derived | Estimated indirectly from life history traits (e.g. Hoenig 1983, Chen and Watanabe 1983 mortality estimators) |
The Relationships Trait Class is a somewhat “catch-all” category that encompasses any type of conversion equation between different types of data, such as length-to-length conversions (e.g., fork length to total length) and length-to-weight conversions. The parameters a and b are the most commonly used and represent the coefficients in the typical conversion equation of the form y = a * x^b. The parameters c and d will probably be seldom used, but are presented in case a conversion equation has more than two coefficients.
Trait | Description |
---|---|
a | Multiplier of the equation (intercept if in log scale) |
b | Exponent of the equation (coefficient if in log scale) |
c | |
d | |
sigma-conv | Error term of the equation |
Standards | Description |
---|---|
Coefficient | Coefficient value from model output (Model must be specified) |
Model | Description |
---|---|
Length-Length | Length to length conversion |
Length-Weight | Length to weight conversion |
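The roles of `a` and `b` described above (multiplier and exponent, or intercept and coefficient on the log scale) can be made concrete with a short sketch. The function names and the parameter values are illustrative placeholders, and the natural logarithm is assumed for the log-scale form.

```python
import math


def convert(x, a, b):
    """Power-law conversion y = a * x**b (e.g., length to weight)."""
    return a * x ** b


def convert_on_log_scale(x, a, b):
    """Same conversion written as log(y) = log(a) + b * log(x)."""
    return math.exp(math.log(a) + b * math.log(x))


# Placeholder coefficients, not taken from any study.
a, b = 0.5, 2.0
print(convert(100.0, a, b), convert_on_log_scale(100.0, a, b))
```

Both functions return the same value up to floating-point error. If a paper reports the intercept on the log scale, it needs to be back-transformed (here with `exp`) before it is used as the multiplier `a`, and the base of the logarithm used in the paper should be checked first.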