A comprehensive public database of secondary metabolites from cyanobacteria
One major challenge in studying cyanobacterial secondary metabolites is access to a comprehensive, publicly available list of known metabolites that includes information on their chemical structures. Our motivation was to create such a database to facilitate dereplication studies and chemical profiling. The result is CyanoMetDB, a highly curated, flat-file, openly accessible database of more than 2000 cyanobacterial secondary metabolites. Compared with the open-access databases available up to 2019, our efforts have nearly doubled the number of entries with complete literature metadata and structural composition information. Information from commercial databases of secondary metabolites is accessible only to paying customers, and the open-access databases that existed were limited in the number of cyanobacterial metabolites or parameters they listed.
The work on CyanoMetDB was initiated at the 11th International Conference on Toxic Cyanobacteria (ICTC, 2019, Krakow, Poland), out of the desire for one comprehensive list of cyanobacterial metabolites that promotes effective analysis and exchange of information. In 2019 and 2020, we manually collated and evaluated disparate resources, including 850 primary research articles published between 1967 and 2020. Publication trends in the field suggest that the discovery of cyanobacterial metabolites is still on the rise, with up to 100 new compounds identified every year. For each compound, we include the primary literature metadata, the sample type, and whether nuclear magnetic resonance (NMR) spectroscopy was used. The metabolites span a wide range of molecular weights, from 118 to 2708 Da. We generated structural identifiers to represent the 2D molecules and recommend using simplified molecular input line entry system (SMILES) strings. The structural codes in particular were often missing, incomplete, or not standardized in previous sources and needed manual curation. We strongly recommend that authors of future publications always publish a SMILES string together with each new structure, so that this information can be imported efficiently and with less chance of error.
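To illustrate how a flat-file list of metabolites can support dereplication, the following minimal Python sketch matches an observed [M+H]+ ion against monoisotopic masses within a ppm tolerance. The column names (`compound_name`, `molecular_formula`, `monoisotopic_mass`), the function names and the 5 ppm default are assumptions made for this example and do not reflect the actual CyanoMetDB layout. The three rows are well-known cyanotoxins with masses computed from their molecular formulas, not records copied from the database.

```python
import csv
import io

# Illustrative flat-file excerpt. Column names are hypothetical, and the rows
# are well-known cyanotoxins with masses computed from their formulas; they
# are not records copied from CyanoMetDB.
FLAT_FILE = """\
compound_name,molecular_formula,monoisotopic_mass
Anatoxin-a,C10H15NO,165.1154
Cylindrospermopsin,C15H21N5O7S,415.1162
Microcystin-LR,C49H74N10O12,994.5488
"""

PROTON_MASS = 1.007276  # Da, converts an [M+H]+ m/z value to a neutral mass


def load_entries(handle):
    """Read the flat file into a list of dicts with a numeric mass field."""
    entries = []
    for row in csv.DictReader(handle):
        row["monoisotopic_mass"] = float(row["monoisotopic_mass"])
        entries.append(row)
    return entries


def match_mass(entries, observed_mz, tolerance_ppm=5.0):
    """Return names of entries whose neutral mass matches an [M+H]+ ion."""
    neutral = observed_mz - PROTON_MASS
    hits = []
    for entry in entries:
        reference = entry["monoisotopic_mass"]
        error_ppm = abs(reference - neutral) / reference * 1e6
        if error_ppm <= tolerance_ppm:
            hits.append(entry["compound_name"])
    return hits


entries = load_entries(io.StringIO(FLAT_FILE))
candidates = match_mass(entries, observed_mz=995.5561)
print(candidates)
```

A mass match alone only narrows the candidate list, because isomers share the same mass; this is one reason why structural identifiers such as SMILES strings matter for each entry.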