A team of bioinformaticians at the Friedrich Schiller University in Jena, Germany, led by Professor Sebastian Böcker, together with their collaborators from Aalto University in Espoo, Finland, have developed a search engine that significantly simplifies the identification of the molecular structures of metabolites. They describe their search engine, “CSI:FingerID”, in a paper in Proceedings of the National Academy of Sciences of the United States of America (PNAS).
Here, CSI stands for Compound Structure Identification, and the approach combines a variety of methods. First, the metabolite samples to be analysed undergo a tandem mass spectrometry run. “During this step, molecules are dismantled into smaller fragments and their molecular weights are identified”, Böcker explains. The resulting spectra give information about the chemical composition of the metabolites, but this information alone is not sufficient to draw conclusions about their molecular structure. This is where the newly developed search engine comes into play. It works in a similar way to an internet search engine, but instead of searching for keywords, the tool looks for molecular information that allows the given mass spectrum to be translated into a structural formula. After the mass spectrum has been submitted, “CSI:FingerID” trawls a number of online molecular structure databases, in which scientists throughout the world publish information and structural formulae of both newly discovered and long-known metabolites. A single “CSI:FingerID” search returns a list of candidate structures that best correspond to the spectrum.
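The search step described above can be illustrated with a toy sketch. It is not the scoring used by “CSI:FingerID”, whose internals are not described here. It assumes that a spectrum has already been turned into a set of structural features (a "fingerprint") and that each database candidate has a fingerprint of the same kind. It then ranks the candidates by Tanimoto similarity, a common measure for comparing such feature sets. The function names, candidate names and feature numbers are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is modelled as the set of feature indices
# that are present in a molecule. This is illustrative only.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared features divided by all features seen in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_candidates(
    predicted: Fingerprint,
    candidates: Dict[str, Fingerprint],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the top_k candidates, best match first (ties broken by name)."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical data: a fingerprint inferred from a spectrum and three
# made-up database entries.
predicted = frozenset({1, 4, 7, 9})
candidates = {
    "candidate_A": frozenset({1, 4, 7, 9}),
    "candidate_B": frozenset({1, 4, 8}),
    "candidate_C": frozenset({2, 3}),
}

ranking = rank_candidates(predicted, candidates, top_k=2)
print(ranking)
```

In this made-up example the candidate that shares every feature with the query ranks first, and the candidate with no features in common drops out of the shortlist.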
“After obtaining the list of possible candidates we still don’t know with absolute certainty which metabolite we are dealing with. But when we can reduce the number of possible compounds from several thousand down to perhaps ten, then this is huge progress”, says Böcker. “Precise lab tests to identify compounds can be expensive and time-consuming, so distinguishing among thousands of possibilities is usually impossible, but testing just ten compounds is often feasible.” And because the relevant databases grow constantly, with an average of ten entries being added per minute worldwide, the search results become steadily more precise.
In the new study, the bioinformaticians show that their method achieves a significantly higher hit ratio than any other method used so far. To demonstrate this, they validated their search engine with more than 6000 test substances. As well as using “CSI:FingerID” themselves to analyse naturally occurring metabolites, Professor Böcker and his team have made the search engine freely available to the international scientific community at http://www.csi-fingerid.org.