Raiders of the lost chemistry

Database to plumb depths of academic research.

Randall C Willis
MALDEN, The Netherlands—Researchers at the Selected Organic Reactions Database (SORD) and Advanced Chemistry Development (ACD/Labs) announced a collaboration to harvest the contents of academic theses and dissertations worldwide. In the process, they hope to open a whole history of chemical reaction data to the global scientific community.

"Worldwide, there are tens of thousands of theses detailing millions of syntheses," explains Dr. Antony Williams, ACD/Labs CSO. "This work represents the collective efforts of thousands of man-years of innovation, intellect and advances in science."

"We did a full analysis of one of the biggest compound databases with 9 million entries and arrived at less than 500,000 compounds that you could begin to describe as 'medicinally interesting'," says SORD CSO Dr. Dick Wife. "Then you make a simple calculation about how many compounds have been prepared in universities over the last 40 years and you come up with a figure around 50 million. So, 40 million compounds for one reason or another never got published."
 
To achieve their goals, SORD scientists are relying on infrastructure support from ACD/Labs, which has an extensive portfolio of tools for handling and sharing chemical data. The company is also likely to add value to the data that SORD scientists catalogue.

"Since ACD/Labs develops algorithms for structure-based prediction of physicochemical properties, for nomenclature generation and for spectral prediction, it is likely that ACD/Labs will extend the data content of the SORD database to provide access to a selection of these properties," Williams adds.
 
SORD is being developed at a time when numerous groups are expanding the repertoire of chemical data repositories. For example, in addition to commercial repositories such as Chemical Abstracts Service and Elsevier MDL's CrossFire Beilstein, the NIH recently established PubChem to provide information about the biological activities of small-molecule compounds.
 
According to Wife, SORD is different from, yet complementary to, these efforts. Rather than being merely digital, the information in SORD is electronic and can therefore be searched with modern data-mining techniques.
 
Initially, the repository is being filled with historical data of particular interest to pharmaceutical companies, according to Wife, including reactions that offer good yields, no metal contamination, easy isolation or separation, and few or no waste products. He expects the project to surpass one million records within the first five years.
 
The longer-term goal, however, is a system that becomes relatively self-sustaining, Williams suggests.

"Our hope is that the value the database delivers to the scientific community will catalyze an interest in contributing data to the system on an ongoing basis and not necessarily await the publication of a specific thesis," he says.

"The payback for ACD/Labs is clear," he adds. "Continued exposure to the academic community, ongoing feedback regarding expectations, and an opportunity for pride as we continue to deliver an impact to scientists worldwide."
