‘Breathing life’ into analytical data
A Q&A with Ryan Sasaki, director of global strategy at ACD/Labs, and Steve Thomas, an investigator at GlaxoSmithKline
Advanced Chemistry Development Inc. (ACD/Labs), a chemistry software company that develops and commercializes enterprise and desktop solutions to support R&D efforts, recently sponsored a webinar titled “Breathing Life into Your Analytical Data.” Co-hosting the event were Ryan Sasaki, director of global strategy at ACD/Labs, who discussed how to avoid the “one-and-done” data lifecycle and achieve better returns from analytical data generation, and Steve Thomas, an investigator at GlaxoSmithKline (GSK), who shared how their DMPK group is successfully using analytical data to improve global healthcare collaboration. To explore the issues discussed in the webinar, DDNews posed several questions to the pair.
DDNews: Why is the phenomenon of a “one-and-done” data lifecycle, which leads to massive amounts of dead data, still operative in the midst of a thriving informatics segment?
Ryan Sasaki: The major reason is that organizations and their research groups are dealing with many different types of disparate data generated by different instrument manufacturers. Because all of this data exists in different formats, it is difficult for most established informatics systems, such as ELNs, LIMS and SDMS, to handle, simply because those systems were not designed to store analytical data, for example, in a live, homogenized and structured fashion. As a result, these data get locked away in silos and become difficult or nearly impossible to find and access. In some laboratory environments, sample permitting, it is now easier to simply reacquire new data than to spend several minutes searching unsuccessfully for legacy data.
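As a rough illustration of what “homogenized and structured” storage can mean in practice, the sketch below (a hypothetical schema and query, not ACD/Labs code; all names are invented for illustration) normalizes spectra from different vendor formats into one vendor-neutral record type, so data stays searchable across techniques instead of sitting in per-instrument silos:

```python
from dataclasses import dataclass, field

# Hypothetical vendor-neutral record: every spectrum, whatever its
# original instrument format, is stored with the same fields plus
# searchable metadata (structure, project, analyst, and so on).
@dataclass
class SpectrumRecord:
    sample_id: str
    technique: str              # e.g. "NMR", "LC/MS", "IR"
    vendor_format: str          # original instrument file format
    data_points: list           # homogenized (x, y) pairs
    metadata: dict = field(default_factory=dict)

def find_by_sample(records, sample_id):
    """A live query instead of silo-digging: return all spectra
    collected for a given sample, across techniques and vendors."""
    return [r for r in records if r.sample_id == sample_id]

records = [
    SpectrumRecord("GSK-001", "NMR", "vendor-A", [(1.0, 0.5)], {"project": "DMPK"}),
    SpectrumRecord("GSK-001", "LC/MS", "vendor-B", [(120.1, 9.9)]),
    SpectrumRecord("GSK-002", "IR", "vendor-C", [(1700.0, 0.8)]),
]
hits = find_by_sample(records, "GSK-001")
print([r.technique for r in hits])  # → ['NMR', 'LC/MS']
```

The point of the sketch is only that a common record shape, not the instrument file format, is what makes legacy data findable later.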
Steve Thomas: Although we have embraced change at GSK, it is understandable that some may adopt an “If it ain’t broke, don’t fix it” paradigm, in which fear of the dip in productivity while going through the change curve obscures a clear view of the eventual benefits.
DDNews: What are the keys to creating live data based on so-called unified laboratory intelligence?
Thomas: Holding clusters of related chemical structures together with vendor-neutral analytical data from differing techniques, plus the associated metadata, mimics the way an analyst does their job. That, in turn, makes it painless to embed such a system into an existing workflow.
Sasaki: I agree with Steve. The introduction of “chemical context” to the data is crucial. Data by itself is just information. Add the chemical context to it, like a biotransformation map, and that information now becomes knowledge. Make that knowledge accessible to many scientists throughout GSK, and it now becomes corporate intelligence.
DDNews: How does ACD/Labs’ approach differ from others in the informatics field?
Thomas: At the time of selection, due diligence revealed ACD/ChemFolder Enterprise (now re-branded as ACD/Spectrus) to be the only product capable of storing, searching and sharing analytical and metadata linked to structures in a biotransformation map.
Sasaki: In the DMPK context, Steve nailed the differentiation. But this “live” integration of chemical and analytical data with metadata has also helped us offer a similarly unique product in the informatics field for other applications, such as impurity resolution management and preformulation and formulation studies.
DDNews: In differentiating “insight” from “information,” how would you describe that distinction?
Thomas: I see “insight” as synonymous with knowledge: the value you place on your information and past outcomes to successfully drive future actions. Learn from your successes (and failures) to make better and faster decisions.
Sasaki: Let me add an example to Steve's explanation. Today, it is very common for scientists to retrieve some old raw data from an archiving system based on, for example, a sample ID. However, at the end of the day, all it really represents is proof that this data was collected in association with that sample. That's information. I think the key driver in Steve's lab at GSK was the ability to search for that same information but with insights built in. So when someone from Steve's group searches the ACD/Labs database, they see not only the data associated with that sample, but also the chemical context and insights associated with that data. For example, they know this data was associated with a given metabolite from a different project. More importantly, the fate of that metabolite can be tracked via the biotransformation map along with its associated data.
DDNews: What are examples of “moderate improvements” that lead to “substantial returns” mentioned during the webinar?
Thomas: A pooled database of knowledge circumvents the luck required to bump into the exact person who could help with an analysis. Numerous times, easy access to colleagues’ findings has added confidence to my own conclusions, or provided additional considerations when analyses have proved tricky.
Sasaki: I believe that databasing in a laboratory environment is about 60 percent technical and 40 percent psychological. What I mean is that the informatics solution itself is obviously important. The GUI has to be straightforward and easy to use, and for someone to buy into the concept of databasing, the workflow built into the software needs to require as few clicks as possible. I feel we have done a good job of delivering a seamless workflow that enables scientists to easily database their analytical data complete with chemical context. However, my experience also suggests there is a psychological element that holds organizations back from building databases and leveraging them. I think Steve will agree that it really requires a team effort and support from management. I have seen a lot of one-man efforts fail for lack of support and buy-in.
DDNews: What percentage of R&D organizations currently lack adequate systems to collect, report and analyze data?
Sasaki: [A 2011 survey] revealed that despite the emergence of laboratory informatics technologies like ELN, LIMS and SDMS, 88 percent of R&D organizations still lack adequate systems or practices to automatically collect data for reporting, analysis and decision-making. This is a disappointing result to see, but I think one should also consider the rather generic use of the term “data.” I don't think there is one Holy Grail “big data” for all types of data. Some of the best strategies are likely to be those that address needs in the context of sub-disciplines. I don't think the same data management strategies can be applied for clinical data vs. analytical data, for example.
DDNews: How much time do R&D organizations currently spend on data interpretation?
Sasaki: A 2012 report from the International Data Corporation suggested that the average employee spends an alarming 21 hours per week searching for, analyzing and reporting information. They equated the cost of this to $33,000 per year, per employee.
DDNews: Describe the difference between data, metadata and knowledge.
Thomas: While embedding a functional database approach for globally storing, sharing and searching metabolic data, it became clear that some people use the terms data, information and knowledge indiscriminately and interchangeably. To me they form an ascending scale of value and context. The metaphor I have used is the chance spotting of an old school classmate while shopping. That facial recognition represents data. The value is increased by information, or metadata, that begins to fill in the picture: you remember his dog’s name and how his daughters are doing in school. Knowledge is how you recall that he is dreadfully dull! And so you quickly duck into a store to avoid him. You have used your rich knowledge to guide your future actions to a preferable outcome.
DDNews: What are the important criteria for defining success?
Thomas: How easy is it to get the data into the database? Problems here will lead to low compliance and incomplete data entry. Can it help us move away from relying on the power of memory? Hopefully a given for an effective database. Can we speed up the process of interpretation? Reusing historical analyses avoids reinventing the wheel. Can we have greater confidence in the elucidations and structural IDs? Precedent in legacy analyses adds confidence. Can it help us avoid mistaken elucidations? Spectral assignment benefits from expert tools and avoids transcription errors. Will the proposed solution offer advantages in communicating data? Sharing live data for peer review is far more flexible than communication via paper or PDF.
ACD/Labs is a Canadian chemistry software company based in Toronto, Ontario, that develops and commercializes enterprise and desktop solutions to support R&D efforts and preserve and reuse legacy chemical and analytical knowledge. ACD/Labs’ areas of expertise include a unique knowledge management solution; spectroscopic data processing and interpretation for NMR, MS, LC/MS, IR, UV and other instrument techniques; chemical structure confirmation, verification and elucidation; impurity, metabolism and degradation research; ADMET, physicochemical property prediction, and property-based lead optimization; chromatographic method development and optimization; and chemical nomenclature.
GSK is a science-led global healthcare company that researches and develops a broad range of medicines and brands, with its three primary areas of business being pharmaceuticals, vaccines and consumer healthcare. The company is headquartered in the United Kingdom but has offices in more than 115 countries, major research centers in the U.K., United States, Spain, Belgium and China, and an extensive manufacturing network with 87 sites globally.