Next-gen data dynamic duo
IBM and CLC bio pair up to deliver combined turnkey genomics sequencing analytics solution
The Boston venue was appropriate, not simply because CLC bio's Americas headquarters is based in neighboring Cambridge, Mass., but because, in a sense, the Bio-IT World conference is why the two companies got together several years ago.
"IBM had not participated in Bio-IT World for a number of years. During our first year of return to Bio-IT, three years ago, we surveyed the booths in an effort to learn more about customers' interests," recalls Janis Landry-Lane, director of World-wide Technical Computing at IBM. "We followed the crowds and came upon the CLC bio booth. We were intrigued by their demonstrations and learned more from discussions with them and their website. We decided that CLC bio was a company that we really needed to be engaged with. We were starting fresh at Bio-IT World, we wanted to do something interesting and valuable, and so we started our partnership discussions."
"IBM had identified that life sciences was an area they wanted to invest in, and a lot of big data was already coming out of next-gen sequencing technologies then," adds Lasse Görlitz, vice president of communications for CLC bio. "Back then, being at Bio-IT World was more an exploratory thing for them, and one of the things they did other than exhibiting themselves and having discussions with visitors was to go around and visit the other exhibitors. They were intrigued by the number of people we were attracting to our booth, and wanted to find out why."
At Bio-IT World 2012, the two companies presented the optimized performance of CLC Assembly Cell for genomics sequence analysis, which leveraged IBM's high-performance file system and cluster, and they were able to do reference-based mapping at 37x coverage in 13.5 minutes.
"As we worked together, the synergy was evident; CLC bio has some of the best-of-breed genomics software, IBM has deep systems optimization skills. With our combined efforts, we produced remarkable results," Landry-Lane says. "In 2013, our joint effort led to a turnkey solution for bioinformatics. We built the CLC Genomics Sequencing Analytics Solution with our optimized IBM hardware sized appropriately for small, medium and large workloads and delivered to customers with CLC bio's latest version of Genomics Workbench and Server."
The combined platform announced this year at Bio-IT World 2013 is a scalable end-to-end solution that integrates a computing cluster built on advanced IBM hardware, CLC Genomics Server software for large-scale genomics sequencing data analysis and CLC Genomics Workbench client software for analyzing, comparing and visualizing high-throughput sequencing data, the two companies noted in the news release about the collaborative effort.
"One of the really nice things about this collaboration is that they've had many years of experience with elaborate IT setups at big and complex institutions," Görlitz says of IBM. "Meanwhile, we're really good at making bioinformatics software, but we don't have that big data experience. Both partners bring something to the table that the other doesn't have and that the customers want."
Market forces play a huge role in driving the need for building a solution like the combined IBM-CLC bio offering, Landry-Lane explains. First of all, many new institutions are now engaged in next-generation sequencing because of its promise to deliver better healthcare. The costs of sequencing have come down dramatically and the time to sequence has been reduced to a day or less, she notes, and the flood of data generated by the sequencer must be processed in a timely fashion so that the maximum utilization of sequencers can be realized. This puts a demand on the IT solution to support this environment.
A second strain on the system is storing all of the data generated and analyzed, she adds.
"Scientists want to keep files for future reference, and these are very costly to keep online," Landry-Lane notes. "With our integration of tape storage into the file system and the use of our information lifecycle management that is policy-driven with hierarchical storage and data access, we can seamlessly tier stored data on both disk and tape, and researchers can store and retrieve files from the system regardless of the storage medium."
But the work the companies have pursued, past and present, is more than a display of technical chops; it is now a very important business arrangement, Landry-Lane says.
"Aside from the technical aspect, the legal agreement between our organizations was very important," she explains. "We have formalized world-wide agreements regarding joint marketing and initiatives. The CLC bio teams have been great collaborators. They are responsive and enthusiastic about what we're doing. It is all about synergy; there is no overlap or redundancy in our work together. They've been a wonderful independent software vendor to work with; we couldn't do this alone. None of this would have happened without their technology to drive this."
"By combining our world-leading bioinformatics software with IBM's excellent hardware and many years of expertise in setting up and supporting elaborate IT systems, we're delivering a powerful turnkey analysis platform, which will enable institutions and scientists to handle the demands of high-throughput sequencing data analysis," said Mikael Flensborg, director of global partner relations at CLC bio, in an official statement.
According to the two companies, the cluster compute nodes are IBM System x 3550 M4 rack servers powered by Intel Xeon E5-2650 processors. The nodes are connected to an IBM Storwize V7000 Unified network-attached storage system, which consolidates block and file workloads. Storwize V7000 Unified systems support file data storage using the IBM General Parallel File System (GPFS). With GPFS, CLC bio software is leveraging a shared-disk file management solution designed to provide fast, reliable access to next-gen sequencing data for optimizing performance. The turnkey analysis platform comes in three different configurations, ranging from 48 CPU cores and 192 GB of memory to 192 CPU cores and 768 GB of memory, depending on the analysis requirements of the individual customer.
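For readers who want to compare the two quoted end points of the range, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not anything published by IBM or CLC bio. The class and function names are hypothetical, and the node count rests on an assumption the companies did not state: that each compute node carries two 8-core Xeon E5-2650 processors, or 16 cores per node. The middle configuration is not described in the announcement, so it is left out.

```python
from dataclasses import dataclass

# Assumption (not stated in the announcement): each compute node holds two
# 8-core Xeon E5-2650 processors, giving 16 cores per node.
ASSUMED_CORES_PER_NODE = 16


@dataclass(frozen=True)
class ClusterConfig:
    """One quoted configuration of the turnkey platform (hypothetical type)."""
    name: str
    cpu_cores: int
    memory_gb: int

    def memory_per_core_gb(self) -> float:
        """Memory available per CPU core, in GB."""
        return self.memory_gb / self.cpu_cores

    def implied_nodes(self, cores_per_node: int = ASSUMED_CORES_PER_NODE) -> int:
        """Number of nodes needed to supply the cores (ceiling division)."""
        return -(-self.cpu_cores // cores_per_node)


# Only the two end points of the range are given in the article.
SMALLEST = ClusterConfig("smallest", cpu_cores=48, memory_gb=192)
LARGEST = ClusterConfig("largest", cpu_cores=192, memory_gb=768)

if __name__ == "__main__":
    for cfg in (SMALLEST, LARGEST):
        print(
            f"{cfg.name}: {cfg.cpu_cores} cores, {cfg.memory_gb} GB, "
            f"{cfg.memory_per_core_gb():.1f} GB per core, "
            f"{cfg.implied_nodes()} nodes (assuming "
            f"{ASSUMED_CORES_PER_NODE} cores per node)"
        )
```

Both end points work out to 4 GB of memory per core, which suggests the configurations scale cores and memory together. Under the 16-cores-per-node assumption, the range would run from 3 to 12 compute nodes.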