More bang for their buck

Cornell researchers partner with Ocarina Networks for cost-efficient data compression and storage

Amy Swinderman
ITHACA, N.Y.—Seeking to maximize the storage capacity of its life science research data, the Cornell Center for Advanced Computing (CAC) has partnered with Ocarina Networks, an online storage optimization solution provider, to compress and store its next-generation data sets.

Under the partnership agreement, Cornell will perform extensive data compression testing across a wide range of research applications. The data is now stored on the Silicon Storage Architecture (S2A) 9700, a high-performance storage platform from data infrastructure provider DataDirect Networks (DDN). The testing will use Ocarina's ECOSystem, which reads stored files and uses content-aware compression and de-duplication to reduce the amount of space those files take. With next-generation sequencing techniques producing vast quantities of data that must be quickly processed and stored online for a certain period of time, the university is in need of a cost-efficient, scalable data solution, and Ocarina offers a space savings of 50 percent or more, says Dr. David Lifka, Cornell CAC director.
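
To make the idea of de-duplication plus compression concrete, here is a minimal Python sketch. It is not Ocarina's ECOSystem and does not reproduce its content-aware algorithms: it assumes whole-file de-duplication by SHA-256 digest and generic zlib compression, and the function names (`store_files`, `read_file`, `space_savings`) are hypothetical, chosen only for illustration.

```python
import hashlib
import zlib


def store_files(files):
    """Store each unique file body once, compressed.

    `files` maps a file name to its raw bytes. Returns (index, blobs):
    index maps file name -> content digest, blobs maps digest -> compressed bytes.
    """
    index = {}
    blobs = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        index[name] = digest
        # De-duplication: identical content is stored only once.
        if digest not in blobs:
            # Generic compression stands in for a content-aware compressor.
            blobs[digest] = zlib.compress(data, 9)
    return index, blobs


def read_file(name, index, blobs):
    """Recover the original bytes of a stored file."""
    return zlib.decompress(blobs[index[name]])


def space_savings(files, blobs):
    """Fraction of raw space saved, e.g. 0.5 means 50 percent."""
    raw = sum(len(data) for data in files.values())
    stored = sum(len(blob) for blob in blobs.values())
    return 1 - stored / raw
```

Because the stored data can be read back byte for byte, this kind of reduction is lossless; the savings actually achieved depend entirely on how repetitive the files are.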

"We were looking for ways to leverage our research dollars, as a lot of academic institutions are these days," Lifka says. "Despite advances in disk technology, storing research data remains an expensive proposition. New breakthroughs in content-aware compression and de-duplication are making it possible for data sets to be reduced soon after they come off scientific instruments and have been analyzed. Technologies with efficient algorithms are becoming an essential component in data-intensive computing system deployments. We are working with Ocarina and DDN to effectively maximize storage capacity without sacrificing performance. With Ocarina, we get more storage per dollar, more bang for our buck."

Ocarina's ECOSystem includes multiple data compressors for the types of files commonly found in research computing environments, with more than 100 algorithms that support 600 file types. Testing is occurring on DDN's S2A9700 technology, which can manage up to 1.2 petabytes in only two floor tiles and deliver sustained throughput of up to 6 gigabytes per second for both writes and reads, per appliance. The combination of DDN's storage systems with Ocarina's ECOSystem will bring a new level of efficiency to the data center, says DDN CTO Dave Fellinger.

"New data reduction technologies hold great promise for helping to get the most out of storage systems and reducing overall operating expenses," Fellinger says.

Carter George, vice president of products at Ocarina, points out that because Ocarina keeps all versions of data files, researchers can re-visit the compressed data to re-analyze it as needed or as new algorithms are developed.

"Researchers collect all of their data from mass spectrometers, gene sequencers or next-generation tools and analysis solutions, and no one wants to throw it away because you never know if your data could turn out to be the next breakthrough," George says. "The amount of data they are collecting is outpacing their ability to store it, and the more data you keep, the more it costs you for storage. With us, you can take data that is already processed, park it, and if two years from now, someone comes out with a Nobel Prize-winning algorithm, you can re-visit it. With us, you get to keep more of the data, and save money, power and cooling."

Ocarina has provided application-aware storage optimization solutions for government entities as well as the film and entertainment, energy and Internet media industries, so the company considers the Cornell project to be a bellwether for the life sciences industry, George says. Ocarina is in discussions with several other academic institutions to provide similar solutions, but the company would also like to reach commercial entities, he says.

"We're seeing the most immediate demand for our services from the genomics field," George says. "One of the things we are trying to do is partner with the people from whom these researchers get their equipment to understand why they create certain things, and work with the scientists to understand the meaning of the patterns they see in their files, eventually partnering with both to help create better algorithms."

