Guest commentary: Bioinformatics research enables better biological information
Until the early 1990s, biology researchers primarily studied one or two genes at a time, often for their entire careers; now, they might study hundreds of thousands of DNA segments in one experiment. In addition, researchers can probe complementary non-coding parts of the genome and profile relevant proteins and metabolites to seek new understanding and treatments of disease. Ongoing bioinformatics research is critical to make sense of an almost limitless supply of biomolecular information.
The National Institutes of Health (NIH) defines bioinformatics as "research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze or visualize such data." The NIH acknowledges the overlap with computational biology: "the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral and social systems."
Continuing electronic advances in our increasingly digital world open opportunities to analyze and leverage vast amounts of information across research boundaries. For example, although physical analog measurements of a cell and an integrated circuit are very different, processing these measurement signals is similar once they're converted to the digital domain. Following analog-to-digital conversion, the tools to generate insight are the same: fast digital signal processing, measurement systems software, data management, visualization and decision analysis.
With life sciences evolving so quickly, bioinformatics research is critical to building information infrastructures that can match that evolution, allow researchers to access data quickly and easily and develop new informatics tools as needed. It is no longer enough to generate the best measurement data; researchers also require the best meaning and insight from that data.
Bioinformatics research enables new insights from platforms and analysis
Bioinformatics tools are required to create and evolve measurement platforms and data analysis methods to reach new biological insight. Each new measurement can change the questions that scientists can answer. Researchers start with a hypothesis about what they want to measure, and they understand initial measurements will be imperfect. They consider what mathematical model will describe the behavior being measured, what the signal should look like, how they will implement the measurement in a fast, efficient way, and ultimately how they will communicate their results to the worldwide scientific community.
Oligonucleotide DNA microarrays, for example, have depended on bioinformatics for initial development and ongoing advances in measurement tools and analysis methods. Researchers started with the hypothesis that they could measure gene expression patterns in a tissue sample using complementary probes of nucleic acid polymers on a glass slide. By taking advantage of the DNA building blocks that act like two sides of a zipper, each probe could test for the presence of a specific DNA sequence.
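The zipper-like pairing behind a microarray probe can be sketched in a few lines of Python. This is a minimal illustration, not a probe-design method: it assumes exact matching over the four standard bases, ignores real hybridization chemistry such as melting temperature and partial mismatches, and uses made-up sequences. The function names are hypothetical.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_detects(probe: str, sample: str) -> bool:
    """True if the sample strand contains the exact site the probe pairs with.

    Simplifying assumption: a probe "binds" only on a perfect match.
    """
    return reverse_complement(probe) in sample.upper()


# Hypothetical sequences, chosen only for illustration.
target_site = "ATGGCCATTG"
probe = reverse_complement(target_site)   # the probe fixed to the slide
sample = "TTTT" + target_site + "CCCC"    # a strand from the tissue sample

print(probe)                         # complementary strand of the target site
print(probe_detects(probe, sample))  # does the sample contain the target?
```

A real array repeats this test for hundreds of thousands of probes at once, which is why probe design depends on knowing the underlying sequences in advance.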
Initial microarray development required prior bioinformatics work on sequencing before researchers could design the hundreds of thousands of probes for an array.
Bioinformatics research was essential to develop the mathematical models, software, visualizations and processes to analyze the data because initially, no tools were available.
Bioinformatics enabled extensions of microarray platforms to explore new layers of biology, such as array comparative genomic hybridization (aCGH), which identifies duplicated and missing pieces of chromosomes in cancer cells compared with normal cells. With each new platform or modification, researchers depend on bioinformatics for probe development, analysis methods and visualization software.
The process of platform development from measurement to analysis often becomes iterative, as researchers push the envelope to get a more complete "picture" of the sample being analyzed. Experiments that yield incomplete information often motivate efforts to improve platforms and algorithms to achieve better-defined and more focused results. For example, early aCGH research indicated that even outside of cancer biology, genes are present at varying levels. This led to studies of copy number variation as a key element of genetic structural variation across multiple diseases.
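A common way to express the comparison between a test sample and a reference sample is a log2 ratio of the two signal intensities for each probe. The sketch below assumes that convention; the 0.3 threshold and the intensity values are illustrative assumptions, and real analyses add normalization and segmentation steps that are omitted here.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """Log2 of test over reference intensity for one probe."""
    return math.log2(test_signal / reference_signal)


def call_copy_number(ratio: float, threshold: float = 0.3) -> str:
    """Label a probe as gain, loss or normal.

    The threshold is an assumed cutoff for illustration, not a standard.
    """
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "normal"


# Hypothetical probe intensities: (test sample, reference sample).
probes = {"probe_1": (300.0, 200.0),
          "probe_2": (100.0, 200.0),
          "probe_3": (205.0, 200.0)}

for name, (test, ref) in probes.items():
    ratio = log2_ratio(test, ref)
    print(name, round(ratio, 2), call_copy_number(ratio))
```

In a two-copy genome, one extra copy gives a ratio of log2(3/2), about 0.58, and one missing copy gives log2(1/2), exactly -1, which is why a symmetric cutoff around zero is a reasonable first pass.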
Bioinformatics research has improved platform results independent of instrument improvements. Mass spectrometers, for example, help identify proteins by matching the mass-to-charge ratios of their constituent peptides' fragments against a database. New spectral clustering techniques and better spectrum-to-peptide matching algorithms have improved peptide identification without any change to the mass spectrometer.
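To make the matching idea concrete, here is a deliberately simplified sketch: it computes singly charged b- and y-ion values for each candidate peptide from standard monoisotopic residue masses and picks the candidate that shares the most peaks with an observed spectrum. Shared-peak counting is a stand-in chosen for brevity, not the scoring used by any particular search engine, and the candidate "database" is hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056
PROTON = 1.00728


def fragment_mz(peptide: str) -> list[float]:
    """Singly charged b- and y-ion m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion (N-terminal)
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion (C-terminal)
    return sorted(ions)


def shared_peaks(observed: list[float], peptide: str, tol: float = 0.02) -> int:
    """Count theoretical fragments that have an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed)
               for t in fragment_mz(peptide))


def best_match(observed: list[float], database: list[str],
               tol: float = 0.02) -> str:
    """Return the candidate peptide with the most shared peaks."""
    return max(database, key=lambda pep: shared_peaks(observed, pep, tol))


# Hypothetical example: a clean spectrum generated from one known peptide.
observed = fragment_mz("PEPTIDE")
candidates = ["EDITPEP", "PEPTIDE", "GASPVTK"]
print(best_match(observed, candidates))
```

Swapping in a better scoring function changes the identification results without touching the measured spectrum, which is the sense in which algorithms can improve a platform independently of the instrument. Note that leucine and isoleucine have identical masses, so this kind of matching cannot tell them apart.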
Bioinformatics research enables integration of information from multiple sources
Bioinformatics enables scientists to transcend data-type boundaries and begin to view the cell as a system of complex interactions. Study of the whole provides insights into emergent systems-level behavior that isn't visible by looking at individual genes and proteins separately.
Learning how a cell responds to stimuli requires integration of data from multiple experiments and measurement platforms. In the cascade of biological events, proteins interact with receptors that also interact with proteins and genes. Integration of gene expression and protein networks, for example, can reveal pathways that potentially govern disease progression and might help identify people who would most likely respond to a particular therapy.
Bioinformatics is becoming more collaborative as researchers integrate information from multiple sources and put it into a contextual view for new knowledge and worldwide availability.
Cytoscape, a tool that encourages information sharing, is an open-source bioinformatics software platform that enables researchers to visualize molecular-interaction networks and integrate these interactions with experimental data and data from other sources, such as pathway databases and scientific literature. Putting data into a biological context helps increase understanding of molecular networks, interactions and pathways involved in biological processes. Cytoscape allows users to query biological networks to derive computational models and to view, manipulate and analyze their data to reach biological insight.
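The idea of overlaying experimental data on an interaction network can be illustrated without any particular tool. The sketch below does not use Cytoscape or its API; it is plain Python with a hypothetical network, hypothetical expression changes and an assumed cutoff. It groups strongly changed genes that are also connected to one another, which is one simple way a candidate pathway can surface from combined data.

```python
from collections import deque

# Hypothetical undirected interaction network as an adjacency mapping.
interactions = {
    "RECEPTOR_1": {"KINASE_A"},
    "KINASE_A": {"RECEPTOR_1", "TF_X"},
    "TF_X": {"KINASE_A", "GENE_Y"},
    "GENE_Y": {"TF_X"},
    "GENE_Z": set(),
}

# Hypothetical log2 fold changes from an expression experiment.
expression = {"RECEPTOR_1": 0.1, "KINASE_A": 1.8, "TF_X": 2.1,
              "GENE_Y": -1.6, "GENE_Z": 1.9}


def active_modules(network, log_fc, cutoff=1.0):
    """Group strongly changed genes that are linked in the network.

    The cutoff is an assumed value for illustration.
    """
    active = {g for g, fc in log_fc.items()
              if abs(fc) >= cutoff and g in network}
    seen, modules = set(), []
    for start in sorted(active):
        if start in seen:
            continue
        # Breadth-first search restricted to active genes.
        module, queue = set(), deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            module.add(gene)
            for neighbor in network[gene] & active:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(module)
    return modules


for module in active_modules(interactions, expression):
    print(sorted(module))
```

Neither data set alone gives this grouping: the expression values say which genes changed, and the network says which of them plausibly act together.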
The Cancer Genome Atlas project, which characterizes molecular alterations in cancer, is another example of integrating data from multiple sources. This collaborative effort, led by the National Cancer Institute and the National Human Genome Research Institute, has demonstrated the feasibility of using integrated genomic strategies.
Scientists are developing new data, sharing it with researchers worldwide, and developing innovative bioinformatics tools and technologies to study cancer with greater precision and efficiency. Findings already are influencing treatment; investigators have reported that genetic alterations in patients with glioblastoma (a form of brain cancer) are linked with resistance to a drug that is commonly used for treatment.
Bioinformatics research is also expanding to computer modeling to simulate and calculate, for example, gene expression over time. Researchers are creating models with the goal of feeding data through them to predict how a living cell will react. Further research will involve validating such models experimentally.
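As a toy illustration of what simulating gene expression over time can mean, the sketch below assumes the simplest textbook model: a transcript is produced at a constant rate and degraded in proportion to its current level, dm/dt = k_syn - k_deg * m. The model choice, the rate values and the function name are assumptions made for illustration; they are not taken from any specific study.

```python
def simulate_expression(k_syn: float, k_deg: float, m0: float = 0.0,
                        dt: float = 0.01, t_end: float = 10.0):
    """Integrate dm/dt = k_syn - k_deg * m with the forward Euler method.

    Returns a list of (time, level) pairs. A small dt keeps the simple
    Euler scheme accurate enough for illustration.
    """
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [(0.0, m)]
    for step in range(1, steps + 1):
        m += (k_syn - k_deg * m) * dt
        trajectory.append((step * dt, m))
    return trajectory


# Hypothetical rates: the level should approach k_syn / k_deg = 4.0.
trajectory = simulate_expression(k_syn=2.0, k_deg=0.5, t_end=40.0)
for t, level in trajectory[::1000]:
    print(round(t, 1), round(level, 3))
```

Validating such a model experimentally would mean comparing a trajectory like this one with measured expression levels at the same time points and adjusting the model where they disagree.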
New directions in bioinformatics research include synthetic biology and visual analytics
Synthetic biology is a growing field of bioinformatics research. The redesign of biological systems and component parts for useful and practical purposes has many parallels to the electronics industry. Standardized, integrated electronic parts, devices and tools have enabled a well-developed, mature industry. Advocates of synthetic biology similarly champion development of tools and processes that will enable standardized, integrated biological parts and devices to create synthetic genomes. While synthetic biology requires a revolution in tools and technology, these approaches may address significant challenges in healthcare, energy and the environment.
The Artemisinin Project uses synthetic biology to make safe, effective anti-malarial medicines accessible to people in developing countries. Representatives from academia, the biotechnology and pharmaceutical industries and the nonprofit sector are developing semi-synthetic artemisinin because artemisinin extracted from its natural source, the sweet wormwood plant, is too expensive for extensive use. Other synthetic biology efforts aim to generate energy-rich fuels by engineering the enzymes in the pathways that create these molecules, inserting them into bacteria and growing the bacteria on a large scale.
Visual analytics is an emerging field. Although all sciences are improving the ability to collect and analyze information, new tools are required to analyze massive, complex, incomplete and uncertain worldwide information. IEEE recognized this challenge and in 2006 founded the Symposium on Visual Analytics Science and Technology. It focuses on the R&D agenda for visual analytics developed under the leadership of the Pacific Northwest National Laboratory to define the directions and priorities for future R&D programs focused on visual analytics tools.
IEEE defines visual analytics as "the science of analytical reasoning supported by highly interactive visual interfaces. People use visual analytics tools and techniques to synthesize information into knowledge; derive insight from massive, dynamic and often conflicting data; detect the expected and discover the unexpected; provide timely, defensible and understandable assessments; and communicate assessments effectively for action." This interdisciplinary science includes statistics, mathematics, knowledge representation, management and discovery technologies, cognitive and perceptual sciences, decision sciences and more.
The advances in bioinformatics research in some ways parallel history. In the 16th century, Tycho Brahe collected precise measurements on the positions of planets. Johannes Kepler made Tycho's data more meaningful by using it to develop his laws of planetary motion. Sir Isaac Newton extended the value further by developing principles of physics, such as universal gravitation and laws of motion.
While Newton's principles intuitively match everyday experience, Albert Einstein's 20th century discoveries pushed science into the non-intuitive realm. Now bioinformatics research is tackling data complexity and interrelatedness that describe a new world. We don't have the answers yet, but we are getting in touch with the right questions.
Bioinformatics research gives us new glimpses of processes that have been going on for millions of years, the billions of molecular events happening in our bodies that enable us to function. Those who reduce this data to laws and principles that explain these events will make great steps forward.
Darlene J.S. Solomon is chief technology officer for Agilent Technologies in Santa Clara, Calif. She holds a B.S. degree in chemistry from Stanford University and a Ph.D. in bioinorganic chemistry from the Massachusetts Institute of Technology.