Guest commentary: Bioinformatics research enables better biological information
Until the early 1990s, biology researchers primarily studied one or two genes at a time, often for their entire careers; now, they might study hundreds of thousands of DNA segments in one experiment. In addition, researchers can probe complementary non-coding parts of the genome and profile relevant proteins and metabolites to seek new understanding and treatments of disease. Ongoing bioinformatics research is critical to make sense of an almost limitless supply of biomolecular information.
The National Institutes of Health (NIH) defines bioinformatics as "research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze or visualize such data." The NIH acknowledges the overlap with computational biology: "the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral and social systems."
Continuing electronic advancements of our increasingly digital world open opportunities to analyze and leverage vast amounts of information across research boundaries. For example, although the physical analog measurements of a cell and of an integrated circuit are very different, processing these measurement signals is similar once they're converted to the digital domain. Following analog-to-digital conversion, the tools to generate insight are the same: fast digital signal processing, measurement systems software, data management, visualization and decision analysis.
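The shared digital toolchain described above can be sketched in a few lines: once a signal is sampled, the same processing applies whether it came from a cell assay or a circuit. The moving-average filter and the made-up jittery signal below are purely illustrative, not drawn from any particular instrument.

```python
# A minimal sketch of a digital processing step shared across domains:
# after analog-to-digital conversion, a smoothing filter works the same
# regardless of what produced the samples.

def moving_average(samples, window):
    """Smooth a digitized signal with a simple moving-average filter."""
    out = []
    for i in range(len(samples) - window + 1):
        out.append(sum(samples[i:i + window]) / window)
    return out

# A noisy "measurement": a ramp with alternating jitter (invented data).
signal = [x + (0.5 if x % 2 else -0.5) for x in range(10)]
smoothed = moving_average(signal, window=2)  # jitter cancels, ramp remains
```

With a window of 2, the alternating jitter cancels exactly, leaving the underlying ramp; real pipelines would use more sophisticated filters, but the digital-domain principle is the same.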
With life sciences evolving so quickly, bioinformatics research is critical to building information infrastructures that can match that evolution, allow researchers to access data quickly and easily, and develop new informatics tools as needed. It is no longer enough to generate the best measurement data; researchers also require the best meaning and insight from that data.
Bioinformatics research enables new insights from platforms and analysis. Bioinformatics tools are required to create and evolve measurement platforms and data analysis methods to reach new biological insight. Each new measurement can change the questions that scientists can answer. Researchers start with a hypothesis about what they want to measure, and they understand initial measurements will be imperfect. They consider what mathematical model will describe the behavior being measured, what the signal should look like, how they will implement the measurement in a fast, efficient way, and ultimately how they will communicate their results to the worldwide scientific community.
Oligonucleotide DNA microarrays, for example, have depended on bioinformatics for initial development and ongoing advances in measurement tools and analysis methods. Researchers started with a hypothesis that it could be possible to measure gene expression patterns in a tissue sample using complementary probes of nucleic acid polymers on a glass slide. By taking advantage of the DNA building blocks that act like two sides of a zipper, each probe could test for the presence of a specific DNA sequence.
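The complementary "zipper" pairing behind probe design can be sketched as follows. The target sequence and the probe-design helper are hypothetical illustrations, not an actual array-design pipeline.

```python
# A minimal sketch of the "zipper" idea: a probe that base-pairs with a
# DNA target is its reverse complement, so probe and target hybridize
# base-by-base (A with T, G with C).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def probe_for(target):
    """Design a probe that base-pairs with the given DNA target."""
    return "".join(COMPLEMENT[base] for base in reversed(target))

target = "GATTACA"            # invented target sequence
probe = probe_for(target)     # -> "TGTAATC"
```

Because reverse complementation is its own inverse, applying `probe_for` twice returns the original sequence, which is a handy sanity check in real probe-design code.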
Initial microarray development required prior bioinformatics work on sequencing before researchers could design the hundreds of thousands of probes for an array.
Bioinformatics research was essential to develop the mathematical models, software, visualizations and processes to analyze the data because initially, no tools were available.
Bioinformatics enabled extensions of microarray platforms to explore new layers of biology, such as array comparative genomic hybridization (aCGH), which identifies amplified and missing pieces of chromosomes in cancer cells compared with normal cells. With each new platform or modification, researchers depend on bioinformatics for probe development, analysis methods and visualization software.
The process of platform development from measurement to analysis often becomes iterative, as researchers push the envelope to get a more complete "picture" of the sample being analyzed. Experiments that yield incomplete information often motivate efforts to improve platforms and algorithms to achieve better-defined and more focused results. For example, early aCGH research indicated that, even outside of cancer biology, genes can be present in varying numbers of copies. This led to studies of copy number variation as a key element of genetic structural variation across multiple diseases.
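The copy-number comparison at the heart of aCGH analysis might be sketched like this: the log2 ratio of tumor to normal probe intensity flags gained and lost chromosomal segments. The threshold and the intensity values below are invented for illustration.

```python
import math

def copy_number_calls(tumor, normal, threshold=0.3):
    """Classify each probe as gain, loss, or neutral from log2 ratios.

    A positive log2(tumor/normal) suggests extra copies; a negative
    ratio suggests missing copies. Threshold is arbitrary here.
    """
    calls = []
    for t, n in zip(tumor, normal):
        ratio = math.log2(t / n)
        if ratio > threshold:
            calls.append("gain")
        elif ratio < -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Invented probe intensities: doubled, unchanged, halved.
calls = copy_number_calls(tumor=[2.0, 1.0, 0.5], normal=[1.0, 1.0, 1.0])
# -> ["gain", "neutral", "loss"]
```

Production aCGH pipelines add normalization and segmentation across neighboring probes, but the per-probe ratio is the starting point.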
Bioinformatics research has improved platform results independent of instrument improvements. Mass spectrometers, for example, help identify proteins by matching the mass/charge of their constituent peptides' fragments against a database. New spectral clustering techniques and better spectrum-to-peptide matching algorithms have improved peptide identification without any change to the mass spectrometer.
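The database side of spectrum-to-peptide matching can be sketched as a mass lookup within a tolerance. The residue masses are approximate monoisotopic values, and the three-peptide "database" is invented; real search engines also score the fragment spectrum itself.

```python
# Approximate monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053}
WATER = 18.011  # terminal H + OH added to the residue sum

def peptide_mass(seq):
    """Compute a peptide's mass from its residue masses plus water."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER

def match(observed_mass, database, tol=0.05):
    """Return database peptides whose mass lies within tol of observed."""
    return [p for p in database
            if abs(peptide_mass(p) - observed_mass) <= tol]

db = ["GAS", "GAP", "SSA"]                      # toy peptide database
hits = match(observed_mass=peptide_mass("GAS"), database=db)  # -> ["GAS"]
```

Better matching algorithms, as the article notes, tighten exactly this kind of candidate filtering and scoring without touching the instrument.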
Bioinformatics research enables integration of information from multiple sources. Bioinformatics enables scientists to transcend data-type boundaries and begin to view the cell as a system of complex interactions. Study of the whole provides insights into emergent systems-level behavior that isn't visible by looking at individual genes and proteins separately.
Learning how a cell responds to stimuli requires integration of data from multiple experiments and measurement platforms. In the cascade of biological events, proteins interact with receptors that also interact with proteins and genes. Integration of gene expression and protein networks, for example, can reveal pathways that potentially govern disease progression and might help identify people who would most likely respond to a particular therapy.
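One simplistic way to picture this kind of integration: overlay expression fold changes on a protein-interaction network and keep only the interactions whose partners are both strongly up-regulated. The gene names, edges, fold changes and cutoff below are illustrative, not a real analysis.

```python
# Invented expression fold changes and a toy interaction network.
expression = {"EGFR": 2.1, "GRB2": 1.8, "TP53": 0.4, "MDM2": 1.9}
interactions = [("EGFR", "GRB2"), ("TP53", "MDM2")]

def active_edges(expr, edges, cutoff=1.5):
    """Keep interactions where both proteins exceed the fold-change cutoff."""
    return [(a, b) for a, b in edges
            if expr.get(a, 0) >= cutoff and expr.get(b, 0) >= cutoff]

pathway_hits = active_edges(expression, interactions)
# -> [("EGFR", "GRB2")]: both partners up-regulated in this toy data
```

Even this toy filter shows the systems-level point: the second interaction drops out only when the two data types are viewed together, not from either dataset alone.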
Bioinformatics is becoming more collaborative as researchers integrate information from multiple sources and put it into a contextual view for new knowledge and worldwide availability.
Cytoscape, a tool that encourages information sharing, is an open-source bioinformatics software platform that enables researchers to visualize molecular-interaction networks and integrate these interactions with experimental data and other data from other sources, such as pathway databases and scientific literature. Putting data into a biological context helps increase understanding of molecular networks, interactions and pathways involved in biological processes. Cytoscape allows users to query biological networks to derive computational models and to view, manipulate and analyze their data to reach biological insight.
The Cancer Genome Atlas Project to characterize molecular alterations in cancer is another example of integrating data from multiple sources. This collaborative effort, led by the National Cancer Institute and the National Human Genome Research Institute, has demonstrated the feasibility of using integrated genomic strategies.
Scientists are developing new data, sharing it with researchers worldwide, and developing innovative bioinformatics tools and technologies to study cancer with greater precision and efficiency. Findings already are influencing treatment; investigators have reported that genetic alterations in patients with glioblastoma (a form of brain cancer) are linked with resistance to a drug that is commonly used for treatment.
Bioinformatics research is also expanding to computer modeling to simulate and calculate, for example, gene expression over time. Researchers are creating models with the goal of feeding data through them to predict how a living cell will react. Further research will involve validating such models experimentally.
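A minimal version of the kind of model described above: a single gene whose expression over time follows constant production and first-order degradation, integrated with simple Euler steps. All rates here are arbitrary; real models couple many genes and must be validated against experiment, as the article notes.

```python
def simulate_expression(k=1.0, d=0.5, dt=0.01, steps=2000):
    """Integrate dm/dt = k - d*m with Euler steps.

    k is a constant transcription rate, d a degradation rate; the
    expression level m approaches the steady state k/d over time.
    """
    m = 0.0
    for _ in range(steps):
        m += (k - d * m) * dt
    return m

final = simulate_expression()  # approaches the steady state k/d = 2.0
```

Feeding measured rates through such a model, then checking its time-course predictions in the lab, is the validation loop the paragraph describes.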
New directions in bioinformatics research include synthetic biology and visual analytics. Synthetic biology is a growing field of bioinformatics research. The redesign of biological systems and component parts for useful and practical purposes has many parallels to the electronics industry. Standardized, integrated electronic parts, devices and tools have enabled a well-developed, mature industry. Advocates of synthetic biology similarly champion development of tools and processes that will enable standardized, integrated biological parts and devices to create synthetic genomes. While synthetic biology requires a revolution in tools and technology, these approaches may address significant challenges in healthcare, energy and the environment.
The Artemisinin Project uses synthetic biology to make safe, effective anti-malarial medicines accessible to people in developing countries. Representatives from academia, biotechnology, the pharmaceutical industry and the nonprofit sector are developing semi-synthetic artemisinin because extraction from the natural source, the sweet wormwood plant (Artemisia annua), is too expensive for extensive use. Other synthetic biology efforts aim to generate energy-rich fuels by engineering the enzymes in the pathways that create these molecules, inserting them into bacteria and growing the bacteria at large scale.
Visual analytics is another emerging field. Although all sciences are improving their ability to collect and analyze information, new tools are required to analyze massive, complex, incomplete and uncertain worldwide information. IEEE recognized this challenge and in 2006 founded the Symposium on Visual Analytics Science and Technology, which follows the visual analytics R&D agenda developed under the leadership of the Pacific Northwest National Laboratory to set directions and priorities for future visual analytics tools.
IEEE defines visual analytics as "the science of analytical reasoning supported by highly interactive visual interfaces. People use visual analytics tools and techniques to synthesize information into knowledge; derive insight from massive, dynamic and often conflicting data; detect the expected and discover the unexpected; provide timely, defensible and understandable assessments; and communicate assessments effectively for action." This interdisciplinary science includes statistics, mathematics, knowledge representation, management and discovery technologies, cognitive and perceptual sciences, decision sciences and more.
The advances in bioinformatics research in some ways parallel history. In the 16th century, Tycho Brahe collected precise measurements on the positions of planets. Johannes Kepler made Tycho's data more meaningful by using it to develop his laws of planetary motion. Sir Isaac Newton extended the value further by developing principles of physics, such as universal gravitation and laws of motion.
While Newton's principles intuitively match everyday experience, Albert Einstein's 20th century discoveries pushed science into the non-intuitive realm. Now bioinformatics research is tackling data complexity and interrelatedness that describe a new world. We don't have the answers yet, but we are getting in touch with the right questions.
Bioinformatics research gives us new glimpses of processes that have been going on for millions of years: the billions of molecular events happening in our bodies that enable us to function. Those who reduce this data to laws and principles that explain these processes will take great steps forward.