Special Report: Neuroscience
On good behavior
To translate to humans, neurobehavioral models must first translate to animals
By Randall C Willis
A lab-coated intern excitedly corrals her lab mates to share her great discovery. She carefully places a housefly in the middle of a cleared space and asks everyone to observe. Suddenly, she claps two pans together, making a loud noise, and the housefly takes off.
No one is particularly impressed.
She then places a second housefly in the same spot, but carefully immobilizes its wings.
Again, the pans clash, but this time, the fly stays put.
“See,” the intern proclaims. “When you disable a fly’s wings, it goes deaf.”
An old joke and unfair comparison, perhaps, but neurobehavioral and neuropsychiatric research present challenges rarely experienced in most other scientific efforts—and sometimes it feels a lot like the scenario above.
Unlike many areas of research where the molecular and medical align discretely, here that alignment can be significantly hazier, criss-crossed with a multitude of confounding and disguising factors.
“For many of these complex behavioral assays ... the ultimate goal is to identify disease-relevant endpoints that are robust, reliable, and reproducible, and that can be employed to evaluate potential novel therapeutic agents,” wrote Jill Silverman of the UC Davis MIND Institute and Jacob Ellegood of Toronto’s Mouse Imaging Centre in a 2018 review. “The impact of a competing or confounding behavior on the behavioral endpoints ... cannot be understated.”
They stressed that mutations can cause physical impairments that limit a subject’s ability to perform a task, much as in the housefly example above.
“Motor defects in [autism spectrum disorder] models, including hypo- and hyper-locomotion, can also have consequences on the behavioral outcome of interest by competing or preventing the subject from engaging in the tasks of core symptomology testing,” the researchers noted. “Just as it is important to understand the limitations of a behavioral task itself, it is important to investigate, acknowledge, and report the limitations of the rodent model being tested so as not to be short-sighted in the interpretations and applications of the data.”
Elucidating the scale of this complexity and designing experiments and models to mitigate at least some of those challenges is a significant focus of neurobehavioral researchers who seek to improve translation of findings not just to humans, but also to the same animals in a more native state.
Teasing out complexity
“My background and training is in Parkinson’s disease, but I studied signaling and inflammation components as an environmental modifier of neurodegenerative disease,” says Taconic Biosciences field application specialist Terina Martinez. “So, I was keenly aware of the fact that we were talking of diseases that have a complex etiology.”
That applies not only to neurodegenerative disorders, she continues, but also to neuropsychiatric diseases where genetics is clearly not the only factor involved. Rather, she points to a variety of environmental and social cues that, alongside genetics and other pathologies, converge in a manifestation of disease.
“Behavior is one of those modalities that is both loved and loathed,” Martinez offers.
“In a complex disease that has maybe six or seven different molecular pathways that converge on a pathologic mechanism, behavior is one of the few modalities to integrate multiple pathways,” she suggests. “So, it is very important as a measure.”
She is quick to add, however, that this ability to integrate multiple pathways to a pathology is also why behavioral studies and models are so challenging. Rather than presenting in discrete terms, where it is simple to reproduce findings from one experiment to another, these analyses offer significant variability.
“The challenge is in acknowledging [behavior’s] importance, knowing what the limitations are, and being thoughtful about interpretation so you don’t over-interpret,” Martinez states. “And, on the front end, designing studies so you can try to optimize the behavioral observations in a way that’s interpretable.”
And that complexity is, in some ways, confounded by the nature of animal models and testing.
Experimental models are, by necessity, a gross simplification of what is happening in a human patient, as well as what is happening with test animals under natural circumstances, suggests Lucas Noldus, founder and CEO of Noldus Information Technology and recently appointed professor at Radboud University in Nijmegen, the Netherlands.
“In the laboratory, we try to eliminate as many uncontrollable variables and reduce the experiment to a very simple set of stimuli and outcome measures, which we then record and from which we tease out the results of the effects of treatment,” he continues.
This has resulted in a vast collection of test paradigms that address single aspects of behavior or functional domains of the brain. These could be tests of locomotion, where the animal goes or how long it stays there, how it interacts with cage-mates, or even its diurnal rhythms.
“And all these different aspects of behavior were traditionally tested in separate devices, apparatuses and tests,” Noldus notes.
This isn’t to say that these tests have had questionable value. Instead, Noldus suggests they have been quite helpful in establishing specific relationships—say, between an administered compound and a behavioral outcome. He is quick to note, however, that translational or ecological validity has long been a weakness.
“The behavior of an animal in a barren cage, devoid of any stimulation, with the animal being observed for 10 minutes, during which you record whether the animal moves to the center of the cage, is hardly a valid representation of what happens to a human patient suffering from an anxiety disorder during his daily pattern of life, at home, at work, in the open space in the street,” Noldus offers as an example. “The environment in which we humans operate and perform is so much more complex than the simplistic representation in the test arena for an animal that the translational value is inherently weak.”
“That translatability is very key,” Martinez says. “Looking at some of these complex neuropsychiatric questions, how do you ask your mouse if it’s depressed?”
In DDNews’ April 2019 Special Report on Neuroscience, Aptinyx's president and CEO Norbert Riedel and Brain and Cognition Discovery Foundation's executive director Roger McIntyre offered much the same observation while discussing efforts to develop new treatments for depressive disorders.
“It is hard to measure depression in a rodent,” acknowledges Riedel. “There are ways in which we look at this, but I would say that it is of limited value in predicting a human disease course in a depressive disorder.”
“There is no question that the animal models we have are not just imperfect, but are serving as a major limitation to progress in the field,” McIntyre adds.
“I have done that forced swim test [on rodents],” offers Riedel. “Why would you stop swimming? Well, maybe the rat doesn’t know how to swim. Maybe it’s weak. Maybe it did well the day before. But to say that it is because they’re depressed is a stretch.”
“Certain human symptoms like hallucinations, delusion, guilt; obviously, those aren’t going to replicate in an animal,” Martinez presses. “But other things have the potential to be replicated.”
She points to examples like executive function, motivation, working memory and emotion, which she argues are more reasonable to ask an animal to model.
Further complicating the translational validity of neurobehavioral models and experiments, says Noldus, is the manner in which the experiments have been conducted.
“Because there were no ways in the past to automate the measurements,” he explains, “all the measurements were performed by human beings, who would handle the animals, transport the animals, administer the compound, do the observations.”
Despite the use of common protocols, the natural variability in how the human researchers handled the animals, performed the manipulations, and even how they dressed and smelled was an inherent confounding factor for these studies.
“We sometimes forget that mice, rats, all other animals have much more heightened senses of smell,” Martinez echoes. “They use other senses to engage with their environments and register stress and respond to their environment.”
“We can literally turn on or off certain behavior outcomes in mice depending on whether or not the person running the study is male or female,” she offers. “The animal will smell testosterone on a male operator, and it will automatically enhance their fight-or-flight response.”
And because these hidden variables influence experimental outcomes, Noldus adds, animal behavior studies have a reputation of being very difficult to reproduce.
Highlighting this challenge was a 2018 report from Anne Andrews and colleagues at UCLA, who looked at questions of reproducibility and validation in the rodent anxiety test novelty-suppressed feeding (NSF).
NSF monitors hyponeophagia, the suppression of feeding in a novel environment, in which an animal’s normal tendency to avoid unfamiliar spaces competes with its drive to find food. Thus, the longer the latency to the first bite of food, the greater the inferred anxiety.
Following previously reported NSF test parameters on a strain of mice genetically engineered to lack expression of the serotonin transporter (SERT), the researchers found they were unable to reproduce the earlier results. Only through systematic modification of different test parameters were they able to achieve the same outcomes.
From this, the researchers posited two conclusions.
Firstly, they wrote, “had we assumed the NSF test was ‘working’ without first validating the test in our hands, the results of experiments investigating novel phenotypes could have been wrongly interpreted.”
Secondly, they opined, “it is less critical to reproduce exact conditions reported by others, though these are a logical starting point. In contrast, it is more important to determine experimental conditions in individual laboratories that produce expected results, acknowledging that precise conditions can vary across laboratories.”
The researchers went one step further, however, reporting on challenges of reproducing behavioral studies even within the same lab following their move from Penn State to UCLA.
Having re-established their SERT-modified mice from cryopreserved embryos, the researchers repeated a different anxiety-related behavioral test, the elevated plus maze, only to realize that yet again, they could not reproduce results from other labs or their own.
As the scientists reported, they were using the same strain of mice, the same maze, similar lighting conditions and even the same experimenter.
“Even within the same research group, behavior phenotypes can shift, and differences may go unnoticed without ongoing test validation,” the authors noted. “Behavior changes can be due to differences in laboratory or animal care personnel, changes in environmental/housing conditions, or genetic or epigenetic drift.
“Thus, even laboratory-specific conditions benefit from periodic revalidation.”
Martinez offered her takeaways from this study.
“I found it absolutely fascinating that when they tried to control for all variables, every single one possible, that this didn’t actually contribute to success, because it didn’t magically solve the problem,” she opines. “As with so many things, there’s a balance to it. One of the thoughts that resonated with me is that researchers really need to pay close attention to that initial validation step.”
“But then there also needs to be really close attention to reporting in a more accurate and detailed fashion the parameters of the behavior testing that was done in the environment,” she adds.
This is something that is becoming increasingly important at Taconic, for example.
“We’re understanding that even within the ecosystem of, say, one company or in Taconic, from one animal room to another, the environmental conditions may be different, the number of animals in a cage, the people taking care of the animals,” Martinez explains. “Even just the food, water and microbiome.”
Fortunately, this expanding recognition is being met with action.
Noldus points to the Innovative Medicines Initiative (IMI) as one international effort to recognize and address challenges like these. One project under the IMI umbrella is European Quality in Preclinical Data, or EQIPD.
An assembly of 29 pharmaceutical companies, universities and technology developers, EQIPD is investigating the variables that influence the quality of preclinical data, seeking to improve not only the quality of this data, but also the processes involved in generating it.
In January, along with the American Society for Pharmacology and Experimental Therapeutics, three members of the EQIPD consortium published revised instructions to authors outlining methods to display and report experimental data with an eye to greater transparency, less risk of bias and, ultimately, greater scientific rigor (see sidebar article titled “Full disclosure”).
“It is trying to identify the sources of variability and secondly, to define protocols and standards by which the variation is reduced by adhering to more rigorous, robust designs of experiments,” summarizes Noldus.
Without such guidelines, he adds, nobody can reproduce your study or reuse your results.
Martinez also sees opportunities for publishers to step up and help facilitate both model validation and reproducibility.
“It’s hard to get validation data published and into the public domain,” she argues.
Journals prefer to publish really tight, hypothesis-driven stories, she complains, and validation or confirmatory studies simply do not fit that cookie-cutter approach.
For this reason, she highlights different formats that can cater to this unique need, offering the video journal JoVE as an example.
“This is perfect for behavior,” she enthuses. “If there are certain aspects of an animal behavior study that really are operator-dependent or have a very acute environmental component, you can capture that in a video format.”
Integral to making progress in this area, Martinez presses, is a commitment to accuracy and detail at a level well beyond what has been the norm for the past several decades.
And, as suggested earlier, part of enhancing reproducibility will also come in minimizing some of the sources of variability in the first place through experimental design and execution.
Platform companies, including Noldus IT, are stepping up to improve the ways in which behavioral neuroscience is studied.
As Noldus describes the approach: Rather than bring the animal to the experiment, we are bringing the experiment to the animal.
The goal, he continues, is to move away from artificial arenas and test apparatus, environments completely unlike how an animal lives normally or how the downstream patient experiences a neurological or psychiatric disorder, and move toward more natural or near-natural settings using automated measurements and multiple testing modalities.
By examining multiple outcomes or even a single outcome but from multiple perspectives, there is an opportunity to discover much more meaningful behavioral data and insights.
The home cage environment—Noldus’ PhenoTyper is a good example—is enriched with food, drink, shelter, bedding, whatever the animals need to behave as they might normally. And using recording techniques such as video, audio or other sensors, researchers can unobtrusively observe the animals for days or weeks.
Likewise, technologies for stimuli or challenges can be brought to the home cage, where changes in behavior can be monitored while the animals interact in a social setting.
“That has been a big shortcoming of previous generations of technology—the study of social behavior wasn’t really possible,” Noldus says. “And many of the psychiatric disorders that we are dealing with in our current society in the West have a social component.”
He offers examples of conditions like schizophrenia, which leads to social isolation, and autism or anxiety disorders.
“Many of them must be addressed taking the social aspect into account, because we humans are social beings, and our performance is often determined in a group setting at work or in a family,” he stresses.
A good example of this is efforts at high-throughput automated olfactory phenotyping reported by Janine Reinert and colleagues at Heidelberg University late last year. The researchers used group-housed RFID-tagged mice to monitor investigator-free training and response of mice to rewarded and unrewarded odors.
The mouse cohort was housed in a variant of a behavioral testing platform known as the AutonoMouse.
“Such a design can house two cohorts of mice (i.e., genetically modified mice and their littermate controls) for simultaneous behavior testing using a single olfactometer, thereby reducing potential sources of variation,” the authors described. “As animals are able to freely access the testing area whenever they are motivated to obtain a reward, this setup produces a large number of trials performed by highly incentivized animals.”
The researchers noted that once a cohort had been trained on one odor pair, the number of trials to learn a second odor pair was typically shorter. And although they noted a correlation between smaller cohorts and higher success rates, the difference from larger cohorts was small and they could not rule out confounding factors.
Interestingly, where cohort size seemed to have the most significant impact was in the circadian rhythm of activity. In smaller cohorts, the mice tended to perform their daily trials during the dark phase, with peak activity in the fourth hour of the night phase. In larger cohorts, the activity was distributed throughout the day with no peak in activity.
Such a difference in activity between day and night phases is something that has until recently been either largely unappreciated or ignored.
“Mice and rats are naturally nocturnal, so any behavior test that is run on them in the human day with the lights on, effectively yanks them from their otherwise peaceful night of sleep and makes them perform under sleep deprivation,” notes Martinez. “It seems reasonable to infer that this can be a major factor for behaviors or disease models that include stress, motivation, etc., as either dependent mechanisms or confounds.”
Reinert and colleagues also reported high reproducibility both for animals within a given cohort and across multiple experiments.
“In addition to simultaneously phenotyping large cohorts of mice or testing a large variety of arbitrary odor mixtures, the setup could also be a useful tool to prepare mice for more complex experiments like in-vivo imaging or electrophysiological recordings,” the authors suggested. “As these experiments themselves can be very time-consuming, the setup could be used to, for example, generate a continuous supply of pre-trained animals without the need for potentially time- and labor-intensive manual training.”
They also noted that the modular and non-proprietary nature of the equipment meant it could be further modified for additional stimuli, such as visual or tactile cues, or for more complex olfactory tests.
In 2018, Patrick Nolan and colleagues at the MRC Harwell Institute and Actual Analytics reviewed efforts to assess mouse behavior through the light/dark cycle using home-cage analysis (HCA) platforms.
“Robust changes in social interactions over the dark and light phase can be observed in the mouse home cage using the HCA system, where cumulative time spent in close proximity (<75 mm) to other individual cage mates can be recorded over time,” the authors wrote.
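The proximity measure the authors describe can be sketched in a few lines. This is a hypothetical illustration, not the HCA system's actual code: it assumes video tracking has already produced per-frame (x, y) positions in millimetres for each mouse, and simply accumulates the time each pair of cage mates spends within 75 mm of each other.

```python
import math

# Illustrative only: accumulate time each pair of mice spends in close
# proximity (< 75 mm), given per-frame positions and the frame interval.
PROXIMITY_MM = 75.0

def cumulative_proximity_time(tracks, frame_interval_s):
    """tracks: {mouse_id: [(x, y), ...]} with one position per frame.
    Returns {(id_a, id_b): seconds spent within PROXIMITY_MM}."""
    ids = sorted(tracks)
    n_frames = min(len(tracks[i]) for i in ids)
    totals = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            seconds = 0.0
            for f in range(n_frames):
                ax, ay = tracks[a][f]
                bx, by = tracks[b][f]
                if math.hypot(ax - bx, ay - by) < PROXIMITY_MM:
                    seconds += frame_interval_s
            totals[(a, b)] = seconds
    return totals

# Two mice sampled once per second: close for the first two frames, apart after.
tracks = {
    "m1": [(0, 0), (10, 0), (200, 0)],
    "m2": [(50, 0), (60, 0), (400, 0)],
}
print(cumulative_proximity_time(tracks, 1.0))  # {('m1', 'm2'): 2.0}
```

Summed over days or weeks of recording, this kind of per-pair tally is what lets social interaction be tracked across the light and dark phases.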
But beyond the anticipated night activity and day nesting observations, they continued, the HCA system allows for social and behavioral monitoring of animals of mixed genotype, both in snapshots and longitudinally.
“True home-cage phenotyping over long periods has the potential to greatly enhance the study of a wide range of neurobiological diseases by enabling the accurate measurement of progressive behavioural changes in the same animals over weeks and months,” the team noted, as exemplified by a particular experimental observation of three lab animals.
“While the three animals in the cage show similar activity during the light phase, one of the mutants shows sustained hyperactivity during the dark phase up until dawn,” the researchers described. “Without continuous monitoring over the light/dark phase, it would not have been possible to observe this phenotype, and its potential impact on the welfare of the animals would have gone unnoticed.”
They also described the automated analysis of unprovoked cage-bar climbing activity as a subtle measure of motor function.
“We used this automated system to analyse climbing activity in detail over three consecutive days in a mouse line with progressive motor deficits with wildtype littermate controls at eight and 13 weeks of age,” the authors wrote. “Preliminary data indicates that a specific time-dependent decrease in climbing activity, detected using the automated system, is a strong indicator of disease onset in this line.”
Although many of these tests offer discrete measurements—e.g., swim time, frequency of events—observing a behavior or making a correlation with disease is still viewed by many as qualitative and lacking in the rigor of the more quantitative molecular techniques.
Room for all
Noldus acknowledges this has long been a challenge, but he suggests that efforts like those described above are helping to change that perception.
The automation of behavioral testing using advanced sensor technologies, digital imaging and video tracking, and AI-powered data analysis is allowing researchers to provide robust, quantitative phenotyping of rodents, including genetically modified animals, that is on par with molecular techniques.
Martinez goes even further.
“The other thing that I think is really important is taking a page from the playbook of machine learning and big data, and really acknowledging that a lot of care and attention needs to be put into what the input is,” she remarks. “And then, if you have 40 different things that you’re observing, which six provide a signature or provide less sensitivity to noise and environmental influence?”
“It’s really an interesting way to take what is probably the oldest play in the scientist playbook, which is observing animals in their environment, and then integrating these really next-gen technologies to get to the high-level functioning of what would otherwise be a very simple observation,” she enthuses.
That said, neither Noldus nor Martinez see phenotyping and genotyping as competitive efforts, but instead look to the synergies between the two approaches.
Martinez posits that molecular biomarkers may in fact be one way to ensure translatability from animal model to human patient.
Once you have a behavior or series of behaviors that show a connection to human disease, there is an opportunity to supplement or enhance that connection by searching for companion biomarkers that correlate with the behavior(s).
“Whether you do imaging or biochemistry or some other non-invasive type of means, you can build a case of correlation and more confident translatability of that behavior modality from animals to humans,” she notes.
Silverman and Ellegood expanded on this idea in their review.
“In conjunction to behaviorally relevant outcome measures, the search for biomarkers of [neurodegenerative disorders] has grown and heavily relied upon visualizing the brain in an effort to understand the neurodevelopmental differences in preclinical genetic models and to determine if those neuroanatomical alterations can be reversed or corrected,” they wrote.
They highlighted efforts to examine models both at the cellular level, using platforms such as histology, two-photon microscopy and electron microscopy, as well as at the mesoscopic level with CT, PET and MRI.
Silverman and Ellegood then offered further thoughts on the applications of MRI.
“The non-invasive nature of MRI also means that it can be performed repeatedly to track disease progression and loss of skills and/or symptom onset (or regression by reversals of brain phenotypes), extremely beneficial to neurodevelopmental research,” they suggested.
“In collaboration with prominent behavioral scientists, we have spearheaded an effort to correlate neuroanatomical differences with behavioral metrics, which allows for powerful inferences and biochemical hypotheses to be pursued for any given study,” the authors continued. “In fact, showing direct relationships and links amongst behavior and any of our numerous MRI readouts (e.g., regional volume, DTI, cortical thickness) can be used as biological markers, outcome measures, and may define targets for genetic or pharmacologic intervention.”
For Noldus, both the founder and the company, the value of such collaborations cannot be overstated.
“The molecular biologists tell us what the genome looks like and where exactly the mutations have occurred,” Noldus explains. “The behavioral scientists and the physiological monitoring experts provide digital readouts of the outcomes of these mutations at the phenotypic level.”
“We work with the genetics companies that make tools for the genetic analysis,” he offers. “Our behavioral data sets are merged with the genetics data sets to find the correlations between, say, specific changes in the genome and changes in the behavioral readouts.”
As an example, he offers his company’s efforts with Sylics, which has developed software tools to integrate these otherwise disparate data sets.
Sylics’ AHCODA platform, for example, rapidly processes, quality checks and analyzes data automatically. The company has also developed a variety of tests to monitor spontaneous behavior, as well as prespecified behaviors and even disease-specific behavioral parameters.
At Neuroscience 2019 in Chicago, Sylics CEO Maarten Loos and collaborators presented their efforts to develop better models of Alzheimer’s disease by seeding tau protein into the brains of different mouse strains. The goal was to facilitate testing of antibody-based therapies for protection against tau spreading and cognitive decline.
Although they were able to see differences in tau pathology between transgenic htau mice and C57BL/6J wildtype mice, a battery of behavioral tests found no signs of cognitive impairment in either model six months after seeding.
Thus, such experiments are critical in determining the limitations of current models of neurological disease.
“Behavioral science is becoming more quantitative, reproducible, standardized,” says Noldus. “So, I hope that these critical voices will be silenced in the near future as we produce more robust results.”
Last year, Noldus and colleagues at organizations such as Sylics, Pfizer, Roche, University of Groningen and others demonstrated the changing reproducibility landscape in a multi-center study of a genetic rat model for autism spectrum disorder.
The study, led by Groningen’s Martien Kas, involved replicating previous observations of an autistic-like hyperactive and repetitive behavior phenotype in a Shank2 knockout rat model of synaptic dysfunction. Beyond simply reproducing previous results in a single lab, however, the collaboration sought to reproduce the results across three study sites, as well as to examine the response to pharmacological intervention with the mGluR1 antagonist JNJ16259685.
As the researchers explained, the study design was adapted from earlier work but with additional focus on preventing bias in the design, collection and analysis of data, and with this analysis performed using automated scoring.
They noted “that rigorous alignment of experimental protocols between three research centers resulted in comparable experimental findings across sites for both genotype and treatment effects.”
Phenotypic differences between the Shank2 knockout and wildtype rats were observed across all three sites, including consistently heightened motor activity and stereotypic circling behavior in the knockouts.
“Likewise,” the authors reported, “a consistent and dose-dependent attenuation of motor activity and circling behavior in both [knockout] and [wildtype] rats by JNJ16259685 was found across the three study sites.”
“These results show that reproducibility in preclinical studies can be obtained and emphasizes the need for high-quality and rigorous methodologies in scientific research,” they concluded. “Considering the observed external validity, the present study also suggests mGluR1 as potential target for the treatment of autism spectrum disorders.”
For Noldus, the study highlights the value of diligence to agreed-upon protocols and standardization of methods, tools and procedures.
“It is not as simple yet as ordering a DNA sequencer from PerkinElmer,” he says, “putting it on your benchtop in Los Angeles, and putting exactly the same device on the benchtop in New York and getting exactly the same outcome from the same samples.”
“We want it to be like that,” he adds, “but we’re heading in that direction.”
And that progress is vital, he presses, as the genome alone has yet to give us the outcomes we sought.
“Eventually, we’re talking about disorders and diseases that manifest themselves as behavioral problems in reality,” Noldus concludes. “You’re not suffering from the knock-out in your gene; you’re suffering from anxiety in daily life. And that’s what eventually needs to be cured.”
Full disclosure

In continuing efforts to address the so-called reproducibility crisis in experimental biology, Martin Michel of Johannes Gutenberg University, T.J. Murphy of Emory University and Harvey Motulsky of GraphPad Software developed revisions to the Instructions to Authors for the American Society for Pharmacology and Experimental Therapeutics (ASPET) journals.
The authors were quick to note that the revisions were not developed to tell researchers how to design and execute their studies, but rather focused on data analysis and reporting with an eye toward improved robustness and transparency.
Such efforts are vital, if only to save scientists from themselves.
“I think there is an inherent tendency and bias in over-interpreting connections between outcomes,” says Terina Martinez, field applications specialist for Taconic. “That’s not because of any misguided nefarious intention. It’s simply because the scientists are asking questions and seeking answers.”
“Sometimes, if you’re not properly integrating the limitations and the other x-factors, then the answer that you see looks like the only possible outcome to the question when it isn’t,” she explains.
A subset of the ASPET recommendations included:
- Detail how data were analyzed, including normalization, transformation, baseline subtraction, etc.
- Identify if all or part of the study tested a hypothesis with prespecified design or was exploratory
- Explain whether sample size or experiment number were predetermined or adapted after results were obtained
- Explain whether statistical analysis was predetermined or adapted after results were obtained
- Explain whether outliers were removed, the criteria for removal and if this was predetermined
- Use P values sparingly, focusing instead on confidence intervals
- Avoid or clearly define the meaning of “significant” in reporting
- Present graphs with as much granularity as reasonable, e.g., scatter plots rather than bar graphs
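As a toy illustration of the confidence-interval recommendation (the data below are invented, not drawn from the paper): reporting an effect size with an interval conveys both magnitude and uncertainty in a way a bare P value does not. The sketch uses a normal approximation; for small samples a t critical value would be more appropriate.

```python
from statistics import NormalDist, mean, stdev

# Invented example data: report a mean difference with a 95% confidence
# interval rather than a bare P value. Normal approximation for brevity.
def mean_diff_ci(sample_a, sample_b, level=0.95):
    """Return (difference of means, (low, high)) using per-group
    standard errors and a normal critical value."""
    diff = mean(sample_a) - mean(sample_b)
    se = (stdev(sample_a) ** 2 / len(sample_a)
          + stdev(sample_b) ** 2 / len(sample_b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for 95%
    return diff, (diff - se * z, diff + se * z)

treated = [112, 98, 105, 120, 101, 110, 95, 108]
control = [96, 88, 101, 92, 99, 90, 94, 97]
diff, (low, high) = mean_diff_ci(treated, control)
print(f"mean difference {diff:.1f}, 95% CI [{low:.1f}, {high:.1f}]")
```

Paired with a scatter plot of the raw per-animal values, this style of reporting addresses several of the recommendations above at once.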
“We believe that these revised guidelines will lead to a less biased and more transparent reporting of research findings,” the authors concluded.
(Adapted from Michel et al. JPET. 2020;372:136-147.)