Clinical trial protocols are among the most complex and resource-intensive documents developed in pharmaceutical and biotechnology research. Designed to define objectives, procedures, participant criteria, and data collection practices, protocols serve as the operational blueprint for evaluating investigational therapies. However, as the range of study endpoints has grown, so too has the volume, complexity, and burden of data collection placed upon research sites and participants.
A new collaborative study conducted by the Tufts Center for the Study of Drug Development (Tufts CSDD) and 15 TransCelerate BioPharma sponsor companies provides fresh insight into the nature of this burden. Drawing on 105 multi-therapeutic protocols with primary completion dates after 2018, researchers analyzed data volume per study and per participant, classifying procedures into core, standard, and non-core based on the endpoints they support.
The results demonstrated a continuation of a long-standing trend: Clinical trials are collecting more data than ever before. Phase 3 trials now average 5.9 million data points, reflecting a sustained 11 percent annual growth rate since 2020. While this expansion supports increasingly sophisticated scientific objectives, it also introduces inefficiencies, operational challenges, and participant strain.
Laura Galuchie, Senior Director at Merck and TransCelerate Program Lead, told DDN, “TransCelerate and Tufts CSDD recognized the growing challenges posed by increasing protocol complexity and data volume. This collaboration was prompted by a shared observation that many procedures in trials may not directly support key endpoints, which may lead to unnecessary burden on sites and participants.”
When procedures outnumber purpose
To better understand how data volume contributes to operational and participant burden, the study categorized trial procedures based on the endpoints they supported and their necessity for demonstrating study outcomes. This approach provided a more nuanced picture of where data meaningfully informs efficacy and safety assessment — and where it accumulates for broader exploratory interests.
Core procedures are those that directly support primary and key secondary endpoints. These procedures are central to determining whether a therapy works and whether it is safe. Standard procedures, while not directly tied to key efficacy measures, are generally required to ensure participant safety, compliance, or appropriate trial oversight. They include routine safety monitoring and assessments necessary to meet regulatory expectations and clinical care standards.
Non-core procedures, by contrast, support exploratory, tertiary, or supplementary endpoints that do not determine the primary value of the therapy being studied. These procedures often provide data that may guide future research questions, inform biomarker investigation, support label expansion strategies, or create clinical understanding beyond the primary study aim. In Phase 2 protocols included in the analysis, non-core procedures constituted nearly 18 percent of the total. In Phase 3, they accounted for just over 16 percent.
The study also introduced the concept of non-essential procedures. These are procedures that do serve necessary functions — either for endpoint evaluation or for participant safety — but are performed more frequently than required to support those objectives. As explained by Kenneth Getz, Executive Director and public health researcher at Tufts CSDD, “If a procedure is conducted three times when only the data from two occurrences is needed to support its endpoint, then the additional occurrence is considered non-essential.”
Across the protocols analyzed, up to 12.6 percent of procedures supporting core and standard endpoints were identified as non-essential. When translated into data volume, those additional procedures accounted for 8 to 17 percent of the total data points collected.
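The tally behind figures like these follows directly from Getz’s definition: any occurrence of a procedure beyond what its endpoint requires counts as non-essential, and each surplus occurrence carries its data points with it. A toy calculation along those lines, using entirely hypothetical procedure names and counts (not figures from the study), might look like this:

```python
# Toy illustration of the non-essential data-point tally described above.
# All procedure names, occurrence counts, and data-point figures are hypothetical.

procedures = [
    # (name, occurrences performed, occurrences needed, data points per occurrence)
    ("ECG", 3, 2, 150),
    ("blood draw", 4, 4, 40),
    ("quality-of-life questionnaire", 6, 3, 80),
]

total_points = 0
non_essential_points = 0
for name, performed, needed, points in procedures:
    total_points += performed * points
    # Occurrences beyond what the endpoint requires are non-essential,
    # per the definition quoted from Getz.
    non_essential_points += max(0, performed - needed) * points

share = non_essential_points / total_points
print(f"Non-essential share of data points: {share:.1%}")
```

The same per-occurrence accounting, applied across a real protocol’s schedule of assessments, is what turns a procedure-level classification into the data-volume percentages the study reports.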
Combined with non-core procedures, the study found that nearly one-third of all procedures and data points across Phase 2 and Phase 3 protocols fall into categories that do not meaningfully contribute to evaluating the primary scientific questions. More than half of the data in this combined non-core and non-essential category came from clinical and patient-reported questionnaires — assessments that are often time consuming for participants and logistically intensive for clinical sites.
As Galuchie noted, the intent is not to suggest that sponsors have been gathering unnecessary data, but rather to “invite sponsors to consider what is the most appropriate data collection method in a particular trial, given the potential burden.” The findings indicate that intentional design — not simply more or less data — is key to optimizing both scientific value and the participant experience.
The human and operational cost of complex trials
As protocols expand in scope, the effects are felt most acutely at the research sites implementing them and by the participants experiencing them, influencing everything from recruitment and retention to data quality and trial timelines.
Participants often experience the impact of protocol complexity most directly. Each additional assessment, questionnaire, clinic visit, or monitoring requirement represents time, logistical planning, and in many disease areas, physical or emotional discomfort. Non-core and non-essential procedures may provide valuable context or exploratory insight, but they also lengthen visits, increase the frequency of interactions, and contribute to participant fatigue. For individuals with progressive, chronic, or rare diseases, these added demands can become a deciding factor in whether they remain enrolled.
The burden on research sites is similarly significant. Each procedure requires trained staff, time, documentation, and often coordination across multiple teams or third-party services. When procedures accumulate that do not directly support primary scientific outcomes, they compete for limited site resources, slowing trial operations and increasing the likelihood of errors.
Increasing data volume can also have unintended consequences for data quality. Larger datasets require greater oversight and carry a higher risk of transcription errors, missing data, and discrepancies that require resolution. The result is a paradox: More data collected for the sake of completeness can lead to greater noise and less efficiency in determining the very outcomes the trial is designed to assess.
These operational pressures also contribute to broader systemic effects. Clinical research sites consistently report staffing shortages, administrative burden, and burnout as critical challenges. Trials that require extensive non-core data collection may become less attractive to sites deciding which studies to support. Similarly, participants — especially those with limited flexibility, mobility limitations, or caregiving responsibilities — may hesitate to enroll or may discontinue participation if the perceived burden outweighs personal benefit.
Rethinking protocol design for efficiency and relevance
The findings of this study come at a time when regulators and global standards bodies are encouraging greater flexibility and proportionality in clinical trial design. In January 2025, the International Council for Harmonisation (ICH) updated the Guideline for Good Clinical Practice (GCP), ICH E6(R3), to reflect emerging technologies and evolving study designs.
Natalia Camargo Sanmiguel, Associate Director of Clinical Data Management at Merck, noted that, “The study aligns closely with guidelines, such as ICH E6(R3), which emphasize fit-for-purpose data collection and minimizing unnecessary complexity and burden. The findings support the guidelines’ call for more intentional protocol design and reinforce the importance of collecting only essential and relevant data.”
Looking forward, the study suggests several promising strategies for strengthening alignment between scientific rigor and participant accessibility. One of the most important opportunities is to bring cross-functional review earlier into the protocol development process, allowing medical, biostatistics, operations, site, and patient-experience perspectives to shape data collection plans before they are finalized. This shift encourages teams to ask not only whether a procedure could yield useful information, but also whether it is necessary, how often it should be performed, and how relevant it is to the specific trial context.
This research highlights the need for a broader cultural shift: moving away from the assumption that more data is inherently better and toward recognizing that the most meaningful data is collected with purpose. By aligning protocol design with regulatory guidance and the lived realities of trial participants and sites, sponsors have the opportunity to create trials that are scientifically rigorous, operationally feasible, and ethically grounded.