This article summarizes the key insights from the expert roundtable, "AI in Literature Reviews: Practical Strategies and Future Directions," held in Boston on June 25. A variety of R&D professionals joined the discussion, bringing perspectives from across the pharmaceutical and biotechnology landscape: senior scientists, clinical development leaders, and research informatics specialists, along with experts in translational medicine and pipeline strategy. Participants represented both global pharmaceutical companies and emerging biotechs, providing a balanced view of the challenges and opportunities shaping innovation in drug discovery and development.
Discussions covered real-world use cases, challenges in data quality and integration, and the changing relationship between internal tools and external AI platforms. The roundtable reflected both enthusiasm and realism about the role of AI in drug discovery, underscoring that real progress depends on high-quality data, strong governance, and tools designed with scientific nuances in mind. Trust, transparency, and reproducibility emerged as critical pillars for building AI systems that can support meaningful research results.
If you are in an R&D role, whether in computational biology, computer science, or scientific strategy, and are looking to expand literature workflows in an AI-enabled world, read on for actionable insights, warning signs, and ideas to future-proof your approach.
Evolving Roles and Tool Strategies
Participants emphasized the diversity of AI users in biopharma, distinguishing between computational biologists and bioinformaticians in terms of approach and tools. While general-purpose tools such as Copilot have proven useful, there is a growing shift toward developing custom AI models for complex tasks such as protein structure prediction (e.g., ESM, AlphaFold).
AI adoption is developing both organically and strategically. Some teams are investing in internal infrastructure, such as enterprise-wide chatbots and data linking frameworks, while navigating regulatory restrictions around the use of external tools. Many organizations have strict policies governing how proprietary data can be handled with AI, emphasizing the importance of controlled environments.
Several participants noted that they work from the literature, focusing more on protein design and sequencing. For these participants, AI is applied earlier in the R&D process, before findings appear in publications.

Data: abundance meets ambiguity
Attendees predominantly use public databases such as GenBank and GISAID rather than relying on the literature. However, problems remain: data quality, inconsistent ontologies, and a lack of structured metadata often force teams to retrain public models on proprietary data. While providers offer academic content through large language models, confidence in those results remains mixed. Raw, structured data sets (e.g., RNA-seq) are strongly preferred over derived insights.
One participant described building an internal knowledge graph to examine drug interactions, highlighting the challenges of aligning internal schemas and ontologies while ensuring data quality. Another shared how they incorporate open source resources such as Kimball and GBQBio in the development of small molecule models, focusing on rigorous data annotation.
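The internal knowledge graph described above boils down to storing facts as subject–predicate–object triples and querying across them. A minimal sketch of that idea, with all drug and target names used purely as illustrative placeholders (this is not any participant's actual schema):

```python
from collections import defaultdict

# Minimal triple store: (subject, predicate, object) facts about drugs.
# Drug and target names below are illustrative placeholders, not real data.
class KnowledgeGraph:
    def __init__(self):
        self.by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self.by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        # All objects linked to `subject` via `predicate`.
        return {o for (p, o) in self.by_subject[subject] if p == predicate}

    def shared_targets(self, drug_a, drug_b):
        # A crude interaction check: two drugs share an inhibited target.
        return sorted(self.objects(drug_a, "inhibits")
                      & self.objects(drug_b, "inhibits"))

kg = KnowledgeGraph()
kg.add("drugA", "inhibits", "CYP3A4")
kg.add("drugB", "inhibits", "CYP3A4")
kg.add("drugB", "inhibits", "CYP2D6")

print(kg.shared_targets("drugA", "drugB"))  # ['CYP3A4']
```

In practice the hard part is not the storage but what the participants flagged: aligning internal schemas and ontologies so that the same entity is represented consistently across sources.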
Several participants expressed concern about false positives in AI-based search tools. One described experimenting with ChatGPT in research mode and the Rinsit platform, both of which had accuracy issues. Another emphasized the need to surface metadata indicating whether a publication is backed by accessible data, helping them avoid studies that offer visualizations without underlying data sets.
A recurring theme was frustration with the academic community’s reluctance to share raw data, despite expectations to do so. As one participant noted:
“This is a competitive area, even in academia. Nobody wants to publish and then be monopolized. It’s their bread and butter. The system is broken, so we don’t have access to the raw data.”
When data sets are not linked in publications, some participants noted that they often communicated directly with authors, although response rates were inconsistent. This highlights a broader unmet need: pharmaceutical companies are actively seeking high-quality data sets to complement their models, especially beyond what is available in specific subject repositories.
Literature and the need for feedback loops
Literature-tracking tools struggle with both accuracy and accessibility. Participants mentioned difficulties filtering out false positives and retrieving extractable raw data. While tools like ReadCube SLR enable user-driven iterative refinement, most platforms still lack persistent learning capabilities.
The absence of complete data sets in publications, often hidden for competitive reasons, remains a major obstacle. Attendees also raised concerns about AI-generated content contaminating future training data and discussed the legal complexities of using copyrighted materials.
As one participant noted:
“AI generates so much content that it feeds back to itself. New AI systems are trained with results from older AI. You get less and less real content and more and more regurgitated material.”
Knowledge graphs and the future of integration
Knowledge graphs were widely recognized as essential for integrating and structuring disparate data sources. Although some attendees speculated that LLMs could directly infer such relationships, the consensus was that knowledge graphs remain essential today. Companies like metaphacts are already applying ontologies to semantically index data sets, enabling more accurate, hallucination-free chatbot responses and deeper investigative analyses.
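Ontology-backed semantic indexing, in the spirit described above, means mapping the many surface forms of a term to one canonical concept and retrieving data sets by concept rather than by string match. A rough, self-contained illustration (all concept IDs, synonyms, and data set names below are invented for the example):

```python
# Toy semantic index: normalize free-text labels to canonical ontology IDs,
# then look up data sets by concept rather than by exact string match.
# All IDs, synonyms, and data set names are invented for illustration.
SYNONYMS = {
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "tumor protein p53": "GENE:0001",
    "egfr": "GENE:0002",
}

DATASET_INDEX = {
    "GENE:0001": ["rnaseq_batch_07", "proteomics_2024_q1"],
    "GENE:0002": ["rnaseq_batch_03"],
}

def resolve(term):
    """Map a free-text term to its canonical concept ID, if known."""
    return SYNONYMS.get(term.strip().lower())

def datasets_for(term):
    """Return data sets indexed under the concept the term resolves to."""
    concept = resolve(term)
    return DATASET_INDEX.get(concept, []) if concept else []

# Different surface forms resolve to the same concept, so retrieval is
# grounded in the ontology rather than in string matching.
print(datasets_for("p53"))   # ['rnaseq_batch_07', 'proteomics_2024_q1']
print(datasets_for("TP53"))  # same concept, so the same data sets
```

Grounding retrieval in concept IDs is also what makes chatbot answers easier to audit: every response can point back to the indexed data sets rather than to free text.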
What’s next: trust, metrics and metadata
Looking ahead, participants advocated for AI results to include confidence metrics, analogous to statistical confidence scores, so that users can judge reliability. Tools that index and surface supplementary materials were considered essential for discovering usable data.
One participant explained:
“It would be valuable to have a confidence metric along with rich metadata. If I’m exploring a hypothesis, I want to know not only what supports it, but also the types of data—for example, genetic, transcriptomic, proteomic—that are available. A tool that answers these types of questions and breaks down the answer by data type would be incredibly useful. It should also indicate whether complementary data exist, what type they are, and whether they have been evaluated.”
Another emphasized:
“A reliability metric would be very useful. Articles often make contradictory or tentative claims, and it is not always clear whether they are supported by data or based on assumptions. Ideally, we would like to have tools that can assess not only the reliability of an article, but also the reliability of individual statements.”
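The statement-level reliability the participants describe could be modeled very simply: each extracted claim carries its own confidence score plus a flag for whether supporting data are accessible, and an article-level score aggregates them. The fields, weights, and scores below are assumptions for illustration, not any vendor's actual method:

```python
from dataclasses import dataclass

# Sketch of statement-level reliability scoring. All fields, weights,
# and example claims are illustrative assumptions, not a real product's logic.
@dataclass
class Claim:
    text: str
    confidence: float   # 0.0-1.0, e.g. from an extraction model
    data_types: tuple   # e.g. ("transcriptomic", "proteomic")
    has_raw_data: bool  # is a supporting data set actually linked?

def article_reliability(claims):
    """Average claim confidence, down-weighting claims
    that lack accessible raw data (penalty factor is arbitrary)."""
    if not claims:
        return 0.0
    weighted = [c.confidence * (1.0 if c.has_raw_data else 0.5)
                for c in claims]
    return sum(weighted) / len(weighted)

claims = [
    Claim("Gene X is upregulated in disease Y", 0.9, ("transcriptomic",), True),
    Claim("Protein Z drives resistance", 0.8, ("proteomic",), False),
]
print(round(article_reliability(claims), 2))  # 0.65
```

Breaking the score out per claim, and per data type, is what would let a reader distinguish data-backed statements from assumptions, as the quote above asks for.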
Participants also recognized the rich, if unvalidated, potential of preprints, particularly bioRxiv content, which can offer valuable data not yet subject to peer review.
Conclusion
The discussion balanced enthusiasm with realism: AI's promise in drug discovery will only be realized with high-quality data, strong governance, and tools designed with scientific nuance in mind. Trust, transparency, and reproducibility stood out as the pillars on which meaningful, AI-supported research must be built.
Digital Science: Enabling scalable and reliable AI in drug discovery
At Digital Science, our portfolio directly addresses the key challenges highlighted in this discussion.
- ReadCube SLR offers auditable, feedback-based literature review workflows that allow researchers to iteratively refine systematic searches.
- Dimensions & metaphacts offer the Dimensions Knowledge Graph, a comprehensive, interconnected knowledge graph that links internal data with public data sets (spanning publications, grants, clinical trials, and more) and ontologies, ideal for powering structured, trusted AI models that support projects across the pharmaceutical value chain.
- Altmetric identifies early signals of research attention and emerging trends, which can improve model relevance and guide research prioritization.
For organizations seeking centralized AI strategies, our products offer interoperable APIs and metadata-rich environments that integrate seamlessly with custom internal frameworks or LLM-based systems. By building transparency, reproducibility, and structured knowledge into every tool, Digital Science helps computational biology teams create AI solutions they can trust.