Name:
From Chaos to Clarity: GraphRAG, AI, and Process Orchestration
Description:
From Chaos to Clarity: GraphRAG, AI, and Process Orchestration
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/7d754ac7-5ecd-4edd-9b9a-ebdb97cf7b7d/videoscrubberimages/Scrubber_10.jpg
Duration:
T00H16M40S
Embed URL:
https://stream.cadmore.media/player/7d754ac7-5ecd-4edd-9b9a-ebdb97cf7b7d
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/7d754ac7-5ecd-4edd-9b9a-ebdb97cf7b7d/SSP2025 5-28 1330 - Industry Breakout - Ebcont.mp4?sv=2019-02-02&sr=c&sig=G5Udf4dSrQQKU6UxDzdMY8JadhBu7d2Yc7qID6AzAhM%3D&st=2025-06-15T22%3A10%3A43Z&se=2025-06-16T00%3A15%3A43Z&sp=r
Upload Date:
2025-06-06T00:00:00.0000000
Transcript:
Language: EN.
Segment: 0.
Hi, my name is Colin O'Neill. I'm a consultant with SiteFusion ProConsult; SiteFusion is a component content management system. Today what I'd like to do is talk about something we're all grappling with, which is the role of AI in scholarly publishing and, more importantly, how we can bring order to what's quickly becoming an unpredictable and high-stakes ecosystem.
So I've titled this presentation From Chaos to Clarity, because what we're seeing isn't a failure of AI; it's a failure of workflow, structure and trust. My goal today is to help you understand how to move from trial-and-error AI pilots to governed, deterministic AI systems that scale. We've all felt the pressure: leadership wants innovation, editorial wants to improve productivity, marketing wants differentiation. And the result has been what we've been internally calling a stampede of AI experimentation. We've been bolting tools onto existing systems, adding plugins to CMSes and rolling out LLM pilots to production staff, just to see what happens. And for a while it was exciting.
But many of these efforts stalled, and when efforts stall, they leave behind a trail of inconsistent output, unclear ownership and fragile integrations. So what we're left with is not transformation. It's noise, because there's no architectural oversight, no defined flows, no deterministic framework to govern how AI should function.
So what we're left with are symptoms of chaos, and I've listed a few here. We have AI-generated metadata that conflicts across platforms, reviewer summaries that read like generic marketing text, and classification decisions that lack traceability or consistency. And because of that, editorial teams are rejecting AI output because they don't trust it. Developers struggle to support these ecosystems as well, writing one-off scripts to fix one simple problem instead of coming up with a holistic solution. And there's really been no clear ownership of versioning, tracing or audit trails. One of the things we're trying to focus on now is that there is no deterministic framework or orchestrated process.
So here is what we'd like to do at SiteFusion ProConsult. SiteFusion is a CMS built on the Camunda workflow engine, a BPMN-based process engine, and one of the things we're introducing to a lot of our customers now is expanding their use of process orchestration.
Orchestration isn't about automation alone; it's about intentional control. At SiteFusion we build every implementation on top of that BPMN engine. Every task, decision point, approval and AI interaction is mapped, managed and traceable. You may ask why that matters, and it's because without orchestration, AI is just guessing.
AI inside of a BPMN workflow is governed: you decide what data the model sees, when and how it runs, who approves or rejects outputs, and what happens if something fails. Most importantly, we can enforce determinism. So what is process orchestration? It's a system for coordinating tasks between humans, AI and services.
It uses BPMN to define the workflows, so every step has defined inputs and outputs, trigger conditions and error-handling logic, and deterministic execution follows rules, not guesses. Task flow defines the exact sequence of steps: who does what, and when.
It keeps processes predictable and roles aligned. With trigger logic, tasks only run when specific conditions are met, for example after review is complete or approval is given, which prevents misuse of AI. Review gates are built-in checkpoints that ensure outputs like AI summaries are reviewed before moving forward; they add human oversight where it matters. I'm sure we'll hear "human in the loop" about a thousand times over this whole week, but that's how we're attacking it with process orchestration.
We can put in these manual checks anywhere we want. It also allows for fallback and error paths, handling errors gracefully with retries, notifications and alternate routes, which keeps automation resilient and safe. These are all things you build into any kind of workflow, and one of the reasons we like doing it this way is that it gives every step its own identity. It makes it easy to find out where everything is coming from, to track the audit trail, to stop output before it turns into something that's learned by the AI model, and to turn it into something that's actually usable and deterministic. This is how we scale AI responsibly: with orchestration, automation becomes controlled, auditable and trustworthy.
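To make that concrete, here is a minimal sketch of a governed AI step with trigger logic, a review gate and a fallback path. This is illustrative only, not SiteFusion's Camunda implementation; run_ai_summary, notify_editor and the status values are all hypothetical.

    # Illustrative sketch of a governed AI step: trigger logic, a review
    # gate and a fallback path. Not SiteFusion's Camunda implementation;
    # run_ai_summary, notify_editor and the status values are hypothetical.
    MAX_RETRIES = 2

    def governed_summary_step(submission, run_ai_summary, notify_editor):
        # Trigger logic: the AI task runs only after review is complete.
        if submission["status"] != "review_complete":
            return {"state": "skipped", "reason": "trigger condition not met"}
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                draft = run_ai_summary(submission["reviews"])
            except Exception as err:
                # Fallback path: retry, then notify a human on final failure.
                if attempt == MAX_RETRIES:
                    notify_editor(f"AI step failed ({err}); manual handling needed")
                    return {"state": "fallback_manual"}
                continue
            # Review gate: the draft waits for human approval before moving on.
            return {"state": "awaiting_editorial_approval", "draft": draft}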
AI alone is not smart; it needs context. Most LLMs are probabilistic: they generate based on what sounds likely, and in scholarly publishing, likely isn't good enough. We need fact-based determinism. Probabilistic AI relies on statistical prediction; it generates responses based on patterns, probabilities and similarity. This is typical of most large language models, which produce fluent but unpredictable results based on likelihood rather than certainty.
What we're trying to work on with customers is deterministic AI, which is rule-bound and predictable: given the same inputs and rules, it will produce the same outputs every time. In publishing, where traceability and transparency are essential, deterministic systems provide control and reliability.
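As a rough illustration of the distinction, a deterministic setup pins every source of variation and rule-checks the result, so the same inputs and rules produce the same outputs. The generate client and its parameters below are hypothetical stand-ins for whatever model interface you use.

    # Sketch of enforcing determinism around a model call. The `generate`
    # client and its parameters are hypothetical stand-ins; the point is
    # that every source of variation is pinned or rule-checked.
    def deterministic_classify(text, generate, rules):
        prompt = f"Classify strictly by these rules:\n{rules}\n---\n{text}"
        # Pin the model version and sampling so repeated runs match
        # (in practice, as close to deterministic as the model allows).
        raw = generate(prompt, model="pinned-model-v1", temperature=0.0, seed=42)
        label = raw.strip().lower()
        # Rule-bound post-check: reject anything outside the allowed set.
        allowed = {"research-article", "review-article", "editorial"}
        if label not in allowed:
            raise ValueError(f"Out-of-policy output: {label!r}")
        return label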
The highlight of this talk is GraphRAG, which was only starting to be introduced when I submitted this abstract, but I think everyone has heard of GraphRAG by now. It's graph-based Retrieval-Augmented Generation: it takes your context and adds it to what's being generated by the LLM, so it's no longer retrieving scoped content through structured queries or similarity scoring alone.
By attaching these systems to BPMN, we ensure that the process is governed and that retrieval and generation are transparent, repeatable and auditable. In publishing we need this: the model generates only from approved, constrained sources.
There are no surprises, no hallucinations, no guessing, and that's really what GraphRAG is supposed to deliver. Instead of throwing chunks into a vector database and hoping for relevance, we build semantic context through the graph.
Knowledge graphs built from structured content like JATS or DITA add semantic meaning and enforce scope. For example, a JATS article might define an organism, a method and a funding source. Each of these entities is extracted and linked to controlled vocabularies or ontologies, and the graph formalizes the relationships: organism A, studied with method B, under grant C. When a query comes in, the system doesn't just guess based on similarity; it traverses these relationships to find authoritative, vetted content.
The LLM is then exposed only to the scoped, approved information. Nothing outside the graph's boundary gets included, which reduces hallucination and improves trust. This structure gives publishers full traceability: they can see which nodes were used, what path was followed and why the content was chosen. So we're no longer relying on fuzzy similarity.
It's based on relationships, ontologies and formal constraints that we program into the system. We ingest structured content, whether JATS, DITA or NISO STS XML. We construct the graph with these typed relationships, and only then do we generate, which lets you trace every answer: where the model got its information and why it was selected.
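To make the mechanics concrete, here is a minimal sketch of graph-scoped retrieval using the organism/method/grant example, written with the networkx library. The node identifiers, the DOI and the retrieve_scoped helper are hypothetical; this is not SiteFusion's implementation.

    # Minimal GraphRAG sketch (illustrative only). Node identifiers and
    # the retrieve_scoped helper are hypothetical.
    import networkx as nx

    g = nx.DiGraph()
    g.add_node("article:10.1234/example", kind="article", approved=True)
    g.add_node("organism:A", kind="organism", label="Organism A")
    g.add_node("method:B", kind="method", label="Method B")
    g.add_node("grant:C", kind="funding", label="Grant C")

    # Typed relationships formalized by the graph.
    g.add_edge("article:10.1234/example", "organism:A", rel="studies")
    g.add_edge("article:10.1234/example", "method:B", rel="uses_method")
    g.add_edge("article:10.1234/example", "grant:C", rel="funded_by")

    def retrieve_scoped(graph, start):
        # Follow typed edges from a vetted node; return only in-scope
        # content plus the exact paths followed, for the audit trail.
        context, audit = [], []
        for _, target, data in graph.out_edges(start, data=True):
            context.append(graph.nodes[target])
            audit.append((start, data["rel"], target))
        return context, audit

    context, audit = retrieve_scoped(g, "article:10.1234/example")
    # `context` is the only material handed to the LLM; `audit` records
    # which nodes were used and what path was followed.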
So where does it start to work, and where have we seen it benefit publishers? First, peer review summarization: it summarizes only approved reviews, applies tone and policy from the graph, and is routed through BPMN for editorial sign-off. Second, content classification uses taxonomy-linked graph nodes, so classifications are linked to taxonomies, not just keyword matching. And then you have author services, with chatbot responses scoped to submission stage and policy. Each example reinforces the same point: AI plus workflow plus graph equals clarity.
AI summarizes reviewer comments only after review completion; the context includes journal tone, policy and reviewer metadata from the graph; and the output passes through a BPMN workflow for human approval. Content classification uses taxonomies mapped to graph nodes, not raw embeddings; AI suggests classifications based on concept relationships, not just keywords; and every classification is traceable to a source path in the graph.
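As a small sketch of that traceability, each suggested classification can carry the graph path that justified it. The concept names, taxonomy nodes and helper below are hypothetical.

    # Sketch: taxonomy-linked classification where every suggestion keeps
    # the graph path that justified it. All names are hypothetical.
    def classify_with_trace(article_concepts, taxonomy_edges):
        # taxonomy_edges maps a concept to (taxonomy_node, relationship).
        suggestions = []
        for concept in article_concepts:
            if concept in taxonomy_edges:
                node, rel = taxonomy_edges[concept]
                suggestions.append({
                    "label": node,
                    # Audit trail: which concept, via which relationship.
                    "source_path": f"{concept} --{rel}--> {node}",
                })
        return suggestions

    edges = {"CRISPR": ("taxonomy:GeneEditing", "narrower_than")}
    print(classify_with_trace(["CRISPR"], edges))
    # [{'label': 'taxonomy:GeneEditing',
    #   'source_path': 'CRISPR --narrower_than--> taxonomy:GeneEditing'}]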
For author services, the graph can power submission help tools with scoped access: questions are routed based on submission stage and permissions, and answers are pulled only from policy and documentation nodes marked as approved.
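A sketch of that scoped routing follows; the stage names, topics and node fields are hypothetical.

    # Sketch of scoped author-services routing. Stage names, topics and
    # node fields are hypothetical.
    def scope_author_question(question, stage, doc_nodes):
        # Routing logic: the submission stage selects the topic scope.
        topic_by_stage = {"submission": "formatting", "review": "status",
                          "production": "proofs"}
        topic = topic_by_stage.get(stage, "general-policy")
        # Only documentation nodes marked as approved are eligible.
        scoped = [n for n in doc_nodes if n["topic"] == topic and n["approved"]]
        # The scoped nodes, not the open web, are the only context the
        # chatbot may draw on to answer the question.
        return {"question": question, "context": scoped}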
At SiteFusion, we have an embedded Camunda BPMN engine. Every AI task has defined inputs, outputs and approvals; prompt variables and results are tracked; and fallback and review logic is built into the workflow. SiteFusion customers don't bolt AI onto an existing CMS; everything is orchestrated from the beginning.
At SiteFusion, this is not theoretical. Every customer instance has its own Camunda engine, meaning AI endpoints are BPMN service tasks. Prompt templates, retrieval rules and output formats are governed. You can integrate fallback logic, retries and review loops, and process variables track every prompt, response and confidence score. You're not just using AI; you're building a composable, traceable system where every action can be audited.
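For illustration, the kind of record such process variables might capture per AI task could look like this; the field names are hypothetical, not Camunda's or SiteFusion's actual schema.

    # Sketch of an audit record for one AI task. Field names are
    # hypothetical, not Camunda's or SiteFusion's actual schema.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class AITaskRecord:
        task_id: str
        prompt_template: str            # which governed template was used
        prompt_variables: dict          # the exact values filled in
        response: str                   # what the model returned
        confidence: float               # confidence score for the output
        approved_by: str | None = None  # editor who signed off, if any
        timestamp: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat())

    # Every prompt, response and confidence score is appended here, so
    # any action can be audited after the fact.
    audit_log: list[AITaskRecord] = []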
So what we're saying here is that structure wins over improvisation. BPMN brings the control; GraphRAG brings the context.
Together they create deterministic systems that are transparent and scalable. AI is no longer the risk; it's a reliable system component. So the key points here: structure wins over improvisation; BPMN delivers the control and GraphRAG delivers the context; and together they make AI deterministic, transparent, repeatable and scalable.
AI stops being a wild card and becomes a trusted system component. AI doesn't have to be mysterious; it can be designed. With GraphRAG providing the context and BPMN providing the control, we can build systems that are deterministic, auditable, governed and scalable. And this is how we move from chaos, where AI is unpredictable, to clarity, where it becomes an accountable member of your publishing workflow.
One of the things we do start talking about is vector search. But vector search has the same kinds of problems that a lot of probabilistic LLM systems have. So for clustering, metadata enrichment or discovery, orchestration controls it where that's appropriate, and then lets the LLMs do the work where that's appropriate too.
These methods don't require massive investment; they require intent: knowing what you want to do at the beginning and designing a system that will do that. We do believe that vector search and probabilistic methods have their place, but only in exploratory or low-stakes contexts.
So I just wanted to say thank you for your attention. If you're looking to bring clarity to your AI strategy, or if you want workflows that are traceable, trustworthy and useful, you can talk to me or Mario Chandler; we're over in the exhibition hall at booth 104. We also have Martin Von Volkmann from Fonto, who's available to talk about XML editing.
That's all I have for today. Does anyone have any questions? OK, thank you.