Name:
Automated Knowledge Base Creation in Finance
Description:
Automated Knowledge Base Creation in Finance
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/69f84287-3493-46df-a5d3-0c348d59c323/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H19M35S
Embed URL:
https://stream.cadmore.media/player/69f84287-3493-46df-a5d3-0c348d59c323
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/69f84287-3493-46df-a5d3-0c348d59c323/D1 - V19 - Nicolas Seyot.mp4?sv=2019-02-02&sr=c&sig=RoKkVx6bF4ArAXZbYoD9OFMzq41RbVvblLNvPqwEQhM%3D&st=2025-02-22T17%3A11%3A28Z&se=2025-02-22T19%3A16%3A28Z&sp=r
Upload Date:
2021-02-14T00:00:00.0000000
Transcript:
Language: EN.
Segment:1 A look at user experience at Morgan Stanley.
NICOLAS SEYOT: Well, thanks a lot. I mean, including my co-panelists there, really honored to be here. Just want to thank Francois and Tomas and the Columbia team for organizing such a valuable conference. And was very impressed last year, got the chance to attend very interesting sessions. And so we were hoping to do two things today. Give you and give the community a bit of a look into our user experience.
Segment:2 00:41 - 01:44 Project introduction.
NICOLAS SEYOT: I've been at the firm five years. We started on a semantic modeling journey about 3 and 1/2 years ago, four years ago. And so we'll share with you what the experience has been like at a high level, we'll try to hit the high points and hopefully, they'll be interesting. And then we'll try to give back by getting into something a little more technical and share details on how we approach the automated creation of a knowledge base, a SKOS knowledge base.
NICOLAS SEYOT: So to tell you a bit more about my team in information management, our mission is to enable the firm to retrieve, retain and protect information. So simply put, that means, how long should we keep information, where can we find information, what controls apply to it. So a lot of very interesting presentations today in that realm.
Segment:3 What is information about, conceptually?
NICOLAS SEYOT: So if you think about the high-level questions that I just posed, in firms like ours, we're looking at hundreds of thousands of discrete repositories of information. Many different groups with specialized knowledge. Not one single environment, but as Bethany, Brian, Radu mentioned, a lot of external benchmarks in the consumer world that we've all learned to appreciate in our daily lives, built by companies like Google and Amazon and pioneers in the knowledge graphing space.
NICOLAS SEYOT: So going to the heart of our information management mission and information management as a discipline, the key, the central question for information management as a discipline is, what is the information about conceptually? And the key challenge there is, how do we answer a question like that at scale systemically? So if you take that example, when the user says, where can I find BCP-related information, how long should I keep BCP information, can I share BCP information outside the firm, we have to resolve acronyms obviously.
NICOLAS SEYOT: And when SMEs talk about BCP, they mean, based on their knowledge of business continuity, business continuity management, business continuity plan, be able to apply bank BCP contingency, disaster recovery, emergency planning. You get the point. That a great number of concepts will be connected. And so our desire to leverage semantic modeling, leverage ontologies, build knowledge graphs and build knowledge bases stems from the need to answer that question at scale.
NICOLAS SEYOT: What is the information about conceptually? Where are the repositories where I can find BCP-related information? What rules, what controls apply to information of that conceptual nature? So we will switch gears now, and I'll tell you a little bit.
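The acronym resolution and concept expansion described above can be sketched roughly as follows. This is a minimal illustration, not the firm's implementation: the `ACRONYMS` and `RELATED` tables, their entries, and the `expand_query` function are all hypothetical stand-ins for a real knowledge base.

```python
from collections import deque

# Hypothetical mini knowledge base (illustrative entries only):
# acronym -> preferred label, and preferred label -> related concept labels.
ACRONYMS = {"BCP": "business continuity plan"}
RELATED = {
    "business continuity plan": ["business continuity management", "disaster recovery"],
    "business continuity management": ["emergency planning"],
    "disaster recovery": ["contingency"],
}

def expand_query(term, max_hops=2):
    """Resolve an acronym, then collect concepts within max_hops related links."""
    start = ACRONYMS.get(term.upper(), term.lower())
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        concept, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in RELATED.get(concept, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, hops + 1))
    return order

print(expand_query("BCP"))
```

A breadth-first walk keeps the closest concepts first, so a search application could weight or truncate the expansion by distance from the original term.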
Segment:4 Who can use a knowledge graph, who will own it, and the logic and motivation behind a knowledge graph.
NICOLAS SEYOT: We built a knowledge graph around information repositories. But as you can imagine, we need to know who owns those repositories of information, where they're located.
NICOLAS SEYOT: And that led us to create a broad knowledge graph for the firm that we've made available. So from an enterprise standpoint, we've created a core ontology that contains about 194 classes that are readily used and available for the firm. Generally relevant to the whole firm, such as personnel, geography, legal entities, organization and a number of firm-wide taxonomies.
NICOLAS SEYOT: We've created a consortium to help govern and maintain that core ontology that's readily available to foster internally linked data. We've created communities and we, at this point, have three divisions of the firm, including our colleagues in the research division, who've joined the advisory board for the consortium internally. It's been a great success and partnership.
NICOLAS SEYOT: The credit is really shared there. A lot of passion, a lot of interest, a lot of investments by early adopters in the firm over the past three years. And the open standard principles have really been our guidepost. Commitment to standards, W3C standards, and then internal standards around quality governance. Big focus on cooperation between the divisions.
NICOLAS SEYOT: Transparency and availability of ontologies, of standards through the communities, through the technology and business interest group. And then we've really focused on the notion of voluntary adoption and federated governance. And I think it's been a very interesting journey and we're very excited about what comes next. I think we are seeing more groups at the firm join the interest, the interest groups, both on the technology and business side.
NICOLAS SEYOT: And this adoption-based principle has worked quite well for us so far. So getting back to the topic of information management more specifically. We've created a knowledge graph, as I said, that focuses on information repositories and who owns them and the organizations that use them.
NICOLAS SEYOT: And the concepts that relate to the information stored in those information repositories. So we've standardized on, in some cases, ETL and, in some cases, virtualization, to get structured information into our knowledge graph. And then the knowledge base that we'll talk about in the next part of this presentation, both of our ontology models and both graphs get stored in the same triple store.
NICOLAS SEYOT: In this case, we're looking at about 100 million triples for the information management graph overall. And the knowledge base specifically, which is that SKOS knowledge base that we've created automatically, and that's the technical detail we'll try and give you quickly after this, is our network of concepts or semantic thesaurus, if you will.
NICOLAS SEYOT: The key use case to, again, understand how we're making use of the knowledge base is department search. So giving departments across the firm the ability to search their content. And then enrich that content with concepts, with tags, leveraging the lists, categories and taxonomies that they use in their domain within the firm as facets for content search. That enrichment is one of the key benefits that we get out of having a Morgan Stanley knowledge base, if you will, that semantic network of the concepts and terms and how they relate to one another within our organization.
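To make the enrichment idea concrete, here is a small sketch of tagging a document with knowledge base concepts and grouping the tags by facet. The `CONCEPTS` table, the facet names, and `tag_document` are assumptions made for illustration; the talk does not describe the actual tagging mechanism.

```python
import re

# Hypothetical concept labels, each assigned to a facet taken from a
# department taxonomy. All names here are illustrative.
CONCEPTS = {
    "business continuity plan": "Resilience",
    "disaster recovery": "Resilience",
    "records retention": "Information Management",
}

def tag_document(text):
    """Return facet -> sorted concept labels found in the text (whole-phrase match)."""
    tags = {}
    lowered = text.lower()
    for label, facet in CONCEPTS.items():
        # Word boundaries avoid matching a label inside a longer word.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            tags.setdefault(facet, []).append(label)
    return {facet: sorted(labels) for facet, labels in tags.items()}

doc = "Each desk must test its disaster recovery setup and review the records retention schedule."
print(tag_document(doc))
```

The resulting facet-to-tags mapping is the kind of structure a search index could store alongside each document to support faceted filtering.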
Segment:5 Building the knowledge graph.
NICOLAS SEYOT: So if we pivot now to the more technical part of the conversation around the knowledge base creation and the need for automation, I mean, our motivation there was, there are lots of taxonomies out there. There are lots of available thesauri you can buy, vocabularies. We were very motivated, obviously, to have a SKOS knowledge base that reflects our organization, our concepts, the way we use them, the way they interconnect in our firm. The manual creation of knowledge bases can be very time-consuming, error-prone, costly.
NICOLAS SEYOT: And it's very difficult to maintain. It quickly gets outdated and, importantly from an adoption standpoint, the fingerprints of the people involved in the creation of the knowledge base are quite visible. Usually, with manual efforts, you don't get the comprehensive nature that we obtain by taking an automated approach. You end up with curated vocabularies that reflect the knowledge of the group that went about creating it.
NICOLAS SEYOT: So in this case, we started with 6,500 policies and procedures as a curated corpus of knowledge of the firm. And then we ended up with a domain-specific knowledge base in SKOS with rich conceptual relations. So I'll try and walk you through that quickly. We'll dive into the specifics here of each step, starting with the NLP pipeline there. Our partners at Lymba, who are available at this conference, have been working with us, and we've made use of their technology there on the NLP side to perform shallow NLP extraction, tokenization, parts of speech.
NICOLAS SEYOT: We went deeper with syntactic parsing and named entity and concept extraction, which are common steps. But where Lymba was very helpful is in the extraction of complex relationships out of taxonomy. You're seeing an example there, being able to represent these complex relationships and extract that knowledge from the corpus with the NLP pipeline.
NICOLAS SEYOT: We ended up with a full ontology. Close to half a million concepts there. A number of complex semantic relations. And richness, if you will, to start contemplating the transformation into SKOS. We were very interested in using SKOS for the purposes of enriching content and enriching taxonomy.
NICOLAS SEYOT: I think the power of the simplification there, the conceptual representation, was essential to us. So we transformed a number of the complex relationships into the broader, narrower, and related SKOS properties. And then we took a few more steps after that in the automated creation. So out of the NLP processing, we had a fairly low recall on relationships.
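The collapse of complex relations into the three SKOS properties can be illustrated with a simple lookup table. The relation names on the left (`IS_A`, `HAS_PART`, `ASSOCIATED_WITH`) and the `ex:` identifiers are hypothetical; the talk does not list the pipeline's relation types or say which of them mapped to which SKOS property, so the mapping below is an assumption.

```python
# Hypothetical relation names from an NLP pipeline, mapped to SKOS properties.
# Which relations count as hierarchical is a modeling choice, assumed here.
RELATION_TO_SKOS = {
    "IS_A": "skos:broader",
    "HAS_PART": "skos:narrower",
    "ASSOCIATED_WITH": "skos:related",
}

# skos:broader and skos:narrower are inverses; skos:related is symmetric.
INVERSE = {
    "skos:broader": "skos:narrower",
    "skos:narrower": "skos:broader",
    "skos:related": "skos:related",
}

def to_skos(triples):
    """Map (subject, relation, object) triples to SKOS, dropping unmapped relations."""
    out = set()
    for subject, relation, obj in triples:
        prop = RELATION_TO_SKOS.get(relation)
        if prop is None:
            continue  # relation has no SKOS counterpart in this sketch
        out.add((subject, prop, obj))
        out.add((obj, INVERSE[prop], subject))  # materialize the inverse statement
    return sorted(out)

extracted = [
    ("ex:business_continuity_plan", "IS_A", "ex:plan"),
    ("ex:business_continuity_plan", "ASSOCIATED_WITH", "ex:disaster_recovery"),
    ("ex:business_continuity_plan", "AGENT", "ex:department"),  # dropped
]
for triple in to_skos(extracted):
    print(triple)
```

Relations with no entry in the table are simply discarded here, which mirrors the simplification trade-off: less expressive than the full ontology, but easier to use for tagging and search.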
NICOLAS SEYOT: And in this use case, relationships were essential to the quality of the results in a search application, or if we're going to take a taxonomy from a particular department and enrich it. We were really keen on creating more relationships. So both for associated relations and hierarchical relations, we went to a deep learning approach on the associated relationships to find more related concepts.
NICOLAS SEYOT: Processed our corpus, curated corpus again, and were able to get great results there, in particular for multi-brand concepts. And generated a much broader set of related concepts in the SKOS knowledge base, thanks to [INAUDIBLE]. We also processed the same corpus syntactically for additional hierarchical relationships, including acronyms. And we were able to look for the presence of concepts, if you will, at the syntax level throughout the knowledge base, and able to generate more parent-child relationships and our relationship out of that.
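One plausible reading of the syntactic step is sketched below: a label that is the word-level suffix of a longer label is treated as its parent, and "long form (ACRONYM)" patterns are picked up with a regular expression. Both heuristics and the function names `suffix_parents` and `find_acronyms` are assumptions for illustration, not the method used in the talk.

```python
import re

def suffix_parents(labels):
    """Pair each multi-word label with the longest other label that is its word-level suffix."""
    label_set = set(labels)
    pairs = []
    for label in sorted(label_set):
        words = label.split()
        for start in range(1, len(words)):
            candidate = " ".join(words[start:])
            if candidate in label_set:
                pairs.append((label, candidate))  # (child, parent)
                break  # keep only the longest matching suffix
    return pairs

def find_acronyms(text):
    """Find 'long form (ACRONYM)' patterns whose word initials spell the acronym."""
    found = {}
    pattern = r"((?:[A-Za-z]+ ){1,5}[A-Za-z]+) \(([A-Z]{2,6})\)"
    for match in re.finditer(pattern, text):
        words, acronym = match.group(1).split(), match.group(2)
        tail = words[-len(acronym):]  # last N words, N = acronym length
        if len(tail) == len(acronym) and all(
            word[0].upper() == letter for word, letter in zip(tail, acronym)
        ):
            found[acronym] = " ".join(tail).lower()
    return found

labels = ["business continuity plan", "continuity plan", "plan", "disaster recovery"]
print(suffix_parents(labels))
print(find_acronyms("Every team maintains a business continuity plan (BCP) and tests it."))
```

The suffix heuristic relies on English noun phrases usually placing the head word last, so it would need review for labels where that does not hold.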
Segment:6 Minimizing manual curation.
NICOLAS SEYOT: So the last two steps of the knowledge base creation there. Again, the effort throughout was to minimize the manual touch points. So we went to great lengths to stay away from manual curation as much as possible. Took steps to curate the knowledge base, remove the noise, merge concepts with their acronyms, merge concepts with their various forms, plural and singular. Went through a process of leveraging WordNet to eliminate concepts with similar meaning.
NICOLAS SEYOT: And then we added definitions that we had at the firm in structured forms to complete the knowledge base and really get a richer corpus in SKOS with descriptions and a set of associated and hierarchical relationships throughout.
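The merging of acronyms and singular/plural forms might look roughly like the sketch below. It uses a deliberately naive normalization (lower-casing, a hypothetical `ACRONYMS` table, and stripping a trailing "s") and does not reproduce the WordNet-based synonym step mentioned in the talk.

```python
from collections import defaultdict

# Hypothetical acronym table; in practice this would come from the corpus.
ACRONYMS = {"bcp": "business continuity plan"}

def normalize(label):
    """Lower-case, expand a known acronym, and strip a trailing plural 's' (naive)."""
    key = " ".join(label.lower().split())  # also collapses repeated whitespace
    key = ACRONYMS.get(key, key)
    if key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key

def merge_concepts(labels):
    """Group surface forms under one normalized key, which acts as the preferred label."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalize(label)].add(label)
    return {key: sorted(forms) for key, forms in groups.items()}

forms = ["BCP", "Business Continuity Plans", "business continuity plan", "Controls", "control"]
print(merge_concepts(forms))
```

In SKOS terms, each key would become a concept's preferred label and the grouped surface forms its alternative labels. A real pipeline would use a proper lemmatizer, since stripping "s" mishandles words such as "policies".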
Segment:7 Identifying topics.
NICOLAS SEYOT: The final step in the journey there of the automated creation of that knowledge base was the identification of topics. So in an application like semantic search or the enrichment of taxonomies, having a rich conceptual network, I think the base of the knowledge base is 200,000 concepts, can be a bit overwhelming. And we were eager to identify dominant concepts, or topics, in the network. We, again, wanted to avoid the traditional pitfalls of topic modeling or manual intervention.
NICOLAS SEYOT: And so we came up with, our lead scientist, Mohammed Muid, came up with a quite innovative approach there. We treated each concept as a centroid of a cluster. Built a graph for each of those. And excuse me, just trying to switch. Build a graph for each cluster. And the technical details will be available after the presentation.
NICOLAS SEYOT: We then proceeded to PageRank each of the concepts within the cluster, throughout the knowledge base. So each concept being the centroid of its own cluster. PageRanking the concepts related to that centroid. And then once we had obtained the PageRank scores, we were able to identify topic candidates.
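A rough sketch of the per-cluster ranking idea follows, using only the standard library. Several things here are assumptions, since the talk defers the technical details: a cluster is taken to be a concept's neighborhood within `radius` hops, links are treated as undirected, and the per-cluster scores are summed across clusters in the hypothetical `topic_scores` helper.

```python
def ego_graph(edges, center, radius=1):
    """Undirected neighborhood of `center` within `radius` hops, as an adjacency dict."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    keep = {center}
    frontier = {center}
    for _ in range(radius):
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - keep
        keep |= frontier
    # Restrict each node's neighbors to nodes inside the cluster.
    return {node: adjacency.get(node, set()) & keep for node in keep}

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; nodes without neighbors spread their rank evenly."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for neighbor in graph[node]:
                new_rank[neighbor] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank

def topic_scores(edges, radius=1):
    """Sum each concept's PageRank over every cluster it appears in (an assumed aggregation)."""
    totals = {}
    for center in {concept for edge in edges for concept in edge}:
        for node, score in pagerank(ego_graph(edges, center, radius)).items():
            totals[node] = totals.get(node, 0.0) + score
    return totals

edges = [("bcp", "dr"), ("bcp", "bcm"), ("bcp", "ep"), ("dr", "bcm")]
scores = topic_scores(edges)
print(max(scores, key=scores.get))
```

Concepts that rank highly in many neighborhoods accumulate the largest totals, which is one way to surface candidates for topics without manual labeling.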
Segment:8 Applying logic tests to evaluate the knowledge base overall.
NICOLAS SEYOT: The last step from there was to look at the knowledge base overall. And we then decided to apply a logic test to select topic candidates that were neither leaves nor too broad. And we've been very, very pleased with the results. Again, no manual intervention there. And we've been able to stop.
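The "neither leaf nor too broad" test is not spelled out in the talk. One possible interpretation, sketched here, keeps a candidate only if it has at least one narrower concept and no more than `max_descendants` concepts beneath it. The threshold, the sample hierarchy, the scores, and the `select_topics` function are all hypothetical.

```python
def descendants(narrower, concept):
    """All concepts reachable from `concept` by following narrower links."""
    seen = set()
    stack = list(narrower.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(narrower.get(node, ()))
    return seen

def select_topics(narrower, scores, max_descendants=3, top_k=5):
    """Keep scored concepts that are not leaves and not too broad, best score first."""
    candidates = []
    for concept, score in scores.items():
        size = len(descendants(narrower, concept))
        if 1 <= size <= max_descendants:  # 0 means leaf; above the cap means too broad
            candidates.append((score, concept))
    ranked = sorted(candidates, key=lambda pair: (-pair[0], pair[1]))
    return [concept for _, concept in ranked[:top_k]]

# Illustrative hierarchy (concept -> narrower concepts) and made-up scores.
narrower = {
    "risk": ["operational risk", "bcp", "dr", "bcm"],
    "operational risk": ["bcp", "dr"],
    "bcp": ["bcp test"],
}
scores = {"risk": 0.9, "operational risk": 0.6, "bcp": 0.5, "bcp test": 0.4, "dr": 0.3, "bcm": 0.2}
print(select_topics(narrower, scores))
```

In this toy hierarchy, "risk" is rejected as too broad and the leaf concepts are rejected outright, leaving the mid-level concepts as topics.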
SPEAKER 2: Nic, one minute.
NICOLAS SEYOT: Thanks. [One Minute] Yep. We're on the last page. So we've been able to put that knowledge base to work in the context of semantic content search. We've been able to put that knowledge base to work in the context of knowledge graph enrichment. And add conceptual tags to both our knowledge graphs and the content that we're searching, enriching the index and enriching the documents themselves.
Segment:9 Conclusion.
NICOLAS SEYOT: Thank you for your attention today. And very glad to have the opportunity to answer some questions and connect with the other presenters throughout the conference.