Name:
VisualizingInstitutionalResearchActivityUsingPersistentIdentifier
Description:
VisualizingInstitutionalResearchActivityUsingPersistentIdentifier
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/1558f91a-9482-4302-a7ef-8d6afc891567/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H38M09S
Embed URL:
https://stream.cadmore.media/player/1558f91a-9482-4302-a7ef-8d6afc891567
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/1558f91a-9482-4302-a7ef-8d6afc891567/VisualizingInstitutionalResearchActivityUsingPersistentIdent.mp4?sv=2019-02-02&sr=c&sig=ahVSgVnOykPCuEDG6L8ovtBmanFpzOcyu2wM61QoNw8%3D&st=2024-12-08T20%3A40%3A12Z&se=2024-12-08T22%3A45%3A12Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
That was great. We had about 21 participants, so I hope everyone will hop over here. They may open the chat and start watching. It looks like almost everyone has made it over. So, I don't know; we didn't arrange in advance exactly how we wanted to do this.
But my preference is that people use the mic in this session if they want; just unmute yourselves. I think that works well for a conversation. We don't have a whole bunch of prepared prompts for the speakers, so I will be relying on participation from all of you watching, and on questions in the chat. Thank you, Netty, for posting the Google Doc link for the notes. And I will open up the floor.
So just launch right in. If not, you're going to have to hear me ramble on with questions that I wrote down as I was watching. So before I ramble on, do the speakers have anything they want to add, or any thoughts right now?
OK, hearing nothing, I'm just going to start. And I really do encourage everyone to type things in chat or just unmute yourself and talk. So the question that I always... oh, Wendy has her hand up, please go ahead. Save yourselves from me. I'm not trying to save us from you, you know. We have good ORCID adoption in some of our colleges, and I have a pretty good list of publications by these people that are connected to their ORCIDs through DOIs, but they haven't necessarily populated their ORCID records.
So I'm guessing the answer is that if I just learn more about R, I could bypass some of the steps and pick up the process with what I have, rather than having to start from ORCID to pull the list of DOIs and then continue on. Or do I really have to go through the whole process? Does that make sense? Yeah, you know, in RStudio, like any code, things are very step by step and you're executing line by line. And we do have code comments in there that give detailed instructions. So, yes, it is possible that you could come in the middle and give it a spreadsheet of DOIs, or start with a list of ORCID IDs.
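Before dropping your own list into the middle of the pipeline, you would likely need to clean it first. Below is a minimal Python sketch of that input-cleaning step. It assumes the list arrives as plain strings, possibly as full https://orcid.org/ or https://doi.org/ URLs. It is not the project's R code, and the helper names are hypothetical. The ORCID iD check uses the ISO 7064 MOD 11-2 check character that ORCID documents for its identifiers. DOIs are lower-cased because DOI matching is case-insensitive.

```python
import re

# Bare ORCID iD: 16 digits in groups of four; the final check character may be X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$", re.ASCII)
ORCID_PREFIXES = ("https://orcid.org/", "http://orcid.org/")
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def strip_prefix(value: str, prefixes) -> str:
    """Remove the first matching resolver prefix, ignoring case."""
    for prefix in prefixes:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def normalize_orcid(value: str) -> str:
    """Return a bare, upper-cased ORCID iD such as 0000-0002-1825-0097."""
    return strip_prefix(value.strip(), ORCID_PREFIXES).upper()


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character computed over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if the value is a well-formed iD whose check character matches."""
    orcid_id = normalize_orcid(value)
    if not ORCID_PATTERN.match(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def normalize_doi(value: str) -> str:
    """Strip resolver prefixes and lower-case, since DOIs are case-insensitive."""
    return strip_prefix(value.strip(), DOI_PREFIXES).lower()


def clean_inputs(orcid_ids, dois):
    """Deduplicate user-supplied iDs and DOIs, dropping malformed iDs and blanks."""
    orcids = sorted({normalize_orcid(o) for o in orcid_ids if is_valid_orcid(o)})
    doi_list = sorted({normalize_doi(d) for d in dois if d.strip()})
    return orcids, doi_list
```

For example, `clean_inputs(["https://orcid.org/0000-0002-1825-0097"], ["https://doi.org/10.1234/ABC"])` returns `(["0000-0002-1825-0097"], ["10.1234/abc"])`. The same cleaning would apply to a funder's list of ORCID iDs.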
That's a scenario we've already been asked about as well. So yeah, how easy that would be depends on your familiarity with R, but it's possible. So I'll launch into a question as a follow-on to that. There are people who, of course, have data that they may want to use along with ORCID, and who know how to do things.
So I'm curious what other sorts of data people might want to match up, and what sorts of outputs and visualizations they might come up with. I had some things that occurred to me, but I've never used R and I'm not very strong with ORCID yet either, so I'm curious what people have in mind. And just so you know, my headset just beeped. I'm going to have to change batteries; I'm going to do that now.
But don't feel like you have to wait on me. I can start, and then, Olivia, if you have ideas, jump in. There are a few code improvements that we think could be made to link up to other data sources. For example, right now it pulls metadata from Crossref, but connecting to DataCite as well would get you a somewhat wider net. Another is potentially drilling down to the school or department level. Something a little more granular would require a lot of work on the ORCID side, making sure that those departments are filled in or are somehow manually backfilled, and that can get complicated, especially with people who work in interdisciplinary fields.
So I think there are a lot of questions around that, but that's another potential area: pulling in even more data and getting a little more granular. But yeah, if anyone else, or anyone in the room, has ideas, that would be great to hear. Well, I would also just throw out there that Nagin actually had a great idea for a sort of 2.0 version of this. It would do additional pulling of metrics about the DOIs, which could be built into the spreadsheet and then uploaded to Tableau, to get a new screen in Tableau just to visualize those metrics.
So that would be a functionality addition to the whole product. But we built this starting with some open code from other people, and it is openly available. We're happy to see people's interest and would love to see others develop it further. I'd like to keep developing it, and hopefully there'll be more Fellows in the future who can keep expanding it.
Yeah, just to speak to DataCite versus Crossref: when testing with my own university's data for, like, 2002, I was getting hundreds or even thousands of DOIs. The number of DOIs that it could not find metadata for, because they were minted by DataCite, was in the single digits.
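You may want a similar sense of the Crossref-versus-DataCite split for your own DOI list. One option is the doi.org registration-agency lookup. As far as we know, it returns JSON such as `[{"DOI": "...", "RA": "Crossref"}]` for each DOI. The sketch below assumes you have already collected those responses, so it makes no network calls and only tallies the shares. The sample records are made up, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import quote


def ra_lookup_url(doi: str) -> str:
    """doi.org registration-agency lookup URL for one DOI (not fetched here)."""
    return "https://doi.org/ra/" + quote(doi, safe="/")


def agency_shares(ra_records) -> dict:
    """Return each registration agency's share of the looked-up DOIs."""
    counts = Counter(rec.get("RA", "Unknown") for rec in ra_records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {agency: n / total for agency, n in counts.items()}


# Made-up lookup results standing in for real doi.org/ra responses.
sample = [
    {"DOI": "10.1234/example.1", "RA": "Crossref"},
    {"DOI": "10.1234/example.2", "RA": "Crossref"},
    {"DOI": "10.1234/example.3", "RA": "Crossref"},
    {"DOI": "10.5678/dataset.1", "RA": "DataCite"},
]
shares = agency_shares(sample)  # {"Crossref": 0.75, "DataCite": 0.25}
```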
So it would be more complete with DataCite, but that also gave me a sense of the proportion of DOIs that are minted at Crossref versus DataCite. So, Ted has his hand up. Thanks, really nice talk. Unfortunately, I had very choppy video, so it was hard for me to keep up a few times, but...
Can I just ask a quick workflow question? So you're searching ORCID. What's going into ORCID is an institution name, or a Ringgold or ROR ID, or something that identifies an organization. And are you getting back from that a list of DOIs, or a list of ORCIDs?
So, yeah. Before running the script, the most complete approach is to develop a list of all the persistent organization identifiers, email domains, and organization names within your org that you want to cover. A lot of universities have subsidiary labs or things like that, or a hospital system, that you might also want to include; that's your choice.
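Here is a minimal sketch of turning that list of identifiers into a single ORCID public-API search. It assumes the `expanded-search` endpoint and the `ringgold-org-id`, `ror-org-id`, `email`, and `affiliation-org-name` query fields as we understand them from ORCID's search documentation. Please verify the field names and the 1000-row page cap before relying on them. Only public data is searchable, so private emails and affiliations will not match. The identifiers below are placeholders for a hypothetical institution, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public-API search endpoint; check ORCID's documentation before use.
SEARCH_BASE = "https://pub.orcid.org/v3.0/expanded-search/"


def build_affiliation_query(ringgold_ids=(), ror_ids=(), email_domains=(), org_names=()):
    """OR together every identifier that could mark a researcher as affiliated."""
    clauses = [f"ringgold-org-id:{rid}" for rid in ringgold_ids]
    clauses += [f'ror-org-id:"{ror}"' for ror in ror_ids]
    clauses += [f"email:*@{domain}" for domain in email_domains]  # public emails only
    clauses += [f'affiliation-org-name:"{name}"' for name in org_names]
    return " OR ".join(clauses)


def build_search_url(query: str, start: int = 0, rows: int = 1000) -> str:
    """One results page; increase `start` by `rows` to fetch the next page."""
    return SEARCH_BASE + "?" + urlencode({"q": query, "start": start, "rows": rows})


# Placeholder identifiers for a hypothetical institution.
query = build_affiliation_query(
    ringgold_ids=["12345"],
    ror_ids=["https://ror.org/05abcde12"],
    email_domains=["example.edu"],
    org_names=["Example University"],
)
url = build_search_url(query)
```

Combining the identifiers with OR casts the widest net in one query. The later "current affiliation" step is what narrows the results back down.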
In the first step, the script uses all those identifiers and possible ways of identifying the organization to pull back all the ORCID IDs it can find with an affiliation. It then winnows those down to the current researchers from that organization. From there, it pulls all the information about all of their works, all of the things that have DOIs, so...
So it's pulling the ORCIDs of people who have that affiliation as their current, their most recent, work position? Yes. I do a lot of this stuff myself, so I'm a little bit interested in the details. Yes, yes, there are many. Yes, it's the current researchers at your organization. As Nikki said in the beginning, we don't claim that this is 100% truth, because people's ORCID profiles are sometimes inaccurate. So as far as we can tell from the ORCID profiles, these are all the researchers that have your organization as their current affiliation. Right, right. And then you're getting all of the works from those people.
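The "current affiliation" rule discussed here can be sketched as follows. The sketch assumes each researcher's employment entries have already been flattened into simple dicts with `org_id`, `start_year`, and `end_year` keys. That is a hypothetical simplification: real ORCID employment summaries nest organizations and dates more deeply. Under this sketch, a researcher counts as current when their most recently started ongoing position is at one of your organization's identifiers.

```python
from typing import Optional


def most_recent_ongoing(employments: list) -> Optional[dict]:
    """Latest-starting employment entry that has no end year, if any."""
    ongoing = [e for e in employments if e.get("end_year") is None]
    if not ongoing:
        return None
    return max(ongoing, key=lambda e: e.get("start_year") or 0)


def current_researchers(records: dict, org_ids: set) -> list:
    """Researchers whose most recent ongoing position is at one of org_ids."""
    current = []
    for orcid_id, employments in records.items():
        latest = most_recent_ongoing(employments)
        if latest is not None and latest.get("org_id") in org_ids:
            current.append(orcid_id)
    return sorted(current)


# Hypothetical, already-flattened employment data keyed by placeholder names.
records = {
    "researcher-1": [{"org_id": "ORG-A", "start_year": 2018, "end_year": None}],
    "researcher-2": [
        {"org_id": "ORG-A", "start_year": 2010, "end_year": 2016},
        {"org_id": "ORG-B", "start_year": 2016, "end_year": None},
    ],
    "researcher-3": [
        {"org_id": "ORG-B", "start_year": 2012, "end_year": None},
        {"org_id": "ORG-A", "start_year": 2020, "end_year": None},
    ],
    "researcher-4": [],
}
print(current_researchers(records, {"ORG-A"}))  # ['researcher-1', 'researcher-3']
```

This rule is only as accurate as the profiles themselves. For example, a profile whose old position was never closed would still be counted.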
Yeah. And then you're searching Crossref for those? Or you're searching Crossref for other authors? First Crossref needs to get searched for the DOI, so that's another link, and at every step data can drop off. Right, because some works in the ORCID profile... well, first of all, some works are not even in the ORCID profile.
Second, works in the ORCID profile might not have a DOI. Third, that DOI might be one of the proportionally few cases that were minted by DataCite. Still, it will yield thousands of DOIs, and then it's got the metadata on each DOI from Crossref. Yeah, and then it goes and gets the coauthors of that work.
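Here is a sketch of the coauthor step. It assumes a response shaped like the Crossref REST API's `/works/{doi}` output. In that output, `message.author` lists authors with optional `ORCID` (a URI), `authenticated-orcid`, and `affiliation` fields, and organizational authors carry `name` instead of `given`/`family`. The sample record and helper names are hypothetical. The last function shows how the ORCID iDs still to be fetched would be picked out for the next branching step.

```python
from urllib.parse import quote


def crossref_work_url(doi: str) -> str:
    """Crossref REST API URL for one work (not fetched here)."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


def bare_orcid(uri):
    """Crossref gives ORCIDs as URIs; keep only the trailing 0000-... part."""
    return uri.rstrip("/").rsplit("/", 1)[-1] if uri else None


def coauthors(work_json: dict) -> list:
    """One dict per author listed in a Crossref /works/{doi} response."""
    result = []
    for author in work_json.get("message", {}).get("author", []):
        personal = " ".join(p for p in (author.get("given"), author.get("family")) if p)
        result.append({
            "name": personal or author.get("name", ""),  # organizations use "name"
            "orcid": bare_orcid(author.get("ORCID")),
            "authenticated": bool(author.get("authenticated-orcid", False)),
            "affiliations": [a.get("name", "") for a in author.get("affiliation", [])],
        })
    return result


def orcids_to_fetch_next(work_json: dict, already_seen: set) -> list:
    """Coauthor iDs whose ORCID profiles have not been retrieved yet."""
    found = {a["orcid"] for a in coauthors(work_json) if a["orcid"]}
    return sorted(found - already_seen)


# Made-up response; 0000-0002-1825-0097 is ORCID's fictitious example researcher.
sample_work = {
    "status": "ok",
    "message": {
        "DOI": "10.1234/example.1",
        "author": [
            {"given": "Josiah", "family": "Carberry",
             "ORCID": "http://orcid.org/0000-0002-1825-0097",
             "authenticated-orcid": True,
             "affiliation": [{"name": "Example University"}]},
            {"given": "Jane", "family": "Doe", "affiliation": []},
            {"name": "Example Consortium"},
        ],
    },
}
```

Each of the three authors illustrates a different point where data can drop off: an iD present, an iD missing, and an organization with no personal iD at all.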
Then, as the last step, it gets the ORCID profiles of the coauthors. So you see, it's just branching, branching, branching, and it returns what it can. Yeah, but you could also... once you get those authors, once you get the DOI, then you go to Crossref. By the way, I'm using OpenAlex instead of Crossref a lot these days.
It has a lot more. In particular, I think it would have a lot more ORCIDs than Crossref, because they are collecting ORCIDs with a lot of screen scraping and various other things like that. At least in a lot of the situations that I'm looking at, that gives you more ORCIDs than Crossref. It's a simple API. But anyway, once you get the DOI and you get the authors, you can also get their institutions directly from their affiliations, right?
You don't need to take those ORCID IDs and go after the affiliations again, right? Maybe it would be better if we talked about this sometime offline. But yeah, I was going to say that you guys are really getting into the weeds, which is great for this sort of conversation.
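Ted's suggestion can be sketched like this. The sketch assumes an OpenAlex work object in which each entry of `authorships` carries an `author` (with an optional `orcid` URL) and a list of `institutions` (each with an optional `ror` URL). It reads affiliations straight off the work instead of making a second round of ORCID lookups. The sample data, including the ROR-style URLs, is made up. The lookup URL format is an assumption to check against the OpenAlex documentation.

```python
def openalex_work_url(doi: str) -> str:
    """Assumed OpenAlex lookup-by-DOI URL format (not fetched here)."""
    return "https://api.openalex.org/works/doi:" + doi


def institutions_by_author(work: dict) -> dict:
    """Map each author (bare ORCID iD if present, else display name) to RORs."""
    result = {}
    for authorship in work.get("authorships", []):
        author = authorship.get("author") or {}
        orcid_url = author.get("orcid") or ""
        key = orcid_url.rsplit("/", 1)[-1] or author.get("display_name", "unknown")
        result[key] = [inst["ror"] for inst in authorship.get("institutions", [])
                       if inst.get("ror")]
    return result


def authors_at(work: dict, ror: str) -> list:
    """Authors of this work who list the given ROR among their institutions."""
    return sorted(k for k, rors in institutions_by_author(work).items() if ror in rors)


# Made-up OpenAlex-style work; the ROR URLs are placeholders, not real RORs.
sample_work = {
    "authorships": [
        {"author": {"display_name": "Josiah Carberry",
                    "orcid": "https://orcid.org/0000-0002-1825-0097"},
         "institutions": [{"display_name": "Example University",
                           "ror": "https://ror.org/example01"}]},
        {"author": {"display_name": "Jane Doe", "orcid": None},
         "institutions": [{"display_name": "Example Institute",
                           "ror": "https://ror.org/example02"},
                          {"display_name": "Unmatched Lab", "ror": None}]},
    ]
}
```

One design choice is to key each author by ORCID iD when present and by display name otherwise. That keeps authors without an iD in the results, at the cost of name-based keys being less reliable.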
But I see Adam has his hand up as well. Yeah, so I wonder if you might want to put some of that into the notes document for further discussion and follow-up. I think there will be several people who want this level of detail, but I don't know that we have time to get into all the details right now. Right. And Sheila, I'm just about to start that ORCID project with UCAR.
So I will definitely be contacting you. Awesome. So, Adam, did you still want to say something? Yeah, hi. Sorry, I'm on an iPad for various reasons, so I use bits when I can. Great talk, and great to see some other work querying the record. I'll stick the link to our code in the chat.
Sheila knows about it, I do believe. Oh gosh, except I've got like three different things. So we've got another script that does lots of probing of the fabulous ORCID record and is also based on looking at affiliations and stuff. It is currently down, because the code maintainer had a tweak in mind and that tweak has broken the node script.
So at this point I would have pointed you all to it to show exactly what it does. But he's currently fixing it. He said it's a great idea and doesn't want to revert it, because he thinks it's going to be really great when he does fix it: it basically scrapes the ORCID record for every single affiliation on the list. So we already talked about the different kinds of affiliations.
And then it goes through and shows you all of the records for each one. Especially in the UK, a couple of institutions use that as a starting point for advocacy. So they'll go down to the division level and say, you know, we've got tons of work and really good coverage in, for example, the school of chemistry. But you know what, immunology, you're doing really badly.
So they're actually going to talk to department heads about that kind of thing. And we have some developments that we're looking at now; one of the things we're talking about is writing back. We've got links into some of the CRIS systems, but we have two issues with that. One is provenance: in some instances we can show where the ORCID IDs have come from, and everyone's all right with that.
But the other is that in some cases we've made a best guess, right? So we've looked at three or four different DOIs, the ORCID IDs are pretty much co-occurring, and we're fairly happy that that's OK. But there's not an assertion from the actual person that we're attaching that ORCID ID to, and the metadata on where these things come from isn't great. And that kind of actual "authenticated" flag is supposed to be part of the metadata. And this is nice, though, right? So I'm going to talk about this because it's really important: that provenance metadata isn't there, and it should be. And I'm wondering, from the two of you, whether you've looked at that part of the metadata, whether an ORCID is authenticated.
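Below is one hypothetical way to express the kind of "best guess" described here while keeping its provenance visible. An author name that appears with the same single ORCID iD on at least `min_dois` distinct DOIs gets that iD filled in where it is missing. Every filled value is labelled as inferred rather than asserted. This is an illustration under those assumptions, not the method used by Adam's team or by the project's script. The names, DOIs, and iDs are placeholders, and names are assumed to be normalized already.

```python
from collections import defaultdict


def infer_orcids(observations, min_dois=3):
    """observations: iterable of (doi, normalized_name, orcid_or_None)."""
    dois_seen = defaultdict(set)  # (name, orcid) -> DOIs carrying that pair
    for doi, name, orcid in observations:
        if orcid:
            dois_seen[(name, orcid)].add(doi)
    candidates = defaultdict(dict)  # name -> {orcid: number of DOIs}
    for (name, orcid), dois in dois_seen.items():
        candidates[name][orcid] = len(dois)
    inferred = {}
    for name, counts in candidates.items():
        if len(counts) != 1:
            continue  # ambiguous: the name co-occurs with several iDs
        orcid, n = next(iter(counts.items()))
        if n >= min_dois:
            inferred[name] = {"orcid": orcid, "supporting_dois": n}
    return inferred


def fill_missing(observations, inferred):
    """Fill blank iDs from `inferred`, tagging each value with its provenance."""
    filled = []
    for doi, name, orcid in observations:
        if orcid:
            filled.append((doi, name, orcid, "from-source"))
        elif name in inferred:
            filled.append((doi, name, inferred[name]["orcid"], "inferred"))
        else:
            filled.append((doi, name, None, "missing"))
    return filled


# Placeholder observations: same name + same iD on three DOIs, blank on a fourth.
obs = [
    ("10.1234/a", "carberry josiah", "0000-0002-1825-0097"),
    ("10.1234/b", "carberry josiah", "0000-0002-1825-0097"),
    ("10.1234/c", "carberry josiah", "0000-0002-1825-0097"),
    ("10.1234/d", "carberry josiah", None),
    ("10.1234/a", "doe jane", "0000-0000-0000-0001"),
    ("10.1234/b", "doe jane", "0000-0000-0000-0002"),
    ("10.1234/e", "doe jane", None),
]
```

Storing the provenance tag alongside each value means a write-back step could skip or flag inferred iDs rather than presenting them as the researcher's own assertion.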
And, when you've been looking at writing this stuff back, whether or not you've actually considered those issues when updating these records with the ORCIDs that you've found. I'll see if I can find... I will say that we are not looking at writing data back to ORCID records based on the output of the script. It's simply gathering the data that's there and creating a visualization based on that data. The goal is that, theoretically, anyone at any institution can do this themselves and create this kind of visualization. There may be a lot of gaps in it, but they'll get something, right, and it's a starting point. So, yeah, so far we're not looking at writing anything back; that would be up to each institution. We could use this tool as a way to expose gaps and say, oh, it looks like a lot of your researchers don't have their place of employment populated on their ORCID record.
Maybe that institution wants to create an API integration that would add those affiliations to the researchers' ORCID records. But we are not taking this data and trying to make write-back assertions based on it, if that makes sense. Yeah, I guess it's more this: sorry, suppose you're supplementing the information with ORCID IDs that come not from your direct queries, but from places other than the ORCID record.
Is that clear in the way that it's reported? Again, I haven't looked at yours, so I just sent that directly to you. So that's a good point, because I think we are getting some co-author ORCID IDs from the Crossref metadata. Yeah, Olivia, if you want to talk about that. So, I think that's what you said: right now the script is not using any supplemental metadata like you're describing, but it does get the co-authors' ORCID IDs from the Crossref metadata. The other data is there, but it's just very, very, very messy. A code improvement would be to try to make some of that data cleaner so that it could be used to keep filling in blanks, which the script already does to some extent. On the issue you brought up about provenance: this script does not detect whether the ORCID profile information was machine-written or human-written.
We have definitely thought about it a lot, and we have run down examples where we saw things that looked erroneous. Those yielded the slide in the presentation with a few examples of human entry error, as far as we can tell. So I didn't really look into how you detect the provenance, whether it's human- or machine-written, and exclude one or the other, because we just wanted to visualize everything that we could.
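For anyone who does want to separate the two, here is a rough sketch. It assumes items from the ORCID v3.0 public API, where each item's `source` block names either an API client (`source-client-id`) or an ORCID iD (`source-orcid`). It treats client-written items as machine-written and items sourced by the record owner as self-asserted. That mapping is an approximation we are assuming, not a rule from the project. The sample items are made up.

```python
from collections import Counter


def classify_source(item: dict, record_orcid: str) -> str:
    """Rough provenance label for one ORCID record item, from its source block."""
    source = item.get("source") or {}
    client = source.get("source-client-id") or {}
    person = source.get("source-orcid") or {}
    if client.get("path"):
        return "api-client"        # written by a connected application
    if person.get("path") == record_orcid:
        return "self-asserted"     # entered under the researcher's own iD
    if person.get("path"):
        return "other-orcid-user"
    return "unknown"


def provenance_summary(items, record_orcid: str) -> Counter:
    """Count items on one record by rough provenance label."""
    return Counter(classify_source(item, record_orcid) for item in items)


# Made-up work summaries for ORCID's fictitious example record.
items = [
    {"source": {"source-client-id": {"path": "APP-0000000000000000"},
                "source-orcid": None,
                "source-name": {"value": "Example Integration"}}},
    {"source": {"source-client-id": None,
                "source-orcid": {"path": "0000-0002-1825-0097"}}},
    {"source": None},
]
print(provenance_summary(items, "0000-0002-1825-0097"))
```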
But I can say that I really came to believe this could be another good supporting visualization, or argument, for getting your organization to use the member API to write to the ORCID profiles of its researchers. Cool, thank you. That helps with some of the thinking we've got for some of the updates. I'm more than happy to talk offline, because I don't want to monopolize with my many other thoughts; I'll make space for other people. So I'll do a little bit of a pivot here. It seems like a lot of this project was thinking about usage by institutions: usually universities, or other places that have researchers producing articles and scholarly output.
But I wonder if content providers, the publishers and platforms that provide access to this output, would be interested in a similar project. They could look at which authors they're publishing and who those authors are collaborating with. They also have a different dataset, the metrics: they know exactly what kind of usage each article gets, and so forth.
So I'm wondering if there are other stakeholders, or potential "markets" (in quotes, because no one's paying for this data), and what they might do with it. Are there any people on this chat who are not from academia who might want to chime in? Or the speakers? I can say that within our ORCID consortium we have funders as well.
And one of the funding organizations has already contacted us saying that they were really interested in it. Of course, for funders it's going to be a lot different, because they're working with researchers from a bunch of different organizations. So that's another scenario: if the funder has the ORCID IDs of everyone they've funded, there could be a workaround. We'd slide those IDs into the script instead of running it from the beginning, and then go from there.
But of course, that would take some additional work, changing the script and working with that. But yeah, that's definitely a case that has already come up. And then in terms of publishers, I'm not really sure, because we don't really work with publishers, but if anybody else has insights, we'd love to hear them.
I think it would be a similar situation, where they would have to start with a set of DOIs and ORCID IDs. I mean, they would have both. This isn't answering your question and is somewhat tangential, I guess, but we've also talked about potentially having a script, and a visualization that goes with it, for individual researchers.
So still within academia, but really getting to the individual level, where someone can track and follow the work that they're doing. Most importantly, I think, they could use it for things like promotion and tenure or funding applications. It would give them a representation of their work that is not quite out of the box, but a little closer to it.
And especially, as Olivia mentioned, we've talked about how at some point it would be really great to combine this with metrics. So focusing on research impact at the individual level could be useful. Demonstrating that impact for gain, whatever that looks like at the individual level, might be another direction to go in at some point, hopefully with future Fellows.
Yeah, I'll just add to that. We are definitely very much encouraging anyone who is interested to look at the code and experiment with it. Any kind of improvements, changes, or adaptations to the script that we already have are very, very much welcome.
This project literally went from July through December, and we just finished wrapping up our initial testing. So we're definitely looking for beta testers: anyone who wants to start using these resources for their own organization and then give us feedback, or, you know, bring us any questions that come up. This is the debut of these outputs.
We haven't even, like, announced it. Well, we announced to our members that this is coming, but we haven't announced, "Hey, we're ready for you to test it." So, yeah, just keep that in mind, everyone. We'd love to have you test the code, look at it, and figure out additional ways that this could be helpful to you or others.
I have a question, and I can't raise my hand because I'm the host, I guess. I think this has been really interesting because it's very practical. It's doing, it's showing; it's really quite tangible. But when you were doing your development and putting things together, did you find gaps in the landscape that you wish had been filled? Gaps that would have made your practical scriptwriting easier or better, besides the enhancements that you listed?
Were there things where you were like, oh, I just wish things were like X or Y? I mean, it seems like you were really focused on the task at hand and moving forward in a very effective way, and maybe there wasn't time or need for that. So I'm just curious.
Yeah, I mean, there are just many gaps in the data; that's frustrating. So, data gaps, then. Yeah. And we don't really have any control over that, but that's why we're thinking it could be more of a tool for encouraging adoption and completion, and machine writing.
I mean, I mentioned this briefly on a slide, but there's another code improvement I would like more time to work on. It's filling in blanks where the script actually does know the information somewhere but needs another iteration of filling in, you know. I just stopped at a certain point because it became a marginal improvement.
But I think a more elegant programmer could probably get that done better. So I found that to be kind of a frustrating thing, and also just resolving slight text variations of names and stuff. There are many ways that the script tries to do that, but that can always be improved.
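Here is a minimal sketch of one common way to absorb slight name variations, assuming names arrive as free text. It strips accents with Unicode NFKD decomposition, lower-cases, and keeps alphabetic tokens. It then sorts the tokens so that "Smith, J." and "J. Smith" compare equal, and falls back to a `difflib` similarity ratio. This is not the project's matching logic. The 0.9 threshold is an arbitrary assumption. Characters without a decomposition (such as ø) are simply dropped. Token sorting deliberately ignores name order, so it can over-merge people who share tokens.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Accent-free, lower-case, order-insensitive form of a personal name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z]+", stripped.lower())
    return " ".join(sorted(tokens))


def names_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Exact match after normalization, else a difflib similarity ratio."""
    na, nb = normalize_name(a), normalize_name(b)
    if na == nb:
        return True
    return SequenceMatcher(None, na, nb).ratio() >= threshold
```

For example, `normalize_name("José García-López")` gives `"garcia jose lopez"`.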
Cool. So, Ted, go ahead and take the floor. Yeah, I could ask questions for a long time, but one thing: the map was the only visualization you had, right? That's sort of the main visualization? So there are two different ones. There's the Tableau dashboard, which has some high-level stats as well as the map, which looks at collaborations.
And then there's also a script that produces different forms of network visualizations. That's the one I mentioned as being very much in the early stages, and I think it could grow in a lot of ways. So those are the two primary outputs on the visual end. So the network visualizations are graphs, right? Is that what we mean? Yeah, yeah.
Like social network visualizations. Yeah, yeah. OK. I put a link in the paper. Some people are doing interesting work on integrating graphs, and graphics of graphs, into Jupyter notebooks. That might be another interesting way to present this, rather than Tableau Public.
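The network output can be thought of as a weighted coauthorship edge list. The sketch below assumes each work has already been reduced to a list of author keys (ORCID iDs or normalized names). Every pair of coauthors on a work gets an edge, weighted by how many works they share. The result is written as a plain source,target,weight CSV that network tools or a notebook can load. The input format and function names are hypothetical, and this is not the project's network script.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(works: dict) -> Counter:
    """works maps DOI -> author keys; weight = number of shared works."""
    edges = Counter()
    for authors in works.values():
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges: Counter) -> Counter:
    """Number of distinct coauthors per author."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def to_edge_list_csv(edges: Counter) -> str:
    """Plain source,target,weight CSV (assumes keys contain no commas)."""
    rows = ["source,target,weight"]
    rows += [f"{a},{b},{w}" for (a, b), w in sorted(edges.items())]
    return "\n".join(rows)


works = {
    "10.1234/a": ["A", "B", "C"],
    "10.1234/b": ["A", "B"],
    "10.1234/c": ["C"],
}
edges = coauthor_edges(works)
print(to_edge_list_csv(edges))
```

Sorting each author list before pairing means every pair is stored in one canonical order, so (A, B) and (B, A) accumulate into a single edge. A single-author work contributes no edges, which is one reason low-collaboration fields look sparse in this kind of graph.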
And one other thing. We're talking in the next session about instrument identifiers. There are a lot of facilities that run instruments, have users of those instruments, and are interested in tracking usage of those instruments. The instrument identifiers are in DataCite. As we start using identifiers for different kinds of things and new kinds of things, this kind of tool could be really nice. It could show who has used this microscope, or this telescope, or this airplane. Another thing is field stations: who has worked at this field station? We're working with a bunch of field stations in the US and French Polynesia on identifiers for field stations, RORs in this case. So we could sort of plug in lots of different possibilities here. That's a great point, Ted.
There is a relatively new section of the ORCID record (within the last five years, which still feels new) called Research Resources. There, organizations like a field station, or an organization with a special lab, facility, or equipment, can collect the ORCID IDs of the people using that resource. They can then write that to each person's ORCID record.
So that is another direction that we could go with this: instead of searching for employment affiliations. Hopefully, as more organizations start using the Research Resources section of the ORCID record, we could search for that, like who's used this resource, and then map that, or whatever the case may be. Thanks, Paolo, for putting the link in the chat.
But yeah, also having identifiers for the specific resources is going to be really helpful too. So yeah, that sounds awesome. I'm glad you reminded me of that. I was reading about research resources in ORCID, and there was a working group that wrote a document; maybe that's the document Paolo linked to. But I was having a hard time following that up to the present.
I wasn't sure that it had continued to exist. So, yeah, it's still there. Right now it's mostly DOE national labs that are really using it. There are a few other programs using it, including a program I think is called XSEDE. But I mostly know of the national labs: Oak Ridge National Laboratory, for example, will write to your ORCID record if you use their facility, that kind of thing.
But, for example, none of our ORCID US Community members are using it yet. Those are mostly universities, but also some funders, ESIP, and other groups. There's a lot of potential there, so I think it's just a matter of time before more organizations start using it. So XSEDE is the NSF supercomputer thing, right?
Is that the XSEDE that you're talking about? I know it's a collaboration between multiple universities. I've seen an example of when they've written a research resource, but I don't feel qualified to speak to exactly what it is; I just know that it exists. That's probably what it is. Thanks very much. We will definitely talk.
Awesome. So we're coming up on the scheduled end time, so I wonder if there are any final thoughts from anyone, especially from the speakers. And I'm taking this opportunity to remind you: please go to the Google document and add any of your thoughts and questions. It will continue to exist after this question-and-answer session, so it can be a living document.
Can I add one more thing? Please do. Did you try this with any humanities people? I ask because there are fewer collaborations, and I'm wondering how this would fall apart for humanists. We didn't really differentiate between disciplines. We just got all of the researchers, all of the individuals, that had a certain affiliation.
So, yeah: suppose you're in the humanities or another discipline where you're not really publishing or collaborating with others on publications that have a Crossref DOI. Then you're not going to show up in the visualization. So yeah, that's a really good point; it's yet another kind of gap. But yeah, we didn't really look for specific disciplines when we did this.
We just got everyone who would show up in our search. But we are making progress in the humanities. The Humanities Commons platform now allows people to sign in using their ORCID ID, and they are planning to do more with ORCIDs, so that's something to look out for. But it's definitely dominated by STEM disciplines, by far.
And sorry, I'm going to steal just 10 more seconds to add to Greg's comment about how well this works for digital humanities. I think output just looks so different in a lot of the humanities, even within digital humanities, and is not necessarily tied to a DOI. So if anyone does have ideas on ways to incorporate things like digital humanities projects, that would be a really great way to enhance the visual component. It would also give a much better sense of collaborations. A lot of digital humanities and digital scholarship work is highly collaborative but not necessarily tied to a DOI or recorded in an ORCID profile. Yet it is still very meaningful scholarly output.
So anyway, that's definitely another area that I think could grow a lot. That's a really great point; I'm glad we had time to cover that. All right. Well, we are at time. That doesn't mean that we all have to jump off.
If you want to stay on a little later, I think we can watch him for say yes or no here. I'm not going to end the session until everyone's gone, but the next session starts in 15 minutes. So if you need time to do whatever it is you do between sessions, that's the time. Or, if you like, spend your time, as I do, hanging out with others. But thank you, everyone, for your interest in the project. Thank you all. Thanks, Olivia. Thanks again. See you soon. Talk to you soon. Thanks, everyone, for attending.
Have you ever been... Awesome. Later. Good work. Bye-bye. All right. Bye. All right, maybe I should end the session, then. I think so. Everyone has left.
See you next time. Bye bye.