Monday 13 December 2010

The scholarly discovery curve

We've been discussing recommenders and discovery and serendipity in the scholarly research process of late, and have started to put together some partially-formed ideas. One such idea is described below - please let us know your thoughts in the comments!  This is not a formal view from the project, just some - possibly provocative - ideas we're knocking around.

Information overload

Most of the time, most researchers are struggling with information overload, combined with heavy workloads and time pressure. A key factor is email: too many messages coming in, demanding responses, consideration of papers, reviews, meetings, and so on. Scholars working with "Web2.0" tools may also have RSS feeds, tweets, scholarly networking feeds, and more, adding to their daily information burden. Some of this information is valuable, some less so, and some cannot be evaluated without spending a great deal of time.

It is hard to imagine that yet more information would be welcomed at this point, unless it was of high quality, and could be seen to be of high quality without investigation; for example, a paper suggested by a renowned scholar who understands one's field of work is likely to be interesting, and reading it fruitful. However, a paper suggested by a less trustworthy source may turn out to be of poor quality, irrelevant, or to add nothing to the existing discourse, which only adds to the feeling of overload.

A researcher will be keeping an eye on new and emerging research in his field, perhaps through watching newly issued journals or attending conferences, or RSS feeds and newsletters. Only new material is of interest, as a rule (we assume that our researcher has been working in his field for some time, and therefore has a good knowledge of what is relevant from previously published material).


This changes when a scholar is mugging up on a new subject, or digging deeper to explore a certain angle of investigation. At this point, new information is sorely needed! Let us consider the phases of research here...

The three phases of researching a new field








First of all, the researcher knows nothing (or hardly anything) about the area he is starting to study. Assuming he begins by searching a catalogue or the web, he can try some keywords and will get many results back. Each single search gives the researcher many new papers he has not yet seen; because the field is new to him, almost anything is somewhat useful, as a source of new references from the bibliography, or for some new facts (for almost all facts are novel).  The list of papers to review grows exponentially. There is no shortage of new information or new papers.





Next, the scholar reduces the effort put into finding more papers, and concentrates on parsing and organising the information he has to hand. New papers might come to light, but until he has had time to review what he has already found, it's hard to tell whether these are useful. So the list of papers yet to be read stops growing and levels off.





Finally, the scholar has read many papers in the field, and has identified what seems important and what seems less so. Now is a time of filling in the gaps - trying to see if anyone has published on one very specific topic, making sure that all the papers cited by a major review paper have been looked at, and so on. Now, most of the papers that the scholar comes across have already been read or seen; most of the new papers that might be found aren't relevant to the specific topic of investigation and can be dismissed. The scholar is fussier - the only useful publications are those which add to the body of knowledge built up in the previous phase of research, which fill in the gaps, or which reassure the researcher that there aren't any gaps and that the search has "bottomed out".




Recommenders

Now, in ConnectedWorks we've been thinking about recommenders. Recommenders in the field of scholarly work are still somewhat new and experimental, but most efforts focus on suggesting a few more papers based, in some way, on the papers the researcher has already read. This might mean suggesting other papers written by the author of a paper one has already read, papers whose keywords match those of a paper already read, other papers read by scholars who also read a paper one has read, and so on.
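
To make those three approaches a little more concrete, here is a rough Python sketch of each. It is only an illustration, not a description of any real recommender: the toy catalogue, the reading histories and the function names are all invented for the example, and a real system would need ranking, much larger data and far more care.

```python
from collections import Counter

# Hypothetical toy catalogue: paper id -> metadata.
# All ids, authors and keywords are invented for illustration.
PAPERS = {
    "p1": {"authors": {"Ada"}, "keywords": {"serendipity", "search"}},
    "p2": {"authors": {"Ada"}, "keywords": {"catalogues"}},
    "p3": {"authors": {"Bob"}, "keywords": {"search", "ranking"}},
    "p4": {"authors": {"Cy"}, "keywords": {"genomics"}},
}

# Hypothetical reading histories of other scholars.
OTHER_READERS = [{"p1", "p4"}, {"p1", "p3"}, {"p2"}]


def by_author(read):
    """Unread papers sharing an author with something already read."""
    authors = set().union(*(PAPERS[p]["authors"] for p in read))
    return {p for p, meta in PAPERS.items() if meta["authors"] & authors} - read


def by_keyword(read):
    """Unread papers sharing a keyword with something already read."""
    keywords = set().union(*(PAPERS[p]["keywords"] for p in read))
    return {p for p, meta in PAPERS.items() if meta["keywords"] & keywords} - read


def by_co_readership(read):
    """Unread papers read by scholars whose history overlaps with ours,
    most frequently co-read first."""
    counts = Counter()
    for history in OTHER_READERS:
        if history & read:  # this scholar read something we read
            counts.update(history - read)
    return [paper for paper, _ in counts.most_common()]


if __name__ == "__main__":
    read = {"p1"}
    print(by_author(read))
    print(by_keyword(read))
    print(by_co_readership(read))
```

Note that all three start from what has already been read, which is exactly why their usefulness depends so much on the phase of research the scholar is in.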

Let's consider how effective this might be for the phases of research we just discussed - starting with the time when the scholar is not exploring a new field, but just doing day to day work. During this phase, new papers are of interest, but high-reputation sources are prized over those which might just add to information overload without adding research value. A quality recommendation from a trusted recommender system would be valued, but can any existing system deliver this? Also, how well do current systems do at recommending new papers, as opposed to old ones an experienced researcher has probably already seen?

Then, we have a time of intense study into a new area, where many papers are found through every search or in every bibliography and are added to the list to read. It is hard to imagine that a recommender could be more effective than a keyword search here - and a small number of additional papers brings little additional value to the process. The researcher's basic search techniques are very effective and need little supplement.

After this, the researcher is reading and digesting papers, and occasionally coming across a new one which he hasn't read before. If the recommender didn't know exactly what was being read, or wasn't entirely up to date, it would most likely suggest papers which were "already on the pile", and this would be more of an annoyance than a gain. As such, a recommender built into a reference management tool which also tracked reading might well be helpful - although this might become less effective if the researcher's colleagues hand him printouts of work they have found useful!
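
As a small illustration of why the recommender needs to know the state of the pile, here is a minimal Python sketch of the filtering step. The function name and the paper ids are made up for the example, and it assumes the reference manager can supply both the papers already read and those queued for reading, which is precisely what a printout from a colleague would slip past.

```python
def drop_known(suggestions, already_read, reading_pile):
    """Keep only suggestions the researcher has neither read nor queued.

    `suggestions` is an ordered list of paper ids; the other two arguments
    are collections of paper ids the tool knows about (an assumption: the
    tool cannot know about papers read outside it).
    """
    known = set(already_read) | set(reading_pile)
    return [paper for paper in suggestions if paper not in known]


if __name__ == "__main__":
    suggestions = ["a", "b", "c", "d"]
    print(drop_known(suggestions, already_read={"a"}, reading_pile={"c"}))
```

If the tracked sets are out of date, the filter quietly lets repeats through, and the suggestions become a nuisance again.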

Finally, the scholar is filling in the gaps. Here, again, a useful recommendation system would need to consider exactly what had been read so far - repeat papers are a nuisance at this phase. The researcher's field of interest is now very narrow indeed - anything outside the scope of the study, or even just outside the scope of the gap-filling, is not of interest. It would be a very high quality recommender indeed which could deliver papers to these exacting standards.



Tentative conclusions

As such, it seems to us that many recommenders fitting the common template of suggesting papers based on papers already read may not provide much scholarly benefit for researchers already established in their main field of study.

What do you think? Please let us know in the comments. 



(Many thanks to Dan Sheppard from the JISC Library Widgets project for cross-fertilization of ideas in this area!)