First of all, the purpose of ranking experts is not to give good or bad marks. The objective is to suggest relevant experts on a given topic in order to facilitate a possible connection.
In order to establish this ranking, we need to carry out several operations. First, we query several data sources that return the most relevant documents for a given query (see the article "How are experts identified on ideXlab?" for more details). Different sources sometimes return the same document, so we de-duplicate identical documents. Then we have to disambiguate the authors: before ranking the experts, we must make sure that two different authors with the same name are not confused, which would distort the ranking. This case is more common than it seems, especially for certain nationalities or surnames (Wang, Smith, Martin, Garcia, Rossi, Kim, etc.). The opposite problem also arises: authors whose names are not written the same way in different publications. Juan Luis Da Ponte, for example (fictitious), can be found as JL Daponte or Juan L. Da Ponte depending on the sources and publications. In this case, the author records must be merged, again so as not to penalize the ranking.
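To give an idea of what merging name variants can look like, here is a minimal sketch based on simple name normalisation. It is an illustration only: our actual disambiguation also relies on signals such as co-authors, affiliations and topics, and the normalisation rules and sample names below are assumptions, not our production code.

```python
# Minimal sketch: group author name variants by a normalised "surname + first initial" key.
# Illustrative only; real disambiguation uses many more signals than the name itself.
import unicodedata
from collections import defaultdict

def normalize_name(name: str) -> str:
    """Reduce an author name to 'surname + first initial'."""
    # Strip accents so 'García' and 'Garcia' compare equal.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    parts = ascii_name.replace(".", " ").replace(",", " ").lower().split()
    if not parts:
        return ""
    surname = parts[-1]
    first_initial = parts[0][0] if len(parts) > 1 else ""
    return f"{surname} {first_initial}"

def merge_authors(names: list[str]) -> dict[str, list[str]]:
    """Group name variants that normalise to the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)

# The (fictitious) Juan Luis Da Ponte example from the text.
print(merge_authors(["Juan Luis Da Ponte", "J. L. Da Ponte", "Juan L. Da Ponte"]))
# {'ponte j': ['Juan Luis Da Ponte', 'J. L. Da Ponte', 'Juan L. Da Ponte']}
```

Note that such a simple rule would not catch the "JL Daponte" spelling, which is exactly why disambiguation needs additional evidence beyond the name string.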
Once these operations have been carried out, we need to compare the author's publications with the query. We use a "similarity measure" which compares the keywords of the query to the vocabulary used by the expert in his or her publications. This similarity measure is quite subtle because it is not limited to the keywords of the query: it also takes into account an extended vocabulary, which allows us to be much more precise in measuring the "distance" between the expert and the query. Finally, we add to our 'secret sauce' bibliometric data such as the expert's number of publications, the impact factor of the journals in which the expert has published, and the expert's rank in the author order (in most scientific fields, the first author is the one who has done most of the work, and the last author is often the supervisor of the work).
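Our exact similarity measure is part of the 'secret sauce', but a common stand-in for this kind of comparison is TF-IDF with cosine similarity over an expanded query. The sketch below uses that stand-in; the expansion terms, expert documents and scores are purely illustrative.

```python
# A rough stand-in for a query/expert similarity measure:
# TF-IDF vectors and cosine similarity over an expanded query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query = "lithium battery recycling"
# Hypothetical extended vocabulary that broadens the raw query keywords.
expanded_query = query + " cathode anode electrode recovery circular economy"

expert_publications = {
    "expert_a": "Recovery of cathode materials from spent lithium ion batteries",
    "expert_b": "Deep learning methods for natural language processing",
}

corpus = [expanded_query] + list(expert_publications.values())
tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)

# Similarity of each expert's combined publication text to the expanded query.
scores = cosine_similarity(tfidf[0:1], tfidf[1:]).flatten()
for expert, score in zip(expert_publications, scores):
    print(expert, round(float(score), 3))  # expert_a scores higher than expert_b
```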
All these elements allow us, for each query, to compute a score for each author in real time and thus to establish the ranking.
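As an illustration of how such a composite score could be put together, here is a hedged sketch that combines the similarity measure with the bibliometric signals mentioned above. The weights, caps and example figures are assumptions for the sake of the example, not our actual formula.

```python
# Hedged sketch of a composite score: topical similarity plus bibliometric signals.
# Weights and caps are illustrative assumptions only.
def author_score(similarity: float,
                 n_publications: int,
                 mean_impact_factor: float,
                 first_author_ratio: float,
                 last_author_ratio: float) -> float:
    """Combine topical similarity with bibliometric signals into one score."""
    bibliometric = (0.2 * min(n_publications, 50) / 50        # cap to avoid runaway counts
                    + 0.2 * min(mean_impact_factor, 10) / 10  # journal impact factor
                    + 0.1 * first_author_ratio                # did most of the work
                    + 0.05 * last_author_ratio)               # often the supervisor
    return 0.45 * similarity + bibliometric

ranked = sorted(
    [("expert_a", author_score(0.82, 30, 4.5, 0.4, 0.2)),
     ("expert_b", author_score(0.35, 120, 8.0, 0.1, 0.6))],
    key=lambda pair: pair[1], reverse=True)
print(ranked)  # expert_a first: topical relevance outweighs raw output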
We have chosen not to use the 'h-index' or a publication's number of citations to calculate this score. We consider that taking these elements into account introduces a bias in favour of senior authors and disadvantages the most recent publications, which are also of great interest to our users.
We are encouraged in our choices by the calibration measurements that we perform regularly to verify that the lists established by our algorithms do indeed return the recognised experts in a field. With the help of reference lists, we calculate the "precision" and "recall" metrics of our results, and we can maximize them by adjusting the parameters at our disposal.
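For readers unfamiliar with these metrics, here is a small sketch of the calibration step: comparing a returned ranking against a reference list of recognised experts. The example lists are made up.

```python
# Small sketch of the calibration step: precision and recall against a reference list.
def precision_recall(returned: list[str], reference: list[str]) -> tuple[float, float]:
    """Precision = relevant results / all results returned; recall = relevant results / all relevant experts."""
    returned_set, reference_set = set(returned), set(reference)
    hits = len(returned_set & reference_set)
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall

returned_top = ["expert_a", "expert_b", "expert_c", "expert_d"]
reference_list = ["expert_a", "expert_c", "expert_e"]
print(precision_recall(returned_top, reference_list))  # (0.5, 0.666...)
```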
Having said all this, ranking experts is not an exact science.
We are constantly looking for improvements, as are other companies and academic groups interested in the field.
So do not hesitate to send us your comments or suggestions!