Ayman Maqsood and T. Jesper Jacobsson*
Institute of Photoelectronic Thin Film Devices and Technology, Key Laboratory of Photoelectronic Thin Film Devices and Technology of Tianjin, College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China. E-mail: jacobsson.jesper.work@gmail.com
First published on 2nd November 2022
One of the most common metrics of success for a perovskite solar cell is the device efficiency. Other metrics, like long-term stability and price, are also vital for commercialisation, but it is the increase in device efficiencies that has given the halide perovskites their place in the spotlight. From an academic perspective, another important metric is a paper's number of citations. That may be a flawed measure of a paper's importance but, for better or worse, it is used as a key indicator in evaluations and is thus of in-universe academic importance. A recurrent theme in perovskite research has been a race towards higher solar cell efficiency, and a common sentiment has been that efficiencies must be high for the work to be taken seriously and for it to attract citations. This paper takes a closer look at that assumption by analysing citation data, as a comparative measure of success, with respect to device efficiencies extracted from the Perovskite Database Project. An analysis of 7330 papers reporting original device data shows that, except for papers that report record or close-to-record PCE-values, the device efficiency has little effect on the number of citations; outside the top papers, it has close to zero predictive power. The PCE-values are also only weakly linked to the journal impact factor. The device experimentalist can thus rest assured that unless top-class efficiencies are within reach, the perovskite community will, on average, see beyond the device efficiencies and recognise the work for its other qualities. An increased focus on the core message may thus be more valuable, with a better impact payoff, than an intense optimisation effort that increases the efficiency by one or two more percentage points.
While aspects other than efficiency merit attention and scrutiny, the race for efficiency easily gets the upper hand. What is the efficiency? That is regularly the first, and sometimes the only, question asked of the experimentalist. Taking this question and the attention given to each new record into account, an implication is easily hypothesised: if only your devices were a bit better, your research would be more valuable, your conclusions better founded, and the impact of your work would greatly increase. This assumption has some merit. High efficiencies are, after all, central for technological success, behaviour observed in poor devices may be artifacts absent in the more relevant case of better devices, and new records do get a disproportionate amount of attention. In this paper we show that, in terms of impact, this idea may not be entirely true, which is interesting as it is an idea with potentially negative side effects. One example of the latter is the slippery slope between considering efficiencies important and considering them the only thing of importance. A mental shift in that direction can easily result in excessive resources being devoted to labour-intensive device optimisation with limited outcome in terms of increased scientific understanding. In a worst-case scenario, it may even incentivise sloppy statistics, cherry-picking of results, and neglect of best practices during measurements. This could ultimately result in the dissemination of overinflated device metrics, reproducibility problems, and a general erosion of trust.
A data analysis from the neighbouring organic PV community indicates a systematic problem with overinflated device performance.5 There is little reason to believe that the perovskite community would be spared from such problems, especially given documented perovskite peculiarities like JV-hysteresis,6–8 initial burn-in,9 and problems with stability.10–12 The perovskite community is by no means oblivious to those perils, and much effort has recently been devoted to the development of measurement protocols, both for the general evaluation of emerging solar cell materials,13–15 and for perovskites specifically.16–20
In this paper, we approach the problem by analysing citation data to take a closer look at the idea that a project's impact is primarily derived from its device efficiencies. The number of citations may be a flawed measure of a paper's importance.21–23 The underlying assumption that it correlates with the insights provided is, however, not unreasonable, and for better or worse, it is a key indicator when grants, positions, and prestige are distributed. Citations are thus, by construction and usage, a proxy for the in-universe academic importance of a paper, even if that may differ from its non-academic significance. Citation data therefore provide an entry point for analysing the impact of device efficiencies. This is done here by utilising the Perovskite Database Project,24,25 which records data for most of the perovskite solar cell devices described in the literature up to early 2020, and increasingly also for more recent devices. In December 2021, the database contained measured device efficiencies for over 42 000 devices, described in 7330 papers. The highest efficiency for each paper has been extracted, and the corresponding citation data were downloaded from Crossref.org on 2022-03-06.
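The per-paper reduction described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' pipeline: it assumes that the device records have already been reduced to (DOI, PCE) pairs and that citation counts have been fetched into a dictionary keyed by DOI (for instance from the `is-referenced-by-count` field returned by the Crossref REST API). The function names `top_pce_per_paper` and `join_citations`, and the example DOIs, are hypothetical.

```python
def top_pce_per_paper(devices):
    """Return {doi: highest PCE} from an iterable of (doi, pce) device records.

    Hypothetical helper: records with a missing PCE (None) are skipped.
    """
    best = {}
    for doi, pce in devices:
        if pce is None:
            continue
        if doi not in best or pce > best[doi]:
            best[doi] = pce
    return best


def join_citations(best_pce, citations):
    """Pair each paper's top PCE with its citation count.

    `citations` is assumed to map DOI -> citation count; papers without
    citation data are dropped. Output is sorted by DOI for reproducibility.
    """
    return [
        (doi, pce, citations[doi])
        for doi, pce in sorted(best_pce.items())
        if doi in citations
    ]


# Toy example with made-up DOIs and values.
devices = [("10.0000/a", 12.5), ("10.0000/a", 15.1), ("10.0000/b", 9.0),
           ("10.0000/b", None), ("10.0000/c", 20.0)]
best = top_pce_per_paper(devices)
print(join_citations(best, {"10.0000/a": 30, "10.0000/b": 4}))
# [('10.0000/a', 15.1, 30), ('10.0000/b', 9.0, 4)]
```

Keeping only the best device per paper means that each paper contributes exactly one point to the citation analysis, regardless of how many devices it reports.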
Fig. 1 (a) Heatmap of the top device efficiency in papers vs. publication year. (b) Efficiency thresholds vs. publication year, i.e., the device efficiency required to belong to the 20, 10, or 5% of the papers that report the highest device performance. (c) Distribution of citations per paper for the entire dataset. For data separated per year, see ESI.† To prevent undefined log-values, the citation count of papers with zero citations has been set to 1. (d) Highest reported PCE per paper vs. publication year with the number of citations given as point size. (e) The data in (d) averaged out into a heatmap. (f) The fraction of the citations attracted by the top papers.
A few papers have gained the status of breakthrough papers and receive a disproportionate number of citations. Not surprisingly, the papers with the highest efficiency at any given time tend to be among those (Fig. 1d and e). A record is, after all, state-of-the-art by default. It is, however, far from a winner-takes-all situation, and the number of citations per paper approximately follows a log-normal distribution, both for the complete dataset and for data separated by year (Fig. 1c and ESI†). Quantitatively, the top 5% of papers, with respect to device efficiency, attract around 20% of the total number of citations (Fig. 1f).
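The two statistics in the paragraph above, the share of citations going to the top papers and the log-normal shape of the citation distribution, can be illustrated with a short Python sketch. It assumes that each paper is represented by a hypothetical (PCE, citations) tuple and, following the Fig. 1c caption, replaces zero citation counts with 1 before taking logarithms. The function names are illustrative, and rounding the number of top papers to the nearest integer is an assumption.

```python
import math
from statistics import mean, pstdev


def top_share_of_citations(papers, top_fraction=0.05):
    """Share of all citations attracted by the top `top_fraction` of papers by PCE.

    `papers` is a list of (pce, citations) tuples. The number of top papers is
    rounded to the nearest integer, with a minimum of one.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    total = sum(c for _, c in ranked)
    return sum(c for _, c in ranked[:n_top]) / total


def log_normal_parameters(papers):
    """Mean and population standard deviation of log10(citations), zero counts set to 1.

    For a log-normal distribution these are the maximum-likelihood estimates
    of its two parameters (in log10 units).
    """
    logs = [math.log10(max(c, 1)) for _, c in papers]
    return mean(logs), pstdev(logs)


papers = [(20.0, 50), (18.0, 30), (15.0, 10), (10.0, 10)]
print(top_share_of_citations(papers, top_fraction=0.25))  # 0.5
```

A winner-takes-all situation would push the top share towards 1, whereas a share of around 20% for the top 5% indicates a skewed but broad distribution.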
What may be more interesting than the attention given to top papers is how the citations are distributed among the remaining papers. To address this, the papers have been sorted by their highest device efficiency, and the cumulative citation count has been computed and divided by the total number of citations. This is illustrated for papers published in 2019 in Fig. 2a, where the corresponding device efficiencies are also given. The cumulative citation density takes the form of a straight line for papers with lower device efficiencies and bends upwards at the highest efficiencies. The upward bend mirrors the disproportionate number of citations attracted by the top papers. What may be surprising is how straight and smooth the part for lower-efficiency papers is, and how far it stretches.
Fig. 2 (a) Papers are ordered with respect to the efficiency of the best device reported in them. The cumulative density of citations (i.e., the cumulative sum divided by the total number of citations) is plotted with respect to paper number (divided by the total number of papers to get a scale between 0 and 1). A straight line is fitted to the data between 0 and 0.8. The corresponding device efficiency is given as a red dashed line. (b) Average of the logarithm of the citations vs. PCE. The PCE data has been binned in 0.5% intervals, and the size of the dots corresponds to the number of papers with that efficiency. Straight lines have been fitted to the data for the top 20% and bottom 80% of all papers. The analysis is done for papers published in 2019. Corresponding data for 2013–2020 is given in the ESI.† (c) The same data as in (b) but given in the form of a box plot which illustrates the spread of the data.
This shows that for papers not competing in the higher echelons of device efficiency, the PCE has little impact on the average number of citations. This also holds true for the small fraction of papers reporting very modest efficiencies, possibly because they focus on non-device-related aspects.
There is no clear-cut boundary, but at around 80%, in terms of the number of papers sorted by device efficiency, the efficiency begins to have a more positive impact on the average number of citations. That is, for any given year, only papers among the top 20% will, on average, see a clear positive impact on the number of citations due to high device performance. The absolute PCE value for this threshold increases from year to year (Fig. 1b and Table 1).
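The year-dependent threshold, i.e. the lowest top-PCE that still places a paper among the top 20% of that year, can be sketched as follows. This is an illustrative Python sketch, not necessarily how the PCE cutoffs in Table 1 were computed: it assumes papers given as hypothetical (year, PCE) tuples and a rank-based cutoff, with the number of top papers rounded to the nearest integer.

```python
from collections import defaultdict


def pce_cutoffs(papers, top_fraction=0.2):
    """Map each publication year to the lowest PCE among its top `top_fraction` papers.

    `papers` is an iterable of (year, pce) tuples (hypothetical format).
    """
    by_year = defaultdict(list)
    for year, pce in papers:
        by_year[year].append(pce)
    cutoffs = {}
    for year, values in sorted(by_year.items()):
        ranked = sorted(values)
        n_top = max(1, round(top_fraction * len(ranked)))  # at least one top paper
        cutoffs[year] = ranked[len(ranked) - n_top]
    return cutoffs


papers = [(2019, float(p)) for p in range(1, 11)] + [(2020, 21.0), (2020, 18.0)]
print(pce_cutoffs(papers))  # {2019: 9.0, 2020: 21.0}
```

Changing `top_fraction` to 0.1 or 0.05 gives the other two thresholds plotted in Fig. 1b, under the same assumptions.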
To analyse this further, the PCE values have been binned, and the average of the logarithm of the number of citations for the papers in each bin has been plotted with respect to PCE (Fig. 2b). In this representation we see a small increase in the number of citations with increasing PCE for the bottom 80% of the papers. A linear regression to that part of the data gives a positive slope; expressed as a multiplicative factor on the number of citations (dC/dx in Table 1), it corresponds to about 1.04 per percentage point when averaged over the years from 2014 onwards. One extra percentage point of reported efficiency thus, on average, leads to approximately 4% more citations. If a line is instead fitted to the remaining top 20% of papers, the corresponding factor is 1.65, i.e., roughly 65% more citations per extra percentage point.
The spread in the citation data with respect to device efficiency is, however, large (Fig. 2c), and the predictive power of the linear regression is thus low. For the bottom 80% of papers, the R²-value is only 0.02 (after 2013), so the device efficiency explains essentially none of the variance in the number of citations for those papers. For the top 20% of papers, the R²-value is higher (0.18), but the device efficiency still explains only a minority of the variance in the number of citations.
The analysis has been done for each year between 2013 and 2020 (Table 1 and ESI†). Slight variations in the details are seen between the years, but the overall trends are surprisingly consistent given that they are based on empirical citation data.
Table 1 Linear fits of the number of citations, C, vs. top device efficiency, x (PCE, %), for the bottom 80% and the top 20% of papers, per publication year.

| Year | PCE cutoff (%) | Bottom 80%: dlog(C)/dx | Bottom 80%: dC/dx | Bottom 80%: MEA | Bottom 80%: R² | Top 20%: dlog(C)/dx | Top 20%: dC/dx | Top 20%: MEA | Top 20%: R² |
|---|---|---|---|---|---|---|---|---|---|
| 2013 | 12.0 | 0.08 | 1.20 | 0.41 | 0.15 | 0.01 | 1.01 | 0.34 | 0.00 |
| 2014 | 13.6 | 0.03 | 1.07 | 0.34 | 0.04 | 0.23 | 1.68 | 0.39 | 0.22 |
| 2015 | 15.3 | 0.02 | 1.05 | 0.36 | 0.02 | 0.20 | 1.57 | 0.37 | 0.19 |
| 2016 | 16.9 | 0.02 | 1.03 | 0.37 | 0.01 | 0.17 | 1.47 | 0.34 | 0.16 |
| 2017 | 18.0 | 0.01 | 1.03 | 0.37 | 0.01 | 0.20 | 1.60 | 0.32 | 0.17 |
| 2018 | 18.9 | 0.01 | 1.02 | 0.39 | 0.01 | 0.21 | 1.64 | 0.33 | 0.13 |
| 2019 | 19.6 | 0.01 | 1.03 | 0.36 | 0.02 | 0.25 | 1.79 | 0.35 | 0.15 |
| 2020 | 20.1 | 0.02 | 1.04 | 0.37 | 0.03 | 0.25 | 1.76 | 0.27 | 0.25 |
Another measure of a paper's significance is the journal impact factor. Given that a small fraction of papers accounts for a disproportionate share of all citations, the impact factor is a dubious measure of a single paper's merits. This is well known, yet widely ignored, and in some countries it remains a key performance measure in academic evaluations.
In the dataset, 335 journals are represented. Many of them feature only one or two perovskite articles; 80% of all papers are found in 44 journals, and 95% in 120 journals (ESI†). Not surprisingly, for perovskite papers too, the average number of citations correlates with the journal impact factor (Fig. 3a). The journals with the highest impact factors not only get the most citations but also feature reasonably high average device efficiencies (Fig. 3b). When the data are broken down on a year-by-year basis, the journal impact factor clearly correlates with the number of citations (Fig. 3c and ESI†). It also correlates with device efficiency over the entire range of journal impact factors (Fig. 3d and ESI†). The latter correlation is not particularly strong: a linear fit of the PCE vs. impact factor gives an average R²-value of 0.11 (ESI†). It is, however, stronger than the correlation between PCE and the number of citations. This is curious, as a higher PCE is correlated with a higher impact factor, which is correlated with more citations, yet the PCE, as discussed above, has almost no correlation with the number of citations except for the best devices. It is difficult to disentangle all possible reasons behind this observation. While speculative, one interpretation is that the idea that a paper's merits are based on its reported PCE-values is more strongly expressed in the review process than in the subsequent considerations that lead to the paper being cited. The top journals, e.g., Nature, Science, and a few more, are outliers. They tend to feature new records and larger breakthroughs and score high in both reported PCE-values and citation frequency. For the remaining journals, the PCE-values seem to be one factor in the sorting that is part of the review process, where authors chase the most prestigious journal possible for their manuscripts while reviewers judge their merits.
Once published, however, most papers appear to be cited for reasons extending beyond PCE-values, and those reasons appear to be correlated with the manuscripts' chance of being accepted by a more prestigious journal. This would imply that the perovskite community may judge the importance of high PCE-values differently when reviewing papers than when acting as authors citing papers.
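Counts such as "80% of all papers are found in 44 journals" can be obtained by sorting the journals by their number of perovskite papers and accumulating until the target share is reached. A minimal Python sketch, assuming a hypothetical list with one journal name per paper (the function name is illustrative):

```python
from collections import Counter


def journals_covering(journal_per_paper, fraction):
    """Smallest number of journals that together hold at least `fraction` of all papers."""
    counts = sorted(Counter(journal_per_paper).values(), reverse=True)  # largest journals first
    target = fraction * len(journal_per_paper)
    covered = 0
    for k, n_papers in enumerate(counts, start=1):
        covered += n_papers
        if covered >= target:
            return k
    return len(counts)


journals = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
print(journals_covering(journals, 0.8), journals_covering(journals, 0.95))  # 2 4
```

Taking the largest journals first guarantees that the returned number is the minimum needed to reach the target share.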
Fig. 3 (a) Average number of citations vs. journal impact factor for journals with perovskite papers. Each circle corresponds to one journal, and the size of the circles corresponds to the number of perovskite papers. (b) Median top-PCE vs. journal impact factor for the complete dataset. Each circle corresponds to one journal, and the size of the circles corresponds to the median of the citations for all perovskite papers in that journal. (c) Average of the logarithm of the number of citations per journal vs. journal impact factor for all papers published in 2019. (d) A boxplot of device efficiency binned in 0.5% intervals vs. journal impact factor for all papers published in 2019. Complementary figures are found in the ESI.†
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2ya00261b
This journal is © The Royal Society of Chemistry 2022