Sometimes ‘Year’ Isn’t a Year: How Web of Science Date Fields Mislead Bibliometric Analysis
Trend analysis in bibliometrics depends on accurate dates. But in Web of Science, a main source of bibliometric information, the date fields in exported records and the web search function do not treat “year” the same way. The official publication year of a document is recorded in the Publication Year field, labelled “PY”. However, some articles are published online before their journal issue officially appears. The date of online publication is often, but not always, recorded in the Early Access field, “EA”. The search function checks both fields, which can skew results.
Why Dates Matter
Publication year is a fundamental variable in bibliometric analysis. It underpins trend studies, growth curves, and comparisons across authors, countries, or disciplines. If the definition of “year” is inconsistent, the analysis can be misleading.
My post, The Solo Act of Marxian Profitability Research: A Bibliometric Glance, presents an example of how publication year sits at the core of a bibliometric analysis.
The Problem
In data exports from Web of Science:
- Publication Year (PY): the official year an article appears in a journal issue.
- Early Access date (EA): the date an article is first published online, ahead of its journal issue.
The search function checks both of these fields against the specified date range and returns records where either PY or EA falls within it. This is where the distortion creeps in: for a researcher interested only in the official publication year, it deflates early years and inflates later ones.
For instance, an article published online in 2024 but assigned to a 2025 issue may surface in searches filtered on either year. Counted by official publication year, it belongs only to 2025. But if your date range ends in 2024, its appearance inflates that year’s count. Conversely, if the range starts in 2025, the article may be excluded entirely, deflating the early years.
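The inflation case can be made concrete with a toy model. The sketch below is only an illustration of the either-field matching described above, not Web of Science's actual implementation; the records and the keys `py` and `ea_year` are invented for the example.

```python
# Toy records. "py" and "ea_year" are hypothetical keys chosen for this sketch,
# not the field names used in a real Web of Science export.
records = [
    {"title": "A", "py": 2023, "ea_year": None},  # no early access
    {"title": "B", "py": 2024, "ea_year": 2023},  # online 2023, issue 2024
    {"title": "C", "py": 2025, "ea_year": 2024},  # online 2024, issue 2025
    {"title": "D", "py": 2026, "ea_year": None},  # outside the window
]


def in_range(year, start, end):
    """True if a year is present and lies within [start, end]."""
    return year is not None and start <= year <= end


def search_hits(records, start, end):
    """Model of the search described above: match on PY or on the EA year."""
    return [
        r for r in records
        if in_range(r["py"], start, end) or in_range(r["ea_year"], start, end)
    ]


def official_hits(records, start, end):
    """What a researcher counting by official year wants: match on PY only."""
    return [r for r in records if in_range(r["py"], start, end)]


hits = [r["title"] for r in search_hits(records, 2023, 2024)]
official = [r["title"] for r in official_hits(records, 2023, 2024)]
print(hits)      # titles returned by the modelled search
print(official)  # titles whose official year is in the window
```

In this toy data, record C reaches the 2023 to 2024 result set through its early access year alone, even though its official year is 2025.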
And that is how I came across this issue.
A Solution
I expanded my search date range by a few years on either side. This ensured I captured early access documents that would otherwise sit outside the window. I then filtered the exports by the PY field because, in my experience, PY is almost always filled and consistent with the official publication date.
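The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not BiblioPy's code: it assumes a tab-delimited export whose header row uses the field tags (`PY`, `EA`), and the sample rows, the EA date format, and the function name `filter_by_py` are all made up for the example.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for a tab-delimited export from a deliberately widened search.
# Real exports carry many more columns; these rows are invented.
sample_export = (
    "TI\tPY\tEA\n"
    "Paper one\t2019\t\n"
    "Paper two\t2020\tDEC 2019\n"
    "Paper three\t2025\tNOV 2024\n"
    "Paper four\t\tJAN 2025\n"
)


def filter_by_py(rows, start, end):
    """Keep rows whose PY lies in [start, end]; set aside rows with no usable PY."""
    kept, missing = [], []
    for row in rows:
        py = (row.get("PY") or "").strip()
        if not py.isdigit():
            missing.append(row)  # review these by hand rather than dropping silently
            continue
        if start <= int(py) <= end:
            kept.append(row)
    return kept, missing


rows = list(csv.DictReader(io.StringIO(sample_export), delimiter="\t"))
kept, missing = filter_by_py(rows, 2020, 2024)
counts = Counter(int(r["PY"]) for r in kept)  # documents per official year

print([r["TI"] for r in kept])
print([r["TI"] for r in missing])
print(dict(counts))
```

Rows with an empty PY are returned separately rather than discarded, since PY is almost always filled but not guaranteed to be.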
This issue, like author counts, shows why bibliometric workflows need transparent tools. That’s the principle behind my project, BiblioPy.
About BiblioPy
BiblioPy is my open-source project designed to make bibliometric workflows transparent, modular, and reproducible. By exposing parsing logic and metadata definitions, it helps researchers avoid hidden pitfalls in bibliometric analysis.
Note: BiblioPy is still under active development. Features are being added iteratively, and feedback from early users is welcome.
If your organisation runs bibliometric or research-output analysis and wants a second set of eyes on data integrity, get in touch. This is exactly the kind of workflow audit Dianoetic does.