Best Practices for Processing Journals Content
Last updated on October 28, 2025
Each Journals content collection consists of two main folders:
- Backfile Folder:
Contains all XML files from the beginning of the collection up to a specific cutoff date, which is indicated in the folder name. This folder should be ingested once to establish the historical baseline. The two most recent backfiles are kept in the folder.
- Daily Updates Folder:
Contains ZIP files named by date (e.g., 2025-10-15.zip), each including all XMLs uploaded for that specific day. This folder is updated once daily at around 04:00 GMT; however, we suggest that customers process the data slightly later in case of any delays. Note: the date in the name is the date the data was processed, not the date it arrives in the folder (e.g. data processed on 1 January 2025 is delivered under the date 2025-01-01, but arrives first thing on 2 January).
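Because the daily ZIP files are named by date, they can be ordered from the file name alone. The following is a minimal sketch, assuming the ZIPs have already been downloaded to a local folder and follow the YYYY-MM-DD.zip pattern shown above; the function name `daily_zips_in_order` is illustrative and not part of any Elsevier tooling.

```python
import re
from datetime import date
from pathlib import Path

# Assumed naming pattern, taken from the example 2025-10-15.zip.
ZIP_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\.zip$")


def daily_zips_in_order(folder):
    """Return (date, path) pairs for date-named ZIPs, oldest first."""
    dated = []
    for path in Path(folder).iterdir():
        match = ZIP_NAME.match(path.name)
        if not match:
            continue  # ignore anything that is not a date-named ZIP
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # looks like a date but is not a valid one
        dated.append((day, path))
    # Sort on the parsed date only, so the oldest file comes first.
    return sorted(dated, key=lambda pair: pair[0])
```

Files that do not match the pattern are skipped rather than raising an error, so stray files in the folder do not stop a run.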
Processing Strategy
- Initial Ingestion:
Begin by ingesting the backfile folder to capture all historical content.
- Ongoing Updates:
Process the daily updates folder in chronological order, starting from the oldest available date. This ensures that article versions are handled in the correct sequence.
- Handling Article Versions:
Articles may appear multiple times across different update files as they progress through publication stages. To manage this:
- Use the PII (Publisher Item Identifier) to identify duplicates.
- Retain the version with the highest publication stage.
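The version handling above could be sketched as follows. This assumes each article has already been parsed into a dictionary with hypothetical keys `pii` and `item-stage` (how these are extracted from the XML is not covered here), and that the store is a plain dictionary keyed by PII. The ranking follows the publication stages table; treating S300 and S350 as equal, and letting a later record of the same stage replace the earlier one, are assumptions made so that changes to already published articles are picked up when updates are processed in chronological order.

```python
# Rank of each item-stage value; a higher number is a later stage.
# S300 and S350 both mean "Published", so they share a rank (assumption).
STAGE_RANK = {"S5": 0, "S100": 1, "S200": 2, "S250": 3, "S300": 4, "S350": 4}


def apply_update(store, record):
    """Insert or replace a record in store (a dict keyed by PII).

    The incoming record wins if the PII is new, or if its stage is the
    same as or higher than the stored one. Returns True if it was stored.
    An unrecognised stage value raises KeyError.
    """
    pii = record["pii"]
    new_rank = STAGE_RANK[record["item-stage"]]
    current = store.get(pii)
    if current is None or new_rank >= STAGE_RANK[current["item-stage"]]:
        store[pii] = record
        return True
    return False
```

Calling `apply_update` once per article, in the order the daily files are processed, leaves one record per PII at its highest stage.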
Publication Stages (Field: item-stage)
| Stage | Name | Meaning |
|---|---|---|
| S300 or S350 | Published | Article is finalised and published. The article can still change, but the stage remains S300 or S350. S300 is used for newer articles and S350 for older scanned articles. |
| S250 | Article Ready but the Issue is In Progress | Content doesn’t change between S250 and S300/S350. Multiple volumes can be in progress at the same time. |
| S200 | Corrected Proof | Article is ready but has not yet been assigned a volume or issue number. |
| S100 | Uncorrected Proof | Authors correct mistakes in this version. It doesn’t look like the final article; for example, it has line numbers and questions for the author. |
| S5 | Journal pre-proof | This stage was added later and is now the first stage. It is meant for early publishing, e.g., within 48 hours. PDFs contain the author’s manuscript. |
Article Retractions and Removals
Articles that are retracted, removed or retired are marked with:
- docsubtype = 'ret' (retracted)
- docsubtype = 'rem' (removed)
- docsubtype = 'rti' (retired)
These can occur at any stage but typically follow publication (S300). In such cases, the same PII may appear again with updated content and a changed docsubtype. Articles marked as retracted, retired or removed should be deleted from internal systems.
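A minimal sketch of this deletion rule, assuming the same dictionary-keyed-by-PII store and parsed records with hypothetical keys `pii` and `docsubtype`:

```python
# docsubtype values that mark an article as retracted, removed or retired.
WITHDRAWN_SUBTYPES = {"ret", "rem", "rti"}


def apply_withdrawal(store, record):
    """Delete the record's PII from store if the record is withdrawn.

    Returns True if the record is a retraction, removal or retirement
    (whether or not the PII was present in the store), otherwise False.
    """
    if record.get("docsubtype") in WITHDRAWN_SUBTYPES:
        store.pop(record["pii"], None)  # no error if the PII is unknown
        return True
    return False
```

One way to use this is to check each incoming record for withdrawal first and only then apply the normal version handling, so a withdrawn article is never re-inserted by the same update.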
Journal Title-Level Considerations
Newly added journals – Elsevier regularly reviews journals for inclusion in the datasets offering, and newly accepted journals are added three to four times a year. Customers are notified of each addition, and daily updates then include the new titles (where applicable). The backfile is also refreshed to include these titles. To benefit fully from the additions, we suggest periodically updating the backfile.
Changes in rightsholder or ownership – As well as adding new journals to the Elsevier datasets offering, we also remove journals at times (e.g. when Elsevier has divested a title to another publisher). Elsevier will provide a list of ISSNs that need to be removed from the dataset as per the customer contract. Journals will be removed on an annual basis unless urgent contractual reasons mean that some changes have to be made out of cycle.
Discontinued journals – For some discontinued journals Elsevier retains the rights to the content. In this case, we will keep the content in the backfile. If a journal is discontinued and we no longer have the rights to the content, we will treat it as a removal, as described above.
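When a list of ISSNs to remove is received, the matching articles can be purged from an internal store. The sketch below assumes each stored record carries a hypothetical `issn` key and that the store is a dictionary keyed by PII. ISSNs are compared without hyphens and in upper case, since the same ISSN may be written as 1234-567X or 1234567x.

```python
def normalise_issn(issn):
    """Strip whitespace and hyphens and upper-case the check character."""
    return issn.strip().replace("-", "").upper()


def purge_issns(store, issns_to_remove):
    """Delete every record whose ISSN is in the removal list.

    Returns the number of records deleted.
    """
    blocked = {normalise_issn(issn) for issn in issns_to_remove}
    # Collect the keys first so the dict is not modified while iterating.
    doomed = [
        pii
        for pii, record in store.items()
        if normalise_issn(record.get("issn", "")) in blocked
    ]
    for pii in doomed:
        del store[pii]
    return len(doomed)
```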