Enriched (FAIR) journals data also available

Last updated on November 22, 2022

With approximately 1.8 million research articles published a year, it can be difficult both to find and to access the information you need. The lack of consistent terminology across articles makes it hard to infer and integrate meaning. Raw data is unstructured, with different companies and people using multiple terms for the same thing. Unless you extend your search to cover every variant, which is itself no easy task, it is hard to be sure you have not missed something. Different terms are not the only problem: even small inconsistencies, such as variations in capitalization, can cause issues in data integration, search and analysis.

"The biggest roadblock is that people who generate the data usually don't analyze it, and if you don't use the data, then you don't realize that even small things like consistent capitalization make a big difference downstream."
- Senior genomics data scientist, small Pharma.

To make their data usable, companies can spend around 70% of their time curating it and only 30% analyzing it, even though analysis is where the biggest gains are to be had. Curation covers data models, ontologies and dictionaries. It is a big investment that requires heavy engineering effort, but it pays off by enabling users to draw more, and new, inferences.

Elsevier created the Journals FAIR data set to address these data curation and cleaning issues. It contains Life Sciences journal content from ScienceDirect that has been made FAIR (Findable, Accessible, Interoperable and Reusable). Content within the full text of articles is identified and tagged with entities from 24 different vocabularies, which together contain more than 20 million synonyms and are specifically tuned for named entity recognition (NER) text analytics.
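Elsevier's own tagging pipeline is not described on this page. As a rough illustration of the general idea behind dictionary-based entity tagging with synonyms, the minimal Python sketch below maps every synonym, in any capitalization, to a preferred term in a named vocabulary. The vocabularies, terms, and the `build_lookup` and `tag` functions are hypothetical examples, not part of the data set or of any Elsevier API; production NER pipelines must also deal with ambiguity, tokenization and scale.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical mini-vocabularies (vocabulary -> preferred term -> synonyms).
# Illustrative only; these are NOT the vocabularies used in Journals FAIR.
VOCABULARIES: Dict[str, Dict[str, List[str]]] = {
    "Genes": {"TP53": ["TP53", "p53", "tumor protein p53"]},
    "Diseases": {"Breast neoplasms": ["breast cancer", "breast carcinoma", "breast tumour"]},
}


class Entity(NamedTuple):
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # surface form exactly as written in the text
    vocabulary: str   # vocabulary the matched synonym belongs to
    preferred: str    # normalized preferred term for the concept


def build_lookup(vocabs: Dict[str, Dict[str, List[str]]]) -> Dict[str, Tuple[str, str]]:
    """Map each lower-cased synonym to its (vocabulary, preferred term)."""
    lookup: Dict[str, Tuple[str, str]] = {}
    for vocab, concepts in vocabs.items():
        for preferred, synonyms in concepts.items():
            for synonym in synonyms:
                lookup[synonym.lower()] = (vocab, preferred)
    return lookup


def tag(text: str, lookup: Dict[str, Tuple[str, str]]) -> List[Entity]:
    """Tag synonym occurrences case-insensitively, taking the longest match at each position."""
    if not lookup:
        return []
    # Longest synonyms first, so "tumor protein p53" wins over the "p53" it contains.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    entities = []
    for match in pattern.finditer(text):
        vocab, preferred = lookup[match.group(0).lower()]
        entities.append(Entity(match.start(), match.end(), match.group(0), vocab, preferred))
    return entities


if __name__ == "__main__":
    lookup = build_lookup(VOCABULARIES)
    sentence = "Mutations in Tumor Protein P53 are common in Breast Cancer; TP53 status matters."
    for entity in tag(sentence, lookup):
        print(f"{entity.text!r} -> {entity.vocabulary}: {entity.preferred}")
    # Prints:
    # 'Tumor Protein P53' -> Genes: TP53
    # 'Breast Cancer' -> Diseases: Breast neoplasms
    # 'TP53' -> Genes: TP53
```

Because every surface form resolves to a single preferred term, "Tumor Protein P53" and "TP53" are counted as the same concept, which is exactly the kind of inconsistency that otherwise splits search results and downstream counts.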

For more information, please visit the FAIR Journals Data Support Center.
