What is FAIR Journals Data?

Last updated on January 16, 2025

Relevant, high-quality journal data that you can quickly combine with your own data.

Elsevier created the FAIR Journals Data set to address data curation and cleaning issues. It contains FAIRified life sciences full-text journal content from ScienceDirect and is available in SciBite Search or via SFTP.

Many literature databases let users search across abstracts, but a wealth of information is “hidden” within the full text. Abstracts usually consist of short, succinct sentences focusing on the most important findings. The full text, by comparison, contains the methods, models, tables, experimental details and much more. Analysis has found that most of the claims in an article are not reported in its abstract, and that many explicit protein-protein interactions, for example, are mentioned only in the full text.

With about 1.8 million research articles published each year, finding the information you need is difficult. Integrating research is also challenging because terminology is inconsistent across articles.

With FAIR Journals Data, customers receive two files for each article in the therapeutic area(s) they subscribe to: an XML file containing the full text, and a JSON file that combines normalized fields extracted from the XML with enrichments.
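As a rough illustration of working with this per-article delivery, the Python sketch below pairs each article's XML and JSON files and loads them using only the standard library. It assumes, hypothetically, that both files share a base name (for example <article-id>.xml and <article-id>.json) and sit in one local directory, such as an SFTP download folder. The actual file naming and JSON schema are not described here, so check them against your own delivery.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def pair_article_files(delivery_dir):
    """Group delivered files by base name.

    Assumes (hypothetically) that each article arrives as <id>.xml for the
    full text plus <id>.json for normalized fields and enrichments.
    """
    pairs = {}
    for path in Path(delivery_dir).iterdir():
        if path.suffix in (".xml", ".json"):
            # Key by base name; store the path under "xml" or "json".
            pairs.setdefault(path.stem, {})[path.suffix[1:]] = path
    # Keep only articles for which both files are present.
    return {stem: files for stem, files in pairs.items()
            if "xml" in files and "json" in files}


def load_article(files):
    """Parse one article's full-text XML and its JSON enrichment record."""
    xml_root = ET.parse(files["xml"]).getroot()
    with open(files["json"], encoding="utf-8") as fh:
        record = json.load(fh)
    return xml_root, record
```

Once an article is loaded, printing record.keys() is a quick way to see which normalized fields and enrichments your subscription actually includes before building anything on top of them.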

The entities in the enrichments are drawn from VOCabs, SciBite’s flagship collection of manually curated vocabularies. VOCabs contains more than 20 million synonyms and is specifically tuned for named-entity recognition (NER) text analytics. The following vocabularies are used for entity extraction:

VOCABULARY                         EXAMPLES
ANATOMY                            Heart, Lung
BIOASSAY                           Radioligand binding method, fluorescence microscope filter
BIOCHEM                            Trypsin, Glycine
BIOVERB                            Binds, inhibits
CLINICAL PROCEDURES                Biopsy, blood cell count
ENDOGENOUS BIOLOGICAL MOLECULES    Serotonin
ETHNICITY                          Caucasian, Japanese, Jewish
GEOGRAPHICAL LOCATIONS             Boston, Cambridge, London, New York
ENZYMES                            Lipase, alpha-galactosidase
BIOLOGICAL PROCESSES               apoptosis
CHEMICALS                          Ethanol, sodium chloride, aspirin
CHEMOTHERAPY                       MEL-dex, ChlVPP, CHOP
CELL LINES                         HeLa, CHO
CELL TYPES                         T-Cell, Lymphocyte
COMPANIES                          Pfizer, AstraZeneca
COUNTRIES                          Spain
DBSNP                              rs3737626
DRUGS                              Lipitor, Viagra, Gleevec
DRUG TYPES                         anti-histamines, painkillers
GENE ONTOLOGY                      RNA splicing, B cell proliferation
HGNCGENE                           P53, BRCA1
HUMAN PHENOTYPE                    Abnormality of the pulmonary veins
INDICATIONS                        Asthma, Psoriasis, Breast Cancer
LABCHEM                            Dimethyl Sulfoxide, lithium aluminum hydride
LABPROC                            Logistics Model, containment of biohazards
MIRNA                              mir-101, let-7
NCIT                               Small cell lung carcinoma
CLINICAL PHASE                     Phase III
PROTEIN TYPE                       Ion Channels, Protein Kinases
SPECIES                            Human, mouse
VIRUSES                            Ebola
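At its simplest, vocabulary-based entity extraction matches known synonyms in the text and maps each match to a preferred term, so differently worded mentions (for example Lipitor and atorvastatin) resolve to the same entity. The Python sketch below shows only that general dictionary-matching idea. It is not SciBite’s implementation, and the tiny VOCABS dictionary and its entries are hypothetical stand-ins for the real vocabularies.

```python
import re

# Hypothetical mini-vocabularies: each maps a preferred (normalized) term to
# its synonyms. Illustrative only; real vocabularies are far larger.
VOCABS = {
    "DRUG": {
        "atorvastatin": ["Lipitor", "atorvastatin"],
        "sildenafil": ["Viagra", "sildenafil"],
    },
    "INDICATION": {
        "asthma": ["asthma", "bronchial asthma"],
    },
}


def build_index(vocabs):
    """Map each lower-cased synonym to (vocabulary, preferred term)."""
    index = {}
    for vocab, entries in vocabs.items():
        for preferred, synonyms in entries.items():
            for syn in synonyms:
                index[syn.lower()] = (vocab, preferred)
    return index


def tag_entities(text, index):
    """Return (vocabulary, preferred term, matched text) for each match."""
    # Longest synonyms first, so the regex alternation prefers the longer
    # match at each position ("bronchial asthma" over "asthma").
    alternatives = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
        re.IGNORECASE,
    )
    return [(*index[m.group(0).lower()], m.group(0))
            for m in pattern.finditer(text)]
```

For example, tagging "Patients with bronchial asthma took Lipitor." with build_index(VOCABS) yields one INDICATION hit normalized to asthma and one DRUG hit normalized to atorvastatin. Production NER also has to handle ambiguity, morphology and context, which this sketch ignores.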

The normalized and enriched data makes the full text easier to integrate and reuse alongside customers’ other data sources, and better supports modelling and search applications.
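One way normalized entities help with reuse is that they can be joined directly against in-house data keyed by the same identifiers. The Python sketch below assumes, hypothetically, that each JSON record exposes a "doi" field and a list of entities with "type" and "id" keys, and that your internal table maps those normalized IDs to your own identifiers. The real field names may differ.

```python
from collections import defaultdict


def link_to_internal_ids(articles, internal_ids, entity_type):
    """Map each in-house identifier to the articles whose enrichments mention it.

    articles:     iterable of enrichment records, each assumed (hypothetically)
                  to carry a "doi" and a list of {"type", "id"} entity dicts.
    internal_ids: dict from normalized entity ID to an in-house identifier.
    entity_type:  which vocabulary to join on, e.g. "DRUG".
    """
    linked = defaultdict(set)
    for record in articles:
        for entity in record.get("entities", []):
            if entity["type"] == entity_type and entity["id"] in internal_ids:
                linked[internal_ids[entity["id"]]].add(record["doi"])
    # Sort DOIs so the result is deterministic.
    return {key: sorted(dois) for key, dois in linked.items()}
```

For example, given two records that both list atorvastatin as a DRUG entity and an internal mapping such as {"atorvastatin": "CMPD-001"}, the function returns CMPD-001 linked to both DOIs. Without normalization, the same join would have to reconcile every brand name and spelling variant by hand.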
