What is the FAIR Journals Data?
Last updated on January 16, 2025
Relevant, high-quality journals data that you can use quickly together with your own data.
The FAIR Journals Data set from Elsevier was created to address data curation and cleaning issues. It contains FAIRified Life Sciences full-text journal content from ScienceDirect and is available in SciBite Search or via SFTP.
Many literature databases let users search across abstracts, but a wealth of information is “hidden” within the full text. Abstracts usually consist of short, succinct sentences that focus on the most important findings. By comparison, the full text contains the methods, models, tables, experimental details and much more. Analysis has found that the majority of claims within an article are not reported in the abstract, and many explicit protein-protein interactions, for example, are mentioned only in the full text.
Finding the information you need is difficult, because about 1.8 million research articles are published each year. Integrating research is also a challenge, because terminology is not used consistently across articles.
With FAIR Journals Data offerings, customers receive two files for each article included in the therapeutic area(s) to which they subscribe: an XML file containing the full text, and a JSON file containing normalized fields extracted from the XML together with enrichments.
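As a rough illustration of working with the two files per article, the minimal Python sketch below pairs each XML file with its JSON counterpart and loads both. It assumes that the files for one article share a base name (for example `article1.xml` and `article1.json`) and sit in one local folder. That naming scheme, the folder layout, and the function names `pair_article_files` and `load_article` are hypothetical and are not taken from the product documentation.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def pair_article_files(delivery_dir):
    """Pair each article's XML file with its JSON file by shared file stem.

    Assumption (hypothetical): files for one article share a base name,
    e.g. article1.xml and article1.json. The real naming may differ.
    """
    root = Path(delivery_dir)
    pairs = {}
    for xml_path in sorted(root.glob("*.xml")):
        json_path = xml_path.with_suffix(".json")
        # Skip XML files that have no matching JSON file.
        if json_path.exists():
            pairs[xml_path.stem] = (xml_path, json_path)
    return pairs


def load_article(xml_path, json_path):
    """Parse the full-text XML and the normalized/enriched JSON for one article."""
    full_text = ET.parse(xml_path).getroot()
    with open(json_path, encoding="utf-8") as handle:
        enriched = json.load(handle)
    return full_text, enriched
```

Calling `pair_article_files("path/to/delivery")` returns a dictionary keyed by the shared base name, and each pair can then be passed to `load_article`. The structure of the parsed XML and JSON depends on the delivered schema, which is not described here.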
The entities are drawn from VOCabs, SciBite’s flagship collection of manually curated vocabularies with more than 20 million synonyms, specifically tuned for named entity recognition (NER) text analytics. The following vocabularies are used for entity extraction.
| VOCABULARY | EXAMPLES |
|---|---|
| ANATOMY | Heart, Lung |
| BIOASSAY | Radioligand binding method, fluorescence microscope filter |
| BIOCHEM | Trypsin, Glycine |
| BIOVERB | Binds, inhibits |
| CLINICAL PROCEDURES | Biopsy, blood cell count |
| ENDOGENOUS BIOLOGICAL MOLECULES | Serotonin |
| ETHNICITY | Caucasian, Japanese, Jewish |
| GEOGRAPHICAL LOCATIONS | Boston, Cambridge, London, New York |
| ENZYMES | Lipase, alpha-galactosidase |
| BIOLOGICAL PROCESSES | apoptosis |
| CHEMICALS | Ethanol, sodium chloride, aspirin |
| CHEMOTHERAPY | MEL-dex, ChlVPP, CHOP |
| CELL LINES | HeLa, CHO |
| CELL TYPES | T-Cell, Lymphocyte |
| COMPANIES | Pfizer, AstraZeneca |
| COUNTRIES | Spain |
| DBSNP | rs3737626 |
| DRUGS | Lipitor, Viagra, Gleevec |
| DRUG TYPES | anti-histamines, painkillers |
| GENE ONTOLOGY | RNA splicing, B cell proliferation |
| HGNCGENE | P53, BRCA1 |
| HUMAN PHENOTYPE | Abnormality of the pulmonary veins |
| INDICATIONS | Asthma, Psoriasis, Breast Cancer |
| LABCHEM | Dimethyl Sulfoxide, lithium aluminum hydride |
| LABPROC | Logistics Model, containment of biohazards |
| MIRNA | mir-101, let-7 |
| NCIT | Small cell lung carcinoma |
| CLINICAL PHASE | Phase III |
| PROTEIN TYPE | Ion Channels, Protein Kinases |
| SPECIES | Human, mouse |
| VIRUSES | Ebola |
The normalized and enriched data make the full text easier to integrate and reuse with other customer data sources, and more readily support modelling and search applications.