Text and data mining: Licensed data sources

Useful information

If you would like to request a licensed data set, please complete the data request form. Note that multi-collection providers (such as Adam Matthew, ProQuest, and Gale) currently allow only certain collections within their platforms to be mined, because of publisher restrictions on content.

If the data licence is highly restrictive (e.g. single user) or the data does not have broad appeal to the Monash University community, payment for and storage of the data may need to be provided by the researcher(s), not the Library.

Licensed data sources

| Data source | About | Data access | Further information |
| --- | --- | --- | --- |
| Adam Matthew digital | Primary source collections from the social sciences and humanities | API, via Library request | Data mining/text mining statement |
| American Medical Association | JAMA: The Journal of the American Medical Association is a peer-reviewed medical journal published 48 times a year by the AMA. It publishes original research, reviews, and editorials covering all aspects of the biomedical sciences. | Register for an account with JAMA. Data is packaged in subscription-level sets (e.g. JAMA Internal Medicine, 1998-current) and is downloadable in JSON format. | JAMA Network text and data mining services |
| Brill online | Primary sources, books, and reference resources in the Humanities and Social Sciences, International Law, and selected areas in the Sciences | Content may be downloaded for TDM, via Library request | |
| Emerald insight | Scholarly academic journals, case studies, and books in the fields of management, business, education, library studies, health care, and engineering | Via CrossRef’s TDM service | Emerald TDM FAQs and TDM License |
| Financial Times | Global business publication covering news, analysis, and comment | A REST API for data mining real-time content; an archive going back to 2005 with metadata annotations, in JSON format, delivered via SFTP; an archive going back to 1982, text and date stamps only, in JSON format, delivered via SFTP; an archive of OCR from PDFs of print content from 1888-2010, text and date stamps only, in XML, delivered via SFTP | Not part of the Library subscription. Extra costs involved. Via Library request. |
| Gale Primary Sources | Historical primary source archives incorporating monographs, manuscripts, newspapers, maps, and photographs | Specialised online search interface, using term frequency and cluster analysis. Delivery of physical hard drive. | Data Mining, Textual Analytics, the Digital Humanities, and Gale |
| HathiTrust | A collection of millions of titles digitized from libraries around the world | A suite of tools and services through the HathiTrust Research Center | |
| JSTOR | Selected content from JSTOR, a digital library of academic journals, books, and primary sources | JSTOR Data for Research | JSTOR Data for Research Dataset Services |
| NearMap | Information to come | | |
| Oxford University Press | Access points for reference resources from Oxford University Press: Oxford Art Online, Oxford Music Online, Oxford Scholarship Online | Via consultation; please copy into request for local assistance | Oxford Art FAQ; Oxford Music FAQ |
| ProQuest | Historical and primary source material, including journals from different fields | Delivery of physical hard drive | Via Library request |
| SAGE | Information to come | | |
| ScienceDirect | Scholarly eJournals and selected eBooks published by Elsevier | API with token or web harvest; contact Collection Development for access, as a static IP is needed | Text and data mining agreement |
| Scopus | Abstracts and citation data from a large multidisciplinary corpus covering published material in STEM and the humanities | API with registration and key; contact Library for access | Elsevier Scopus APIs |
| SpringerLink | Multidisciplinary collection of online resources covering life, health and physical sciences, social sciences, and the humanities | API or CrossRef's TDM service | Springer text and data mining policy |
| Taylor & Francis online | Scholarly journals, ebooks, and reference works in the Humanities, Social Sciences, Behavioural Sciences, Science, Technology and Medicine sectors | Via Library request | |
| Wiley Online Library | Multidisciplinary collection of online resources covering life, health and physical sciences, social sciences, and the humanities | API or CrossRef's TDM service | Text and data mining agreement |