Linguistics: Corpora

This guide introduces you to the systematic study of language in general.


The following is a non-exhaustive list of corpora, either freely accessible or available through Library subscriptions, that may be useful in Linguistics.

The Library also has a dedicated LibGuide on text and data mining.

Licensed data sources

| Data source | About | Data access | Further information |
| --- | --- | --- | --- |
| Adam Matthew digital | Primary source collections from the social sciences and humanities | API, via Library request (select your Faculty liaison librarian) | Data mining/text mining statement |
| American Medical Association | JAMA: The Journal of the American Medical Association is a peer-reviewed medical journal published 48 times a year by the AMA. It publishes original research, reviews, and editorials covering all aspects of the biomedical sciences. | Register for an account with JAMA. Data is packaged in subscription-level sets (e.g., JAMA Internal Medicine 1998-current) and downloadable in JSON format. | JAMA Network text and data mining services |
| Brill online | Primary sources, books, and reference resources in the Humanities and Social Sciences, International Law and selected areas in the Sciences | Content may be downloaded for TDM, via Library request (select your Faculty liaison librarian) | |
| Emerald insight | Scholarly academic journals, case studies, and books in the fields of management, business, education, library studies, health care, and engineering | Via CrossRef's TDM service | CrossRef community forum |
| Gale Primary Sources | Historical primary source archives incorporating monographs, manuscripts, newspapers, maps, and photographs | Access through Gale Digital Scholar Lab | Data Mining, Textual Analytics, the Digital Humanities, and Gale |
| HathiTrust | A collection of millions of titles digitized from libraries around the world | A suite of tools and services through the HathiTrust Research Center | |
| JSTOR | Selected content from JSTOR, a digital library of academic journals, books, and primary sources | JSTOR Data for Research | JSTOR Data for Research Dataset Services |
| Oxford University Press | Access points for reference resources from Oxford University Press: Oxford Art Online; Oxford Music Online; Oxford Scholarship Online | Via consultation: Please copy into request for local assistance | Oxford Art FAQ; Oxford Music FAQ |
| ProQuest TDM Studio | Visualizations interface - newspapers (depending on Monash subscriptions) and ProQuest Dissertations and Theses; Workbench interface - includes most journals, newspapers, dissertations, theses, and primary sources available through Monash subscriptions | Through record in Search - TDM Studio Visualizations (no coding skills required) or TDM Studio Workbench (R or Python needed) | |
| ScienceDirect | Scholarly eJournals and selected eBooks published by Elsevier | API | Elsevier ScienceDirect APIs |
| Scopus | Abstracts and citation data from a large multidisciplinary corpus covering published material in STEM and the Humanities | API | Elsevier Scopus APIs |
| SpringerLink | Multidisciplinary collection of online resources covering life, health and physical sciences, social sciences, and the humanities | API or CrossRef's TDM service | Springer text and data mining policy |
| Taylor & Francis online | Scholarly journals, ebooks, and reference works in the Humanities, Social Sciences, Behavioural Sciences, Science, Technology and Medicine sectors | Via Library request (select your Faculty liaison librarian) | |
| Web of Science Journals API | Supports rich searching across the Web of Science to retrieve full item-level metadata, including times cited counts, contributor addresses/affiliations and funding data. The API is performance limited based on the API plan chosen by the institution. See further information for details on current API plans. | Sign up and register an application for access to the API subscription | Developer Portal; Web of Science Journals API page |
| Wiley Online Library | Multidisciplinary collection of online resources covering life, health and physical sciences, social sciences, and the humanities | API or CrossRef's TDM service | Text and data mining agreement |