
Text and data mining: Open data sources

Disclaimer when using Open Data Sources

The open data sources listed here are provided as a courtesy. The Library does not license these resources and is not obligated to assist with API management, text and data mining or related services. Individual users are responsible for contacting the listed sites for more information.

Open data sources (see Disclaimer above)

Data source | Description | Data access | Further information
Registry of Open Data on Amazon Web Services | Usage examples for all datasets listed in the Registry of Open Data on AWS |  | Further information
arXiv Bulk Data Access | Open access to 1,467,637 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering and Systems Science, and Economics |  | Further information
Australian Bureau of Statistics | Population data from the Australian Bureau of Statistics (ABS) | API | ABS.Stat web services user guide
Australian Data Archive | National repository for digital research data | Direct download and API |
Australian Government's open data | Find, explore and reuse Australia's public data |  |
Australian Institute of Health and Welfare | Data on health and welfare issues in Australia |  | Further information
BioMed Central | Scholarly articles in STM (science, technology and medicine) fields from peer-reviewed open access journals | Direct FTP download, web harvest or API |
BOM data feeds | Weather data from the Bureau of Meteorology (BOM) | Direct FTP download | BOM weather data services
Canada Open Data | Search open data relevant to Canadians, learn how to work with datasets, and see what people across the country have done with open data |  |
CIA World Factbook | The World Factbook, the most popular publication of the United States Central Intelligence Agency |  |
Crossref Text and Data Mining Services | Crossref metadata helps researchers access publisher content for text and data mining, and enables publishers to provide that content |  |
Data.gov.au | Public datasets from Australian government agencies | API | CKAN API guide
DataCite | DataCite collects metadata for each DOI assigned to an object and uses it to build a large index of research data that can be queried directly to find data, obtain statistics and explore connections. All metadata is free to access and review. |  | https://search.datacite.org/
DBpedia | A crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects |  | Further information
Facebook Graph API | Social media data from the Facebook social networking service | API with access token | Using the Graph API
Figshare | Open access scientific data |  | Bridges (the Monash figshare repository)
Google Public Data Explorer | Public data |  | Google Dataset Search
HathiTrust Datasets | Public domain works from the HathiTrust full-text digital library | Online computational environment, direct download or API | HathiTrust Research Center documentation; Data availability and APIs, datasets
IMF eLibrary data | International Monetary Fund (IMF) eLibrary data | API | Data services
Pew Research Center | Pew Research Center regularly releases the full datasets underlying most of its reports |  | How to access Pew Research Center survey data
PDB Protein Data Bank | Since 1971, the Protein Data Bank (PDB) archive has served as the single repository of information about the 3D structures of proteins, nucleic acids and complex assemblies |  | Archive download instructions
PLOS | Open access content and metrics from PLOS journals | API | Text and Data Mining at PLOS
PubMed Central Open Access Subset | Open access subset of the full-text archive of biomedical and life sciences journal literature at the US National Institutes of Health's National Library of Medicine |  |
re3data | A global registry of research data repositories |  | FAQ
Research Data Australia | Data discovery service of the Australian Research Data Commons (ARDC) |  |
Trove | Digital content from Australian libraries, hosted by the National Library of Australia | API with key |
Twitter | Social media data from the Twitter social networking service | API with key and access token | API overview
US Government's open data | Includes data from the Library of Congress, the Census Bureau and other agencies, plus tools and resources for conducting research, developing web and mobile applications, designing data visualizations and more | API | https://www.data.gov/
US Health open data | Includes data from the National Institutes of Health and the National Library of Medicine, and data on topics such as environmental health, medical devices, Medicare and Medicaid, social services, community health, mental health and substance abuse |  | https://healthdata.gov/
Wikidata | Structured data from Wikipedia and other open knowledge bases | Direct download or API | Wikidata: data access
World Health Organization | Information to come |  |
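Several of the sources listed (for example Crossref, arXiv and Wikidata) can be queried programmatically. The sketch below is a minimal illustration only, assuming Python 3 with the standard library and the public endpoints `https://api.crossref.org/works`, `https://export.arxiv.org/api/query` and Wikidata's `Special:EntityData` JSON pages. The function names are hypothetical helpers, and each provider's own documentation and terms of use take precedence over anything shown here.

```python
# Minimal sketch (hypothetical helper names): build request URLs for three of the
# APIs named in the table and decode Crossref JSON. Endpoints follow each
# provider's public documentation; confirm current details before relying on them.
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

CROSSREF_WORKS = "https://api.crossref.org/works"
ARXIV_QUERY = "https://export.arxiv.org/api/query"
WIKIDATA_ENTITY = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def crossref_search_url(query: str, rows: int = 5, mailto: Optional[str] = None) -> str:
    """Crossref REST API: search over work metadata."""
    params = {"query": query, "rows": rows}
    if mailto:
        # Crossref asks clients to identify themselves with a contact address.
        params["mailto"] = mailto
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def arxiv_search_url(terms: str, start: int = 0, max_results: int = 10) -> str:
    """arXiv API query; the response is an Atom XML feed, not JSON."""
    params = {"search_query": "all:" + terms, "start": start, "max_results": max_results}
    return ARXIV_QUERY + "?" + urllib.parse.urlencode(params)


def wikidata_entity_url(qid: str) -> str:
    """JSON record for one Wikidata item, e.g. "Q42"."""
    return WIKIDATA_ENTITY.format(qid=urllib.parse.quote(qid))


def crossref_dois_and_titles(payload: dict) -> List[Tuple[str, str]]:
    """Extract (DOI, first title) pairs from a decoded Crossref works response."""
    items = payload.get("message", {}).get("items", [])
    # "title" is a list in Crossref records and may be missing or empty.
    return [(item.get("DOI", ""), (item.get("title") or [""])[0]) for item in items]


def fetch_json(url: str, user_agent: str) -> dict:
    """GET a URL and decode its JSON body (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json(crossref_search_url("text mining", mailto="you@example.org"), "my-tdm-project/0.1 (you@example.org)")` would return the decoded Crossref response, which `crossref_dois_and_titles` can then summarise. `fetch_json` requires a User-Agent because Wikimedia sites such as Wikidata expect a descriptive one; the arXiv URL returns XML, so it needs an XML parser rather than `fetch_json`.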