Data Science Research Resources
Table of Contents
- Welcome Statement
- Software & Development Tools
- Online Platforms and Interactive Tools
- Programming Resources
- Textbooks & Learning Materials
- Tutorials & Guides
- Data Sources & Databases
- Bioinformatics & Biomedical Resources
- Interactive Data Visualizations
- Women in Computing
- Note
Welcome Statement
Hi! Welcome to research!
A wise person once wrote that one never completes a research project in isolation from others. The point is that finding and correctly using appropriate resources is what brings a research project to a successful conclusion.
Research is a systematic process of investigating a specific topic or question to gather information, analyze data, and reach conclusions. It involves forming hypotheses, collecting data, analyzing it, and sharing findings.
The Role of Resources in Research
To conduct research effectively, you need access to resources that support your investigations. They provide the foundation for your work in several areas:
- Existing knowledge: Books, journal articles, databases, and websites offer a wealth of information on a particular topic.
- Methodology and frameworks: Resources guide you on how to approach research, including best practices, data collection methods, and analytical techniques.
- Data and evidence: Primary and secondary data are essential for supporting or challenging hypotheses.
- Tools for analysis: Software, surveys, lab equipment, and online tools help you interpret complex information accurately and efficiently.
- Collaboration and networking: Interacting with other researchers or joining academic communities can facilitate knowledge sharing and new inquiry directions.

Software & Development Tools
Essential Installations
Programming Languages & IDEs
- RStudio - IDE for R programming
- R Programming Language
- Python Programming Language
- Jupyter Notebook - Interactive computing environment
- Google Colab - Cloud-based Python notebooks
Version Control
Bioinformatics-Specific
Data Science Platforms
- Databricks - Unified data platform
- Apache Spark - Big data processing
- RapidMiner - Data science platform
- KNIME - Visual analytics workflows
Visualization and Business Intelligence Tools
- Tableau - Data visualization and BI tool
- Microsoft Power BI - Analytics dashboards
- Plotly - Interactive graphs and dashboards
- D3.js - JavaScript visualization library
- Datawrapper - Charts used by journalists
Online Platforms and Interactive Tools
Code Execution & Development
- Deepnote - Collaborative notebooks
- Online R programming
- Snippets – Run any R code online
- JDoodle
- Jupyter Interactive Python
Analysis & Sentiment Tools
Programming Resources
Python Resources
- W3Schools Python Tutorial - Interactive learning
- Jupyter Interactive Python - Write code locally
- Python Programming Language
- Python for Biologists
- BioPython Tutorial
- Getting Started with Python in VS Code
- 11 Best VS Code extensions for Python (2022)
R Programming Resources
- Statmethods R programming tutorial by DataCamp
- Machine Learning in R for beginners
- Intro to Machine Learning with R & caret
- ANOVA in R | A Complete Step-by-Step Guide with Examples
- Colors for Plotting in R
- Stat545: a reference for R and programming in Analytics
Textbooks & Learning Materials
Data Science & Statistics
- Hadley Wickham and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc., 2016.
- Julia Silge and David Robinson. Text Mining with R: A Tidy Approach. O’Reilly Media, Inc., 2019.
- Laurent Gatto. An Introduction to Machine Learning with R
Programming Fundamentals
- Allen B. Downey. Think Python (First Edition)
Specialized Topics
- Zuguang Gu. Circular Visualization in R
Tutorials & Guides
Machine Learning Tutorials
- Machine Learning in R for beginners (Interactive coding!)
- Your First Machine Learning Project in R Step-By-Step
- Machine learning with the “diabetes” data set in R
- Intro to Machine Learning with R & caret
- Data Science Dojo Tutorials
Statistical Analysis
- The Paired t-Test
- An Introduction to t-Tests | Definitions, Formula and Examples
- ANOVA in R | A Complete Step-by-Step Guide with Examples
Git & Version Control
- Allegheny College Department of Computer Science tutorials
- Coding Train’s Git and GitHub for Poets
- Setting Up Git on Linux
- Setting up Git
- ssh keys
- Luman’s ssh keys video tutorial
General Resources
Data Sources & Databases
General Data Repositories
- Kaggle - Datasets and competitions
- UCI Machine Learning Repository - Classic datasets
- awesome-public-datasets
- Google Dataset Search - Dataset search engine
- FiveThirtyEight Data - Datasets from journalism projects
- OpenML - Machine learning datasets
- AWS Open Data Registry - Large public datasets
Government & International Organizations
- Data.gov - U.S. government open data
- World Bank Open Data - Global development data
- UN Data - United Nations statistics
- U.S. Census Bureau - Demographic and economic data
- The US Census
- World Health Organization
- The World Bank
- US Food and Drug Administration
Health & Medical Data
- Centers for Disease Control and Prevention (CDC)
- COVID-19 Forecasts
- Noncommunicable Disease Surveillance, Monitoring and Reporting (NCDS)
- Demographic and Health Surveys
- Institute for Health Metrics and Evaluation
- Project Tycho
Agricultural & Environmental Data
Population & Demographics
Corporate & Other Sources
- IBM’s collection of open-source data sets
- Google’s open-source data sets
- Data.world: data for business-based questions
- Finviz
- Kaggle’s Star Trek Scripts
- Project Gutenberg: Free eBooks
Library Resources
Bioinformatics & Biomedical Resources
Sequence Analysis Tools
- BLAST - Basic Local Alignment Search Tool
- DIAMOND - BLAST alternative
- T-Coffee - Multiple sequence alignment
- SIM: Alignment tool for protein sequences
- Needleman-Wunsch algorithm Interactive demo
Genomics & Gene Analysis
- EGassembler
- HMMER: biosequence analysis using profile hidden Markov models
- Hidden Markov Models (YouTube)
- Gene Ontology Resource
- PANTHER Classification System
- AmiGO: Gene products and gene annotations
- Codon Usage Database
- AUGUSTUS - Gene prediction in eukaryotic sequences
Protein Analysis & Structural Biology
- Protein Data Bank
- PredictProtein
- STRING: a protein database
- The STRING Database for Analysis
- Database of protein domains, families and functional sites
- ELM tool kit
- PyMOL - Molecular visualization system
- ModEval: An evaluation tool for protein structure models
- The Cell Map
- Northeast Structural Genomics Consortium
Medical & Disease Research
- The Comprehensive Antibiotic Resistance Database
- ermB antibiotic resistance gene
- BLAST analysis
- DrugBank - Pharmaceutical knowledge base
- Genetics Home Reference
- National Human Genome Research Institute
- Max Planck Institute For Molecular Genetics
- Cataracts and Genetics
Cell & Tissue Atlases
- The Lung Endothelial Cell Atlas
- The COPD Cell Atlas
- The COVID Cell Atlas
- Idiopathic Pulmonary Fibrosis Cell Atlas
Microbiome Analysis
- QIIME 2 - Microbiome bioinformatics platform
Educational Resources
- Dealing with GenBank files in Biopython
- Virus Explorer
- The Double Helix Documentary (17 mins)
- The Chemical Structure of DNA
- The Structure of DNA
- The definition of the 5’ end and 3’ end of a DNA strand
- What happens when your DNA is damaged?
- Mutations and Natural Selection
- Protein synthesis animation
Comprehensive Software Platforms
- UGENE - Free open-source cross-platform bioinformatics software
Cerebral Palsy Resources
- Cerebral Palsy Guidance
- Cerebral Palsy Associated Disorders
- Cerebral Palsy Guide
- The Cerebral Palsy Toolkit
- Causes
- Types of Cerebral Palsy
Interactive Data Visualizations
Music Visualizations
- How Music Taste Evolved
- How Billboard’s Top Hits Changed
- Largest Vocabulary in Hip Hop
- Interactive Info Visualization of Spotify Music Genre Over Time
- Exploratory Data Analysis of Various Music Datasets
Library Collections
COVID-19 & Health
Social & Environmental Topics
- Languages of the world
- Stream graph of Immigration to the US
- Pinellas County’s Public School Inequalities for Black Pupils
- Gender Pay Gap US and UK
- Fossil Fuels
- The Demographics of Others
- What’s Really Warming the World?
Science & Technology
- The world’s largest Open Database of Cell Towers
- Satellites Orbiting Earth
- Every Satellite Orbiting Earth and Who Owns Them
Transportation & Infrastructure
Creative & Entertainment
Politics & Current Events
Visualization Galleries & Resources
- Our World in Data - Interactive global charts
- Observable - Notebook-based visualizations
- Information Is Beautiful - Visual storytelling
- FlowingData - Tutorials and examples
- The Pudding - Visual essays
- Gapminder - Animated global data visuals
- Tableau Public - Public dashboards
Articles About Data Visualization
- WPdatatables
- 10 Must-Read Data Analytics Websites
- The Best Data Visualization Tools Of 2023
- The 5 Most Creative Music Visualization
- Industrial Careers in the Age of Machine Learning
Women in Computing

Opportunities and Support
Below are resources from the National Center for Women & Information Technology (NCWIT), which offers women support, encouragement, and information to help them gain experience and build meaningful careers in computing.
NCWIT is the farthest-reaching network of change leaders focused on advancing innovation by correcting underrepresentation in computing.

Aspirations in Computing (AiC)
Get Involved

Awards & Recognition
Did you know that AiC offers awards for amazing work in computing?
Note
If you find resources that would fit nicely here, please let us know! More resources will be added as they are discovered.
Contact: obonhamcarter at allegheny dot edu
