Setting Up Your Data Science Environment
Welcome to Your First Data Science Setup!
Hey! So you’re ready to start doing data science research, but your computer isn’t quite ready yet. No worries – we’ll get you set up with everything you need. This might take an hour or two, but once you’re done, you’ll have a professional-grade research environment.
What we’ll install:
- Python (the most popular language for data science)
- R (another great option, especially for statistics)
- Jupyter Notebooks (for interactive coding)
- VS Code (a powerful text editor)
- Git (for version control)
- Essential packages and libraries
Don’t worry if these names are unfamiliar – by the end, you’ll know what they all do!
Step 1: Installing Python
Python is the most widely used language in data science. Let’s get it on your computer.
For Mac Users
The easiest way is to use Homebrew, a package manager for Mac:
# First, install Homebrew (if you don't have it)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Then install Python
brew install python@3.11
For Windows Users
- Go to python.org/downloads
- Download Python 3.11 or newer
- Important: Check “Add Python to PATH” during installation!
- Click “Install Now”
For Linux Users
# Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3-pip
# Fedora
sudo dnf install python3.11 python3-pip
Verify Your Installation
Open a terminal (Mac/Linux) or Command Prompt (Windows) and type:
python3 --version
On Windows, the command is usually python --version instead.
You should see something like Python 3.11.x. Success! 🎉
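If you’d rather check from inside Python itself, here is a minimal sketch. The function name meets_minimum is made up for this example, and the 3.11 threshold simply mirrors the version this guide recommends; it is not an official requirement check.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 11)):
    """Return True if the given interpreter version is at least `minimum`."""
    # Compare only the (major, minor) pair, e.g. (3, 11).
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough for this guide.")
    else:
        print(f"Python {current} is older than 3.11; consider upgrading.")
```

Save it as a file (any name works) and run it with the same python3 command you just tested, so you are checking the interpreter you will actually use.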
Step 2: Installing R
R is fantastic for statistical analysis and has some unique packages Python doesn’t have.
All Operating Systems
- Go to r-project.org
- Click “download R”
- Choose a mirror near you
- Download for your operating system
- Install like any other application
Install RStudio (Recommended)
RStudio makes working with R much easier:
- Go to posit.co/downloads
- Download RStudio Desktop (free version)
- Install it
Verify Installation
Open RStudio or a terminal and type:
R --version
You should see your R version. Great work!
Step 3: Setting Up Virtual Environments
Virtual environments keep your projects separate and organized. Think of them as separate workspaces for different projects.
Python Virtual Environments
# venv ships with Python 3, so there is nothing extra to install
# (on Ubuntu/Debian you may first need: sudo apt install python3-venv)
# Create a new environment for your project
cd ~/my-research-project
python3 -m venv venv
# Activate it (Mac/Linux)
source venv/bin/activate
# Activate it (Windows)
venv\Scripts\activate
# Your prompt should now show (venv)
When you’re done working:
deactivate
Using Conda (Alternative, Recommended)
Conda is another popular option that works great for data science:
# Download Miniconda from conda.io/miniconda.html
# Then create an environment:
conda create -n myresearch python=3.11
conda activate myresearch
# Install packages
conda install pandas numpy scikit-learn matplotlib
Step 4: Installing Essential Python Packages
With your environment activated, install these essential packages:
# Data manipulation and analysis
pip install pandas numpy scipy
# Visualization
pip install matplotlib seaborn plotly
# Machine learning
pip install scikit-learn
# Jupyter notebooks
pip install jupyter jupyterlab
# Data loading
pip install openpyxl xlrd requests beautifulsoup4
# Statistics
pip install statsmodels
Or install them all at once:
pip install pandas numpy scipy matplotlib seaborn plotly scikit-learn jupyter jupyterlab openpyxl xlrd requests beautifulsoup4 statsmodels
Step 5: Installing Essential R Packages
Open RStudio and run:
# Data manipulation
install.packages("tidyverse") # This includes dplyr, ggplot2, and more!
# Additional useful packages
install.packages(c(
"data.table", # Fast data manipulation
"caret", # Machine learning
"readr", # Reading data
"readxl", # Excel files
"jsonlite", # JSON data
"httr", # HTTP requests
"rvest" # Web scraping
))
This might take a few minutes. Grab a coffee! ☕
Step 6: Installing Jupyter Notebook/Lab
Jupyter lets you write code in an interactive notebook format – perfect for research!
# Already installed if you followed Step 4
# To launch Jupyter Lab:
jupyter lab
# Or classic Jupyter Notebook:
jupyter notebook
Your browser should open automatically. You’re now in Jupyter! 🎉
Create Your First Notebook
- Click “New” → “Notebook”
- Choose Python 3
- In the first cell, type:
print("Hello, research world!")
- Press Shift + Enter to run
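A handy second cell is one that shows which Python installation the notebook is actually running on. Jupyter can be launched from one environment while its kernel points at another, and that is a common source of "I installed it but it won’t import" confusion. A minimal sketch, where interpreter_info is just a hypothetical helper name:

```python
import sys


def interpreter_info():
    """Return the path and version of the Python running this code."""
    return {
        "executable": sys.executable,  # full path to the python binary
        "version": ".".join(str(n) for n in sys.version_info[:3]),
    }


info = interpreter_info()
print("Running on:", info["executable"])
print("Python version:", info["version"])
```

If the printed path does not point inside your project’s venv folder (or your conda environment), the notebook is probably using a different Python than the one you installed packages into.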
Step 7: Installing VS Code
VS Code is a powerful, free editor that works great for Python, R, and more.
Installation
- Go to code.visualstudio.com
- Download for your operating system
- Install it
Essential Extensions
Open VS Code and install these extensions (click the Extensions icon on the left):
- Python (by Microsoft)
- Jupyter (by Microsoft)
- R (by REditorSupport)
- GitLens (for Git integration)
- Markdown All in One (for documentation)
Step 8: Installing Git
Git tracks changes to your code and helps you collaborate.
For Mac
brew install git
For Windows
Download from git-scm.com and install.
For Linux
# Ubuntu/Debian
sudo apt install git
# Fedora
sudo dnf install git
Configure Git
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
Create a GitHub Account
Go to github.com and sign up. You’ll use this to share code and collaborate.
Step 9: Test Your Setup
Let’s make sure everything works together!
Create a Test Project
# Create a directory
mkdir ~/data-science-test
cd ~/data-science-test
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
# Install packages
pip install pandas matplotlib jupyter
# Start Jupyter
jupyter notebook
Run This Test Code
Create a new notebook and run:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample data
data = pd.DataFrame({
'x': range(10),
'y': np.random.randn(10)
})
# Make a plot
plt.figure(figsize=(8, 6))
plt.plot(data['x'], data['y'], marker='o')
plt.title('Test Plot - Your Setup Works! 🎉')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
print("Success! Your environment is ready for research!")
If you see a plot, you’re all set! 🎊
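If the test code stops with a ModuleNotFoundError instead, it helps to see exactly which packages are missing from the current environment. The sketch below uses only the standard library; find_missing is a made-up helper name, and the list of modules is an assumption based on the imports in the test code.

```python
import importlib.util


def find_missing(module_names):
    """Return the names from `module_names` that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names used by the test code in this step.
    wanted = ["pandas", "numpy", "matplotlib"]
    missing = find_missing(wanted)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All test-code packages are importable.")
```

Note that this checks import names, which sometimes differ from the names you give to pip: scikit-learn is imported as sklearn, and beautifulsoup4 as bs4.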
Bonus: Working with Computing Clusters
Many universities provide computing clusters for bigger analyses.
SSH Setup
# Generate SSH key
ssh-keygen -t ed25519 -C "your.email@example.com"
# Copy public key to clipboard (Mac)
cat ~/.ssh/id_ed25519.pub | pbcopy
# Then add it to your cluster account
Connect to Cluster
ssh username@cluster.university.edu
Transfer Files
# Upload to cluster
scp mydata.csv username@cluster.university.edu:~/
# Download from cluster
scp username@cluster.university.edu:~/results.csv ./
Keeping Your Environment Organized
Save Your Package List
For Python:
pip freeze > requirements.txt
To recreate on another computer:
pip install -r requirements.txt
For R, use renv:
install.packages("renv")
renv::init() # Initialize
renv::snapshot() # Save current packages
renv::restore() # Restore saved packages
For Conda:
conda env export > environment.yml # Save the environment
conda env create -f environment.yml # Recreate it on another computer
Troubleshooting Common Issues
“Command not found”
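Make sure the program’s install directory is on your PATH. On Mac/Linux (Homebrew on Apple Silicon Macs installs to /opt/homebrew/bin rather than /usr/local/bin, and bash users should edit ~/.bashrc instead of ~/.zshrc):
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Python version conflicts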
Use virtual environments! They keep different projects separate.
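Not sure whether an environment is actually active right now? Here is a rough sketch of a check. The helper name active_environment is invented for this example, and the conda branch only looks at the CONDA_DEFAULT_ENV variable that conda sets on activation, so treat its answer as a hint rather than proof (conda’s own base environment sets it too).

```python
import os
import sys


def active_environment(prefix=None, base_prefix=None, environ=None):
    """Describe which kind of isolated environment, if any, is active.

    Returns "venv", "conda", or None.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python it was created from.
    if prefix != base_prefix:
        return "venv"
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    if environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return None


if __name__ == "__main__":
    kind = active_environment()
    print("Active environment:", kind or "none (system Python)")
```

The optional arguments exist only so the logic can be tried with made-up values; called with no arguments, the function inspects the interpreter that is running it.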
Package installation fails
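Try upgrading pip first, then install again. The --user flag installs into your home directory and only works outside a virtual environment:
pip install --upgrade pip
pip install --user package-name
Jupyter kernel issues
If a notebook can’t see the packages in your environment, activate the environment and register it as a kernel (myenv is a name of your choosing):
python -m ipykernel install --user --name=myenv
Your Setup Checklist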
Before moving on, make sure you have:
- ✅ Python 3.11+ installed and working
- ✅ R and RStudio installed
- ✅ Can create and activate virtual environments
- ✅ Essential packages installed (pandas, matplotlib, etc.)
- ✅ Jupyter Notebook/Lab running
- ✅ VS Code installed with extensions
- ✅ Git installed and configured
- ✅ GitHub account created
- ✅ Successfully ran the test code
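Part of this checklist can be spot-checked from Python by asking which command-line tools are visible on your PATH. This is only a sketch: locate_tools is a hypothetical helper, and the command names are assumptions based on this guide. A tool can be installed and still show as "not found" here, for example R on Windows, which is often not added to PATH.

```python
import shutil


def locate_tools(names):
    """Map each command name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Command names assumed from this guide; on Windows, Python is
    # usually "python" rather than "python3".
    tools = locate_tools(["python3", "R", "jupyter", "git"])
    for name, path in tools.items():
        print(f"{name:10s} {path or 'not found on PATH'}")
```

Anything reported as not found is a good candidate for the “Command not found” fix in the troubleshooting section.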
Next Steps
Now that your environment is ready:
- Finding Data for Your Research Project - Learn where to get data
- Writing Clean, Reproducible Code - Best practices for research code
- Exploratory Data Analysis - Start analyzing data
Quick Reference: Common Commands
# Python virtual environment
python3 -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
deactivate
# Conda
conda create -n myenv python=3.11
conda activate myenv
conda deactivate
conda list # See installed packages
# Jupyter
jupyter lab
jupyter notebook
# Git basics
git init
git add .
git commit -m "Message"
git push
Congratulations! 🎉 You now have a professional data science research environment. This might have felt like a lot, but you’ve just built the foundation for all your future research. Everything gets easier from here!
Ready to find some data? Check out the Finding Data tutorial!