Working with Data for REU Students, 2023

A four-week workshop presented by Liz Dobbins

(liz@axiomdatascience.com)

Tip

This is the syllabus from the data workshop presented to the 2023 cohort of Research Experience for Undergraduates (REU) students of the Northern Gulf of Alaska Long Term Ecological Research (NGA LTER) project. The slides were made with Quarto. The design of the curriculum is influenced by Openscapes. The workshop was held in person!

Introduction

This four-week workshop will introduce undergraduate students to current best practices for reproducible scientific research and data. The previous version of this workshop included Carpentries lessons for hands-on practice, but four weeks might not be long enough to include those. Instead, the emphasis is on motivations and on developing the culture of Open Science, which will hopefully enable students to explore these topics further on their own. As such, this workshop is meant to be an introduction to topics students will confront if they continue as graduate students or scientists.

Topic Overview

Week 1: Open Science is Better Science

  • Introductions
  • Open Science (Slide)
    • Mistakes
    • Growth Mindset
    • Scaffolding to enable Open Science
      • Scripting
      • Version control
      • Data Life Cycle
  • Introduction of a Possible Team Project

Week 2: Understanding Code

  • Interactive intro to Python (Colab Notebook)
  • Understanding code (Slides)
    • Learning code: why and how
    • Where we run into trouble
    • 6 Tips and Tricks
  • Practice best practices (Colab Notebook)

Week 3: Data Life Cycle

  • Pandas demo using GAK1 temperature (Colab Notebook)
  • Tidy data (Slides)
  • Data sources and archives (Slides)
    • DataONE data discovery activity

Week 4: TBD

  • FAIR
    • Findable
    • Accessible
    • Interoperable
    • Reusable

Materials

Slides

Open Science is Better Science

Understanding Code

Tidy Data, Archives, Metadata

Explore the slides following these instructions for navigation.

Notebooks (rendered as HTML)

Python in One Hour

Practice Best Practices

Pandas and LTER Signature Datasets

You can download unrendered Notebooks at https://github.com/eldobbins/quarto-nga-docs/tree/main/notebooks

Hands-on Activities

Data Discovery using DataONE

Instructor Information

Liz Dobbins has been working with oceanographic data for more than 30 years in both academia and the private sector. She has collected data at sea, processed sensor data, mapped assets, utilized numerical models, and used open-source Python tools to ingest data into a public-facing data portal. She is a certified Carpentries instructor and is eager to talk about best practices regarding scientific computing.