## Integrating Jupyter Notebooks, Version Control, and GitHub Classroom [![TLT Logo](/images/tlt/tlt_logo.png)<!-- .element height="108" width="216" style="border:0;" -->](https://tltsymposium2020.sched.com/event/ZReB/integrating-jupyter-notebooks-version-control-and-github-classroom) [Eric Ford](https://personal.psu.edu/ebf11) [Dept. of Astronomy & Astrophysics](https://science.psu.edu/astro) [![Penn State Science mark](/images/pss_stack_rev.png)<!-- .element height="96" left="32px" style="border:0;" -->](https://science.psu.edu) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![ICDS mark](/images/PSU_CDS_RGB_REV_2C_web.png)<!-- .element height="96" right="32px" style="border:0;" -->](https://icds.psu.edu) Note: Jupyter Notebooks enable instructors and students to integrate text, equations, figures, and computer code into lessons and assignments. Git, a distributed version control system, enables multiple contributors such as instructors, students, colleagues to efficiently update and improve curricular materials and assignments. GitHub Classroom allows instructors to see student submissions and provide detailed feedback on their work. This session will explain how these three technologies were integrated into a 2019 graduate-level class and discuss how these similar approaches could be applied and adapted for larger courses. --- ## Outline - Four Valuable Tools for Education - The Problem: Combining them - My Solution - Room for Further Improvements - Discussion --- ## Four Valuable Tools 1. [Git](https://git-scm.com/) 2. [GitHub](https://github.com/) 3. [GitHub Classroom](https://classroom.github.com/) 4. [Jupyter](https://jupyter.org/) Note: The four tools that I referred to are: git, GitHub, GitHub Classroom, and Jupyter. Based on the pre-seminar survey, most participants weren't familiar with these, so I'm going to spend some time introducing each, in case some of you might be interested in using any of these on their own. --- ## 1. [Git](https://git-scm.com/) - Distributed version control - Works well for wide variety of projects from small to large - Facilitates collaboration - Open-source - Valuable skill to develop Note: Git is a system for providing version control. The simplest form of version control would be keeping all the old copies of each file. Git is much more powerful than that, as it's designed for multiple people to be editing multiple files. Git is used widely in industry, so it's a good skill for students considering careers in CS, Data Science, STEM, web-development, etc. ___ ## What is Git good for? - Software development - Big projects (e.g., Linux kernel) - Small packages - Documentation - Websites - Workflows Note: While git was designed for version controlling computer programs, but it's also proven quite useful for many other applications. It's widely used for websites and writing documentation. ___ ## What is Git good for? ### (in educational setting) Curricular materials - Reading Materials - Syllabus - Assignments* - Presentations** <small> \* This gets more complicated. More on this later. <br> \*\* Using less-common presentation tools, e.g., [Reveal.js](https://revealjs.com/#/) </small> Note: Several of you indicated that you were interested in preparing readings for students. If you're the only instructor who teaches a class, then you can just start from the documents you used last semester. But if another instructor taught the class last time, wouldn't it be nice to quickly be able to find and review just what they've changed? Others were interested in having students complete tutorials or submit lab reports or other homework assignments. Here, it's nice for students to be able to enter their contributions right in the middle of the tutorial or instructions, but for the instructor to be able to see just the students changed. Git's version controlling works well for both of those use cases. --- ## 2. [GitHub](https://github.com/)* - De facto standard for open-source software collaboration. - Adds valuable collaboration tools: - Managing _Pull Requests_: Submit, Discuss, Merge,... - Code Review - Bug/issue tracking - Documentation - Continuous Integration / Continuous Deployment <small> \* Other alternatives include [BitBucket](https://bitbucket.org/), [GitLab](https://about.gitlab.com/) and [Penn State's GitLab server](https://git.psu.edu/). </small> Note: GitHub is a popular website that provides the servers for people to distribute and collaborate on documents, websites or computer code while using git version control system. ___ <img data-src="/images/tlt/github_repo.png" width="100%"> ___ ## Example GitHub Repository [https://github.com/PsuAstro528/TLTDemo-start](https://github.com/PsuAstro528/TLTDemo-start) --- ## 3. [GitHub Classroom](https://classroom.github.com/) - Integrates with [GitHub.com](https://github.com/) - Automates repository creation and access control - Distribute starter code - Collect assignments on GitHub - Offers free private repositories for educational uses Note: GitHub classroom helps to automate some common tasks when using GitHub for classes. Also important, it provides free private repositories, so students work and feedback can be kept private. ___ <img data-src="/images/tlt/tlt_github_classroom_1.png" width="100%"> ___ <img data-src="/images/tlt/tlt_github_classroom_2.png" width="100%"> --- ## 4. [Jupyter Notebooks](https://jupyter.org/) - Browser-based _notebooks_ provide an interactive development environment - Notebooks are __interactive__ documents that contain narrative text, equations, ***live*** code, ***live*** output and/or visualizations - Supports ~40 programming languages - JSON format stores a complete record of the user's session Note: Jupyter notebooks are an open document format that contain a mix of text, hyperlinks, images, equations and code. Most important, they're interactive, so it's easy to give students starter code, let them make small edits, inspect the results, and write what they've found. By integrating all these parts into one document, it makes it much easier for instructors or students to show their reasoning. ___ <section> <iframe data-src="https://nbviewer.jupyter.org/github/PsuAstro528/TLTDemo-start/blob/completed/ex1.ipynb" width="100%" height="700" style="vertical-align:top"></iframe> </section> ___ ## Demo of interactive Jupyter notebook [![TLT Demo Notebook](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/PsuAstro528/TLTDemo-start/main?filepath=ex1.ipynb) ___ ## What is a Jupyter notebook good for? - Data wrangling (cleaning, visualizing, transforming) - Exploratory data analysis - Data visualization - Statistical modeling/machine learning - Report generation - Interactive Development Environment* <small>\* Many feel Jupyter notebooks are inferior to traditional IDEs for traditional software development.</small> Note: There are many applications of Jupyter notebooks. Common applications involve showing the input data, how that data was manipulated, what analyses were performed, the resulting figures, and the conclusions drawn. ___ ## What is a Jupyter notebook good for? ### (in educational setting) - Interactive lessons - Tutorials - Assignments involving data analysis or computation - Document of a student's work and decision making process Note: --- ## The Problem Comparison of two versions of a Jupyter notebook: - Many differences in meta-data - Effectively not human-readable - Important differences hard to isolate - Interferes with code review, feedback and discussion. ___ Comparing Jupyter notebooks directly <img data-src="/images/tlt/github_pr_ipynb.png" width="100%"> --- ## One Solution - Key idea: Version control and compare [_Markdown_](https://www.markdownguide.org/) versions of notebooks. Note: Jupyter notebooks contain lots of additional information that isn't directly shown and were never meant to be read by a human. On the other hand, Markdown was intended to be easily human readable. It peals away the extram information, leaving behind only the text, equations, links, and very basic formatting instructions. So comparing Markdown allows us to focus on what's important. ___ Comparing Jupyter notebooks exported to Markdown <img data-src="/images/tlt/github_pr_jmd.png" width="100%"> ___ ## Markdown example Simple text is just text. Markdown can include: - [Links](https://www.markdownguide.org/) - Formatting - **bold** - _italic_ - Equations ($E = mc^2$) - and much more. ![ICDS mark](/images/icds-sig-graphic-icon.png) ___ Markdown example ```markdown ## Markdown example Simple text is just text. Markdown can include: - [Links](https://www.markdownguide.org/) - Formatting - **bold** - _italic_ - Equations ($E = mc^2$) - and much more. ![ICDS mark](/images/icds-sig-graphic-icon.png) ``` --- GitHub Classroom Workflow: ### 1. Instructor creates assignment as a template repository: 1. Create personal git _repository (repo)_ with Jupyter notebook(s) for each assignment, including instructions, starter code, input data, etc. 2. Convert Jupyter notebooks into _Markdown_ format 3. Recreate Jupyter notebooks from _Markdown_ 4. Create clean __template repository__ 5. Distribute link to assignment via GitHub Classroom ___ ### What goes into the template repository? - Starter notebook(s) in both Jupyter & markdown formats - Any other files needed (e.g., input data, license, dependanices, Docker config) - Test scripts - config for CI testing (e.g., [Travis-CI.com](ttps://travis-ci.com/)) (optional) - Branches: - _main_: where students will complete assignment - _original_: copy of starting files in _main_ branch - _solution_: allows students to see your solutions (optional) ___ GitHub Classroom Workflow: ### 2. Student does assignment as Jupyter notebook: 1. Follow link to create a private repo with starter files 2. Clone the repository to their local computer (or ACI) 3. Complete assignment(s) as Jupyter notebooks 4. Perform tests and revise (optional) ___ <section> <iframe data-src="https://nbviewer.jupyter.org/github/PsuAstro528/TLTDemo-start/blob/completed/Self-check.ipynb" width="100%" height="700" style="vertical-align:top"></iframe> </section> ___ But if there are errors... <section> <iframe data-src="https://nbviewer.jupyter.org/github/PsuAstro528/TLTDemo-start/blob/main/Self-check.ipynb" width="100%" height="700" style="vertical-align:top"></iframe> </section> ___ GitHub Classroom Workflow: ### 3. Student submits assignment as markdown: 1. Convert Jupyter notebooks into _Markdown_ format 2. Commit _Markdown_ files into main branch 3. Push local files to their repo on GitHub. 4. Confirm code passes automated tests (hopefully) 5. Submit a *pull request* via GitHub to merge main branch into original branch ___ GitHub Classroom Workflow: ### 4. Discuss assignment collaboratively <p align="left"> Instructor: </p> - Update template repository with more tests (optional) <p align="left"> Student & Instructor: </p> - Check if code passes automatic tests Note: If instructor provides tests, then students can get immediate feedback on whether their submission passes the tests. ___ Automated testing provides prompt feedback <img data-src="/images/tlt/github_pr_test_failed.png" width="100%"> ___ Automated testing provides prompt feedback <img data-src="/images/tlt/travis_fail.png" width="100%"> ___ Automated testing provides prompt feedback <img data-src="/images/tlt/travis_success.png" width="100%"> ___ GitHub Classroom Workflow: ### 5. Discuss assignment collaboratively Instructor: - Review specific lines of code/equations/text that student changed (as Markdown). - Provide feedback Student: - Discuss, make changes, update/make new pull requests Iterate! Note: Github interface makes it easy to for both student and instructor to see which lines they're referring to. ___ Instructor easily comments on specific lines <img data-src="/images/tlt/github_pr_comment_2.png" width="100%"> ___ Record of discussion and changes <img data-src="/images/tlt/github_pr_comment_4.png" width="100%"> ___ <!-- .slide: style="text-align: left;"> --> For more detailed step-by-step instructions for: - [instructors creating lab assignments](https://github.com/PsuAstro528/TLTDemo-start/tree/main#step-by-step-instruction-to-create-lab-assignments-for-instructor) - [students using/submitting labs](https://github.com/PsuAstro528/TLTDemo-start/tree/main#step-by-step-instruction-to-begin-lab-assignments-for-students) Note: I've provided more detailed instructions in the README for the example repository used in this presentation. --- ## Room for Further Improvements - Automate generation of clean assignment from solutions - Reduce command-line operations for students - Automate process of generating markdown version via git hooks - Consider new tools/commercial services emerging - [NBDime](https://github.com/jupyter/nbdime): For diffing Jupyter noteboks (open source) - [NBReview](https://www.reviewnb.com/): For diffing and collaborative commenting on Jupyter noteboks. (commercial) - [IllumiDesk](https://www.illumidesk.com/): Autograding w/ Canvas integration. (commercial) --- ## Discussion - Other approaches to this problem? - Other problems where a similar approach could be useful? - Other questions about tools discussed? --- ## Resources to learn more Jupter notebook-related: - [Jupyter](https://jupyter.org/) - [NBViewer](https://nbviewer.jupyter.org/) - [Binder](https://mybinder.org/) - [NBDime](https://github.com/jupyter/nbdime) ___ ## Resources to learn more Git-related: - [Git](https://git-scm.com/) - [Learn Git Version Control using Interactive Browser-Based Scenarios](https://www.katacoda.com/courses/git) - [GitHub](https://github.com/) - [GitHub Classroom](https://classroom.github.com/) ___ ## Resources to learn more Markdown/Jupyter Presentation Tools: - [Source for this presentation](https://raw.githubusercontent.com/PsuAstro528/Spring2019-website-src/master/content/tools_used/tlt2020.md) - [Reveal.js](https://revealjs.com/#/) - [CSS theme w/ PSU colors](https://raw.githubusercontent.com/PsuAstro528/Spring2019-website-src/master/static/revealjs/css/theme/psu.css) - [Hugo](https://gohugo.io/) - [DocDock Theme](https://docdock.netlify.com/) - [Rise](https://rise.readthedocs.io/en/stable/index.html) ___ ## Resources to learn more [Example GitHub Repository](https://github.com/PsuAstro528/TLTDemo-start) Example Jupyter Notebook - [View only (via NBViewer)](https://nbviewer.jupyter.org/github/PsuAstro528/TLTDemo-start/blob/completed/ex1.ipynb) - [Interactive ![Launch Binder](https://mybinder.org/badge_logo.svg)]( https://mybinder.org/v2/gh/PsuAstro528/TLTDemo-start/main?filepath=ex1.ipynb)