Syllabus

Astro 528: High-Performance Scientific Computing for Astrophysics (Fall 2023)

Basic Info

  • Class Meeting Times:

    • 1:25-2:15pm Mondays in Davey 538 &

    • 1:15-2:55pm Wednesdays in Davey 538 (optionally Davey 432, too)

  • Instructor: Eric Ford

    • Email: ebf11 at psu dot edu

    • Phone: x3-5558

  • Graduate Teaching Assistant: Kadri Mohamad Nizam

    • Email: 445A at psu dot edu

  • Office Hours: Monday 2:15-3pm, Wednesday 3-3:30pm, or by appointment, in Davey 408A or on Zoom

  • Website: https://psuastro528.github.io/Fall2023/

Scope & Spirit of the Course

High-Performance Scientific Computing for Astrophysics will combine class discussion and programming exercises to train students in the use of modern computing hardware and programming strategies for application to astronomy and astrophysics research. Students will gain experience applying these practices during a class project (potentially in support of the student’s dissertation research). While Astro 528 is primarily intended for graduate students in the astronomy & astrophysics program, it is likely also beneficial for graduate students in other physical sciences and engineering.

This course can count toward the Penn State Graduate Minor in Computational Science. It is complementary to Phys/Astro 587: Computational Physics/Astrophysics, the recent Astro 585: Astrostatistics, and Stat/IST 557: Data Mining. The combination of these four courses with the domain expertise from their Ph.D. program would provide students with an excellent introduction to Data Science.

Course Goals

Successful students in the class will:

  • Improve their skills in scientific computing and software development,

  • Develop software development practices that support efficient use of resources and reproducible results,

  • Develop the capacity to perform high-performance and/or high-throughput scientific computing with an eye towards scaling up to larger problems and Big Data, and

  • Appreciate how established software development practices and parallel computing can benefit astronomy & astrophysics research.

Learning Objectives

Successful students in the class will:

  • Apply common programming patterns to analyze astronomical data or model astrophysical phenomena.

  • Develop a basic understanding of modern computer architectures, memory systems and programming languages.

  • Organize code into modular functions and provide associated documentation and tests.

  • Apply established programming practices (e.g., version control, coding standards, unit testing).

  • Perform a code review and refactor code based on feedback from peer code review.

  • Benchmark and profile a scientific code to identify performance-critical portions of a code and to identify sources of inefficiency.

  • Parallelize code for shared-memory, distributed-memory and accelerator architectures.

  • Identify lessons learned and communicate those to their peers.

Not all of these objectives may be realized within the one-semester course, and we will adapt the scope and emphasis based on student interests and programming experience.

Course Prerequisites

This course is designed for Astronomy & Astrophysics graduate students. While Astro 501 is listed as a corequisite by the registrar, that is primarily a formality, so that non-Astrophysics students will contact me to discuss their background before registering. Previously, we have had some graduate students from physics and engineering who did very well in the course. Of course, some extra thought may be required to figure out how concepts from the course presented in the context of astronomy or astrophysics could be applied to their own field of study.

Entering Astronomy & Astrophysics graduate students span a wide range in terms of level of experience with programming. The goal is for all students to improve their understanding of computing, software development skills and programming patterns, regardless of whether they enter as a novice or experienced programmer. Students entering with significant computing skills should aim to achieve more than students entering with minimal experience. Astronomy graduate students who are not already familiar with using the Unix/Linux/OS X command line interface and/or with no programming experience are encouraged to take the course, but should be prepared to put in some extra effort early in the semester. If you let me know about your background, I can suggest some resources to help you get started and/or schedule a time to meet with you to answer questions.

Course Topics

  • Overview of Scientific Computing, High Performance Computing, Data Science & Big Data

  • Priorities for Scientific Computing

  • Types & Choices of Programming Languages

  • Floating Point Arithmetic

  • Best Practices for Scientific Programming

    • Version Control (e.g., git)

    • Testing

    • Continuous Integration

    • Documentation & Literate Programming

    • Debugging

    • Benchmarking & Profiling

    • Reproducible Results & Workflows

    • Packages, Environments, Containers & Images

    • Efficient Workflows

  • Optimizing Code Performance

    • Modern Processors, Memory & Networking Architectures

    • Choice of Algorithms

    • Choice of Data Structures

    • Serial Codes

    • Shared Memory Systems (i.e., multi-core within one node)

    • Distributed Memory Systems (e.g., across multiple nodes)

    • Hardware Accelerators (e.g., GPUs, Intel Phi)

    • Cloud Computing (e.g., AWS, Open Science Grid)

    • Strong & Weak Scaling

  • Experience

    • Practice good programming habits on a series of exercises

    • Apply best practices to a real science project

Course Schedule

  • Week 1: Developer Tools

  • Week 2: Priorities for Scientific Computing

  • Week 3: Scientific Software Design

  • Week 4: Modern Workstation Architectures

  • Week 5: Optimization of Serial Code

  • Week 6: Memory Systems

  • Week 7: Parallel Computing Architectures

  • Weeks 8 & 9: Parallel Programming

  • Week 10: Accelerators & GPUs

  • Week 11: GPU Programming

  • Week 12: Reproducibility

  • Week 13: Cloud Computing

  • Weeks 14-15: Project Presentations

The version of the syllabus on the course website will be updated throughout the semester. Students should check the website regularly for updates.

Assignments

The assessed work for this course consists of computer lab/homework assignments (40%), a class project (50%) and class discussion (10%; including either contributing to class discussion and/or submitting reading questions). There are no exams. The class project will have several components and will be described in more detail below.

The planned assignment due dates are in the schedule section of the website. Any revision to the due dates will be announced at the time they are assigned. Assignments due on the same day as a class session are due 1 hour before the start of class. If the University is closed on the due date of an assignment (due to bad weather or any other reason), then the assignment will be due at the same time on the next business day on which classes are not canceled.

Lab/Homework Exercises

Early in the semester, homework assignments will be assigned once a week. Later in the semester, homework assignments will become shorter and/or less frequent, since students will be working on their class projects. Students will begin each homework exercise by following a link that will create a clone of the starter git repository. Students are to read and think about the questions posted, and add/edit code as suggested in the exercise, making multiple small commits as they go. Homework exercises are to be submitted by a pull request from the student's GitHub repository.

Most assignments will not have a unique solution, and comparing the accuracy and/or performance of different solutions will likely prove educational. Therefore, rather than providing “the solution”, we will typically discuss selected student solutions during class to help illustrate the advantages and disadvantages of different approaches.

The homework assignments are designed to be educational. The experience of working on the assignment is more valuable than having “the solution”. As this is a three-credit class, it is expected that students will devote an average of 5 hours per week to the course outside of class. If we estimate an average of ~1 hour/week for reading, then that leaves ~4 hours per week to work on homework assignments or your class project outside of class. If a student completes a homework assignment with less than ~2 hours of effort, then I would encourage them to go beyond the minimum to complete the assignment and try to come up with an even more efficient solution to the problem or to devote extra time to their project. Conversely, if you have done the readings, participated in class and devoted 4 hours of focused effort outside of class to a homework assignment, then you should stop coding! At that point, write up a short description of what you've done, what's working, what problems you've encountered and what you think you would try next. Don't let one homework assignment take an unreasonable amount of your time. Since some students will have more programming experience than others, I will try to make each homework assignment somewhat more than the average student can do in 4 hours, so that all students are challenged. If you are so interested that you choose to work longer on a homework assignment, then please create and tag a commit with where you were after 4 hours of focused effort, so that I have a realistic idea of how much students are accomplishing in a reasonable amount of time.

Class Project

The class project (worth a total of 50% of final grade) includes the following key elements:

  • a written proposal outlining your project (5%),

  • implementing a solution to your problem that passes your tests and uses programming practices from class in time for the peer code review (10%),

  • performing a helpful code review on a peer's project (5%; see example code review checklist),

  • optimizing performance for a multi-core shared-memory system (i.e., modern workstation; 10%),

  • optimizing performance using either a distributed memory system (e.g., cluster), a many-core accelerator (e.g., GPU or Intel Phi), or on the cloud (e.g., Amazon Elastic Compute Cloud, JuliaHub, Open Science Grid) (10%), and

  • a ~15 minute presentation (including time for questions) to the class describing your project, comparing the performance of different versions of your code as a function of problem size, and describing lessons learned (10%).
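The weights listed above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the course grade is a simple weighted sum of component scores on a 0-100 scale, which this syllabus does not state explicitly, and the names `COURSE_WEIGHTS`, `PROJECT_WEIGHTS`, and `weighted_grade` are hypothetical. Python is used here purely for illustration.

```python
# Illustrative only: assumes a simple weighted sum, which the syllabus does not spell out.

# Top-level weights from the Assignments section (percent of course grade).
COURSE_WEIGHTS = {
    "homework": 40,
    "project": 50,
    "discussion": 10,
}

# Class project components (percent of course grade), as listed above.
PROJECT_WEIGHTS = {
    "proposal": 5,
    "implementation": 10,
    "peer_code_review": 5,
    "shared_memory_optimization": 10,
    "distributed_accelerator_or_cloud": 10,
    "presentation": 10,
}


def weighted_grade(scores, weights):
    """Combine per-component scores (each 0-100) using percent weights.

    Returns a value on the same 0-100 scale, normalized by the total weight.
    """
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


# Sanity checks: the top-level weights cover the whole grade, and the
# project components add up to the project's share of it.
assert sum(COURSE_WEIGHTS.values()) == 100
assert sum(PROJECT_WEIGHTS.values()) == COURSE_WEIGHTS["project"]

# Hypothetical scores, for illustration only.
example_scores = {"homework": 90, "project": 80, "discussion": 100}
print(weighted_grade(example_scores, COURSE_WEIGHTS))  # 86.0
```

The two assertions confirm that the six project components sum to the 50% assigned to the class project and that the three top-level categories sum to 100%.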

Project Proposals

Students are encouraged to propose a project that closely relates to their research interests. Students are strongly encouraged to discuss their ideas with the instructor far enough in advance of the deadline for the project proposal, so that they can refine or change plans prior to the proposal due date. If you have no idea and would like the instructor to suggest a project, then let the instructor know by the end of the first week and suggestions can be offered.

The written project proposal should include:

  • the project goal,

  • a description of the inputs (e.g., initial conditions or input datasets, astrophysical model parameters, implementation parameters),

  • a description of the outputs,

  • a detailed plan for how the code will be tested (from unit tests to verification),

  • a discussion of the relevant range of problem sizes,

  • what computer architectures, programming languages, and libraries you will use, as well as a justification of your choices.

More information about the expectations for class projects and grading rubrics is provided in the class project section of the website.
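Because the proposal should discuss the relevant range of problem sizes, and the final presentation compares the performance of different versions of the code as a function of problem size, it may help to see what such a measurement could look like. The following is a minimal sketch in Python, used for illustration only; `sum_squares_loop`, `sum_squares_builtin`, and `time_versions` are hypothetical names, and a real project would likely use a dedicated benchmarking tool with more samples.

```python
import time


def sum_squares_loop(n):
    """Version 1: explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    """Version 2: generator expression passed to the built-in sum."""
    return sum(i * i for i in range(n))


def time_versions(versions, sizes, repeats=3):
    """Return {name: [best wall time in seconds for each problem size]}.

    Taking the minimum over several repeats reduces noise from other processes.
    """
    results = {name: [] for name in versions}
    for n in sizes:
        for name, func in versions.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                func(n)
                best = min(best, time.perf_counter() - start)
            results[name].append(best)
    return results


versions = {"loop": sum_squares_loop, "builtin": sum_squares_builtin}
sizes = [1_000, 10_000, 100_000]

# Check that the versions agree before comparing their speed.
assert all(sum_squares_loop(n) == sum_squares_builtin(n) for n in sizes)

timings = time_versions(versions, sizes)
for name, times in timings.items():
    for n, t in zip(sizes, times):
        print(f"{name:8s} n={n:>7d}  {t:.6f} s")
```

Verifying that the versions produce the same answer before timing them mirrors the testing plan requested in the proposal; the measured times will vary from machine to machine.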

Readings & Reading Questions

Students will be expected to read assignments (or review a website, watch an online presentation, or listen to audio) before class on the days indicated, so they will be prepared to participate in class discussions and/or make progress writing code for the assignment. All students should submit an average of at least one question per week about the week's readings via TopHat by one hour before discussion-based classes (typically Mondays; as opposed to Wednesdays which will typically be lab-based classes). There is a link to the course TopHat site inside the Canvas webpage. Submitting well before class starts is important, so the instructor will have time to read the questions and organize the day’s discussion based on actual student questions. You're also encouraged to take a look at questions submitted by other students and give a "thumbs up" to indicate those questions that you'd also like to be addressed in class. In the event of technical difficulties, you can email your question to the instructor with "Astro 528 Reading Question" in the subject line.

Class Participation

In-class discussions and coding sessions will be an important part of the course, so students should aim to participate regularly. Students who are reluctant to ask questions in class are especially encouraged to ask extra questions prior to class, as described above. If you know you need to miss class due to research travel or health issues, then let the instructor know in advance whenever practical. It may be possible to watch recordings of missed classes. If that is not practical for some reason, make plans to get a classmate's notes for any missed class sessions.

Exam Policy

There will be no exams.

Timeliness of assignments

Students should start all assignments well before the due date, so they can resolve any technical difficulties well in advance of the deadline. When assignments are discussed in class on the day they are due, then credit will be given based on what is submitted prior to class. In cases where turning in assignments on time is not practical due to illness, family emergency, or other university-approved excuse, assignments should still be completed and turned in, but those assignments may not be included when computing the course grade. If portions of the class project totalling more than 10% of course grade can not be completed before the course end date due to illness, family emergency, etc., then the student can elect to receive a “deferred grade” (DF) and to submit the remaining portions of the project no later than eight weeks after the course end date. Students electing this option should be familiar with the PSU DF policies.

Textbooks

The required textbooks for this course are:

  • Writing Scientific Software: A Guide to Good Style by Suely Oliveira and David Stewart, Cambridge University Press, 1st edition, September 18, 2006, ISBN-10: 0521858968. (I recommend considering a used version.)

  • Think Julia: How to Think like a Computer Scientist by Ben Lauwens and Allen Downey, 1st edition, May 14, 2019, ISBN-10: 1492045039. (I recommend using the version available online for free.)

I will also suggest readings from an optional textbook:

  • Introduction to High Performance Computing for Scientists and Engineers by Georg Hager & Gerhard Wellein, CRC Press, 1st edition, July 2, 2010, ISBN-10: 143981192X

Whenever readings from Introduction to High Performance Computing for Scientists and Engineers are suggested, there will be alternative online reading assignments for students who prefer not to purchase another textbook. Hopefully, different students reading from different sources will help stimulate interesting discussion about commonalities and differences across sources.

Additional Readings & Resources

We will also make use of a variety of web resources, such as those on the additional resources page.

Computing Requirements

Hardware

We anticipate that students will have access to a laptop computer or workstation with good internet to work on exercises both during and outside of classes (pairing up is encouraged even if you both have laptops). As long as students have a good internet connection, their local computer can be used for accessing cloud resources and need not be high-powered. If anyone is likely to work from a location with poor internet speed/reliability, then they are encouraged to install and run software locally, particularly for the early part of the class. While students will still need to submit jobs to the ICDS Roar supercomputer during the second half of the class, much of the software development can be done locally before connecting to Roar to submit jobs and retrieve results.

Basic Software

Students will need regular access to the following software:

  • Browser: Many cloud resources such as those we will be using regularly (e.g., Roar Collab Portal, GitHub, etc.) require a modern browser. Based on documentation that I've found for the most demanding sites, I believe that Chrome (22+), Firefox (16+), and Internet Explorer (11+) should work; however, it's not practical for the instructor to test each possible browser, OS, etc. I plan to test the in-class and homework exercises using Chrome. If you find a problem that arises or is fixed by changing browsers, please let the instructor and class know, so others can benefit from your experience.

Many of the early assignments could be executed either on the student's local computer or on Penn State's Roar Collab supercomputer, also known as Advanced CyberInfrastructure (ACI), operated by the Institute for Computational & Data Sciences. However, once we get to parallelizing code, students will need to use the HPC resources provided by ACI anyway. Therefore, all students should set up an account to use ACI during the later parts of the course, regardless of whether they install local software.

Optional Software

Students who find it convenient to install additional software on their local computer would likely want to set up:

Accounts

  • All students should request an ACI account (via the ICDS website) before the second class meeting.

  • Students should create an account on GitHub. Note that we will examine and discuss students' code both during class and via peer code review. Students may choose to protect their privacy by choosing a GitHub account id that does not identify them. Students may wish to create a separate GitHub account just for this class, so as to avoid being identified by other projects.

  • Prior to the second class meeting, students should send the instructor their GitHub userid.

  • Students should make use of Top Hat for submitting reading questions. Top Hat Activation Instructions are available.

Safety

While attendance and participation in class is important to the class and your learning, it is more important that we all stay safe and healthy. All students must follow all safety protocols (e.g., COVID) required or recommended by the university. University policies and recommendations may change during the semester. As of the start of class, current CDC and university guidance is to wear a facemask for 10 days after a known COVID exposure. The most up-to-date information can be accessed at https://virusinfo.psu.edu/university-status/.

Any student who does not feel well must not attend class in Davey Lab. If you have reason to believe that you have been exposed to an infectious disease (e.g., COVID), you should either self-isolate or wear a well-fitting high-quality mask (e.g., N95, KN95, KF94) while in Davey Lab. If you choose to self-isolate, then you can still earn full credit for reading questions and class participation by submitting reading questions prior to class and submitting lab and project assignments. Students should make plans to get a classmate’s notes for any missed class sessions. Some of the class sessions may be moved online, based on community conditions or if the instructor needs to quarantine or isolate.


While COVID-19 cases have decreased substantially since fall of 2021, COVID-19 remains a pandemic. More transmissible variants are a major concern. Penn State urges everyone to continue to take steps to protect not only themselves, but their colleagues, friends, and the campus by practicing good hand hygiene, staying home if you are sick, being up to date on vaccinations and boosters, and wearing a high-quality, well-fitting mask indoors. There is evidence that masks are effective in reducing the transmission of COVID-19 (e.g., Li et al., 2020, Lima et al., 2020, Talic et al., 2021). Everyone is strongly encouraged to wear masks while indoors, so as to reduce the risk of any class participants getting sick or needing to self-isolate and to allow everyone to focus on the class, rather than being distracted by safety concerns.

While some students may be comfortable working in close proximity with a partner on a lab assignment or class project, others may prefer to maintain more physical distance or to collaborate remotely. Students are expected to respect others' requests for physical distancing.

Expectations

Etiquette

Students are expected to be civil and considerate during class, regardless of whether it is online or in person. In particular, we want to create an environment where everyone feels comfortable asking questions and sharing imperfect code. Students should refrain from any actions that distract their classmates, instructor or the class. It's understandable that cell phones will often be used for two-factor authentication, but they should be silenced and put away during class once you've authenticated. Taking notes on laptops or looking up information relevant to class discussion is encouraged. However, apps and windows unrelated to the class should be closed throughout class.

If you join class from Davey 538, then please bring headphones, so everyone won't hear audio from your computer (especially during breakout sessions on Wednesdays, but potentially also during group discussions if different computers have different audio lag).

The Eberly College of Science has a Code of Mutual Respect and Cooperation. This code embodies the values that we hope our faculty, staff, and students possess and will endorse to make The Eberly College of Science a place where every individual feels respected and valued, as well as challenged and rewarded.

All students are responsible for knowing and following all the rules and regulations for this course as set forth in the syllabus (including the details on the class web site) and what is announced in class. In case of any ambiguity, ask the instructor to clarify.

Academic Integrity

Students are expected to present their own work for homework assignments and the class project. Students are also strongly encouraged to consult with each other as part of completing assignments (in addition to making use of pair coding, as described below). How does one reconcile these two? One good rule of thumb is that you (whether an individual or a pair coding team) want to ask for help in planning what to do or figuring out what could be causing a problem, but when it comes time to implement those ideas, you should write the code yourself. When you collaborate with a classmate to develop a plan, you should each implement it individually.

A second good rule of thumb is that you should not copy and paste text or code for a homework assignment. Any time you do (e.g., if you were to modify code from the Julia base or a package developed by a third party), you should clearly credit the source and indicate via inline documentation in the code which parts are your own and which were borrowed. That doesn't mean that you'll get credit for other people's work, but it will mean you've been upfront about what was your contribution. If you're ever unsure whether something is ok, you should ask and include an explanation of the contributions of others in your code and whatever you turn in.

Pair Coding

You are encouraged to engage in “pair coding” for the homework assignments and/or the class project. When pair coding, you can choose to either: 1) have each student be the “driver” for their own part of the assignment (probably best for class projects) or 2) swap between “driver” and “navigator” roles frequently within each question (probably best for homework). Any time you pair code, you should always indicate who you paired with for each task. You may not have one student be the driver for all of exercise 1, then swap and have another student be the driver for all of exercise 2, as that makes it likely that the “navigator” will not understand the solution as well as the “driver”. When you engage in pair coding, then you should clearly indicate which student you worked with, so you can both get credit.

Comparing work with others

Whether you complete assignments individually or in pairs, you are encouraged to compare your implementation’s code, accuracy and performance to that of your classmates. Before you make changes after such a comparison, tag your repository with "precompare" (if for the whole assignment) or "precompare-N" (where N is the exercise number if you compare one exercise at a time within an assignment). In the pull request, add a few lines summarizing what changes you made and what you learned from the experience (e.g., how much of a difference the change made, if there are drawbacks to the new approach).
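The tagging convention above maps onto ordinary git commands. As a minimal sketch (the helper names `precompare_tag` and `tag_commands` are hypothetical, and pushing to a remote named `origin` is an assumption about how the repository is set up), the tag name and the corresponding commands could be built like this:

```python
def precompare_tag(exercise=None):
    """Tag name: 'precompare' for a whole assignment, 'precompare-N' for exercise N."""
    if exercise is None:
        return "precompare"
    return f"precompare-{int(exercise)}"


def tag_commands(exercise=None, remote="origin"):
    """git commands (as argument lists) to tag the current commit and push the tag.

    The remote name is an assumption; adjust it to match your repository.
    """
    tag = precompare_tag(exercise)
    return [
        ["git", "tag", tag],
        ["git", "push", remote, tag],
    ]


for cmd in tag_commands(exercise=2):
    print(" ".join(cmd))
# prints:
# git tag precompare-2
# git push origin precompare-2
```

The printed commands can be typed directly in a terminal from inside the repository, before making any changes prompted by the comparison.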

Artificial Intelligence

Students are strongly discouraged from using artificial intelligence (AI) tools while completing assignments for this class. The assignments are designed to help students develop and practice good programming habits. Using AI may help complete an assignment faster, but interferes with the intended learning and practice. If a student does use AI (or any other technology) to generate words, code or images for the class, they must clearly disclose the source and use of such tools at the same time the work is submitted, just as they must disclose the use of materials generated by another person.

Using spell check and/or grammar check to help improve writing before submission is appropriate, does not interfere with the intended learning, and does not need to be disclosed for this class.

Timeliness of assignments

Students should start all assignments well before the due date, so they can resolve any technical difficulties comfortably in advance of the deadline. Since assignments will typically be discussed in class on the day they are due, credit will be given based on what is submitted prior to class. In cases where turning in assignments on time is not practical due to illness, family emergency, or other university-approved excuse, assignments should still be completed and turned in as soon as practical. If portions of the class project totalling more than 10% of course grade can not be completed before the course end date due to illness, family emergency, etc., then the student can elect to receive a “deferred grade” (DF) and to submit the remaining portions of the project no later than eight weeks after the course end date. Students electing this option should be familiar with the PSU DF policies.

Recordings of classes

In anticipation that some students may miss classes due to health issues, classes may be recorded. Any students who prefer to not ask questions while being recorded are encouraged to submit questions in advance of class.

Audio recordings of classes are part of the class activities. Any recordings are used for educational purposes and may be made available only to students presently enrolled in the class. For purposes where the recordings will be used in future class sessions/lectures, any type of identifying information will be adequately removed.

According to University Policy, students must get express permission from their instructor to record class sessions. Screenshots showing instructors and students are considered recordings. Even if permission is granted, student-initiated recordings must be used only for educational purposes for the students enrolled in the initiating student’s class. Recordings may be used only during the period in which the student is enrolled in the class. Authorized student-initiated recordings may not be posted or shared in any fashion outside of the class, including online or through other media, without the express written consent of the course instructor or appropriate University administrator. Students who engage in the unauthorized distribution of class recordings may be held in violation of the University’s Code of Conduct, and/or liable under Federal and State laws.

Instructions for a campus closure or other adjustment

In the event of any changes to the schedule (e.g., due to a campus closure or delayed start, instructor illness, etc.), any changes in class meeting times, class format (in-person or Zoom), assignment deadlines, submission procedures, exam procedures, or any other necessary instructions will be communicated via an announcement in Canvas. Students should make a habit of checking their Canvas inbox at least daily.

Code of Mutual Respect and Cooperation

The Eberly College of Science Code of Mutual Respect and Cooperation embodies the values that we hope our faculty, staff, and students possess and will endorse to make The Eberly College of Science a place where every individual feels respected and valued, as well as challenged and rewarded. Please visit the link to review the 12 points that comprise this code.

Academic Support

The Eberly College of Science is committed to the academic success of students enrolled in the College's courses and undergraduate programs. When in need of help, students can utilize various College and University-wide resources for learning assistance: https://science.psu.edu/current-students/support-network.

Disability Accommodation Statement

Penn State welcomes students with disabilities into the University’s educational programs. Every Penn State campus has an office for students with disabilities. The Student Disability Resources (SDR) website provides contact information for every Penn State campus (http://equity.psu.edu/sdr/disability-coordinator). For further information, please visit the Student Disability Resources website (http://equity.psu.edu/sdr/).

In order to receive consideration for reasonable accommodations, you must contact the appropriate disability services office at the campus where you are officially enrolled, participate in an intake interview, and provide documentation: See documentation guidelines (http://equity.psu.edu/sdr/guidelines). If the documentation supports your request for reasonable accommodations, your campus disability services office will provide you with an accommodation letter. Please share this letter with your instructors and discuss the accommodations with them as early as possible. You must follow this process for every semester that you request accommodations.

Counseling & Psychological Services Statement

Many students at Penn State face personal challenges or have psychological needs that may interfere with their academic progress, social development, or emotional wellbeing. The university offers a variety of confidential services to help you through difficult times, including individual and group counseling, crisis intervention, consultations, online chats, and mental health screenings. These services are provided by staff who welcome all students and embrace a philosophy respectful of clients’ cultural and religious backgrounds, and sensitive to differences in race, ability, gender identity and sexual orientation.

  • Counseling and Psychological Services at University Park (CAPS): http://studentaffairs.psu.edu/counseling/, 814-863-0395

  • Penn State Crisis Line (24 hours/7 days/week): 877-229-6400

  • Crisis Text Line (24 hours/7 days/week): Text LIONS to 741741

Reporting Educational Equity Concerns

Penn State takes great pride to foster a diverse and inclusive environment for students, faculty, and staff. Acts of intolerance, discrimination, or harassment due to age, ancestry, color, disability, gender, gender identity, national origin, race, religious belief, sexual orientation, or veteran status are not tolerated and can be reported through Educational Equity via the Report Bias webpage (http://equity.psu.edu/reportbias/).