Astro 528: High-Performance Scientific Computing for Astrophysics (Fall 2023)

Week 12 Discussion Topics

  • Reproducibility & Replicability

    • Code behind the figures

    • Sharing code

  • Package managers & Environments

    • Creating your own package

    • Registering your own package

  • Reproducible Computing Environments

    • Julia

    • Docker/Singularity

  • Q&A

Reproducibility & Replicability

Data behind the figures

  • [AAS Journals Data Guide](https://journals.aas.org/data-guide/#machine_readable_tables)

  • [AAS Web converter](https://authortools.aas.org/MRT/upload.html)

  • NASA grants:

    • "At a minimum the Data Management Plan (DMP) for ROSES must explain how you will release the data needed to reproduce figures, tables and other representations in publications, at the time of publication. Providing this data via supplementary materials with the journal is one really easy way to do this and it has the advantage that the data and the figures are linked together in perpetuity without any ongoing effort on your part." and

    • "Software, whether a stand-alone program, an enhancement to existing code, or a module that interfaces with existing codes, created as part of a ROSES award, should be made publicly available when it is practical and feasible to do so, and when there is scientific utility in doing so... SMD expects that the source code, with associated documentation sufficient to enable use of the code, will be made publicly available as Open Source Software (OSS) under an appropriately permissive license (e.g., Apache-2, BSD-3-Clause, GPL). This includes all software developed with SMD funding used in the production of data products, as well as software developed to discover, access, visualize, and transform NASA data." – [NASA SARA DMP FAQ](https://science.nasa.gov/researchers/sara/faqs/dmp-faq-roses)

  • NSF:

    • "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing." – NSF Data Management Plan Requirements

    • "Providing software to read and analyze scientific data products can greatly increase value of these products. Investigators should use one of many software collaboration sites, like Github.com. These sites enable code sharing, collaboration and documentation at one location." – AST-specific Advice to PIs on the DMP

How to share code

Old-school

  • Source code for a few functions published as an appendix.

  • Source code available upon request.

  • Source code available from my website.

Modern

Practical sharing of evolving code:

  • [GitHub](http://github.com/)

  • Institutional Git server (e.g., [PSU's GitLab](https://git.psu.edu/help/#getting-started-with-gitlab))

Archiving of code (& data):

  • Dedicated archive with

    • Long-term plan

    • [Digital Object Identifier (DOI)](https://www.doi.org/) for your work

    • Standard file format

    • Metadata

  • Examples:

    • [Zenodo](https://zenodo.org/) (by CERN)

    • [Dataverse](https://dataverse.harvard.edu/) (by Harvard)

    • [ScholarSphere](https://scholarsphere.psu.edu/) (by Penn State Libraries)

Problems with sharing non-trivial codes

  • Compiling for each processor/OS

  • Linking to libraries

  • Installing libraries that are needed

  • Multi-step instructions (different for each OS) that become out-of-date

Package managers

  • Find the package you request.

  • Identify dependencies (direct & indirect).

  • Find versions that satisfy all requirements.

  • Download the requested package & dependencies.

  • Install the requested package & dependencies.

  • Perform any custom build steps.
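
The "identify dependencies (direct & indirect)" step above can be sketched as a graph traversal. This is only a toy illustration, not how Julia's Pkg is implemented: the `registry` dictionary, the package names in it, and the `all_dependencies` helper are all hypothetical.

```julia
# Toy registry: package name => direct dependencies (all names are hypothetical).
registry = Dict(
    "PlotsLite"    => ["Colors", "Recipes"],
    "Recipes"      => ["MacroHelpers"],
    "Colors"       => String[],
    "MacroHelpers" => String[],
    "Unrelated"    => ["Colors"],
)

"""Return every package that `pkg` depends on, directly or indirectly."""
function all_dependencies(registry, pkg)
    seen = Set{String}()
    stack = copy(registry[pkg])          # start from the direct dependencies
    while !isempty(stack)
        dep = pop!(stack)
        dep in seen && continue          # already visited via another path
        push!(seen, dep)
        append!(stack, registry[dep])    # queue the dependencies of this dependency
    end
    return sort(collect(seen))
end

println(all_dependencies(registry, "PlotsLite"))
```

Here `Recipes` pulls in `MacroHelpers` as an indirect dependency of `PlotsLite`, while `Unrelated` is never visited. A real package manager must additionally choose versions that satisfy every package's constraints, which this sketch ignores.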

What if you have two projects?

  • Could let both projects think that they depend on everything the other depends on.

  • If a dependency breaks, which project(s) break?

  • What if two projects require different versions?

⇒ Environments

Environments

Environments allow you to have multiple versions of packages installed and to quickly specify which versions should be available in the current session. In Julia,

  • Project.toml: Specifies direct dependencies & version constraints (required)

  • Manifest.toml: Specifies the precise versions of direct & indirect dependencies, so as to offer a fully reproducible environment (optional)

  • If there is no Manifest.toml, the package manager finds the most recent versions that satisfy the Project.toml requirements.

`julia` starts Julia with the default environment (a separate environment for each minor version, e.g., 1.9).

`julia --project=.` starts Julia using the environment specified by Project.toml and Manifest.toml in the current directory (if they do not exist, they are created when you first add a package). `julia --project` does the same, but also searches parent directories for a Project.toml.
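
The same switch can be made from inside a running session. The following is a minimal sketch, assuming a throwaway environment in a temporary directory (the `env_dir` location is purely illustrative); it needs no network access because no packages are added.

```julia
import Pkg

# Create an empty directory to hold a new environment (hypothetical location).
env_dir = mktempdir()

# Make it the active environment, like starting with `julia --project=<env_dir>`.
Pkg.activate(env_dir)

# The active project is now a Project.toml path inside env_dir.
new_project = Base.active_project()
println(new_project)

# Nothing has been written to disk yet; the file appears once a package is added.
println(isfile(new_project))

# Return to the default environment for this Julia version.
Pkg.activate()
```

Adding a package with `Pkg.add` while the temporary environment is active would write both Project.toml and Manifest.toml into `env_dir`, leaving the default environment untouched.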

What do the Project.toml and Manifest.toml files do?

What is the difference between Project.toml and Manifest.toml?

Project.toml from Lab 3:

```toml
name = "lab3"
uuid = "3355e5e9-99a6-4e94-be24-d3293f18bccc"
authors = ["Eric Ford <ebf11@psu.edu>"]
version = "0.1.0"

[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
FITSIO = "525bcba6-941b-5504-bd06-fd0dc1a4d2eb"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
InteractiveUtils = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PlutoTeachingTools = "661c6b06-c737-4d37-b85c-46df65de6f69"
PlutoTest = "cb4044da-4d16-4ffa-a6a3-8cad7f73ebdc"
PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
Query = "1a8c2f83-1ff3-5112-b086-8aa67b057ba1"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
```

Manifest.toml from Lab 3:

```toml
# This file is machine-generated - editing it directly is not advised

[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "84918055d15b3114ede17ac6a7182f68870c16f7"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.3.1"

[[ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"

[[Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[BenchmarkTools]]
deps = ["JSON", "Logging", "Printf", "Statistics", "UUIDs"]
git-tree-sha1 = "aa3aba5ed8f882ed01b71e09ca2ba0f77f44a99e"
uuid = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
version = "1.1.3"

[[Bzip2_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "c3598e525718abcc440f69cc6d5f60dda0a1b61e"
uuid = "6e34b625-4abd-537c-b88f-471c36dfa7a0"
version = "1.0.6+5"

[[CFITSIO]]
deps = ["CFITSIO_jll"]
git-tree-sha1 = "c860f5545064216f86aa3365ec186ce7ced6a935"
uuid = "3b1b4be9-1499-4b22-8d78-7db3344d1961"
version = "1.3.0"

[[CFITSIO_jll]]
deps = ["Artifacts", "JLLWrappers", "LibCURL_jll", "Libdl", "Pkg"]
git-tree-sha1 = "2fabb5fc48d185d104ca7ed7444b475705993447"
uuid = "b3e40c51-02ae-5482-8a39-3ace5868dcf4"
version = "3.49.1+0"

[[CSV]]
deps = ["Dates", "Mmap", "Parsers", "PooledArrays", "SentinelArrays", "Tables", "Unicode"]
git-tree-sha1 = "b83aa3f513be680454437a0eee21001607e5d983"
uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
version = "0.8.5"
...
```

Providing both Project.toml and Manifest.toml for an environment maximizes reproducibility (e.g., for code to reproduce figures in a paper).

But packages that are meant to be imported by others typically provide only a Project.toml, so the package manager can figure out how best to combine packages. Julia's default registry requires packages to provide [compat] constraints for each dependency.
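
As a rough illustration of the `[compat]` requirement, the sketch below parses a small Project.toml with Julia's `TOML` standard library and lists the dependencies that have no `[compat]` entry. The package name `DemoPkg` is hypothetical, and this is not the check the registry itself runs; it only shows how the two tables relate.

```julia
using TOML

# A small Project.toml for a hypothetical package (UUIDs copied from the examples above).
project_text = """
name = "DemoPkg"

[deps]
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8"

[compat]
LaTeXStrings = "1"
PlutoUI = "0.7"
julia = "1.7"
"""

project = TOML.parse(project_text)
deps = get(project, "deps", Dict{String,Any}())
compat = get(project, "compat", Dict{String,Any}())

# Dependencies listed in [deps] that have no version constraint in [compat].
missing_compat = sort([name for name in keys(deps) if !haskey(compat, name)])
println(missing_compat)
```

Only `Markdown` is reported here. It is a standard library that ships with Julia, which is consistent with the registered Project.toml shown in this notebook, where the standard libraries also carry no `[compat]` entries.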

Project.toml for a simple registered package.

```toml
name = "PlutoTeachingTools"
uuid = "661c6b06-c737-4d37-b85c-46df65de6f69"
authors = ["Eric Ford <ebf11@psu.edu> and contributors"]
version = "0.2.13"

[deps]
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
HypertextLiteral = "ac1192a8-f4b3-4bfe-ba22-af5b92cd3ab2"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Latexify = "23fbe1c1-3f47-55db-b15f-69d7ec21a316"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
PlutoLinks = "0ff47ea0-7a50-410d-8455-4348d5de0420"
PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
HypertextLiteral = "0.9"
LaTeXStrings = "1"
Latexify = "0.15, 0.16"
PlutoLinks = "0.1.5"
PlutoUI = "0.7"
julia = "1.7, 1.8, 1.9"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]
```

In the readings, they describe package versions as something like x.y.z, what is the difference between x, y, and z, and how do I decide which number my current update should increment?

[Semantic Versioning 2.0](https://semver.org/):

  • X: Major: Can break things, e.g., API changes that are not backward compatible.

  • Y: Minor: Minor changes, new features, bugfixes, refactoring internals, improvements that are unlikely to break things.

  • Z: Patch: Bugfixes, documentation improvements, low-risk performance upgrades
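
To make the choice of which number to increment concrete, here is a minimal sketch using Julia's built-in `VersionNumber` type. The `bump` helper is hypothetical (it is not part of Base or Pkg); it resets the lower-order components, as Semantic Versioning prescribes.

```julia
"""Return the version that follows `v` for a release of the given kind."""
function bump(v::VersionNumber, kind::Symbol)
    if kind === :major
        return VersionNumber(v.major + 1, 0, 0)            # breaking change
    elseif kind === :minor
        return VersionNumber(v.major, v.minor + 1, 0)      # new, compatible features
    elseif kind === :patch
        return VersionNumber(v.major, v.minor, v.patch + 1) # bugfix only
    else
        throw(ArgumentError("kind must be :major, :minor, or :patch"))
    end
end

current = v"0.2.13"
println(bump(current, :patch))
println(bump(current, :minor))
println(bump(current, :major))
```

Starting from 0.2.13, a patch release becomes 0.2.14, a minor release 0.3.0, and a major release 1.0.0.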

`[compat]` allows the developer to specify which versions/upgrades will be allowed.

```toml
# A leading caret (^) allows upgrades that would be compatible according to semver
PkgA = "^1.2.3" # [1.2.3, 2.0.0)
PkgB = "^1.2"   # [1.2.0, 2.0.0)
PkgC = "^1"     # [1.0.0, 2.0.0)
PkgD = "^0.2.3" # [0.2.3, 0.3.0)
# ^ is the default
Example = "0.2.1" # [0.2.1, 0.3.0)
# ~ is more restrictive
PkgA = "~1.2.3" # [1.2.3, 1.3.0)
PkgB = "~1.2"   # [1.2.0, 1.3.0)
PkgC = "~1"     # [1.0.0, 2.0.0)
# = requires exact equality
PkgA = "=1.2.3"           # [1.2.3, 1.2.3]
PkgA = "=0.10.1, =0.10.3" # 0.10.1 or 0.10.3
# - allows for ranges
PkgA = "1.2.3 - 4.5.6" # [1.2.3, 4.5.6]
PkgA = "0.2.3 - 4.5.6" # [0.2.3, 4.5.6]
PkgA = "1.2.3 - 4.5"   # 1.2.3 - 4.5.* = [1.2.3, 4.6.0)
PkgA = "1.2.3 - 4"     # 1.2.3 - 4.*.* = [1.2.3, 5.0.0)
PkgA = "1.2 - 4.5"     # 1.2.0 - 4.5.* = [1.2.0, 4.6.0)
PkgA = "1.2 - 4"       # 1.2.0 - 4.*.* = [1.2.0, 5.0.0)
```

For details and more examples, see documentation.
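
The caret rule can be sketched in a few lines of Julia. This is an illustration of the ranges listed above, not Pkg's own implementation: `caret_range` and `is_allowed` are hypothetical helpers, they assume a fully specified x.y.z version, and they ignore prerelease tags.

```julia
"""Half-open range (lower, upper) allowed by a caret specifier for a full x.y.z version."""
function caret_range(v::VersionNumber)
    # The upper bound bumps the left-most nonzero component.
    upper = if v.major > 0
        VersionNumber(v.major + 1, 0, 0)
    elseif v.minor > 0
        VersionNumber(0, v.minor + 1, 0)
    else
        VersionNumber(0, 0, v.patch + 1)
    end
    return (v, upper)
end

"""Check whether `candidate` falls in the caret range of `spec`."""
function is_allowed(candidate::VersionNumber, spec::VersionNumber)
    lower, upper = caret_range(spec)
    return lower <= candidate < upper
end

println(caret_range(v"1.2.3"))
println(caret_range(v"0.2.3"))
println(is_allowed(v"1.9.0", v"1.2.3"))
println(is_allowed(v"0.3.0", v"0.2.3"))
```

This reproduces the first and fourth ranges in the listing: `^1.2.3` admits anything below 2.0.0, whereas `^0.2.3` stops at 0.3.0 because, before 1.0, a minor release is treated as potentially breaking.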

Pluto & Package Management/Environments

Pluto has [its own package manager](https://github.com/fonsp/Pluto.jl/wiki/%F0%9F%8E%81-Package-management)!

  • Automatically creates a new temporary environment for each notebook, based on where it sees `using` or `import` followed by a package name.

    • Great for reproducibility

    • Adds a little extra startup time

  • Each notebook embeds a Project.toml and Manifest.toml

  • Can edit the embedded environment:

```julia
import Pkg, Pluto
Pluto.activate_notebook_environment("~/Documents/hello.jl")
Pkg.update()
```

You can disable Pluto's package manager and use Julia's default package manager by including `Pkg.activate(path)` anywhere in the notebook (as code, not as text).

```julia
begin
    import Pkg
    # activate an existing project environment that
    # can be shared across multiple sessions and/or notebooks
    Pkg.activate(Base.current_project())
    # load packages that are included in the existing Project.toml & installed
    using Plots, PlutoUI, LinearAlgebra
end
```

  • This reduces startup cost by reusing an existing environment

  • But all packages to be used by the notebook must be included in the specified Project.toml and already installed locally.