Diffstat (limited to 'software/chapter.tex')
-rw-r--r-- software/chapter.tex 49
1 files changed, 25 insertions, 24 deletions
diff --git a/software/chapter.tex b/software/chapter.tex
index 0027c5f..b52764b 100644
--- a/software/chapter.tex
+++ b/software/chapter.tex
@@ -1,5 +1,4 @@
% TODO: add StoddenVictoria2016a (Enhancing reproducibility for computational methods)
-% TODO: http://pubs.acs.org/doi/10.1021/cen-09535-scitech2
\chapter{Software}
@@ -146,11 +145,11 @@ parallelization paradigms''
\section{Good-enough practices} % ================================================================
In their [...] perspective, ``Good enough practices in scientific computing'' (from which this
-section gets its name) [WILSON ET AL] describe a set of techniques that, in their words, ``every
-researcher can and should consider adopting''. %
+section gets its name) \textcite{WilsonGreg2017a} describe a set of techniques that, in their
+words, ``every researcher can and should consider adopting''. %
In this section, I attempt to very quickly summarize my personal perspective on what makes good
software development good---with citations to literature that supports each idea. %
-These practices are not, generally, ``extra work''. %
+These practices are not, generally, \emph{extra work}. %
In fact, many of them save massive amounts of time and effort in the long \emph{and} short run,
when properly applied. %
@@ -158,8 +157,8 @@ when properly applied. %
Before you sit down and implement a piece of software, stop! %
First you should try hard to find a library that already has what you need. %
You'll often surprise yourself with what you can find. %
-Search the package repository for your language, such as PyPI [CITE], MATLAB File Exchange [CITE]
-or CRAN [CITE]. %
+Search the package repository for your language, such as PyPI \cite{PyPI}, MATLAB File Exchange
+\cite{FileExchange} or CRAN \cite{CRAN}. %
Even if there is not a full solution to your problem out there, there is almost certainly a
solution to some part of it. %
Much better to have a dependency than a custom implementation. %
@@ -177,8 +176,8 @@ As a general rule, once you have two classes you need multiple files. %
Choose a non-proprietary format if at all possible---remember: you yourself might not have access
to the proprietary software in 10 years. %
Choose plain text if you can. %
-Consider conforming to specifications, such as Tidy Data. [CITE] %
-If you must, use open binary formats such as HDF5. %
+Consider conforming to specifications, such as Tidy Data \cite{WickhamHadley2014a}. %
+If you must, use open binary formats such as HDF5 \cite{FolkMike2011a}. %
Put as much metadata as you can into the file. %
Any piece of metadata that can automatically be added by the computer is essentially free---you
might as well do it. %
@@ -196,12 +195,13 @@ This is indispensable when trying to diagnose software problems. %
In order to use version control as effectively as possible, try to save the package after every
change (feature addition, bugfix, etc.). %
Typically version control is coupled with uploading to a remote server, for example using git with
-GitHub [CITE] or git.chem.wisc.edu [CITE], but version control need not be synonymous with
-uploading and distribution. %
+GitHub \cite{GitHub} or git.chem.wisc.edu \cite{git.chem.wisc.edu}, but version control need not be
+synonymous with uploading and distribution. %
Tools like git have a lot of fantastic features beyond simply saving [CITE], but those are beyond the
scope of these ``good enough'' recommendations. %
Also consider defining a version for the software package as a whole. %
-Use semantic versioning [CITE], unless there is a strong reason not to. %
+Use semantic versioning (MAJOR.MINOR.PATCH) \cite{SemanticVersioning}, unless there is a strong
+reason not to. %
If the language you are using has a convention for representing the version programmatically, such
as a \python{__version__} attribute in Python, comply with that convention. %
@@ -220,7 +220,7 @@ unless your project becomes very important. %
Distribute test datasets, when appropriate. %
Remember, your tests can serve double duty as simple minimal examples. %
-\textbf{Collaborate and share.} \cite{WilsonGreg2017a} %
+\textbf{Collaborate and share.} \cite{WilsonGreg2017a, BarnesNick2010a} %
If you are part of a team, consider sharing software and collaborating to create it. %
Try using practices like code review and issue tracking, but don't feel obligated to use them if it
doesn't make sense for your project. %
@@ -228,7 +228,7 @@ When working as part of a team, making incremental changes and using version con
more important. %
Earlier we mentioned ``do not reinvent''. %
The other side of that coin is ``if you make something, consider sharing it''. %
-Put your software on an open platform, like GitHub, and mint a DOI. %
+Put your software on an open platform, like GitHub \cite{GitHub}, and mint a DOI. %
Cite your software, and ask other people who are using your software to do the same. %
Choose a license early, and choose a permissive, commercially compatible one unless you (1) know
what you are doing and (2) plan to enforce it. %
@@ -245,9 +245,11 @@ Try to follow the recommended style for your language, but don't obsess about it
Don't get pulled into the trap of trying to make things perfect the first time. %
Software design is typically a very iterative process, and for good reason. %
Write first, and if it works, consider optimization. %
-If you do need to make your software faster, use profiling tools like cProfile [CITE] and SnakeVis
-[CITE] to empirically determine what operations are taking the longest, rather than trying to guess
-or use intuition. %
+If you do need to make your software faster, use profiling tools like cProfile
+\cite{PythonProfilers} and SnakeViz \cite{SnakeViz} to empirically determine what operations are
+taking the longest, rather than trying to guess or use intuition. %
+Only optimize speed-limiting operations, and stop optimizing once the code runs as quickly as
+needed. %
\section{Object oriented programming} % ----------------------------------------------------------
@@ -364,17 +366,15 @@ Certain metadata conventions were also introduced, including named dimensions.
NetCDF remains popular in the aerospace and
The Flexible Image Transport System (FITS) is a similar format with a focus on visualization and
-backwards compatibility. \cite{WellsDC1981a} %
-% CITE https://fits.gsfc.nasa.gov/
-% CONSIDER CITING https://fits.gsfc.nasa.gov/rfc4047.txt
+backwards compatibility. \cite{FITS, WellsDC1981a} %
FITS is still popular in the astronomy community. %
Today, these hierarchical data formats have gathered under the umbrella of the HDF5 format, built
-and maintained by the HDF Group. [CITE] %
+and maintained by the HDF Group. \cite{FolkMike2011a} %
This format has all of the advantages of FITS, CDF, and NetCDF. %
It can support arbitrary datatypes and is optimized to quickly process large and complex
datasets. %
-In Python, HDF5 is supported primarily through the h5py package. [CITE] %
+In Python, HDF5 is supported primarily through the h5py package. \cite{h5py} %
\section{Scientific Python} % --------------------------------------------------------------------
@@ -383,7 +383,8 @@ SciPy is a collection of ``open-source software for mathematics, science, and eg
SciPy was an absolutely essential component of this dissertation and the work it describes. %
There are several packages under the SciPy umbrella. %
NumPy is a very powerful and fast package for working with multidimensional arrays.
-\cite{vanderWaltStefan2011a} %
+\cite{OliphantTravisE2006a} %
The SciPy library contains a vast number of scientific computing tools, including many mathematical
-operations that this work depends on. [CITE] %
-Matplotlib is a beautiful visualization package for 1, 2, and 3D plotting. [CITE] %
+operations that this work depends on. \cite{SciPy} %
+Matplotlib is a beautiful visualization package for 1D, 2D, and 3D plotting.
+\cite{HunterJohnD2007a} %