author Blaise Thompson <blaise@untzag.com> 2018-04-05 09:47:06 -0500
committer Blaise Thompson <blaise@untzag.com> 2018-04-05 09:47:06 -0500
commit f8c9747d3b6425b420839ff06931b63692318f03
tree 1d701277b4798a07deea8913aac06d90f4cf1d8c /software
parent 8ef39cd4054e408700b96beff9ee6cbc5df32d6b
2018-04-05 09:47
Diffstat (limited to 'software')
-rw-r--r-- software/chapter.tex 66
1 files changed, 44 insertions, 22 deletions
diff --git a/software/chapter.tex b/software/chapter.tex
index b52764b..b235ecb 100644
--- a/software/chapter.tex
+++ b/software/chapter.tex
@@ -1,6 +1,4 @@
-% TODO: add StoddenVictoria2016a (Enhancing reproducibility for computational methods)
-
-\chapter{Software}
+\chapter{Software} \label{cha:sof}
\begin{dquote}
The following guidelines are to be used in the documentation of all software developed in the
@@ -55,8 +53,8 @@ basic software engineering concepts. %
This is in part due to their general lack of formal training in programming and software
development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software
development through `informal self study', while \textcite{SegalJudith2004a} mentions that
-``[scientists] do not describe themselves as software developers and have little formal education
-or training in software development''. HannayJoErskine2009a agrees. JoppaLucasN2013a aggrees.
+\emph{``[scientists] do not describe themselves as software developers and have little formal
+ education or training in software development''}.
This lack of training is not in-and-of-itself a problem. %
After all, academic scientists are required to be ``do-it-yourself''ers in many contexts for which
@@ -105,7 +103,15 @@ Software development ``by-and-for'' scientists poses unique challenges. %
In this section, I attempt to summarize the literature about these challenges, with a focus on
those challenges that I have found most relevant. %
-\textbf{Extensibility.} % TODO: cite
+\textbf{``End-user developers.''} \cite{SegalJudith2005a, HannayJoErskine2009a, JoppaLucasN2013a}
+% TODO: see Joppa ref 17, 21 22
+Typically the developers of scientific software are not trained software developers. %
+This is perfectly appropriate, because scientific software development typically requires a large
+amount of domain knowledge that only ``end-users'' possess. %
+Software development practices may not be valued in a scientific environment. %
+
+\textbf{Extensibility.} \cite{SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a,
+ PrabhuPrakash2011a}
Many traditional software development paradigms demand an upfront articulation of goals and
requirements. %
This allows the developers to carefully design their software, even before a single line of code is
@@ -122,13 +128,18 @@ of researchers and a contracted team of software engineers. %
\end{dquote}
-PrabhuPrakash2011a---lots of good stuff under ``Scientists do not rigorously test their programs''
+Scientific software is \emph{explorative}, and it needs to be flexible and extendable. %
+Scientific software developers cannot know what will be required before they set out to try. %
+This is probably the most fundamental challenge in such projects, and a big part of why science
+cannot simply ``contract out'' a large part of its software development needs. %
+Sometimes, a scientific problem is worked out through the iterative process of developing software
+to solve it. %
-\textbf{Lifetime.}
-PrabhuPrakash2011a--- subsection ``long history of software development''
-Challenges with portability, and updating to ``modern standards''.
+\textbf{Lifetime.} \cite{CarverJeffreyC2007a, PrabhuPrakash2011a}
+Many scientific software projects have long life cycles, measured in decades or more. %
+Such long lifetimes bring challenges with portability and with updating to ``modern standards''. %
-\textbf{Maintenance}
+\textbf{Maintenance.} \cite{PrabhuPrakash2011a}
Scientific software, especially software maintained by graduate students, tends to be very hard to
maintain. %
This problem is compounded by the long lifetime of such software, and the poorly defined
@@ -138,9 +149,17 @@ written by generation upon generation of student. %
Worse, software is sometimes abandoned or left untouched to become a crucial but arcane component
of a scientific research project. %
-\textbf{Optimization}
-PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of
-parallelization paradigms''
+\textbf{Testing.} \cite{SandersRebecca2008a, PrabhuPrakash2011a}
+Testing is a huge part of software development practices, but many researchers do not engage in
+sufficient testing of their software... %
+The issue of testing is also consistent with the system of peer review...
+Software is not typically peer reviewed...
+Especially for domain-specific computational software, determining the ``correct outcome'' to test
+against is often infeasible. %
+
+\textbf{Optimization.} \cite{PrabhuPrakash2011a}
+Scientists often do not optimize for the common case. %
+Many are also unaware of parallelization paradigms. %
\section{Good-enough practices} % ================================================================
@@ -151,7 +170,7 @@ In this section, I attempt to very quickly summarize my personal perspective on
software development good---with citations to literature that supports each idea. %
These practices are not, generally, \emph{extra work}. %
In fact, many of them save massive amounts of time and effort in the long \emph{and} short run,
-when properly applied. %
+when properly applied. \cite{WilsonGreg2006a} %
\textbf{Do not reinvent.} \cite{WilsonGreg2017a} %
Before you sit down and implement a piece of software, stop! %
@@ -172,7 +191,7 @@ that accepts a set of arguments. %
If your software package grows to contain multiple files, make those files modular. %
As a general rule, once you have two classes you need multiple files. %
-\textbf{Choose good data formats.} \cite{WilsonGreg2017a} %
+\textbf{Choose good data formats.} \cite{BaxterSusanM2006a, WilsonGreg2017a} %
Choose a non-proprietary format if at all possible---remember: you yourself might not have access
to the proprietary software in 10 years. %
Choose plain text if you can. %
@@ -185,7 +204,7 @@ Make sure that it is clear what each piece of data means. %
For tabular data, use headers. %
Don't forget units. %
-\textbf{Use version control.} %
+\textbf{Use version control.} \cite{BaxterSusanM2006a, WilsonGreg2006a} %
Version control systems allow programmers to save a software package such that they can always
return to that save point. %
All of the files in the package are saved together. %
@@ -205,7 +224,7 @@ reason not to. %
If the language you are using has a convention for representing the version programmatically, such
as a \python{__version__} attribute in Python, comply with that convention. %
-\textbf{Test.} \cite{WilsonGreg2017a} %
+\textbf{Test.} \cite{BaxterSusanM2006a, WilsonGreg2006a, WilsonGreg2017a} %
As the old saying goes, ``if it's not tested, it's broken''. %
If you rely on a piece of functionality in your software, consider writing a test that defines that
functionality. %
@@ -220,7 +239,7 @@ unless your project becomes very important. %
Distribute test datasets, when appropriate. %
Remember, your tests can serve double duty as simple minimal examples. %
-\textbf{Collaborate and share.} \cite{WilsonGreg2017a, BarnesNick2010a} %
+\textbf{Collaborate and share.} \cite{BaxterSusanM2006a, WilsonGreg2017a, BarnesNick2010a} %
If you are part of a team, consider sharing software and collaborating to create it. %
Try using practices like code review and issue tracking, but don't feel obligated to use them if it
doesn't make sense for your project. %
@@ -232,7 +251,10 @@ Put your software on an open platform, like GitHub \cite{GitHub}, and mint a DOI
Cite your software, and ask other people who are using your software to do the same. %
Choose a license early, and choose permissive and commercially compatible unless you 1. know what
you are doing and 2. plan to enforce. %
-% TODO: cite 'publish your code it is good enough'
+Afraid to share because your code needs more polish? %
+If your software is good enough to be used in active scientific research, it's worth sharing. %
+As Nick Barnes says, \emph{``Publish your computer code: it is good enough''}.
+\cite{BarnesNick2010a} %
\textbf{Write human readable code, and document it well.} \cite{WilsonGreg2017a} %
Let the computer do the work, but write the program to be read by a human. %
@@ -290,9 +312,9 @@ class Person():
Now I can make some instances of that class, and access their attributes and methods. %
\begin{codefragment}{python}
>>> mary = Person(name='Mary', favorite_food='pizza', hated_food='falafel')
->>> jane = Person(name='Jane', favorite_food='salad')
+>>> jane = Person(name='Jane', favorite_food='salad')
>>> mary.react_to('falafel')
-'gross---no thank you'''
+'gross---no thank you'
>>> jane.react_to('salad')
'yum! my favorite'
>>> mary.favorite_food