% TODO: add StoddenVictoria2016a (Enhancing reproducibility for computational methods)
% TODO: add MillmanKJarrod2011a (Python for Scientists and Engineers)
% TODO: add vanderWaltStefan2011a (The NumPy Array: A Structure for Efficient Numerical Computation)
% TODO: reference https://www.nsf.gov/pubs/2016/nsf16532/nsf16532.htm (Software Infrastructure for
% Sustained Innovation (SI2: SSE & SSI))
% TODO: http://pubs.acs.org/doi/10.1021/cen-09535-scitech2

\chapter{Software}

\begin{dquote}
  The following guidelines are to be used in the documentation of all software developed in the
  Wright group for the IBM 9000 computer.  %
  These rules have arisen as a necessary consequence of the group's programming philosophy of writing
  software in the form of units which can be readily shared among a number of programmers.  %
  The approach outlined here should help to avoid some of the confusion otherwise produced by several
  persons simultaneously developing and modifying shared software.  %

  % Roger Carlson, Appendix 2.3, Software Development Guidelines
  \dsignature{Roger Carlson, ``Software Development Guidelines'' (1988) \cite{CarlsonRogerJ1988a}}
\end{dquote}

\clearpage

\section{Science needs software}  % ===============================================================

Cutting-edge science increasingly relies on custom software.  %
Software does more than just help scientists analyze data---scientific software enables scientists
to collect, analyze, and model results in ways that would otherwise be wholly impossible.  %

How does scientific software get made?  %
Who makes it, and what is the quality of that product?  %
Much has been written about these questions.  %
To this author's knowledge, there are at least eight case studies and surveys dedicated to how
scientists develop and use scientific software. \cite{CardDavidN1986a, SeamanCarolynB1997a,
  MullerMatthias2001a, SegalJudith2004a, SegalJudith2005a, CarverJeffreyC2007a,
  HannayJoErskine2009a, PrabuPrakash2011a}  %
Although they focus on different disciplines, and were published at different times, these articles
present a remarkably consistent perspective on what challenges tend to arise when developing
software ``by and for'' scientists.  %

Scientists do more than just use software: they develop it.  %
In their 2008 survey, \textcite{HannayJoErskine2009a} showed just how much of the work of science
comes down to software development:  %
\begin{ditemize}
	\item 84.3\% of surveyed scientists state that developing scientific software is important or
    very important for their own research.
	\item 91.2\% of surveyed scientists state that using scientific software is important or very
    important for their own research.
	\item On average, scientists spend approximately 40\% of their work time using scientific
    software.
	\item On average, scientists spend approximately 30\% of their work time developing scientific
    software.
\end{ditemize}
PrabhuPrakash2011a---35\% developing, breakdown by type of work...

Despite the importance of software to science and scientists, most scientists are not familiar with
basic software engineering concepts.  %
This is in part due to their general lack of formal training in programming and software
development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software
development through `informal self study', while \textcite{SegalJudith2004a} mentions that
``[scientists] do not describe themselves as software developers and have little formal education
or training in software development''. HannayJoErskine2009a agrees. 

This lack of training is not, in and of itself, a problem.  %
After all, academic scientists are required to be ``do-it-yourself''ers in many contexts for which
they receive no formal training: everything from plumbing and electrical engineering to human
resources and project management.  %
So why pay particular attention to software development practices and skills?  %

One reason to pay special attention to software is that software mistakes can have particularly
dramatic consequences.  %
As experimentalists in the physical sciences, we are often tempted by the intuition that small
mistakes lead to small errors.  %
These intuitions do not typically apply to software---software is ``brittle'' and small bugs have
huge consequences.  %
In his 2015 opinion article ``Rampant software errors may undermine scientific results'', David A.
W. Soergel attempts to estimate how many errors there might be in scientific software, and how far
reaching the consequences might be.  %
Quoting Soergel:

\begin{dquote}
  ...software is profoundly brittle: ``small'' bugs commonly have unbounded error propagation.  %
  A sign error, a missing semicolon, an off-by-one error in matching up two columns of data, etc.
  will render the results complete noise.  %
  It is rare that a software bug would alter a small proportion of the data by a small amount.  %
  More likely, it systematically alters every data point, or occurs in some downstream aggregate
  step with effectively global consequences.  %
  In general, software errors produce outcomes that are inaccurate, not merely imprecise.  %
 
\end{dquote}
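
The off-by-one case in the quote above can be made concrete with a small sketch.  %
The data here are hypothetical: a \texttt{signal} column that depends linearly on a
\texttt{setpoint} column, generated from a fixed random seed.  %
The names and the helper function \texttt{pearson} are illustrative assumptions, not taken from any
of the cited works.  %
Shifting one column by a single row leaves every individual value intact, yet the relationship
between the columns is lost.  %

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: a signal that depends linearly on a random setpoint.
rng = random.Random(0)
setpoint = [rng.uniform(0.0, 1.0) for _ in range(1000)]
signal = [2.0 * s + rng.gauss(0.0, 0.01) for s in setpoint]

# Correct pairing: row i of one column is matched with row i of the other.
r_aligned = pearson(setpoint, signal)

# Off-by-one bug: each signal value is paired with the previous setpoint.
r_shifted = pearson(setpoint[:-1], signal[1:])

print(f"aligned: r = {r_aligned:.3f}")
print(f"shifted: r = {r_shifted:.3f}")
```

With the correct pairing the correlation is close to one; with the shifted pairing it is close to
zero, even though no single number in either column has changed.  %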

On a more positive note, better software development practices may be ``low-hanging fruit'' that
can greatly improve researchers' lives without huge amounts of investment.  %
Great software makes science easier, faster, and often of higher quality.  %
And making great software isn't necessarily harder than the development practices that scientists
are following today---indeed sometimes it is easier to follow best practices.  %

\section{Challenges in scientific software development}  % ========================================

Software development ``by and for'' scientists poses unique challenges.  %

\subsection{Extensibility}  % ---------------------------------------------------------------------

Many traditional software development paradigms demand an upfront articulation of goals and
requirements.  %
This allows the developers to carefully design their software, even before a single line of code is
written.  %
In her seminal 2005 case study \textcite{SegalJudith2005a} describes a collaboration between a team
of researchers and a contracted team of software engineers.  %

\begin{dquote}
  Unlike traditional commercial software developers, but very much like developers in open source
  projects or startups, scientific programmers usually don't get their requirements from customers,
  and their requirements are rarely frozen.
  In fact, scientists often can't know what their programs should do next until the current version
  has produced some results.

\end{dquote}

\subsection{Testing}  % ---------------------------------------------------------------------------

PrabhuPrakash2011a---lots of good stuff under ``Scientists do not rigorously test their programs''

\subsection{Lifetime}  % --------------------------------------------------------------------------

PrabhuPrakash2011a--- subsection ``long history of software development''

\subsection{Optimization}  % ----------------------------------------------------------------------

PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of
parallelization paradigms''

\subsection{Maintenance}  % -----------------------------------------------------------------------

\section{Good-enough practices}  % ================================================================

In their [...] perspective, ``Good enough practices in scientific computing'', (from which this
section gets its name) [WILSON ET AL] describe a set of techniques that, in their words, ``every
researcher can and should consider adopting''.  %

Let the computer do the work...

Write programs for people, not computers.  %

Don't repeat yourself, or others (we built on top of scipy, hdf5).

Plan for mistakes / use testing.
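
As a minimal sketch of what planning for mistakes can look like, consider a small unit conversion
paired with automated checks.  %
The function \texttt{nm\_to\_wavenumber} and the three tests are hypothetical examples written for
illustration; they are not part of any package discussed in this chapter.  %
The tests use plain \texttt{assert} statements, so a test runner such as pytest could collect them,
but they can also simply be called directly.  %

```python
import math


def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nanometers to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


def test_known_value():
    # 500 nm corresponds to 20000 cm^-1.
    assert math.isclose(nm_to_wavenumber(500.0), 20000.0)


def test_round_trip():
    # The conversion is its own inverse, so applying it twice
    # should return the starting value.
    for value in (250.0, 800.0, 1300.0):
        assert math.isclose(nm_to_wavenumber(nm_to_wavenumber(value)), value)


def test_rejects_nonphysical_input():
    # Zero or negative wavelengths should fail loudly, not silently.
    try:
        nm_to_wavenumber(0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero wavelength")


if __name__ == "__main__":
    test_known_value()
    test_round_trip()
    test_rejects_nonphysical_input()
    print("all tests passed")
```

Each test encodes one expectation: a known value, a symmetry of the calculation, and a failure
mode.  %
If a later change breaks any of them, the mistake surfaces immediately rather than in a figure.  %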

Write first, optimize later.

Document, document, document.

Collaborate.
Code review...
Issues...
Make incremental changes...

\subsection{Data formats}  % ----------------------------------------------------------------------

% HDF5

% SELF-DESCRIBING DATA

% OBJECT ORIENTED PROGRAMMING

\subsection{Version control}  % -------------------------------------------------------------------

% SOURCE CONTROL AND VERSIONING

\subsection{Licensing and distribution}  % --------------------------------------------------------

% LICENSING AND DISTRIBUTION