From d80603d70f1dbd7bc12f9fe2d67ab25f4b60cc56 Mon Sep 17 00:00:00 2001 From: Blaise Thompson Date: Thu, 5 Apr 2018 12:09:16 -0500 Subject: 2018-04-05 12:09 --- software/chapter.tex | 61 +++++++++++++++++++++++++++++----------------------- 1 file changed, 34 insertions(+), 27 deletions(-) (limited to 'software') diff --git a/software/chapter.tex b/software/chapter.tex index b235ecb..3ab3831 100644 --- a/software/chapter.tex +++ b/software/chapter.tex @@ -19,16 +19,16 @@ \section{Science needs software} % =============================================================== Cutting-edge science increasingly relies on custom software. % -Software does more than just help scientists analyze data---scientific software enables scientists -to collect, analyze, and model results in ways that would otherwise be wholly impossible. % +Scientific software enables scientists to collect, analyze, and model results in ways that would +otherwise be wholly impossible. % How does scientific software get made? % Who makes it, and what is the quality of that product? % Much has been written about these questions. % -To this authors knowledge, there are at least 8 case studies and surveys dedicated to how -scientists develop and use scientific software. \cite{CardDavidN1986a, SeamanCarolynB1997a, - MullerMatthiasM2001a, SegalJudith2004a, SegalJudith2005a, CarverJeffreyC2007a, - HannayJoErskine2009a, PrabuPrakash2011a} % +To my knowledge, there are at least 8 case studies and surveys dedicated to how scientists develop +and use scientific software. \cite{CardDavidN1986a, SeamanCarolynB1997a, MullerMatthiasM2001a, + SegalJudith2004a, SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a, + PrabuPrakash2011a} % Although they focus on different disciplines, and were published at different times, these articles present a remarkably consistent perspective on what challenges tend to arise when developing software ``by and for'' scientists. 
% @@ -47,6 +47,7 @@ comes down to software development: % software. \end{ditemize} PrabhuPrakash2011a---35\% developing, breakdown by type of work... +% TODO: finish this paragraph Despite the importance of software to science and scientists, most scientists are not familiar with basic software engineering concepts. % @@ -88,13 +89,13 @@ On a more positive note, better software development practices may be ``low-hang can greatly improve researchers' lives without huge amounts of investment. % Great software makes science easier, faster, and often of higher quality. % And making great software isn't necessarily harder than the development practices that scientists -are following today---indeed sometimes it is easier to follow best practices. % +are following today---indeed sometimes it is easier to follow best practices. % TODO: cite wilson In the United States, funding agencies have recognized the crucial role that software plays in science. % The National Science Foundation has a long-running ``Software Infrastructure for Sustained -Innovation'' (SI$^2$) program, which endeavors to take a ``leadership role in providing software as -enabling infrastructure for science and engineering research'' [CITE https://www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf]. +Innovation'' (SI$^2$) program, which endeavors to take a \emph{``leadership role in providing software as +enabling infrastructure for science and engineering research''} [CITE https://www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf]. % https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503489 \section{Challenges in scientific software development} % ======================================== @@ -109,8 +110,14 @@ Typically the developers of scientific software are not trained software develop This is perfectly appropriate, because scientific software development typically requires a large amount of domain knowledge that only ``end-users'' possess. 
% Software development practices may not be valued in a scientific environment. % +End-users may lack the skill and knowledge required to develop high quality, maintainable +software. % +They may not be aware of best practices in software development. % +They focus on feature additions and neglect documentation and maintenance. % +% BJT: what are the consequences of end-users -\textbf{Extensibility.} \cite{SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a, + +\textbf{Shifting goals.} \cite{SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a, PrabhuPrakash2011a} Many traditional software development paradigms demand an upfront articulation of goals and requirements. % @@ -120,7 +127,8 @@ In her seminal 2005 case study \textcite{SegalJudith2005a} describes a collabora of researchers and a contracted team of software engineers. % \begin{dquote} - Unlinke traditional commercial software developers, but very much like developers in open source + + Unlike traditional commercial software developers, but very much like developers in open source projects or startups, scientific programmers usually don't get their requirements from customers, and their requirements are rarely frozen. In fact, scientists often can't know what their programs should do next until the current version @@ -135,15 +143,12 @@ cannot simply ``contract out'' a large part of its software development needs. Sometimes, a scientific problem is worked out through the iterative process of developing software to solve it. % -\textbf{Lifetime.} \cite{CarverJeffreyC2007a, PrabhuPrakash2011a} -Many scientific software projects have long life cycles, measured in decades or more. % -Challenges with portability, and updating to ``modern standards''. % - -\textbf{Maintenance.} \cite{PrabhuPrakash2011a} -Scientific software, especially software maintained by graduate students, tends to be very hard to -maintain. 
% -This problem is compounded by the long lifetime of such software, and the poorly defined -requirements and lack of documentation and testing. % +\textbf{Maintenance.} \cite{CarverJeffreyC2007a, PrabhuPrakash2011a} +Scientific software is famously hard to maintain. % +Graduate students graduate, and institutional knowledge about the internal workings of software +projects is diminished over time. % +This problem is compounded by the long lifetime of some software, the poorly defined +requirements, and lack of documentation and testing. % Oftentimes, scientific software ends up being a mess of layer upon layer of incongruent pieces written by generation upon generation of students. % Worse, software is sometimes abandoned or left untouched to become a crucial but arcane component @@ -163,9 +168,9 @@ Scientists are unaware of parallelization paradigms. % \section{Good-enough practices} % ================================================================ -In their [...] perspective, ``Good enough practices in scientific computing'', (from which this +In their 2017 perspective, ``Good enough practices in scientific computing'', (from which this section gets its name) \textcite{WilsonGreg2017a} describe a set of techniques that, in their -words, ``every researcher can and should consider adopting''. % +words, \emph{``every researcher can and should consider adopting''}. % In this section, I attempt to very quickly summarize my personal perspective on what makes good software development good---with citations to literature that supports each idea. % These practices are not, generally, \emph{extra work}. % @@ -188,6 +193,8 @@ If you do need to write some software, make sure that you do not duplicate code work. % Instead of writing the same 10 lines of code again and again with small tweaks, write a function that accepts a set of arguments. 
% +If you are doing the same operation in many different contexts, consider defining a library for that +operation that can be imported and shared between your different projects. % If your software package grows to contain multiple files, make those files modular. % As a general rule, once you have two classes you need multiple files. % @@ -208,14 +215,14 @@ Don't forget units. % Version control systems allow programmers to save a software package such that they can always return to that save point. % All of the files in the package are saved together. % -Modern version control systems allow programmers to see exactly what has changed between each save +These systems also allow programmers to see exactly what has changed between each save point, and since the last save point. % This is indispensable when trying to diagnose software problems. % In order to use version control as effectively as possible, try to save the package after every change (feature addition, bugfix, etc.). % Typically version control is coupled with uploading to a remote server, for example using git with -GitHub \cite{GitHub} or git.chem.wisc.edu \cite{git.chem.wisc.edu}, but version control need not be -synonymous with uploading and distribution. % +GitHub \cite{GitHub}, GitLab [CITE] or git.chem.wisc.edu \cite{git.chem.wisc.edu}, but version +control need not be synonymous with uploading and distribution. % Tools like git have a lot of fantastic features beyond simply saving [CITE], but those are beyond the scope of these ``good enough'' recommendations. % Also consider defining a version for the software package as a whole. % @@ -231,7 +238,7 @@ functionality. % In this way, as you make changes you can run your tests to ensure that those changes do not accidentally break important functionality. % Testing sounds difficult, but it's really just about writing simple functions that use your -software to do something, and then raise an exception if the result is not correct. 
% +software to do something, and then asking if the result is correct. % If you add tests when you add features or fix bugs, you'll quickly find that you have a lot of tests that do a good job of defining the expected behavior of your software. % Software engineers tend to be dogmatic about testing, but don't worry too much about test coverage @@ -266,7 +273,7 @@ Try to follow the recommended style for your language, but don't obsess about it \textbf{Avoid premature optimization.} \cite{WilsonGreg2017a} Don't get pulled into the trap of trying to make things perfect the first time. % Software design is typically a very iterative process, and for good reason. % -Write first, and if it works, consider optimization. % +Write for correctness first, and if it works, consider optimization. % If you do need to make your software faster, use profiling tools like cProfile \cite{PythonProfilers} and SnakeViz \cite{SnakeViz} to empirically determine what operations are taking the longest, rather than trying to guess or use intuition. % -- cgit v1.2.3