2018-04-08 16:00

author: Blaise Thompson <blaise@untzag.com> 2018-04-08 15:59:58 -0500
committer: Blaise Thompson <blaise@untzag.com> 2018-04-08 15:59:58 -0500
commit: cbc819350f29661b69a2ee6bf4f1dafcf3e2f621 (patch)
tree: 6f011c2cc53330e889d24db69693219bb6bb7217 /processing/chapter.tex
parent: f1df2688e6f3a9f077ab8e0c670d1615241c3148 (diff)
1 files changed, 208 insertions, 86 deletions
diff --git a/processing/chapter.tex b/processing/chapter.tex
index 2feec22..e430e1d 100644
--- a/processing/chapter.tex
+++ b/processing/chapter.tex
@@ -70,6 +70,16 @@ as has already been done in simulation and acquisition software created in the W
 \clearpage
 \section{Introduction to WrightTools}  % ==========================================================
 
+WrightTools is a moderately complex piece of software ($\sim$10,000 source lines of code), so it is
+important to keep the package internally organized so that users are able to use the pieces they
+need without feeling overwhelmed by the full complexity.  %
+For organizational purposes, WrightTools is designed in a nested, hierarchical manner through heavy
+use of object-oriented programming (see section ...).  %
+In this introductory section, I wish to describe the overall structure of WrightTools, without
+going into too much detail.  %
+In this way the reader can have some context in the sections below, where I describe some crucial
+pieces of WrightTools in greater detail.  %
+
 WrightTools is written in Python, and endeavors to have a ``pythonic'', explicit and ``natural''
 application programming interface (API).  %
 To use WrightTools, simply import:
@@ -81,75 +91,135 @@ To use WrightTools, simply import:
 I'll discuss more about how exactly WrightTools packaging, distribution, and installation work in
 \autoref{pro:sec:distribution}.
 
-% TODO: consider making the following into a table
-
-We can use the builtin Python function \python{dir} to interrogate the contents of the
-WrightTools package.  %
-\begin{codefragment}{python}
->>> dir(wt)
-['Collection',
- 'Data',
- '__branch__',
- '__builtins__',
- '__cached__',
- '__doc__',
- '__file__',
- '__loader__',
- '__name__',
- '__package__',
- '__path__',
- '__spec__',
- '__version__',
- '__wt5_version__',
- '_dataset',
- '_group',
- '_open',
- '_sys',
- 'artists',
- 'collection',
- 'data',
- 'diagrams',
- 'exceptions',
- 'kit',
- 'open',
- 'units']
-\end{codefragment}  % TODO: consider adding fit to this list
-Many of these are dunder (double underscore) attributes---Python internals that are not normally
-used directly.  %
-The ten attributes that do not start with underscore are the public API that users of WrightTools
-typically use.  %
-Within the public API are two classes, \python{Collection} \&
-\python{Data}, which are the two main classes in the WrightTools object model.  %
-\python{Data} stores spectra directly as multidimensional arrays, and
-\python{Collection} stores \textit{groups} of data objects (and other collection
-objects) in a hierarchical way for internal organization purposes.  %
-
-WrightTools uses a programming strategy called object oriented programming (OOP).  %
-% TODO: introduce HDF5
-% TODO: elaborate on the concept of OOP and how it relates to WrightTools
-
-It contains a central data ``container'' that is capable of storing all of the information about
-each multidimensional (or one-dimensional) spectra: the \python{Data} class.  %
-It also defines a \python{Collection} class that contains data objects, collection
-objects, and other pieces of metadata in a hierarchical structure.  %
-Let's first discuss \mintinline{python}{Data}.
-
-All spectra are stored within WrightTools as multidimensional arrays.  %
+\autoref{pro:tab:wt} contains a description of each top-level component within the WrightTools
+package.  %
+Within an interactive Python session, we could see these components using the built-in \python{dir}
+function: \python{dir(wt)}.  %
+There are several types of component: functions, attributes, classes, modules, and subpackages.  %
+Functions are simple objects that take some input(s), do something, and return something.  %
+For example, the function \python{wt.open} accepts a path to a WrightTools dataset file and
+returns an opened version of that file.  %
+Attributes are not interactive; they are simply pieces of attached information that can be
+accessed.  %
+One example is \python{wt.__version__}, as used in the code fragment above.  %
+Classes are instructions for construction of particular custom object types, and can be
+instantiated (see section ...).  %
+We'll talk extensively about the five main WrightTools classes: \python{Axis},
+\python{Collection}, \python{Channel}, \python{Data}, and \python{Variable}, in the coming
+pages.  %
+Modules are literally \bash{.py} files within WrightTools, and they themselves contain attributes,
+functions, and classes.  %
+Finally, subpackages are literally folders that contain several \bash{.py} files: several
+modules.  %
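The distinction between these component types can be checked programmatically. The following is a minimal sketch, not part of WrightTools: the helper name classify_components is hypothetical, and the standard-library json package stands in for wt so that the example runs without WrightTools installed.

```python
import inspect
import json  # stand-in package; the same idea would apply to an imported wt


def classify_components(package):
    """Map each public name in `package` to a rough component type."""
    kinds = {}
    for name in dir(package):
        if name.startswith("_"):
            continue  # skip private and dunder names
        member = getattr(package, name)
        if inspect.isclass(member):
            kind = "class"
        elif inspect.ismodule(member):
            # packages (folders of modules) carry a __path__ attribute
            kind = "subpackage" if hasattr(member, "__path__") else "module"
        elif callable(member):
            kind = "function"
        else:
            kind = "attribute"
        kinds[name] = kind
    return kinds


components = classify_components(json)
for name, kind in sorted(components.items()):
    print(f"{name:20s} {kind}")
```

Filtering on the leading underscore mirrors the convention that names without an underscore form the public interface of a package.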
+
+WrightTools is designed around a universal ``wt5'' file format.  %
+wt5 files are simply extensions of the HDF5 format, with some additional requirements applied to
+their internal structure.  %
+This puts wt5 files in the same category as other domain-specific hierarchical data formats (see
+section ...).  %
+One of the most important features of the HDF5 paradigm is the ability to access only a portion of
+a multidimensional array at a time.  %
+WrightTools takes full advantage of this, such that the WrightTools package is simply an
+\emph{interface} to the data contained within the wt5 file, and arrays are not loaded into memory
+until needed.  %
+
+All spectra are stored within wt5 files as multidimensional arrays.  %
 Arrays are containers that store many instances of the same data type, typically numerical
 datatypes.  %
 These arrays have some \python{shape}, \python{size}, and
 \python{dtype}.  %
 In the context of WrightTools, they can contain floats, integers, complex numbers and NaNs.  %
 
-The \python{Data} class contains everything that is needed to define a single spectra
-from a single experiment (or simulation).  %
+There are two classes which are top-level components of the WrightTools package:
+\python{Collection} and \python{Data}.  %
+
+\python{Data} is arguably the most important class, as it provides the crucial function of
+interfacing to the stored multidimensional arrays that constitute the CMDS datasets.  %
+\python{Data} can be instantiated directly, but typically is instantiated by helper functions
+within the \python{data} subpackage, or by the \python{open} function.  %
+See section ... for more information.  %
+
+\python{Collection} is a container class, charged with storing groups of data objects and other
+collection objects---empowering users to organize their datasets into clearly structured and well
+labeled hierarchies within the wt5 file.  %
+See section ... for more information about \python{Collection}.  %
+
+[PARAGRAPH ABOUT ARTISTS]
+
+[PARAGRAPH ABOUT FIT]
+
+[PARAGRAPH ABOUT DATASETS]
+
+[PARAGRAPH ABOUT DIAGRAMS]
+
+[PARAGRAPH ABOUT EXCEPTIONS]
+
+[PARAGRAPH ABOUT KIT]
+
+\begin{table}
+  \begin{tabular}{c | c | l}
+    & type & description \\ \hline
+    \texttt{Collection} & class & DESCRIPTION TODO \\ \hline
+    \texttt{Data} & class & DESCRIPTION TODO \\ \hline
+    \texttt{artists} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{collection} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{data} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{datasets} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{diagrams} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{exceptions} & module & DESCRIPTION TODO \\ \hline
+    \texttt{fit} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{kit} & subpackage & DESCRIPTION TODO \\ \hline
+    \texttt{open} & function & DESCRIPTION TODO \\ \hline
+    \texttt{units} & module & DESCRIPTION TODO \\ \hline
+  \end{tabular}
+  \caption[Components of WrightTools]{
+    Key components of WrightTools, lexicographically listed.
+  }
+  \label{pro:tab:wt}
+\end{table}
+
+I now focus on the \python{Data} class.  %
+\autoref{pro:tab:data} contains a description of each key component of \python{Data}.
+
+\python{Data} can be thought of as a container class that contains everything needed to define a
+single multidimensional spectrum.  %
 To do this, each data object contains several multidimensional arrays (typically 2 to 50 arrays,
 depending on the kind of data).  %
 There are two kinds of arrays, instances of \python{Variable} and \python{Channel}.  %
 Variables are coordinate arrays that define the position of each pixel in the multidimensional
 spectrum, and channels are each a particular kind of signal within that spectrum.  %
-Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels
-\python{[pmt, pyro1, pyro2, pyro3]}.  %
+Typical variables might be \python{[labtime, w1, w2, w3, d1, d2]}, and typical channels
+\python{[pmt, pyro1, pyro2, pyro3]}.  %
+The data object contains attributes \python{Data.variables} and \python{Data.channels}, which are
+tuples of the \python{Variable} and \python{Channel} instances contained within that instance of
+\python{Data}.  %
+The data object also has convenience attributes \python{variable_names} and \python{channel_names};
+creation methods \python{create_channel} and \python{create_variable}; and basic manipulation
+methods \python{remove_channel}, \python{remove_variable}, and \python{rename_channels}.  %
+More information about channels and variables will come on the next pages.  %
+
+Variables contain all of the information about where every piece of hardware was at each coordinate
+in the multidimensional dataset, but most of the time users only want to work with data as
+parameterized by a few key variables.  %
+Crucially, the exact choice of parameterization may be context dependent, or multiple
+parameterizations may be desirable. [CITE NEFF-MALLON]  %
+Axes, instances of the WrightTools \python{Axis} class, are easy-to-use parameterized interfaces to
+the variable arrays.  %
+Axes do not contain any \emph{new} information; they simply contain expressions which describe how
+the variable arrays are accessed when manipulating or displaying the data.  %
+The \python{transform} method allows users to change these expressions.  %
+Convenience attribute \python{axis_expressions} allows for quick inspection.  %
+See section ... for more information.  %
+
+Besides merely allowing users to access variables and channels, the \python{Data} class allows for
+manipulation and processing.  %
+Many simple data processing tools are methods of \python{Channel} and \python{Variable}, and are
+discussed further later.  %
+The data manipulation methods that \python{Data} contains are more holistic---they are
+manipulations that involve multiple variable and channel arrays.  %
+\python{heal} attempts to ``fill'' holes via multidimensional interpolation.  %
+\python{chop}, \python{collapse}, \python{split}, \python{map_axis}, and \python{zoom} change the
+shape of the data object, by slicing, interpolation, or both.  %
 
 \begin{table}
   \begin{tabular}{c | c | l}
@@ -167,7 +237,7 @@ Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels
     \texttt{map\_variable} & method & Map points of a variable to new points using linear interpolation. \\ \hline
     \texttt{natural\_name} & attribute & \\ \hline
     \texttt{ndim} & attribute & \\ \hline
-    \texttt{offset} & method & Offset one variable based on another variables' values. \\ \hline
+    \texttt{offset} & method & Offset one variable based on another variable's values. \\ \hline
     \texttt{print\_tree} & method & \\ \hline
     \texttt{remove\_channel} & method & \\ \hline
     \texttt{remove\_variable} & method & \\ \hline
@@ -187,15 +257,43 @@ Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels
   \caption[Attributes and methods of Data.]{
     Key attributes and methods of data, lexicographically listed.
   }
+  \label{pro:tab:data}
 \end{table}
 
-Each data object contains instances of \python{Channel} and \python{Variable} which represent the
-principle multidimensional arrays.  %
-The following lexicographically lists the attributes of these instances.  %
-Certain methods and attributes are unique to only one type of dataset, and are marked as such.  %
-
-Channels and variables also support direct indexing / slicing using \python{__getitem__}, as
-discussed more in...  % TODO: where is it discussed more?
+I now focus on the \python{Channel} and \python{Variable} classes.  %
+These are the principal multidimensional array containers, and each instance of these classes
+corresponds to exactly one multidimensional array.  %
+These two classes share a large amount of functionality, and they both inherit from the parent
+WrightTools \python{Dataset} class, which itself is a child of \python{h5py.Dataset}.  %
+See section ... to understand the concept of inheritance.  %
+
+% TODO: consider demonstrating slicing
+
+\autoref{pro:tab:dataset} contains a description of each key component of the \python{Channel} and
+\python{Variable} classes.  %
+For each component the column ``of'' indicates if it is a shared feature (inherited from
+\python{Dataset}), or unique to one or the other class.  %
+Many of these are attributes which describe the contents or behavior of these arrays.  %
+\python{argmax}, \python{argmin}, \python{max}, and \python{min} are methods that make it easy to
+inspect the most basic features of the array.  %
+The concept of \python{null} as different from zero is unique to channels, and the components
+\python{signed}, \python{mag}, \python{major_extent}, and \python{minor_extent} come in association
+with the null idea.  %
+
+These classes also have basic mathematical manipulation methods, such as \python{log},
+\python{normalize}, and \python{symmetric_sqrt}.  %
+Other operations are supported by in-place operations, as described in section ....  %
+
+Channels and variables inherit from \python{h5py.Dataset}, so they support partial access through slicing
+(\python{__getitem__} syntax).  %
+This means that, in principle, very large datasets can be processed piece-by-piece without loading
+the entire array into memory simultaneously.  %
+This is trivial for ``blind'' operations like taking a logarithm or normalizing, and becomes more
+complex for operations like smoothing and interpolation.  %
+WrightTools offers several methods that try to make it easier to process arrays piecewise.  %
+\python{slices} returns a generator which yields tuples of slice objects for each chunk of the
+array.  %
+\python{chunkwise} accepts a function and executes it on each chunk returned by \python{slices}.  %
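As a rough illustration of the piecewise idea described above, the following pure-Python sketch generates tuples of slice objects that tile an array of a given shape, then applies a function to each tile in turn. The names iter_chunk_slices and apply_chunkwise, the nested-list stand-in for an array, and the chunk sizes are all hypothetical; this is not the WrightTools implementation of \python{slices} or \python{chunkwise}.

```python
import itertools
import math


def iter_chunk_slices(shape, chunks):
    """Yield tuples of slice objects that together tile an array of `shape`.

    `chunks` gives the (assumed) maximum chunk length along each dimension.
    """
    per_dimension = []
    for length, step in zip(shape, chunks):
        per_dimension.append(
            [slice(start, min(start + step, length)) for start in range(0, length, step)]
        )
    yield from itertools.product(*per_dimension)


def apply_chunkwise(rows, func, chunks):
    """Apply `func` to every element of a 2D nested list, one chunk at a time."""
    shape = (len(rows), len(rows[0]))
    for row_slice, col_slice in iter_chunk_slices(shape, chunks):
        # only the elements inside the current chunk are touched here
        for i in range(row_slice.start, row_slice.stop):
            for j in range(col_slice.start, col_slice.stop):
                rows[i][j] = func(rows[i][j])
    return rows


# a 4 x 5 "array" processed in chunks of at most 2 x 3 elements
array = [[float(10 ** (i + j)) for j in range(5)] for i in range(4)]
tiles = list(iter_chunk_slices((4, 5), (2, 3)))
logged = apply_chunkwise(array, math.log10, (2, 3))
```

A "blind" operation such as the logarithm needs no information from neighboring chunks, which is why this simple loop suffices; smoothing or interpolation would additionally need the elements bordering each chunk.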
 
 \begin{table}
   \begin{tabular}{c | c | c | l}
@@ -229,13 +327,17 @@ discussed more in...  % TODO: where is it discussed more?
   \caption[Attributes and methods of Channel and Variable.]{
     Key attributes and methods of channel and variable, lexicographically listed
   }
+  \label{pro:tab:dataset}
 \end{table}
 
+I now focus on the \python{Axis} class.  %
+\autoref{pro:tab:axis} contains a description of each key component of the \python{Axis} class.  %
+
 Axes are ways to organize data as a function of particular variables (and combinations thereof).  %
 The \python{Axis} class does not directly contain the respective arrays---it merely refers to the
 associated variables.  %
 The flexibility of this association is one of the main new features in WrightTools 3.  %
-It enables data transformation, discussed in section ...  % TODO: link to section
+
 Axis expressions are simple human-friendly strings made up of numbers and variable
 \python{natural_name}s.  %
 Given 5 variables with names \python{['w1', 'w2', 'wm', 'd1', 'd2']}, example valid expressions
@@ -243,7 +345,13 @@ include \python{'w1'}, \python{'w1=wm'}, \python{'w1+w2'}, \python{'2*w1'}, \pyt
 \python{'wm-w1+w2'}.  %
 Axes can be directly indexed / sliced into using \python{__getitem__}, and they support many of the
 ``numpy-like'' attributes.  %
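To make the idea of an expression over variables concrete, here is a minimal sketch that evaluates a purely arithmetic expression at every point of some one-dimensional variable arrays. The function evaluate_expression and the sample values are hypothetical, and the sketch makes no claim about how WrightTools itself parses expressions; in particular it does not handle the \python{'w1=wm'} form.

```python
import ast
import operator

# arithmetic operators allowed in this sketch
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_expression(expression, variables):
    """Evaluate an arithmetic expression of variable names at every point."""
    tree = ast.parse(expression, mode="eval")
    n_points = len(next(iter(variables.values())))

    def visit(node, index):
        if isinstance(node, ast.Expression):
            return visit(node.body, index)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            left = visit(node.left, index)
            right = visit(node.right, index)
            return _BINARY[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand, index)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id][index]
        raise ValueError(f"unsupported element in expression: {ast.dump(node)}")

    return [visit(tree, i) for i in range(n_points)]


# hypothetical variable arrays, keyed by natural name
variables = {
    "w1": [1000.0, 1100.0, 1200.0],
    "w2": [1500.0, 1500.0, 1500.0],
    "wm": [2500.0, 2600.0, 2700.0],
}
doubled = evaluate_expression("2*w1", variables)
difference = evaluate_expression("wm-w1+w2", variables)
```

The expression string carries no array data of its own, which matches the point made above that an axis merely refers to its associated variables.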
-A lexicographical list of axis attributes and methods follows.
+
+Axes need not be one-dimensional.
+In fact, axes must have the same dimensionality as their parent \python{Data}.  %
+The loosening of the one-dimensional axis requirement is what makes WrightTools data not fully
+structured, but ``semi-structured''.
+
+Section ... describes the \python{Axis} class in greater detail.  %
 
 \begin{table}
   \begin{tabular}{c | c | l}
@@ -264,6 +372,7 @@ A lexicographical list of axis attributes and methods follows.
   \caption[Attributes and methods of Axis.]{
     Key attributes and methods of axis, lexicographically listed
   }
+  \label{pro:tab:axis}
 \end{table}
 
 \section{Creating a data object}  % ===============================================================
@@ -283,17 +392,9 @@ This modular approach to data creation means that individuals who want to use Wr
 data sources can simply add one function to unlock the capabilities of the entire package as
 applied to their data.  %
 
-Following are the current from-functions, and the types of data that they support.
-\begin{ditemize}
-  \item Cary (collection creation)
-  \item COLORS
-  \item KENT
-  \item PyCMDS
-  \item Ocean Optics
-  \item Shimadzu
-  \item Tensor27
-\end{ditemize}  % TODO: complete list, update wright.tools to be consistent
-  
+\autoref{pro:tab:from_functions} lists the from-functions currently supported in
+WrightTools...  %
+
 \subsubsection{Discover dimensions}
 
 Certain older Wright Group file types (COLORS and KENT) are particularly difficult to import using
@@ -308,8 +409,8 @@ The way that WrightTools handles data creation for these file-types deserves spe
 Firstly, WrightTools contains hardcoded column information for each filetype.
 Data from Kent Meyer's ``picosecond control'' software had consistent columns over the lifetime of
 the software, so only one dictionary is needed to store these correspondences.  %
-Skye Kain's ``COLORS'' software used at least 7 different formats, and unfortunately these format
-types were not fully documented.  %
+Schuyler Kain's ``COLORS'' software [CITE] used at least 7 different formats, and unfortunately
+these format types were not fully documented.  %
 WrightTools attempts to guess the COLORS data format by counting the number of columns.  %
 
 Because these file-types are dimensionality limited, there are many acquisitions that span over
@@ -351,6 +452,23 @@ library Qhull.  %
 This strategy can be copied in the future if other non-self-describing data sources are added into
 WrightTools.  %
 
+\begin{table}
+  \begin{tabular}{c | l}
+    function & data source \\ \hline
+    \texttt{wt.collection.from\_CARY} & TODO \\ \hline
+    \texttt{wt.data.from\_COLORS} & TODO \\ \hline
+    \texttt{wt.data.from\_KENT} & TODO \\ \hline
+    \texttt{wt.data.from\_PyCMDS} & TODO \\ \hline
+    \texttt{wt.data.from\_OceanOptics} & TODO \\ \hline
+    \texttt{wt.data.from\_shimadzu} & TODO \\ \hline
+    \texttt{wt.data.from\_Tensor27} & TODO \\ \hline
+  \end{tabular}
+  \caption[CAPTION TODO]{
+    CAPTION TODO
+  }
+  \label{pro:tab:from_functions}
+\end{table}
+
 \section{Collections}  % ==========================================================================
 
 The WrightTools \python{Collection} class is a container class meant to organize the contents of
@@ -418,6 +536,7 @@ Users can configure which files are routed to which from-function.  %
 
 % TODO (also document on wright.tools)
 
+\clearpage
 \section{Visualizing a data object}  % ============================================================
 
 After importing and manipulating data, one typically wants to create a plot.  %
@@ -468,10 +587,6 @@ and viridis, the new matplotlib default  % TODO: cite
 WrightTools uses the algorithm from Green to define a custom cubehelix colormap with good
 perceptual properties and familiar Wright Group coloration.  %
 
-% TODO: figure like one on wall
-
-% TODO: mention isoluminant
-
 \subsubsection{Interpolation type}
 
 WrightTools data is defined at discrete points, but an entire 2D surface must be defined in order
@@ -553,6 +668,7 @@ introduces the unique strategy that the WrightTools wrapper takes.  %
 
 % TODO: mention gotcha of apparently narrowing linewidths with wigners (how to READ colormaps)
 
+\clearpage
 \section{Variables and channels}  % ===============================================================
 
 Data objects are made up of many component channels and variables, each array having the same
@@ -597,6 +713,7 @@ From a quick inspection, one can see that \python{w1} and \python{wm} were scann
 \python{w3}, \python{d0}, and \python{d1} were not moved at all, yet their coordinates are still
 propagated.  %
 
+\clearpage
 \section{Axes}  % =================================================================================
 
 The axes have the joint shape of their component variables.  %
@@ -758,6 +875,9 @@ can confidently smooth collected data in post to achieve clean results.  %
 This strategy is similar to that accomplished in time domain CMDS where a low-pass filter is
 applied on the very high resolution raw data.  %
 
+% TODO: figure: example of smoothed data
+
+\clearpage
 \section{Dimensionality manipulation}  % ==========================================================
 
 WrightTools offers several strategies for reducing the dimensionality of a data object.  %
@@ -777,7 +897,7 @@ data created at /tmp/lzyjg4au.wt5::/
   axes ('wm', 'w2', 'w1')
   shape (35, 11, 11)
 >>> chopped = data.chop('wm', 'w1')  
-chopped data into 11 piece(s) in ('wm', 'w1')
+chopped data into 11 piece(s) in ('wm', 'w1')
 >>> chopped.chop000
 <WrightTools.Data 'chop000' ('wm', 'w1') at /tmp/935c2v5a.wt5::/chop000>
 \end{codefragment}
@@ -889,6 +1009,7 @@ Can be used directly...
 Loops through...
 Returns model and outs...
 
+\clearpage
 \section{Construction, maintenance, and distribution}  % ==========================================
 
 While WrightTools has already been useful to the work done in the WrightGroup over the last 3
@@ -981,6 +1102,7 @@ Git...
 
 Unit testing...
 
+\clearpage
 \section{Future directions}  % ====================================================================
 
 Single variable decomposition.  %