From ac6bde61f90e6684b5b5b79286ebe58d08c09f9c Mon Sep 17 00:00:00 2001 From: Blaise Thompson Date: Mon, 19 Mar 2018 17:11:06 -0500 Subject: 2018-03-19 17:11 --- processing/chapter.tex | 223 +++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 205 insertions(+), 18 deletions(-) diff --git a/processing/chapter.tex b/processing/chapter.tex index 9c9ccab..0e0e5cb 100644 --- a/processing/chapter.tex +++ b/processing/chapter.tex @@ -17,7 +17,6 @@ WrightTools is a software package at the heart of all work in the Wright Group. % TODO: more intro - WrightTools is written in Python, and endeavors to have a ``pythonic'', explicit and ``natural'' application programming interface (API). % To use WrightTools, simply import: @@ -29,7 +28,7 @@ To use WrightTools, simply import: I'll discuss more about how exactly WrightTools packaging, distribution, and instillation works in \autoref{sec:processing_distbribution}. -We can use the builtin Python function \mintinline{python}{dir} to interrogate the contents of the +We can use the builtin Python function \python{dir} to interrogate the contents of the WrightTools package. % \begin{codefragment}{python} >>> dir(wt) @@ -59,53 +58,240 @@ WrightTools package. % 'kit', 'open', 'units'] -\end{codefragment} +\end{codefragment} % TODO: consider adding fit to this list Many of these are dunder (double underscore) attributes---Python internals that are not normally used directly. % The ten attributes that do not start with underscore are the public API that users of WrightTools typically use. % -Within the public API are two classes, \mintinline{python}{Collection} \& -\mintinline{python}{Data}, which are the two main classes in the WrightTools object model.
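The public API described above is the subset of names returned by \python{dir} that do not begin with an underscore. The following is a minimal sketch of that filtering. It uses a stand-in module built with the standard library rather than WrightTools itself, and the helper \python{public_names} and the stand-in attribute names are hypothetical.

```python
import types


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore (hypothetical helper)."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in module; the attribute names only mimic a few WrightTools names for illustration.
demo = types.ModuleType("demo")
demo.Data = type("Data", (), {})
demo.Collection = type("Collection", (), {})
demo.units = None
demo._private_helper = None

print(public_names(demo))  # prints ['Collection', 'Data', 'units']
```

The dunder names that \python{types.ModuleType} adds automatically, such as \python{__name__} and \python{__doc__}, are dropped by the same underscore test.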
% -\mintinline{python}{Data} stores spectra directly as multidimensional arrays, and -\mintinline{python}{Collection} stores \textit{groups} of data objects (and other collection +Within the public API are two classes, \python{Collection} \& +\python{Data}, which are the two main classes in the WrightTools object model. % +\python{Data} stores spectra directly as multidimensional arrays, and +\python{Collection} stores \textit{groups} of data objects (and other collection objects) in a hierarchical way for internal organization purposes. % \section{Data object model} % ==================================================================== WrightTools uses a programming strategy called object oriented programming (OOP). % +% TODO: introduce HDF5 +% TODO: elaborate on the concept of OOP and how it relates to WrightTools It contains a central data ``container'' that is capable of storing all of the information about -each multidimensional (or one-dimensional) spectra: the \mintinline{python}{Data} class. % -It also defines a \mintinline{python}{Collection} class that contains data objects, collection +each multidimensional (or one-dimensional) spectrum: the \python{Data} class. % +It also defines a \python{Collection} class that contains data objects, collection objects, and other pieces of metadata in a hierarchical structure. % Let's first discuss \mitinline{python}{Data}. All spectra are stored within WrightTools as multidimensional arrays. % Arrays are containers that store many instances of the same data type, typically numerical datatypes. % -These arrays have some \mintinline{python}{shape}, \mintinline{python}{size}, and -\mintinline{python}{size}. % +These arrays have some \python{shape}, \python{size}, and +\python{dtype}. % In the context of WrightTools, they can contain floats, integers, complex numbers and NaNs.
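The three array properties named above can be seen directly on a plain NumPy array. This sketch uses NumPy alone rather than any WrightTools object, and the array contents are arbitrary.

```python
import numpy as np

# A hypothetical 2D spectrum: 3 points along one coordinate, 4 along another.
signal = np.zeros((3, 4), dtype=float)
signal[0, 0] = np.nan  # NaN can stand in for a pixel with no valid value

print(signal.shape)  # (3, 4)
print(signal.size)   # 12, the total number of pixels
print(signal.dtype)  # float64

# Complex-valued arrays are equally valid containers.
field = np.ones((3, 4), dtype=complex)
print(field.dtype)   # complex128
```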
% -The \mintinline{python}{Data} class contains everything that is needed to define a single spectra +The \python{Data} class contains everything that is needed to define a single spectrum from a single experiment (or simulation). % To do this, each data object contains several multidimensional arrays (typically 2 to 50 arrays, depending on the kind of data). % -There are two kinds of arrays, instances of \mintinline{python}{Variable} and -\mintinline{python}{Channel}. % +There are two kinds of arrays, instances of \python{Variable} and \python{Channel}. % Variables are coordinate arrays that define the position of each pixel in the multidimensional spectrum, and channels are each a particular kind of signal within that spectrum. % -Typical variables might be \mintinline{python}{[w1, w2, w3, d1, d2]}, and typical channels -\mintinline{python}{[pmt, pyro1, pyro2, pyro3]}. % +Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels +\python{[pmt, pyro1, pyro2, pyro3]}. % + +As an overview, the following lists the attributes and methods of \python{Data} in +lexicographic order. % +\begin{ditemize} + \item method \python{collapse}: Collapse along one dimension in a well-defined way. + \item method \python{convert}: Convert all axes of a certain kind. + \item method \python{create_channel}: Create a new channel. + \item method \python{create_variable}: Create a new variable.
+ \item attribute \python{fullpath} + \item method \python{get_nadir} + \item method \python{get_zenith} + \item method \python{heal} + \item attribute \python{kind} + \item method \python{level} + \item method \python{map_variable} + \item attribute \python{natural_name} + \item attribute \python{ndim} + \item method \python{offset} + \item method \python{print_tree} + \item method \python{remove_channel} + \item method \python{remove_variable} + \item method \python{rename_channels} + \item method \python{rename_variables} + \item attribute \python{shape} + \item method \python{share_nans} + \item attribute \python{size} + \item method \python{smooth} + \item attribute \python{source} + \item method \python{split} + \item method \python{transform} + \item attribute \python{units} + \item attribute \python{variable_names} + \item attribute \python{variables} + \item method \python{zoom} +\end{ditemize} + +Each data object contains instances of \python{Channel} and \python{Variable}, which represent the +principal multidimensional arrays. % +The following lists the attributes and methods of these instances in lexicographic order. % +Attributes and methods that are unique to one type of dataset are marked as such.
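The split between coordinate arrays (variables) and signal arrays (channels) can be illustrated with a toy container. This is a hypothetical sketch, not the WrightTools \python{Data} class. The class \python{ToyData} and its methods \python{add_variable} and \python{add_channel} are invented for illustration. The rule that every array must exactly match the container's shape is a simplifying assumption of this sketch.

```python
import numpy as np


class ToyData:
    """A hypothetical, greatly simplified stand-in for a data object.

    Variables hold coordinates and channels hold signals. In this sketch both
    are plain NumPy arrays that must exactly match the container's shape.
    """

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.variables = {}
        self.channels = {}

    def _check(self, name, values):
        # Reject arrays whose shape differs from the container's shape.
        values = np.asarray(values)
        if values.shape != self.shape:
            raise ValueError(f"{name!r} has shape {values.shape}, expected {self.shape}")
        return values

    def add_variable(self, name, values):
        self.variables[name] = self._check(name, values)

    def add_channel(self, name, values):
        self.channels[name] = self._check(name, values)


# Two coordinate arrays spanning a 3 x 2 grid (arbitrary values), and one signal channel.
w1, d1 = np.meshgrid(np.linspace(1.5, 1.7, 3), np.array([-100.0, 0.0]), indexing="ij")
data = ToyData(shape=(3, 2))
data.add_variable("w1", w1)
data.add_variable("d1", d1)
data.add_channel("pmt", np.random.default_rng(0).random((3, 2)))
print(sorted(data.variables), sorted(data.channels))  # ['d1', 'w1'] ['pmt']
```

Keeping variables and channels in separate dictionaries mirrors the conceptual distinction in the text. A real implementation would also need units, names, and on-disk storage.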
% +\begin{ditemize} + \item method \python{argmax} + \item method \python{argmin} + \item method \python{chunkwise} + \item method \python{clip} + \item method \python{convert} + \item attribute \python{full} + \item attribute \python{fullpath} + \item attribute \python{label} (variable only) + \item method \python{log} + \item method \python{log10} + \item method \python{log2} + \item method \python{mag} + \item attribute \python{major_extent} (channel only) + \item method \python{max} + \item method \python{min} + \item attribute \python{minor_extent} (channel only) + \item attribute \python{natural_name} + \item method \python{normalize} (channel only) + \item attribute \python{null} (channel only) + \item attribute \python{parent} + \item attribute \python{points} + \item attribute \python{signed} (channel only) + \item method \python{slices} + \item method \python{symmetric_root} + \item method \python{trim} (channel only) +\end{ditemize} +Channels and variables also support direct indexing / slicing using \python{__getitem__}, as +discussed more in... % TODO: where is it discussed more? + +Axes are ways to organize data as functions of particular variables (and combinations thereof). % +The \python{Axis} class does not directly contain the respective arrays---it refers to the +associated variables. % +The flexibility of this association is one of the main new features in WrightTools 3. % +Axis expressions are simple human-friendly strings made up of numbers and variable +\python{natural_name}s. % +Given 5 variables with names \python{['w1', 'w2', 'wm', 'd1', 'd2']}, example valid expressions +include \python{'w1'}, \python{'w1=wm'}, \python{'w1+w2'}, \python{'2*w1'}, \python{'d1-d2'}, and +\python{'wm-w1+w2'}. % +Axes can also be indexed / sliced directly using \python{__getitem__}, and they support many of the +``numpy-like'' attributes. % +A lexicographical list of axis attributes and methods follows.
+\begin{ditemize} + \item method \python{convert} + \item attribute \python{full} + \item attribute \python{label} + \item method \python{max} + \item method \python{min} + \item attribute \python{natural_name} + \item attribute \python{ndim} + \item attribute \python{points} + \item attribute \python{shape} + \item attribute \python{size} + \item attribute \python{units_kind} + \item attribute \python{variables} +\end{ditemize} + +\subsection{Creating a data object} % ------------------------------------------------------------ + +WrightTools data objects are capable of storing arbitrary multidimensional spectra, but how can we +actually get data into WrightTools? % +If you start with a wt5 file, the answer is easy: \python{wt.open()}. % +But what if you have data that was written using some other software? % +WrightTools offers data conversion functions (``from'' functions) that do the hard work of creating +data objects from other files. % +These from-functions are as parameter-free as possible, which means they recognize details like +shape and units from each specific file format without manual user intervention. % + +The most important thing about from-functions is that they are extensible: more from-functions can +easily be added as needed. % +This modular approach to data creation means that individuals who want to use WrightTools for new +data sources can simply add one function to unlock the capabilities of the entire package as +applied to their data. % + +Following are the current from-functions, and the types of data that they support.
+\begin{ditemize} + \item Cary (collection creation) + \item COLORS + \item KENT + \item PyCMDS + \item Ocean Optics + \item Shimadzu + \item Tensor27 +\end{ditemize} % TODO: complete list, update wright.tools to be consistent + +\subsubsection{Discover dimensions} + +Certain older Wright Group file types (COLORS and KENT) are particularly difficult to import using +a parameter-free from-function. % +There are two problems: +\begin{ditemize} + \item Each individual file is limited in dimensionality (1D for KENT, 2D for COLORS). + \item Files lack self-describing metadata. +\end{ditemize} +The way that WrightTools handles data creation for these file types deserves special discussion. % + +Firstly, WrightTools contains hardcoded column information for each file type... +For COLORS... % TODO + +Secondly, WrightTools accepts a list of files that it stacks together to form a single large +array. % + +Finally, the \python{wt.kit.discover_dimensions} function is called. % +This function does its best to recognize the parameters of the original scan... % TODO + +\subsubsection{From directory} + +% TODO (also document on wright.tools) + +\subsection{Math} % ------------------------------------------------------------------------------ + +Now that we know the basics of how the WrightTools \python{Data} class stores data, it's time to do +some data manipulation. % +Let's start with some elementary algebra. % + +\subsubsection{In-place operators} + +Operators are... % TODO +Because the \python{Data} object is mostly stored outside of memory, it is better to do +in-place... % TODO + +Broadcasting... % TODO + +\subsubsection{Clip} + +% TODO + +\subsubsection{Symmetric root} + +% TODO + +\subsubsection{Log} + +% TODO \subsection{Dimensionality manipulation} % ------------------------------------------------------- +WrightTools offers several strategies for reducing the dimensionality of a data object. % +Also consider using the fit subpackage.
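The in-place and broadcasting ideas in the Math subsection can be illustrated with NumPy alone. This sketch works on an in-memory array, not on the on-disk storage that WrightTools data objects use, so it only shows the general semantics. An augmented assignment such as \python{-=} modifies the existing array object instead of creating a new one. A smaller array is broadcast against the larger one according to NumPy's rules.

```python
import numpy as np

# Hypothetical 2D signal: 4 rows by 3 columns.
signal = np.arange(12, dtype=float).reshape(4, 3)
original = signal  # second reference to the same array object

# One background value per column; shape (3,) broadcasts across all 4 rows.
background = np.array([1.0, 2.0, 3.0])
signal -= background  # in place: no new array is bound to `signal`
assert signal is original

# Shape (4, 1) broadcasts across the columns, scaling each row separately.
row_scale = np.array([[1.0], [2.0], [3.0], [4.0]])
signal *= row_scale
print(signal[0])  # [-1. -1. -1.]
```

For arrays that live on disk, the same in-place pattern avoids materializing a second full-size copy. That is presumably the motivation hinted at in the text, but this sketch does not demonstrate disk-backed storage.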
% TODO: more info, link to section + \subsubsection{Chop} +Chop is one of the most important methods of \python{Data}, although it is typically not called +directly by users of WrightTools. % + \subsubsection{Collapse} \subsubsection{Split} +\subsubsection{Join} + +\subsection{The wt5 file format} % --------------------------------------------------------------- + +Since WrightTools is based on the HDF5 file format... % TODO + \section{Artists} % ============================================================================== After importing and manipulating data, one typically wants to create a plot. % @@ -134,9 +320,10 @@ In the future, other libraries (e.g. mayavi), may be incorporated. % \section{Fitting} % ============================================================================== - - \section{Distribution and licensing} \label{sec:processing_disbribution} % ======================= +WrightTools is MIT licensed. % + +WrightTools is distributed on PyPI and conda-forge. \section{Future directions} % ==================================================================== \ No newline at end of file -- cgit v1.2.3
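The Collapse and Split subsections name the operations without describing them. The following NumPy sketch illustrates the general ideas only. It does not reproduce the actual \python{Data.collapse} or \python{Data.split} signatures. The helpers \python{collapse_sketch} and \python{split_sketch} are hypothetical. Choosing the NaN-ignoring reductions, and putting points at exactly the split position into the second piece, are assumptions of this sketch.

```python
import numpy as np


def collapse_sketch(values, axis, method="sum"):
    """Reduce `values` along `axis` with a named NaN-ignoring reduction (hypothetical helper)."""
    reducers = {"sum": np.nansum, "mean": np.nanmean, "max": np.nanmax, "min": np.nanmin}
    return reducers[method](values, axis=axis)


def split_sketch(coords, values, position):
    """Split `values` along axis 0 at `position` in coordinate space (hypothetical helper).

    Points with coords < position go to the first piece; all others go to the second.
    """
    mask = coords < position
    return values[mask], values[~mask]


coords = np.array([0.0, 1.0, 2.0, 3.0])  # coordinate along axis 0
values = np.arange(8, dtype=float).reshape(4, 2)

print(collapse_sketch(values, axis=0))  # [12. 16.]
low, high = split_sketch(coords, values, 2.0)
print(low.shape, high.shape)  # (2, 2) (2, 2)
```

Collapsing removes one dimension entirely, while splitting keeps the dimensionality and divides the data into pieces.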