\chapter{Processing}

% TODO: cool quote, if I can think of one

\clearpage

From a data science perspective, CMDS has several unique challenges:
\begin{ditemize}
  \item Dimensionality of datasets can typically be greater than two, complicating \textbf{representation}.
  \item Shape and dimensionality change...
  \item Data can be large (over one million points).  % TODO: contextualize large (not BIG DATA)
\end{ditemize}
I have designed a software package that directly addresses these issues.  %
WrightTools is a software package at the heart of all work in the Wright Group.  %
% TODO: more intro
WrightTools is written in Python and endeavors to have a ``pythonic'', explicit, and ``natural'' application programming interface (API).  %
To use WrightTools, simply import it:
\begin{codefragment}{python}
>>> import WrightTools as wt
>>> wt.__version__
'3.0.0'
\end{codefragment}
I'll discuss how WrightTools is packaged, distributed, and installed in \autoref{sec:processing_disbribution}.  %
We can use the built-in Python function \python{dir} to interrogate the contents of the WrightTools package.  %
\begin{codefragment}{python}
>>> dir(wt)
['Collection', 'Data', '__branch__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '__wt5_version__', '_dataset', '_group', '_open', '_sys', 'artists', 'collection', 'data', 'diagrams', 'exceptions', 'kit', 'open', 'units']
\end{codefragment}
% TODO: consider adding fit to this list
Many of these are dunder (double underscore) attributes, which are Python internals that are not normally used directly, or single-underscore names, which by Python convention are private.  %
The ten attributes that do not start with an underscore form the public API that users of WrightTools typically interact with.  %
Within the public API are two classes, \python{Collection} and \python{Data}, which are the two main classes in the WrightTools object model.
%
\python{Data} stores spectra directly as multidimensional arrays, and \python{Collection} stores \textit{groups} of data objects (and other collection objects) hierarchically, for organizational purposes.  %

\section{Data object model}  % ====================================================================

WrightTools uses a programming strategy called object-oriented programming (OOP).  %
% TODO: introduce HDF5
% TODO: elaborate on the concept of OOP and how it relates to WrightTools
WrightTools defines a central data ``container'' that is capable of storing all of the information about each multidimensional (or one-dimensional) spectrum: the \python{Data} class.  %
It also defines a \python{Collection} class that contains data objects, collection objects, and other pieces of metadata in a hierarchical structure.  %
Let's first discuss \python{Data}.

All spectra are stored within WrightTools as multidimensional arrays.  %
Arrays are containers that store many instances of the same data type, typically a numerical type.  %
Each array has a \python{shape}, \python{size}, and \python{dtype}.  %
In the context of WrightTools, arrays can contain floats, integers, complex numbers, and NaNs.  %
The \python{Data} class contains everything that is needed to define a single spectrum from a single experiment (or simulation).  %
To do this, each data object contains several multidimensional arrays (typically 2 to 50 arrays, depending on the kind of data).  %
There are two kinds of arrays: instances of \python{Variable} and instances of \python{Channel}.  %
Variables are coordinate arrays that define the position of each pixel in the multidimensional spectrum, and each channel is a particular kind of signal within that spectrum.  %
Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels \python{[pmt, pyro1, pyro2, pyro3]}.  %
As an overview, the following lists the attributes and methods of \python{Data} in lexicographical order.
%
\begin{ditemize}
  \item method \python{chop}: Split into all lower-dimensional components that retain the given axes.
  \item method \python{collapse}: Collapse along one dimension in a well-defined way.
  \item method \python{convert}: Convert all axes of a certain kind.
  \item method \python{create_channel}: Create a new channel.
  \item method \python{create_variable}: Create a new variable.
  \item attribute \python{fullpath}
  \item method \python{get_nadir}
  \item method \python{get_zenith}
  \item method \python{heal}
  \item attribute \python{kind}
  \item method \python{level}
  \item method \python{map_variable}
  \item attribute \python{natural_name}
  \item attribute \python{ndim}
  \item method \python{offset}
  \item method \python{print_tree}
  \item method \python{remove_channel}
  \item method \python{remove_variable}
  \item method \python{rename_channels}
  \item method \python{rename_variables}
  \item attribute \python{shape}
  \item method \python{share_nans}
  \item attribute \python{size}
  \item method \python{smooth}
  \item attribute \python{source}
  \item method \python{split}
  \item method \python{transform}
  \item attribute \python{units}
  \item attribute \python{variable_names}
  \item attribute \python{variables}
  \item method \python{zoom}
\end{ditemize}
Each data object contains instances of \python{Channel} and \python{Variable}, which represent the principal multidimensional arrays.  %
The following lists the attributes and methods of these instances in lexicographical order.  %
Certain methods and attributes apply to only one of the two types, and are marked as such.
%
\begin{ditemize}
  \item method \python{argmax}
  \item method \python{argmin}
  \item method \python{chunkwise}
  \item method \python{clip}
  \item method \python{convert}
  \item attribute \python{full}
  \item attribute \python{fullpath}
  \item attribute \python{label} (variable only)
  \item method \python{log}
  \item method \python{log10}
  \item method \python{log2}
  \item method \python{mag}
  \item attribute \python{major_extent} (channel only)
  \item method \python{max}
  \item method \python{min}
  \item attribute \python{minor_extent} (channel only)
  \item attribute \python{natural_name}
  \item method \python{normalize} (channel only)
  \item attribute \python{null} (channel only)
  \item attribute \python{parent}
  \item attribute \python{points}
  \item attribute \python{signed} (channel only)
  \item method \python{slices}
  \item method \python{symmetric_root}
  \item method \python{trim} (channel only)
\end{ditemize}
Channels and variables also support direct indexing and slicing via \python{__getitem__}, as discussed more in...  % TODO: where is it discussed more?

Axes are ways to organize data as functions of particular variables (and combinations thereof).  %
The \python{Axis} class does not directly contain the respective arrays; instead, it refers to the associated variables.  %
The flexibility of this association is one of the main new features in WrightTools 3.  %
Axis expressions are simple human-friendly strings made up of numbers and variable \python{natural_name}s.  %
Given five variables with names \python{['w1', 'w2', 'wm', 'd1', 'd2']}, example valid expressions include \python{'w1'}, \python{'w1=wm'}, \python{'w1+w2'}, \python{'2*w1'}, \python{'d1-d2'}, and \python{'wm-w1+w2'}.  %
Axes can be directly indexed and sliced via \python{__getitem__}, and they support many of the same ``numpy-like'' attributes.  %
A lexicographical list of axis attributes and methods follows.
\begin{ditemize}
  \item method \python{convert}
  \item attribute \python{full}
  \item attribute \python{label}
  \item method \python{max}
  \item method \python{min}
  \item attribute \python{natural_name}
  \item attribute \python{ndim}
  \item attribute \python{points}
  \item attribute \python{shape}
  \item attribute \python{size}
  \item attribute \python{units_kind}
  \item attribute \python{variables}
\end{ditemize}

\subsection{Creating a data object}  % ------------------------------------------------------------

WrightTools data objects are capable of storing arbitrary multidimensional spectra, but how can we actually get data into WrightTools?  %
If you start with a wt5 file, the answer is easy: \python{wt.open()}.  %
But what if you have data that was written using some other software?  %
WrightTools offers data conversion functions (``from'' functions) that do the hard work of creating data objects from other files.  %
These from-functions are as parameter-free as possible, which means they recognize details like shape and units from each specific file format without manual user intervention.  %
The most important property of from-functions is that they are extensible: new from-functions can easily be added as needed.  %
This modular approach to data creation means that individuals who want to use WrightTools for new data sources can simply add one function to unlock the capabilities of the entire package for their data.  %
The current from-functions support the following data sources.
\begin{ditemize}
  \item Cary (collection creation)
  \item COLORS
  \item KENT
  \item PyCMDS
  \item Ocean Optics
  \item Shimadzu
  \item Tensor27
\end{ditemize}
% TODO: complete list, update wright.tools to be consistent

\subsubsection{Discover dimensions}

Certain older Wright Group file types (COLORS and KENT) are particularly difficult to import using a parameter-free from-function.
%
There are two problems:
\begin{ditemize}
  \item Each individual file is limited in dimensionality (1D for KENT, 2D for COLORS).
  \item The files lack self-describing metadata.
\end{ditemize}
The way that WrightTools handles data creation for these file types deserves special discussion.  %
Firstly, WrightTools contains hardcoded column information for each file type...
For COLORS...  % TODO
Secondly, WrightTools accepts a list of files, which it stacks together to form a single large array.  %
Finally, the \python{wt.kit.discover_dimensions} function is called.  %
This function does its best to recognize the parameters of the original scan...  % TODO

\subsubsection{From directory}

% TODO (also document on wright.tools)

\subsection{Math}  % ------------------------------------------------------------------------------

Now that we know the basics of how the WrightTools \python{Data} class stores data, it's time to do some data manipulation.  %
Let's start with some elementary algebra.

\subsubsection{In-place operators}

Operators are...  % TODO

Because the \python{Data} object is mostly stored outside of memory, it is better to do in-place...  % TODO

Broadcasting...  % TODO

\subsubsection{Clip}

% TODO

\subsubsection{Symmetric root}

% TODO

\subsubsection{Log}

% TODO

\subsection{Dimensionality manipulation}  % -------------------------------------------------------

WrightTools offers several strategies for reducing the dimensionality of a data object.  %
Also consider using the fit sub-package.  % TODO: more info, link to section

\subsubsection{Chop}

Chop is one of the most important methods of \python{Data}, although it is typically not called directly by users of WrightTools.  %
Chop takes n-dimensional data and ``chops'' it into all of its lower-dimensional components.  %
Consider a 3D dataset in \python{('wm', 'w2', 'w1')}.  %
This dataset can be chopped into its component 2D \python{('wm', 'w1')} spectra.
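Conceptually, chopping fixes each un-retained axis at every one of its indices in turn while keeping the retained axes whole.
As a minimal sketch of that idea (not the WrightTools implementation, which operates on data objects and returns a collection), the hypothetical function \python{chop_array} below works on a bare numpy array, assuming one name per axis.
Its optional \python{at} argument is a simplification: it pins an axis to an integer index rather than to a coordinate with units.
\begin{codefragment}{python}
import itertools

import numpy as np


def chop_array(array, names, keep, at=None):
    """Return every lower-dimensional slice of `array` retaining the axes in `keep`.

    Hypothetical helper for illustration only, not the WrightTools API.
    `names` gives one name per axis; `at` optionally pins un-retained axes
    to a single integer index, e.g. {"w2": 5}.
    """
    if len(names) != array.ndim:
        raise ValueError("need exactly one name per axis")
    at = at or {}
    kept = [names.index(name) for name in keep]
    others = [axis for axis in range(array.ndim) if axis not in kept]
    # Iterate over every index of each un-retained axis, unless pinned by `at`.
    ranges = [
        [at[names[axis]]] if names[axis] in at else range(array.shape[axis])
        for axis in others
    ]
    # Integer indexing leaves the kept axes in ascending original order;
    # this permutation reorders them to match `keep`.
    order = sorted(kept)
    permutation = [order.index(axis) for axis in kept]
    pieces = []
    for indices in itertools.product(*ranges):
        index = [slice(None)] * array.ndim
        for axis, i in zip(others, indices):
            index[axis] = i
        piece = np.transpose(array[tuple(index)], permutation)
        position = {names[axis]: i for axis, i in zip(others, indices)}
        pieces.append((position, piece))
    return pieces


# Usage with a random array shaped like the (wm, w2, w1) example.
spectrum = np.random.default_rng(0).random((35, 11, 11))
names = ("wm", "w2", "w1")
print(len(chop_array(spectrum, names, ("wm", "w1"))))  # 11 pieces, each (35, 11)
print(len(chop_array(spectrum, names, ("wm",))))  # 121 pieces, each (35,)
\end{codefragment}
The piece counts follow directly from the shape \python{(35, 11, 11)}: each un-retained axis of length 11 contributes a factor of 11.
In WrightTools, the same operation is a single method call on the data object: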
%
\begin{codefragment}{python, label=test_label}
>>> import WrightTools as wt; from WrightTools import datasets
>>> data = wt.data.from_PyCMDS(datasets.PyCMDS.wm_w2_w1_000)
data created at /tmp/lzyjg4au.wt5::/
  axes ('wm', 'w2', 'w1')
  shape (35, 11, 11)
>>> chopped = data.chop('wm', 'w1')
chopped data into 11 piece(s) in ('wm', 'w1')
>>> chopped.chop000
\end{codefragment}
\python{chopped} is a collection containing 11 data objects: \python{chop000, chop001 ... chop010}.  %
Note that, by default, the collection is made at the root level of a new temporary file.  %
An optional keyword argument, \python{parent}, allows users to specify the destination for this new collection.  %
These lower-dimensional data objects can then be used in plotting routines, fitting routines, etc.

By default, chop returns \emph{all} of the lower-dimensional slices.  %
Using the same data object as in \autoref{test_label}, we can instead request all 1D \python{wm} slices.  %
\begin{codefragment}{python}
>>> chopped = data.chop('wm')
chopped data into 121 piece(s) in ('wm',)
>>> chopped.chop000
\end{codefragment}
If desired, users may use the \python{at} keyword argument to specify a particular coordinate in the un-retained dimensions.  %
For example, suppose that you want to plot the data from \autoref{test_label} as a \python{('wm', 'w1')} plot at \python{w2} = 1580 wn.  %
\begin{codefragment}{python}
>>> chopped = data.chop('wm', 'w1', at={'w2': [1580, 'wn']})[0]
chopped data into 1 piece(s) in ('wm', 'w1')
>>> chopped
>>> chopped.w2.points
array([1580.])
\end{codefragment}
Note the \python{[0]}...  % TODO
This same syntax is used in artists...  % TODO

\subsubsection{Collapse}

\subsubsection{Split}

\subsubsection{Join}

\subsection{The wt5 file format}  % ---------------------------------------------------------------

Since WrightTools is based on the HDF5 file format...  % TODO

\section{Artists}  % ==============================================================================

After importing and manipulating data, one typically wants to create a plot.
%
The artists sub-package contains everything users need to plot their data objects.  %
This includes both ``quick'' artists, which generate simple plots as quickly as possible, and a full figure layout toolkit that allows users to generate publication-quality figures.  %
It also includes ``specialty'' artists, which perform certain popular plotting operations, as I will describe below.  %
Currently the artists sub-package is built on top of the wonderful matplotlib library.  %
In the future, other libraries (e.g.\ mayavi) may be incorporated.

\subsection{Quick}  % -----------------------------------------------------------------------------

\subsubsection{1D}

\begin{dfigure}
  \includegraphics[width=0.5\textwidth]{"processing/quick1D 000"}
  \includepython{"processing/quick1D.py"}
  \caption[CAPTION TODO]{CAPTION TODO}
\end{dfigure}

\subsubsection{2D}

\begin{dfigure}
  \includegraphics[width=0.5\textwidth]{"processing/quick2D 000"}
  \includepython{"processing/quick2D.py"}
  \caption[CAPTION TODO]{CAPTION TODO}
\end{dfigure}

\subsection{Specialty}  % -------------------------------------------------------------------------

\subsection{Artists API}  % -----------------------------------------------------------------------

The artists sub-package offers a thin wrapper around matplotlib's object-oriented figure creation API.  %
The wrapper allows WrightTools to add the following capabilities on top of matplotlib:
\begin{ditemize}
  \item More consistent multi-axes figure layout.
  \item Ability to plot data objects directly.
\end{ditemize}
Each of these is meant to lower the barrier to plotting data.  %
Without going into every detail of matplotlib's figure generation capabilities, this section introduces the unique strategy that the WrightTools wrapper takes.
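For readers unfamiliar with the layer being wrapped, the following is a minimal sketch of matplotlib's object-oriented API used directly, with no WrightTools calls.
The figure and each axes are explicit objects, and a two-panel layout with a colorbar must be assembled by hand.
The Gaussian ``spectrum'', grid, and labels are made up for illustration.
\begin{codefragment}{python}
import io

import numpy as np
from matplotlib.figure import Figure

# Made-up 2D "spectrum" on a (w2, w1) grid, in wavenumbers.
w1 = np.linspace(1500, 1700, 51)
w2 = np.linspace(1500, 1700, 41)
signal = np.exp(-((w1[None, :] - 1600) ** 2 + (w2[:, None] - 1580) ** 2) / 800)

# Object-oriented API: the Figure and each Axes are explicit objects.
# Creating a Figure directly (not via pyplot) needs no GUI backend.
fig = Figure(figsize=(8, 3.5), constrained_layout=True)
ax_map, ax_cut = fig.subplots(1, 2)

# Left panel: 2D color map, with a colorbar attached to that axes only.
mesh = ax_map.pcolormesh(w1, w2, signal, shading="auto", cmap="viridis")
ax_map.set_xlabel("w1 (wn)")
ax_map.set_ylabel("w2 (wn)")
fig.colorbar(mesh, ax=ax_map, label="signal")

# Right panel: 1D cut through the same array at the w2 point nearest 1580 wn.
i = int(np.argmin(np.abs(w2 - 1580)))
ax_cut.plot(w1, signal[i, :])
ax_cut.set_xlabel("w1 (wn)")
ax_cut.set_ylabel("signal")

# Render to an in-memory PNG; Figure.savefig works without pyplot.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
\end{codefragment}
Keeping layout and colorbar placement consistent across many such figures is the sort of bookkeeping that the capabilities listed above are meant to reduce.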
%
% TODO: finish discussion

\subsection{Colormaps}  % -------------------------------------------------------------------------

\subsection{Interpolation}  % ---------------------------------------------------------------------

\section{Fitting}  % ==============================================================================

\section{Distribution and licensing}
\label{sec:processing_disbribution}  % =======================

WrightTools is MIT licensed.  %
WrightTools is distributed on PyPI and conda-forge.

\section{Future directions}  % ====================================================================