
\chapter{Processing}

% TODO: cool quote, if I can think of one

\clearpage

From a data science perspective, CMDS poses several distinctive challenges:
\begin{ditemize}
  \item Datasets often have more than two dimensions, complicating
    \textbf{representation}.
  \item Shape and dimensionality change...
  \item Data can be large (over one million points).  % TODO: contextualize large (not BIG DATA)
\end{ditemize}
I have designed a software package that directly addresses these issues.  %

WrightTools is a software package at the heart of all work in the Wright Group.  %

% TODO: more intro

WrightTools is written in Python, and endeavors to have a ``pythonic'', explicit and ``natural''
application programming interface (API).  %
To use WrightTools, simply import:
\begin{codefragment}{python}
>>> import WrightTools as wt
>>> wt.__version__
'3.0.0'
\end{codefragment}
I discuss how WrightTools packaging, distribution, and installation work in
\autoref{sec:processing_disbribution}.

We can use the built-in Python function \python{dir} to inspect the contents of the
WrightTools package.  %
\begin{codefragment}{python}
>>> dir(wt)
['Collection',
 'Data',
 '__branch__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '__wt5_version__',
 '_dataset',
 '_group',
 '_open',
 '_sys',
 'artists',
 'collection',
 'data',
 'diagrams',
 'exceptions',
 'kit',
 'open',
 'units']
\end{codefragment}  % TODO: consider adding fit to this list
Many of these are dunder (double underscore) attributes---Python internals that are not normally
used directly.  %
The ten attributes that do not start with an underscore make up the public API that users of
WrightTools typically use.  %
Within the public API are \python{Collection} and \python{Data}, the two main classes in the
WrightTools object model.  %
\python{Data} stores spectra directly as multidimensional arrays, and
\python{Collection} stores \textit{groups} of data objects (and other collection
objects) in a hierarchical way for internal organization purposes.  %
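
As a quick check of that count, the following sketch filters the listing above for names without a
leading underscore.  %
The list \python{names} is copied by hand from the \python{dir(wt)} output rather than generated
by importing WrightTools, so it only reflects the version shown here.  %
\begin{codefragment}{python}
# names copied from the dir(wt) listing above
names = ['Collection', 'Data', '__branch__', '__builtins__', '__cached__',
         '__doc__', '__file__', '__loader__', '__name__', '__package__',
         '__path__', '__spec__', '__version__', '__wt5_version__',
         '_dataset', '_group', '_open', '_sys', 'artists', 'collection',
         'data', 'diagrams', 'exceptions', 'kit', 'open', 'units']

# public names are those without a leading underscore
public = [name for name in names if not name.startswith('_')]
print(len(public))
print(public)
\end{codefragment}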

\section{Data object model}  % ====================================================================

WrightTools uses a programming strategy called object-oriented programming (OOP).  %
% TODO: introduce HDF5
% TODO: elaborate on the concept of OOP and how it relates to WrightTools

It contains a central data ``container'' that is capable of storing all of the information about
each multidimensional (or one-dimensional) spectrum: the \python{Data} class.  %
It also defines a \python{Collection} class that contains data objects, collection
objects, and other pieces of metadata in a hierarchical structure.  %
Let's first discuss \python{Data}.

All spectra are stored within WrightTools as multidimensional arrays.  %
Arrays are containers that store many instances of the same data type, typically a numerical
one.  %
These arrays have some \python{shape}, \python{size}, and
\python{dtype}.  %
In the context of WrightTools, they can contain floats, integers, complex numbers and NaNs.  %
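
To make these three terms concrete, here is a minimal sketch using a plain NumPy array.  %
NumPy is my choice of stand-in for illustration; the sketch is not meant to describe how
WrightTools stores its arrays internally.  %
\begin{codefragment}{python}
import numpy as np

# a small 2D array of floats; NaN marks a missing pixel
arr = np.array([[1.0, 2.0, np.nan],
                [4.0, 5.0, 6.0]])

print(arr.shape)  # length along each dimension
print(arr.size)   # total number of elements
print(arr.dtype)  # data type shared by every element
\end{codefragment}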

The \python{Data} class contains everything that is needed to define a single spectrum
from a single experiment (or simulation).  %
To do this, each data object contains several multidimensional arrays (typically 2 to 50 arrays,
depending on the kind of data).  %
There are two kinds of arrays: instances of \python{Variable} and instances of \python{Channel}.  %
Variables are coordinate arrays that define the position of each pixel in the multidimensional
spectrum, and channels are each a particular kind of signal within that spectrum.  %
Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels
\python{[pmt, pyro1, pyro2, pyro3]}.  %
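
The relationship between variables and channels can be sketched with plain NumPy arrays.  %
Everything here is hypothetical: the values, the Gaussian line shape, and the choice to store each
variable with a length-one axis so that it broadcasts against the channel are assumptions made for
illustration.  %
\begin{codefragment}{python}
import numpy as np

# hypothetical coordinate arrays ("variables") for a 5 by 4 scan
w1 = np.linspace(1200.0, 1600.0, 5).reshape(5, 1)  # varies along axis 0
w2 = np.linspace(1300.0, 1600.0, 4).reshape(1, 4)  # varies along axis 1

# hypothetical signal array ("channel"): one value per pixel
pmt = (np.exp(-((w1 - 1400.0) / 100.0) ** 2)
       * np.exp(-((w2 - 1500.0) / 100.0) ** 2))

# the position of pixel (2, 3) is read from the variables
i, j = 2, 3
position = (float(w1[i, 0]), float(w2[0, j]))
print(pmt.shape, position)
\end{codefragment}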

As an overview, the following lexicographically lists the attributes and methods of
\python{Data}.  %
\begin{ditemize}
  \item method \python{collapse}: Collapse along one dimension in a well-defined way.
  \item method \python{convert}: Convert all axes of a certain kind.
  \item method \python{create_channel}: Create a new channel.
  \item method \python{create_variable}: Create a new variable.
  \item attribute \python{fullpath}
  \item method \python{get_nadir}
  \item method \python{get_zenith}
  \item method \python{heal}
  \item attribute \python{kind}
  \item method \python{level}
  \item method \python{map_variable}
  \item attribute \python{natural_name}
  \item attribute \python{ndim}
  \item method \python{offset}
  \item method \python{print_tree}
  \item method \python{remove_channel}
  \item method \python{remove_variable}
  \item method \python{rename_channels}
  \item method \python{rename_variables}
  \item attribute \python{shape}
  \item method \python{share_nans}
  \item attribute \python{size}
  \item method \python{smooth}
  \item attribute \python{source}
  \item method \python{split}
  \item method \python{transform}
  \item attribute \python{units}
  \item attribute \python{variable_names}
  \item attribute \python{variables}
  \item method \python{zoom}
\end{ditemize}

Each data object contains instances of \python{Channel} and \python{Variable} which represent the
principal multidimensional arrays.  %
The following lexicographically lists the attributes and methods of these instances.  %
Certain methods and attributes belong to only one of the two types, and are marked as such.  %
\begin{ditemize}
  \item method \python{argmax}
  \item method \python{argmin}
  \item method \python{chunkwise}
  \item method \python{clip}
  \item method \python{convert}
  \item attribute \python{full}
  \item attribute \python{fullpath}
  \item attribute \python{label} (variable only)
  \item method \python{log}
  \item method \python{log10}
  \item method \python{log2}
  \item method \python{mag}
  \item attribute \python{major_extent} (channel only)
  \item method \python{max}
  \item method \python{min}
  \item attribute \python{minor_extent} (channel only)
  \item attribute \python{natural_name}
  \item method \python{normalize} (channel only)
  \item attribute \python{null} (channel only)
  \item attribute \python{parent}
  \item attribute \python{points}
  \item attribute \python{signed} (channel only)
  \item method \python{slices}
  \item method \python{symmetric_root}
  \item method \python{trim} (channel only)
\end{ditemize}
Channels and variables also support direct indexing / slicing using \python{__getitem__}, as
discussed more in...  % TODO: where is it discussed more?
 
Axes are ways to organize data as a function of particular variables (and combinations thereof).  %
The \python{Axis} class does not directly contain the respective arrays---it refers to the
associated variables.  %
The flexibility of this association is one of the main new features in WrightTools 3.  %
Axis expressions are simple human-friendly strings made up of numbers and variable
\python{natural_name}s.  %
Given 5 variables with names \python{['w1', 'w2', 'wm', 'd1', 'd2']}, example valid expressions
include \python{'w1'}, \python{'w1=wm'}, \python{'w1+w2'}, \python{'2*w1'}, \python{'d1-d2'}, and
\python{'wm-w1+w2'}.  %
Axes can be directly indexed / sliced into using \python{__getitem__}, and they support many of the
``numpy-like'' attributes.  %
A lexicographical list of axis attributes and methods follows.
\begin{ditemize}
  \item method \python{convert}
  \item attribute \python{full}
  \item attribute \python{label}
  \item method \python{max}
  \item method \python{min}
  \item attribute \python{natural_name}
  \item attribute \python{ndim}
  \item attribute \python{points}
  \item attribute \python{shape}
  \item attribute \python{size}
  \item attribute \python{units_kind}
  \item attribute \python{variables}
\end{ditemize}
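
To illustrate how an axis expression can refer to variables by \python{natural_name}, here is a
minimal sketch of an expression evaluator.  %
The helper \python{evaluate} is hypothetical and is not the WrightTools implementation.  %
It handles only the arithmetic operators, so an expression such as \python{'w1=wm'} is outside its
scope, and it uses single numbers in place of full variable arrays.  %
\begin{codefragment}{python}
import ast
import operator

# arithmetic operators supported by this sketch
OPERATORS = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expression, variables):
    """Evaluate an arithmetic expression of variable names (hypothetical helper)."""
    def visit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.Name):
            return variables[node.id]  # look up by natural_name
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError('unsupported expression: ' + expression)
    return visit(ast.parse(expression, mode='eval').body)


# made-up values standing in for the five variables named above
variables = {'w1': 1500.0, 'w2': 1300.0, 'wm': 1400.0, 'd1': 50.0, 'd2': -25.0}
for expression in ['w1', 'w1+w2', '2*w1', 'd1-d2', 'wm-w1+w2']:
    print(expression, evaluate(expression, variables))
\end{codefragment}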

\subsection{Creating a data object}  % ------------------------------------------------------------

WrightTools data objects are capable of storing arbitrary multidimensional spectra, but how can we
actually get data into WrightTools?  %
If you start with a wt5 file, the answer is easy: \python{wt.open(<filepath>)}.  %
But what if you have data that was written using some other software?  %
WrightTools offers data conversion functions (``from'' functions) that do the hard work of creating
data objects from other files.  %
These from-functions are as parameter-free as possible, which means they recognize details like
shape and units from each specific file format without manual user intervention.  %

The most important property of from-functions is that they are extensible: more from-functions can
easily be added as needed.  %
This modular approach to data creation means that individuals who want to use WrightTools for new
data sources can simply add one function to unlock the capabilities of the entire package as
applied to their data.  %
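
To give a feel for what ``adding one function'' might involve, here is a sketch of a from-function
for a made-up two-column text format.  %
The function name \python{from_example}, the file layout, and the returned dictionary are all
hypothetical; a real from-function would return a WrightTools data object instead.  %
The point is only that names, units, and shape are read from the file itself, with no parameters
supplied by the user.  %
\begin{codefragment}{python}
def from_example(lines):
    """Parse a made-up two-column text format into labeled arrays.

    Assumed format: one header line such as 'w1 (wn), signal (V)',
    then one 'x, y' pair per line.
    """
    header, *rows = [line.strip() for line in lines if line.strip()]
    names, units = [], []
    for field in header.split(','):
        # split 'name (unit)' into its two parts
        name, unit = field.strip().rstrip(')').split('(')
        names.append(name.strip())
        units.append(unit.strip())
    # transpose rows of numbers into one column per name
    columns = list(zip(*[[float(v) for v in row.split(',')] for row in rows]))
    return {name: {'units': unit, 'values': list(column)}
            for name, unit, column in zip(names, units, columns)}


text = ['w1 (wn), signal (V)', '1300, 0.1', '1400, 0.8', '1500, 0.2']
data = from_example(text)
print(data['w1']['units'], data['signal']['values'])
\end{codefragment}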

The current from-functions support the following types of data.
\begin{ditemize}
  \item Cary (collection creation)
  \item COLORS
  \item KENT
  \item PyCMDS
  \item Ocean Optics
  \item Shimadzu
  \item Tensor27
\end{ditemize}  % TODO: complete list, update wright.tools to be consistent
  
\subsubsection{Discover dimensions}

Certain older Wright Group file types (COLORS and KENT) are particularly difficult to import using
a parameter-free from-function.  %
There are two problems:
\begin{ditemize}
  \item Individual files are limited in dimensionality (1D for KENT, 2D for COLORS).
  \item Lack of self-describing metadata.
\end{ditemize}
The way that WrightTools handles data creation for these file types deserves special discussion.  %

Firstly, WrightTools contains hardcoded column information for each filetype...
For COLORS...  % TODO

Secondly, WrightTools accepts a list of files, which it stacks together to form a single large
array.  %
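
The stacking step can be sketched with NumPy as follows.  %
The three in-memory arrays stand in for three files, and their column layout (\python{w1},
\python{w2}, signal) is an assumption made for illustration rather than the actual COLORS or KENT
layout.  %
\begin{codefragment}{python}
import numpy as np

# stand-ins for three files, each holding rows of (w1, w2, signal)
file_arrays = [np.array([[w1, w2, 0.0] for w1 in (1300.0, 1400.0, 1500.0)])
               for w2 in (1350.0, 1450.0, 1550.0)]

# stack the per-file rows into a single large array
stacked = np.vstack(file_arrays)
print(stacked.shape)
\end{codefragment}
At this point the array is still a flat table of rows; the shape of the original scan has not yet
been recovered.  %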

Finally, the \python{wt.kit.discover_dimensions} function is called.  %
This function does its best to recognize the parameters of the original scan...  % TODO

\subsubsection{From directory}

% TODO (also document on wright.tools)

\subsection{Math}  % ------------------------------------------------------------------------------

Now that we know the basics of how the WrightTools \python{Data} class stores data, it's time to do
some data manipulation.  %
Let's start with some elementary algebra.  %

\subsubsection{In place operators}

Operators are...  % TODO
Because the \python{Data} object is mostly stored outside of memory, it is better to do
in-place... % TODO
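
As a minimal sketch of the distinction, using a NumPy array as a stand-in for a channel (an
assumption for illustration, not WrightTools code), an in-place operator modifies the existing
buffer while the equivalent out-of-place expression allocates a new array.  %
\begin{codefragment}{python}
import numpy as np

arr = np.arange(6, dtype=float)
alias = arr  # second reference to the same buffer

arr *= 2  # in place: the existing buffer is modified
in_place_shared = alias is arr and bool(np.shares_memory(alias, arr))

arr = arr * 2  # out of place: a new array is allocated
copy_shared = bool(np.shares_memory(alias, arr))

print(in_place_shared, copy_shared)
\end{codefragment}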

Broadcasting... % TODO

\subsubsection{Clip}

% TODO

\subsubsection{Symmetric root}

% TODO

\subsubsection{Log}

% TODO

\subsection{Dimensionality manipulation}  % -------------------------------------------------------

WrightTools offers several strategies for reducing the dimensionality of a data object.  %
Also consider using the fit sub-package.  % TODO: more info, link to section

\subsubsection{Chop}

Chop is one of the most important methods of \python{Data}, although it is typically not called
directly by users of WrightTools.  %

\subsubsection{Collapse}

\subsubsection{Split}

\subsubsection{Join}

\subsection{The wt5 file format}  % ---------------------------------------------------------------

Since WrightTools is based on the HDF5 file format...  % TODO

\section{Artists}  % ==============================================================================

After importing and manipulating data, one typically wants to create a plot.  %
The artists sub-package contains everything users need to plot their data objects.  %
This includes both ``quick'' artists, which generate simple plots as quickly as possible, and a
figure layout toolkit that allows users to generate complete publication-quality figures.  %
It also includes ``specialty'' artists which are made to perform certain popular plotting
operations, as I will describe below.  %

Currently the artists sub-package is built on top of the wonderful matplotlib library.  %
In the future, other libraries (e.g.\ mayavi) may be incorporated.  %

\subsection{Quick}  % -----------------------------------------------------------------------------

\subsubsection{1D}

\begin{dfigure}
  \includegraphics[width=0.5\textwidth]{"processing/quick1D 000"}
  \includepython{"processing/quick1D.py"}
  \caption[CAPTION TODO]
    {CAPTION TODO}
\end{dfigure}

\subsubsection{2D}

\begin{dfigure}
  \includegraphics[width=0.5\textwidth]{"processing/quick2D 000"}
  \includepython{"processing/quick2D.py"}
  \caption[CAPTION TODO]
    {CAPTION TODO}
\end{dfigure}

\subsection{Specialty}  % -------------------------------------------------------------------------

\subsection{Artists API}  % -----------------------------------------------------------------------

The artists sub-package offers a thin wrapper around the default matplotlib object-oriented figure
creation API.  %
The wrapper allows WrightTools to add the following capabilities on top of matplotlib:
\begin{ditemize}
  \item More consistent multi-axes figure layout.
  \item Ability to plot data objects directly.
\end{ditemize}
Each of these is meant to lower the barrier to plotting data.  %
Without going into every detail of matplotlib figure generation capabilities, this section
introduces the unique strategy that the WrightTools wrapper takes.  %

% TODO: finish discussion

\subsection{Colormaps}  % -------------------------------------------------------------------------

\subsection{Interpolation}  % ---------------------------------------------------------------------

\section{Fitting}  % ==============================================================================

\section{Distribution and licensing} \label{sec:processing_disbribution}  % =======================

WrightTools is MIT licensed.  %

WrightTools is distributed on PyPI and conda-forge.

\section{Future directions}  % ====================================================================