Algorithms or Method Name | Description | Tips From the Pros | References and Papers We Love to Read
Differential Equations
Description: Used to express relationships between functions and their derivatives, for example, change over time.
Tips From the Pros: Differential equations can be used to formalize models and make predictions. The equations themselves can be solved numerically and tested with different initial conditions to study system trajectories.
Reference: Zill, Dennis, Warren Wright, and Michael Cullen. Differential Equations with Boundary-Value Problems. Connecticut: Cengage Learning, 2012. Print.
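As a minimal sketch of solving a differential equation numerically, the example below integrates dy/dt = -k*y with the forward Euler method and compares the result against the known analytic solution. The equation, the decay constant k, and the step count are illustrative choices, not taken from the referenced text.

```python
# Forward-Euler integration of dy/dt = -k * y from an initial condition,
# compared against the analytic solution y(t) = y0 * exp(-k * t).
import math

def euler(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with a fixed step size."""
    dt = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += dt * f(t, y)  # one Euler step: follow the local slope
        t += dt
    return y

k = 0.5
y_num = euler(lambda t, y: -k * y, y0=1.0, t0=0.0, t1=2.0, steps=10_000)
y_exact = math.exp(-k * 2.0)
print(abs(y_num - y_exact))  # small discretization error
```

Changing the initial condition `y0` and re-running is exactly the "different initial conditions" experiment the tip describes.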
Discrete Event Simulation
Description: Simulates a discrete sequence of events, each occurring at a particular instant in time. The model updates its state only at the points in time when events occur.
Tips From the Pros: Discrete event simulation is useful when analyzing event-based processes, such as production lines and service centers, to determine how system-level behavior changes as process parameters change. Optimization can be integrated with simulation to gain efficiencies in a process.
Reference: Cassandras, Christopher, and Stephanie Lafortune. Introduction to Discrete Event Systems. New York: Springer, 1999. Print.
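A tiny sketch of the idea, assuming a hypothetical single-server service center: events sit in a priority queue keyed by time, and state (whether the server is busy) changes only when an event fires. The arrival times and service time below are made up for illustration.

```python
# Minimal discrete event simulation of a single FIFO server: state changes
# only at event times, which are processed in chronological order.
import heapq

def simulate(arrivals, service_time):
    """Return the departure time of each arriving customer."""
    events = [(t, "arrive") for t in arrivals]
    heapq.heapify(events)                    # event list ordered by time
    server_free_at = 0.0
    departures = []
    while events:
        t, kind = heapq.heappop(events)      # jump straight to the next event
        if kind == "arrive":
            start = max(t, server_free_at)   # wait if the server is busy
            server_free_at = start + service_time
            departures.append(server_free_at)
    return departures

print(simulate([0.0, 1.0, 1.5], service_time=2.0))  # [2.0, 4.0, 6.0]
```

Sweeping `service_time` over a range of values is the kind of parameter study the tip has in mind.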
Discrete Wavelet Transform
Description: Transforms time series data into the frequency domain while preserving locality information.
Tips From the Pros: Offers very good time and frequency localization. The advantage over the Fourier transform is that it preserves both frequency and time locality.
Reference: Burrus, C. Sidney, Ramesh A. Gopinath, Haitao Guo, Jan E. Odegard, and Ivan W. Selesnick. Introduction to Wavelets and Wavelet Transforms: A Primer. New Jersey: Prentice Hall, 1998. Print.
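One level of the Haar wavelet transform, the simplest member of the family, can be written in a few lines: pairs of samples are split into a local average (low frequency) and a local difference (high frequency), so each coefficient is tied to a position in the signal. This hand-rolled sketch is for illustration; dedicated libraries implement many more wavelet families.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # local averages (low frequency)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # local differences (high frequency)
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level, recovering the original samples exactly."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

signal = np.array([4.0, 2.0, 5.0, 5.0])
cA, cD = haar_dwt(signal)
print(np.allclose(haar_idwt(cA, cD), signal))  # True: perfect reconstruction
```

Each detail coefficient corresponds to a specific pair of input samples, which is the "locality" a plain Fourier transform discards.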
Exponential Smoothing
Description: Used to remove artifacts expected from collection error or outliers.
Tips From the Pros: In comparison to a moving average, where past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights over time.
Reference: Chatfield, Chris, Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder. "A New Look at Models for Exponential Smoothing." Journal of the Royal Statistical Society: Series D (The Statistician) 50.2 (July 2001): 147-159. Print.
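The exponentially decreasing weights come from a one-line recurrence. A minimal sketch of simple exponential smoothing, with an illustrative series containing one outlier-like jump:

```python
def exp_smooth(xs, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    s = xs[0]                 # initialize with the first observation
    out = [s]
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * s  # recent points get weight alpha,
        out.append(s)                    # older points decay geometrically
    return out

smoothed = exp_smooth([10.0, 12.0, 11.0, 30.0], alpha=0.3)
print(smoothed)  # the final spike to 30 is damped to about 16.5
```

Larger `alpha` tracks the data more closely; smaller `alpha` suppresses outliers and collection noise more aggressively.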
Factor Analysis
Description: Describes variability among correlated variables in terms of a lower number of unobserved variables, namely the factors.
Tips From the Pros: If you suspect there are unmeasurable influences on your data, you may want to try factor analysis.
Reference: Child, Dennis. The Essentials of Factor Analysis. United Kingdom: Cassell Educational, 1990. Print.
Fast Fourier Transform
Description: Efficiently transforms time series from the time to the frequency domain. Can also be used for image improvement via spatial transforms.
Tips From the Pros: Filtering a time-varying signal can be done more effectively in the frequency domain. Also, noise can often be identified in such signals by observing power at aberrant frequencies.
Reference: Mitra, Partha P., and Hemant Bokil. Observed Brain Dynamics. United Kingdom: Oxford University Press, 2008. Print.
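"Observing power at aberrant frequencies" amounts to looking at the magnitude spectrum. A minimal sketch using NumPy's FFT on a synthetic 7 Hz sine wave (the sampling rate and frequency are illustrative):

```python
import numpy as np

fs = 100.0                           # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)          # one second of samples
signal = np.sin(2 * np.pi * 7 * t)   # a pure 7 Hz sine wave

spectrum = np.fft.rfft(signal)                   # FFT of a real signal
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)   # frequency of each bin
dominant = freqs[np.argmax(np.abs(spectrum))]    # bin with the most power
print(dominant)  # 7.0
```

With real data, unexpected peaks in `np.abs(spectrum)` (for example at mains frequency) flag noise sources that can then be filtered out in the frequency domain.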
Format Conversion
Description: Creates a standard representation of data regardless of source format; for example, extracting raw UTF-8 encoded text from binary file formats such as Microsoft Word documents or PDFs.
Tips From the Pros: A number of open source software packages support format conversion and can interpret a wide variety of formats. One notable package is Apache Tika.
Reference: Ingersoll, Grant S., Thomas S. Morton, and Andrew L. Farris. Taming Text: How to Find, Organize, and Manipulate It. New Jersey: Manning, 2013. Print.
Gaussian Filtering
Description: Acts to remove noise or blur data.
Tips From the Pros: Can be used to remove speckle noise from images.
Reference: Parker, James R. Algorithms for Image Processing and Computer Vision. New Jersey: John Wiley & Sons, 2010. Print.
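A Gaussian filter is just convolution with a normalized Gaussian-shaped kernel. The 1-D sketch below (the 2-D image case is the same idea applied along both axes) smooths a noisy sine wave; the kernel width and noise level are illustrative choices.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Discrete 1-D Gaussian kernel, normalized to sum to one."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_filter(signal, sigma=1.0, radius=3):
    # Convolving with the kernel replaces each sample by a weighted
    # average of its neighborhood, suppressing high-frequency noise.
    return np.convolve(signal, gaussian_kernel(sigma, radius), mode="same")

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 4 * np.pi, 200)) + rng.normal(0, 0.5, 200)
smooth = gaussian_filter(noisy, sigma=2.0, radius=6)
print(np.var(np.diff(smooth)) < np.var(np.diff(noisy)))  # True: less jitter
```

Larger `sigma` gives stronger blurring; too large and real features are smoothed away along with the noise.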
Generalized Linear Models
Description: Extends ordinary linear regression to allow for error distributions that are not normal.
Tips From the Pros: Use if the observed error in your system does not follow the normal distribution.
Reference: McCullagh, P., and John A. Nelder. Generalized Linear Models. Florida: CRC Press, 1989. Print.
Genetic Algorithms
Description: Evolves candidate models over generations using evolution-inspired operators such as mutation and crossover of parameters.
Tips From the Pros: Increasing the generation size adds diversity in the parameter combinations considered, but requires more objective function evaluations. Evaluating the individuals within a generation is highly parallelizable. The representation of candidate solutions can impact performance.
Reference: De Jong, Kenneth A. Evolutionary Computation: A Unified Approach. Massachusetts: MIT Press, 2002. Print.
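A minimal sketch of the selection/crossover/mutation loop, maximizing a toy one-parameter fitness function. The real-valued representation, blend crossover, and Gaussian mutation are illustrative choices; as the tip notes, the representation itself is a design decision that affects performance.

```python
import random

random.seed(42)

def fitness(x):
    return -(x - 3.0) ** 2          # toy objective, maximized at x = 3

def evolve(generations=100, pop_size=30):
    pop = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # selection: keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2                   # crossover: blend two parents
            child += random.gauss(0, 0.1)         # mutation: small random change
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best)  # close to 3.0
```

The `fitness` calls inside one generation are independent, which is why they parallelize so well.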
Grid Search
Description: Systematic search across discrete parameter values for parameter exploration problems.
Tips From the Pros: A grid across the parameters is used to visualize the parameter landscape and assess whether multiple minima are present.
Reference: Kolda, Tamara G., Robert M. Lewis, and Virginia Torczon. "Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods." SIAM Review 45.3 (2003): 385-482. Print.
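Grid search is exhaustive evaluation over the Cartesian product of candidate values. A minimal sketch with a toy two-parameter loss function (the grid values and the loss are illustrative):

```python
import itertools

def loss(a, b):
    return (a - 2.0) ** 2 + (b + 1.0) ** 2   # toy loss, minimum at a=2, b=-1

a_values = [0.0, 1.0, 2.0, 3.0]
b_values = [-2.0, -1.0, 0.0, 1.0]

# Evaluate every (a, b) combination and keep the one with the lowest loss.
best = min(itertools.product(a_values, b_values), key=lambda p: loss(*p))
print(best)  # (2.0, -1.0)
```

Keeping the full table of evaluated losses, rather than only the minimum, is what lets you plot the landscape and spot multiple minima.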
Regression with Shrinkage (Lasso)
Description: A method of variable selection and prediction combined into a possibly biased linear model.
Tips From the Pros: There are different methods to select the lambda parameter. A typical choice is cross-validation with MSE as the metric.
Reference: Tibshirani, Robert. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society. Series B (Methodological) 58.1 (1996): 267-288. Print.
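A compact sketch of the lasso via coordinate descent: each coefficient is updated by soft-thresholding, which is what drives irrelevant coefficients exactly to zero (the variable selection) while shrinking the rest (the bias). The synthetic data and the fixed lambda are illustrative; in practice lambda would come from cross-validation, as the tip notes.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso(X, y, lam, iters=200):
    """Coordinate-descent lasso for standardized-ish columns of X."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            # Residual with feature j's contribution added back in.
            resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ resid / n
            # Soft-thresholding: small correlations are zeroed out entirely.
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 200)   # only feature 0 matters
beta = lasso(X, y, lam=0.5)
print(beta)  # feature 0 large (but shrunk below 2.0); the rest near zero
```

Note that the surviving coefficient is biased toward zero by roughly lambda, which is the "possibly biased" trade-off in the description.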
Sensitivity Analysis
Description: Involves varying individual parameters of an analytic or model and observing the magnitude of the effect.
Tips From the Pros: Model parameters that are insensitive during an optimization are candidates for being set to constants. This reduces the dimensionality of optimization problems and provides an opportunity for speedup.
Reference: Saltelli, A., Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, and Stefano Tarantola. Global Sensitivity Analysis: The Primer. New Jersey: John Wiley & Sons, 2008. Print.
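The simplest form is a one-at-a-time perturbation: nudge each parameter, measure the output change, and rank. The model below is a made-up stand-in with one strong parameter ("k") and one weak one ("c"); real sensitivity studies (and the referenced book) go well beyond this local approach.

```python
def model(params):
    # Hypothetical model: output depends strongly on "k", weakly on "c".
    return 10.0 * params["k"] ** 2 + 0.1 * params["c"]

def one_at_a_time(model, base, delta=0.01):
    """Perturb each parameter by a relative delta; report |output change|."""
    y0 = model(base)
    effects = {}
    for name in base:
        perturbed = dict(base)
        perturbed[name] *= 1 + delta       # vary one parameter at a time
        effects[name] = abs(model(perturbed) - y0)
    return effects

effects = one_at_a_time(model, {"k": 1.0, "c": 1.0})
print(effects)  # "k" dominates; "c" is a candidate to freeze as a constant
```

Parameters with negligible effect, like "c" here, can be fixed, shrinking the search space for any downstream optimization.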
Simulated Annealing
Description: Named after a controlled cooling process in metallurgy; by analogy, it uses a changing temperature, or annealing schedule, to vary algorithmic convergence.
Tips From the Pros: The standard annealing function allows for initial wide exploration of the parameter space followed by a narrower search. Depending on the search priority, the annealing function can be modified to allow for a longer explorative search at a high temperature.
Reference: Bertsimas, Dimitris, and John Tsitsiklis. "Simulated Annealing." Statistical Science 8.1 (1993): 10-15. Print.
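A minimal sketch: random moves are always accepted when they improve the objective, and accepted with probability exp(-Δ/T) when they worsen it, where T decays under a geometric cooling schedule. The objective, starting point, and schedule constants are illustrative.

```python
import math
import random

random.seed(7)

def anneal(f, x0, t0=5.0, cooling=0.995, steps=5000):
    """Minimize f; uphill moves are accepted with probability exp(-delta/T)."""
    x, fx, temp = x0, f(x0), t0
    best_x, best_f = x, fx
    for _ in range(steps):
        cand = x + random.gauss(0, 1.0)          # propose a random move
        fc = f(cand)
        if fc < fx or random.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc                     # accept (maybe uphill)
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling   # cooling schedule: exploration narrows over time
    return best_x

best = anneal(lambda x: (x - 4.0) ** 2, x0=-10.0)
print(best)  # near 4.0
```

Slowing the schedule (a `cooling` factor closer to 1) keeps the temperature high for longer, which is the modification the tip suggests for a more explorative search.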
Stepwise Regression
Description: A method of variable selection and prediction. Akaike's information criterion (AIC) is used as the metric for selection. The resulting predictive model is based upon ordinary least squares, or a general linear model with parameter estimation via maximum likelihood.
Tips From the Pros: Caution must be used when considering stepwise regression, as overfitting often occurs. To mitigate overfitting, try to limit the number of free variables used.
Reference: Hocking, R. R. "The Analysis and Selection of Variables in Linear Regression." Biometrics 32.1 (March 1976): 1-49. Print.
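A sketch of forward stepwise selection for the OLS case: starting from an intercept-only model, greedily add the feature that lowers AIC the most, stopping when no addition helps. The Gaussian AIC formula n*ln(RSS/n) + 2k and the synthetic data are illustrative assumptions.

```python
import math
import numpy as np

def aic(X, y):
    """AIC for an OLS fit: n * ln(RSS / n) + 2k (k = number of columns)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * math.log(rss / n) + 2 * X.shape[1]

def forward_select(X, y):
    """Greedy forward selection: add the feature that lowers AIC the most."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    current = np.ones((n, 1))                 # intercept-only model
    best_aic = aic(current, y)
    improved = True
    while improved and remaining:
        improved = False
        scores = [(aic(np.column_stack([current, X[:, j]]), y), j)
                  for j in remaining]
        score, j = min(scores)
        if score < best_aic:                  # stop once AIC stops improving
            best_aic, improved = score, True
            chosen.append(j)
            remaining.remove(j)
            current = np.column_stack([current, X[:, j]])
    return chosen

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 1] + rng.normal(0, 1.0, 300)   # only feature 1 is real
selected = forward_select(X, y)
print(selected)
```

The AIC penalty of 2 per parameter is a weak brake: noise features can still sneak in by chance, which is why the tip warns about overfitting.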
Stochastic Gradient Descent
Description: General-purpose optimization for learning neural networks, support vector machines, and logistic regression models.
Tips From the Pros: Can be applied even when the objective function is not completely differentiable by using sub-gradients.
Reference: Witten, Ian H., Eibe Frank, and Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Massachusetts: Morgan Kaufmann, 2011. Print.
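A minimal sketch of SGD fitting a linear least-squares model: instead of computing the full gradient over all data, the weights are updated from one randomly chosen sample at a time. The synthetic data, learning rate, and epoch count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w + rng.normal(0, 0.1, 500)   # linear data plus noise

w = np.zeros(2)
lr = 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):       # visit samples in random order
        # Gradient of the squared error on this single sample.
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad                      # one cheap, noisy update
print(w)  # close to [1.5, -2.0]
```

Each update costs O(features) regardless of dataset size, which is why SGD scales to the large models (neural networks, SVMs) listed in the description; for non-differentiable losses such as the hinge loss, the same update is run with a sub-gradient.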
Support Vector Machines
Description: Projection of feature vectors using a kernel function into a space where classes are more separable.
Tips From the Pros: Try multiple kernels and use k-fold cross-validation to validate the choice of the best one.
Reference: Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. "A Practical Guide to Support Vector Classification." National Taiwan University. 2003. Print.
Term Frequency Inverse Document Frequency
Description: A statistic that measures the relative importance of a term within a corpus.
Tips From the Pros: Typically used in text mining. Assuming a corpus of news articles, a term that is very frequent, such as "the," will likely appear many times in many documents and so receive a low value. A term that is infrequent, such as a person's last name that appears in a single article, will have a higher TF-IDF score.
Reference: Ingersoll, Grant S., Thomas S. Morton, and Andrew L. Farris. Taming Text: How to Find, Organize, and Manipulate It. New Jersey: Manning, 2013. Print.
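The "frequent everywhere scores low, rare-and-local scores high" behavior falls directly out of the formula. A minimal sketch over a toy three-document corpus (one common TF-IDF variant; real implementations add smoothing and normalization):

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "smith wrote the article",
]

def tf_idf(term, doc, docs):
    words = doc.split()
    tf = words.count(term) / len(words)              # term frequency in doc
    df = sum(1 for d in docs if term in d.split())   # document frequency
    idf = math.log(len(docs) / df)                   # rarity across corpus
    return tf * idf

print(tf_idf("the", docs[0], docs))    # 0.0: "the" appears in every document
print(tf_idf("smith", docs[2], docs))  # positive: rare, document-specific term
```

Because "the" occurs in all three documents, its IDF is log(3/3) = 0, zeroing its score despite a high term frequency.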
Topic Modeling (Latent Dirichlet Allocation)
Description: Identifies latent topics in text by examining word co-occurrence.
Tips From the Pros: Employ part-of-speech tagging to eliminate words other than nouns and verbs. Use raw term counts instead of TF-IDF-weighted terms.
Reference: Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (March 2003): 993-1022. Print.
Tree Based Methods
Description: Models structured as trees, where branches indicate decisions.
Tips From the Pros: Can be used to systematize a process or act as a classifier.
Reference: James, G., D. Witten, T. Hastie, and R. Tibshirani. "Tree-Based Methods." In An Introduction to Statistical Learning. New York: Springer, 2013. Print.
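The smallest tree-based classifier is a decision stump: a single branch that thresholds one feature. The sketch below learns the best threshold on a toy 1-D dataset; full tree methods apply this split search recursively. The data and the >= split convention are illustrative.

```python
def best_stump(xs, ys):
    """Find the single threshold on x that best separates binary labels."""
    best = None
    for t in sorted(set(xs)):                        # candidate split points
        preds = [1 if x >= t else 0 for x in xs]     # the branch decision
        acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, accuracy = best_stump(xs, ys)
print(threshold, accuracy)  # 10.0 1.0
```

A full decision tree would now recurse on each side of the split, choosing a new feature and threshold for every branch.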
Wrapper Methods
Description: A feature set reduction method that uses the performance of a set of features on a model as a measure of the feature set's quality. Can help identify combinations of features in models that achieve high performance.
Tips From the Pros: Utilize k-fold cross-validation to control overfitting.
Reference: John, G. H., R. Kohavi, and K. Pfleger. "Irrelevant Features and the Subset Selection Problem." Proceedings of ICML-94, 11th International Conference on Machine Learning. New Brunswick, New Jersey. 1994. 121-129. Conference Presentation.
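A minimal sketch of a wrapper: every candidate feature subset is scored by the held-out performance of an actual model trained on it, here a 1-nearest-neighbor classifier under leave-one-out validation (a stand-in for the k-fold cross-validation the tip recommends). The toy data, with one informative feature and one noise feature, is illustrative.

```python
import itertools
import random

random.seed(0)

# Toy data: the label depends only on feature 0; feature 1 is pure noise.
points = []
for label in (0, 1):
    for _ in range(10):
        x0 = label * 5.0 + random.gauss(0, 0.3)   # informative feature
        x1 = random.gauss(0, 10.0)                # irrelevant noise feature
        points.append(((x0, x1), label))

def loo_1nn_accuracy(subset):
    """Leave-one-out 1-NN accuracy using only the features in `subset`."""
    correct = 0
    for i, (xi, yi) in enumerate(points):
        neighbors = [(sum((xi[f] - xj[f]) ** 2 for f in subset), yj)
                     for j, (xj, yj) in enumerate(points) if j != i]
        if min(neighbors)[1] == yi:
            correct += 1
    return correct / len(points)

# Score every subset with the model itself; prefer smaller subsets on ties.
subsets = [s for r in (1, 2) for s in itertools.combinations(range(2), r)]
best = max(subsets, key=lambda s: (loo_1nn_accuracy(s), -len(s)))
print(best)  # (0,): the wrapper drops the noise feature
```

Because the score comes from the model's own validated performance, wrappers capture feature interactions that simple per-feature filters miss, at the cost of training the model once per candidate subset.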