Poster Session Abstracts – Theoretical Foundation of Deep Learning 2018

On Generalization Bounds of a Family of Recurrent Neural Networks

Presenter: Minshuo Chen (Georgia Tech)

Bio: Minshuo Chen is a second-year PhD student in the H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology. His research focuses on the interdisciplinary area of machine learning, nonconvex optimization, and statistics. He is currently working with Prof. Tuo Zhao.

Abstract: Recurrent Neural Networks (RNNs) have been widely applied to sequential data analysis. Due to their complicated modeling structures, however, the theory behind them is still largely missing. To connect theory and practice, we study the generalization properties of vanilla RNNs as well as their variants, including Minimal Gated Unit (MGU) and Long Short Term Memory (LSTM) RNNs. Specifically, our theory is established under the PAC-Learning framework. The generalization bound is presented in terms of the spectral norms of the weight matrices and the total number of parameters. We also establish refined generalization bounds with additional norm assumptions, and draw a comparison among these bounds. We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU and LSTM in the existing literature; (3) We demonstrate the advantages of these variants in generalization.

Toward Deeper Understanding of Nonconvex Stochastic Optimization

Presenter: Zhehui Chen (Georgia Tech)

Bio: Zhehui Chen is a third-year Ph.D. student in the School of Industrial and Systems Engineering at Georgia Tech. Before he joined Georgia Tech, he received his bachelor's degree from the School of Gifted Young at USTC. His research interests include High Dimensional Statistics, Machine Learning, Deep Learning, Continuous Optimization, and Computer Experiments. He is currently working with Prof. Tuo Zhao and Prof. Jeff Wu.

Abstract: The Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks and variational Bayesian inference. Due to current technical limits, however, establishing convergence properties of MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding MSGD and gaining new insights into more general problems. Specifically, by applying diffusion approximations, our study shows that the momentum helps to escape from saddle points, but hurts the convergence within the neighborhood of optima (without step size annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks. Moreover, our analysis applies the martingale method and the “Fixed-State-Chain” method from the stochastic approximation literature, which are of independent interest.
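As a rough illustration of the setting described in this abstract, the sketch below runs an Oja-style streaming PCA update with a heavy-ball momentum term. It is a minimal sketch under stated assumptions, not necessarily the exact update analyzed in the poster: the function name `streaming_pca_momentum`, the renormalization after each step, and the default `step` and `momentum` values are all illustrative choices.

```python
import numpy as np


def streaming_pca_momentum(samples, step=0.01, momentum=0.5, seed=0):
    """Estimate the leading eigenvector of the data covariance from a stream.

    Assumed (illustrative) update: Oja's rule plus a heavy-ball term,
    followed by renormalization to the unit sphere.
    """
    rng = np.random.default_rng(seed)
    dim = samples.shape[1]
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)          # random start on the unit sphere
    v_prev = v.copy()
    for x in samples:               # one pass, one sample at a time
        grad = x * (x @ v)          # stochastic gradient of (v^T x)^2 / 2
        v_next = v + step * grad + momentum * (v - v_prev)
        v_next /= np.linalg.norm(v_next)
        v_prev, v = v, v_next
    return v
```

Calling the function on samples whose covariance has a clear leading direction should return a unit vector closely aligned (up to sign) with that direction; a constant step size leaves some residual noise around the optimum, which is the regime where the abstract says momentum hurts.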

Deep Learning for Distortion Prediction in Laser-Based Additive Manufacturing

Presenter: Jack Francis (Mississippi State)

Bio: Jack Francis is a current Ph.D. student in the Industrial and Systems Engineering Department at Mississippi State University. His research investigates the use of high-performance computing to solve quality and reliability issues in advanced manufacturing systems, such as Additive Manufacturing. He is an active member of IISE and INFORMS.

Abstract: Additive Manufacturing (AM) is a revolutionary fabrication process that is a key aspect of the Industry 4.0 environment. In addition, Industry 4.0 aims to have a fully connected environment with numerous sensors for capturing process data. One of the current challenges in AM is the geometric inaccuracy of parts during fabrication. This has inhibited the widespread use of AM in many suitable applications, such as maintenance and biomedical. Geometric inaccuracies (i.e., distortion) can be reduced through compensation plans; however, compensation plans require accurate predictions of expected distortion. Here we develop a novel Deep Learning approach that accurately predicts distortion well within AM tolerance limits (30-40 microns). We utilize a Convolutional Neural Network to analyze thermal images captured during the process and an Artificial Neural Network to incorporate various design and process parameters. The Deep Learning approach can be applied to any AM quality criterion that is measured pointwise (e.g., porosity). Our Deep Learning approach not only gives highly accurate predictions, but also fits into the Industry 4.0 framework of analyzing big data from a large number of sensors.

On Computation and Generalization of Generative Adversarial Networks under Spectrum Control

Presenter: Haoming Jiang (Georgia Tech)

Bio: Haoming Jiang is a second-year Ph.D. student in the School of Industrial and Systems Engineering (ISyE) at Georgia Tech. Before he joined Georgia Tech, he received his B.S. degree in Computer Science and Mathematics from the School of Gifted Young at the University of Science and Technology of China. His research interest is in developing machine learning algorithms with efficient implementation.

Abstract: Generative Adversarial Networks (GANs), though powerful, suffer from training instability. Several recent works (Brock et al., 2016; Miyato et al., 2018) suggest that controlling the spectra of weight matrices in the discriminator can significantly improve the training of GANs. Motivated by their discovery, we propose a new framework for training GANs, which allows more flexible spectrum control (e.g., making the weight matrices of the discriminator have slow singular value decays). Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions. Theoretically, we further show that the spectrum control improves the generalization ability of GANs. Our experiments on the CIFAR-10, STL-10, and ImageNet datasets confirm that, compared to other competitors, our proposed method is capable of generating images of better or equal quality by utilizing spectral normalization and encouraging slow singular value decay.
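One way to picture spectrum control without repeated singular value decompositions is to store a weight matrix as a product U diag(s) V^T and penalize departures of U and V from orthonormality; when both have orthonormal columns, the entries of s are exactly the singular values (up to sign and order). The sketch below illustrates that idea only. The abstract does not spell out the authors' parameterization, so the function names and the specific penalty are assumptions.

```python
import numpy as np


def compose_weight(U, s, V):
    """Build W = U diag(s) V^T from its factors (assumed parameterization)."""
    return (U * s) @ V.T


def orthogonality_penalty(M):
    """Squared Frobenius distance of M^T M from the identity."""
    k = M.shape[1]
    return float(np.sum((M.T @ M - np.eye(k)) ** 2))


def normalize_spectrum(s):
    """Rescale s so the largest magnitude is 1 (spectral normalization)."""
    return s / np.max(np.abs(s))
```

With this factorization, a regularizer or constraint can act on `s` directly, for example to encourage slow decay, while `orthogonality_penalty(U) + orthogonality_penalty(V)` is added to the training loss to keep `s` meaningful as a spectrum.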

Combinatorial Attacks on Binarized Neural Networks

Authors: Elias B. Khalil (CSE, Georgia Tech), Amrita Gupta (CSE, Georgia Tech), Bistra Dilkina (CS, USC / CSE, Georgia Tech)

A Hierarchical Data Filtration Framework for Applied Industrial Analytics

Presenter: Phillip LaCasse (Wisconsin)

Bio: Phillip M. LaCasse is a Ph.D. candidate in the Industrial and Manufacturing Engineering Department at the University of Wisconsin – Milwaukee. He earned his Bachelor of Science degree in Mathematics from the United States Military Academy and a Master of Science degree in Industrial Engineering from the University of Wisconsin – Madison. His research interests include predictive analytics, applied machine learning, smart manufacturing, and sports analytics.

Abstract: Big data is both an enabler and a challenge for the smart, connected manufacturing enterprise. Big data requires innovation and effort to harness because, by definition, its size, structure, or variety strain the capability of traditional software or database tools to capture, store, manage, and analyze it. However, when employed effectively, big data offers unprecedented potential to link people, equipment, and processes for real-time, informed, adaptive, and proactive decision-making. This research presents a hierarchical data filtration framework that sequentially tests and selects independent variables for use in training deep learning models for industrial analytics. The framework performs well when applied to three publicly available machine learning datasets: one using time series input data, one using binary input data, and one using continuous input data. Continuing research explores the framework’s performance in an applied case study in electronic assembly manufacture.

Efficient Manifold Approximation with Spherelets

Presenter: Didong Li (Duke)

Bio: Didong Li is a fourth-year graduate student in the Duke math department, working with David Dunson and Sayan Mukherjee. He is interested in applied differential geometry, Bayesian nonparametrics, information geometry, and machine learning.

Abstract: Data lying in a high-dimensional ambient space are commonly thought to have a much lower intrinsic dimension. In particular, the data may be concentrated near a lower-dimensional subspace or manifold. There is an immense literature focusing on approximating the unknown subspace, and on exploiting such approximations in clustering, data compression, and building of predictive models. Most of the literature relies on approximating subspaces using a locally linear, and potentially multiscale, dictionary. In this article, we propose a simple and general alternative, which instead uses pieces of spheres, or spherelets, to locally approximate the unknown subspace. Building on this idea, we develop a simple and computationally efficient algorithm for subspace learning. Results relative to state-of-the-art competitors show dramatic gains in ability to accurately approximate the subspace with orders of magnitude fewer components. This leads to substantial gains in data compressibility, fewer clusters and hence better interpretability, and much lower MSE based on small to moderate sample sizes. Theory on approximation accuracy is presented, and the methods are applied to multiple examples.

Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval

Presenter: Cong Ma (Princeton)

Bio: Cong Ma is a PhD candidate in the Department of Operations Research and Financial Engineering at Princeton University under the supervision of Professor Yuxin Chen and Professor Jianqing Fan. He is broadly interested in mathematics of data science, machine learning, high dimensional statistics, convex and nonconvex optimization as well as their applications to neuroscience.

Abstract: This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest $\bm{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations\,/\,samples $y_{i}=(\bm{a}_{i}^{\top}\bm{x}^{\natural})^{2}$, $1\leq i\leq m$. This problem, also known as phase retrieval, spans multiple domains including physical sciences and machine learning.

We investigate the efficiency of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent, when randomly initialized, yields an $\epsilon$-accurate solution in $O\big(\log n+\log(1/\epsilon)\big)$ iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need for (i) carefully-designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.
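The procedure discussed above can be sketched as plain gradient descent on the least squares loss $f(\bm{x})=\frac{1}{4m}\sum_{i=1}^{m}\big((\bm{a}_{i}^{\top}\bm{x})^{2}-y_{i}\big)^{2}$, started from a random Gaussian point. This is a minimal sketch under stated assumptions: the function name, the step size, and the iteration count are illustrative, and the constant step assumes the signal has roughly unit norm.

```python
import numpy as np


def phase_retrieval_gd(A, y, step=0.05, iters=2000, seed=0):
    """Vanilla gradient descent for y_i = (a_i^T x)^2 with random init.

    A has one sensing vector a_i per row. The step size is an
    illustrative constant chosen for a signal of roughly unit norm.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) / np.sqrt(n)       # random initialization
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ ((Ax ** 2 - y) * Ax) / m     # gradient of f at x
        x = x - step * grad
    return x
```

Because the measurements are invariant to a global sign flip, the output should be compared with both $\bm{x}^{\natural}$ and $-\bm{x}^{\natural}$ when checking recovery.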

Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images

Presenter: Wei Pan (Minnesota)

Bio: Wei Pan is a professor of Biostatistics at the University of Minnesota. His research interests are in statistical genetics and genomics, bioinformatics, high-dimensional data, Big Data, and machine learning.

Abstract: Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescent-tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. Because high-content microscopy generates a large number of images and manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years applications of deep learning methods to large datasets in natural images and other domains have become quite successful. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. For such purposes, we applied several representative types of deep Convolutional Neural Networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image dataset. We show the consistently better predictive performance of CNNs over the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. Finally, we share our computer code and pre-trained models to facilitate applications of CNNs in genetics and computational biology.

TBD

Presenter: Josiah Park (Georgia Tech)

Bio:

Abstract:

Learning and Identifying Imbalanced Control Chart Patterns using Convolutional Neural Networks

Presenter: Talayeh Razzaghi (New Mexico State)

Bio: Dr. Talayeh Razzaghi received her PhD in Industrial Engineering in 2014 from the University of Central Florida (UCF) and served as a postdoctoral research associate for two years at the School of Computing at Clemson University. She is an Assistant Professor in the IE department at NMSU. Dr. Razzaghi’s research expertise is in the area of machine learning and data mining for massive and imperfect data from both theoretical and practical standpoints. She has published in peer-reviewed journals, such as Optimization Letters, Computers & Industrial Engineering, Annals of Operations Research, and PLoS ONE. Dr. Razzaghi has served as program committee member and reviewer for the 8th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB 2017), and as reviewer for several peer-reviewed journals. Dr. Razzaghi is actively involved in leveraging undergraduate research opportunities to engage underrepresented minority students in engineering and computer science. She has been conducting research with undergraduate students on machine learning on massive datasets under the NSF REU program.

Abstract: Early detection of abnormal patterns is crucial in many applications. For instance, any slight variation and shift patterns in manufacturing can be indicative of production process anomalies such as equipment malfunction. Monitoring the manufacturing processes becomes very expensive and time-consuming, when the size of data increases enormously over time, and necessitates using a fast automated control chart pattern recognition (CCPR) algorithm. Furthermore, for this application, the distribution of observations is significantly skewed toward the normal class leading to highly imbalanced datasets. Most of the existing CCPR research works tend to yield very biased results in such cases and suffer from very poor performance. This research is motivated by the need to develop fast algorithms for time-series modeling that learn from imbalanced sets.

Inspired by the high performance of deep learning algorithms on time-series classification, we propose the use of cost-sensitive layers in Deep Convolutional Neural Networks (DCNNs) for automated process monitoring and early fault diagnosis. Our algorithm also benefits from a feature learning technique that mitigates the effects of noise and outliers. We compare and show the benefits of cost-sensitive DCNNs over state-of-the-art algorithms under various fault scenarios. The results are evaluated using simulated time series data in the presence of various imbalance ratios.

Rigorous guarantees on sequence memory capacity in recurrent neural networks using randomized dimensionality reduction

Presenter: Christopher J Rozell (Georgia Tech)

Bio: Christopher J. Rozell received a B.S.E. degree in Computer Engineering and a B.F.A. degree in Music (Performing Arts Technology) in 2000 from the University of Michigan. He attended graduate school at Rice University, receiving the M.S. and Ph.D. degrees in Electrical Engineering in 2002 and 2007, respectively. Following graduate school he joined the Redwood Center for Theoretical Neuroscience at the University of California, Berkeley as a postdoctoral scholar. Dr. Rozell is currently a Professor in Electrical and Computer Engineering at the Georgia Institute of Technology, where he previously held the Demetrius T. Paris Junior Professorship. His research interests live at the intersection of machine learning, signal processing, complex systems, computational neuroscience and biotechnology. Dr. Rozell is currently the co-Director of the Neural Engineering Center at Georgia Tech, where his lab is also affiliated with the Center for Signal and Information Processing and the Institute for Robotics and Intelligent Machines. In 2014, Dr. Rozell was one of six international recipients of the Scholar Award in Studying Complex Systems from the James S. McDonnell Foundation 21st Century Science Initiative, as well as receiving a National Science Foundation CAREER Award and a Sigma Xi Young Faculty Research Award. In addition to his research activity, Dr. Rozell was awarded the CETL/BP Junior Faculty Teaching Excellence Award at Georgia Tech in 2013 and the Outstanding Junior ECE Faculty Member Award in 2018.

Abstract: In addition to the performance gains enabled by deep neural networks, recurrent neural networks (RNNs) are being increasingly utilized, both as stand-alone structures and as layers of deep networks. RNNs are especially interesting as cortical networks are recurrent, indicating that recurrent connections are important in human-level processing. Despite their growing use, theory on the computational properties of RNNs is rare. As many applications hinge on RNNs accumulating information dynamically, the ability of RNNs to iteratively compress information into the network is particularly critical. This work presents rigorous and non-asymptotic bounds on the network’s short-term memory (STM), characterizing the number of inputs that can be compressed into and recovered from a network state. Previous bounds on a random RNN’s STM limit the number of recoverable inputs by the number of network nodes. We show here a dramatic improvement over the current state of the art bound, demonstrating that when the inputs are structured (e.g., sparse in a basis or matrix inputs are low-rank), the number of network nodes needed to recover the input grows linearly in the information rate and sub-linearly in the ambient dimension. These results show for the first time that RNNs can efficiently store much longer inputs than the size of the network, shedding light on their computational capabilities.

A Novel Framework for Online Supervised Learning with Feature Selection

Presenter: Lizhe Sun (Florida State)

Bio: Lizhe Sun is a fifth-year PhD student in the Department of Statistics, Florida State University. His research is about feature selection in online learning. He is supervised by Dr. Barbu in the Department of Statistics.

Abstract: Online methods have a wide variety of applications in large scale machine learning problems. In the online scenario, one can train new models rapidly by using the current observations rather than all the data. Thus, online algorithms reduce the memory complexity and computational complexity when training models. However, the standard online methods still suffer from issues such as lower convergence rates and limited capability to recover the support of true features. In this paper, we present a novel framework for online learning based on running averages and introduce a series of online versions of some popular existing offline methods such as Elastic Net, Minimax Concave Penalty and Feature Selection with Annealing. We prove the equivalence between our online methods and their offline counterparts and give theoretical true feature recovery and convergence guarantees for some of them. In contrast to the existing online methods, the proposed methods can extract models with any desired sparsity level at any time. Numerical experiments indicate that our new methods enjoy high accuracy of true feature recovery and a fast convergence rate, compared with standard online algorithms and offline algorithms. We also show how the running averages framework can be used for model adaptation in the presence of a changing environment. Finally, we present some applications to large datasets where again the proposed framework shows competitive results compared to popular online and offline algorithms.
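To make the running-averages idea concrete, the sketch below keeps only the running means of x x^T and x y as observations arrive, then fits a ridge model from those summaries and reads off a top-k support. It is a minimal sketch, assuming a linear regression setting; the class name `RunningAverages`, the ridge solve, and the top-k thresholding are illustrative stand-ins rather than the methods named in the abstract.

```python
import numpy as np


class RunningAverages:
    """Streaming summaries sufficient for least squares style fits."""

    def __init__(self, dim):
        self.n = 0
        self.sxx = np.zeros((dim, dim))   # running mean of x x^T
        self.sxy = np.zeros(dim)          # running mean of x * y

    def update(self, x, y):
        """Fold in one observation; memory use does not grow with n."""
        self.n += 1
        w = 1.0 / self.n
        self.sxx += w * (np.outer(x, x) - self.sxx)
        self.sxy += w * (x * y - self.sxy)

    def ridge(self, lam=1e-6):
        """Ridge coefficients computed from the summaries alone."""
        dim = self.sxy.shape[0]
        return np.linalg.solve(self.sxx + lam * np.eye(dim), self.sxy)

    def top_k_support(self, k, lam=1e-6):
        """Indices of the k largest coefficients in magnitude (sorted)."""
        beta = self.ridge(lam)
        return np.sort(np.argsort(-np.abs(beta))[:k])
```

Since the running means equal the batch means after every update, a fit computed from them matches the same fit computed offline on all data seen so far, and a model of any sparsity level k can be extracted at any time.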

Entrywise eigenvector analysis of random matrices with applications

Presenter: Kaizheng Wang (Princeton)

Bio: Kaizheng Wang is a fourth-year PhD candidate in the Department of Operations Research and Financial Engineering at Princeton University, under the supervision of Professor Jianqing Fan. His research interests lie at the intersection of statistics, machine learning and optimization.

Abstract: Spectral algorithms have wide applications in statistical problems. Sharp analysis usually requires characterization of entrywise behavior of eigenvectors, and classical perturbation bounds often yield suboptimal results. For a large class of random matrices, we approximate the eigenvectors by linear functions of the matrix that greatly facilitate analysis, and obtain uniform control of entrywise approximation errors. As applications, optimalities of vanilla spectral algorithms are shown for several challenging problems such as Z_2-synchronization, community detection, matrix completion, and top-K ranking. Our theoretical results are illustrated by numerical experiments.
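For intuition, the sketch below sets up a small Z_2-synchronization instance, A = x x^T + sigma W with x in {-1, +1}^n and W a symmetric Gaussian noise matrix, and estimates x by the signs of the leading eigenvector. It also makes it easy to compare that eigenvector entrywise with the linear approximation A u*/lambda*, where u* = x / sqrt(n) and lambda* = n. This is a hedged illustration: the noise model, function names, and parameter values are assumptions, not details taken from the poster.

```python
import numpy as np


def make_z2_instance(n, sigma, seed=0):
    """Return (x, A) with A = x x^T + sigma * W, W symmetric Gaussian."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2.0)          # symmetric noise matrix
    return x, np.outer(x, x) + sigma * W


def spectral_estimate(A):
    """Vanilla spectral method: signs of the leading eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    u = eigvecs[:, -1]
    labels = np.where(u >= 0, 1.0, -1.0)
    return labels, u, eigvals[-1]
```

The labels are only identifiable up to a global sign, so any comparison with x should first align the sign. At moderate noise levels, the entrywise gap between the eigenvector and A u*/lambda* is typically much smaller than its gap to u* itself, which is the kind of approximation the abstract describes.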

Crime Series Detection from Large-Scale Police Report Data

Presenter: Shixiang (Woody) Zhu (Georgia Tech)

Bio: Shixiang (Woody) Zhu is a second-year Ph.D. student in Machine Learning (Fall 2017 to present) at the H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, working with Professor Yao Xie. He is interested in machine learning, statistical learning, and optimization problems.

Abstract: One of the most important problems in crime analysis is that of crime series detection. Technically speaking, a crime series is a subset of crime events committed by the same individual or group. Generally, criminals follow a modus operandi (M.O.) that characterizes their crime series. The main scope of the project is to develop an efficient algorithm that can detect correlations between crime incidents using large-scale streaming police report data, both structured (e.g., time, location) and unstructured (so-called free text).