Use Of Vapnik-Chervonenkis Dimension in Model Selection
In this dissertation, I derive a new method to estimate the Vapnik-Chervonenkis Dimension (VCD) for the class of linear functions. This method is inspired by the technique developed by Vapnik et al. (1994). My contribution rests on the approximation of the expected maximum difference between two empirical losses (EMDBTEL). Specifically, I use a cross-validated form of the error to compute the EMDBTEL, and I tighten the bound on the EMDBTEL by minimizing a constant in its upper bound. I also derive two bounds for the true unknown risk using the additive (ERM1) and the multiplicative (ERM2) Chernoff bounds. These bounds depend on the estimated VCD and the empirical risk. They can be used to perform model selection and to declare that, with high probability, the chosen model will perform better, without making strong assumptions about the data generating process (DG). I measure the accuracy of my technique on simulated datasets and on three real datasets. Under reasonable conditions, the model selection provided by VCD was always as good as, if not better than, the other methods.
Loss Data Analytics
Loss Data Analytics is an interactive, online, freely available text. The idea behind the name Loss Data Analytics is to integrate classical loss data models from applied probability with modern analytic tools. In particular, we seek to recognize that big data (including social media and usage based insurance) are here and high speed computation is readily available. The online version contains many interactive objects (quizzes, computer demonstrations, interactive graphs, video, and the like) to promote deeper learning. A subset of the book is available for offline reading in pdf and EPUB formats. The online text will be available in multiple languages to promote access to a worldwide audience.
Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks
Recently, researchers have started applying convolutional neural networks (CNNs) with one-dimensional convolutions to clinical tasks involving time-series data. This is due, in part, to their computational efficiency relative to recurrent neural networks and their ability to efficiently exploit certain temporal invariances (e.g., phase invariance). However, it is well-established that clinical data may exhibit many other types of invariances (e.g., scaling). While preprocessing techniques (e.g., dynamic time warping) may successfully transform and align inputs, their use often requires one to identify the types of invariances in advance. In contrast, we propose the use of Sequence Transformer Networks, an end-to-end trainable architecture that learns to identify and account for invariances in clinical time-series data. Applied to the task of predicting in-hospital mortality, our proposed approach achieves an improvement in the area under the receiver operating characteristic curve (AUROC) relative to a baseline CNN (AUROC=0.851 vs. AUROC=0.838). Our results suggest that a variety of valuable invariances can be learned directly from the data.
Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning
Extracting relations is critical for knowledge base completion and construction, in which distant supervised methods are widely used to extract relational facts automatically with existing knowledge bases. However, the automatically constructed datasets contain large numbers of low-quality sentences with noisy words, a problem that current distant supervised methods neglect, resulting in unacceptably low precision. To mitigate this problem, we propose a novel word-level distant supervised approach for relation extraction. We first build a Sub-Tree Parse (STP) to remove noisy words that are irrelevant to relations. Then we construct a neural network that takes the sub-tree as input while applying entity-wise attention to identify the important semantic features of relational words in each instance. To make our model more robust against noisy words, we initialize our network with a priori knowledge learned from the relevant task of entity classification by transfer learning. We conduct extensive experiments using the corpora of New York Times (NYT) and Freebase. Experiments show that our approach is effective and improves the area under the Precision/Recall (PR) curve from 0.35 to 0.39 over the state-of-the-art work.
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points for a given document. While sequence-to-sequence (seq2seq) models have achieved remarkable performance on this task (Meng et al., 2017), model training often relies on large amounts of labeled data, which is only applicable to resource-rich domains. In this paper, we propose semi-supervised keyphrase generation methods by leveraging both labeled data and large-scale unlabeled samples for learning. Two strategies are proposed. First, unlabeled documents are tagged with synthetic keyphrases obtained from unsupervised keyphrase extraction methods or a self-learning algorithm, and then combined with labeled samples for training. Second, we investigate a multi-task learning framework to jointly learn to generate keyphrases as well as the titles of the articles. Experimental results show that our semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only.
LRMM: Learning to Recommend with Missing Modalities
Multimodal learning has shown promising performance in content-based recommendation due to the auxiliary user and item information of multiple modalities such as text and images. However, the problem of incomplete and missing modalities is rarely explored, and most existing methods fail to learn a recommendation model when modalities are missing or corrupted. In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also, more generally, the cold-start problem of recommender systems. We propose modality dropout (m-drop) and a multimodal sequential autoencoder (m-auto) to learn multimodal representations for complementing and imputing missing modalities. Extensive experiments on real-world Amazon data show that LRMM achieves state-of-the-art performance on rating prediction tasks. More importantly, LRMM is more robust than previous methods in alleviating data-sparsity and the cold-start problem.
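As a rough illustration of the modality dropout idea mentioned above, the following minimal sketch zeroes out whole modalities at random during training so that a downstream model sees inputs with missing modalities. The function name `modality_dropout`, the dictionary layout, and the choice of zero-filling are assumptions made for illustration; the abstract does not specify how m-drop is implemented.

```python
import random


def modality_dropout(features, drop_prob, rng):
    """Return a copy of `features` in which each modality is independently
    replaced by a zero vector with probability `drop_prob`.

    `features` maps a (hypothetical) modality name to a list of floats.
    Zero-filling is an assumption; other placeholders are possible.
    """
    dropped = {}
    for name, vector in features.items():
        if rng.random() < drop_prob:
            dropped[name] = [0.0] * len(vector)  # simulate a missing modality
        else:
            dropped[name] = list(vector)         # keep the modality unchanged
    return dropped


if __name__ == "__main__":
    item = {"text": [0.2, 0.7, 0.1], "image": [0.9, 0.4]}
    rng = random.Random(0)
    for _ in range(3):
        print(modality_dropout(item, drop_prob=0.5, rng=rng))
```

Dropping entire modalities, rather than individual units, is what lets the training distribution resemble the missing-modality condition the abstract describes.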
zoNNscan : a boundary-entropy index for zone inspection of neural models
The training of deep neural network classifiers results in decision boundaries whose geometry is still not well understood. This is directly related to classification problems such as so-called adversarial examples. We introduce zoNNscan, an index intended to quantify the boundary uncertainty (in terms of the presence of other classes) around a given input datapoint. It is based on confidence entropy, and is implemented through sampling in the multidimensional ball surrounding that input. We detail the zoNNscan index, give an algorithm for approximating it, and finally illustrate its benefits on four applications, including two important problems for the adoption of deep networks in critical systems: adversarial examples and corner case inputs. We highlight that, in those two problem classes, zoNNscan exhibits significantly higher values than for standard inputs.
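The sampling idea described above can be sketched as a Monte Carlo average of the classifier's confidence entropy over points drawn around an input. This is a minimal sketch, not the paper's algorithm: the L2 ball, the normalization of entropy by the number of classes, and the names `boundary_entropy_index`, `sample_in_ball`, and `toy_model` are all assumptions made for illustration.

```python
import math
import random


def normalized_entropy(probs):
    """Shannon entropy with log base len(probs), so the value lies in [0, 1].
    Requires at least two classes."""
    base = len(probs)
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the L2 ball of `radius` around `center`."""
    dim = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    # U^(1/dim) makes the radial distance uniform in volume.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [c + scale * v for c, v in zip(center, direction)]


def boundary_entropy_index(predict_proba, x, radius, n_samples=200, seed=0):
    """Average normalized confidence entropy over samples around `x`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += normalized_entropy(predict_proba(sample_in_ball(x, radius, rng)))
    return total / n_samples


def toy_model(x):
    """Hypothetical two-class model whose boundary is the hyperplane x[0] = 0."""
    p = 1.0 / (1.0 + math.exp(-10.0 * x[0]))
    return [p, 1.0 - p]


if __name__ == "__main__":
    print("near boundary:", boundary_entropy_index(toy_model, [0.0, 0.0], 0.5))
    print("far from boundary:", boundary_entropy_index(toy_model, [5.0, 5.0], 0.5))
```

Under these assumptions, an input close to a decision boundary yields an index closer to 1, while an input deep inside one class region yields an index close to 0.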
Composite Hashing for Data Stream Sketches
In rapid and massive data streams, it is often not possible to estimate the frequency of items with complete accuracy. To perform the operation in a reasonable amount of space and with sufficiently low latency, approximate methods are used. The most common ones are variations of the Count-Min sketch. By using multiple hash functions, they summarize massive streams in sub-linear space. In reality, data item ids or keys can be modular, e.g., a graph edge is represented by source and target node ids, a 32-bit IP address is composed of four 8-bit words, a web address consists of domain name, domain extension, path, and filename, among many others. In this paper, we investigate the modularity property of item keys, and systematically develop more accurate, composite hashing strategies, such as employing multiple independent hash functions that hash different modules in a key and their combinations separately, instead of hashing the entire key directly into the sketch. However, our problem of finding the best hashing strategy is non-trivial, since there is an exponential number of ways to combine the modules of a key before they can be hashed into the sketch. Moreover, given a fixed size allocated for the entire sketch, it is hard to find the optimal range of all hash functions that correspond to different modules and their combinations. We solve both these problems with extensive theoretical analysis, and perform thorough experiments with real-world datasets to demonstrate the accuracy and efficiency of our proposed method, MOD-Sketch.
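To make the composite hashing idea concrete, here is a minimal sketch of a Count-Min style structure for two-module keys (for example, graph edges), in which each row hashes the source and target modules separately and combines the two results into one cell index. It is an illustration under stated assumptions, not the MOD-Sketch algorithm: the class name `CompositeCountMin`, the fixed split into `src_range` and `dst_range`, and the use of BLAKE2b as the hash are all hypothetical choices.

```python
import hashlib


def _hash(value, seed, modulus):
    """Deterministic hash of `value` into range(modulus), salted by `seed`."""
    data = repr((seed, value)).encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


class CompositeCountMin:
    """Count-Min style sketch whose rows hash each key module separately."""

    def __init__(self, depth, src_range, dst_range):
        self.depth = depth
        self.src_range = src_range
        self.dst_range = dst_range
        # Each row has src_range * dst_range counters.
        self.rows = [[0] * (src_range * dst_range) for _ in range(depth)]

    def _cell(self, row, key):
        src, dst = key
        i = _hash(src, ("src", row), self.src_range)
        j = _hash(dst, ("dst", row), self.dst_range)
        return i * self.dst_range + j  # combine the two module hashes

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._cell(row, key)] += count

    def estimate(self, key):
        # As in Count-Min, the minimum over rows never underestimates.
        return min(self.rows[row][self._cell(row, key)] for row in range(self.depth))


if __name__ == "__main__":
    sketch = CompositeCountMin(depth=3, src_range=8, dst_range=8)
    for _ in range(3):
        sketch.add(("10.0.0.1", "10.0.0.2"))
    print(sketch.estimate(("10.0.0.1", "10.0.0.2")))
```

How to split a fixed row width between the module ranges is the kind of allocation question the abstract describes as hard; the equal split used in the demo is arbitrary.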
Text-to-image Synthesis via Symmetrical Distillation Networks
Text-to-image synthesis aims to automatically generate images according to text descriptions given by users, which is a highly challenging task. The main issues of text-to-image synthesis lie in two gaps: the heterogeneous and homogeneous gaps. The heterogeneous gap is between the high-level concepts of text descriptions and the pixel-level contents of images, while the homogeneous gap exists between synthetic image distributions and real image distributions. To address these problems, we exploit the excellent capability of generic discriminative models (e.g., VGG19), which can guide the training process of a new generative model on multiple levels to bridge the two gaps. The high-level representations can teach the generative model to extract necessary visual information from text descriptions, which can bridge the heterogeneous gap. The mid-level and low-level representations can lead it to learn structures and details of images respectively, which relieves the homogeneous gap. Therefore, we propose Symmetrical Distillation Networks (SDN) composed of a source discriminative model as ‘teacher’ and a target generative model as ‘student’. The target generative model has a symmetrical structure with the source discriminative model, in order to transfer hierarchical knowledge accessibly. Moreover, we decompose the training process into two stages with different distillation paradigms to improve the performance of the target generative model. Experiments on two widely-used datasets are conducted to verify the effectiveness of our proposed SDN.
Are You Tampering With My Data?
We propose a novel approach towards adversarial attacks on neural networks (NN), focusing on tampering with the data used for training instead of generating attacks on trained models. Our network-agnostic method creates a backdoor during training which can be exploited at test time to force a neural network to exhibit abnormal behaviour. We demonstrate on two widely used datasets (CIFAR-10 and SVHN) that a universal modification of just one pixel per image for all the images of a class in the training set is enough to corrupt the training procedure of several state-of-the-art deep neural networks, causing the networks to misclassify any images to which the modification is applied. Our aim is to bring to the attention of the machine learning community the possibility that even learning-based methods that are personally trained on public datasets can be subject to attacks by a skillful adversary.
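The "one pixel per image for all images of a class" modification can be pictured with a small sketch that writes a fixed value into a fixed position of every training image carrying a given label. The function name `tamper_class`, the nested-list image layout, and the chosen position and colour are hypothetical; the abstract does not state which pixel or value is used.

```python
import copy


def tamper_class(images, labels, target_class, row, col, value):
    """Return a copy of `images` in which every image labelled `target_class`
    has the pixel at (row, col) overwritten with `value`.

    Images are nested lists of shape height x width x channels.
    The originals are left untouched.
    """
    tampered = copy.deepcopy(images)
    for image, label in zip(tampered, labels):
        if label == target_class:
            image[row][col] = list(value)
    return tampered


if __name__ == "__main__":
    black = [[[0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0]]]  # 2x2 RGB image
    images = [copy.deepcopy(black), copy.deepcopy(black)]
    labels = [3, 7]
    result = tamper_class(images, labels, target_class=3, row=0, col=1, value=(255, 0, 0))
    print(result[0][0][1], result[1][0][1])
```

Because the same pixel is changed in every image of the class, a network can learn to associate that pixel with the label, which is the backdoor effect the abstract reports.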
Machine Learning for Spatiotemporal Sequence Forecasting: A Survey
Spatiotemporal systems are common in the real world. Forecasting the multi-step future of these spatiotemporal systems based on past observations, or Spatiotemporal Sequence Forecasting (STSF), is a significant and challenging problem. Although many real-world problems can be viewed as STSF and many research works have proposed machine learning based methods for them, no existing work has summarized and compared these methods from a unified perspective. This survey aims to provide a systematic review of machine learning for STSF. In this survey, we define the STSF problem and classify it into three subcategories: Trajectory Forecasting of Moving Point Cloud (TF-MPC), STSF on Regular Grid (STSF-RG) and STSF on Irregular Grid (STSF-IG). We then introduce the two major challenges of STSF: 1) how to learn a model for multi-step forecasting and 2) how to adequately model the spatial and temporal structures. After that, we review the existing works for solving these challenges, including the general learning strategies for multi-step forecasting, the classical machine learning based methods for STSF, and the deep learning based methods for STSF. We also compare these methods and point out some potential research directions.
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
Multi-Source Pointer Network for Product Title Summarization
In this paper, we study the product title summarization problem in E-commerce applications for display on mobile devices. Compared with conventional sentence summarization, product title summarization has some extra and essential constraints. For example, factual errors or loss of the key information are intolerable for E-commerce applications. Therefore, we abstract two more constraints for product title summarization: (i) do not introduce irrelevant information; (ii) retain the key information (e.g., brand name and commodity name). To address these issues, we propose a novel multi-source pointer network by adding a new knowledge encoder to the pointer network. The first constraint is handled by the pointer mechanism. For the second constraint, we restore the key information by copying words from the knowledge encoder with the help of the soft gating mechanism. For evaluation, we build a large collection of real-world product titles along with human-written short titles. Experimental results demonstrate that our model significantly outperforms the other baselines. Finally, online deployment of our proposed model has yielded a significant business impact, as measured by the click-through rate.
Backpropagation and Biological Plausibility
By and large, Backpropagation (BP) is regarded as one of the most important neural computation algorithms at the basis of the progress in machine learning, including the recent advances in deep learning. However, its computational structure has been the source of many debates on its arguable biological plausibility. In this paper, it is shown that when framing supervised learning in the Lagrangian framework, while one can see a natural emergence of Backpropagation, biologically plausible local algorithms can also be devised that are based on the search for saddle points in the learning adjoint space composed of weights, neural outputs, and Lagrangian multipliers. This might open the doors to a truly novel class of learning algorithms where, because of the introduction of the notion of support neurons, the optimization scheme also plays a fundamental role in the construction of the architecture.
Interval-valued Data Prediction via Regularized Artificial Neural Network
A regularized artificial neural network (RANN) is proposed for interval-valued data prediction. The ANN model is selected due to its powerful capability in fitting linear and nonlinear functions. To meet the mathematical coherence requirement for an interval (i.e., the predicted lower bounds should not cross over their upper bounds), a soft non-crossing regularizer is introduced to the interval-valued ANN model. We conduct extensive experiments based on both simulation datasets and real-life datasets, and compare the proposed RANN method with multiple traditional models, including the linear constrained center and range method (CCRM), the least absolute shrinkage and selection operator-based interval-valued regression method (Lasso-IR), the nonlinear interval kernel regression (IKR), the interval multi-layer perceptron (iMLP) and the multi-output support vector regression (MSVR). Experimental results show that the proposed RANN model is an effective tool for interval-valued prediction tasks with high prediction accuracy.
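One plausible way to read the soft non-crossing regularizer is as a penalty that is zero when the predicted lower bound is at most the predicted upper bound and grows when they cross. The sketch below assumes a squared hinge penalty added to a squared-error fit term; the function name `interval_loss`, the penalty form, and the weight `lam` are assumptions, since the abstract does not give the exact formula.

```python
def interval_loss(pred_lower, pred_upper, true_lower, true_upper, lam=1.0):
    """Mean squared error on both interval bounds plus a soft penalty
    on crossings (predicted lower bound above predicted upper bound)."""
    n = len(pred_lower)
    # Fit term: squared error on lower and upper bounds.
    fit = sum(
        (pl - tl) ** 2 + (pu - tu) ** 2
        for pl, pu, tl, tu in zip(pred_lower, pred_upper, true_lower, true_upper)
    ) / n
    # Crossing term: positive only where pred_lower exceeds pred_upper.
    crossing = sum(
        max(0.0, pl - pu) ** 2 for pl, pu in zip(pred_lower, pred_upper)
    ) / n
    return fit + lam * crossing


if __name__ == "__main__":
    # A coherent prediction incurs no crossing penalty.
    print(interval_loss([1.0], [2.0], [1.0], [2.0]))
    # A crossed prediction pays both the fit error and the penalty.
    print(interval_loss([2.0], [1.0], [1.0], [2.0], lam=0.5))
```

Being a soft penalty, it discourages crossings without forbidding them, in contrast to a hard constraint enforced on the network outputs.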
An ensemble learning method for variable selection: application to high dimensional data and missing values
Standard approaches for variable selection in linear models are not tailored to deal properly with high dimensional and incomplete data. Currently, methods dedicated to high dimensional data handle missing values by ad-hoc strategies, like complete case analysis or single imputation, while methods dedicated to missing values, mainly based on multiple imputation, do not discuss the imputation method to use with high dimensional data. Consequently, both approaches appear to be limited for many modern applications. With inspiration from ensemble methods, a new variable selection method is proposed. It extends classical variable selection methods such as stepwise, lasso or knockoff in the case of high dimensional data with or without missing data. Theoretical properties are studied and the practical interest is demonstrated through a simulation study. In the low dimensional case without missing values, the performances of the method can be better than those obtained by standard techniques. Moreover, the procedure improves the control of the error risks. With missing values, the method performs better than reference selection methods based on multiple imputation. Similar performances are obtained in the high-dimensional case with or without missing values.
Hypernetwork Knowledge Graph Embeddings
Knowledge graphs are large graph-structured databases of facts, which typically suffer from incompleteness. Link prediction is the task of inferring missing relations (links) between entities (nodes) in a knowledge graph. We propose to solve this task by using a hypernetwork architecture to generate convolutional layer filters specific to each relation and apply those filters to the subject entity embeddings. This architecture enables a trade-off between non-linear expressiveness and the number of parameters to learn. Our model simplifies the entity and relation embedding interactions introduced by the predecessor convolutional model, while outperforming all previous approaches to link prediction across all standard link prediction datasets.
Who is Really Affected by Fraudulent Reviews? An analysis of shilling attacks on recommender systems in real-world scenarios
We present the results of an initial analysis conducted in a real-life setting to quantify the effect of shilling attacks on recommender systems. We focus on both algorithm performance and the types of users who are most affected by these attacks.
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs.
• A Heterogeneity Based Case-Control Analysis of Motorcyclist Injury Crashes: Evidence from Motorcycle Crash Causation Study
• The Variable Quality of Metadata About Biological Samples Used in Biomedical Experiments
• Optimized Hierarchical Power Oscillations Control for Distributed Generation Under Unbalanced Conditions
• Compiler Enhanced Scheduling for OpenMP for Heterogeneous Multiprocessors
• CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement
• The Distribution of Reversible Functions is Normal
• Network-based Biased Tree Ensembles (NetBiTE) for Drug Sensitivity Prediction and Drug Sensitivity Biomarker Identification in Cancer
• Localization parameters for two interacting particles in disordered two-dimensional lattices
• On the mathematics of the free-choice paradigm
• Segmentation of Microscopy Data for finding Nuclei in Divergent Images
• Non-monotone Submodular Maximization with Nearly Optimal Adaptivity Complexity
• Artificial Neural Networks in Fluid Dynamics: A Novel Approach to the Navier-Stokes Equations
• Compiling Adiabatic Quantum Programs
• End to End Vehicle Lateral Control Using a Single Fisheye Camera
• Predicting Stochastic Travel Times based on High-Volume Floating Car Data
• PACO: Signal Restoration via PAtch COnsensus
• Input-to-State Stability of Nonlinear Parabolic PDEs with Dirichlet Boundary Disturbances
• Safe Intersection Management for Mixed Transportation Systems with Human-Driven and Autonomous Vehicles
• Supervised Kernel PCA For Longitudinal Data
• Adversarial Removal of Demographic Attributes from Text Data
• On the Expected Value of the Maximal Bet in the Labouchere System
• Percolation on Isotropically Directed Lattice
• Stochastic Combinatorial Ensembles for Defending Against Adversarial Examples
• Inverse Problems in Asteroseismology
• Privacy Amplification by Iteration
• On the Optimality of Ergodic Trajectories for Information Gathering Tasks
• Deterministic Factorization of Sparse Polynomials with Bounded Individual Degree
• Designs over finite fields by difference methods
• Dynamically evolved community size and stability of random Lotka-Volterra ecosystems
• A Hybrid DE Approach to Designing CNN for Image Classification
• Out-of-Distribution Detection using Multiple Semantic Label Representations
• Cayley Digraphs Associated to Arithmetic Groups
• Stability for maximal independent sets
• Noncommutative polynomials describing convex sets
• Learning deep representations by mutual information estimation and maximization
• Adversarial Sampling for Active Learning
• Class2Str: End to End Latent Hierarchy Learning
• Deep Multimodal Image-Repurposing Detection
• Bayesian Function-on-Scalars Regression for High Dimensional Data
• Local-Global Graph Clustering with Applications in Sense and Frame Induction
• VERAM: View-Enhanced Recurrent Attention Model for 3D Shape Classification
• An Isolated Power Factor Corrected Power Supply Utilizing the Transformer Leakage Inductance
• Graph connectivity in log-diameter steps using label propagation
• A non-iterative algorithm for generalized Pig games
• Near log-convexity of measured heat in (discrete) time and consequences
• Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
• State Polytopes Related to Two Classes of Combinatorial Neural Codes
• Modes of Information Flow
• You Shall Know the Most Frequent Sense by the Company it Keeps
• D.H. Lehmer’s Tridiagonal determinant: An Etude in (Andrews-Inspired) Experimental Mathematics
• Wrapped Loss Function for Regularizing Nonconforming Residual Distributions
• Fully Active Cops and Robbers
• A parallel non-uniform fast Fourier transform library based on an ‘exponential of semicircle’ kernel
• Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge
• Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning
• Abnormal Event Detection and Location for Dense Crowds using Repulsive Forces and Sparse Reconstruction
• The story of conflict and cooperation
• Lessons from Natural Language Inference in the Clinical Domain
• Estimating Metric Poses of Dynamic Objects Using Monocular Visual-Inertial Fusion
• Dominant Channel Estimation via MIPS for Large-Scale Antenna Systems with One-Bit ADCs
• Channel Estimation for One-Bit Massive MIMO Systems Exploiting Spatio-Temporal Correlations
• Automatic skin lesion segmentation on dermoscopic images by the means of superpixel merging
• Designing Near-Optimal Policies for Energy Management in a Stochastic Environment
• Stochastic Modeling and Analysis of User-Centric Network MIMO Systems
• Energy Efficient Event Localization and Classification for Nano IoT
• Direction of Arrival and Center Frequency Estimation for Impulse Radio Millimeter Wave Communications
• Making a Dynamic Interaction Between Two Power System Analysis Software
• Position Sensor-less and Adaptive Speed Design for Controlling Brush-less DC Motor Drives
• Multi-task multiple kernel machines for personalized pain recognition from functional near-infrared spectroscopy brain signals
• Stabilization for Networked Control Systems with Simultaneous Input Delay and Markovian Packet Losses
• Central limit theorem for statistics of subcritical configuration models
• Parametrix method for one-dimensional locally α-stable Lévy-type processes
• Real-time Analog Pixel-to-pixel Dynamic Frame Differencing with Memristive Sensing Circuits
• Polynomial Chaos reformulation in Nonlinear Stochastic Optimal Control with application on a drivetrain subject to bifurcation phenomena
• Critical two-point function for long-range models with power-law couplings: The marginal case for
• A note on the approximate symmetry of Bregman distances
• Parameter Synthesis Problems for Parametric Timed Automata
• A computationally efficient correlated mixed Probit for credit risk modelling
• Downsampling Strategies are Crucial for Word Embedding Reliability
• Wavelet imaging of transient energy localization in nonlinear systems at thermal equilibrium: the case study of NaI crystals at high temperature
• The Role of the Task Topic in Web Search of Different Task Types
• A Usefulness-based Approach for Measuring the Local and Global Effect of IIR Services
• Translational Grounding: Using Paraphrase Recognition and Generation to Demonstrate Semantic Abstraction Abilities of MultiLingual NMT
• Existence of a Unique Quasi-stationary Distribution for Stochastic Reaction Networks
• Analysis of Speeches in Indian Parliamentary Debates
• Decompositions of log-correlated fields with applications
• Optimum Transmission Rate in Fading Channels with Markovian Sources and QoS Constraints
• Fully-Convolutional Point Networks for Large-Scale Point Clouds
• Deep Learned Full-3D Object Completion from Single View
• Search for Common Minima in Joint Optimization of Multiple Cost Functions
• Deep Video-Based Performance Cloning
• Demonstrating PAR4SEM – A Semantic Writing Aid with Adaptive Paraphrasing
• Spanning surfaces in 3-graphs
• Adversarial training for multi-context joint entity and relation extraction
• Dissipation in parabolic SPDEs
• ADMM for Exploiting Structure in MPC Problems
• Automatic Generation of Text Descriptive Comments for Code Blocks
• Self-supervised learning of a facial attribute embedding from video
• The Turtleback Diagram for Conditional Probability
• Multimodal Interaction-aware Motion Prediction for Autonomous Street Crossing
• Discrete-attractor-like Tracking in Continuous Attractor Neural Networks
• On Stronger Types of Locating-dominating Codes
• Defending against Intrusion of Malicious UAVs with Networked UAV Defense Swarms
• Optimal designs for two-level main effects models on a restricted design region
• Scalable Population Synthesis with Deep Generative Modeling
• On a New Improvement-Based Acquisition Function for Bayesian Optimization
• Stable divisorial gonality is in NP
• General hypergeometric distribution: A basic statistical distribution for the number of overlapped elements in multiple subsets drawn from a finite population
• Smart energy models for atomistic simulations using a DFT-driven multifidelity approach
• A Game-Theoretic Approach to Multi-Objective Resource Sharing and Allocation in Mobile Edge Clouds
• A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation
• Group Activity Selection with Few Agent Types
• Minimalist designs
• Greedy Harmony Search Algorithm for the Hop Constrained Connected Facility Location
• Normal matrix ensembles at the hard edge, orthogonal polynomials, and universality
• A novel approach to assess the impact of the Fano factor on the sensitivity of low-mass dark matter experiments
• Robust Chemical Circuits
• Microwave Hilbert Transformer and its Applications in Real-time Analog Processing (RAP)
• Thresholding the virtual value: a simple method to increase welfare and lower reserve prices in online auction systems
• Entropy of a quantum channel
• Iterated Greedy Algorithms for the Hop-Constrained Steiner Tree Problem
• Regularity and h-polynomials of binomial edge ideals
• Optimizing the Union of Intersections LASSO ($UoI_{LASSO}$) and Vector Autoregressive ($UoI_{VAR}$) Algorithms for Improved Statistical Estimation at Scale
• Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval
• Resource Allocation for Cooperative D2D-Enabled Wireless Caching Networks
• Gaussian Word Embedding with a Wasserstein Distance Loss
• Real Time Elbow Angle Estimation Using Single RGB Camera
• Functional convergence for moving averages with heavy tails and random coefficients
• Student Cluster Competition 2017, Team University of Texas at Austin/Texas State University: Reproducing Vectorization of the Tersoff Multi-Body Potential on the Intel Skylake and NVIDIA V100 Architectures
• Quantitative contraction rates for Markov chains on general state spaces
• URLLC Services in 5G – Low Latency Enhancements for LTE
• QuAC : Question Answering in Context
• CoQA: A Conversational Question Answering Challenge
• A Hybridized Discontinuous Galerkin Method for A Linear Degenerate Elliptic Equation Arising from Two-Phase Mixtures
• ISNA-Set: A novel English Corpus of Iran NEWS