Rohrhofer Franz Martin, Posch Stefan, Gößnitzer Clemens, Geiger Bernhard
2023
Physics-informed neural networks (PINNs) have emerged as a promising deep learning method, capable of solving forward and inverse problems governed by differential equations. Despite recent advances, it is widely acknowledged that PINNs are difficult to train and often require careful tuning of loss weights when data and physics loss functions are combined by scalarization of a multi-objective (MO) problem. In this paper, we aim to understand how parameters of the physical system, such as characteristic length and time scales, the computational domain, and coefficients of differential equations, affect MO optimization and the optimal choice of loss weights. Through a theoretical examination of where these system parameters appear in PINN training, we find that they effectively and individually scale the loss residuals, causing imbalances in MO optimization for certain choices of system parameters. The immediate effects of this are reflected in the apparent Pareto front, which we define as the set of loss values achievable with gradient-based training and visualize accordingly. We empirically verify that loss weights can be used successfully to compensate for the scaling of system parameters, and enable the selection of an optimal solution on the apparent Pareto front that aligns well with the physically valid solution. We further demonstrate that by altering the system parameterization, the apparent Pareto front can shift and exhibit locally convex parts, resulting in a wider range of loss weights for which gradient-based training becomes successful. This work explains the effects of system parameters on MO optimization in PINNs, and highlights the utility of proposed loss weighting schemes.
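The scaling effect described in this abstract can be illustrated with a toy example that is not taken from the paper: a scalarized PINN loss for the linear ODE du/dt = -λu, with time rescaled by a characteristic time T. The symbols w_d, w_p, T, τ and the choice of ODE are assumptions made here purely for illustration.

```latex
\begin{align}
  \mathcal{L}(\theta) &= w_d\,\mathcal{L}_{\text{data}}(\theta) + w_p\,\mathcal{L}_{\text{phys}}(\theta),
  \qquad
  \mathcal{L}_{\text{phys}} = \frac{1}{N}\sum_{n=1}^{N} r(t_n)^2,
  \quad r(t) = \frac{du}{dt} + \lambda u, \\
  t = T\tau \;&\Rightarrow\; \tilde r(\tau) = \frac{du}{d\tau} + \lambda T u = T\, r(t)
  \;\Rightarrow\; \tilde{\mathcal{L}}_{\text{phys}} = T^2\,\mathcal{L}_{\text{phys}}.
\end{align}
```

In this toy setting, the time scale T multiplies the physics loss by T^2 while leaving the data loss unchanged, so a weight w_p proportional to 1/T^2 would restore the original balance between the two objectives.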
Geiger Bernhard, Schuppler Barbara
2023
Given the development of automatic speech recognition based techniques for creating phonetic annotations of large speech corpora, there has been a growing interest in investigating the frequencies of occurrence of phonological and reduction processes. Because most studies have analyzed these processes separately, they did not provide insights into their co-occurrences. This paper contributes by introducing graph theory methods for the analysis of pronunciation variation in a large corpus of Austrian German conversational speech. More specifically, we investigate how reduction processes that are typical for spontaneous German in general co-occur with phonological processes typical for the Austrian German variety. Whereas our concrete findings are of special interest to scientists investigating variation in German, the approach presented opens new possibilities to analyze pronunciation variation in large corpora of different speaking styles in any language.
Geiger Bernhard, Jahani Alireza, Hussain Hussain, Groen Derek
2023
In this work, we investigate Markov aggregation for agent-based models (ABMs). Specifically, if the ABM models agent movements on a graph, if its ruleset satisfies certain assumptions, and if the aim is to simulate aggregate statistics such as vertex populations, then the ABM can be replaced by a Markov chain on a comparably small state space. This equivalence between a function of the ABM and a smaller Markov chain makes it possible to reduce the computational complexity of the agent-based simulation from being linear in the number of agents to being constant in the number of agents and polynomial in the number of locations. We instantiate our theory for a recent ABM for forced migration (Flee). We show that, even though the rulesets of Flee violate some of our necessary assumptions, the aggregated Markov chain-based model, MarkovFlee, achieves comparable accuracy at substantially reduced computational cost. Thus, MarkovFlee can help NGOs and policy makers forecast forced migration in certain conflict scenarios in a cost-effective manner, contributing to fast and efficient delivery of humanitarian relief.
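The following minimal sketch illustrates why aggregate statistics can be cheaper than simulating agents. It assumes that every agent moves independently according to one fixed row-stochastic matrix `P` over locations. That assumption, and the function names, are hypothetical simplifications and not the ruleset conditions or the MarkovFlee implementation of the paper. Under this assumption, expected vertex populations evolve by a vector-matrix product whose cost depends only on the number of locations.

```python
def step_populations(pop, P):
    """Propagate expected vertex populations by one time step.

    pop: list of expected agent counts per location.
    P:   row-stochastic matrix, P[i][j] = probability of moving from i to j
         (hypothetical; assumed identical and independent for all agents).
    """
    n = len(pop)
    return [sum(pop[i] * P[i][j] for i in range(n)) for j in range(n)]


def simulate(pop, P, steps):
    """Apply step_populations repeatedly; cost is O(steps * n^2), independent of agent count."""
    for _ in range(steps):
        pop = step_populations(pop, P)
    return pop


# Two locations: agents at location 0 leave with probability 0.5; location 1 is absorbing.
P = [[0.5, 0.5],
     [0.0, 1.0]]
print(simulate([100.0, 0.0], P, 2))
```

The same call works unchanged for one hundred or one million agents, which is the point of replacing per-agent simulation with an aggregate chain in this simplified setting.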
Rohrhofer Franz Martin, Posch Stefan, Gößnitzer Clemens, Geiger Bernhard
2023
This paper empirically studies commonly observed training difficulties of Physics-Informed Neural Networks (PINNs) on dynamical systems. Our results indicate that fixed points which are inherent to these systems play a key role in the optimization of the physics loss function embedded in PINNs. We observe that the loss landscape exhibits local optima that are shaped by the presence of fixed points. We find that these local optima contribute to the complexity of the physics loss optimization, which can explain common training difficulties and resulting nonphysical predictions. Under certain settings, e.g., initial conditions close to fixed points or long simulation times, we show that those optima can even become better than that of the desired solution.
Posch Stefan, Gößnitzer Clemens, Rohrhofer Franz Martin, Geiger Bernhard, Wimmer Andreas
2023
The turbulent jet ignition concept using prechambers is a promising solution to achieve stable combustion at lean conditions in large gas engines, leading to high efficiency at low emission levels. Due to the wide range of design and operating parameters for large gas engine prechambers, the preferred method for evaluating different designs is computational fluid dynamics (CFD), as testing in test bed measurement campaigns is time-consuming and expensive. However, the significant computational time required for detailed CFD simulations due to the complexity of solving the underlying physics also limits its applicability. In optimization settings similar to the present case, i.e., where the evaluation of the objective function(s) is computationally costly, Bayesian optimization has largely replaced classical design-of-experiment. Thus, the present study deals with the computationally efficient Bayesian optimization of large gas engine prechamber design using CFD simulation. Reynolds-averaged Navier-Stokes simulations are used to determine the target values as a function of the selected prechamber design parameters. The results indicate that the chosen strategy is effective in finding a prechamber design that achieves the desired target values.
Rohrhofer Franz Martin, Posch Stefan, Gößnitzer Clemens, García-Oliver José M., Geiger Bernhard
2023
Flamelet models are widely used in computational fluid dynamics to simulate thermochemical processes in turbulent combustion. These models typically employ memory-expensive lookup tables that are predetermined and represent the combustion process to be simulated. Artificial neural networks (ANNs) offer a deep learning approach that can store this tabular data using a small number of network weights, potentially reducing the memory demands of complex simulations by orders of magnitude. However, ANNs with standard training losses often struggle with underrepresented targets in multivariate regression tasks, e.g., when learning minor species mass fractions as part of lookup tables. This paper seeks to improve the accuracy of an ANN when learning multiple species mass fractions of a hydrogen (H2) combustion lookup table. We assess a simple, yet effective loss weight adjustment that outperforms the standard mean-squared error optimization and enables accurate learning of all species mass fractions, even of minor species where the standard optimization completely fails. Furthermore, we find that the loss weight adjustment leads to more balanced gradients in the network training, which explains its effectiveness.
Hoffer Johannes G., Ranftl Sascha, Geiger Bernhard
2023
We consider the problem of finding an input to a stochastic black box function such that the scalar output of the black box function is as close as possible to a target value in the sense of the expected squared error. While the optimization of stochastic black boxes is classic in (robust) Bayesian optimization, the current approaches based on Gaussian processes predominantly focus either on (i) maximization/minimization rather than target value optimization or (ii) on the expectation, but not the variance of the output, ignoring output variations due to stochasticity in uncontrollable environmental variables. In this work, we fill this gap and derive acquisition functions for common criteria such as the expected improvement, the probability of improvement, and the lower confidence bound, assuming that aleatoric effects are Gaussian with known variance. Our experiments illustrate that this setting is compatible with certain extensions of Gaussian processes, and show that the thus derived acquisition functions can outperform classical Bayesian optimization even if the latter assumptions are violated. An industrial use case in billet forging is presented.
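The target-value criterion in this abstract can be made concrete with the standard decomposition of the expected squared error. The symbols Y(x), t, μ(x) and σ²(x) are notation assumed here for the stochastic output, the target value, and the output mean and variance at input x. They are not necessarily the symbols used in the paper.

```latex
\begin{align}
  \mathbb{E}\big[(Y(x) - t)^2\big]
  &= \big(\mathbb{E}[Y(x)] - t\big)^2 + \operatorname{Var}[Y(x)] \\
  &= \big(\mu(x) - t\big)^2 + \sigma^2(x).
\end{align}
```

The identity shows why optimizing the expectation alone is insufficient in this setting: an input whose mean output hits the target exactly can still incur a large expected squared error if the output variance at that input is large.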
Adilova Linara, Geiger Bernhard, Fischer Asja
2023
The information-theoretic framework promises to explain the predictive power of neural networks. In particular, the information plane analysis, which measures mutual information (MI) between input and representation as well as representation and output, should give rich insights into the training process. This approach, however, was shown to strongly depend on the choice of estimator of the MI. The problem is amplified for deterministic networks if the MI between input and representation is infinite. Thus, the estimated values are defined by the different approaches for estimation, but do not adequately represent the training process from an information-theoretic perspective. In this work, we show that dropout with continuously distributed noise ensures that MI is finite. We demonstrate in a range of experiments that this enables a meaningful information plane analysis for a class of dropout neural networks that is widely used in practice.
Berger Katharina, Rusch Magdalena, Pohlmann Antonia, Popowicz Martin, Geiger Bernhard, Gursch Heimo, Schöggl Josef-Peter, Baumgartner Rupert J.
2023
Digital product passports (DPPs) are an emerging technology and are considered enablers of sustainable and circular value chains, as they support sustainable product management (SPM) by gathering and containing product life cycle data. However, some life cycle data are considered sensitive by stakeholders, resulting in a reluctance to share such data. This contribution provides a concept illustrating how data science and machine learning approaches enable electric vehicle battery (EVB) value chain stakeholders to carry out confidentiality-preserving data exchange via a DPP. This, in turn, can help overcome reluctance to share data, consequently facilitating sustainability data management on a DPP for an EVB. The concept development comprised a literature review to identify data needs for sustainable EVB management, data management challenges, and potential data science approaches for data management support. Furthermore, three explorative focus group workshops and follow-up consultations with data scientists were conducted to discuss the identified data science approaches. This work complements the emerging literature on digitalization and SPM by exploring the specific potential of data science and machine learning approaches for enabling sustainability data management and reducing reluctance to share data. Furthermore, practical relevance is given, as this concept may provide practitioners with new impulses regarding DPP development and implementation.
Hobisch Elisabeth, Völkl Yvonne, Geiger Bernhard, Saric Sanja, Scholger Martina, Helic Denis, Koncar Philipp, Glatz Christina
2023
(extended abstract)
Gabler Philipp, Geiger Bernhard, Schuppler Barbara, Kern Roman
2023
Superficially, read and spontaneous speech—the two main kinds of training data for automatic speech recognition—appear as complementary, but are equal: pairs of texts and acoustic signals. Yet, spontaneous speech is typically harder for recognition. This is usually explained by different kinds of variation and noise, but there is a more fundamental deviation at play: for read speech, the audio signal is produced by recitation of the given text, whereas in spontaneous speech, the text is transcribed from a given signal. In this review, we embrace this difference by presenting a first introduction of causal reasoning into automatic speech recognition, and describing causality as a tool to study speaking styles and training data. After breaking down the data generation processes of read and spontaneous speech and analysing the domain from a causal perspective, we highlight how data generation by annotation must affect the interpretation of inference and performance. Our work discusses how various results from the causality literature regarding the impact of the direction of data generation mechanisms on learning and prediction apply to speech data. Finally, we argue how a causal perspective can support the understanding of models in speech processing regarding their behaviour, capabilities, and limitations.
Hoffer Johannes Georg, Geiger Bernhard, Kern Roman
2023
This research presents an approach that combines stacked Gaussian processes (stacked GP) with target vector Bayesian optimization (BO) to solve multi-objective inverse problems of chained manufacturing processes. In this context, GP surrogate models represent individual manufacturing processes and are stacked to build a unified surrogate model that represents the entire manufacturing process chain. Using stacked GPs, epistemic uncertainty can be propagated through all chained manufacturing processes. To perform target vector BO, acquisition functions make use of a noncentral χ-squared distribution of the squared Euclidean distance between a given target vector and surrogate model output. In BO of chained processes, there are two options: either a single unified surrogate model represents the entire joint chain, or each individual process has its own surrogate model and the optimization is cascaded from the last to the first process. Literature suggests that a joint optimization approach using stacked GPs overestimates uncertainty, whereas a cascaded approach underestimates it. For improved target vector BO results of chained processes, we present an approach that combines methods which under- or overestimate uncertainties in an ensemble for rank aggregation. We present a thorough analysis of the proposed methods and evaluate them on two artificial use cases and on a typical manufacturing process chain: preforming and final pressing of an Inconel 625 superalloy billet.
Steger Sophie, Rohrhofer Franz Martin, Geiger Bernhard
2022
Despite extensive research, physics-informed neural networks (PINNs) are still difficult to train, especially when the optimization relies heavily on the physics loss term. Convergence problems frequently occur when simulating dynamical systems with high-frequency components, chaotic or turbulent behavior. In this work, we discuss whether the traditional PINN framework is able to predict chaotic motion by conducting experiments on the undamped double pendulum. Our results demonstrate that PINNs do not exhibit any sensitivity to perturbations in the initial condition. Instead, the PINN optimization consistently converges to physically correct solutions that violate the initial condition only marginally, but diverge significantly from the desired solution due to the chaotic nature of the system. In fact, the PINN predictions primarily exhibit low-frequency components with a smaller magnitude of higher-order derivatives, which favors lower physics loss values compared to the desired solution. We thus hypothesize that the PINNs "cheat" by shifting the initial conditions to values that correspond to physically correct solutions that are easier to learn. Initial experiments suggest that domain decomposition combined with an appropriate loss weighting scheme mitigates this effect and allows convergence to the desired solution.
Xue Yani, Li Miqing, Arabnejad Hamid, Suleimenova, Geiger Bernhard, Jahani Alireza, Groen Derek
2022
In the context of humanitarian support for forcibly displaced persons, camps play an important role in protecting people and ensuring their survival and health. A challenge in this regard is to find optimal locations for establishing a new asylum-seeker/unrecognized refugee or IDPs (internally displaced persons) camp. In this paper we formulate this problem as an instantiation of the well-known facility location problem (FLP) with three objectives to be optimized. In particular, we show that AI techniques and migration simulations can be used to provide decision support on camp placement.
De Freitas Joao Pedro, Berg Sebastian, Geiger Bernhard, Mücke Manfred
2022
In this paper, we frame homogeneous-feature multi-task learning (MTL) as a hierarchical representation learning problem, with one task-agnostic and multiple task-specific latent representations. Drawing inspiration from the information bottleneck principle and assuming an additive independent noise model between the task-agnostic and task-specific latent representations, we limit the information contained in each task-specific representation. It is shown that our resulting representations yield competitive performance for several MTL benchmarks. Furthermore, for certain setups, we show that the trained parameters of the additive noise model are closely related to the similarity of different tasks. This indicates that our approach yields a task-agnostic representation that is disentangled in the sense that its individual dimensions may be interpretable from a task-specific perspective.
Lovric Mario, Antunović Mario, Šunić Iva, Vuković Matej, Kecorius Simon, Kröll Mark, Bešlić Ivan, Godec Ranka, Pehnec Gordana, Geiger Bernhard, Grange Stuart K, Šimić Iva
2022
In this paper, the authors investigated changes in mass concentrations of particulate matter (PM) during the Coronavirus Disease of 2019 (COVID-19) lockdown. Daily samples of PM1, PM2.5 and PM10 fractions were measured at an urban background sampling site in Zagreb, Croatia from 2009 to late 2020. For the purpose of meteorological normalization, the mass concentrations were fed alongside meteorological and temporal data to Random Forest (RF) and LightGBM (LGB) models tuned by Bayesian optimization. The models’ predictions were subsequently de-weathered by meteorological normalization using repeated random resampling of all predictive variables except the trend variable. Three pollution periods in 2020 were examined in detail: January and February as the pre-lockdown period, April as the lockdown period, and June and July as the “new normal”. An evaluation using normalized mass concentrations of particulate matter and analysis of variance (ANOVA) was conducted. The results showed no significant differences for PM1, PM2.5 and PM10 in April 2020 compared to the same period in 2018 and 2019. No significant changes were observed for the “new normal” either. The results thus indicate that a reduction in mobility during the COVID-19 lockdown in Zagreb, Croatia, did not significantly affect particulate matter concentrations in the long term.
Steger Sophie, Geiger Bernhard, Smieja Marek
2022
We connect the problem of semi-supervised clustering to constrained Markov aggregation, i.e., the task of partitioning the state space of a Markov chain. We achieve this connection by considering every data point in the dataset as an element of the Markov chain's state space, by defining the transition probabilities between states via similarities between corresponding data points, and by incorporating semi-supervision information as hard constraints in a Hartigan-style algorithm. The introduced Constrained Markov Clustering (CoMaC) is an extension of a recent information-theoretic framework for (unsupervised) Markov aggregation to the semi-supervised case. Instantiating CoMaC for certain parameter settings further generalizes two previous information-theoretic objectives for unsupervised clustering. Our results indicate that CoMaC is competitive with the state of the art.
Schweimer Christoph, Gfrerer Christine, Lugstein Florian, Pape David, Velimsky Jan, Elsässer Robert, Geiger Bernhard
2022
Online social networks are a dominant medium in everyday life to stay in contact with friends and to share information. In Twitter, users can connect with other users by following them, who in turn can follow back. In recent years, researchers studied several properties of social networks and designed random graph models to describe them. Many of these approaches either focus on the generation of undirected graphs or on the creation of directed graphs without modeling the dependencies between reciprocal (i.e., two directed edges of opposite direction between two nodes) and directed edges. We propose an approach to generate directed social network graphs that creates reciprocal and directed edges and considers the correlation between the respective degree sequences. Our model relies on crawled directed graphs in Twitter, on which information w.r.t. a topic is exchanged or disseminated. While these graphs exhibit a high clustering coefficient and small average distances between random node pairs (which is typical in real-world networks), their degree sequences seem to follow a $\chi^2$-distribution rather than a power law. To achieve high clustering coefficients, we apply an edge rewiring procedure that preserves the node degrees. We compare the crawled and the created graphs, and simulate certain algorithms for information dissemination and epidemic spreading on them. The results show that the created graphs exhibit very similar topological and algorithmic properties as the real-world graphs, providing evidence that they can be used as surrogates in social network analysis. Furthermore, our model is highly scalable, which enables us to create graphs of arbitrary size with almost the same properties as the corresponding real-world networks.
Hoffer Johannes Georg, Ofner Andreas Benjamin, Rohrhofer Franz Martin, Lovric Mario, Kern Roman, Lindstaedt Stefanie, Geiger Bernhard
2022
Most engineering domains abound with models derived from first principles that have been proven to be effective for decades. These models are not only a valuable source of knowledge, but they also form the basis of simulations. The recent trend of digitization has complemented these models with data in all forms and variants, such as process monitoring time series, measured material characteristics, and stored production parameters. Theory-inspired machine learning combines the available models and data, reaping the benefits of established knowledge and the capabilities of modern, data-driven approaches. Compared to purely physics- or purely data-driven models, the models resulting from theory-inspired machine learning are often more accurate and less complex, extrapolate better, or allow faster model training or inference. In this short survey, we introduce and discuss several prominent approaches to theory-inspired machine learning and show how they were applied in the fields of welding, joining, additive manufacturing, and metal forming.
Ofner Andreas Benjamin, Kefalas Achilles, Posch Stefan, Geiger Bernhard
2022
This article introduces a method for the detection of knock occurrences in an internal combustion engine (ICE) using a 1-D convolutional neural network trained on in-cylinder pressure data. The model architecture is based on expected frequency characteristics of knocking combustion. All cycles were reduced to 60° CA long windows with no further processing applied to the pressure traces. The neural networks were trained exclusively on in-cylinder pressure traces from multiple conditions, with labels provided by human experts. The best-performing model architecture achieves an accuracy of above 92% on all test sets in a tenfold cross-validation when distinguishing between knocking and non-knocking cycles. In a multiclass problem where each cycle was labeled by the number of experts who rated it as knocking, 78% of cycles were labeled perfectly, while 90% of cycles were classified at most one class from ground truth. They thus considerably outperform the broadly applied maximum amplitude of pressure oscillation (MAPO) detection method, as well as references reconstructed from previous works. Our analysis indicates that the neural network learned physically meaningful features connected to engine-characteristic resonances, thus verifying the intended theory-guided data science approach. Deeper performance investigation further shows remarkable generalization ability to unseen operating points. In addition, the model proved to classify knocking cycles in unseen engines with increased accuracy of 89% after adapting to their features via training on a small number of exclusively non-knocking cycles. The algorithm takes below 1 ms to classify individual cycles, effectively making it suitable for real-time engine control.
Hoffer Johannes Georg, Geiger Bernhard, Kern Roman
2022
The avoidance of scrap and the adherence to tolerances are important goals in manufacturing. This requires a good engineering understanding of the underlying process. To achieve this, real physical experiments can be conducted. However, they are expensive in time and resources, and can slow down production. A promising way to overcome these drawbacks is process exploration through simulation, where the finite element method (FEM) is a well-established and robust simulation method. While FEM simulation can provide high-resolution results, it requires extensive computing resources to do so. In addition, the simulation design often depends on unknown process properties. To circumvent these drawbacks, we present a Gaussian Process surrogate model approach that accounts for real physical manufacturing process uncertainties and acts as a substitute for expensive FEM simulation, resulting in a fast and robust method that adequately depicts reality. We demonstrate that active learning can be easily applied with our surrogate model to use computational resources more efficiently. On top of that, we present a novel optimization method that treats aleatoric and epistemic uncertainties separately, allowing for greater flexibility in solving inverse problems. We evaluate our model using a typical manufacturing use case, the preforming of an Inconel 625 superalloy billet on a forging press.
Amjad Rana Ali, Liu Kairen, Geiger Bernhard
2022
In this work, we investigate the use of three information-theoretic quantities--entropy, mutual information with the class variable, and a class selectivity measure based on Kullback-Leibler (KL) divergence--to understand and study the behavior of already trained fully connected feedforward neural networks (NNs). We analyze the connection between these information-theoretic quantities and classification performance on the test set by cumulatively ablating neurons in networks trained on MNIST, FashionMNIST, and CIFAR-10. Our results parallel those recently published by Morcos et al., indicating that class selectivity is not a good indicator for classification performance. However, looking at individual layers separately, both mutual information and class selectivity are positively correlated with classification performance, at least for networks with ReLU activation functions. We provide explanations for this phenomenon and conclude that it is ill-advised to compare the proposed information-theoretic quantities across layers. Furthermore, we show that cumulative ablation of neurons with ascending or descending information-theoretic quantities can be used to formulate hypotheses regarding the joint behavior of multiple neurons, such as redundancy and synergy, with comparably low computational cost. We also draw connections to the information bottleneck theory for NNs.
Geiger Bernhard
2021
(extended abstract)
Hoffer Johannes Georg, Geiger Bernhard, Ofner Patrick, Kern Roman
2021
The technical world of today fundamentally relies on structural analysis in the form of design and structural mechanics simulations. A traditional and robust simulation method is the physics-based Finite Element Method (FEM) simulation. FEM simulations in structural mechanics are known to be very accurate; however, the higher the desired resolution, the more computational effort is required. Surrogate modeling provides a robust approach to address this drawback. Nonetheless, finding the right surrogate model and its hyperparameters for a specific use case is not a straightforward process. In this paper, we discuss and compare several classes of mesh-free surrogate models based on traditional and thriving Machine Learning (ML) and Deep Learning (DL) methods. We show that relatively simple algorithms (such as $k$-nearest neighbor regression) can be competitive in applications with low geometrical complexity and extrapolation requirements. With respect to tasks exhibiting higher geometric complexity, our results show that recent DL methods at the forefront of literature (such as physics-informed neural networks) are complicated to train and to parameterize and thus require further research before they can be put to practical use. In contrast, we show that already well-researched DL methods such as the multi-layer perceptron are superior with respect to interpolation use cases and can be easily trained with available tools. With our work, we thus present a basis for selection and practical implementation of surrogate models.
Geiger Bernhard, Kubin Gernot
2021
This Special Issue aims to investigate the properties of the information bottleneck (IB) functional in its new context in deep learning and to propose learning mechanisms inspired by the IB framework. More specifically, we invited authors to submit manuscripts that provide novel insight into the properties of the IB functional, that apply the IB principle for training deep, i.e., multi-layer machine learning structures such as NNs, and that investigate the learning behavior of NNs using the IB framework. To cover the breadth of the current literature, we also solicited manuscripts that discuss frameworks inspired by the IB principle, but that depart from it in a well-motivated manner.
Smieja Marek, Wolczyk Maciej, Tabor Jacek, Geiger Bernhard
2021
We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and is implemented in a typical Wasserstein autoencoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small amount of labeled data and a Gaussian classifier induced by the target distribution. SeGMA is optimized efficiently due to the use of the Cramer-Wold distance as a maximum mean discrepancy penalty, which yields a closed-form expression for a mixture of spherical Gaussian components and, thus, obviates the need for sampling. While SeGMA preserves all properties of its semi-supervised predecessors and achieves at least as good generative performance on standard benchmark data sets, it presents additional features: 1) interpolation between any pair of points in the latent space produces realistic-looking samples; 2) combining the interpolation property with disentangling of class and style information, SeGMA is able to perform continuous style transfer from one class to another; and 3) it is possible to change the intensity of class characteristics in a data point by moving the latent representation of the data point away from specific Gaussian components.
Geiger Bernhard
2021
We review the current literature concerned with information plane (IP) analyses of neural network (NN) classifiers. While the underlying information bottleneck theory and the claim that information-theoretic compression is causally linked to generalization are plausible, empirical evidence was found to be both supporting and conflicting. We review this evidence together with a detailed analysis of how the respective information quantities were estimated. Our survey suggests that compression visualized in IPs is not necessarily information-theoretic but is rather often compatible with geometric compression of the latent representations. This insight gives the IP a renewed justification. Aside from this, we shed light on the problem of estimating mutual information in deterministic NNs and its consequences. Specifically, we argue that, even in feedforward NNs, the data processing inequality need not hold for estimates of mutual information. Similarly, while a fitting phase, in which the mutual information between the latent representation and the target increases, is necessary (but not sufficient) for good classification performance, depending on the specifics of mutual information estimation, such a fitting phase need not be visible in the IP.
Basirat Mina, Geiger Bernhard, Roth Peter
2021
Information plane analysis, describing the mutual information between the input and a hidden layer and between a hidden layer and the target over time, has recently been proposed to analyze the training of neural networks. Since the activations of a hidden layer are typically continuous-valued, this mutual information cannot be computed analytically and must thus be estimated, resulting in apparently inconsistent or even contradicting results in the literature. The goal of this paper is to demonstrate how information plane analysis can still be a valuable tool for analyzing neural network training. To this end, we complement the prevailing binning estimator for mutual information with a geometric interpretation. With this geometric interpretation in mind, we evaluate the impact of regularization and interpret phenomena such as underfitting and overfitting. In addition, we investigate neural network learning in the presence of noisy data and noisy labels.
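To illustrate the binning estimator mentioned in this abstract, the following is a minimal sketch of a plug-in estimate of the mutual information between binned hidden-layer activations and class labels. The function name `binned_mutual_information`, the equal-width binning, and the default of 30 bins are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np


def binned_mutual_information(activations, labels, n_bins=30):
    """Plug-in estimate (in bits) of I(T; Y) after equal-width binning of T.

    activations: array of shape (n_samples, n_units) or (n_samples,)
    labels:      array of shape (n_samples,)
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)

    # Discretize every activation into equal-width bins over the observed range.
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges[1:-1])

    # Each distinct row of binned activations is one symbol of the discretized T.
    _, t_ids = np.unique(binned, axis=0, return_inverse=True)
    _, y_ids = np.unique(labels, return_inverse=True)
    t_ids, y_ids = t_ids.ravel(), y_ids.ravel()

    # Empirical joint distribution of (T, Y).
    joint = np.zeros((t_ids.max() + 1, y_ids.max() + 1))
    np.add.at(joint, (t_ids, y_ids), 1.0)
    joint /= joint.sum()

    p_t = joint.sum(axis=1, keepdims=True)
    p_y = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / (p_t @ p_y)[nonzero])))
```

Because the estimate depends on how many distinct bin patterns the activations occupy, it reflects the geometry of the latent representation as much as any information-theoretic quantity, which is consistent with the geometric interpretation the abstract describes.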
Schweimer Christoph, Geiger Bernhard, Wang Meizhu, Gogolenko Sergiy, Mahmood Imran, Jahani Alireza, Suleimenova Diana, Groen Derek
2021
Automated construction of location graphs is instrumental but challenging, particularly in logistics optimisation problems and agent-based movement simulations. Hence, we propose an algorithm for automated construction of location graphs, in which vertices correspond to geographic locations of interest and edges to direct travelling routes between them. Our approach involves two steps. In the first step, we use a routing service to compute distances between all pairs of L locations, resulting in a complete graph. In the second step, we prune this graph by removing edges corresponding to indirect routes, identified using the triangle inequality. The computational complexity of this second step is O(L^3), which enables the computation of location graphs for all towns and cities on the road network of an entire continent. To illustrate the utility of our algorithm in an application, we constructed location graphs for four regions of different size and road infrastructures and compared them to manually created ground truths. Our algorithm simultaneously achieved precision and recall values around 0.9 for a wide range of the single hyperparameter, suggesting that it is a valid approach to create large location graphs for which a manual creation is infeasible.
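The pruning step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance matrix is assumed symmetric, the function name `prune_location_graph` is hypothetical, and the exact pruning rule (an edge i-j is dropped when some detour via k satisfies beta * (d_ik + d_kj) <= d_ij, with a hypothetical hyperparameter `beta`) is one plausible reading of "identified using the triangle inequality".

```python
import numpy as np


def prune_location_graph(dist, beta=0.95):
    """Return a boolean adjacency matrix of the edges kept after pruning.

    dist: symmetric (L, L) matrix of pairwise route distances (assumption).
    beta: hypothetical tolerance; values close to 1 prune only near-exact detours.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    keep = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            indirect = False
            for k in range(n):
                if k == i or k == j:
                    continue
                # The route i -> j is "indirect" if it is (almost) as long as
                # the detour i -> k -> j, i.e. it likely passes through k.
                if beta * (dist[i, k] + dist[k, j]) <= dist[i, j]:
                    indirect = True
                    break
            keep[i, j] = keep[j, i] = not indirect
    return keep
```

The three nested loops over locations make the cubic cost of this step explicit. All decisions are taken on the original distance matrix, so the result does not depend on the order in which edges are examined.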
Geiger Bernhard, Al-Bashabsheh Ali
2021
We derive two sufficient conditions for a function of a Markov random field (MRF) on a given graph to be an MRF on the same graph. The first condition is information-theoretic and parallels a recent information-theoretic characterization of lumpability of Markov chains. The second condition, which is easier to check, is based on the potential functions of the corresponding Gibbs field. We illustrate our sufficient conditions with several examples and discuss implications for practical applications of MRFs. As a side result, we give a partial characterization of functions of MRFs that are information preserving.
Schweimer Christoph, Geiger Bernhard, Wang Meizhu, Gogolenko Sergiy, Mahmood Imran, Jahani Alireza, Suleimenova Diana, Groen Derek
2021
Kefalas Achilles, Ofner Andreas Benjamin, Pirker Gerhard, Posch Stefan, Geiger Bernhard, Wimmer Andreas
2021
The phenomenon of knock is an abnormal combustion occurring in spark-ignition (SI) engines and forms a barrier that prevents an increase in thermal efficiency while simultaneously reducing CO2 emissions. Since knocking combustion is highly stochastic, a cyclic analysis of in-cylinder pressure is necessary. In this study we propose an approach for efficient and robust detection and identification of knocking combustion in three different internal combustion engines. The proposed methodology includes a signal processing technique, called continuous wavelet transformation (CWT), which provides a simultaneous analysis of the in-cylinder pressure traces in the time and frequency domains in the form of coefficients. These coefficients serve as input for a convolutional neural network (CNN) which extracts distinctive features and performs an image recognition task in order to distinguish between non-knock and knock. The results revealed the following: (i) the CWT delivered a stable and effective feature space with the coefficients that represents the unique time-frequency pattern of each individual in-cylinder pressure cycle; (ii) the proposed approach was superior to the state-of-the-art threshold value exceeded (TVE) method with a maximum amplitude pressure oscillation (MAPO) criterion, improving the overall accuracy by 6.15 percentage points (up to 92.62%); and (iii) the CWT + CNN method does not require calibrating threshold values for different engines or operating conditions as long as sufficient and diverse data are used to train the neural network.
Geiger Bernhard, Kubin Gernot
2020
guest editorial for a special issue
Geiger Bernhard, Fischer Ian
2020
In this short note, we relate the variational bounds proposed in Alemi et al. (2017) and Fischer (2020) for the information bottleneck (IB) and the conditional entropy bottleneck (CEB) functional, respectively. Although the two functionals were shown to be equivalent, it was empirically observed that optimizing bounds on the CEB functional achieves better generalization performance and adversarial robustness than optimizing those on the IB functional. This work tries to shed light on this issue by showing that, in the most general setting, no ordering can be established between these variational bounds, while such an ordering can be enforced by restricting the feasible sets over which the optimizations take place. The absence of such an ordering in the general setup suggests that the variational bound on the CEB functional is either more amenable to optimization or a relevant cost function for optimization in its own regard, i.e., without justification from the IB or CEB functionals.
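For orientation, the two functionals compared in this note can be written as follows. The notation (input X, target Y, representation Z, trade-off parameters β and γ) is an assumption chosen for this illustration and may differ from the notation used in the note itself.

```latex
% Information bottleneck (IB) and conditional entropy bottleneck (CEB)
% functionals, both to be minimized over the encoder P_{Z|X}:
\begin{align}
  \mathcal{L}_{\mathrm{IB}}  &= I(X;Z) - \beta\, I(Y;Z), \\
  \mathcal{L}_{\mathrm{CEB}} &= I(X;Z \mid Y) - \gamma\, I(Y;Z).
\end{align}
% If Y - X - Z is a Markov chain, the chain rule gives
% I(X;Z | Y) = I(X;Z) - I(Y;Z), hence
\begin{equation}
  \mathcal{L}_{\mathrm{CEB}} = I(X;Z) - (1+\gamma)\, I(Y;Z),
\end{equation}
% which matches the IB functional with \beta = 1 + \gamma.
```

This equivalence holds for the functionals themselves; the note is concerned with the variational bounds on them, which need not be ordered in the same way.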
Klimashevskaia Anastasia, Geiger Bernhard, Hagmüller Martin, Helic Denis, Fischer Frank
2020
(extended abstract)
Hobisch Elisabeth, Scholger Martina, Fuchs Alexandra, Geiger Bernhard, Koncar Philipp, Saric Sanja
2020
(extended abstract)
Schrunner Stefan, Geiger Bernhard, Zernig Anja, Kern Roman
2020
Classification has been tackled by a large number of algorithms, predominantly following a supervised learning setting. Surprisingly little research has been devoted to the problem setting where a dataset is only partially labeled, including even instances of entirely unlabeled classes. Algorithmic solutions that are suited for such problems are especially important in practical scenarios where the labelling of data is prohibitively expensive or the understanding of the data is lacking, including cases where only a subset of the classes is known. We present a generative method to address the problem of semi-supervised classification with unknown classes, whereby we follow a Bayesian perspective. In detail, we apply a two-step procedure based on Bayesian classifiers and exploit information from both a small set of labeled data and a larger set of unlabeled training data, allowing that the labeled dataset does not contain samples from all present classes. This represents a common practical application setup, where the labeled training set is not exhaustive. We show in a series of experiments that our approach outperforms state-of-the-art methods tackling similar semi-supervised learning problems. Since our approach yields a generative model, which aids the understanding of the data, it is particularly suited for practical applications.
Amjad Rana Ali, Geiger Bernhard
2020
In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, making the optimization problem ill-posed, or it is piecewise constant, hence not admitting gradient-based optimization methods. Second, the invariance of the IB functional under bijections prevents it from capturing properties of the learned representation that are desirable for classification, such as robustness and simplicity. We argue that these issues are partly resolved for stochastic DNNs, DNNs that include a (hard or soft) decision rule, or by replacing the IB functional with related, but more well-behaved cost functions. We conclude that recent successes reported about training DNNs using the IB framework must be attributed to such solutions. As a side effect, our results indicate limitations of the IB framework for the analysis of DNNs. We also note that rather than trying to repair the inherent problems in the IB functional, a better approach may be to design regularizers on latent representation enforcing the desired properties directly.
Gogolenko Sergiy, Groen Derek, Suleimenova Diana, Mahmood Imran, Lawenda Marcin, Nieto De Santos Javie, Hanley Joh, Vukovic Milana, Kröll Mark, Geiger Bernhard, Elsaesser Robert, Hoppe Dennis
2020
Accurate digital twinning of the global challenges (GC) leads to computationally expensive coupled simulations. These simulations bring together not only different models, but also various sources of massive static and streaming data sets. In this paper, we explore ways to bridge the gap between traditional high performance computing (HPC) and data-centric computation in order to provide efficient technological solutions for accurate policy-making in the domain of GC. GC simulations in HPC environments give rise to a number of technical challenges related to coupling. Being intended to reflect current and upcoming situation for policy-making, GC simulations extensively use recent streaming data coming from external data sources, which requires changing traditional HPC systems operation. Another common challenge stems from the necessity to couple simulations and exchange data across data centers in GC scenarios. By introducing a generalized GC simulation workflow, this paper shows commonality of the technical challenges for various GC and reflects on the approaches to tackle these technical challenges in the HiDALGO project.
Amjad Rana Ali, Bloechl Clemens, Geiger Bernhard
2020
We propose an information-theoretic Markov aggregation framework that is motivated by two objectives: 1) The Markov chain observed through the aggregation mapping should be Markov. 2) The aggregated chain should retain the temporal dependence structure of the original chain. We analyze our parameterized cost function and show that it contains previous cost functions as special cases, which we critically assess. Our simple optimization heuristic for deterministic aggregations characterizes the optimization landscape for different parameter values.
Koncar Philipp, Fuchs Alexandra, Hobisch Elisabeth, Geiger Bernhard, Scholger Martina, Helic Denis
2020
Spectator periodicals contributed to spreading the ideas of the Age of Enlightenment, a turning point in human history and the foundation of our modern societies. In this work, we study the spirit and atmosphere captured in the spectator periodicals about important social issues from the 18th century by analyzing text sentiment of those periodicals. Specifically, based on a manually annotated corpus of over 3 700 issues published in five different languages and over a period of more than one hundred years, we conduct a three-fold sentiment analysis: First, we analyze the development of sentiment over time as well as the influence of topics and narrative forms on sentiment. Second, we construct sentiment networks to assess the polarity of perceptions between different entities, including periodicals, places and people. Third, we construct and analyze sentiment word networks to determine topological differences between words with positive and negative polarity allowing us to make conclusions on how sentiment was expressed in spectator periodicals. Our results depict a mildly positive tone in spectator periodicals underlining the positive attitude towards important topics of the Age of Enlightenment, but also signaling stylistic devices to disguise critique in order to avoid censorship. We also observe strong regional variation in sentiment, indicating cultural and historic differences between countries. For example, while Italy perceived other European countries as positive role models, French periodicals were frequently more critical towards other European countries. Finally, our topological analysis depicts a weak overrepresentation of positive sentiment words corroborating our findings about a general mildly positive tone in spectator periodicals. We believe that our work based on the combination of the sentiment analysis of spectator periodicals and the extensive knowledge available from literary studies sheds interesting new light on these publications. Furthermore, we demonstrate the inclusion of sentiment analysis as another useful method in the digital humanist’s distant reading toolbox.
Fuchs Alexandra, Geiger Bernhard, Hobisch Elisabeth, Koncar Philipp, More Jacqueline, Saric Sanja, Scholger Martina
2020
Chiancone Alessandro, Cuder Gerald, Geiger Bernhard, Harzl Annemarie, Tanzer Thomas, Kern Roman
2019
This paper presents a hybrid model for the prediction of magnetostriction in power transformers by leveraging the strengths of a data-driven approach and a physics-based model. Specifically, a non-linear physics-based model for magnetostriction as a function of the magnetic field is employed, the parameters of which are estimated as linear combinations of electrical coil measurements and coil dimensions. The model is validated in a practical scenario with coil data from two different suppliers, showing that the proposed approach captures the different magnetostrictive properties of the two suppliers and provides an estimation of magnetostriction in agreement with the measurement system in place. It is argued that the combination of a non-linear physics-based model with few parameters and a linear data-driven model to estimate these parameters is attractive both in terms of model accuracy and because it allows training the data-driven part with comparably small datasets.
Santos Tiago, Schrunner Stefan, Geiger Bernhard, Pfeiler Olivia, Zernig Anja, Kaestner Andre, Kern Roman
2019
Semiconductor manufacturing is a highly innovative branch of industry, where a high degree of automation has already been achieved. For example, devices tested to be outside of their specifications in electrical wafer test are automatically scrapped. In this paper, we go one step further and analyze test data of devices still within the limits of the specification, by exploiting the information contained in the analog wafermaps. To that end, we propose two feature extraction approaches with the aim to detect patterns in the wafer test dataset. Such patterns might indicate the onset of critical deviations in the production process. The studied approaches are: 1) classical image processing and restoration techniques in combination with sophisticated feature engineering and 2) a data-driven deep generative model. The two approaches are evaluated on both a synthetic and a real-world dataset. The synthetic dataset has been modeled based on real-world patterns and characteristics. We found both approaches to provide similar overall evaluation metrics. Our in-depth analysis helps to choose one approach over the other depending on data availability as a major aspect, as well as on available computing power and required interpretability of the results.
Fuchs Alexandra, Geiger Bernhard, Hobisch Elisabeth, Koncar Philipp, Saric Sanja, Scholger Martina
2019
with contributions from Denis Helic and Jacqueline More
Lindstaedt Stefanie, Geiger Bernhard, Pirker Gerhard
2019
Big Data and data-driven modeling are receiving more and more attention in various research disciplines, where they are often considered as universal remedies. Despite their remarkable records of success, in certain cases a purely data-driven approach has proven to be suboptimal or even insufficient. This extended abstract briefly defines the terms Big Data and data-driven modeling and characterizes scenarios in which a strong focus on data has proven to be promising. Furthermore, it explains what progress can be made by fusing concepts from data science and machine learning with current physics-based concepts to form hybrid models, and how these can be applied successfully in the field of engine pre-simulation and engine control.
Geiger Bernhard, Koch Tobias
2019
In 1959, Rényi proposed the information dimension and the d-dimensional entropy to measure the information content of general random variables. This paper proposes a generalization of information dimension to stochastic processes by defining the information dimension rate as the entropy rate of the uniformly quantized stochastic process divided by minus the logarithm of the quantizer step size 1/m in the limit as m → ∞. It is demonstrated that the information dimension rate coincides with the rate-distortion dimension, defined as twice the rate-distortion function R(D) of the stochastic process divided by -log(D) in the limit as D ↓ 0. It is further shown that among all multivariate stationary processes with a given (matrix-valued) spectral distribution function (SDF), the Gaussian process has the largest information dimension rate and the information dimension rate of multivariate stationary Gaussian processes is given by the average rank of the derivative of the SDF. The presented results reveal that the fundamental limits of almost zero-distortion recovery via compressible signal pursuit and almost lossless analog compression are different in general.
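The two verbal definitions in this abstract can be written compactly as follows. The symbols (the process X, its quantization [X]_m, the entropy rate H') and the specific form of the uniform quantizer are assumptions chosen for this illustration; the paper may use different notation.

```latex
% Uniform quantization with step size 1/m (Renyi-style quantizer, assumed):
%   [X_t]_m = \lfloor m X_t \rfloor / m .
% Information dimension rate: entropy rate H' of the quantized process,
% divided by -\log(1/m) = \log m, in the limit m -> infinity.
\begin{equation}
  d(\mathbf{X}) = \lim_{m \to \infty} \frac{H'\bigl([\mathbf{X}]_m\bigr)}{\log m}
\end{equation}
% Rate-distortion dimension: twice the rate-distortion function R(D),
% divided by -\log D, in the limit D -> 0 from above.
\begin{equation}
  \dim_R(\mathbf{X}) = \lim_{D \downarrow 0} \frac{2\,R(D)}{-\log D}
\end{equation}
% The abstract states that these two quantities coincide.
```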
Schweimer Christoph, Geiger Bernhard, Suleimenova Diana, Groen Derek, Gfrerer Christine, Pape David, Elsaesser Robert, Kocsis Albert Tihamér, Liszkai B., Horváth Zoltan
2019
Toller Maximilian, Geiger Bernhard, Kern Roman
2019
Distance-based classification is among the most competitive classification methods for time series data. The most critical component of distance-based classification is the selected distance function. Past research has proposed various different distance metrics or measures dedicated to particular aspects of real-world time series data, yet there is an important aspect that has not been considered so far: Robustness against arbitrary data contamination. In this work, we propose a novel distance metric that is robust against arbitrarily “bad” contamination and has a worst-case computational complexity of O(n log n). We formally argue why our proposed metric is robust, and demonstrate in an empirical evaluation that the metric yields competitive classification accuracy when applied in k-Nearest Neighbor time series classification.
Geiger Bernhard
2019
joint work with Tobias Koch, Universidad Carlos III de Madrid
Maritsch Martin, Suleimenova Diana, Geiger Bernhard, Groen Derek
2019
Geiger Bernhard, Schrunner Stefan, Kern Roman
2019
Schrunner and Geiger have contributed equally to this work.
Bloechl Clemens, Amjad Rana Ali, Geiger Bernhard
2019
We present an information-theoretic cost function for co-clustering, i.e., for simultaneous clustering of two sets based on similarities between their elements. By constructing a simple random walk on the corresponding bipartite graph, our cost function is derived from a recently proposed generalized framework for information-theoretic Markov chain aggregation. The goal of our cost function is to minimize relevant information loss, hence it connects to the information bottleneck formalism. Moreover, via the connection to Markov aggregation, our cost function is not ad hoc, but inherits its justification from the operational qualities associated with the corresponding Markov aggregation problem. We furthermore show that, for appropriate parameter settings, our cost function is identical to well-known approaches from the literature, such as “Information-Theoretic Co-Clustering” by Dhillon et al. Hence, understanding the influence of this parameter admits a deeper understanding of the relationship between previously proposed information-theoretic cost functions. We highlight some strengths and weaknesses of the cost function for different parameters. We also illustrate the performance of our cost function, optimized with a simple sequential heuristic, on several synthetic and real-world data sets, including the Newsgroup20 and the MovieLens100k data sets.
Geiger Bernhard
2018
This short note presents results about the symmetric Jensen-Shannon divergence between two discrete mixture distributions p_1 and p_2. Specifically, for i = 1, 2, p_i is the mixture of a common distribution q and a distribution p̃_i with mixture proportion λ_i. In general, p̃_1 ≠ p̃_2 and λ_1 ≠ λ_2. We provide experimental and theoretical insight into the behavior of the symmetric Jensen-Shannon divergence between p_1 and p_2 as the mixture proportions or the divergence between p̃_1 and p̃_2 change. We also provide insight into scenarios where the supports of the distributions p̃_1, p̃_2, and q do not coincide.
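The setting of this note can be explored numerically with a short sketch such as the one below. It assumes that the mixture proportion is the weight placed on the individual component, so that each mixture is (1 - λ) q + λ p̃; the abstract does not state which component the proportion refers to. The function names are hypothetical, and divergences are measured in bits.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence(p1, p2):
    """Symmetric Jensen-Shannon divergence with equal weights, in bits."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    m = 0.5 * (p1 + p2)  # m > 0 wherever p1 or p2 is, so the KL terms are finite
    return 0.5 * kl_divergence(p1, m) + 0.5 * kl_divergence(p2, m)


def mixture(q, p_tilde, lam):
    """Mixture (1 - lam) * q + lam * p_tilde (assumed parameterization)."""
    q, p_tilde = np.asarray(q, dtype=float), np.asarray(p_tilde, dtype=float)
    return (1.0 - lam) * q + lam * p_tilde


# Example: common part q, and individual parts with disjoint supports.
q = np.array([0.5, 0.5, 0.0, 0.0])
p_tilde_1 = np.array([0.0, 0.0, 1.0, 0.0])
p_tilde_2 = np.array([0.0, 0.0, 0.0, 1.0])
divergences = [
    js_divergence(mixture(q, p_tilde_1, lam), mixture(q, p_tilde_2, lam))
    for lam in (0.0, 0.5, 1.0)
]
```

Sweeping the mixture proportions in this way shows the divergence growing from zero, when both mixtures reduce to the common distribution, up to its maximum of one bit, when the two mixtures have disjoint supports.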
Geiger Bernhard
2018
This entry for the 2018 MDPI English Writing Prize has been published as a chapter of "The Global Benefits of Open Research", edited by Martyn Rittman.