Lovric Mario
2018
The objects are numbered. The Y-variable is the boiling point; the other features are structural features of the molecules. In the outlier column, outliers are marked with a value of 1. The data is derived from a published chemical dataset on boiling point measurements [1] and from public data [2]. Features were generated with the RDKit Python library [3]. The dataset was infused with known outliers (~5%) based on significant structural differences, i.e. polar and non-polar molecules. References: [1] Cherqaoui D., Villemin D. Use of a Neural Network to determine the Boiling Point of Alkanes. J Chem Soc Faraday Trans. 1994;90(1):97–102. [2] PubChem, https://pubchem.ncbi.nlm.nih.gov/ [3] RDKit: Open-source cheminformatics; http://www.rdkit.org
Lovric Mario, Stipaničev Draženka, Repec Siniša, Malev Olga, Klobučar Göran
2018
Lovric Mario, Krebs Sarah, Cemernek David, Kern Roman
2018
The use of big data technologies has a deep impact on today’s research (Tetko et al., 2016) and industry (Li et al., n.d.), but also on public health (Khoury and Ioannidis, 2014) and the economy (Einav and Levin, 2014). These technologies are particularly important for manufacturing sites, where complex processes are coupled with large amounts of data, for example in the chemical and steel industries. This data originates from sensors, processes, and quality testing. Typical applications of these technologies are predictive maintenance and the optimisation of production processes. The media have made the term “big data” a buzzword without going too deep into the topic. We noted a lack of understanding among users of the technologies and techniques behind it, which makes the application of such technologies challenging. In practice the data is often unstructured (Gandomi and Haider, 2015), and a lot of resources are devoted to cleaning and preparation, but also to understanding causalities and relevance among features. The latter requires domain knowledge, making big data projects challenging not only from a technical perspective but also from a communication perspective. Therefore, there is a need to rethink the big data concept among researchers and manufacturing experts, including topics like data quality, knowledge exchange and the technology required. The scope of this presentation is to present the main pitfalls in applying big data technologies amongst users from industry, explain scaling principles in big data projects, and demonstrate common challenges in an industrial big data project.
Lovric Mario
2018
The amount of data available today is increasing significantly, and big data has become a strong buzzword in research. Chemistry students therefore have to be well prepared for an upcoming age in which they not only rule the laboratories but are modelers and data scientists as well. This tutorial covers the very basics of molecular modeling and data handling using Python and Jupyter Notebook. It is the first in a series aiming to cover the relevant topics in machine learning, QSAR and molecular modeling, as well as the basics of Python programming.
Babić Sanja, Barišić Josip, Stipaničev Draženka, Repec Siniša, Lovric Mario, Malev Olga, Čož-Rakovac Rozalindra, Klobučar GIV
2018
Quantitative chemical analyses of 428 organic contaminants (OCs) confirmed the presence of 313 OCs in the sediment extracts from the river Sava, Croatia. Pharmaceuticals were present in higher concentrations than pesticides, thus confirming their increasing threat to freshwater ecosystems. Toxicity evaluation of the sediment extracts from four locations (Jesenice, Rugvica, Galdovo and Lukavec) using the zebrafish embryotoxicity test (ZET) accompanied by semi-quantitative histopathological analyses exhibited good correlation with the cumulative number and concentrations of OCs at the investigated sites (10,048.6, 15,222.8, 1,247.6, and 9,130.5 ng/g respectively) and proved its role as a good indicator of the toxic potential of complex contaminant mixtures. Toxicity prediction of sediment extracts and sediment was assessed using the Toxic unit (TU) approach and PBT (persistence, bioaccumulation and toxicity) ranking. Also, prior-knowledge informed chemical-gene interaction models were generated and graph mining approaches used to identify the OCs and genes most likely to be influential in these mixtures. Predicted toxicity of sediment extracts (TUext) for the sampled locations was similar to the results obtained by ZET and associated histopathology, with Rugvica sediment being the most toxic, followed by Jesenice, Lukavec and Galdovo. Sediment TU (TUsed) favoured OCs with a low octanol-water partition coefficient, such as the herbicide glyphosate and the antibiotics ciprofloxacin and sulfamethazine, thus indicating the locations containing higher concentrations of these OCs (Galdovo and Rugvica) as most toxic. The results suggest that comprehensive in silico sediment toxicity predictions should give equal attention to organic contaminants with either very low or very high log Kow.
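The abstract above does not spell out how toxic units are computed. As a minimal sketch, assuming the commonly used definition (the TU of a compound is its measured concentration divided by an effect concentration such as an LC50 or EC50 in the same units, and the TU of a mixture is the sum over its compounds), the calculation could look as follows. The function name, compound names and all numeric values are hypothetical and are not taken from the study.

```python
def toxic_units(concentrations, effect_concentrations):
    """Return per-compound toxic units and their sum for one sample.

    Assumes the common definition TU = concentration / effect concentration,
    with both values expressed in the same units. This is an illustrative
    sketch, not the procedure used in the study.
    """
    per_compound = {}
    for name, concentration in concentrations.items():
        effect = effect_concentrations[name]
        if effect <= 0:
            raise ValueError(f"effect concentration for {name} must be positive")
        per_compound[name] = concentration / effect
    return per_compound, sum(per_compound.values())


# Hypothetical measured concentrations and effect concentrations (same units).
sample = {"compound_a": 2.0, "compound_b": 5.0}
effect_values = {"compound_a": 10.0, "compound_b": 50.0}

per_compound, mixture_tu = toxic_units(sample, effect_values)
for name, tu in per_compound.items():
    print(f"{name}: TU = {tu:.2f}")
print(f"mixture: TU = {mixture_tu:.2f}")
```

Under this definition a compound contributes more to the mixture TU the closer its concentration comes to its effect concentration, so samples can be ranked by their summed TU.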