Hier finden Sie von Know-Center MitarbeiterInnen verfasste wissenschaftliche Publikationen


Lovric Mario, Molero Perez Jose Manuel, Kern Roman

PySpark and RDKit: moving towards Big Data in QSAR

Molecular Informatics, Wiley, 2019

The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from commodity scalable hardware can be employed for cheminformatic calculations and query operations with basic knowledge in Python programming and understanding of the resilient distributed datasets (RDD). Three use cases of cheminfomatical computing in Spark on the Hadoop cluster are presented; querying substructures, calculating fingerprint similarity and calculating molecular descriptors. The source code for the PySpark‐RDKit implementation is provided. The use cases showed that Spark provides a reasonable scalability depending on the use case and can be a suitable choice for datasets too big to be processed with current low‐end workstations

Babić Sanja, Barišić Josip, Stipaničev Draženka, Repec Siniša, Lovric Mario, Malev Olga, Čož-Rakovac Rozalindra, Klobučar GIV

Assessment of river sediment toxicity: Combining empirical zebrafish embryotoxicity testing with in silico toxicity characterization

Science of the Total Environment, Elsevier, 2018

Quantitative chemical analyses of 428 organic contaminants (OCs) confirmed the presence of 313 OCs in the sediment extracts from river Sava, Croatia. Pharmaceuticals were present in higher concentration than pesticides thus confirming their increasing threat to freshwater ecosystems. Toxicity evaluation of the sediment extracts from four locations (Jesenice, Rugvica, Galdovo and Lukavec) using zebrafish embryotoxicity test (ZET) accompanied with semi-quantitative histopathological analyses exhibited good correlation with cumulative number and concentrations of OCs at investigated sites (10,048.6, 15,222.8, 1,247.6, and 9,130.5 ng/g respectively) and proved its role as a good indicator of toxic potential of complex contaminant mixtures. Toxicity prediction of sediment extracts and sediment was assessed using Toxic unit (TU) approach and PBT (persistence, bioaccumulation and toxicity) ranking. Also, prior-knowledge informed chemical-gene interaction models were generated and graph mining approaches used to identify OCs and genes most likely to be influential in these mixtures. Predicted toxicity of sediment extracts (TUext) for sampled locations was similar to the results obtained by ZET and associated histopathology resulting in Rugvica sediment as being the most toxic, followed by Jesenice, Lukavec and Galdovo. Sediment TU (TUsed) favoured OCs with low octanol-water partition coefficient like herbicide glyphosate and antibiotics ciprofloxacin and sulfamethazine thus indicating locations containing higher concentrations of these OCs (Galdovo and Rugvica) as most toxic. Results suggest that comprehensive in silico sediment toxicity predictions advocate providing equal attention to organic contaminants with either very low or very high log Kow

Lovric Mario

Molecular modeling of the quantitative structure activity relationship in Python – a tutorial (part I)

Journal of Chemists and Chemical Engineers, Croatian Society of Chemical Engineers, Zagreb, 2018

Today's data amount is significantly increasing. A strong buzzword in research nowadays is big data.Therefore the chemistry student has to be well prepared for the upcoming age where he does not only rule the laboratories but is a modeler and data scientist as well. This tutorial covers the very basics of molecular modeling and data handling with the use of Python and Jupyter Notebook. It is the first in a series aiming to cover the relevant topics in machine learning, QSAR and molecular modeling, as well as the basics of Python programming

Lovric Mario, Krebs Sarah, Cemernek David, Kern Roman


XII Meeting of Young Chemical Engineers, Zagreb, Kroatien, 2018

The use of big data technologies has a deep impact on today’s research (Tetko et al., 2016) and industry (Li et al., n.d.), but also on public health (Khoury and Ioannidis, 2014) and economy (Einav and Levin, 2014). These technologies are particularly important for manufacturing sites, where complex processes are coupled with large amounts of data, for example in chemical and steel industry. This data originates from sensors, processes. and quality-testing. Typical application of these technologies is related to predictive maintenance and optimisation of production processes. Media makes the term “big data” a hot buzzword without going to deep into the topic. We noted a lack in user’s understanding of the technologies and techniques behind it, making the application of such technologies challenging. In practice the data is often unstructured (Gandomi and Haider, 2015) and a lot of resources are devoted to cleaning and preparation, but also to understanding causalities and relevance among features. The latter one requires domain knowledge, making big data projects not only challenging from a technical perspective, but also from a communication perspective. Therefore, there is a need to rethink the big data concept among researchers and manufacturing experts including topics like data quality, knowledge exchange and technology required. The scope of this presentation is to present the main pitfalls in applying big data technologies amongst users from industry, explain scaling principles in big data projects, and demonstrate common challenges in an industrial big data project

Lovric Mario, Stipaničev Draženka , Repec Siniša , Malev Olga , Klobučar Göran

Combined toxic unit: Moving towards a multipath risk assessment strategy of organic contaminants in river sediment

The International Water Association, 2018


Lovric Mario

Chemical outlier dataset

Zenodo, 2018

The objects are numbered. The Y-variable are boiling points. Other features are structural features of molecules. In the outlier column the outliers are assigned with a value of 1.The data is derived from a published chemical dataset on boiling point measurements [1] and from public data [2]. Features were generated by means of the RDKit Python library [3]. The dataset was infused with known outliers (~5%) based on significant structural differences, i.e. polar and non-polar molecules. Cherqaoui D., Villemin D. Use of a Neural Network to determine the Boiling Point of Alkanes. J CHEM SOC FARADAY TRANS. 1994;90(1):97–102. RDKit: Open-source cheminformatics;
Kontakt Karriere

Hiermit erkläre ich ausdrücklich meine Einwilligung zum Einsatz und zur Speicherung von Cookies. Weiter Informationen finden sich unter Datenschutzerklärung

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.