DAMSS 2017

DAMSS 2017 took place November 30 - December 2, 2017. The challenge of organizing a scientific event on such focused subjects was again fully met: DAMSS 2017 featured 11 invited lectures, 11 oral presentations distributed across 4 sessions, and 47 poster presentations, with a total of 137 participants from 14 countries representing academia and research institutes.

Participants of DAMSS 2017

Here you will find the program of the DAMSS 2017 workshop.

Proceedings of the 9th International Workshop "Data Analysis Methods for Software Systems" (DAMSS), Druskininkai, Lithuania, November 30 - December 2, 2017 / Lithuanian Computer Society, Vilnius University Institute of Data Science and Digital Technologies, Lithuanian Academy of Sciences. Druskininkai: Vilnius University, 2017. ISBN: 978-9986-680-64-2, DOI: http://dx.doi.org/10.15388/DAMSS.2017


DAMSS 2017: Invited Speakers

Yaroslav D. Sergeyev is Distinguished Professor at the University of Calabria, Italy (professorship awarded by the Italian Government) and Head of Numerical Calculus Laboratory at the same university. He is also Member of the University International Council and Professor (part-time contract) at Lobachevsky Nizhniy Novgorod State University, Russia, Affiliated Researcher at the Institute of High Performance Computing and Networking of the Italian National Research Council, and Affiliated Faculty at the Center for Applied Optimization, University of Florida, Gainesville, USA. He is Elected President of the International Society of Global Optimization.
His research interests include numerical analysis, global optimization (since 2016 he has been Vice-President of the International Society of Global Optimization), infinity computing and calculus, philosophy of computations, set theory, number theory, fractals, parallel computing, and interval analysis. Prof. Sergeyev has been awarded several research prizes (the Pythagoras International Prize in Mathematics, Italy, 2010; EUROPT Fellowship, 2016; the Outstanding Achievement Award of the 2015 World Congress in Computer Science, Computer Engineering, and Applied Computing, USA; Honorary Fellowship, the highest distinction of the European Society of Computational Methods in Sciences, Engineering and Technology, 2015; the 2015 Journal of Global Optimization (Springer) Best Paper Award; the Lagrange Lecture, Turin University, Italy, 2010; the MAIK Prize for the best scientific monograph published in Russian, Moscow, 2008; etc.), as well as the 2017 Khwarizmi International Award. His list of publications contains more than 200 items, among them 5 books. He is a member of the editorial boards of 6 international journals and co-editor of 8 special issues. He has delivered more than 50 plenary and keynote lectures at prestigious international congresses, chaired 6 international conferences, and served on the Scientific Committees of more than 60 international congresses. He is Coordinator of numerous national and international research and educational projects. Software developed under his supervision is used in more than 40 countries. Numerous magazines, newspapers, and TV and radio channels have covered his research extensively.

Talk title: Lipschitz Global Optimization

Abstract: Global optimization is a thriving branch of applied mathematics, and an extensive literature is dedicated to it. In this lecture, the global optimization problem of a multidimensional function satisfying the Lipschitz condition over a hyperinterval with an unknown Lipschitz constant is considered. It is supposed that the objective function can be “black box”, multiextremal, and non-differentiable. It is also assumed that evaluation of the objective function at a point is a time-consuming operation. Many algorithms for solving this problem have been discussed in the literature. They can be distinguished, for example, by the way of obtaining information about the Lipschitz constant and by the strategy of exploration of the search domain. Different exploration techniques based on various adaptive partition strategies are analyzed. The main attention is dedicated to two types of algorithms. The first of them is based on using space-filling curves in global optimization. A family of derivative-free numerical algorithms applying space-filling curves to reduce the dimensionality of the global optimization problem is discussed. A number of unconventional ideas, such as adaptive strategies for estimating the Lipschitz constant, balancing global and local information to accelerate the search, etc., are presented. Diagonal global optimization algorithms are the second type of methods under consideration. They have a number of attractive theoretical properties and have proved to be efficient in solving applied problems. In these algorithms, the search hyperinterval is adaptively partitioned into smaller hyperintervals, and the objective function is evaluated only at the two vertices corresponding to the main diagonal of the generated hyperintervals. It is demonstrated that the traditional diagonal partition strategies do not fulfil the requirements of computational efficiency, since they execute many redundant evaluations of the objective function.
A new adaptive diagonal partition strategy that allows one to avoid such computational redundancy is described. Some powerful multidimensional global optimization algorithms based on the new strategy are introduced. Extensive numerical experiments are performed using the GKLS generator, which is nowadays used in more than 40 countries to test numerical methods.
Results of the tests demonstrate that the proposed methods outperform their competitors in terms of both the number of trials of the objective function and the qualitative analysis of the search domain, characterized by the number of generated hyperintervals. Possible generalizations to problems with multiextremal partially defined constraints and the use of parallel computations are discussed briefly, and theoretical results on the possible speed-up are presented.
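The space-filling-curve and diagonal methods discussed above are beyond a short sketch, but the underlying Lipschitz idea can be illustrated with the classical one-dimensional Piyavskii-Shubert scheme. This is a hedged illustration only, not the algorithms from the talk: it assumes a known Lipschitz constant `L`, whereas the talk's methods estimate it adaptively, and the function name is ours.

```python
# Minimal 1-D Piyavskii-Shubert sketch: minimize f on [a, b] assuming a
# KNOWN Lipschitz constant L (illustrative assumption; the talk's methods
# estimate L adaptively and work in many dimensions).
def piyavskii_shubert(f, a, b, L, n_iter=30):
    xs = [a, b]
    fs = [f(a), f(b)]
    for _ in range(n_iter):
        best_lb, best_x = None, None
        pts = sorted(zip(xs, fs))
        for (x1, f1), (x2, f2) in zip(pts, pts[1:]):
            # Intersection of the two Lipschitz cones over [x1, x2] gives
            # the minimizer of the saw-tooth lower bound on this interval.
            x_new = 0.5 * (x1 + x2) + (f1 - f2) / (2 * L)
            lb = 0.5 * (f1 + f2) - 0.5 * L * (x2 - x1)
            if best_lb is None or lb < best_lb:
                best_lb, best_x = lb, x_new
        # Evaluate f where the global lower bound is smallest.
        xs.append(best_x)
        fs.append(f(best_x))
    i = min(range(len(fs)), key=fs.__getitem__)
    return xs[i], fs[i]
```

Each iteration refines the piecewise-linear minorant and samples where it is lowest, balancing global exploration against local refinement.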

Prof. Dr. Bożena Kostek
Bożena Kostek holds a professorship at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology (GUT), Poland. Prof. Kostek is a corresponding member of the Polish Academy of Sciences and a Fellow of the Audio Engineering Society. She is Head of the Audio Acoustics Laboratory. She has led a number of research projects (39 in total) sponsored by the Polish Ministry of Science and Education, the Polish National Centre for R&D, and the National Science Centre, and has also taken part in projects under the EU 6th and 7th Framework Programmes. Her research activities are interdisciplinary; her main interests focus on the cognitive bases of hearing and vision, music information retrieval, musical acoustics, studio technology, Quality of Experience, human-computer interaction (HCI), and applications of soft computing and computational intelligence to these domains. She has also authored or co-authored 40 technological solutions. Prof. Kostek has published and presented more than 600 scientific papers in journals (more than 120 articles), book chapters, and conference proceedings, and has received several patents. Her work has gathered more than 1850 citations to date. In 1999 she published the book “Soft Computing in Acoustics“, in 2002 she co-authored a book devoted to computer technology applications in audiology and speech therapy, and in 2005 she published another monograph, “Perception-based Data Processing in Acoustics“. Prof. Kostek was also one of the Editors of the first volume of the Transactions on Rough Sets and a Guest Editor of the Journal of Intelligent Information Systems.

Talk title: Rough Sets Applied to Music Informatics

Abstract: In this presentation, music data processing and mining in large databases are investigated based on soft computing methods. First, the principles of rule-based classifiers, and of rough sets in particular, are presented, showing their usability in music informatics. Several examples of music processing are shown, including music genre/mood classification, automatic music collection tagging, personal recommendation, composing a playlist, etc. Next, for the purpose of this research study, 30,000 audio files divided into different music genres/moods were gathered to form a database. All files contained in this database were parametrized, resulting in a feature vector of 173 parameters per file. To reduce the dimensionality of the data, correlation analysis was performed. This was then compared to rough set-based processing of the same feature vectors, as that algorithm produces reducts containing the most promising descriptors in the context of music genre/mood recognition. Classification tests were conducted using the Rough Set Exploration System (RSES), a toolset for analyzing data with methods based on rough set theory, as well as in the WEKA environment using the k-Nearest Neighbors (kNN), Bayesian Network, and Sequential Minimal Optimization (SMO) algorithms. All results were analyzed in terms of recognition rate and computation time efficiency. In conclusion, the potential of the rough set-based approach applied to music informatics is underlined, as it offers the possibility of dealing with imprecise, vague, and indiscernible data objects.
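As a toy illustration of the rough-set machinery behind such classifiers (hypothetical track data with made-up attributes; not the RSES toolset or the 173-parameter vectors used in the study), the lower and upper approximations of a target class can be sketched as:

```python
# Toy rough-set lower/upper approximations. Objects (e.g. music tracks)
# are described by attribute values; the indiscernibility classes induced
# by the chosen attributes approximate a target set of objects.
from collections import defaultdict

def approximations(objects, attrs, target):
    """objects: dict name -> dict of attribute values;
    attrs: attributes defining indiscernibility; target: set of names."""
    classes = defaultdict(set)
    for name, desc in objects.items():
        key = tuple(desc[a] for a in attrs)
        classes[key].add(name)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:   # class wholly inside: certainly in target
            lower |= cls
        if cls & target:    # class overlaps: possibly in target
            upper |= cls
    return lower, upper
```

Objects in the upper but not the lower approximation form the boundary region, which is exactly where rough sets capture the "imprecise, vague and indiscernible" data the abstract mentions.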

M. Heričko is a full-time professor at the Institute of Informatics, University of Maribor. He is the head of the Information Systems Laboratory and head of the Institute of Informatics. He received his PhD in Computer Science from the University of Maribor in 1998. His main research interests include all aspects of information systems development, software and service engineering, agile methods, process frameworks, software metrics, functional size measurement, SOA, component-based development, object orientation, software reuse, and software patterns. Dr. M. Heričko has been a project or work-package coordinator in several applied and international research projects, and a committee member and chair of several international conferences.

Talk title: Challenges in Metric-based Identification of Critical Software Components

Abstract: Knowing reliable thresholds for software metrics can contribute to product quality evaluation and, consequently, increase the usefulness of software metrics in practice. In order to deliver a software product with the required quality, it is crucial to address and manage the quality assurance domain properly. Software metrics can be used to reflect the qualitative characteristics of software components in a quantitative way, presenting a control instrument in the software development and maintenance process. Software metrics assess software from different views but, overall, reflect the internal quality of software systems. The usefulness of metrics without knowing their reference values is very limited, due mainly to interpretation difficulties. To overcome these difficulties, it is important that reliable reference/threshold values of software metrics are available. Thresholds are heuristic values used to set ranges of desirable and undesirable metric values for measured software and, furthermore, to identify anomalies, which may be actual problems. Threshold values indicate whether a metric value is in the normal range and, consequently, provide a scale for paying attention to components that exceed the threshold. There are many approaches available for computing threshold values. In this presentation we address some challenges related to threshold derivation approaches and their application in practical projects, i.e., finding representative benchmark data; addressing the different statistical properties of software metrics; using a suitable statistical approach for threshold derivation; applying metric threshold values in practice to different code contexts; and maximizing the intersection between the sets of code smells detected by different tools.
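One common family of threshold-derivation approaches, offered here only as an illustrative sketch and not necessarily the one presented in the talk, reads thresholds off percentiles of a benchmark distribution of metric values (the percentile levels and risk labels below are assumptions):

```python
# Sketch of percentile-based threshold derivation from benchmark data.
# Software metric distributions are typically heavy-tailed, which is one
# reason mean/std-based thresholds mislead and percentile schemes are used.
def derive_thresholds(metric_values, percentiles=(70, 80, 90)):
    """Return cut-offs, e.g. for moderate / high / very-high risk bands."""
    data = sorted(metric_values)
    n = len(data)
    out = []
    for p in percentiles:
        # nearest-rank percentile, clamped to valid indices
        k = max(0, min(n - 1, int(round(p / 100 * n)) - 1))
        out.append(data[k])
    return tuple(out)
```

A component whose metric value exceeds the top cut-off would then be flagged for inspection, which is the "paying attention to components that exceed the threshold" step described in the abstract.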

Kaisa Miettinen is Professor of Industrial Optimization and Vice-Rector of the University of Jyväskylä. Her research interests include the theory, methods, applications, and software of nonlinear multiobjective optimization, including interactive and evolutionary approaches, and she heads the Research Group on Industrial Optimization. She has authored over 150 refereed journal, proceedings, and collection papers, edited 13 proceedings, collections, and special issues, and written the monograph “Nonlinear Multiobjective Optimization“. She is a member of the Finnish Academy of Science and Letters, Section of Science, and the Immediate Past President of the International Society on Multiple Criteria Decision Making. She belongs to the editorial boards of five international journals and the Steering Committee of Evolutionary Multi-Criterion Optimization. She has worked at IIASA, the International Institute for Applied Systems Analysis in Austria; the KTH Royal Institute of Technology in Stockholm, Sweden; and the Helsinki School of Economics in Finland. In July 2017, she received the Georg Cantor Award of the International Society on Multiple Criteria Decision Making for independent inquiry in developing innovative ideas in the theory and methodology of MCDM. At the University of Jyväskylä, she heads the profiling area Decision Analytics utilizing Causal Models and Multiobjective Optimization.

Talk title: Data-driven Decision Support with Multiobjective Optimization

Abstract: Thanks to digitalization, we can collect and access various types of data, and the question arises of how to make the most of them. We can use descriptive or predictive analytics, but to make recommendations based on the data, we need prescriptive or decision analytics. If the decision problems derived contain multiple conflicting objectives, we must employ methods of multiobjective optimization. Lot sizing is an example of such a data-driven optimization problem. Lot sizing is important in production planning and inventory management, where a decision maker needs support, in particular when demand is stochastic. We propose a problem formulation with three objectives and solve it with interactive multiobjective optimization methods. In interactive methods, a decision maker directs the search for the best balance between the conflicting objectives by providing preference information. In this way, (s)he can learn what kinds of solutions are available for the problem and also how feasible his/her preferences are. We consider the lot sizing problem of a Finnish production company. The results of this data-driven interactive multiobjective optimization approach are encouraging.
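The efficient solutions an interactive method navigates among can be characterized by Pareto dominance. A minimal sketch follows (assuming all objectives are minimized; this is generic background, not part of the lot-sizing formulation in the talk):

```python
# Minimal Pareto filter: keep the non-dominated objective vectors.
# All objectives are assumed to be minimized.
def pareto_front(points):
    """Return points not dominated by any other point in the list."""
    def dominates(a, b):
        # a dominates b: no worse in every objective, strictly better in one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

An interactive method does not enumerate this front; it steers toward the region of it that best matches the decision maker's stated preferences.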

Mario Cannataro is a full professor of computer engineering at the University "Magna Graecia" of Catanzaro, Italy, and Director of the university's Data Analytics research centre. His current research interests include bioinformatics, parallel and distributed computing, data mining, problem solving environments, and medical informatics. He is a member of the editorial boards of Briefings in Bioinformatics, the Encyclopedia of Bioinformatics and Computational Biology, and the Encyclopedia of Systems Biology. He has been guest editor of several special issues on bioinformatics and serves as a program committee member of several conferences. He has published three books and more than 200 papers in international journals and conference proceedings. Prof. Cannataro is a Senior Member of IEEE, ACM, and BITS (Bioinformatics Italian Society), and a member of the Board of Directors of the ACM Special Interest Group on Bioinformatics, Computational Biology, and Biomedical Informatics (SIGBio).

Talk title: High Performance Management and Analysis of Omics Data: Experiences at University Magna Graecia of Catanzaro

Abstract: Genomics, proteomics, and interactomics refer to the study of the genome, proteome, and interactome of an organism. Such omics disciplines are attracting increasing interest in the scientific community due to the availability of novel platforms for the investigation of the cell machinery, such as mass spectrometry, microarrays, and next-generation sequencing, which are producing an overwhelming amount of experimental omics data.
On the other hand, the large volumes of omics data pose new challenges, both for the efficient storage and integration of the data and for their efficient preprocessing and analysis. Moreover, both raw experimental data and information derived from raw data are increasingly stored in various databases spread all over the Internet, which are not fully integrated.
Thus, managing omics data requires both support and spaces for data storing as well as procedures and structures for data preprocessing, analysis, and sharing. The resulting scenario comprises a set of methodologies and bioinformatics tools, often implemented as services, for the management and analysis of omics data stored locally or in geographically distributed biological databases.
The talk describes some parallel and distributed bioinformatics tools for the preprocessing and analysis of genomics and interactomics data, developed at the Bioinformatics Laboratory of the University Magna Graecia of Catanzaro. Tools for gene expression and genotyping (SNP) data analysis (e.g. micro-CS, DMET-Analyzer, DMET-Miner, OSAnalyzer, coreSNP) as well as for functional enrichment analysis (e.g. GO-WAR, GoD) will be briefly presented.

Professor Fokianos is a professor at the Department of Mathematics & Statistics, University of Cyprus. He obtained a B.Sc. degree in Mathematics from the University of Ioannina, Greece, and subsequently received an M.A. and a Ph.D. in Statistics from the University of Maryland at College Park, USA. He was a visiting Assistant Professor of Statistics at The Ohio State University, USA, for a period of 2.5 years and has international collaborations with many institutions across Europe and the USA. His research interests focus on the analysis of and methodology for time series data and on semiparametric models. In particular, he has published extensively over the last years on methodology for the analysis of integer-valued time series. He is co-author, with B. Kedem, of the book Regression Models for Time Series Analysis, published by Wiley in 2002. He has co-edited two volumes and is the author of around 60 peer-reviewed articles. He has been an elected member of the International Statistical Institute since 2005 and is an Associate Editor of Statistics and Probability Letters, the Journal of Time Series Analysis, and Statistics. He has also served as an Associate Editor for the Journal of Environmental Statistics and Computational Statistics & Data Analysis.

Talk title: Tests of Independence Based on Multivariate Distance Correlation Matrix

Abstract: We introduce the notions of multivariate auto-distance covariance and correlation functions for time series analysis.
These concepts have been discussed recently in the context of both independent data and time series data, but we extend them in a different direction by putting forward their matrix version. We discuss their interpretation and give consistent estimators for practical implementation. Additionally, we develop a test of the i.i.d. hypothesis for multivariate time series data. The proposed test statistic performs better than the standard multivariate version of the Ljung-Box test statistic. Several computational aspects are discussed and some data examples are provided to illustrate the methodology.
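For intuition, the univariate sample distance correlation that the matrix-valued auto-distance correlation generalizes can be computed directly from doubly centered distance matrices. This is an O(n²) pure-Python sketch for illustration, not the estimators proposed in the talk:

```python
# Sample distance correlation of two univariate samples x, y of equal
# length: double-center the pairwise distance matrices, then combine
# the resulting V-statistics. Returns a value in [0, 1]; it is 0 (in the
# population) exactly when x and y are independent.
def dcor(x, y):
    n = len(x)
    def centered(v):
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        row = [sum(r) / n for r in d]          # row (= column) means
        tot = sum(row) / n                     # grand mean
        return [[d[i][j] - row[i] - row[j] + tot
                 for j in range(n)] for i in range(n)]
    A, B = centered(x), centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvarx = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvary = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    if dvarx * dvary == 0:
        return 0.0
    return (dcov2 / (dvarx * dvary) ** 0.5) ** 0.5
```

Unlike ordinary correlation, distance correlation also detects nonlinear dependence, which is what makes its auto-correlation version attractive for testing the i.i.d. hypothesis.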

Associate Professor at Brno University of Technology, Czech Republic. His interests focus on retinal imaging and image processing, perfusion imaging, image registration, image segmentation, classification, and pattern recognition. He is the author or co-author of 14 papers in impact-factor journals, 12 papers in peer-reviewed journals, more than 60 conference papers, one book chapter, and two study textbooks.

Talk title: Retinal Imaging and Image Processing for Glaucoma Diagnosis

Abstract: Retinal imaging is still a developing field in ophthalmology. Optical coherence tomography, in particular, has changed the quality of retinal imaging. Nevertheless, other new modalities are also important, because they are able to image the functional properties of retinal tissue. Progress in this research area helps in understanding many retinal diseases, including glaucoma, which is still not well understood. This talk will briefly discuss imaging techniques for glaucoma diagnosis and the image processing techniques used in this area. The main part will describe our current research activities with the Department of Ophthalmology, Friedrich-Alexander-University Erlangen–Nürnberg: the possibilities of video-ophthalmoscopy and parallel retinal imaging, with a focus on the functional aspects of this modality.

Dr. Michael Emmerich is Associate Professor at LIACS, Leiden University, and leader of the Multicriteria Optimization and Decision Analysis research group. He was born in 1973 in Coesfeld (Germany) and received his doctorate in 2005 from Dortmund University (promoter: H.-P. Schwefel). He carried out projects as a researcher at ICD e.V. (Germany), IST Lisbon, the University of the Algarve (Portugal), ACCESS Material Science e.V. (Germany), and the FOM/AMOLF institute on Fundamental Science of Matter (Netherlands). He is known for pioneering work on model-assisted and indicator-based multiobjective optimization, has edited four books, and has co-authored more than 120 research papers and articles on multicriteria optimization algorithms and their applications in computational chemistry and engineering. Moreover, he has organized more than four international conferences and Lorentz Center workshops on optimization and decision support algorithms and their applications.

Talk title: Bayesian Approaches to Multiobjective Optimization and Decision Analysis and its Integration into Industry 4.0 Frameworks

Abstract: The talk deals with the utilization of big and small data sets of system evaluations for the purpose of finding improvements and making better decisions. Decision making is typically based on multiple objectives, and the goal is to find solutions that are efficient (avoiding lose-lose scenarios) and that represent interesting alternatives. In systems optimization, control parameters and configurations are searched for that yield improvements with respect to the current state. As examples we will look at industrial processes from steel production, chemical production, and biogas plant control, and discuss how to find optimal process setups and control strategies with respect to multiple objectives (robustness, average performance, environmental impact, etc.). One of the challenges within the so-called "Industry 4.0" frameworks is to integrate advanced measurement data and simulation data in decision support systems. Systems optimization forms an important ingredient of this. Techniques from Bayesian global optimization, which originated in Lithuania in the 1970s, appear to offer mathematically rigorous, yet flexible, methodologies. The talk will focus on new developments of such techniques and their generalization to big data sets and multiobjective optimization, which makes them viable components of "Industry 4.0" solutions.
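To give a flavor of Bayesian global optimization, the expected-improvement acquisition function (one standard choice, not necessarily the variant discussed in the talk) scores a candidate point from a surrogate model's posterior mean and standard deviation; the surrogate itself (e.g. a Gaussian process fitted to evaluated points) is omitted here:

```python
# Expected improvement for minimization: E[max(best_f - Y, 0)] where
# Y ~ N(mu, sigma^2) is the surrogate's prediction at a candidate point.
from math import erf, exp, pi, sqrt

def expected_improvement(mu, sigma, best_f):
    if sigma <= 0:
        # Degenerate (no predictive uncertainty): deterministic gain.
        return max(best_f - mu, 0.0)
    z = (best_f - mu) / sigma
    phi = exp(-0.5 * z * z) / sqrt(2 * pi)   # standard normal pdf
    Phi = 0.5 * (1 + erf(z / sqrt(2)))       # standard normal cdf
    return (best_f - mu) * Phi + sigma * phi
```

The optimizer evaluates the expensive system where this score is largest, trading off exploitation (low predicted mean) against exploration (high predictive uncertainty).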

Researcher at the Centre for Biomedical Technology, Universidad Politécnica de Madrid; he graduated in Aeronautical Management from the Universidad Autónoma de Madrid and holds a PhD from the Universidade Nova de Lisboa. With more than 100 peer-reviewed contributions published in international conferences and journals, he has vast experience in complex systems and data mining research. His main topics of interest are complex networks and data science, both from a theoretical perspective and through their application to several real-world problems. His contributions include the first research works in which complex networks and data mining algorithms were combined for the study of biomedical problems, specifically the creation of diagnostic tools for Mild Cognitive Impairment and Alzheimer's disease. He is a member of the editorial teams of Nature Scientific Reports, the European Journal of Social Behaviour, PeerJ, and PeerJ Computer Science. He is also the National Representative for the CA15120 COST Action "Open Multiscale Systems Medicine" and leader of one of its working groups.

Talk title: How Can Statistical Physics Help in Data Mining Tasks?

Abstract: The statistical physics concept of complex networks shares many characteristics with data mining, more than may prima facie appear. Not only do both share the same general goal, that of extracting information from data to ultimately create compact and quantifiable representations; they also often address similar problems. While these two scientific fields have mostly walked separate paths, they are now starting to converge. This talk will briefly review the concepts and hypotheses underlying both approaches, and how they have historically been used to perform different data-related tasks. We will additionally discuss how complex networks and data mining are expected to interact in the future, with a special focus on the emerging field of systems medicine.