29 September 2019 to 3 October 2019
Montenegro, Budva, Becici
Europe/Moscow timezone

Abstracts of lectures, tutorials


Lecture. Distributed computing and Big Data

Lecturer: Korenkov V.V. 

    The experiments at the Large Hadron Collider (LHC) at CERN (Geneva, Switzerland) play a leading role in scientific research not only in elementary particle physics and nuclear physics but also in the field of Big Data analytics. The Worldwide LHC Computing Grid (WLCG), a global distributed system for processing, storing and analyzing the data, brings together the resources of about 180 computer centers in 50 countries, with a total storage capacity of more than one exabyte. Data processing and analysis are carried out using high-performance grid complexes, academic, national and commercial cloud computing resources, supercomputers and other resources. JINR is actively involved in the integration of distributed heterogeneous resources and the development of Big Data technologies to support modern megaprojects in such data-intensive fields of science. JINR is also working on the construction of the unique NICA accelerator complex, which requires new approaches to the implementation of a distributed infrastructure for processing and analysis of experimental data.
    The lecture provides an overview of the major integrated infrastructures supporting megaprojects and the trends in their evolution, and presents the main results of the Laboratory of Information Technologies of the Joint Institute for Nuclear Research (JINR) in the development of distributed computing. A brief overview is given of the distributed computing projects carried out by LIT JINR in Russia, at CERN, in the USA, in China and in the JINR Member States, together with an analysis of work on the integration of HPC, grid, cloud and Big Data technologies for large international projects.


Lecture. Deep learning applications in experimental High Energy and Nuclear Physics

Lecturer: Ososkov G.A.

    After pointing out the relationship between machine learning and deep learning, the basics of artificial neural networks, from a single artificial neuron to deep neural networks, are described. A brief introduction to the various types of deep neural networks and the problems of training them is given. The important computational problems arising in the training of deep neural networks are highlighted: the curse of dimensionality, overtraining, getting stuck in a local minimum, vanishing or exploding gradients, and methods for solving them. These topics are illustrated with examples of problems that arise when processing data in experimental high-energy physics.
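The vanishing-gradient problem mentioned above can be seen with a few lines of arithmetic: the derivative of the sigmoid activation never exceeds 0.25, so a gradient backpropagated through many sigmoid layers shrinks geometrically. A minimal illustration (the 20-layer depth is an arbitrary choice for the example):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Backpropagation multiplies the gradient by sigmoid'(x) <= 0.25 at
# every layer, so the product shrinks geometrically with depth --
# the vanishing-gradient problem.
grad = 1.0
for layer in range(20):
    grad *= sigmoid_grad(0.0)  # 0.25, the maximum of sigmoid'

print(f"gradient after 20 sigmoid layers: {grad:.3e}")
```

This is one reason modern deep networks prefer activations such as ReLU, whose derivative does not saturate for positive inputs.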


Tutorial. Development with Python

Lecturer: Pelevanuk I.S.

    In this tutorial, we will talk about tools and approaches that can make a developer's life easier: testing, linting, continuous integration, package and environment management, and more.
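As a taste of the testing workflow covered in the tutorial, here is a minimal sketch: a function under test plus pytest-style test functions (the function name `slugify` and the tests are illustrative, not part of the tutorial materials; plain asserts are used so the example also runs without pytest installed):

```python
def slugify(title: str) -> str:
    """Turn an arbitrary title into a URL-friendly slug."""
    return "-".join(title.lower().split())

# pytest discovers functions named test_*; here we call them directly.
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_extra_spaces():
    assert slugify("  Big   Data  ") == "big-data"

test_slugify_basic()
test_slugify_extra_spaces()
print("all tests passed")
```

With pytest installed, running `pytest` in the project directory would discover and run these tests automatically; a CI service can run the same command on every commit.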


Tutorial. Advanced Python for data analysis

Lecturer: Kadochnikov I.S. 

    The practical stages of working with data in Python are covered: the Pandas framework; tools for data profiling, cleaning, analysis and visualization; and preparing data for machine learning, including feature extraction.
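A minimal sketch of the profile-clean-extract cycle in Pandas (the tiny dataset and the derived feature are made up for illustration):

```python
import pandas as pd

# Hypothetical mini-dataset with a missing value.
df = pd.DataFrame({
    "energy": [1.2, 3.4, None, 5.1],
    "charge": [1, -1, 1, -1],
})

# Profiling: count missing values per column.
missing = df["energy"].isna().sum()

# Cleaning: impute missing energies with the column median.
df["energy"] = df["energy"].fillna(df["energy"].median())

# Feature extraction: derive a new column for a downstream model.
df["energy_x_charge"] = df["energy"] * df["charge"]

print(df.describe())
```

`df.describe()` gives a quick statistical profile; in practice, dedicated profiling tools extend this with distributions, correlations and data-quality warnings.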


Tutorial. SDN Application Development

Lecturers: Stepanov E.P., Antonenko V.A.

RuNOS School

Goals
    • To get acquainted with Software Defined Networking (SDN) and learn its basics and main protocols;
    • To get acquainted with the RuNOS SDN controller, understand the difference between core and user applications for RuNOS, and understand application dependencies;
    • To learn how to develop a simple application for the RuNOS SDN controller and how the application life cycle is organized.
Workflow
    1. Deploying the network topology with the “mininet” tool (introduction to SDN, OpenVSwitch, the mininet tool, and the topology library in mininet: single switch, linear, tree).
    2. Deploying the RuNOS SDN controller (introduction to the installation process based on “nix”, the application template based on “conan”, deploying the basic version of the controller).
    3. Learning the OpenFlow protocol and understanding its details with the Wireshark tool (introduction to network management approaches in SDN, the OpenFlow protocol, the Wireshark tool).
    4. Developing the “LoadBalancer” user application for the RuNOS controller (introduction to the main core applications of RuNOS, coding an application for traffic balancing in SDN).
    5. Delivering the developed application for the RuNOS controller via “conan” (introduction to the package management system for RuNOS, publishing the package to the distribution server, package delivery example).
    6. Developing an application that depends on another application (coding the “HostManager” application, which depends on information from the “Learning Switch” application).
Requirements for listeners
    1. Intermediate C++ skills.
    2. Basic knowledge of computer networking.
    3. Basic experience with GNU/Linux.
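Step 3 of the workflow revolves around reading OpenFlow messages on the wire. Every OpenFlow message begins with a common 8-byte header (version, type, length, transaction id) in network byte order; the sketch below parses one, as you might when cross-checking what Wireshark shows (the hand-built OFPT_HELLO message is an illustrative example):

```python
import struct

# OpenFlow common header: version (1 byte), type (1 byte),
# length (2 bytes), xid (4 bytes), all big-endian per the spec.
OFP_HEADER = struct.Struct("!BBHI")

def parse_ofp_header(data: bytes) -> dict:
    version, msg_type, length, xid = OFP_HEADER.unpack(data[:8])
    return {"version": version, "type": msg_type,
            "length": length, "xid": xid}

# A hand-built OFPT_HELLO for OpenFlow 1.3 (version 0x04, type 0).
hello = struct.pack("!BBHI", 0x04, 0, 8, 42)
print(parse_ofp_header(hello))
```

A controller such as RuNOS performs this kind of decoding for every message arriving from the switches; seeing the raw bytes once makes the Wireshark dissector output much easier to read.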


Lecture. Use of advanced computer science technologies for quasi-online data processing and primary analysis in the pipeline approach, using the example of EU-XFEL and CryoEM experiments in structural biology

Lecturer: Ilyin V.A.

    Modern scientific experiments often require serious computer resources and appropriate software to carry out effective data processing and analysis. In the case of megascience projects such as the LHC, the problem was solved by creating "heavy" (and therefore expensive ...) computer infrastructures such as grid/cloud/... . For smaller projects this approach is not justified. Such projects require new solutions, such as the pipeline approach, which became possible thanks to the rapid development of computer science in recent years and which directly meets the requirements of this type of experiment: 1) data processing and primary analysis in quasi-online mode, 2) integration of existing and newly created application software for individual stages of processing/analysis, 3) use of the modern zoo of computer equipment, from servers to supercomputers, 4) "easy" use of the pipeline software by inexperienced users, 5) ... . In the lecture, these issues will be discussed using the example of experiments in structural biology (reconstruction of the 3D structure of biomolecules, for example viruses), a field that has been developing rapidly in recent decades. The application of modern achievements of computer science to the creation of pipeline solutions for experiments at EU-XFEL and with CryoEM will be discussed.
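The core of the pipeline approach is that each processing or analysis stage is an independent, replaceable unit, and the pipeline is just their composition. A toy sketch (the stage names and numbers are hypothetical, not taken from the EU-XFEL or CryoEM software):

```python
from functools import reduce

# Each stage is an ordinary function; stages can be swapped out or
# run on different hardware without touching the others.
def calibrate(frames):
    return [f - 0.5 for f in frames]          # e.g. pedestal subtraction

def select_hits(frames):
    return [f for f in frames if f > 1.0]     # keep significant signals

def average(frames):
    return sum(frames) / len(frames)          # primary-analysis summary

def run_pipeline(data, stages):
    # Feed the output of each stage into the next one.
    return reduce(lambda acc, stage: stage(acc), stages, data)

result = run_pipeline([0.3, 1.8, 2.6, 0.9],
                      [calibrate, select_hits, average])
print(result)
```

Real pipeline frameworks add what this sketch omits: streaming the stages in quasi-online mode, scheduling them on heterogeneous hardware, and wrapping existing application software as stages.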


Tutorial. Application of computer vision methods and approaches for solving applied problems

Lecturers: Stadnik A.V., Streltsova O.I.

    The tutorial is devoted to the applied problem of detecting custom objects in images or in a video stream using a deep neural network with the single-pass YOLOv3 architecture.
    The tutorial will cover the process of acquiring and preparing data to form a training sample and configure the neural network, the training process and its peculiarities, and the validation of results. Special attention will be paid to techniques for augmenting the training sample in order to obtain consistent detection results.
    The tutorial will be held on the basis of the HybriLIT heterogeneous computing platform within the created ecosystem for tasks of machine and deep learning.
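Validating detector output, as discussed above, commonly relies on Intersection over Union (IoU) between a predicted box and a ground-truth box: a detection is typically counted correct when the IoU exceeds a threshold such as 0.5. A minimal sketch (the box coordinates below are made-up illustration data):

```python
# Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
def iou(a, b):
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of areas minus the overlap.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = (10, 10, 50, 50)
truth = (20, 20, 60, 60)
print(f"IoU = {iou(pred, truth):.3f}")
```

YOLOv3 also uses IoU internally, during non-maximum suppression, to discard duplicate boxes for the same object.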


Tutorial. Parallel and HPC programming 

Lecturer: Sivkov D. (Intel)

 


Tutorial. Agent Global Web data aggregation

Lecturer: Inkina V.A. 

    The tutorial is dedicated to collecting information from various information resources using agent technologies, i.e. the use of special autonomous programs (agents) to solve various problems. Agent technologies allow the workload to be distributed among individual constituent elements, the agents, which can provide high stability, flexibility and efficiency in data collection and processing. During the tutorial, the XPath and Selenium tools will be considered, along with the Python modules requests, lxml, bs4 and openpyxl, which simplify the development of an agent. Participants will be offered an information resource on which to implement agent-based data gathering together with the trainer. The result of the tutorial is a working software agent, written in the Python programming language, for collecting information from the global Internet.
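The extraction step at the heart of such an agent can be sketched in a few lines. In the tutorial this would be done with requests plus bs4 or lxml; here Python's standard-library HTMLParser stands in so the example is self-contained, and the page content is a made-up snippet:

```python
from html.parser import HTMLParser

# A minimal "agent": walk the HTML and collect all link targets.
class LinkAgent(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<html><body><a href="/jinr">JINR</a><a href="/lit">LIT</a></body></html>'
agent = LinkAgent()
agent.feed(page)
print(agent.links)
```

A real agent would fetch `page` over HTTP (e.g. with requests), follow the collected links autonomously, and write the extracted records out, for instance to a spreadsheet via openpyxl.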


Tutorial. Big Data technologies

Lecturer: Kadochnikov I.S. 

    Introduction to popular generic data formats: CSV, JSON, Avro; schemas and schema inference. Using Kafka message queues and NiFi processors to organize stream processing. Managing a data pipeline from a web spider to a NoSQL database using Docker containers.
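Schema inference, mentioned above, means sampling the data and deducing the narrowest type each column fits, which is what Avro tooling or NiFi record readers do automatically. A toy version over a CSV snippet (the data and the three-type hierarchy are illustrative):

```python
import csv
import io
import json

# Pick the narrowest type that every sampled value can be cast to.
def infer_type(values):
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

data = io.StringIO("id,energy,tag\n1,3.14,muon\n2,2.72,pion\n")
rows = list(csv.DictReader(data))
schema = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
print(json.dumps(schema))
```

Production systems refine this with nullable fields, date detection and larger samples, and serialize the result as an Avro schema so every consumer of the stream agrees on the record layout.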