The what, where and when of space-time cubes

Daniela Ballari

Since the 1970s, the availability of earth observation data has steadily increased. Consider, for instance, the well-known series of Landsat missions (among the most prominent, 1 in 1972, 5 in 1984, 7 in 1999, 8 in 2013 and 9 in 2021); MODIS (Terra in 1999 and Aqua in 2002); ASTER (1999); QuickBird (2001); and the Copernicus program with its 5 Sentinel missions (in orbit from 2015). Furthermore, images captured on demand by drones extend this list. In the last two decades, not only has the number of satellites monitoring the earth's physical, climate and environmental variables increased, but the periodicity, volume and detail of their images have also improved. The geosciences have not been excluded from these transformations: new challenges and opportunities have emerged for storing and analyzing big, multi-dimensional and multi-temporal data. This talk, "The what, where and when of space-time cubes", focuses on describing the spatial and temporal dimensions of the data structure known as the space-time cube. First, its dimensions will be conceptualized using the Rubik's Cube metaphor; second, applications of satellite image time series to climate and environmental variables will be shown; finally, software tools implementing spatio-temporal cubes will be presented.
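The cube metaphor can be made concrete with a minimal sketch: a plain 3-D NumPy array with two spatial axes and one temporal axis (the shapes and values here are hypothetical, and real tools such as the ones covered in the talk add coordinate labels on top of this idea):

```python
import numpy as np

# A space-time cube: two spatial axes (lat, lon) and one temporal axis (time).
n_lat, n_lon, n_time = 4, 5, 12            # e.g. a small grid of monthly images
cube = np.random.rand(n_lat, n_lon, n_time)

# "Where": the time series of one pixel (fixed lat/lon, all times)
pixel_series = cube[2, 3, :]               # shape (12,)

# "When": the spatial slice for one date (all pixels, fixed time)
spatial_slice = cube[:, :, 0]              # shape (4, 5)

# "What": an aggregate over time, e.g. the per-pixel temporal mean
temporal_mean = cube.mean(axis=2)          # shape (4, 5)

print(pixel_series.shape, spatial_slice.shape, temporal_mean.shape)
```

Slicing along a face of the cube answers the "where" and "when" questions; reducing along the time axis answers the "what".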

Bio

Daniela Ballari is a land surveyor from the National University of Córdoba in Argentina (2014) and obtained her Ph.D. from Wageningen University in the Netherlands (2012). She is currently a Professor at the University of Azuay (Cuenca, Ecuador). Her research deals with geostatistical and spatio-temporal methods for climate and environmental studies. She is passionate about spatial optimization, open geo-data, in-situ and remote data integration, as well as cartographic and scientific communication. She has been involved in academic and research activities in several countries, including Argentina, Ecuador, Spain, the Netherlands, Cuba, Panama and Colombia. Recent publications are available at: https://orcid.org/0000-0002-6926-4827.

Real-time detection & nucleation of small earthquakes in Stable South America

Marcelo Bianchi

Monitoring earthquakes is the most basic task in seismology. The Brazilian Seismographic Network is a continent-wide network with more than 90 stations used to monitor Brazilian seismicity. While the network is a revolution in terms of available data, most events in Brazil are still declared by visually analyzing day-plot records. We lack well-defined parameters, and even adapted procedures, to search for events recorded by the network or a subset of it. The traditional process of declaring a new earthquake involves detecting seismic phase arrival times, mainly P- and S-waves, in continuous records; the detections must then be sorted and grouped to nucleate candidate events, while false detections are discarded. The simplest and most widely used detection algorithm is the STA/LTA method, which returns detection times from continuous records. Nucleation is a delicate process that normally works well at the global scale, or only when detection times of different events and phases are not mixed in time. Since the earthquake position and origin time are not yet known, it is impossible to compute theoretical travel times at this stage; the available information is restricted to station coordinates and detection times. In some cases, the full detection waveform can be used instead of the detection time, in which case events must be nucleated directly from waveforms rather than from the individual detection times.
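The STA/LTA idea mentioned above can be sketched in a few lines: the ratio of a short-term average to a long-term average of the signal energy spikes when an impulsive arrival enters the short window. This is a minimal NumPy illustration on synthetic data (window lengths, threshold and noise levels are hypothetical), not the network's operational implementation:

```python
import numpy as np

def sta_lta(signal, n_sta, n_lta):
    """Classic STA/LTA: ratio of a short-term to a long-term moving
    average of signal energy. Window lengths are in samples; both
    windows end at the same sample."""
    energy = signal ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    sta = (csum[n_sta:] - csum[:-n_sta]) / n_sta      # short-window means
    lta = (csum[n_lta:] - csum[:-n_lta]) / n_lta      # long-window means
    sta = sta[n_lta - n_sta:]                         # align window ends
    return sta / np.maximum(lta, 1e-12)               # avoid divide-by-zero

# Synthetic continuous record: background noise with an "event" burst
rng = np.random.default_rng(0)
trace = rng.normal(0, 1, 2000)
trace[1200:1260] += rng.normal(0, 8, 60)              # impulsive arrival

ratio = sta_lta(trace, n_sta=20, n_lta=200)
detections = np.nonzero(ratio > 4.0)[0]               # threshold crossing
onset_sample = detections[0] + 200 - 1                # ratio index k ends at sample k + n_lta - 1
print("first detection near sample", onset_sample)
```

In practice the detection times produced this way are the raw input to the nucleation step described above, where they must be grouped across stations into candidate events.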

Bio

Marcelo Bianchi is currently a professor at the Instituto de Astronomia, Geofísica e Ciências Atmosféricas of the Universidade de São Paulo, where he conducts research in seismology, mainly developing methods and procedures for detecting seismic events and studying crust and mantle structure. During 2009-2010 he was a research scientist at the Schlumberger Stavanger Research Center in Norway, and later a researcher in the GEOFON group in Potsdam, Germany. He has experience with different programming languages, developing everything from small processing scripts to intermediate-size, complex object-oriented programs for managing and processing seismological data.

Scalable Machine Learning

Mustapha Lebbah

This course will provide students with an introduction to scalable machine learning. I will describe recent work on scalable clustering. Through this course, I would like to show students that building scalable models is not necessarily a strictly computer-engineering exercise: the traditional steps of modeling and estimation remain essential. The course will be followed by a short lab session using spark-notebook.io.
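As a toy illustration of the scalable-clustering idea (one well-known approach, not necessarily the algorithms covered in the course), mini-batch k-means updates centroids from small random samples, so per-step memory and compute stay bounded regardless of dataset size:

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size=64, n_steps=200, seed=0):
    """Mini-batch k-means: each step refines centroids using only a
    small random batch, so per-step cost is independent of len(X)."""
    rng = np.random.default_rng(seed)
    # Spread-out initialization: first centroid random, each next one
    # is the data point farthest from the centroids chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    counts = np.zeros(k)
    for _ in range(n_steps):
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        d = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)            # nearest centroid per point
        for j in range(k):
            pts = batch[labels == j]
            if len(pts):
                counts[j] += len(pts)
                eta = len(pts) / counts[j]   # per-center decaying step size
                centers[j] += eta * (pts.mean(axis=0) - centers[j])
    return centers

# Two well-separated blobs; centroids should land near (0,0) and (5,5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (500, 2)), rng.normal(5, 0.3, (500, 2))])
centers = minibatch_kmeans(X, k=2)
print(np.round(centers, 1))
```

The modeling and estimation steps are exactly those of ordinary k-means; only the data-access pattern changes, which is the point the course makes about scalability not being purely an engineering concern.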

Bio

Mustapha Lebbah is currently an Associate Professor at the University of Paris 13 and a member of the Machine Learning Team A3 at LIPN. His main research centers on machine learning and data mining (unsupervised learning, self-organizing maps, probabilistic and statistical methods, scalable machine learning and data science). He graduated from USTO University, where he received his engineering diploma in 1998. He then obtained an MSc (DEA) in Artificial Intelligence from Paris 13 University in 1999. In 2003, after three years at RENAULT R&D, he received his PhD in Computer Science from the University of Versailles. He received the "Habilitation à Diriger des Recherches" (accreditation to lead research) in Computer Science from Paris 13 University in 2012. He is a member of the French group on "complex data mining" and has been Secretary of the French Classification Society since November 2012.

Automation of Machine Learning

Aurora Pozo

The use of computers to solve complex problems has made significant progress in recent years, and an ever-growing number of disciplines rely on it. However, this progress depends heavily on human experts performing manual tasks such as hyperparameter optimization, neural architecture search, and dynamic algorithm configuration. As the complexity of these tasks is often beyond non-experts, a demand has emerged for off-the-shelf methods that can be used easily and without expert knowledge. The research area that targets the progressive automation of these methods is called automated machine learning. Among learning algorithms, deep learning, particularly deep convolutional neural networks, has shown great success and attracted attention from industry and from researchers in computer vision and image processing, neural networks, and machine learning. Our approach relies on evolutionary computation (EC) techniques. EC has started to play a significant role in automatically determining deep structures, transfer functions, and parameters for image classification tasks, and has great potential to advance the development of deep structures and algorithms. This talk will provide an extended view of deep learning and overview state-of-the-art evolutionary deep learning that uses Genetic Programming (GP) to automatically evolve deep structures. Applications to image classification, image segmentation, and natural language processing will be described.
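The core evolutionary loop behind such AutoML approaches can be sketched in miniature. Below, a toy (1+λ) evolution strategy tunes two hypothetical "hyperparameters" against a stand-in objective; it is a sketch of the general EC idea only, not the GP-based deep-structure evolution described in the talk, and the fitness function is invented for illustration:

```python
import random

def fitness(params):
    """Toy stand-in for a validation score, best at lr=0.1, width=64.
    In real AutoML this would train and evaluate a model."""
    lr, width = params
    return -((lr - 0.1) ** 2 + ((width - 64) / 64) ** 2)

def evolve(generations=100, lam=8, sigma=(0.05, 8.0), seed=0):
    """(1+lambda) evolution strategy: mutate the current best with
    Gaussian noise, keep the fittest of parent + offspring."""
    rng = random.Random(seed)
    parent = (rng.uniform(0, 1), rng.uniform(1, 256))   # (lr, width)
    for _ in range(generations):
        offspring = [
            (parent[0] + rng.gauss(0, sigma[0]),
             parent[1] + rng.gauss(0, sigma[1]))
            for _ in range(lam)
        ]
        parent = max([parent] + offspring, key=fitness)  # elitist selection
    return parent

best = evolve()
print("best hyperparameters found:", best)
```

Replacing the flat parameter tuple with a tree or graph genotype (as GP does for network structures) changes the mutation operators but not this select-mutate-evaluate loop.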

Bio

Aurora Pozo is currently a Professor of Computer Science at the Universidade Federal do Paraná, where she heads the interdisciplinary Evolutionary Computation Research Group. Her scientific contributions are in computational intelligence, mainly machine learning, and particularly evolutionary computation and learning (using genetic programming, particle swarm optimization, and learning classifier systems), dimensionality reduction for big data, computer vision and image processing, multi-objective optimization, and evolutionary deep learning. In addition to her technical contributions, Aurora works to consolidate the field in Brazil, actively participating in related events (BRACIS, ENIAC, CTDIAC), having organized these events and coordinated the BRACIS program committee. She has coordinated CNPq Universal and Fundação Araucária projects on bio-inspired computing and applications, in collaboration with professors from several universities in the State of Paraná, such as the State University of Ponta Grossa and the State University of the Center-West. Internationally, she coordinated project MCT/CAPES/CNPq PVE 400125/2014-5, which enabled fruitful cooperation with the University of the Basque Country, Spain. She has served as a scientific editor for special issues of Natural Computing (2018), Neurocomputing (2015), and the Journal of the Brazilian Computer Society (JBCS, 2015), and as a reviewer for international journals such as IEEE Transactions on Evolutionary Computation, Applied Soft Computing, Neurocomputing, Swarm and Evolutionary Computation, and Information Sciences. She is a member of international conference program committees such as the "Conference on Hybrid Intelligent Systems" and the "Conference on Evolutionary Multi-criterion Optimization".
Nationally, she was co-chair of the Theses and Dissertations Contest in Artificial and Computational Intelligence (2018-2020) and a member of the CSBC Theses and Dissertations Contest committees (2013, 2014, 2015, 2016, 2017 and 2018), among others. Throughout this period she has also been dedicated to training human resources, continuously supervising master's and doctoral students, and has published several articles in periodicals and conferences of recognized quality.

Data and Geo Sciences – doomed to a happy wedlock

Aderson do Nascimento

Data plays a central role in understanding Earth's systems and is a vital asset for assessing how we find, extract and manage our natural resources, from cradle to grave. Data is also pervasive in every aspect of evaluating the impact of our activities on the environment. More specifically, geoscientific data can be acquired, stored, retrieved and processed in many different ways, demanding a broader interaction between the data science and geoscientific communities. In this talk, I briefly introduce the geoscientists' way of thinking and show how both communities can benefit from each other's profiles to set the track for addressing the many challenges we face concerning energy, hazards, and the environment.

Bio

Aderson do Nascimento is a Professor in the Department of Geophysics at the Federal University of Rio Grande do Norte (UFRN), Brazil. He holds a BSc (Physics) and an MSc (Geophysics), both from UFRN, and a PhD (Geophysics) from the University of Edinburgh. His main research interests are human-induced and natural earthquakes and seismic interferometry. Since joining UFRN, he has been actively involved in planning the Geophysics Department and the Geophysics undergraduate program. He has served terms as Director of Studies for the Physics and Geophysics programs (the latter in a deputy role) and as Deputy International Officer for UFRN from 2011 until 2017. From 2005 to 2011, he was also the NE Brazil Regional Secretary for the Brazilian Geophysical Society. Since 2012 he has been the Coordinator of UFRN's Seismological Lab (LabSis/UFRN). He was recently appointed UFRN's representative in the federal government's public policy REATE 2020 (Program of Revitalization for the Activity of Exploration and Production of Oil and Natural Gas in Land Areas). He has also been a PI in several projects funded by both the public and the private sector.

Data Science Platforms: how to set data science experiments at different scales

Javier A. Espinosa-Oviedo

Vast collections of heterogeneous data have become the backbone of scientific, analytic and forecasting processes to understand and predict physical phenomena. In this context, data must go through complex and repetitive analytical processes called “data science pipelines”.

The enactment of data science pipelines involves balancing different services: (i) hardware (computing, storage and memory), (ii) communication (bandwidth and reliability), and (iii) scheduling of greedy in-memory analytical operations. Current data science environments (e.g., Google Colab, Kaggle) make these aspects transparent to end users. However, understanding how these environments integrate these computational services is key to running data science pipelines at different scales.

This lecture introduces current data science platforms and stacks and illustrates their usage for implementing basic data science pipelines in notebooks. The objective is to give you practical elements for helping you run, reproduce and share results when working in local and cloud environments.
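In the lecture's sense, a basic data science pipeline is a chain of ingestion, cleaning and summarization steps that could run unchanged in a local or cloud notebook. A minimal, self-contained sketch (the data and steps are hypothetical, using only the standard library so the notebook environment does not matter):

```python
import csv
import io
import statistics

# Hypothetical raw data, as it might arrive from a download or sensor step
raw = "station,temp\nA,21.5\nB,\nA,22.1\nB,19.8\nA,nan\n"

def load(text):
    """Ingestion step: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Cleaning step: drop rows with missing or invalid measurements."""
    out = []
    for r in rows:
        try:
            t = float(r["temp"])
        except ValueError:        # empty or non-numeric field
            continue
        if t == t:                # filters out NaN (NaN != NaN)
            out.append({"station": r["station"], "temp": t})
    return out

def summarize(rows):
    """Analysis step: mean temperature per station."""
    groups = {}
    for r in rows:
        groups.setdefault(r["station"], []).append(r["temp"])
    return {s: statistics.mean(v) for s, v in groups.items()}

# The pipeline is just function composition; each step can be re-run,
# inspected or swapped independently, which is what makes results
# reproducible and shareable across environments.
result = summarize(clean(load(raw)))
print(result)
```

Scaling this up is precisely where the hardware, communication and scheduling services discussed above stop being transparent: each step may then run on a different machine, and the composition becomes an orchestration problem.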

Bio

Javier A. Espinosa-Oviedo is a computer scientist working in the domains of databases and distributed systems, more specifically Big Data management and processing, cloud computing, and data-centric systems design. He currently works as a postdoctoral researcher at the University of Lyon in France. He is also a former researcher at the Delft University of Technology, the Barcelona Supercomputing Center, the Grenoble Informatics Laboratory and the CNRS French-Mexican Laboratory of Informatics and Automatic Control. He obtained his PhD in Computer Science from the University of Grenoble in 2013, and his master's and bachelor's degrees in Computer Science and Computer Systems Engineering from UDLAP, Mexico, in 2006 and 2008, respectively.

Marcus A. Nunes

Marcus A. Nunes is an Assistant Professor in the Department of Statistics at the Federal University of Rio Grande do Norte (UFRN). He holds a PhD in Statistics from Pennsylvania State University. He is interested in applications of statistics to real-world problems and has published original research on time series, linguistics, and ecology. He has supervised more than 100 students through his statistical consulting project.