Background

The world is experiencing a data revolution, and the ability to harness and interpret vast amounts of information has become a key driver of economic and social development. In the context of official statistics, the challenge lies in integrating traditional methods with cutting-edge technologies, such as machine learning (ML), to improve data quality, enhance decision-making processes, and ensure that statistical systems are responsive to evolving needs.

Official statistics are the backbone of public policy, informing decisions related to economic planning, social welfare, environmental sustainability, and governance. In the Asia-Pacific region, which is diverse in terms of its economies, cultures, and levels of technological development, the potential for machine learning to revolutionize the production, dissemination, and use of statistics is immense. However, there are significant challenges related to the accessibility of high-quality data, the development of appropriate ML models, and the capacity to adopt such technologies across varying institutional contexts.

This workshop aims to bring together statisticians and data scientists from the Asia-Pacific region to explore the application of machine learning techniques in the context of official statistics. The focus will be on enhancing the capacity of national statistical offices (NSOs) to integrate machine learning tools into their operations and to improve the quality and timeliness of statistical outputs, in support of evidence-based decision-making that drives progress towards the Sustainable Development Goals (SDGs).

Objectives

This workshop is organized by the Regional Hub on Big Data and Data Science for Asia and the Pacific, hosted by the Indonesian Government, in partnership with UNSD and UNSIAP. The specific objectives of this workshop include:

  • Introducing the fundamental concepts and techniques of machine learning for improving official statistics, with a focus on SDG indicators.
  • Developing technical skills for the utilization of machine learning to enhance official statistics and improve the monitoring of SDGs.
  • Discussing opportunities for leveraging machine learning in participants' countries.

Further information will be provided on the Regional Hub webpage as it becomes available: https://hub.bps.go.id/

Date and Venue

The workshop will take place over five days, from 3 to 7 February 2025, at the Politeknik Statistika STIS, Jakarta, Indonesia.

Please note that participants are responsible for booking their own accommodation.

The recommended nearby hotels include:

- Harper Hotel Cawang

- Best Western Cawang

Contact

For any questions relating to the workshop, please contact Ms. Lya Hulliyyatus Suadaa (lya@stis.ac.id).

Programme

Time

Topics

Day 1

7:30-8:00

Registration

8:00-9:00

Transportation from STIS to BPS Statistics Indonesia

9:00-9:30

Opening Session

9:30 – 10:00

Tea Break

10:00 - 12:00

National Statistics Command Centre (NSCC) visit

12:00 - 13:00

Lunch

13:00 - 13:30

Transportation to STIS

13:30-13:45

Orientation: Course objectives, structure and expected results

13:45 – 14:45

Module 1: Data Science

Session 1.1 Data Science basics

         Data Science and Statistics

         Methods and tools for Data Science (SIAP)

14:45 – 15:00

 Tea Break

15:00-16:00

Module 2 : Machine Learning

Session 2.1: Principles of ML: You’ve seen this before!

       Statistical Learning vs Machine Learning

 

Day 2

08:30 – 09:30

Module 2 (continued)

Session 2.2: ML in practice

       Crash course on R Markdown

       Practical exercises with R (Handouts)

 

09:30 – 10:30

Module 3 : Classification Methods

 

Session 3.1: How does classification work?

 

       Supervised vs unsupervised classification

       Examples of classifiers

 

10:30 – 10:45

Tea Break

10:45 – 12:00

Module 3 : Classification Methods

Session 3.2 : Classification Methods

 

       Measures of fit

       Logit as a classifier

       How to choose the best model?

 

12:00 – 13:00

Lunch

13:00-14:45

Module 3 : Classification Methods

Session 3.3:  Classification in Practice

       Practical exercises with R (Handouts)

       Methods and tools for Classification

 

14:45-15:00

Tea Break

15:00- 16:00

Module 3 : Classification Methods

Session 3.4:  Case Studies of ML classification

       ML for Classification

       An example from BPS

 

Day 3

08:30 – 9:30

Module 4 : Regression Methods

Session 4.1: Regression methods

       Linear Regression and all its friends

       Selection of regressors

9:30-10:30

Module 4 : Regression Methods

Session 4.2: Regression methods

       Penalization Methods

       How to choose the best model?

10:30 – 10:45

Tea Break

10:45 – 12:00

Module 4 : Regression Methods

Session 4.3: Regression in practice

       Practical exercises with R (Handouts)

       Methods and tools for Regression

12:00 – 13:00

Lunch

13:00-15:00

Field Trip to TELKOMSEL Indonesia

15:00-15:15

Tea Break

15:15- 16:00

Field Trip to TELKOMSEL Indonesia (Continued)

Day 4

08:30 – 10:30

Module 4 : Regression Methods

Session 4.4: ML Case Study

10:30 – 10:45

Tea Break

10:45 – 12:00

Module 5 : Tree-based models

Session 5.1:  Decision Tree Models

       Decision Trees

       Bagging and Boosting

 

12:00 – 13:00

Lunch

13:00-15:00

Module 5 : Tree-based models

Session 5.2:  Random Forest Models

       Random Forest methods

       Hyperparameters in Random Forest models

 

15:00-15:15

Tea Break

15:15- 16:00

Module 5 : Tree-based models

Session 5.3:  Random Forest in Practice

     Handouts (in R)

     Imputation using Random Forest

 

Day 5

08:30 – 10:30

Module 5 : Tree-based models

Session 5.4:  Use of Random Forest  (case studies)

       ML and GIS for Land Cover Estimation

10:30 – 10:45

Tea Break

10:45 – 12:00

Session 5.4:  Use of Random Forest (case studies) (Continued)

12:00 – 13:30

Lunch

13:45 – 16:00

Closing Session

 

       Evaluation

       Countries' plans to use ML

       Certificates

 

16:00

Adjourn