Background
The world is experiencing a data revolution, and the ability to harness and interpret vast amounts of information has become a key driver of economic and social development. In the context of official statistics, the challenge lies in integrating traditional methods with cutting-edge technologies, such as machine learning (ML), to improve data quality, enhance decision-making processes, and ensure that statistical systems are responsive to evolving needs.
Official statistics are the backbone of public policy, informing decisions related to economic planning, social welfare, environmental sustainability, and governance. In the Asia-Pacific region, which is diverse in terms of its economies, cultures, and levels of technological development, the potential for machine learning to revolutionize the production, dissemination, and use of statistics is immense. However, there are significant challenges related to the accessibility of high-quality data, the development of appropriate ML models, and the capacity to adopt such technologies across varying institutional contexts.
This workshop aims to bring together statisticians and data scientists from the Asia-Pacific region to explore the application of machine learning techniques in the context of official statistics. The focus will be on enhancing the capacity of national statistical offices (NSOs) to integrate machine learning tools into their operations and improving the quality and timeliness of statistical outputs to support evidence-based decision-making that drives progress towards the SDGs.
Objectives
This workshop is organized by the Regional Hub on Big Data and Data Science for Asia and the Pacific, hosted by the Indonesian Government, in partnership with UNSD and UNSIAP. The specific objectives of this workshop include:
- Introducing the fundamental concepts and techniques of machine learning for improving official statistics, with a focus on SDG indicators.
- Developing technical skills for the utilization of machine learning to enhance official statistics and improve the monitoring of SDGs.
- Discussing opportunities for leveraging machine learning in participants' countries.
Further information will be provided on the Regional Hub webpage as it becomes available: https://hub.bps.go.id/
Date and Venue
The workshop will take place over five days, from 3 to 7 February 2025, at the Politeknik Statistika STIS, Jakarta, Indonesia.
Please note that participants are responsible for booking their own accommodation.
The recommended nearby hotels include:
Contact
For any questions relating to the workshop, please contact Ms. Lya Hulliyyatus Suadaa (lya@stis.ac.id).
Programme
Time | Topics
Day 1
07:30-08:00 | Registration
08:00-09:00 | Transportation from STIS to BPS Statistics Indonesia
09:00-09:30 | Opening Session
09:30-10:00 | Tea Break
10:00-12:00 | National Statistics Command Centre (NSCC) visit
12:00-13:00 | Lunch
13:00-13:30 | Transportation to STIS
13:30-13:45 | Orientation: Course objectives, structure and expected results
13:45-14:45 | Module 1: Data Science. Session 1.1: Data Science basics • Data Science and Statistics • Methods and tools for Data Science (SIAP)
14:45-15:00 | Tea Break
15:00-16:00 | Module 2: Machine Learning. Session 1.2: Principles of ML: “You’ve seen this before!” • Statistical Learning vs Machine Learning
Day 2
08:30-09:30 | Module 2 (continued). Session 2.2: ML in practice • Crash course on R Markdown • Practical exercises with R (Handouts)
09:30-10:30 | Module 3: Classification Methods. Session 3.1: How does classification work? • Supervised vs unsupervised classification • Examples of classifiers
10:30-10:45 | Tea Break
10:45-12:00 | Module 3: Classification Methods. Session 3.2: Classification Methods • Measures of fit • Logit as a classifier • How to choose the best model?
12:00-13:00 | Lunch
13:00-14:45 | Module 3: Classification Methods. Session 3.3: Classification in Practice • Practical exercises with R (Handouts) • Methods and tools for Classification
14:45-15:00 | Tea Break
15:00-16:00 | Module 3: Classification Methods. Session 3.4: Case Studies of ML classification • ML for Classification • An example from BPS
Day 3
08:30-09:30 | Module 4: Regression Methods. Session 4.1: Regression methods • Linear Regression and all his friends • Selection of regressors
09:30-10:30 | Module 4: Regression Methods. Session 4.2: Regression methods • Penalization Methods • How to choose the best model?
10:30-10:45 | Tea Break
10:45-12:00 | Module 4: Regression Methods. Session 4.3: Regression in practice • Practical exercises with R (Handouts) • Methods and tools for Regression
12:00-13:00 | Lunch
13:00-15:00 | Field Trip to TELKOMSEL Indonesia
15:00-15:15 | Tea Break
15:15-16:00 | Field Trip to TELKOMSEL Indonesia (Continued)
Day 4
08:30-10:30 | Module 4: Regression Methods. Session 4.4: ML Case Study
10:30-10:45 | Tea Break
10:45-12:00 | Module 5: Tree-based models. Session 5.1: Decision Tree Models • Decision Trees • Bagging and Boosting
12:00-13:00 | Lunch
13:00-15:00 | Module 5: Tree-based models. Session 5.2: Random Forest Models • Random Forest methods • Hyperparameters in Random Forest models
15:00-15:15 | Tea Break
15:15-16:00 | Module 5: Tree-based models. Session 5.3: Random Forest in Practice • Handouts (in R) • Imputation using Random Forest
Day 5
08:30-10:30 | Module 5: Tree-based models. Session 5.4: Use of Random Forest (case studies) • ML and GIS for Land Cover Estimation
10:30-10:45 | Tea Break
10:45-12:00 | Session 5.4: Use of Random Forest (case studies)
12:00-13:30 | Lunch
13:45-16:00 | Closing Session • Evaluation • Countries' plans to use ML • Certificates
16:00 | Adjourn