15–27 September 2025 | Holiday Inn Resort Baruna Bali, Indonesia
Background
The era of Big Data has brought about significant transformations in the way governments, businesses, and societies make decisions. In the field of official statistics, the availability of new, large-scale, and real-time data sources provides unprecedented opportunities to enhance the quality, timeliness, and relevance of statistical products. Among the most promising sources is Mobile Positioning Data (MPD), generated by mobile network operators whenever subscribers interact with cellular networks.
MPD is particularly powerful because of its scale, frequency, and spatial detail. It can be used to generate insights on population mobility, tourism flows, commuting patterns, and urban dynamics, which are traditionally costly and time-consuming to measure through surveys. In Indonesia, MPD has already been successfully applied to produce tourism statistics, providing policymakers with evidence-based insights to support sustainable economic development.
Recognizing this potential, the Regional Hub on Big Data and Data Science for Asia and the Pacific, hosted by Statistics Indonesia (BPS) and Politeknik Statistika STIS, has initiated the Knowledge Development Series on MPD for Official Statistics. This series has two complementary dimensions:
- Policy-level capacity building through the Executive Training for Policy Makers, held in Jakarta in July 2025, where senior officials from across the region discussed strategic directions, governance frameworks, and international case studies.
- Technical-level training, which is the focus of this short course in Bali, dedicated to equipping technical staff with the skills and tools required to transform raw MPD into meaningful and policy-relevant statistics.
By combining strategic awareness at the leadership level with practical competencies among technical staff, the program aims to create a holistic ecosystem for MPD adoption in the region.
Objectives
This two-week short course has three overarching objectives, each aligned with the broader agenda of statistical modernization:
- Strengthen technical capacity. Participants receive intensive training in modern data science tools, including SQL, Python, and PySpark. The sessions are designed to build competencies in handling large MPD datasets, conducting data cleaning and transformation, and applying analytical techniques such as clustering, stop-spot analysis, and event detection.
- Deepen understanding of MPD applications. Beyond technical exercises, the course provides modules on how MPD can be integrated into official statistics, particularly in tourism, which is a priority sector for many partner countries. Participants explore practical applications such as home location detection, domestic and inbound tourism indicators, and ICT-related statistics.
- Promote regional cooperation. By bringing together national statistical offices, tourism authorities, and mobile network operators from multiple countries, the course fosters dialogue and knowledge-sharing. This regional approach ensures that MPD methodologies are not developed in isolation but benefit from shared experiences and collaborative problem-solving.
These objectives together lay the foundation for sustained progress, both at the national and regional levels.
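Home location detection, one of the applications listed above, is often illustrated with a simple night-time rule: a subscriber's home is the location where their device is observed on the most distinct nights. The sketch below is a minimal, standard-library Python illustration of that heuristic under stated assumptions, not the course's methodology. Records are assumed to be `(subscriber_id, timestamp, cell_id)` tuples, and the night window (22:00 to 06:00) and the three-night minimum are hypothetical parameters, not official thresholds.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed parameters (illustrative only, not official thresholds).
NIGHT_START_HOUR = 22  # night window starts at 22:00
NIGHT_END_HOUR = 6     # night window ends at 06:00 (exclusive)
MIN_NIGHTS = 3         # minimum distinct nights needed to assign a home


def night_key(ts):
    """Return the date of the night a record belongs to, or None if it is daytime.

    Records after midnight are assigned to the previous evening's date,
    so 23:00 on 1 Sep and 02:00 on 2 Sep count as the same night.
    """
    if ts.hour >= NIGHT_START_HOUR:
        return ts.date()
    if ts.hour < NIGHT_END_HOUR:
        return (ts - timedelta(days=1)).date()
    return None


def detect_home(records, min_nights=MIN_NIGHTS):
    """Map subscriber_id -> home cell_id using a most-frequent-night-location rule."""
    # subscriber -> cell -> set of distinct nights on which the cell was observed
    nights = defaultdict(lambda: defaultdict(set))
    for subscriber, ts, cell in records:
        key = night_key(ts)
        if key is not None:
            nights[subscriber][cell].add(key)

    homes = {}
    for subscriber, cells in nights.items():
        # Pick the cell seen on the most distinct nights;
        # ties are broken deterministically by the larger cell id.
        cell, seen = max(cells.items(), key=lambda kv: (len(kv[1]), kv[0]))
        if len(seen) >= min_nights:
            homes[subscriber] = cell
    return homes


if __name__ == "__main__":
    sample = [
        ("A", datetime(2025, 9, 1, 23, 0), "cell_1"),
        ("A", datetime(2025, 9, 2, 2, 0), "cell_1"),   # same night as the 23:00 record
        ("A", datetime(2025, 9, 2, 23, 30), "cell_1"),
        ("A", datetime(2025, 9, 3, 22, 15), "cell_1"),
        ("A", datetime(2025, 9, 3, 12, 0), "cell_9"),  # daytime, ignored
        ("B", datetime(2025, 9, 1, 23, 0), "cell_5"),  # one night only, below threshold
    ]
    print(detect_home(sample))  # {'A': 'cell_1'}
```

Counting distinct nights rather than raw records keeps a single busy night of network activity from dominating the result; in practice the window and threshold would be tuned to local data and definitions.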
Participants
The short course convened 14 technical staff from across Southeast Asia, representing three categories of institutions:
- National Statistical Offices (NSOs) from the Philippines, Cambodia, and Timor-Leste.
- Tourism authorities from the same three countries, ensuring that the outputs of MPD analysis are directly connected to policy needs in the tourism sector.
- Mobile network operators (MNOs) in Timor-Leste, highlighting the importance of public–private collaboration in accessing and using MPD.
Participants were selected as technical professionals with backgrounds in ICT, statistics, and data processing, so that the knowledge acquired during the course would be directly relevant and immediately applicable in their institutions.
The diversity of participants also reflects the multi-stakeholder nature of MPD adoption: NSOs provide statistical expertise, tourism agencies articulate sectoral needs, and MNOs supply the data infrastructure.
Programme
The training programme ran for 13 effective days, from 15 to 27 September 2025, and was structured into progressive modules:
Week 1: Data Science Foundations
- Databases for Big Data Analytics: covering relational database systems, SQL fundamentals (DDL and DML), and advanced topics such as NoSQL and MongoDB.
- Python for Big Data Analytics: introducing Python programming (via Google Colab), data cleaning and transformation, exploratory data analysis (EDA), and visualization techniques.
- PySpark for Scalable Analytics: introducing distributed computing, PySpark DataFrame operations, window functions, and SQL integration.
Week 2: MPD-Specific Modules and Applications
- Tourism Statistics and MPD: concepts and definitions of domestic and inbound tourism, introduction to MPD access procedures, ICT-related indicators, and methodologies for stop-spot and event analysis.
- Survey Design and Weighting: methods for linking MPD-derived indicators with traditional survey frameworks.
- Country Group Work: participants worked in teams to prepare country-specific feasibility assessments on the potential integration of MPD into their statistical systems.
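Stop-spot analysis identifies places where a device remains for a meaningful length of time. The following is a minimal sketch of one common distance-and-duration rule, not necessarily the method taught in the course (which may differ, for example by using the DBSCAN clustering covered in Week 1). It assumes time-sorted `(timestamp, latitude, longitude)` points, which for MPD would typically be cell-site coordinates; the 1 km radius and 30-minute minimum are hypothetical defaults.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stops(points, max_dist_m=1000.0, min_duration=timedelta(minutes=30)):
    """Return (start, end, mean_lat, mean_lon) stops from time-sorted (ts, lat, lon) points.

    A stop is a run of consecutive points that all lie within max_dist_m of the
    run's first point and that spans at least min_duration.
    (Both thresholds are assumed defaults for illustration.)
    """
    stops = []
    i, n = 0, len(points)
    while i < n:
        t0, lat0, lon0 = points[i]
        # Extend the run while the next point stays close to the anchor point.
        j = i + 1
        while j < n and haversine_m(lat0, lon0, points[j][1], points[j][2]) <= max_dist_m:
            j += 1
        run = points[i:j]
        if run[-1][0] - t0 >= min_duration:
            # Long enough: record the stop at the mean position of its points.
            lat = sum(p[1] for p in run) / len(run)
            lon = sum(p[2] for p in run) / len(run)
            stops.append((t0, run[-1][0], lat, lon))
            i = j  # continue after the stop
        else:
            i += 1  # too short: try the next point as a new anchor
    return stops


if __name__ == "__main__":
    track = [
        (datetime(2025, 9, 21, 8, 0), -8.800, 115.220),
        (datetime(2025, 9, 21, 8, 20), -8.801, 115.221),
        (datetime(2025, 9, 21, 9, 0), -8.800, 115.220),
        (datetime(2025, 9, 21, 9, 30), -8.700, 115.170),  # roughly 12 km away
        (datetime(2025, 9, 21, 9, 35), -8.700, 115.170),
    ]
    print(detect_stops(track))
```

Because MPD positions are only as precise as the cell-site grid, the distance threshold would in practice need to reflect local cell density; the values here are a design choice, not a fixed rule.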
SHORT COURSE AGENDA
15–27 September 2025

| Date | Time | Topic | Materials |
|---|---|---|---|
| Monday, 15 September 2025 | 08.00 – 09.00 | Registration | |
| | 09.00 – 09.15 | Event Report | |
| | 09.15 – 09.30 | Welcome Speech | |
| | 09.30 – 09.45 | Opening Speech | |
| | 09.45 – 09.50 | Group Photo | |
| | 09.50 – 10.10 | Coffee Break | |
| | 10.10 – 12.10 | Databases for Big Data Analytics: Getting Started with Databases; Installing an RDBMS | Slides |
| | 12.10 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Databases for Big Data Analytics: Querying Data with SQL (DDL); DDL Exercise | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Databases for Big Data Analytics: Querying Data with SQL (DML), Part 1; DDL Exercise | Slides |
| Tuesday, 16 September 2025 | 09.00 – 10.00 | Databases for Big Data Analytics: Querying Data with SQL (DML), Part 2; DML Exercise | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Databases for Big Data Analytics: Introduction to NoSQL | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Databases for Big Data Analytics: MongoDB and NoSQL Exercise | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Databases for Big Data Analytics: MongoDB and NoSQL Exercise | Slides |
| Wednesday, 17 September 2025 | 09.00 – 10.00 | Python for Big Data Analytics: Introduction to Google Colab; Introduction to Python; Writing and Reading Files; Pandas Basics | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Python for Big Data Analytics: Data Cleaning and Transformation; Exploratory Data Analysis (EDA) | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Python for Big Data Analytics: Data Visualization Using Python | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Python for Big Data Analytics: Data Visualization Using Python | Slides |
| Thursday, 18 September 2025 | 09.00 – 10.00 | Python for Big Data Analytics: Data Visualization Using Kepler.gl | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Python for Big Data Analytics: Introduction to PySpark; PySpark DataFrame Operations | Slides, Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Python for Big Data Analytics: Joining and Aggregating Using PySpark; PySpark Window Functions; SQL Using PySpark | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Python for Big Data Analytics: DBSCAN Clustering | Slides, Slides |
| Friday, 19 September 2025 | 09.00 – 10.00 | Concepts and Definitions of Domestic and Inbound Tourism | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Concepts and Definitions of Domestic and Inbound Tourism | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Introduction to MPD and MPD Access | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | MPD for ICT Indicators | Slides |
| Saturday, 20 September 2025 | 09.00 – 10.00 | Quality Assurance | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Quality Assurance | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Usual Environment | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Usual Environment | Slides |
| Sunday, 21 September 2025 | 09.00 – 10.00 | Stop-Spot | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Stop-Spot | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Domestic Tourism | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Domestic Tourism | Slides |
| Monday, 22 September 2025 | 08.00 – 12.00 | Visit to Badung Smart City | |
| | 12.00 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Inbound Tourism | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Inbound Tourism | Slides |
| Tuesday, 23 September 2025 | 09.00 – 10.00 | Event Analysis | |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Event Analysis | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Survey Design and Weighting | Slides |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Survey Design and Weighting | Slides |
| Wednesday, 24 September 2025 | 09.00 – 10.00 | Tourism Statistics | Slides |
| | 10.00 – 10.15 | Coffee Break | |
| | 10.15 – 12.15 | Tourism Statistics | Slides |
| | 12.15 – 13.30 | Lunch Break | |
| | 13.30 – 15.30 | Group Presentation | |
| | 15.30 – 15.45 | Coffee Break | |
| | 15.45 – 16.45 | Group Presentation | |
| Thursday, 25 September 2025 | 09.00 – 16.00 | Group Discussion | |
| Friday, 26 September 2025 | 09.00 – 12.00 | Group Discussion | |
| | 12.00 – 14.00 | Lunch Break | |
| | 14.00 – 15.00 | Big Data for Official Statistics | |
| | 15.00 – 16.00 | Closing | |
| Saturday, 27 September 2025 | 08.00 – 17.00 | Sociocultural Activity | |
Outputs and Outcomes
The short course generated several important outputs. Participants gained practical skills in SQL, Python, and PySpark for big data analytics, along with a solid understanding of MPD methodologies such as data preprocessing, trip identification, stop-spot analysis, and the production of tourism-related indicators. In addition, each country team developed feasibility assessments that mapped opportunities, identified challenges, and proposed possible cooperation mechanisms with mobile network operators in their respective contexts. These outputs provided concrete deliverables that participants could take back to their institutions for immediate use.
In terms of long-term outcomes, the program is expected to strengthen the readiness of partner countries to integrate MPD into their official statistical systems. It also lays the groundwork for future pilot projects in MPD-based tourism statistics and strengthens regional collaboration by building a community of practice. Through this shared experience, participating countries are better positioned to modernize their statistical processes and contribute to the advancement of official statistics across Asia and the Pacific.
Acknowledgment
This short course was made possible through the generous support of Indonesia AID as the funding partner, the commitment of BPS–Statistics Indonesia, and the dedication of the Regional Hub Secretariat at Politeknik Statistika STIS.
Special thanks are extended to the trainers and facilitators from BPS and STIS, who not only delivered technical sessions but also mentored participants throughout the exercises. Most importantly, the organizers express gratitude to all participants for their active engagement, enthusiasm, and commitment over the two-week program.
