Scraping Real-time Earthquake Data from BMKG Website using BS4 and Django

Danish Wang
7 min read · Oct 2, 2023


Chapter 1: Background Story

Indonesia is one of many countries situated in the Ring of Fire, and it experiences earthquakes on a daily basis, more frequently than most other countries in the region. Therefore, BMKG (Indonesia’s national meteorological, climatological, and geophysical agency) plays a crucial role in monitoring, analyzing, and responding to these seismic events, ensuring the safety and well-being of its citizens.

One way to fulfill this role is to create a website that can monitor earthquakes in real-time.

BMKG Website Page for Real-time Earthquake Data

Unfortunately, the page only shows the latest 192 records. When new data comes in, the oldest records are pushed out and can no longer be accessed directly from the page.

There are two alternative solutions to this issue:
1. We can go to this page to request the data by inputting some parameters we need, or
2. We can create a web scraper app to store the data into our own database.

In this article, I’m going to explain the second solution.

Chapter 2: Technical Approach

2.1 Disclaimer

I’m using PyCharm as my IDE and already have a Python virtual environment set up in my workspace.

I’m also using PostgreSQL for my database.

2.2 Install Django

First things first, we need to install the Django framework in our workspace so we can create a Django project.

Run the command below to install it:
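pip install django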

2.3 Install Django REST Framework

Django REST Framework (DRF) is a powerful tool for building RESTful APIs with Django. It provides a number of features that make it easy to create APIs that are well-designed, easy to use, and scalable.

Run the command below to install it:
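pip install djangorestframework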

2.4 Install Beautiful Soup 4

This is the library we are going to use to parse HTML and XML documents. It can also be used to extract data from websites, such as the title of a web page or the text of a paragraph.

Run the command below to install it:
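pip install beautifulsoup4

We will also need the requests library later to fetch the pages; it can be installed the same way with pip install requests.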

2.5 Install Psycopg2

Since I’m using PostgreSQL, I need to install the Psycopg2 library for connecting to and interacting with PostgreSQL databases.

Run the command below to install it:
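pip install psycopg2

If the installation fails because of missing PostgreSQL build dependencies, pip install psycopg2-binary is a common alternative for development setups.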

2.6 Create a Project

Once the Django framework is successfully installed, we are now able to create a project by running this command format:
django-admin startproject [ProjectName] [Directory]

The ProjectName part is simply what we want our project to be named.
The Directory part is where we want our project to be located.

Mine looks like this:
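django-admin startproject ComputationalScienceProject .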

I named my project ComputationalScienceProject, and the dot (.) as the directory tells Django to create the project in the directory I’m currently working in.

This is how our project structure should look now:
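manage.py
ComputationalScienceProject/
    __init__.py
    asgi.py
    settings.py
    urls.py
    wsgi.py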

Remember that we installed a library to interact with PostgreSQL earlier. Now, inside the settings.py file in the ComputationalScienceProject directory, we need to set up the database connection as shown below; the values are placeholders, so adjust them to your own setup:
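DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "your_database_name",
        "USER": "your_database_user",
        "PASSWORD": "your_database_password",
        "HOST": "localhost",
        "PORT": "5432",
    }
}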

Fill the ENGINE property value exactly as shown above.

Adjust the rest of the fields to your own situation based on the guide below:

NAME: The name of your database
USER: The username of your database
PASSWORD: The password of your database
HOST: The host of your database
PORT: The port of your database

2.7 Create an App

Let’s create an app called earthquake by running the command below:
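python manage.py startapp earthquake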

Now, our project structure should look like this:
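manage.py
ComputationalScienceProject/
    __init__.py
    asgi.py
    settings.py
    urls.py
    wsgi.py
earthquake/
    migrations/
    __init__.py
    admin.py
    apps.py
    models.py
    tests.py
    views.py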

Again, we need to register our app in the settings.py file, in the application definition section. Since we installed DRF, rest_framework should be registered there as well, as shown below:
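INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "rest_framework",
    "earthquake",
]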

2.8 Create a Data Model

Models are used to define the structure of the data that will be stored in the database, such as the fields that each record will have.
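As a minimal sketch, the Data model in earthquake/models.py might look like this; the field names and types are my assumptions based on the columns of the BMKG earthquake table, so adjust them to what you actually scrape:

from django.db import models

class Data(models.Model):
    # Field names and types below are assumptions; match them to the BMKG table columns
    occurred_at = models.CharField(max_length=64)  # event date/time, kept as text for simplicity
    latitude = models.FloatField()
    longitude = models.FloatField()
    magnitude = models.FloatField()
    depth = models.FloatField()  # in kilometers
    region = models.CharField(max_length=255)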

Every time we create a new data model or make any changes to it, we need to run two important commands, as follows:
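python manage.py makemigrations
python manage.py migrate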

makemigrations scans your Django models and generates Python migration files that describe the schema changes needed to bring your database in line with your models.

migrate applies the changes described by those migration files to your database.

2.9 Create Business Logics

1. Import the requests and BeautifulSoup libraries

We are going to use requests to retrieve the page content as raw HTML, and BeautifulSoup to parse that HTML.
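import requests
from bs4 import BeautifulSoup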

2. Create functions to retrieve, parse, and map the data to objects

I created a function called collect_new_data. It has a parameter called target, which receives a keyword that determines the URL to be scraped.

The target parameter will be used later in the retrieve_data_from_web function.
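As a rough sketch, and assuming helper functions like the ones outlined in the rest of this section, collect_new_data might look like this:

from earthquake import models

def collect_new_data(target):
    # Fetch and parse the page, map the rows to model objects,
    # drop duplicates, and save whatever is new
    soup = retrieve_data_from_web(target)
    if soup is None:
        return
    scraped_data = map_data_to_objects(soup)
    new_incoming_data = filter_existing_data(scraped_data)
    models.Data.objects.bulk_create(new_incoming_data)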

This is where the requests and BeautifulSoup libraries come into play.
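Here is a minimal sketch of retrieve_data_from_web; the keyword-to-URL mapping is a placeholder, since the actual BMKG URLs appeared only in the original screenshots:

import requests
from bs4 import BeautifulSoup

# Placeholder mapping; fill in the real BMKG page URL for each keyword
TARGET_URLS = {
    "latest": "https://...",
}

def retrieve_data_from_web(target):
    response = requests.get(TARGET_URLS[target])
    if response.status_code == 200:
        # Parse the raw HTML into a navigable tree
        return BeautifulSoup(response.text, "html.parser")
    return None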

If the requests.get() method successfully retrieves the content from the URL, we get the page as raw HTML. Next, I applied BeautifulSoup’s built-in html.parser, which turns that raw HTML into a navigable tree we can search.

Now we can easily find where the table data is located, along with its tags, and map the data to our model objects.
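Here is a sketch of that mapping step. It assumes the earthquake table uses a standard tr/td layout and that the column order matches the Data model fields sketched earlier; both are assumptions to verify against the real page:

from earthquake import models

def map_data_to_objects(soup):
    objects = []
    table = soup.find("table")
    # Assumes a <tbody> wraps the data rows
    for row in table.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        objects.append(models.Data(
            occurred_at=cells[0],
            latitude=float(cells[1]),
            longitude=float(cells[2]),
            magnitude=float(cells[3]),
            depth=float(cells[4]),
            region=cells[5],
        ))
    return objects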

I don’t want duplicated data in my database. Therefore, I created a function to filter out data that already exists, sketched below.
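One way to implement that filter, assuming occurred_at uniquely identifies an event (an assumption worth checking):

from earthquake import models

def filter_existing_data(scraped_objects):
    # Load the identifiers already stored, then keep only unseen records
    existing = set(models.Data.objects.values_list("occurred_at", flat=True))
    return [obj for obj in scraped_objects if obj.occurred_at not in existing]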

Last, the data will be saved using the syntax below (shown in the collect_new_data function earlier):

models.Data.objects.bulk_create(new_incoming_data)

2.10 Create the API

In the urls.py file in the ComputationalScienceProject directory, we need to set up our URLs inside a list, using this format:

path("end/point/", the_related.method, name="optional")
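For example, a hypothetical urls.py wiring a single scraping endpoint to a view (the endpoint path and names are my assumptions, not the original ones):

from django.contrib import admin
from django.urls import path

from earthquake import views

urlpatterns = [
    path("admin/", admin.site.urls),
    path("earthquake/collect/", views.collect_new_data, name="collect_new_data"),
]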

In the views.py file in the earthquake directory, we define the methods related to the URLs in the urls.py file.
But first, we need to import the required libraries.
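Based on the pieces described below, the imports look roughly like this:

import json

from django.http import HttpResponse
from rest_framework.decorators import api_view

# business_logics is the module holding the scraping functions; its location is assumed here
from earthquake import business_logics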

We use json to convert the request body into a JSON object.

We use the HttpResponse class to create HTTP responses, which are the responses that Django sends back to clients when they make requests.

The @api_view decorator is used to convert a function-based view into an API view. This means that the view function will be accessible to clients over the HTTP protocol.

business_logics is the module where the methods backing all the URLs are located.
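Putting it together, a hypothetical view for the scraping endpoint might look like the sketch below; the request body shape and the "target" key are assumptions:

@api_view(["POST"])
def collect_new_data(request):
    body = json.loads(request.body)  # convert the request body into a JSON object
    business_logics.collect_new_data(body["target"])  # "target" key is an assumption
    return HttpResponse("New data collected successfully")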

2.11 Test Drive

In this part, I’m using Postman to hit the API that is running locally on my computer.

To run the program, just run the command below:
python manage.py runserver

Demo video: hitting the API from Postman.

Chapter 3: That’s a Wrap!

That’s it for this article on scraping data from BMKG’s real-time earthquake page.

Hope this article helps to open up the possibilities of other explorations.

Thank you.
