Predicting House Price in Hong Kong #4
Date: 26 July 2020
In #3, I ran into a large number of missing values in the property transaction data. After searching the web, I found that Centaline claims to have spent 10 million HKD filling in missing values. Maybe I should try scraping data from Centaline.
Luckily, I have scraped Centaline data before. Knowing that scraping a website by its HTML elements is quite inefficient, I went straight to scraping Centaline’s Android app.
How to find the hidden APIs in an Android app?
- First, I found the Android APK file by doing a Google Search.
- I used an open-source script on GitHub to modify the security exception in the Android app. For the reasons why this is needed, please refer to the GitHub repository’s explanation.
- I installed the modified app back onto my Android phone.
- I used Mitmproxy to listen to the app’s HTTP traffic.
- I found the hidden APIs for getting transaction data.
For a visual step-by-step guide, please visit my video on LinkedIn.
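The traffic-listening step can be sketched as a small mitmproxy addon that prints only the requests going to Centaline's API host. This is an illustrative sketch, not the script used in the post: the class name `LogTargetRequests` and the file name are hypothetical, and the host filter is an assumption taken from the API URL shown in the Results section. It relies only on mitmproxy's `request` hook and the `pretty_host`, `pretty_url`, `method`, and `get_text()` attributes of the request object.

```python
# Hypothetical file name: log_centaline.py
# Run with: mitmdump -s log_centaline.py

# Assumption: the app talks to a host under centanet.com, as in the
# API URL shown in the Results section.
TARGET_HOST = "centanet.com"


class LogTargetRequests:
    """Print and remember every request sent to the target host."""

    def __init__(self):
        self.seen = []  # list of (method, url, body) tuples

    def request(self, flow):
        # mitmproxy calls this hook once per client request.
        req = flow.request
        if not req.pretty_host.endswith(TARGET_HOST):
            return  # ignore unrelated traffic from the phone
        # strict=False avoids an exception on bodies that do not decode cleanly.
        body = req.get_text(strict=False)
        self.seen.append((req.method, req.pretty_url, body))
        print(req.method, req.pretty_url)
        if body:
            print("  body:", body)


# mitmproxy picks up addons from this module-level list.
addons = [LogTargetRequests()]
```

With the phone's proxy pointed at the machine running mitmdump, browsing the transaction pages in the app should print candidate endpoints and their request bodies.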
Results
curl -H 'userId: ff0a1e7e-73fa-459b-bc0d-f5acd0ba228c' -H 'lang: tc' -H 'Content-Type: application/json; charset=UTF-8' -H 'Connection: Keep-Alive' --compressed -H 'User-Agent: okhttp/4.7.2' -X POST https://hkapi.centanet.com/api/Transaction/Map.json -d '{"daterange":90,"postType":"s","refdate":"20200726","order":"desc","page":2,"pageSize":20,"pixelHeight":2220,"pixelWidth":1080,"points[0].lat":22.695053063373795,"points[0].lng":113.85844465345144,"points[1].lat":22.695053063373795,"points[1].lng":114.38281349837781,"points[2].lat":21.993328259196705,"points[2].lng":114.38281349837781,"points[3].lat":21.993328259196705,"points[3].lng":113.85844465345144,"sort":"score","zoom":9.745128631591797,"platform":"android"}'
Using the curl command above, you can get the most recent 90 days of transaction data. You can modify the options in the request body, such as daterange, page and pageSize, to fit your requirements.
In the next step, I am going to analyze the API further and find out whether there are fewer missing values in Centaline’s data.
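For use inside a Python pipeline, the same request can be sketched with the standard library. The URL, headers and body fields are copied from the curl command above. The function names `build_payload` and `fetch_page` are hypothetical, and the response is assumed to be JSON (the endpoint name ends in `.json`), since its schema has not been examined yet.

```python
import json
import urllib.request

API_URL = "https://hkapi.centanet.com/api/Transaction/Map.json"

# Headers copied from the captured request.
HEADERS = {
    "userId": "ff0a1e7e-73fa-459b-bc0d-f5acd0ba228c",
    "lang": "tc",
    "Content-Type": "application/json; charset=UTF-8",
    "User-Agent": "okhttp/4.7.2",
}

# Bounding box copied from the points[...] fields of the captured request.
NORTH, SOUTH = 22.695053063373795, 21.993328259196705
WEST, EAST = 113.85844465345144, 114.38281349837781


def build_payload(page=1, page_size=20, daterange=90, refdate="20200726"):
    """Rebuild the JSON body of the captured request (hypothetical helper)."""
    payload = {
        "daterange": daterange,
        "postType": "s",
        "refdate": refdate,
        "order": "desc",
        "page": page,
        "pageSize": page_size,
        "pixelHeight": 2220,
        "pixelWidth": 1080,
        "sort": "score",
        "zoom": 9.745128631591797,
        "platform": "android",
    }
    # Same corner order as the capture: NW, NE, SE, SW.
    corners = [(NORTH, WEST), (NORTH, EAST), (SOUTH, EAST), (SOUTH, WEST)]
    for i, (lat, lng) in enumerate(corners):
        payload[f"points[{i}].lat"] = lat
        payload[f"points[{i}].lng"] = lng
    return payload


def fetch_page(page, **kwargs):
    """POST one page of results; assumes the server replies with JSON."""
    body = json.dumps(build_payload(page=page, **kwargs)).encode("utf-8")
    req = urllib.request.Request(API_URL, data=body, headers=HEADERS, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_page(2)` should send the same body as the curl command; looping over `page` is one way to collect more than one page of results.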
P.S. You can keep track of all project code in this Git repo.