Identifying Urban Areas by Combining Human Judgment and Machine Learning : An Application to India
Main Authors: | , , |
---|---|
Language: | English |
Published: | World Bank, Washington, DC, 2020 |
Subjects: | |
Online Access: | http://documents.worldbank.org/curated/en/920791582554716856/Identifying-Urban-Areas-by-Combining-Human-Judgment-and-Machine-Learning-An-Application-to-India http://hdl.handle.net/10986/33392 |
Summary: | This paper proposes a methodology for
identifying urban areas that combines subjective assessments
with machine learning, and applies it to India, a country
whose official urbanization rate several studies regard as
an underestimate. For a representative sample of cities,
towns and villages, as administratively defined, human
judgment of Google images is used to determine whether they
are urban or rural in practice. Judgments are collected
across four groups of assessors, differing in their
familiarity with India and with urban issues, following two
different protocols. The judgment-based classification is
then combined with data from the population census and from
satellite imagery to predict the urban status of the sample.
Logit, LASSO, and random forest models are then
fitted. These approaches are then used to decide whether
each of the out-of-sample administrative units in India is
urban or rural in practice. The analysis does not find that
India is substantially more urban than officially claimed.
However, there are important differences at more
disaggregated levels, with “other towns” and “census towns”
being more rural, and some southern states more urban, than
is officially claimed. The consistency of human judgment
across assessors and protocols, the easy availability of
crowd-sourcing, and the stability of predictions across
approaches suggest that the proposed methodology is a
promising avenue for studying urban issues. |
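
The pipeline described in the summary (collect assessor judgments, combine them with census and satellite covariates, fit a classifier, then predict out-of-sample units) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the summary does not say how judgments are aggregated, so a simple majority vote is used; the two features (`built_up_share`, `density_scaled`) and all values are synthetic toy data, not figures from the paper; and only a logit model, fitted by plain gradient descent, is shown in place of the paper's Logit, LASSO, and random forest estimators.

```python
import math


def majority_label(votes):
    """Return 1 (urban) if more than half of the votes are 1, else 0.

    Ties are treated as rural; this tie rule is an assumption of the sketch.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(X, y, lr=0.5, epochs=2000):
    """Fit a logit model by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_urban(w, b, x, threshold=0.5):
    """Classify one unit as urban (1) or rural (0) from its features."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


# Synthetic toy sample (hypothetical values, NOT data from the paper):
# ((built_up_share, density_scaled), votes from four assessor groups)
sample = [
    ((0.9, 0.8), [1, 1, 1, 1]),
    ((0.8, 0.9), [1, 1, 1, 0]),
    ((0.7, 0.7), [1, 1, 0, 1]),
    ((0.2, 0.3), [0, 0, 0, 0]),
    ((0.1, 0.2), [0, 0, 1, 0]),
    ((0.3, 0.1), [0, 0, 0, 0]),
]

X = [list(features) for features, _ in sample]
y = [majority_label(votes) for _, votes in sample]  # judgment-based labels

w, b = fit_logit(X, y)

# Hypothetical out-of-sample administrative units to classify.
out_of_sample = [(0.85, 0.75), (0.15, 0.25)]
predictions = [predict_urban(w, b, x) for x in out_of_sample]
print(predictions)
```

In practice a statistics library would likely replace the hand-written optimizer, and the LASSO and random forest variants would be fitted alongside the logit model to check that predictions are stable across approaches, as the summary reports.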