
DATA310

DATA 310 - Professor Frazier; MWF 1200-1250

Project 3

Data

Country: Kenya

Map of Kenya at the adm2 level:

image

Kenya as a whole had too much data for my computer to process, so I chose a much smaller area to study: Kisumu. Kisumu is the third-largest city in Kenya and is located in southwest Kenya on the shore of Lake Victoria. The first image below shows where Kisumu sits within Kenya (in red), and the second is a close-up of the region at the adm3 level:

image

image

Raster Files

In addition to the shapefile of Kisumu, I used 12 raster files with information on important features of the region, along with a population raster dataset.
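To make the idea of "population sums" concrete, here is a minimal sketch of how raster cell values can be summed within each administrative unit. It is not the code used for this project: the tiny grids, the zone labels `A`, `B`, `C`, and the helper name `zonal_sums` are all made up for illustration, and a real workflow would read the rasters and shapefile with a geospatial library.

```python
from collections import defaultdict


def zonal_sums(values, zones, nodata=None):
    """Sum raster cell values within each zone.

    `values` and `zones` are equally shaped 2D lists. `zones` holds one
    unit id per cell; None marks cells outside the study area.
    """
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            # Skip cells outside every unit and cells flagged as missing.
            if zone is None or value == nodata:
                continue
            totals[zone] += value
    return dict(totals)


# Hypothetical 3x3 population raster (people per cell).
population = [
    [10.0, 12.0, 0.0],
    [8.0, 40.0, 55.0],
    [2.0, 3.0, 60.0],
]

# Hypothetical zone raster: which unit each cell belongs to.
zone_ids = [
    ["A", "A", None],
    ["A", "B", "B"],
    ["C", "C", "B"],
]

sums = zonal_sums(population, zone_ids)
print(sums)
```

The same function could be applied to each covariate raster so that every unit ends up with one row of predictors and one population total.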

Linear Regression

Plot of predicted population sums:

image

Plot that shows the difference between actual and predicted population sums:

image

As the plot shows, the linear regression model underpredicts the population in one area of Kisumu, which is highly populated. A likely reason is that the model does not account for building height, so it cannot tell that some buildings are tall and house many people.
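The sketch below illustrates how a difference plot like this can arise. It fits an ordinary least squares line with a single predictor and then computes actual minus predicted for each unit. Everything here is an assumption for illustration: the project's model used many raster covariates rather than one, the `footprint` and `actual` numbers are invented, and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical built-up footprint per unit (the only predictor here).
footprint = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical actual population per unit. The last unit is dense:
# its buildings are tall, which footprint alone cannot capture.
actual = [100.0, 200.0, 300.0, 400.0, 900.0]

intercept, slope = fit_line(footprint, actual)
predicted = [intercept + slope * x for x in footprint]

# Positive difference means the model underpredicts that unit.
difference = [a - p for a, p in zip(actual, predicted)]
most_underpredicted = max(range(len(difference)), key=lambda i: difference[i])

for i, d in enumerate(difference):
    print(f"unit {i}: actual - predicted = {d:.1f}")
print("most underpredicted unit:", most_underpredicted)
```

Because the predictor carries no information about height, the dense unit ends up with the largest positive difference, which is the same pattern described above.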

3D plot of ME:

image

Overall, this plot suggests that the model performs reasonably well. It could probably be improved by adding variables that help determine population density in more urban areas.

Random Forest

Plot of the model:

image

Plot of the variable importance:

image
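One common way to measure variable importance is permutation importance: shuffle one predictor at a time and see how much the prediction error grows. The sketch below shows that idea in plain Python. It is an assumption that this resembles the measure behind the plot above, since random forest packages report importance in several ways. The `toy_model` function stands in for a fitted model, and the two predictor columns are invented.

```python
import random


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, actual, n_repeats=20, seed=0):
    """Average increase in MSE when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(actual, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column's value replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            error = mean_squared_error(actual, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical stand-in for a fitted model: it depends only on column 0.
    return 50.0 * row[0] + 0.0 * row[1]


# Hypothetical predictors per unit: [column 0, column 1].
rows = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0], [5.0, 4.0], [6.0, 8.0]]
actual = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, actual)
print(importances)
```

A predictor the model ignores gets an importance of zero, while shuffling a predictor the model relies on makes the error jump.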

Plot of predicted population sums:

image

Plot that shows the difference between actual and predicted population sums:

image

3D plot that shows the difference between actual and predicted population sums:

image

3D plot of ME:

image

Again, the model is relatively good but could probably be improved by adding variables that help determine population density in more urban areas. That said, it is not quite as good as the linear model: its ME is higher in urban areas. The linear regression model is therefore the better of the two.
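A comparison like this can also be put into numbers. The sketch below assumes that ME stands for mean error, taken here as the average of actual minus predicted, and it also reports mean absolute error because positive and negative errors can cancel in ME. All values are invented to show the calculation and are not the results of this project.

```python
def mean_error(actual, predicted):
    """Average of (actual - predicted); positive means underprediction."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical population sums for four units; the third is urban.
actual = [120.0, 80.0, 950.0, 60.0]

# Hypothetical predictions from two models.
predictions = {
    "linear regression": [110.0, 90.0, 800.0, 70.0],
    "random forest": [115.0, 85.0, 700.0, 65.0],
}

scores = {
    name: {
        "ME": mean_error(actual, predicted),
        "MAE": mean_absolute_error(actual, predicted),
    }
    for name, predicted in predictions.items()
}

# Rank by MAE so that errors of opposite sign cannot hide each other.
better = min(scores, key=lambda name: scores[name]["MAE"])

for name, s in scores.items():
    print(f"{name}: ME = {s['ME']:.2f}, MAE = {s['MAE']:.2f}")
print("lower MAE:", better)
```

In this toy data both models miss mainly on the urban unit, and the model with the smaller miss there comes out ahead.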