Decoding the EM-DAT Natural Disaster Dataset Hierarchy
A couple of weeks ago, I wrote an article on Getting Started with the International Disaster Database (EM-DAT) using Python and Pandas.
While that article briefly reviewed the hierarchical fashion in which EM-DAT organizes natural disaster types, today I want to take a deeper dive into that hierarchy.
By the end of this article, you’ll have a strong understanding of how the EM-DAT database organizes natural disasters in a hierarchical manner.
For the sake of simplicity and reproducibility, I am using the Kaggle-hosted version of the EM-DAT dataset, which is freely available to download and use.
If you instead use the official EM-DAT version from CRED (which requires registration), the results of running my code may look slightly different.
Keep these potential discrepancies in mind when running your own experiments.
The EM-DAT dataset organizes natural disasters into a hierarchy (image credit)
The EM-DAT database catalogs over 25,000 mass disasters from the year 1900 to the present day, including a total of 58 unique disaster types (e.g., flood, hurricane, tornado, etc.).
As I mentioned in my introductory article on the EM-DAT dataset, EM-DAT organizes natural disasters in a hierarchical fashion, making it (theoretically) easier for data scientists to navigate the dataset.
I say “theoretically” because working with EM-DAT has a bit of a learning curve, one that can only be overcome by exploring the data.
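The hierarchical structure of EM-DAT allows you to drill down into natural disaster types using the following five columns, ordered from most general to most specific: “Disaster Group”, “Disaster Subgroup”, “Disaster Type”, “Disaster Subtype”, and “Disaster Subsubtype”.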
The best way to fully comprehend this hierarchical structure is with a series of examples.
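Since each level of the hierarchy is simply a dataframe column, a quick way to get a feel for it is to count the distinct labels at every level. The sketch below assumes a tiny hypothetical dataframe (its label combinations mirror rows shown later in this post, but it is not an EM-DAT extract) and uses pandas' `nunique()`:

```python
import numpy as np
import pandas as pd

# the five EM-DAT hierarchy columns, from most general to most specific
HIERARCHY_COLUMNS = [
    "Disaster Group",
    "Disaster Subgroup",
    "Disaster Type",
    "Disaster Subtype",
    "Disaster Subsubtype",
]

# hypothetical sample rows (illustrative only, not real EM-DAT records)
sample = pd.DataFrame(
    [
        ["Natural", "Meteorological", "Storm", "Convective storm", "Tornado"],
        ["Natural", "Meteorological", "Storm", "Tropical cyclone", np.nan],
        ["Natural", "Hydrological", "Flood", np.nan, np.nan],
        ["Natural", "Hydrological", "Landslide", "Avalanche", np.nan],
    ],
    columns=HIERARCHY_COLUMNS,
)

# number of distinct, non-missing labels at each level of the hierarchy
level_counts = sample[HIERARCHY_COLUMNS].nunique()
print(level_counts)
```

On the real data, the same line can be run against the full `df` once it is loaded.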
To start, we can load the EM-DAT dataset from disk:
# import the necessary packages
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import os
# specify the path to the EM-DAT dataset
emdat_dataset_path = os.path.join(
"natural-disasters-data",
"em-dat",
"EMDAT_1900-2021_NatDis.csv"
)
# load the EM-DAT natural disasters dataset from disk
df = pd.read_csv(emdat_dataset_path)
df.tail()
Our dataframe is now ready for analysis:
| | Dis No | Year | Seq | Disaster Group | Disaster Subgroup | Disaster Type | Disaster Subtype | Disaster Subsubtype | Event Name | Entry Criteria | Country | ISO | Region | Continent | Location | Origin | Associated Dis | Associated Dis2 | OFDA Response | Appeal | Declaration | Aid Contribution | Dis Mag Value | Dis Mag Scale | Latitude | Longitude | Local Time | River Basin | Start Year | Start Month | Start Day | End Year | End Month | End Day | Total Deaths | No Injured | No Affected | No Homeless | Total Affected | Reconstruction Costs ('000 US$) | Insured Damages ('000 US$) | Total Damages ('000 US$) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15822 | 2020-0031-ZMB | 2020 | 31 | Natural | Hydrological | Flood | NaN | NaN | NaN | Affected | Zambia | ZMB | Eastern Africa | Africa | Gwembe, Siavonga, Mambwe and Lumezi districts | Heavy rains | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Km2 | NaN | NaN | NaN | NaN | 2020 | 1.0 | NaN | 2020 | 1.0 | NaN | NaN | NaN | 1500.0 | NaN | 1500.0 | NaN | NaN | NaN | NaN |
15823 | 2020-0110-ZMB | 2020 | 110 | Natural | Hydrological | Flood | NaN | NaN | NaN | Affected | Zambia | ZMB | Eastern Africa | Africa | Samfya, Mushindamo, Nakonde districts (Luapula province) | Heavy rains | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Km2 | NaN | NaN | NaN | NaN | 2020 | 3.0 | 20.0 | 2020 | 3.0 | 26.0 | NaN | NaN | 700000.0 | NaN | 700000.0 | NaN | NaN | NaN | NaN |
15824 | 2021-0036-ZWE | 2021 | 36 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical cyclone 'Eloise' | Kill | Zimbabwe | ZWE | Eastern Africa | Africa | Eswatini | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2021 | 1.0 | 23.0 | 2021 | 1.0 | 23.0 | 3.0 | NaN | 1745.0 | NaN | 1745.0 | NaN | NaN | NaN | NaN |
15825 | 2020-0131-TLS | 2020 | 131 | Natural | Hydrological | Flood | Riverine flood | NaN | NaN | Affected | Timor-Leste | TLS | South-Eastern Asia | Asia | Cristo Rei, Nain Feto, Dom Aleixo, and Vera Cruz (Dili municipality) | Heavy rains | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Km2 | NaN | NaN | NaN | NaN | 2020 | 3.0 | 13.0 | 2020 | 3.0 | 13.0 | 3.0 | 7.0 | 9124.0 | NaN | 9131.0 | NaN | NaN | 20000.0 | NaN |
15826 | 2020-0362-SSD | 2020 | 362 | Natural | Hydrological | Flood | NaN | NaN | NaN | Affected | South Sudan | SSD | Northern Africa | Africa | Bor South, Twic East, Duk, Ayod Countie (Jonglei); Renk county (Eastern Nile); Pochallla county (Pibor); Lakes, Unity, Upper Nile, Warra, Western Equatoria, Central Equatoria, Northern Bahr-el-Ghazal | Heavy rains | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Km2 | NaN | NaN | NaN | White Nile, Akobo River | 2020 | 7.0 | NaN | 2020 | 12.0 | NaN | NaN | NaN | 1042000.0 | NaN | 1042000.0 | NaN | NaN | NaN | NaN |
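If you only care about the hierarchy, `pd.read_csv` accepts a `usecols` argument that loads just the columns you name. Here is a minimal sketch of that idea; to keep it self-contained, it reads a small hypothetical CSV string through `io.StringIO` instead of the real EM-DAT file (with the real dataset, you would pass `emdat_dataset_path`):

```python
import io

import pandas as pd

# hypothetical stand-in for the EM-DAT CSV (only a few columns and rows)
csv_text = """Dis No,Year,Disaster Group,Disaster Subgroup,Disaster Type,Disaster Subtype,Disaster Subsubtype,Country
2020-0031-ZMB,2020,Natural,Hydrological,Flood,,,Zambia
2021-0036-ZWE,2021,Natural,Meteorological,Storm,Tropical cyclone,,Zimbabwe
"""

HIERARCHY_COLUMNS = [
    "Disaster Group",
    "Disaster Subgroup",
    "Disaster Type",
    "Disaster Subtype",
    "Disaster Subsubtype",
]

# load only the event identifier plus the five hierarchy columns;
# empty fields in the CSV are read as NaN
df_hier = pd.read_csv(io.StringIO(csv_text), usecols=["Dis No"] + HIERARCHY_COLUMNS)
print(df_hier.columns.tolist())
```

Note that `usecols` returns the columns in file order, not in the order of the list you pass.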
Let’s now move on to exploring the hierarchical structure of the EM-DAT dataset.
The base of the EM-DAT hierarchy starts with the “Disaster Group” column:
# display the disaster groups
df["Disaster Group"].unique()
However, this is an uninteresting place to start since this column has only a single value (i.e., “Natural”):
array(['Natural'], dtype=object)
For this reason, and for all practical purposes, we typically consider the “Disaster Subgroup” column to be our starting point of the EM-DAT hierarchy.
The following code snippet allows us to explore all possible “Disaster Subgroups” in EM-DAT:
# display the natural disaster subgroups
df["Disaster Subgroup"].unique()
Which gives us:
array(['Climatological', 'Geophysical', 'Meteorological', 'Hydrological',
'Biological', 'Extra-terrestrial'], dtype=object)
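`unique()` tells you which subgroups exist, but not how common each one is. `value_counts()` answers both questions at once. Below it is applied to a small hypothetical Series (on the real data, you would call `df["Disaster Subgroup"].value_counts()`):

```python
import pandas as pd

# hypothetical mini-sample of the "Disaster Subgroup" column
subgroups = pd.Series(
    ["Hydrological", "Meteorological", "Hydrological", "Geophysical",
     "Meteorological", "Hydrological"],
    name="Disaster Subgroup",
)

# one row per label, sorted from most to least frequent
counts = subgroups.value_counts()
print(counts)
```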
For example, we can grab all “Meteorological” natural disasters from the EM-DAT dataset using the following code:
# grab all rows that are part of the 'meteorological' disaster subgroup
df_meteo = df[df["Disaster Subgroup"] == "Meteorological"]
df_meteo.tail()
As our output dataframe shows, we’ve successfully filtered all “Meteorological” events:
| | Dis No | Year | Seq | Disaster Group | Disaster Subgroup | Disaster Type | Disaster Subtype | Disaster Subsubtype | Event Name | Entry Criteria | Country | ISO | Region | Continent | Location | Origin | Associated Dis | Associated Dis2 | OFDA Response | Appeal | Declaration | Aid Contribution | Dis Mag Value | Dis Mag Scale | Latitude | Longitude | Local Time | River Basin | Start Year | Start Month | Start Day | End Year | End Month | End Day | Total Deaths | No Injured | No Affected | No Homeless | Total Affected | Reconstruction Costs ('000 US$) | Insured Damages ('000 US$) | Total Damages ('000 US$) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15807 | 2020-0425-VNM | 2020 | 425 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical storm 'Nangka' (Nika) | Waiting | Viet Nam | VNM | South-Eastern Asia | Asia | Nam Dinh, Ninh Bình, Thanh Hóa provinces | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 85.0 | Kph | NaN | NaN | NaN | NaN | 2020 | 10.0 | 13.0 | 2020 | 10.0 | 14.0 | 2.0 | NaN | 67855.0 | 2925.0 | 70780.0 | NaN | NaN | NaN | NaN |
15808 | 2020-0462-VNM | 2020 | 462 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical storm 'Noul' (Leon) | Kill | Viet Nam | VNM | South-Eastern Asia | Asia | Da Nang | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 85.0 | Kph | NaN | NaN | NaN | NaN | 2020 | 9.0 | 18.0 | 2020 | 9.0 | 21.0 | 6.0 | NaN | 125000.0 | NaN | 125000.0 | NaN | NaN | 33000.0 | NaN |
15809 | 2020-0558-VNM | 2020 | 558 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical depression 'Vicky' (Krovanh) | Affected | Viet Nam | VNM | South-Eastern Asia | Asia | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 12.0 | 21.0 | 2020 | 12.0 | 21.0 | 1.0 | 4.0 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN |
15810 | 2020-0132-VUT | 2020 | 132 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Cyclone 'Harold' | -- | Vanuatu | VUT | Melanesia | Oceania | Pentecost, Espiritu Santo, Penama, Sanma, Malampa, Shefa, Torba | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 4.0 | 2020 | 4.0 | 5.0 | 5.0 | NaN | 83837.0 | NaN | 83837.0 | NaN | NaN | NaN | NaN |
15824 | 2021-0036-ZWE | 2021 | 36 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical cyclone 'Eloise' | Kill | Zimbabwe | ZWE | Eastern Africa | Africa | Eswatini | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2021 | 1.0 | 23.0 | 2021 | 1.0 | 23.0 | 3.0 | NaN | 1745.0 | NaN | 1745.0 | NaN | NaN | NaN | NaN |
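The one-line filter used above actually works in two steps: the comparison produces a boolean Series (one True/False per row), and indexing the dataframe with that Series keeps only the True rows. Spelling the steps out on a small hypothetical dataframe makes this visible. The `.copy()` call is an optional addition of mine; it gives you an independent dataframe, which avoids pandas' `SettingWithCopyWarning` if you later add columns to the subset:

```python
import pandas as pd

# hypothetical sample rows (only two EM-DAT-style columns are kept)
df_sample = pd.DataFrame({
    "Country": ["Zambia", "Zimbabwe", "Viet Nam", "Timor-Leste"],
    "Disaster Subgroup": ["Hydrological", "Meteorological", "Meteorological", "Hydrological"],
})

# step 1: the comparison yields one boolean per row
mask = df_sample["Disaster Subgroup"] == "Meteorological"
print(mask.tolist())

# step 2: indexing with the mask keeps only the rows where it is True;
# .copy() detaches the subset from the original dataframe
df_meteo_sample = df_sample[mask].copy()
print(df_meteo_sample)
```

The subset keeps the original row labels (1 and 2 here), which is why the filtered tables in this post show index values such as 15807 rather than starting from 0.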
Nested under the “Disaster Subgroup”, the “Disaster Type” offers a more specific classification of the natural disaster.
Let’s explore the “Disaster Type” for all “Meteorological” events:
# display all natural disaster types for "meteorological" events
df_meteo["Disaster Type"].unique()
Here is the output:
array(['Storm', 'Extreme temperature', 'Fog'], dtype=object)
We see there are three meteorological Disaster Types: “Storm”, “Extreme temperature”, and “Fog”.
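Rather than filtering one subgroup at a time, you can list the Disaster Types under every Disaster Subgroup in a single call by grouping on the parent column. A sketch on hypothetical rows (on the real data, swap `df_sample` for `df`):

```python
import pandas as pd

# hypothetical (subgroup, type) pairs in the style of EM-DAT
df_sample = pd.DataFrame({
    "Disaster Subgroup": ["Meteorological", "Meteorological", "Meteorological",
                          "Hydrological", "Hydrological", "Meteorological"],
    "Disaster Type": ["Storm", "Extreme temperature", "Fog",
                      "Flood", "Landslide", "Storm"],
})

# for each parent label, collect the distinct child labels
# (in order of first appearance within that group)
types_per_subgroup = df_sample.groupby("Disaster Subgroup")["Disaster Type"].unique()
print(types_per_subgroup)
```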
To examine all “Storm” events from the “Meteorological” Disaster Subgroup, we can use the following snippet:
# grab all rows that are part of the 'storm' disaster type
df_storm = df_meteo[df_meteo["Disaster Type"] == "Storm"]
df_storm.tail()
Notice how the following dataframe only includes “Storm” events:
| | Dis No | Year | Seq | Disaster Group | Disaster Subgroup | Disaster Type | Disaster Subtype | Disaster Subsubtype | Event Name | Entry Criteria | Country | ISO | Region | Continent | Location | Origin | Associated Dis | Associated Dis2 | OFDA Response | Appeal | Declaration | Aid Contribution | Dis Mag Value | Dis Mag Scale | Latitude | Longitude | Local Time | River Basin | Start Year | Start Month | Start Day | End Year | End Month | End Day | Total Deaths | No Injured | No Affected | No Homeless | Total Affected | Reconstruction Costs ('000 US$) | Insured Damages ('000 US$) | Total Damages ('000 US$) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15807 | 2020-0425-VNM | 2020 | 425 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical storm 'Nangka' (Nika) | Waiting | Viet Nam | VNM | South-Eastern Asia | Asia | Nam Dinh, Ninh Bình, Thanh Hóa provinces | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 85.0 | Kph | NaN | NaN | NaN | NaN | 2020 | 10.0 | 13.0 | 2020 | 10.0 | 14.0 | 2.0 | NaN | 67855.0 | 2925.0 | 70780.0 | NaN | NaN | NaN | NaN |
15808 | 2020-0462-VNM | 2020 | 462 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical storm 'Noul' (Leon) | Kill | Viet Nam | VNM | South-Eastern Asia | Asia | Da Nang | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 85.0 | Kph | NaN | NaN | NaN | NaN | 2020 | 9.0 | 18.0 | 2020 | 9.0 | 21.0 | 6.0 | NaN | 125000.0 | NaN | 125000.0 | NaN | NaN | 33000.0 | NaN |
15809 | 2020-0558-VNM | 2020 | 558 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical depression 'Vicky' (Krovanh) | Affected | Viet Nam | VNM | South-Eastern Asia | Asia | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 12.0 | 21.0 | 2020 | 12.0 | 21.0 | 1.0 | 4.0 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN |
15810 | 2020-0132-VUT | 2020 | 132 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Cyclone 'Harold' | -- | Vanuatu | VUT | Melanesia | Oceania | Pentecost, Espiritu Santo, Penama, Sanma, Malampa, Shefa, Torba | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 4.0 | 2020 | 4.0 | 5.0 | 5.0 | NaN | 83837.0 | NaN | 83837.0 | NaN | NaN | NaN | NaN |
15824 | 2021-0036-ZWE | 2021 | 36 | Natural | Meteorological | Storm | Tropical cyclone | NaN | Tropical cyclone 'Eloise' | Kill | Zimbabwe | ZWE | Eastern Africa | Africa | Eswatini | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2021 | 1.0 | 23.0 | 2021 | 1.0 | 23.0 | 3.0 | NaN | 1745.0 | NaN | 1745.0 | NaN | NaN | NaN | NaN |
The Disaster Subtype provides an even more detailed breakdown of the Disaster Type.
Let’s take “Storm” as our starting “Disaster Type”:
# display all natural disaster subtypes for "storm" events
df_storm["Disaster Subtype"].unique()
Which gives us the following output:
array(['Tropical cyclone', 'Convective storm', nan,
'Extra-tropical storm'], dtype=object)
We see there are four possible “Disaster Subtype” values when we start with “Storm” as our root “Disaster Type”: “Tropical cyclone”, “Convective storm”, “Extra-tropical storm”, and NaN.
A value of NaN means the particular natural disaster is not categorized beyond its “Disaster Type”.
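Because missing values show up as `nan` rather than as a label, they need slightly different handling. Two points worth knowing, shown here on a hypothetical Series: `value_counts()` drops NaN unless you pass `dropna=False`, and comparing with `== np.nan` never matches (NaN is not equal to anything, including itself), so use `isna()` to select the uncategorized rows:

```python
import numpy as np
import pandas as pd

# hypothetical "Disaster Subtype" values for a handful of storm events
subtypes = pd.Series(
    ["Tropical cyclone", "Convective storm", np.nan, "Tropical cyclone", np.nan],
    name="Disaster Subtype",
)

# NaN is excluded by default; dropna=False keeps it as its own bucket
print(subtypes.value_counts())
print(subtypes.value_counts(dropna=False))

# equality against NaN is always False, so this selects nothing ...
no_match = subtypes[subtypes == np.nan]
# ... whereas isna() finds the rows that stop at the "Disaster Type" level
uncategorized = subtypes[subtypes.isna()]
```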
Let’s now filter on all “Convective storm” samples:
# grab all rows that are part of the 'convective storm' disaster subtype
df_convective = df_storm[df_storm["Disaster Subtype"] == "Convective storm"]
df_convective.tail()
Which gives us the following dataframe:
| | Dis No | Year | Seq | Disaster Group | Disaster Subgroup | Disaster Type | Disaster Subtype | Disaster Subsubtype | Event Name | Entry Criteria | Country | ISO | Region | Continent | Location | Origin | Associated Dis | Associated Dis2 | OFDA Response | Appeal | Declaration | Aid Contribution | Dis Mag Value | Dis Mag Scale | Latitude | Longitude | Local Time | River Basin | Start Year | Start Month | Start Day | End Year | End Month | End Day | Total Deaths | No Injured | No Affected | No Homeless | Total Affected | Reconstruction Costs ('000 US$) | Insured Damages ('000 US$) | Total Damages ('000 US$) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15786 | 2020-0167-USA | 2020 | 167 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | Waiting | United States of America (the) | USA | Northern America | Americas | Texas, Oklahoma, Louisiana, Mississippi, Alabama, Georgia, Florida, Virginia | NaN | Flood | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 21.0 | 2020 | 4.0 | 24.0 | 3.0 | 31.0 | NaN | NaN | 31.0 | NaN | NaN | 1400000.0 | NaN |
15787 | 2020-0011-USA | 2020 | 11 | Natural | Meteorological | Storm | Convective storm | Severe storm | NaN | Kill | United States of America (the) | USA | Northern America | Americas | Texas, Oklahoma, Missouri, Arkansas, Louisiana, Mississippi, Alabama, Tennessee, Kentucky, Georgia states | NaN | Flood | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 1.0 | 10.0 | 2020 | 1.0 | 12.0 | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1200000.0 | NaN |
15791 | 2020-0165-VNM | 2020 | 165 | Natural | Meteorological | Storm | Convective storm | Lightning/Thunderstorms | NaN | Affected | Viet Nam | VNM | South-Eastern Asia | Asia | Ha Giang, Son La, Yen Bai, Lao Cai, and Quang Binh Provinces | NaN | Flood | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 22.0 | 2020 | 4.0 | 27.0 | 3.0 | 13.0 | 30000.0 | NaN | 30013.0 | NaN | NaN | NaN | NaN |
15798 | 2020-0082-USA | 2020 | 82 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | Waiting | United States of America (the) | USA | Northern America | Americas | Nashville (Tennessee), Kentucky, Missouri, Mississippi, Georgia, Texas Oklahoma, Illinois, Indiana, Ohio, Arkansas, West Virginia, Pennsylvania | NaN | NaN | NaN | NaN | NaN | Yes | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 3.0 | 2.0 | 2020 | 3.0 | 5.0 | 25.0 | 300.0 | 12000.0 | NaN | 12300.0 | NaN | NaN | 2500000.0 | NaN |
15799 | 2020-0582-USA | 2020 | 582 | Natural | Meteorological | Storm | Convective storm | Severe storm | NaN | SigDam | United States of America (the) | USA | Northern America | Americas | Missouri, Oklahoma, Texas, Illinois, Indiana, Ohio, Arkansas, Kentucky, Tennessee, West Virginia, Pennnsylvania | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 3.0 | 27.0 | 2020 | 3.0 | 28.0 | NaN | NaN | NaN | NaN | NaN | NaN | 2200000.0 | 2900000.0 | NaN |
Now, only convective storms are included in the output.
If you examine the “Disaster Subsubtype” column, however, you’ll see a further categorization of each natural disaster, including “Tornado”, “Severe storm”, and “Lightning/Thunderstorms”.
The most granular level of classification in the EM-DAT hierarchy is the Disaster Subsubtype.
Let’s start with “Convective storm” as the “Disaster Subtype” and determine all possible “Disaster Subsubtype” values:
# display all natural disaster subsubtypes for "convective storm" events
df_convective["Disaster Subsubtype"].unique()
The output of the code follows:
array(['Tornado', 'Hail', 'Severe storm', 'Winter storm/Blizzard',
'Lightning/Thunderstorms', nan, 'Sand/Dust storm', 'Rain',
'Storm/Surge', 'Derecho'], dtype=object)
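This tells us there are 10 unique “Disaster Subsubtype” values for “Convective storm” events: nine named subsubtypes plus NaN for events that are not categorized any further.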
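The `nan` entry in the array above is counted when you take the length of `unique()`. pandas exposes both views: `len(series.unique())` counts NaN as a value, while `series.nunique()` ignores it by default (pass `dropna=False` to include it). A quick check on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# hypothetical "Disaster Subsubtype" values for a few convective storms
subsubtypes = pd.Series(["Tornado", "Hail", np.nan, "Tornado", "Derecho"])

print(len(subsubtypes.unique()))          # → 4 (NaN counted as a distinct value)
print(subsubtypes.nunique())              # → 3 (NaN ignored by default)
print(subsubtypes.nunique(dropna=False))  # → 4 (NaN counted again)
```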
As a final example, let’s grab all rows where the “Disaster Subsubtype” is “Tornado”:
# grab all rows that are part of the 'tornado' disaster subsubtype
df_tornado = df_convective[df_convective["Disaster Subsubtype"] == "Tornado"]
df_tornado.tail()
And sure enough, we’ve now filtered only the tornado events from EM-DAT:
| | Dis No | Year | Seq | Disaster Group | Disaster Subgroup | Disaster Type | Disaster Subtype | Disaster Subsubtype | Event Name | Entry Criteria | Country | ISO | Region | Continent | Location | Origin | Associated Dis | Associated Dis2 | OFDA Response | Appeal | Declaration | Aid Contribution | Dis Mag Value | Dis Mag Scale | Latitude | Longitude | Local Time | River Basin | Start Year | Start Month | Start Day | End Year | End Month | End Day | Total Deaths | No Injured | No Affected | No Homeless | Total Affected | Reconstruction Costs ('000 US$) | Insured Damages ('000 US$) | Total Damages ('000 US$) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15298 | 2019-0081-USA | 2019 | 81 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | Kill | United States of America (the) | USA | Northern America | Americas | Alabama, Georgia, South Carolina, Florida, Mississippi, | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2019 | 3.0 | 3.0 | 2019 | 3.0 | 4.0 | 28.0 | 90.0 | NaN | NaN | 90.0 | NaN | 140000.0 | 190000.0 | 100.0 |
15780 | 2020-0190-USA | 2020 | 190 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | SigDam | United States of America (the) | USA | Northern America | Americas | Illinois, Iowa, Wisconsin, Michigan, Indiana, Ohio, Kentucky, Arkansas, Tennessee, Missouri | NaN | Hail | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 6.0 | 2020 | 4.0 | 9.0 | NaN | NaN | NaN | NaN | NaN | NaN | 2200000.0 | 2900000.0 | NaN |
15785 | 2020-0148-USA | 2020 | 148 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | Kill | United States of America (the) | USA | Northern America | Americas | Louisiana, Texas, Mississippi, South Carolina, Georgia, Tennessee, Arkansas, North Carolina, Alabama | NaN | Flood | NaN | NaN | NaN | NaN | NaN | 160.0 | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 10.0 | 2020 | 4.0 | 14.0 | 38.0 | 200.0 | NaN | NaN | 200.0 | NaN | 2600000.0 | 3500000.0 | NaN |
15786 | 2020-0167-USA | 2020 | 167 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | Waiting | United States of America (the) | USA | Northern America | Americas | Texas, Oklahoma, Louisiana, Mississippi, Alabama, Georgia, Florida, Virginia | NaN | Flood | NaN | NaN | NaN | NaN | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 4.0 | 21.0 | 2020 | 4.0 | 24.0 | 3.0 | 31.0 | NaN | NaN | 31.0 | NaN | NaN | 1400000.0 | NaN |
15798 | 2020-0082-USA | 2020 | 82 | Natural | Meteorological | Storm | Convective storm | Tornado | NaN | Waiting | United States of America (the) | USA | Northern America | Americas | Nashville (Tennessee), Kentucky, Missouri, Mississippi, Georgia, Texas Oklahoma, Illinois, Indiana, Ohio, Arkansas, West Virginia, Pennsylvania | NaN | NaN | NaN | NaN | NaN | Yes | NaN | NaN | Kph | NaN | NaN | NaN | NaN | 2020 | 3.0 | 2.0 | 2020 | 3.0 | 5.0 | 25.0 | 300.0 | 12000.0 | NaN | 12300.0 | NaN | NaN | 2500000.0 | NaN |
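Each step above filters on one more level of the hierarchy. If you find yourself repeating that pattern, it can be wrapped in a small helper. `drill_down` below is a hypothetical function (not part of pandas or EM-DAT) that walks the hierarchy columns in order, starting at “Disaster Subgroup” since “Disaster Group” is always “Natural”, and applies one equality filter per label you supply. It runs on hypothetical rows:

```python
import pandas as pd

# hierarchy levels below "Disaster Group", from general to specific
DRILL_LEVELS = [
    "Disaster Subgroup",
    "Disaster Type",
    "Disaster Subtype",
    "Disaster Subsubtype",
]

def drill_down(df, *path):
    """Hypothetical helper: filter df one hierarchy level at a time.

    path holds labels ordered from "Disaster Subgroup" downward, e.g.
    ("Meteorological", "Storm", "Convective storm", "Tornado").
    """
    if len(path) > len(DRILL_LEVELS):
        raise ValueError("path is deeper than the EM-DAT hierarchy")
    subset = df
    for column, value in zip(DRILL_LEVELS, path):
        subset = subset[subset[column] == value]
    return subset

# hypothetical sample rows
df_sample = pd.DataFrame(
    [
        ["Meteorological", "Storm", "Convective storm", "Tornado"],
        ["Meteorological", "Storm", "Convective storm", "Hail"],
        ["Meteorological", "Storm", "Tropical cyclone", None],
        ["Hydrological", "Flood", None, None],
    ],
    columns=DRILL_LEVELS,
)

df_tornado_sample = drill_down(
    df_sample, "Meteorological", "Storm", "Convective storm", "Tornado"
)
print(df_tornado_sample)
```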
The above sections provided code snippets demonstrating how the EM-DAT hierarchy is organized.
However, since we are using Pandas, we can also filter directly on an individual column rather than navigating the entire hierarchy.
The benefit of filtering directly on a column is that it requires only a single line of code.
For example, let’s grab all “Avalanche” events, which requires us to filter on the “Disaster Subtype” column:
# find all avalanches in the EM-DAT dataset by filtering *directly* on the
# Disaster Subtype of the original dataframe
df_avalanche = df[df["Disaster Subtype"] == "Avalanche"]
df_avalanche.tail()
And now we have a dataframe consisting of just the “Avalanche” events:
| | Dis No | Year | Seq | Disaster Group | Disaster Subgroup | Disaster Type | Disaster Subtype | Disaster Subsubtype | Event Name | Entry Criteria | Country | ISO | Region | Continent | Location | Origin | Associated Dis | Associated Dis2 | OFDA Response | Appeal | Declaration | Aid Contribution | Dis Mag Value | Dis Mag Scale | Latitude | Longitude | Local Time | River Basin | Start Year | Start Month | Start Day | End Year | End Month | End Day | Total Deaths | No Injured | No Affected | No Homeless | Total Affected | Reconstruction Costs ('000 US$) | Insured Damages ('000 US$) | Total Damages ('000 US$) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14580 | 2017-0466-MNG | 2017 | 466 | Natural | Hydrological | Landslide | Avalanche | NaN | NaN | Kill | Mongolia | MNG | Eastern Asia | Asia | Otgontenger mountain (Khangai mountain range, Zavkhan province) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2017 | 10.0 | 22.0 | 2017 | 10.0 | 22.0 | 17.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 95.878166 |
14625 | 2017-0034-TJK | 2017 | 34 | Natural | Hydrological | Landslide | Avalanche | NaN | NaN | Waiting | Tajikistan | TJK | Central Asia | Asia | Pamir region. Road between Douchanbe (Tadshikistan territories) and Khodjent (Sogd), Gorno-Badakhshan region (East) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2017 | 1.0 | 27.0 | 2017 | 1.0 | 28.0 | 13.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 95.878166 |
15517 | 2020-0063-AFG | 2020 | 63 | Natural | Hydrological | Landslide | Avalanche | NaN | NaN | Kill | Afghanistan | AFG | Southern Asia | Asia | Daykundi Province | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2020 | 2.0 | 13.0 | 2020 | 2.0 | 14.0 | 22.0 | 10.0 | NaN | 250.0 | 260.0 | NaN | NaN | NaN | NaN |
15625 | 2020-0574-IRN | 2020 | 574 | Natural | Hydrological | Landslide | Avalanche | NaN | NaN | Kill | Iran (Islamic Republic of) | IRN | Southern Asia | Asia | Darabad mountains | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2020 | 12.0 | 25.0 | 2020 | 12.0 | 25.0 | 12.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
15736 | 2020-0044-TUR | 2020 | 44 | Natural | Hydrological | Landslide | Avalanche | NaN | NaN | Kill | Turkey | TUR | Western Asia | Asia | Bahçesaray and Çatak districts (Van province) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2020 | 2.0 | 4.0 | 2020 | 2.0 | 5.0 | 41.0 | 84.0 | NaN | NaN | 84.0 | NaN | NaN | NaN | NaN |
I suggest this approach whenever you need to filter on a specific natural disaster type in EM-DAT.
I assume you already have enough Pandas knowledge to do this, but I wanted to mention it for completeness.
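Filtering on a low-level column also lets you work bottom-up: once you have the rows, you can ask which parent categories they sit under. In the avalanche table above, for instance, each event is filed under “Hydrological” → “Landslide”. Here is a sketch of that lookup using `drop_duplicates()`, again on hypothetical rows:

```python
import pandas as pd

# hypothetical sample rows
df_sample = pd.DataFrame({
    "Disaster Subgroup": ["Hydrological", "Hydrological", "Meteorological", "Hydrological"],
    "Disaster Type": ["Landslide", "Landslide", "Storm", "Flood"],
    "Disaster Subtype": ["Avalanche", "Avalanche", "Convective storm", None],
})

# bottom-up: start from a subtype and recover its distinct parent categories
parents = (
    df_sample.loc[df_sample["Disaster Subtype"] == "Avalanche",
                  ["Disaster Subgroup", "Disaster Type"]]
    .drop_duplicates()
)
print(parents)
```

If `parents` ever contains more than one row, the same label appears under several parents, and filtering on that column alone would mix those branches together.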
The hierarchical structure of the EM-DAT dataset facilitates a systematic approach to drilling down into natural disaster types.
As a data scientist, I can use either a top-down or a bottom-up approach in my analysis of natural disasters.
Furthermore, with Pandas and column-based indexing, drilling down into natural disaster types is trivially easy.
Note: While the above code snippets explored meteorological events, the same approach applies to the other subgroups in the EM-DAT dataset, such as biological and climatological events.
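Because the same drill-down works for every subgroup, you can also map out the whole hierarchy in one pass. The sketch below shows one possible approach: `build_tree` is a hypothetical helper (not part of pandas) that nests each level's labels under their parent in a dictionary and stops at the first missing label. It runs on hypothetical rows:

```python
import pandas as pd

LEVELS = ["Disaster Subgroup", "Disaster Type", "Disaster Subtype", "Disaster Subsubtype"]

def build_tree(df, levels=LEVELS):
    """Hypothetical helper: nest the distinct labels of each level under their parent."""
    tree = {}
    # itertuples(index=False, name=None) yields plain tuples in column order
    for row in df[levels].itertuples(index=False, name=None):
        node = tree
        for label in row:
            if pd.isna(label):
                break  # this event is not categorized any deeper
            node = node.setdefault(label, {})
    return tree

# hypothetical sample rows
df_sample = pd.DataFrame(
    [
        ["Meteorological", "Storm", "Convective storm", "Tornado"],
        ["Meteorological", "Storm", "Tropical cyclone", None],
        ["Meteorological", "Fog", None, None],
        ["Hydrological", "Landslide", "Avalanche", None],
    ],
    columns=LEVELS,
)

tree = build_tree(df_sample)
print(tree)
```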
Understanding the EM-DAT hierarchy, and how to effectively navigate it using Pandas, will equip you with a robust toolkit to explore the vast data on natural disasters in the EM-DAT database.
Adrian Rosebrock. “Decoding the EM-DAT Natural Disaster Dataset Hierarchy”, NaturalDisasters.ai, 2023, https://naturaldisasters.ai/posts/em-dat-dataset-hierarchy-explained/.
@incollection{ARosebrock_EMDATDatasetHierarchy,
author = {Adrian Rosebrock},
title = {Decoding the EM-DAT Natural Disaster Dataset Hierarchy},
booktitle = {NaturalDisasters.ai},
year = {2023},
url = {https://naturaldisasters.ai/posts/em-dat-dataset-hierarchy-explained/},
}
AI generated content disclaimer: I’ve used a sprinkling of AI magic in this blog post, namely in the “Takeaways” section, where I used AI to create a concise summary of this article. Don’t fret, my human eyeballs have read and edited every word of the AI generated content, so rest assured, what you’re reading is as accurate as I possibly can make it. If there are any discrepancies or inaccuracies in the post, it’s my fault, not that of our machine assistants.
Header photo by Joshua Earle on Unsplash