Title: | Wrangler for Emergency Events Database |
---|---|
Description: | Makes research involving EMDAT and related datasets easier. These datasets are filled in manually and have several formatting and compatibility issues, which weed's functions aim to resolve. |
Authors: | Ram Kripa [aut, cre] |
Maintainer: | Ram Kripa <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.2 |
Built: | 2025-03-09 05:57:52 UTC |
Source: | https://github.com/rammkripa/weed |
Uses the location_word and Country columns of the data frame to query the GeoNames API and geocode the locations in the dataset.
Note:
The GeoNames API limits free accounts to 1000 queries per hour.
You need a GeoNames username to make queries. You can learn more about that here.
geocode(., n_results = 1, unwrap = FALSE, geonames_username)
. | A data frame that has been locationized (see |
n_results | Number of lat/lng results to retrieve for each location |
unwrap | If TRUE, returns lat1, lat2, lng1, lng2, etc. as separate columns; otherwise returns one lat column and one lng column |
geonames_username | Username for the GeoNames API. The note above explains how to get one. |
The same data frame with lat and lng column(s) added.
df <- tibble::tribble(
  ~value, ~location_word, ~Country,
  "mumbai region, district of seattle, sichuan province", "mumbai", "India",
  "mumbai region, district of seattle, sichuan province", "seattle", "USA"
)
geocode(df, n_results = 1, unwrap = TRUE, geonames_username = "rammkripa")
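The unwrap argument decides whether several results per location stay together or are spread across numbered columns (lat1, lat2, ...). The base R sketch below shows only that wide layout. It uses a hypothetical helper, unwrap_coords, which is not part of weed. It assumes that each location's coordinates arrive as a numeric vector, and that this vector may be shorter than n_results, or NULL when nothing was found. The input values are made up for illustration.

```r
# Hypothetical helper (not part of weed): spread per-location coordinate
# vectors into numbered columns such as lat1, lat2, ...
unwrap_coords <- function(coords, prefix, n_results) {
  rows <- lapply(coords, function(v) {
    v <- as.numeric(v)      # NULL becomes numeric(0)
    length(v) <- n_results  # pad missing results with NA, drop extras
    v
  })
  # One row per location, one column per requested result
  mat <- matrix(as.numeric(unlist(rows)), ncol = n_results, byrow = TRUE)
  colnames(mat) <- paste0(prefix, seq_len(n_results))
  as.data.frame(mat)
}

# Two results requested: the second location has one match, the third none
unwrap_coords(list(c(19.1, 18.9), 47.6, NULL), prefix = "lat", n_results = 2)
```

Padding with NA keeps every location on its own row, so the wide columns can be bound back onto the original data frame in order.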
Geocode in batches
geocode_batches( ., batch_size = 990, wait_time = 4800, n_results = 1, unwrap = FALSE, geonames_username )
. | A data frame, as in geocode |
batch_size | Number of rows (locations) to geocode in each batch |
wait_time | Time to wait between batches, in seconds. Note: the default batch_size and wait_time were chosen to finish the geocoding task as quickly as possible within the query limits of free GeoNames accounts. |
n_results | As in geocode |
unwrap | As in geocode |
geonames_username | As in geocode |
The geocoded data frame.
df <- tibble::tribble(
  ~value, ~location_word, ~Country,
  "mumbai region, district of seattle, sichuan province", "mumbai", "India",
  "mumbai region, district of seattle, sichuan province", "seattle", "USA",
  "mumbai region, district of seattle, sichuan province", "sichuan", "China, People's Republic"
)
geocode_batches(df, batch_size = 2, wait_time = 0.4, geonames_username = "rammkripa")
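Batches are separated by a pause of at least wait_time seconds. With the defaults, a single one-hour window therefore cannot contain queries from two different batches: 4800 s is longer than 3600 s. At most 990 queries fall into any hour, which is below the free-account limit of 1000. The base R sketch below illustrates that batching pattern. process_in_batches and its fn argument are hypothetical and are not weed's implementation. In the usage check, a stand-in function replaces the real GeoNames call.

```r
# Hypothetical sketch (not weed's implementation): apply `fn` to the rows
# of `df` in batches of `batch_size`, sleeping `wait_time` seconds between
# batches so that a per-hour query quota is not exceeded.
process_in_batches <- function(df, batch_size, wait_time, fn) {
  n <- nrow(df)
  if (n == 0) return(fn(df))
  batch_id <- ceiling(seq_len(n) / batch_size)  # 1, 1, ..., 2, 2, ...
  batches <- split(df, batch_id)
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fn(batches[[i]])
    # Pause only between batches, not after the last one
    if (i < length(batches)) Sys.sleep(wait_time)
  }
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Stand-in for a geocoder: adds an all-NA lat column to each batch
fake_geocode <- function(batch) {
  batch$lat <- NA_real_
  batch
}
locs <- data.frame(location_word = c("mumbai", "seattle", "sichuan"),
                   stringsAsFactors = FALSE)
process_in_batches(locs, batch_size = 2, wait_time = 0, fn = fake_geocode)
```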
Adds a new column, in_box, indicating whether each lat/lng point lies inside a given box.
located_in_box( ., lat_column = "lat", lng_column = "lng", top_left_lat, top_left_lng, bottom_right_lat, bottom_right_lng )
. | A data frame that has been locationized (see |
lat_column | Name of the column containing latitude data |
lng_column | Name of the column containing longitude data |
top_left_lat | Latitude of the top-left corner of the box |
top_left_lng | Longitude of the top-left corner of the box |
bottom_right_lat | Latitude of the bottom-right corner of the box |
bottom_right_lng | Longitude of the bottom-right corner of the box |
The same data frame with the in_box column added.
d <- tibble::tribble(
  ~value, ~location_word, ~Country, ~lat, ~lng,
  "city of new york", "new york", "USA", 40.71427, -74.00597,
  "kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
  "kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847
)
located_in_box(d, lat_column = "lat", lng_column = "lng",
               top_left_lat = 45, bottom_right_lat = 12,
               top_left_lng = -80, bottom_right_lng = 90)
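The box test reduces to four comparisons. The sketch below is a minimal version, assuming three things: the top-left corner is the north-west corner, the bottom-right corner is the south-east corner, and the box does not cross the 180th meridian. in_box_sketch is a hypothetical name and is not weed's code. On the coordinates from the example above, the sketch marks new york and chennai as inside. Kerala falls outside because its latitude, 10.42, is below the southern edge at 12.

```r
# Hypothetical sketch (not weed's code): TRUE when a point lies inside the
# box spanned by its top-left (north-west) and bottom-right (south-east)
# corners. Assumes the box does not cross the 180th meridian; NA
# coordinates give NA.
in_box_sketch <- function(lat, lng, top_left_lat, top_left_lng,
                          bottom_right_lat, bottom_right_lng) {
  lat <= top_left_lat & lat >= bottom_right_lat &
    lng >= top_left_lng & lng <= bottom_right_lng
}

# Coordinates from the example above: new york, kerala, chennai
lat <- c(40.71427, 10.41667, 13.08784)
lng <- c(-74.00597, 76.5, 80.27847)
in_box_sketch(lat, lng, top_left_lat = 45, top_left_lng = -80,
              bottom_right_lat = 12, bottom_right_lng = 90)
# returns TRUE, FALSE, TRUE
```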
Adds a new column, in_shape, indicating whether each lat/lng point falls within a given shapefile.
located_in_shapefile( ., lat_column = "lat", lng_column = "lng", shapefile = NA, shapefile_name = NA )
. | A data frame that has been locationized (see |
lat_column | Name of the column containing latitude data |
lng_column | Name of the column containing longitude data |
shapefile | The shapefile itself (either shapefile or shapefile_name must be provided) |
shapefile_name | File name or path of the shapefile (either shapefile or shapefile_name must be provided) |
A data frame containing the shapefile data as well as the original data.
## Not run:
d <- tibble::tribble(
  ~value, ~location_word, ~Country, ~lat, ~lng,
  "city of new york", "new york", "USA", 40.71427, -74.00597,
  "kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
  "kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847
)
located_in_shapefile(d, lat_column = "lat", lng_column = "lng", shapefile_name = "~/dummy_name")
## End(Not run)
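For each row, the underlying question is a point-in-polygon test. The sketch below uses the classic even-odd (ray-casting) test on a single polygon given as vertex vectors. It is not necessarily how weed handles shapefiles. It ignores holes, multipolygons and spherical geometry, which a spatial library would deal with. The helper names and the square polygon are hypothetical. Points that lie exactly on an edge may be classified either way.

```r
# Even-odd (ray-casting) test: is the point (x, y) inside the polygon whose
# vertices are (poly_x[i], poly_y[i])? Hypothetical helper, not weed's code.
point_in_polygon <- function(x, y, poly_x, poly_y) {
  if (is.na(x) || is.na(y)) return(NA)
  inside <- FALSE
  j <- length(poly_x)
  for (i in seq_along(poly_x)) {
    # Toggle when edge j -> i straddles the horizontal line through y and
    # the crossing point lies to the right of x
    if (((poly_y[i] > y) != (poly_y[j] > y)) &&
        (x < (poly_x[j] - poly_x[i]) * (y - poly_y[i]) /
               (poly_y[j] - poly_y[i]) + poly_x[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Vectorised over points: longitude plays the role of x, latitude of y
points_in_polygon <- function(lat, lng, poly_x, poly_y) {
  vapply(seq_along(lat),
         function(k) point_in_polygon(lng[k], lat[k], poly_x, poly_y),
         logical(1))
}

# Hypothetical square around southern India (lng 70-90, lat 5-25)
sq_lng <- c(70, 90, 90, 70)
sq_lat <- c(5, 5, 25, 25)
points_in_polygon(lat = c(40.71427, 10.41667, 13.08784),
                  lng = c(-74.00597, 76.5, 80.27847), sq_lng, sq_lat)
# returns FALSE, TRUE, TRUE
```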
Nest Location Data into a column of Tibbles
nest_locations( ., key_column = "Dis No", columns_to_nest = c("location_word", "lat", "lng"), keep_nested_cols = FALSE )
. | A locationized data frame (see |
key_column | Name of the column that uniquely identifies each observation |
columns_to_nest | Names of the columns to nest inside the per-observation data frames |
keep_nested_cols | Logical: whether to also keep the nested columns outside the nested data frames |
A data frame with a column of data frames.
d <- tibble::tribble(
  ~value, ~location_word, ~Country, ~lat, ~lng,
  "city of new york", "new york", "USA", c(40.71427, 40.6501), c(-74.00597, -73.94958),
  "kerala", "kerala", "India", c(10.41667, 8.4855), c(76.5, 76.94924),
  "chennai municipality", "chennai", "India", c(13.08784, 12.98833), c(80.27847, 80.16578),
  "san francisco", "san francisco", "USA", c(37.77493, 37.33939), c(-122.41942, -121.89496)
)
nest_locations(d, key_column = "value")
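Nesting reverses the disaster-location split: rows that share a key collapse into one row, and the per-location columns move into a small data frame stored in a list column. Below is a base R sketch of that idea, assuming keys are non-NA and the non-nested columns are constant within each key. nest_by_key and the `data` column name are hypothetical, and the sketch leaves out keep_nested_cols.

```r
# Hypothetical sketch (not weed's implementation): one row per key, with
# the per-location columns stored as a data frame in a list column `data`.
nest_by_key <- function(df, key_column, columns_to_nest) {
  keys <- df[[key_column]]
  first_rows <- !duplicated(keys)
  # Outer table: first row of each key, without the nested columns
  out <- df[first_rows, setdiff(names(df), columns_to_nest), drop = FALSE]
  # Inner tables: all rows sharing the key, restricted to the nested columns
  out$data <- lapply(keys[first_rows], function(k) {
    df[keys == k, columns_to_nest, drop = FALSE]
  })
  rownames(out) <- NULL
  out
}

d <- data.frame(id = c(1, 2, 2), Country = c("USA", "India", "India"),
                location_word = c("new york", "kerala", "chennai"),
                lat = c(40.71427, 10.41667, 13.08784),
                lng = c(-74.00597, 76.5, 80.27847),
                stringsAsFactors = FALSE)
nested <- nest_by_key(d, "id", c("location_word", "lat", "lng"))
nested$data[[2]]  # the two locations of disaster 2
```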
Reports how successful the geocoding was: how many of the disasters in this data frame have non-NA coordinates?
percent_located_disasters( ., how = "any", lat_column = "lat", lng_column = "lng", plot_result = TRUE )
. | A data frame that has been locationized (see |
how | "any", "all", or a function, determining when a disaster counts as geocoded. With "any", at least one of its locations must have coordinates; with "all", every location must have a lat/lng; a function must take a logical vector and return a single logical. |
lat_column | Name of the column containing latitude data |
lng_column | Name of the column containing longitude data |
plot_result | Determines the output type (a plot or a summarized data frame) |
The percentage and number of disasters that have been geocoded (see plot_result for the output type).
d <- tibble::tribble(
  ~`Dis No`, ~value, ~location_word, ~Country, ~lat, ~lng,
  1, "city of new york", "new york", "USA", 40.71427, -74.00597,
  2, "kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
  2, "kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847
)
percent_located_disasters(d, how = "any", lat_column = "lat", lng_column = "lng", plot_result = FALSE)
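The `how` rule collapses each disaster's per-location "has coordinates" flags into a single TRUE or FALSE. The sketch below is a minimal version that assumes a location counts as geocoded when both lat and lng are non-NA. percent_located_sketch, its key_column argument and its named-vector return value are hypothetical, not weed's API. Take one disaster with one located and one unlocated location: it counts as geocoded under "any" but not under "all".

```r
# Hypothetical sketch (not weed's implementation) of the `how` rule:
# a location is geocoded when both coordinates are non-NA, and `how`
# reduces each disaster's logical vector to a single TRUE/FALSE.
percent_located_sketch <- function(df, key_column, how = "any",
                                   lat_column = "lat", lng_column = "lng") {
  located <- !is.na(df[[lat_column]]) & !is.na(df[[lng_column]])
  agg <- if (is.function(how)) how else switch(how,
    any = any,
    all = all,
    stop("`how` must be \"any\", \"all\", or a function")
  )
  per_disaster <- tapply(located, df[[key_column]], agg)
  c(n_located = sum(per_disaster), percent = 100 * mean(per_disaster))
}

# Disaster 2 has one located and one unlocated location; disaster 3 has none
d <- data.frame(dis = c(1, 2, 2, 3),
                lat = c(40.7, 10.4, NA, NA),
                lng = c(-74.0, 76.5, NA, NA))
percent_located_sketch(d, "dis", how = "any")  # 2 of 3 disasters
percent_located_sketch(d, "dis", how = "all")  # 1 of 3 disasters
percent_located_sketch(d, "dis", how = function(x) mean(x) >= 0.5)
```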
Reports how successful the geocoding was: how many of the locations in this data frame have non-NA coordinates?
percent_located_locations( ., lat_column = "lat", lng_column = "lng", plot_result = TRUE )
. | A data frame that has been locationized (see |
lat_column | Name of the column containing latitude data |
lng_column | Name of the column containing longitude data |
plot_result | Determines the output type (a plot or a summarized data frame) |
The percentage and number of locations that have been geocoded (see plot_result for the output type).
d <- tibble::tribble(
  ~value, ~location_word, ~Country, ~lat, ~lng,
  "city of new york", "new york", "USA", 40.71427, -74.00597,
  "kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
  "kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847
)
percent_located_locations(d, lat_column = "lat", lng_column = "lng", plot_result = FALSE)
Reads Excel files downloaded from the EMDAT database, linked here.
read_emdat(path_to_file, file_data = TRUE)
path_to_file | A string: the path to the downloaded file |
file_data | Logical: whether to also return information about the file and how it was created |
Returns a list containing one or two tibbles: one with the disaster data and, when file_data = TRUE, one with the file metadata.
## Not run:
read_emdat(path_to_file = "~/dummy", file_data = TRUE)
## End(Not run)
Changes the unit of analysis from a disaster to a disaster-location pair. This is useful preprocessing before geocoding each disaster-location pair.
It can be used in piped operations, making it tidy.
split_locations( ., column_name = "locations", dummy_words = c("cities", "states", "provinces", "districts", "municipalities", "regions", "villages", "city", "state", "province", "district", "municipality", "region", "township", "village", "near", "department"), joiner_regex = ",|\\(|\\)|;|\\+|( and )|( of )" )
. | Data frame of disaster data |
column_name | Name of the column containing the locations |
dummy_words | A vector of words to remove from the final output |
joiner_regex | A regular expression matching the separators used to split the locations |
The same data frame with a location_word column added, as well as a column called uncertain_location_specificity, which marks cases where the same location could be referred to at varying levels of specificity.
locs <- c("city of new york",
          "kerala, chennai municipality, and san francisco",
          "mumbai region, district of seattle, sichuan province")
d <- tibble::as_tibble(locs)
split_locations(d, column_name = "value")
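The splitting step can be pictured as "split on joiner_regex, strip dummy words, keep what is left". The base R sketch below copies the default dummy_words and joiner_regex from the usage line above. Everything else is an assumption: it lowercases the input, drops empty pieces, and does not compute uncertain_location_specificity. The helper names are hypothetical and are not weed's implementation. On the three example strings it yields new york, kerala, chennai, san francisco, mumbai, seattle and sichuan.

```r
# Defaults copied from the split_locations usage line
default_dummy_words <- c("cities", "states", "provinces", "districts",
  "municipalities", "regions", "villages", "city", "state", "province",
  "district", "municipality", "region", "township", "village", "near",
  "department")
default_joiner_regex <- ",|\\(|\\)|;|\\+|( and )|( of )"

# Hypothetical helper (not weed's code): split one location string,
# drop dummy words, and discard pieces that end up empty.
split_location_string <- function(x, dummy_words = default_dummy_words,
                                  joiner_regex = default_joiner_regex) {
  if (is.na(x)) return(NA_character_)
  pieces <- strsplit(tolower(x), joiner_regex)[[1]]
  cleaned <- vapply(pieces, function(p) {
    words <- strsplit(trimws(p), "\\s+")[[1]]
    paste(words[!words %in% dummy_words], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  cleaned[nzchar(cleaned)]
}

# One output row per disaster-location pair (rows with no location left
# after cleaning are dropped in this sketch)
split_locations_sketch <- function(df, column_name) {
  parts <- lapply(df[[column_name]], split_location_string)
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out$location_word <- as.character(unlist(parts))
  rownames(out) <- NULL
  out
}

d <- data.frame(value = c("city of new york",
  "kerala, chennai municipality, and san francisco",
  "mumbai region, district of seattle, sichuan province"),
  stringsAsFactors = FALSE)
split_locations_sketch(d, "value")
```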