Title: | Data and Functions to Accompany the Book "Learning R" |
---|---|
Description: | Crabs in the English Channel, deer skulls, English monarchs, half-caste manga characters, Jamaican cities, Shakespeare's The Tempest, drugged-up cyclists and sexually transmitted diseases.
Authors: | Richie Cotton |
Maintainer: | Richie Cotton <[email protected]> |
License: | Unlimited |
Version: | 0.30 |
Built: | 2024-11-11 02:57:59 UTC |
Source: | https://github.com/richierocks/learningr |
Fastest times for the Alpe d'Huez stage of the Tour de France cycle race, plus some contextual information on year and drug use allegations.
A data frame with the following columns.
Character time of ride in the form M' S".
Numeric time of ride in minutes.
Name of rider.
Year of race.
Nationality of rider.
Have allegations of drug use been made against the rider? In alpe_d_huez the values are "Y" or "N"; in alpe_d_huez2 this is a logical vector.
The data was kindly compiled by William Hogan.
The dataset is not guaranteed to be error free. Please double check the data if you want to use it for something serious.
William Hogan [email protected] compiled the data from http://en.wikipedia.org/wiki/Alpe_d%27Huez. Richard Cotton [email protected] made some modifications while importing it into R.
An old version of the plyr package's count function that fails when you pass it a factor input.
buggy_count(df, vars = NULL, wt_var = NULL)
df |
A data frame or an atomic input. |
vars |
Variables in df to count unique values of. |
wt_var |
Optional variable to weight by. |
A data frame with label and freq columns.
In case the “buggy” part of the name didn't give it away, this is not suitable for real world usage!
## Not run:
buggy_count(factor()) # oops!
## End(Not run)
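For comparison, a minimal base-R sketch of a count that copes with a bare factor input might look like the following. The function name `safe_count` is hypothetical (it is not part of this package or plyr); it handles only a single atomic or factor input, tabulating it directly rather than assuming a data frame.

```r
# Hypothetical sketch: a factor-safe count using only base R.
# Unlike buggy_count, a bare factor is tabulated directly with table(),
# then reshaped into the label/freq data frame described above.
safe_count <- function(x) {
  freq <- table(x)
  data.frame(
    label = names(freq),
    freq  = as.vector(freq),
    stringsAsFactors = FALSE
  )
}

safe_count(factor(c("a", "b", "a")))  # counts without erroring
safe_count(factor())                  # empty input gives a zero-row result
```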
Depth and temperature data from a sensor tag attached to an edible crab (Cancer pagurus) in the North Sea in 2008 and 2009.
A list with 5 elements, as follows.
id_block is a list with 2 elements.
Version of the firmware used in the crab tag.
Build level of the firmware used in the crab tag.
tag_notebook is a list with 5 elements.
Number of days of data.
Date and time that the tag was released into the sea.
UNKNOWN
UNKNOWN
UNKNOWN
lifetime_notebook is a list with 3 elements.
The unique ID of the tag.
UNKNOWN
The number of sensors on the tag.
deployment_notebook is a data frame with X columns.
Start date and time of recording.
Stop date and time of recording.
UNKNOWN
UNKNOWN
UNKNOWN
UNKNOWN
UNKNOWN
UNKNOWN
UNKNOWN
UNKNOWN
UNKNOWN
daylog is a data frame with X columns.
Integer number of days since the start of recording.
Date of record.
Maximum temperature (Celsius) recorded that day.
Minimum temperature (Celsius) recorded that day.
Maximum depth (m) recorded that day.
Minimum depth (m) recorded that day.
Voltage of tag battery.
This data was kindly supplied by Ewan Hunter of CEFAS. It is part of a larger dataset consisting of many crabs.
Ewan Hunter [email protected] ran the project where the data was collected. A full analysis is in Hunter, E, Eaton, D, Stewart, C, Lawler, A & Smith, M. 2013. Edible crabs "go west": migrations and incubation cycle revealed by electronic tags. https://www.ncbi.nlm.nih.gov/pubmed/23734180. Richard Cotton [email protected] made some modifications while importing it into R.
The dataset contains the endocranial volume of 33 red deer (Cervus elaphus), using four different methods of measurement: cardiac tomography, filling the skull with glass beads (yes, the skulls are from dead deer), simply measuring the length, width and height and multiplying the numbers, and using Finarelli's equation. "Endocranial volume" is a proxy for brain size.
A data frame with the following columns.
A unique identifier for each red deer.
Endocranial volume calculated by cardiac tomography.
Endocranial volume calculated by glass beads.
Endocranial volume calculated by length*width*height.
Endocranial volume calculated by Finarelli's equation.
A second measurement via cardiac tomography.
A second measurement via glass beads.
A second measurement via l*w*h.
The data was kindly provided by Corina Logan. Second measurements are provided for several of the deer. Finarelli's equation is used for estimating the brain volume of non-bovid ruminant Artiodactylid species (say that 10 times fast).
The dataset was collected by Corina Logan [email protected]. It is stored in the Dryad Digital Repository, doi:10.5061/dryad.4t7h2. A fuller analysis is given in the paper Logan CJ, Clutton-Brock TH. 2013. Validating methods for estimating endocranial volume in individual red deer (Cervus elaphus). Behavioural Processes 92:143-146. doi:10.1016/j.beproc.2012.10.015. http://www.sciencedirect.com/science/article/pii/S037663571200232X
Richard Cotton [email protected] made some modifications while importing it into R.
Names, dates and houses of English kings and queens from post-Roman rule (the fifth century) until England invaded Ireland in the Early 13th century.
A data frame with the following columns.
Name of monarch(s).
Royal house of monarch(s).
Year they rose to power.
Year they left power.
Region of England ruled over.
This dataset is a bit messy and ambiguous in places, because history is like that. In fact, the messy parts of the dataset are in general a good indicator that something interesting was happening at the time. (See, for example, missing or multiple rulers, starts and ends of reigns in the same year, and rulers that appear several times with different territories.) Even defining a monarch of England is tricky. Most of the monarchs in this dataset were around before England existed (it consisted of seven territories called the heptarchy). The data stops before John I (the bad guy from the Robin Hood stories) because he proclaimed himself King of Ireland, although some people consider monarchs up to Anne, five hundred years later, to be English monarchs even though they ruled over Ireland, Wales and Scotland to varying degrees.
The heptarchy consisted of East Anglia, Essex, Kent, Mercia, Northumbria, Sussex and Wessex. Northumbria was originally divided into Deira and Bernicia. There are also periods of Norse and Danish rule.
The dataset was compiled from Wikipedia and thus is not guaranteed to be error free. Please double check the data if you want to use it for something serious.
Richard Cotton [email protected] compiled the dataset from various Wikipedia pages. http://en.wikipedia.org/wiki/Kings_of_england http://en.wikipedia.org/wiki/Kings_of_East_Anglia http://en.wikipedia.org/wiki/Kings_of_Essex http://en.wikipedia.org/wiki/Kings_of_Kent http://en.wikipedia.org/wiki/Kings_of_Mercia http://en.wikipedia.org/wiki/Kings_of_Northumbria http://en.wikipedia.org/wiki/Kings_of_Sussex http://en.wikipedia.org/wiki/Kings_of_Wessex
Some filenames have been altered in order to comply with portability requirements on CRAN. This function converts the filenames between the CRAN forms and the book forms.
fix_filenames(x = c("tobook", "tocran"), dir = system.file("extdata", package = "learningr"))
x |
Either “tocran” or “tobook”. |
dir |
Directory containing the files. |
A logical vector of length 4, TRUE for each file whose name was changed.
## Not run:
# To convert the files to the book form, use:
fix_filenames("tobook")
# The files were converted to CRAN form using:
fix_filenames("tocran", "learningr/inst/extdata")
## End(Not run)
Rates of gonorrhoea infection in the US by year, with contextual information about age, ethnicity and gender.
A data frame with the following columns.
Year that infected people visited the clinic.
Age group of person infected.
Ethnicity of person infected.
Gender of person infected.
Number of infections per 100000 people.
Compiled by Richard Cotton [email protected] from http://www.cdc.gov/std/stats11/tables/22b.htm
Half-caste manga characters.
Both data frames have the following columns.
Integer year that the manga was made.
Name of series.
Name of character.
Gender of character.
Nationality of character's father.
Nationality of character's mother.
Character's eye colour.
Character's hair colour.
Notes on data collection or ambiguity.
hafu2 has these additional columns.
The dataset was kindly provided by Gwern Branwen. hafu2 is a lightly cleaned-up version of hafu.
Gwern's notes: The following list includes manga, light novel, anime, and video game characters (there being little point in keeping the mediums separate). It also includes characters who are not hafu themselves but a quarter-foreign, inasmuch as they imply a hafu at some point. Characters are treated separately even if they are in the same work (e.g. siblings). Classification is based on in-universe or out-of-universe information, since appearance can be highly misleading in anime (blue eyes may indicate heroic status, rather than being Caucasian; hair color may be chosen for contrast against other characters or signal stereotypes like red hair indicating a fiery personality), and different groups will identify the same anime character as belonging to their own race (Lu 2009), perhaps due to minimalistic drawings intended to save money or enable viewers to project themselves onto a character.
The dataset was compiled by Gwern Branwen [email protected]. The original is available from http://www.gwern.net/hafu#list.
Calculate the (Pythagorean) hypotenuse of two numeric vectors using the obvious algorithm.
hypotenuse(x, y)
x |
A numeric vector. |
y |
A numeric vector. |
A numeric vector of the hypotenuse of the inputs.
This algorithm fails when the inputs are very large or very small, making it unsuitable for real-world use.
Cleve Moler (MATLAB creator and discoverer of the Moler-Morrison algorithm for calculating hypotenuses) discusses the pros and cons of several algorithms here: http://blogs.mathworks.com/cleve/2012/07/30/pythagorean-addition
hypotenuse(5, 12)          # okay
hypotenuse(1e-300, 1e-300) # fails
hypotenuse(1e300, 1e300)   # fails
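The overflow and underflow problems come from squaring the inputs directly. A common fix, sketched below, is to factor out the larger magnitude before squaring; the function name `hypotenuse_scaled` is hypothetical and this is not the package's implementation, just an illustration of the scaling technique.

```r
# Hypothetical sketch: a scaled hypotenuse that avoids overflow/underflow.
# Factoring out the larger magnitude keeps the squared ratio in [0, 1],
# so neither input is squared at its full (possibly extreme) scale.
hypotenuse_scaled <- function(x, y) {
  big   <- pmax(abs(x), abs(y))
  small <- pmin(abs(x), abs(y))
  # Guard against 0/0 when both inputs are zero.
  ifelse(big == 0, 0, big * sqrt(1 + (small / big)^2))
}

hypotenuse_scaled(5, 12)        # 13, as with the naive algorithm
hypotenuse_scaled(1e300, 1e300) # finite: no overflow from squaring 1e300
```

Base R also ships a robust implementation along these lines (see ?hypot in some numerical libraries, or sqrt(x^2 + y^2) rewritten with scaling), which is why the naive version above is kept only as a teaching example.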
learningr contains datasets that are used in examples in the book “Learning R”.
Richard Cotton [email protected]
Cotton, R (2013) Learning R. O'Reilly. ISBN 978-1-4493-5710-8.
State-by-state voting information in the 2008 US presidential election, along with contextual information on income, unemployment, ethnicity and religion.
A data frame with 52 observations (one for each US state) and the following columns.
The name of the US state.
The US Federal region.
Percentage of voters who voted for Barack Obama in the 2008 presidential election.
Percentage of voters who voted for John McCain in the 2008 presidential election.
Percentage of people who voted in the 2008 presidential election.
Percentage of people who are unemployed.
Mean annual income in US dollars.
Number of people living in the state.
Percentage of people identifying as Catholic.
Percentage of people identifying as Protestant.
Percentage of people identifying as religious, but not Catholic or Protestant.
Percentage of people identifying as non-religious.
Percentage of people identifying as black.
Percentage of people identifying as Latino.
Percentage of people living in an urban area.
Religious identification data are not available for Alaska and Hawaii. The totals of these columns are generally less than 100, since some people didn't give an answer. The District of Columbia is included, even though it isn't a state. The dataset is not guaranteed to be error free. Please double check the data if you want to use it for something serious.
This dataset was kindly compiled and provided by Edwin Thoen [email protected].
The voting information came from http://www.uselectionatlas.org/, extracted on 2011-12-09.
The ethnicity, income and urbanisation information came from http://quickfacts.census.gov, extracted on 2011-12-09.
The unemployment information came from http://data.bls.gov/timeseries/LNS14000000, extracted 2011-12-09.
The religious information came from Table 12 of the American Religious Identification Survey 2008. http://commons.trincoll.edu/aris/files/2011/08/ARIS_Report_2008.pdf.