Package 'overviewR' reference manual

Title:	Easily Extracting Information About Your Data
Description:	Makes it easy to display descriptive information on a data set. Getting an easy overview of a data set by displaying and visualizing sample information in different tables (e.g., time and scope conditions). The package also provides publishable 'LaTeX' code to present the sample information.
Authors:	Cosima Meyer [cre, aut], Dennis Hammerschmidt [aut]
Maintainer:	Cosima Meyer <[email protected]>
License:	GPL-3
Version:	0.0.13
Built:	2025-03-10 04:20:22 UTC
Source:	https://github.com/cosimameyer/overviewr

calculate_share_non_row_wise

Description

Function used in 'overview_na' to calculate the column-wise share of NA values

Usage

calculate_share_non_row_wise(dat = NULL)

Arguments

`dat`	Data frame

Value

The function returns a data set that has the information on the column-wise NA share
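As a rough illustration of what a column-wise NA share means, the base R sketch below counts the NA values in each column and divides by the number of rows. The helper na_share_by_column() and its output columns are hypothetical and not part of overviewR; the package's own implementation may differ.

```r
# Hypothetical helper for illustration, NOT overviewR's calculate_share_non_row_wise():
# count NAs per column and express them as a percentage of all rows.
na_share_by_column <- function(dat) {
  na_count <- as.vector(colSums(is.na(dat)))
  data.frame(
    variable = names(dat),
    na_count = na_count,
    percentage = na_count / nrow(dat) * 100,
    stringsAsFactors = FALSE
  )
}

# Made-up example data: column a has 2 of 4 values missing, column b has 1 of 4
example_dat <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"))
na_share_by_column(example_dat)
```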

calculate_share_row_wise

Description

Function used in 'overview_na' to calculate the row-wise share of NA values

Usage

calculate_share_row_wise(dat = NULL)

Arguments

`dat`	Data frame

Value

The function returns a data set that has the information on the row-wise NA share
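A comparable base R sketch for the row-wise case counts the NAs across the columns of each row and divides by the number of columns. Again, na_share_by_row() is a hypothetical helper for illustration, not the package's function.

```r
# Hypothetical helper for illustration, NOT overviewR's calculate_share_row_wise():
# count NAs in each row and express them as a percentage of all columns.
na_share_by_row <- function(dat) {
  na_count <- as.vector(rowSums(is.na(dat)))
  data.frame(
    row = seq_len(nrow(dat)),
    na_count = na_count,
    percentage = na_count / ncol(dat) * 100
  )
}

# Made-up example data with three columns
example_dat <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"), c = 1:3)
na_share_by_row(example_dat)
```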

find_int_runs

Description

Function used in 'overview_tab' to find runs of consecutive integers

Usage

find_int_runs(run = NULL)

Arguments

`run`	Variable (integer) that should be checked for consecutive values

Value

The function returns a data set
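The run-finding idea can be sketched in base R: sort the values, start a new group wherever the gap to the previous value is not 1, and collapse each group into a range. collapse_int_runs() and its "start-end" output format are assumptions for illustration; the package's find_int_runs() may return its result in a different shape.

```r
# Hypothetical re-implementation for illustration, NOT overviewR's find_int_runs().
# Collapses integers into runs of consecutive values,
# e.g. 1990, 1991, 1992, 1995 -> "1990-1992", "1995".
collapse_int_runs <- function(run) {
  run <- sort(unique(run))
  # A new run starts wherever the gap to the previous value is not 1
  run_id <- cumsum(c(1, diff(run)) != 1)
  runs <- split(run, run_id)
  unname(vapply(runs, function(x) {
    if (length(x) == 1) as.character(x) else paste0(x[1], "-", x[length(x)])
  }, character(1)))
}

collapse_int_runs(c(1990, 1991, 1992, 1995, 1997, 1998))
```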

overview_add_na_output

Description

Function used in 'overview_na' to generate a new data frame with na_count and percentage share of NAs for each row

Usage

overview_add_na_output(dat_result = NULL, dat = NULL)

Arguments

`dat_result`	Data.frame from 'overview_na'
`dat`	Data frame

Value

The function returns a data set that has the information on the row-wise NA share

overview_crossplot

Description

This function creates a ggplot that visualizes the sample as a cross table.

Usage

overview_crossplot(
  dat,
  id,
  time,
  cond1,
  cond2,
  threshold1,
  threshold2,
  xaxis = "Condition 1",
  yaxis = "Condition 2",
  label = FALSE,
  color = FALSE,
  dot_size = 2,
  fontsize = 2.5
)

Arguments

`dat`	Your data set
`id`	Your scope (e.g., country codes or individual IDs). If the id variable contains NAs, they will not be included in the plot.
`time`	Your time (e.g., time periods given by years, months, ...)
`cond1`	Variable that describes the first condition
`cond2`	Variable that describes the second condition
`threshold1`	A threshold for `cond1`
`threshold2`	A threshold for `cond2`
`xaxis`	Label of the x axis ("Condition 1" is default)
`yaxis`	Label of the y axis ("Condition 2" is default)
`label`	If TRUE, labels the observations (FALSE is the default). Overlapping labels are avoided by using 'ggrepel'
`color`	Color of the different observation groups
`dot_size`	Optional argument that defines the dot size (default is 2)
`fontsize`	If label is TRUE, the fontsize argument defines the font size of the labels (the default is 2.5)

Value

A ggplot figure that presents the sample information visually in a cross table

Examples

data(toydata)
overview_crossplot(
  dat = toydata,
  cond1 = gdp,
  cond2 = population,
  threshold1 = 25000,
  threshold2 = 27000,
  id = ccode,
  time = year
)

overview_crosstab

Description

Sorts a data set conditionally in a cross table. This can be helpful to get a sense of the time and scope conditions of a data set. Note: if used with a data set that has multiple observations per id-time unit, the function automatically aggregates this information using the mean.

Usage

overview_crosstab(dat, cond1, cond2, threshold1, threshold2, id, time)

Arguments

`dat`	A data set object
`cond1`	Variable that describes the first condition
`cond2`	Variable that describes the second condition
`threshold1`	A threshold for `cond1`
`threshold2`	A threshold for `cond2`
`id`	Scope (e.g., country codes or individual IDs)
`time`	Time (e.g., time periods given by years, months, ...)

Value

A data frame object that contains a summary of the data set that can later be converted to a 'LaTeX' output using overview_latex

Examples

data(toydata)
overview_crosstab(
  dat = toydata,
  cond1 = gdp,
  cond2 = population,
  threshold1 = 25000,
  threshold2 = 27000,
  id = ccode,
  time = year
)
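The logic described for overview_crosstab (mean aggregation per id-time unit, then sorting by two thresholds) can be sketched in base R as follows. crosstab_sketch() is hypothetical, assumes columns named id, time, cond1 and cond2, and assumes a strict "greater than" comparison with each threshold; overview_crosstab itself may treat values equal to a threshold differently and returns its own table layout.

```r
# Hypothetical sketch for illustration, NOT overviewR's overview_crosstab().
# Assumed column names: id, time, cond1, cond2.
crosstab_sketch <- function(dat, threshold1, threshold2) {
  # Aggregate several observations per id-time unit using the mean
  agg <- aggregate(cbind(cond1, cond2) ~ id + time, data = dat, FUN = mean)
  # Assumption: "above the threshold" means strictly greater than it
  above1 <- ifelse(agg$cond1 > threshold1, "cond1 above", "cond1 below")
  above2 <- ifelse(agg$cond2 > threshold2, "cond2 above", "cond2 below")
  agg$cell <- paste(above1, above2, sep = " / ")
  agg
}

# Made-up example: unit A-1990 has two observations that get averaged
example_dat <- data.frame(
  id = c("A", "A", "B", "B"),
  time = c(1990, 1990, 1990, 1991),
  cond1 = c(10, 30, 5, 50),
  cond2 = c(100, 100, 300, 300)
)
res <- crosstab_sketch(example_dat, threshold1 = 15, threshold2 = 200)
# Count how many id-time units fall into each cell of the cross table
table(res$cell)
```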

overview_heat

Description

This function plots a heat map to visualize the coverage of the time-scope-units of the data. Options include total number of cases per time-scope-unit or relative number in percentage.

Usage

overview_heat(
  dat,
  id,
  time,
  perc = FALSE,
  exp_total = NULL,
  xaxis = "Time frame",
  yaxis = "Sample",
  col_low = "#dceaf2",
  col_high = "#2A5773",
  label = TRUE
)

Arguments

`dat`	The data set
`id`	The scope (e.g., country codes or individual IDs). The axis is ordered in ascending order by default.
`time`	The time (e.g., time periods given by years, months, ...)
`perc`	If FALSE (default), the plot shows the total number of observations per time-scope-unit. If TRUE, it shows the number of observations per time-scope-unit as a percentage
`exp_total`	Expected total number of observations (i.e., the maximum) per time unit
`xaxis`	Label of your x axis ("Time frame" is default)
`yaxis`	Label of your y axis ("Sample" is default)
`col_low`	Hex color code for the lowest value (default is "#dceaf2")
`col_high`	Hex color code for the highest value (default is "#2A5773")
`label`	If TRUE (default), the total number of observations/percentages of observations are displayed. If FALSE, it returns no labels.

Value

A ggplot figure that presents sample coverage visually

Examples

data(toydata)
overview_heat(toydata, ccode, year, perc = TRUE, exp_total = 12)
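A minimal base R sketch of the counting behind the heat map, assuming that the percentage shown with perc = TRUE is the number of observations per time-scope-unit divided by exp_total. heat_counts() is a hypothetical helper; overview_heat itself returns a ggplot rather than a data frame.

```r
# Hypothetical sketch for illustration, NOT overviewR's overview_heat().
heat_counts <- function(dat, exp_total = NULL) {
  # Number of observations for every id-time combination (zeros included)
  counts <- as.data.frame(table(id = dat$id, time = dat$time),
                          responseName = "n", stringsAsFactors = FALSE)
  # Assumption: percentage = observed count relative to the expected total
  if (!is.null(exp_total)) {
    counts$percentage <- counts$n / exp_total * 100
  }
  counts
}

# Made-up monthly data: unit A is observed in all 12 months of 1990,
# unit B in only 6 of them
example_dat <- data.frame(id = c(rep("A", 12), rep("B", 6)), time = 1990)
heat_counts(example_dat, exp_total = 12)
```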

overview_latex

Description

Produces 'LaTeX' code for the output obtained via overview_tab and overview_crosstab

Usage

overview_latex(
  obj,
  title = "Time and scope of the sample",
  id = "Sample",
  time = "Time frame",
  crosstab = FALSE,
  cond1 = "Condition 1",
  cond2 = "Condition 2",
  save_out = FALSE,
  file_path,
  label = "tab:tab1",
  fontsize,
  file,
  path
)

Arguments

`obj`	Overview object produced by overview_tab or overview_crosstab
`title`	Caption of the table (default is "Time and scope of the sample")
`id`	The name of the left column (default is "Sample"), will be ignored if crosstab is TRUE
`time`	The name of the right column (default is "Time frame"), will be ignored if `crosstab` is TRUE
`crosstab`	Logical argument, if TRUE produces a `crosstab` output, default is FALSE
`cond1`	Description for the first condition (character), will be ignored if `crosstab` is FALSE. This should correspond to the input for `cond1` in `overview_crosstab`
`cond2`	Description for the second condition (character), will be ignored if `crosstab` is FALSE. This should correspond to the input for `cond2` in `overview_crosstab`
`save_out`	Optional argument, exports the output table as a .tex file, default is FALSE
`file_path`	Specifies the path and file name (.tex) where you store your output
`label`	Specifies the label (default is "tab:tab1")
`fontsize`	Specifies the font size (all 'LaTeX' font sizes such as "scriptsize" or "small" work)
`file`	This argument is deprecated. Please use "file_path" instead and add the full path.
`path`	This argument is deprecated. Please use "file_path" instead and add the full path.

Value

A 'LaTeX' output that can either be copy-pasted into a text document or exported directly as a .tex file

Examples

data(toydata)

overview_object <- overview_tab(dat = toydata, id = ccode, time = year)
overview_latex(
  obj = overview_object,
  title = "Some nice title",
  crosstab = FALSE
)

# Export the table as a .tex file (replace the placeholder path with your own):
overview_latex(
  obj = overview_object,
  title = "Some nice title",
  save_out = TRUE,
  file_path = "some/path_to/your_output_file.tex"
)

overview_ct_object <- overview_crosstab(
  dat = toydata,
  cond1 = gdp,
  cond2 = population,
  threshold1 = 25000,
  threshold2 = 27000,
  id = ccode,
  time = year
)
overview_latex(
  obj = overview_ct_object,
  title = "Some nice title for a cross tab",
  crosstab = TRUE,
  cond1 = "Name of first condition",
  cond2 = "Name of second condition"
)
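To illustrate the kind of output overview_latex produces, the sketch below assembles a simple two-column 'LaTeX' table from a summary data frame in base R. to_latex_sketch() is hypothetical, and its layout (a tabular with two left-aligned columns and hline rules) is an assumption, not the package's exact output; the input values are made up.

```r
# Hypothetical sketch for illustration, NOT overviewR's overview_latex().
# Takes a two-column summary (id, time frame) and returns LaTeX lines.
to_latex_sketch <- function(obj, title = "Time and scope of the sample",
                            id = "Sample", time = "Time frame",
                            label = "tab:tab1") {
  # One table row per summary row; "\\\\" in R is the LaTeX row break "\\"
  rows <- paste0(obj[[1]], " & ", obj[[2]], " \\\\")
  c(
    "\\begin{table}[ht]",
    "\\centering",
    paste0("\\caption{", title, "}"),
    paste0("\\label{", label, "}"),
    "\\begin{tabular}{ll}",
    "\\hline",
    paste0(id, " & ", time, " \\\\"),
    "\\hline",
    rows,
    "\\hline",
    "\\end{tabular}",
    "\\end{table}"
  )
}

# Made-up summary data
obj <- data.frame(ccode = c("AGO", "BEN"), time_frame = c("1990-1999", "1990-1995"))
cat(to_latex_sketch(obj, title = "Some nice title"), sep = "\n")
```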

overview_na

Description

This function creates a ggplot that visualizes the distribution of NAs across all variables in the data set.

Usage

overview_na(
  dat,
  yaxis = "Variables",
  perc = TRUE,
  row_wise = FALSE,
  add = FALSE
)

Arguments

`dat`	Your data set
`yaxis`	Label of your y axis ("Variables" is default)
`perc`	If TRUE (default), the plot shows the share of NAs as a percentage; if FALSE, it shows the absolute number of NAs
`row_wise`	If TRUE (FALSE is default), the plot shows the number of NAs per row
`add`	If TRUE (FALSE is default), the function generates a new data frame with na_count and the percentage share of NAs for each row

Value

Depending on the selection, the function returns a ggplot figure that presents the distribution of NAs in the data set or adds the information on the row-wise NA share

Examples

data(toydata)
overview_na(toydata, perc = FALSE)

overview_overlap

Description

Provides an overview of the overlap of two data sets. Cautionary note: This function is still preliminary and can currently compare only two data sets. We are working on an extension that allows comparing multiple data sets.

Usage

overview_overlap(
  dat1,
  dat2,
  dat1_id,
  dat2_id,
  dat1_name = "Data set 1",
  dat2_name = "Data set 2",
  plot_type = "bar"
)

Arguments

`dat1`	A first data set object
`dat2`	A second data set object
`dat1_id`	Scope (e.g., country codes or individual IDs) of dat1. Both ID variables must be coded exactly the same way to produce an exact match.
`dat2_id`	Scope (e.g., country codes or individual IDs) of dat2. Both ID variables must be coded exactly the same way to produce an exact match.
`dat1_name`	Name of dat1 ("Data set 1" is the default)
`dat2_name`	Name of dat2 ("Data set 2" is the default)
`plot_type`	Type of plot: "bar" (default) or "venn". The "venn" option relies on the ggvenn function.

Value

A ggplot2 object (a bar chart or, with plot_type = "venn", a Venn diagram) that shows the overlap of two data sets.

Examples

## Not run: 
data(toydata)
toydata2 <- toydata[which(toydata$year > 1992), ]
overview_overlap(
  dat1 = toydata, dat2 = toydata2, dat1_id = ccode,
  dat2_id = ccode
)

## End(Not run)
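The core comparison can be sketched with base R set operations, assuming the overlap is assessed on the ID values themselves. overlap_sketch() is a hypothetical helper for illustration; it is not the package's implementation, and the plotting step (bar chart or ggvenn Venn diagram) is left out.

```r
# Hypothetical sketch for illustration, NOT overviewR's overview_overlap().
# Assumption: overlap is measured on the unique ID values of each data set.
overlap_sketch <- function(dat1_id, dat2_id) {
  ids1 <- unique(dat1_id)
  ids2 <- unique(dat2_id)
  list(
    both = intersect(ids1, ids2),      # IDs present in both data sets
    only_dat1 = setdiff(ids1, ids2),   # IDs only in the first data set
    only_dat2 = setdiff(ids2, ids1)    # IDs only in the second data set
  )
}

# Made-up ID vectors
overlap_sketch(c("AGO", "BEN", "FRA", "FRA"), c("BEN", "FRA", "RWA"))
```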

overview_plot

Description

This function creates a ggplot that visualizes the distribution of scope objects across the time frame.

Usage

overview_plot(
  dat,
  id,
  time,
  xaxis = "Time frame",
  yaxis = "Sample",
  asc = TRUE,
  color,
  dot_size = 2
)

Arguments

`dat`	Your data set
`id`	Your scope (e.g., country codes or individual IDs). If the id variable contains NAs, they will not be included in the plot.
`time`	Your time (e.g., time periods given by years, months, ...)
`xaxis`	Label of the x axis ("Time frame" is default)
`yaxis`	Label of the y axis ("Sample" is default)
`asc`	If TRUE (default), sorts the y axis in ascending order
`color`	Optional argument that defines the color
`dot_size`	Optional argument that defines the dot size (default is 2)

Value

A ggplot figure that presents the sample information visually

Examples

data(toydata)
overview_plot(dat = toydata, id = ccode, time = year)

overview_plot_absolute

Description

Function used in 'overview_na' to plot the absolute number of NA values

Usage

overview_plot_absolute(
  dat_result = NULL,
  theme_plot = NULL,
  yaxis = NULL,
  xaxis = NULL
)

Arguments

`dat_result`	Data frame
`theme_plot`	Theme for the plot (pre-defined)
`yaxis`	Name of the y axis
`xaxis`	Name of the x axis

Value

The function returns a ggplot

overview_plot_percentage

Description

Function used in 'overview_na' to plot the percentage share of NA values

Usage

overview_plot_percentage(
  dat_result = NULL,
  theme_plot = NULL,
  yaxis = NULL,
  xaxis = NULL
)

Arguments

`dat_result`	Data frame
`theme_plot`	Theme for the plot (pre-defined)
`yaxis`	Name of the y axis
`xaxis`	Name of the x axis

Value

The function returns a ggplot

overview_tab

Description

Provides an overview table for the time and scope conditions of a data set. If a data.table object is provided, the function uses data.table's syntax to perform the evaluation.

Usage

overview_tab(
  dat,
  id,
  time = list(year = NULL, month = NULL, day = NULL),
  complex_date = FALSE
)

Arguments

`dat`	A data frame or data table object
`id`	Scope (e.g., country codes or individual IDs)
`time`	Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing one time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').
`complex_date`	Logical argument indicating whether there is a more complex (list-wise) date-time parameter (FALSE is the default)

Value

A data frame object that contains a summary of a sample that can later be converted to a 'LaTeX' output using overview_latex

Examples

# With option 1 (and also 2):

data(toydata)
output_table <- overview_tab(dat = toydata, id = ccode, time = year)

# With option 3:
overview_tab(dat = toydata, id = ccode, time = list(
  year = toydata$year,
  month = toydata$month, day = toydata$day
), complex_date = TRUE)

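A base R sketch of the kind of summary overview_tab produces: one row per id, with consecutive years collapsed into ranges. overview_tab_sketch(), its column names, and the comma-separated output are assumptions for illustration and do not reproduce the package's data.frame or data.table code paths; the input data are made up.

```r
# Hypothetical sketch for illustration, NOT overviewR's overview_tab().
# id and time are passed as column names (strings) in this sketch.
overview_tab_sketch <- function(dat, id, time) {
  # Collapse sorted time values into "start-end" ranges of consecutive values
  collapse_runs <- function(x) {
    x <- sort(unique(x))
    groups <- split(x, cumsum(c(1, diff(x)) != 1))
    ranges <- vapply(groups, function(g) {
      if (length(g) == 1) as.character(g) else paste0(g[1], "-", g[length(g)])
    }, character(1))
    paste(ranges, collapse = ", ")
  }
  ids <- sort(unique(dat[[id]]))
  data.frame(
    id = ids,
    time_frame = vapply(ids, function(i) collapse_runs(dat[[time]][dat[[id]] == i]),
                        character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Made-up data: AGO has a gap in 1992
example_dat <- data.frame(
  ccode = c("AGO", "AGO", "AGO", "BEN", "BEN"),
  year = c(1990, 1991, 1993, 1995, 1996)
)
overview_tab_sketch(example_dat, id = "ccode", time = "year")
```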

overview_tab_df

Description

Internal function that calculates the 'overview_tab' for data.frame objects

Usage

overview_tab_df(dat2 = NULL, dat = NULL, id = NULL, time = NULL)

Arguments

`dat2`	Your data set
`dat`	Your data set
`id`	Scope (e.g., country codes or individual IDs)
`time`	Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing one time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').

Value

A data.frame

overview_tab_dt

Description

Internal function that calculates the 'overview_tab' for data.table objects

Usage

overview_tab_dt(dat = NULL, id = NULL, time = NULL, col_names = NULL)

Arguments

`dat`	Your data set
`id`	Scope (e.g., country codes or individual IDs)
`time`	Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing one time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)').
`col_names`	The column names (containing id and time)

Value

A data.table

theme_heat_plot

Description

Defines the theme for the 'overview_heat' plot function

Usage

theme_heat_plot()

Value

A theme for the 'overview_heat' plot

theme_na_plot

Description

Defines the theme for the 'overview_na' plot function

Usage

theme_na_plot()

Value

A theme for the 'overview_na' plot

Cross-sectional data for countries

Description

Small, artificially generated toy data set that comes in a cross-sectional format where the unit of analysis is either country-year or country-year-month. It provides artificial information for five countries (Angola, Benin, France, Rwanda, and the UK) for a time span from 1990 to 1999 to illustrate the use of the package.

Usage

data(toydata)

Format

An object of class "data.frame"

ccode: ISO3 country code (as character) for the countries in the sample (Angola, Benin, France, Rwanda, and UK)
year: A value between 1990 and 1999
month: An abbreviation (MMM) for month (character)
gdp: A fake value for GDP (randomly generated)
population: A fake value for population (randomly generated)

References

This data set was artificially created for the overviewR package.

Examples


data(toydata)
head(toydata)
