Title: | Allocate Samples Among Strata |
---|---|
Description: | Functions for the design process of survey sampling, with specific tools for multi-wave and multi-phase designs. Perform optimum allocation using Neyman (1934) <doi:10.2307/2342192> or Wright (2012) <doi:10.1080/00031305.2012.733679> allocation, split strata based on quantiles or values of known variables, randomly select samples from strata, allocate sampling waves iteratively, and organize a complex survey design. Also includes a Shiny application for observing the effects of different strata splits. |
Authors: | Jasper Yang [aut, cre], Pamela Shaw [aut], Bryan Shepherd [ctb], Thomas Lumley [ctb], Gustavo Amorim [rev] |
Maintainer: | Jasper Yang <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1 |
Built: | 2025-02-23 21:24:23 UTC |
Source: | https://github.com/yangjasp/optimall |
Determines the adaptive optimum sampling allocation for a new sampling wave based on results from previous waves. Using Neyman or Wright (2014) allocation, allocate_wave calculates the optimum allocation for the total number of samples across waves, determines how many were allocated to each stratum in previous waves, and allocates the remaining samples to make up the difference.
allocate_wave(
  data,
  strata,
  y,
  already_sampled,
  nsample,
  allocation_method = c("WrightII", "WrightI", "Neyman"),
  method = c("iterative", "simple"),
  detailed = FALSE
)
data |
A data frame or matrix with one row for each
sampling unit, one column specifying each unit's stratum,
one column holding the value of the continuous variable for
which the variance should be minimized, and one column
containing a binary indicator, |
strata |
A character string or vector of character strings specifying the name of columns that indicate the stratum that each unit belongs to. |
y |
A character string specifying the name of the continuous variable for which the variance should be minimized. |
already_sampled |
A character string specifying the name of a
column that contains a binary ( |
nsample |
The desired sample size of the next wave. |
allocation_method |
A character string specifying the method of
optimum sample allocation to use. For details see
|
method |
A character string specifying the method to be used if at least one group was oversampled. Must be one of:
|
detailed |
A logical value indicating whether the output
dataframe should include details about each stratum including
the true optimum allocation without the constraint of
previous waves of sampling
and stratum standard deviations. Defaults to FALSE, unless called within
|
If the optimum sample size in a stratum is smaller than the amount it was allocated in previous waves, that stratum has been oversampled. When oversampling occurs, allocate_wave "closes" the oversampled strata and re-allocates the remaining samples optimally among the open strata. Under these circumstances, the total sampling allocation is no longer optimal, but optimall will output the best allocation still achievable for the next wave.
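The close-and-reallocate idea can be illustrated with a minimal base R sketch. This is not the package's implementation: it assumes continuous (unrounded) Neyman allocation rather than Wright's exact integer methods, and the helper names neyman_alloc and next_wave_sketch, along with the toy stratum sizes and standard deviations, are hypothetical.

```r
# Hypothetical helper: continuous (unrounded) Neyman allocation of n samples,
# proportional to N_h * sd_h.
neyman_alloc <- function(N_h, sd_h, n) {
  w <- N_h * sd_h
  n * w / sum(w)
}

# Hypothetical sketch of one adaptive wave: compute the optimum for the
# cumulative sample, subtract what was already sampled, and close any
# oversampled strata before re-allocating among the strata still open.
next_wave_sketch <- function(N_h, sd_h, n_prior, nsample) {
  open <- rep(TRUE, length(N_h))
  n_next <- numeric(length(N_h))
  repeat {
    n_total <- sum(n_prior[open]) + nsample
    target <- neyman_alloc(N_h[open], sd_h[open], n_total)
    shortfall <- target - n_prior[open]
    if (all(shortfall >= 0)) {
      n_next[open] <- shortfall
      break
    }
    # Close strata whose prior sample already exceeds their optimum
    open[which(open)[shortfall < 0]] <- FALSE
  }
  n_next
}

res <- next_wave_sketch(
  N_h = c(100, 100, 100), sd_h = c(1, 2, 3),
  n_prior = c(20, 5, 5), nsample = 30
)
res
```

In this toy case the first stratum already holds 20 units against an optimum of 10, so it is closed and receives nothing, and the 30 new samples are divided between the two open strata.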
Returns a dataframe with one row for each stratum and columns specifying the stratum name ("strata"), population stratum size ("npop"), cumulative sample in that stratum ("nsample_actual"), prior number sampled in that stratum ("nsample_prior"), and the optimally allocated number of units in each stratum for the next wave ("n_to_sample").
McIsaac MA, Cook RJ. Adaptive sampling in two-phase designs: a biomarker study for progression in arthritis. Statistics in medicine. 2015 Sep 20;34(21):2899-912.
Reilly, M., & Pepe, M. S. (1995). A mean score method for missing and auxiliary covariate data in regression models. Biometrika, 82(2), 299-314.
Wright, T. (2014). A Simple Method of Exact Optimal Sample Allocation under Stratification with any Mixed Constraint Patterns, Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Bureau of the Census, Washington, D.C.
# Create dataframe with a column specifying strata, a variable of interest
# and an indicator for whether each unit was already sampled
set.seed(234)
mydata <- data.frame(
  Strata = c(rep(1, times = 20), rep(2, times = 20), rep(3, times = 20)),
  Var = c(rnorm(20, 1, 0.5), rnorm(20, 1, 0.9), rnorm(20, 1.5, 0.9)),
  AlreadySampled = rep(c(rep(1, times = 5), rep(0, times = 15)), times = 3)
)

x <- allocate_wave(
  data = mydata, strata = "Strata",
  y = "Var", already_sampled = "AlreadySampled",
  nsample = 20, method = "simple"
)
Given a specified phase and wave of an object of class multiwave, apply_multiwave applies one of four optimall functions and returns an updated multiwave object with the output of the applied function in its specified slot.
apply_multiwave(x, phase, wave, fun, ...)

## S4 method for signature 'Multiwave'
apply_multiwave(x, phase, wave, fun, ...)
x |
An Object of class |
phase |
A numeric or character value specifying the phase of
|
wave |
A numeric or character value specifying the wave of |
fun |
A character value specifying the name of the
See documentation of these functions for more details on the specific uses and arguments. |
... |
Optional arguments to be given to |
The inputted multiwave object with one slot updated to include the output of the specified function.
Note that the phase and wave arguments specify where the function output should be placed. apply_multiwave will determine where to get the input dataframes from (returning an error if those slots are empty or invalid) given the specified wave for the output. For example, if phase = 2, wave = 2, fun = "allocate_wave", the data to determine the optimum allocation will be taken from the previous wave (phase 2, wave 1) and the output multiwave object will have an updated "design" slot of phase 2, wave 2.
library(datasets)

MySurvey <- multiwave(phases = 2, waves = c(1, 3))
set_mw(MySurvey, phase = 1, slot = "data") <-
  dplyr::select(datasets::iris, -Sepal.Width)

# Get Design by applying optimum_allocation
MySurvey <- apply_multiwave(MySurvey,
  phase = 2, wave = 1,
  fun = "optimum_allocation", strata = "Species",
  y = "Sepal.Length", nsample = 15, method = "WrightII"
)

# or, we can establish function args in the metadata
set_mw(MySurvey, phase = 2, slot = "metadata") <- list(
  strata = "Species",
  nsample = 15,
  y = "Sepal.Length",
  method = "WrightII"
)

# which allows the function to be run without specifying the args
MySurvey <- apply_multiwave(MySurvey,
  phase = 2, wave = 1,
  fun = "optimum_allocation"
)
get_mw is the accessor function for objects of class Multiwave. It is used to get values from multiwave (mw) objects.
get_mw(
  x,
  phase = 1,
  wave = NA,
  slot = c("data", "design", "metadata", "samples", "sampled_data")
)

get_data(
  x,
  phase = 1,
  wave = NA,
  slot = c("data", "design", "metadata", "samples", "sampled_data")
)

get_data(
  x,
  phase = 1,
  wave = NA,
  slot = c("data", "design", "metadata", "samples", "sampled_data")
) <- value
x |
an object of class |
phase |
a numeric value specifying the phase that should be accessed.
To access the overall metadata, set |
wave |
a numeric value specifying the wave that should be accessed.
To access phase metadata, set |
slot |
a character value specifying the name of the slot to be
accessed. Must be one of |
value |
value to assign to specified slot |
If accessing a multiwave object slot, returns the specified slot.
get_mw(): access slot of multiwave object
get_data(): access slot of multiwave object
get_data(x, phase = 1, wave = NA, slot = c("data", "design", "metadata", "samples", "sampled_data")) <- value: assign value to slot of a multiwave object
# Initiate multiwave object
MySurvey <- multiwave(phases = 2, waves = c(1, 3))

# To access overall metadata
get_mw(MySurvey, phase = NA, slot = "metadata")

# To write overall metadata
set_mw(MySurvey, phase = NA, slot = "metadata") <- list(
  title = "Maternal Weight Survey"
)

# To access Phase 2 metadata
get_mw(MySurvey, phase = 2, slot = "metadata")

# To access Phase 2, Wave 2 design
get_mw(MySurvey, phase = 2, wave = 2, slot = "design")
This SIMULATED dataset contains data on demographic characteristics and clinical data related to childhood obesity for 10335 mother-child pairs. It is used to generate the workflow in the main package vignette. It is based on a study that used multi-wave adaptive sampling to validate electronic health records that target factors related to childhood obesity (see https://www.pcori.org/research-results/2017/developin-methods-estimate-and-address-errors-studies-using-electronic-health).
MatWgt_Sim:
a data frame with 10335 rows and 6 columns
id
unique ID for each mother-child pair
mat_weight_true
true (but unknown in phase 1) mother weight change during pregnancy
mat_weight_est
estimated mother weight change during pregnancy based on error-prone phase-1 measurement
race
specifies mother's race
diabetes
binary indicator for diabetes in the mother
obesity
binary indicator for childhood obesity in child
See package vignettes for more details.
In an object of class "Multiwave", merge_samples creates a dataframe in the "data" slot of the specified wave by merging the dataframe in the "sampled_data" slot with the dataframe in the "data" slot of the previous wave.
merge_samples(
  x,
  phase,
  wave,
  id = NULL,
  phase_sample_ind = "sampled_phase",
  wave_sample_ind = "sampled_wave",
  include_probs = NULL
)
x |
an object of class |
phase |
A numeric value specifying the phase of the Multiwave object that the specified wave is in. Cannot be phase 1. |
wave |
A numeric value specifying the wave of the Multiwave
object that the merge should be
performed in. This wave must have a valid dataframe in the
|
id |
A character value specifying the name of the column holding unit
ids. Taken from wave, phase, or overall metadata (searched for in that
order) if |
phase_sample_ind |
a character value specifying the name of the column that should hold the indicator of whether each unit has already been sampled in the current phase. The specified phase number will be appended to the end of the given character name. Defaults to "sampled_phase". |
wave_sample_ind |
a character value specifying the name of the column that should hold the indicator of whether each unit has already been sampled in the current wave. The specified phase and wave numbers separated by "." will be appended to the end of the given character name. If FALSE, no such column is created. Defaults to "sampled_wave". |
include_probs |
A logical value. If TRUE, looks for "probs" in
the |
Columns in "sampled_data" that do not match names of the "data" from the previous wave will be added as new columns in the output dataframe. All ids that do not appear in "sampled_data" will receive NA values for these new variables.
If a column name in the "sampled_data" matches a column name in the "data" slot of the previous wave, these columns will be merged into one column with the same name in the output dataframe. For ids that have non-missing values in both columns of the merge, the value from "sampled_data" will overwrite the previous value and a warning will be printed. All ids present in the "data" from the previous wave but missing from "sampled_data" will be given NA values for the newly merged variables.
If columns with the name produced by phase_sample_ind or wave_sample_ind already exist, they will be overwritten.
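The two merge rules can be illustrated with a small base R sketch. It only mimics the behaviour described above and is not the package's code: the toy data frames prev and sampled, their columns, and the ".new" suffix are all hypothetical.

```r
# Hypothetical toy data: "x" exists in both data frames, "z" only in sampled
prev <- data.frame(id = 1:5, x = c(10, 20, NA, 40, 50))
sampled <- data.frame(id = c(2, 3), x = c(21, 31), z = c("a", "b"))

# Left join on id; the clashing column from sampled gets a ".new" suffix
merged <- merge(prev, sampled, by = "id", all.x = TRUE,
                suffixes = c("", ".new"))

# Where the sampled value is non-missing it overwrites the previous value
merged$x <- ifelse(is.na(merged$x.new), merged$x, merged$x.new)
merged$x.new <- NULL

merged
```

Here the new column z is NA for every id that was not sampled, while x keeps its previous value for unsampled ids and takes the sampled value for ids 2 and 3.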
A Multiwave object with the merged dataframe in the "data" slot of the specified wave.
library(datasets)
iris <- data.frame(iris, id = 1:150)

MySurvey <- multiwave(phases = 2, waves = c(1, 3))
set_mw(MySurvey, phase = 1, slot = "data") <-
  data.frame(dplyr::select(iris, -Sepal.Width))
set_mw(MySurvey, phase = 2, wave = 1, slot = "sampled_data") <-
  dplyr::select(iris, id, Sepal.Width)[1:40, ]
set_mw(MySurvey, phase = 2, wave = 1, slot = "samples") <- list(ids = 1:40)

MySurvey <- merge_samples(MySurvey, phase = 2, wave = 1, id = "id")
Merges multiple pre-defined sampling strata into a single stratum.
merge_strata(data, strata, merge, name = NULL)
data |
a dataframe or matrix with one row for each sampling
unit, one column, |
strata |
a character string specifying the name of the column that defines each unit's current strata. |
merge |
the names of the strata to be merged, exactly as
they appear in |
name |
a character name for the new stratum. Defaults to NULL, which pastes the old strata names together to create the new stratum name. |
Returns the input dataframe with a new column named 'new_strata' that holds the name of the stratum that each sample belongs to after the merge. The column containing the previous strata names is retained and given the name 'old_strata'.
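As a rough illustration of the relabelling this describes, the following base R sketch builds the two output columns by hand. It is an assumption-laden stand-in rather than the function's actual code, and the object name iris_merged is hypothetical.

```r
# Sketch: relabel two strata as one, keeping the original labels alongside
iris_merged <- iris
iris_merged$old_strata <- as.character(iris_merged$Species)
iris_merged$new_strata <- ifelse(
  iris_merged$old_strata %in% c("virginica", "versicolor"),
  "v_species",
  iris_merged$old_strata
)

table(iris_merged$new_strata)
```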
x <- merge_strata(iris,
  strata = "Species",
  merge = c("virginica", "versicolor"),
  name = "v_species"
)
multiwave() creates an object of class Multiwave with the specified number of phases and waves. All contents will be NULL upon initialization, but the object contains a framework for contents to be added to during the survey design and sample collection process. Currently, multiwave objects may only have one wave in Phase 1.
multiwave(phases, waves, metadata = list(), phase1 = data.frame())

new_multiwave(phases, waves, metadata = list(), phase1 = data.frame())
phases |
A numeric value specifying the number of phases in the survey design. |
waves |
A vector of numeric values specifying the number of waves in
each phase of the survey design. Length must match the number of
|
metadata |
A list containing the survey metadata. Defaults to an empty list. |
phase1 |
A dataframe containing the phase 1 data of the survey. Defaults to an empty dataframe. |
Returns an object of class Multiwave that stores all relevant data from the survey design in an organized and easy-to-access manner. See package vignettes or class documentation for more information.
# Initialize a multiwave object for a two-phase sampling design that will
# sample over three waves in the second phase
multiwave_object <- multiwave(phases = 2, waves = c(1, 3))

# If we already have the phase 1 data and want to add a title to the survey
# metadata, we can initialize the object with these included.
library(datasets)
multiwave_object <- multiwave(
  phases = 2, waves = c(1, 3),
  metadata = list(title = "my two-phase survey"),
  phase1 = iris
)
Takes a multiwave object as input and plots a diagram of its structure in the plotting window using grViz() from the DiagrammeR package. Red boxes indicate slots that have not yet been filled; blue boxes indicate slots that are filled.
multiwave_diagram(x, height = NULL, width = NULL)
x |
An object of class |
height |
The height in pixels of the diagram. Defaults to |
width |
The width in pixels of the diagram. Defaults to |
Returns an object of class htmlwidget displaying the structure of x.
MySurvey <- multiwave(phases = 2, waves = c(1, 3))
multiwave_diagram(MySurvey)
optimall defines three S4 classes for organizing the multi-wave sampling workflow: Wave, Phase, and Multiwave. An object of class Multiwave holds metadata and a list of objects of class Phase, which in turn holds metadata and a list of objects of class Wave. These three object classes are used together to organize the workflow of multi-wave sampling designs.
metadata
A list of elements that describe the entire survey. The list is empty upon initialization of the multiwave object, but the user may add anything to it as they see fit. It may include a "title".
phases
A list of objects of class Phase (see other class documentation).
Launches an R Shiny application locally. This app can be used to interactively split strata and determine how the results affect optimum allocation of a fixed number of samples. It accepts .csv and .rds files as well as .rda files that contain a single dataset. See vignette titled "Splitting Strata with Optimall Shiny" for more information.
optimall_shiny(...)
... |
Optional arguments to pass to |
Launches an R Shiny application locally.
Determines the optimum sampling fraction and sample size for each stratum in a stratified random sample, which minimizes the variance of the sample mean according to Neyman Allocation or Exact Optimum Sample Allocation (Wright 2014).
optimum_allocation(
  data,
  strata,
  y = NULL,
  sd_h = NULL,
  N_h = NULL,
  nsample = NULL,
  ndigits = 2,
  method = c("WrightII", "WrightI", "Neyman"),
  allow.na = FALSE
)
data |
A data frame or matrix with at least one column specifying
each unit's stratum, and either 1) a second column holding the value of the
continuous variable for which the sample mean variance should be minimized
( |
strata |
a character string or vector of character strings specifying the name(s) of columns which specify the stratum that each unit belongs to. If multiple column names are provided, each unique combination of values in these columns is taken to define one stratum. |
y |
a character string specifying the name of the
continuous variable for which the variance should be minimized.
Defaults to |
sd_h |
a character string specifying the name of the
column holding the within-stratum standard deviations for each stratum.
Defaults to |
N_h |
a character string specifying the name of the
column holding the population stratum sizes for each stratum.
Defaults to |
nsample |
the desired total sample size. Defaults to |
ndigits |
a numeric value specifying the number of digits to which the standard deviation and stratum fraction should be rounded. Defaults to 2. |
method |
a character string specifying the method of optimum sample allocation to use. Must be one of:
|
allow.na |
logical input specifying whether y should
be allowed to have NA values. Defaults to |
Returns a data frame with the number of samples allocated to each stratum, or just the sampling fractions if nsample is NULL.
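For intuition, the textbook Neyman rule allocates samples in proportion to the stratum size times the within-stratum standard deviation. The base R sketch below computes those sampling fractions for iris by hand; it is an illustration under that assumption, not the package's code, and it leaves the allocation unrounded (exact integer allocation is what the Wright methods address). The names sd_h, N_h, frac, and n_h are chosen here for illustration.

```r
# Within-stratum standard deviations and stratum sizes for Sepal.Length
sd_h <- tapply(iris$Sepal.Length, iris$Species, sd)
N_h <- tapply(iris$Sepal.Length, iris$Species, length)

# Neyman sampling fractions: proportional to N_h * sd_h
w <- N_h * sd_h
frac <- w / sum(w)

# Unrounded allocation of a total sample of 40
n_h <- 40 * frac
round(frac, 2)
```

Rounding n_h to whole units does not in general preserve the total, which is one reason to prefer an exact integer method in practice.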
Wright, T. (2014). A Simple Method of Exact Optimal Sample Allocation under Stratification with any Mixed Constraint Patterns, Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Bureau of the Census, Washington, D.C.
optimum_allocation(
  data = iris, strata = "Species", y = "Sepal.Length",
  nsample = 40, method = "WrightII"
)

# Or if input data is summary of strata sd and N:
iris_summary <- data.frame(
  strata = unique(iris$Species),
  size = c(50, 50, 50),
  sd = c(0.3791, 0.3138, 0.3225)
)

optimum_allocation(
  data = iris_summary, strata = "strata",
  sd_h = "sd", N_h = "size",
  nsample = 40, method = "WrightII"
)
optimall defines three S4 classes for organizing the multi-wave sampling workflow: Wave, Phase, and Multiwave. An object of class Multiwave holds metadata and a list of objects of class Phase, which in turn holds metadata and a list of objects of class Wave. These three object classes are used together to organize the workflow of multi-wave sampling designs.
metadata
A list containing the phase metadata
waves
A list of objects of class Wave, each element representing one wave of the phase
Requires two dataframes or matrices: data, with a column strata that specifies stratum membership for each unit in the population, and a second dataframe design_data with one row per stratum, a column design_strata that indicates the unique levels of strata in data, and a column n_allocated that specifies the number to be sampled from each stratum. sample_strata selects the units to sample by drawing a random sample of the desired size within each stratum. The second dataframe can be the output of allocate_wave() or optimum_allocation().
sample_strata(
  data,
  strata,
  id,
  already_sampled = NULL,
  design_data,
  design_strata = "strata",
  n_allocated = "n_to_sample",
  probs = NULL,
  wave = NULL,
  warn_prob_overwrite = TRUE
)
data |
A data frame or matrix with one row for each sampling unit in the population, one column specifying each unit's stratum, and one column with a unique identifier for each unit. |
strata |
a character string specifying the name of column
in |
id |
a character string specifying the name of the column
in |
already_sampled |
a character string specifying the name of the
column in |
design_data |
a dataframe or matrix with one row for each stratum that subdivides the population, one column specifying the stratum name, and one column indicating the number of samples allocated to each stratum. |
design_strata |
a character string specifying the name of the
column in |
n_allocated |
a character string specifying the name of the
column in |
probs |
a character string specifying the name of the column
in |
wave |
A numeric value or character string indicating the
sampling wave. If specified, the input is appended to
"sample_indicator" in the new sample indicator column name
(as long as such column names do not already exist in |
warn_prob_overwrite |
Logical indicator for whether a warning should
be printed if |
Returns data as a dataframe with a new column named "sample_indicator" containing a binary (1/0) indicator of whether each unit should be sampled. If the wave argument is specified, then the given input is appended to the name "sample_indicator". If the probs argument is specified, then the dataframe will also contain a new column named "sampling_prob" holding the sampling probabilities for each sampled element.
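The core selection step, a simple random sample of the allocated size within each stratum, can be sketched in base R as follows. This is an illustrative stand-in rather than the function's implementation: it ignores already-sampled units and sampling probabilities, and the object names iris_id, design_sketch, and picked are hypothetical.

```r
set.seed(1)

# Population with an id column, plus a design with one row per stratum
iris_id <- data.frame(iris, id = seq_len(nrow(iris)))
design_sketch <- data.frame(
  strata = c("setosa", "versicolor", "virginica"),
  n_to_sample = c(5, 5, 5)
)

# Draw the allocated number of ids at random within each stratum
picked <- unlist(lapply(seq_len(nrow(design_sketch)), function(i) {
  ids <- iris_id$id[iris_id$Species == design_sketch$strata[i]]
  sample(ids, size = design_sketch$n_to_sample[i])
}))

# Binary (1/0) indicator of whether each unit was selected
iris_id$sample_indicator <- as.integer(iris_id$id %in% picked)
table(iris_id$Species, iris_id$sample_indicator)
```

Note that sample() treats a single number as the range 1 to that number, so a stratum with only one eligible id would need special handling in a fuller version.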
# Define a design dataframe
design <- data.frame(
  strata = c("setosa", "virginica", "versicolor"),
  npop = c(50, 50, 50),
  n_to_sample = c(5, 5, 5)
)

# Make sure there is an id column
iris$id <- 1:nrow(iris)

# Run
sample_strata(
  data = iris, strata = "Species", id = "id",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample"
)

# To include probs as a formula
sample_strata(
  data = iris, strata = "Species", id = "id",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample", probs = ~n_to_sample/npop
)

# If some units had already been sampled
iris$already_sampled <- rbinom(nrow(iris), 1, 0.25)

sample_strata(
  data = iris, strata = "Species", id = "id",
  already_sampled = "already_sampled",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample"
)
set_mw is used to assign values to (write to) slots of Multiwave class objects, that is, to set values of multiwave (mw) objects.
set_mw(
  x,
  phase = 1,
  wave = NA,
  slot = c("data", "design", "metadata", "samples", "sampled_data")
) <- value
x |
an object of class |
phase |
a numeric value specifying the phase that should be accessed.
To access the overall metadata, set |
wave |
a numeric value specifying the wave that should be accessed.
To access phase metadata, set |
slot |
a character value specifying the name of the slot to be
accessed. Must be one of |
value |
value to assign to specified slot |
# Initiate multiwave object
MySurvey <- multiwave(phases = 2, waves = c(1, 3))

# To write overall metadata
set_mw(MySurvey, phase = NA, slot = "metadata") <-
  list(title = "Maternal Weight Survey")

# To write Phase 2 metadata
set_mw(MySurvey, phase = 2, slot = "metadata") <-
  list(strata = "mystrata", id = "id")
Server logic for Interactive Shiny for Optimall.
shiny_server(input, output, session)
input |
input for Shiny server. |
output |
output for Shiny server. |
session |
session for Shiny server. |
Defines server logic for Shiny app that can be loaded with optimall_shiny().
UI for Shiny App for Splitting Strata with Optimum Allocation
shiny_ui()
Creates the UI for the Shiny app that is loaded with optimall_shiny.
Splits pre-defined sampling strata based on values of a continuous or categorical variable.
split_strata(
  data,
  strata,
  split = NULL,
  split_var,
  type = "global quantile",
  split_at = 0.5,
  trunc = NULL
)
data |
a dataframe or matrix with one row for each sampling unit, one column specifying each unit's current stratum, one column containing the continuous or categorical values that will define the split, and any other relevant columns. |
strata |
a character string specifying the name of the column that defines each unit's current strata. |
split |
the name of the stratum or strata to be split,
exactly as they appear in |
split_var |
a character string specifying the name of the column that should be used to define the strata splits. |
type |
a character string specifying how the function
should interpret the
|
split_at |
the percentile, value, or name(s) which
|
trunc |
A numeric or character value specifying how the
name of the |
For splits on continuous variables, the new strata are defined on left-open intervals. The only exception is the first interval, which must include the overall minimum value. The names of the newly created strata for a split generated from a continuous value are the split_var column name with the range of values defining that stratum appended to the old strata name. For a categorical split, the new strata names are the split_var column name appended to the 1/0 logical flag specifying whether the unit is in split_at, all appended to the old strata name. If the split_var column name is long, the user can specify a value for trunc to prevent the new strata names from being inconveniently long.
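The interval convention for continuous splits matches what base R's cut() produces with its defaults plus include.lowest = TRUE. The sketch below shows this on the setosa sepal widths; it illustrates the convention only and is not the function's code, and the names x, breaks, and bins are chosen for illustration.

```r
# Split the setosa sepal widths at their median
x <- iris$Sepal.Width[iris$Species == "setosa"]
breaks <- c(min(x), unname(quantile(x, 0.5)), max(x))

# right = TRUE (the default) gives left-open intervals (a, b];
# include.lowest = TRUE closes the first interval so the minimum is kept
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

table(bins)
```

Without include.lowest = TRUE, the unit holding the overall minimum would fall outside every interval and be assigned NA.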
Returns the input dataframe with a new column named 'new_strata' that holds the name of the stratum that each sample belongs to after the split. The column containing the previous strata names is retained and given the name "old_strata".
x <- split_strata(iris, "Sepal.Length",
  strata = c("Species"),
  split = "setosa", split_var = "Sepal.Width",
  split_at = c(0.5), type = "global quantile"
)

# You can split at more than one quantile in one call.
# The call below splits the "setosa" stratum into three of equal size
x <- split_strata(iris, "Sepal.Length",
  strata = c("Species"),
  split = "setosa", split_var = "Sepal.Width",
  split_at = c(0.33, 0.66), type = "local quantile"
)

# Manually select split values with type = "value"
x <- split_strata(iris, "Sepal.Length",
  strata = "Species",
  split = "setosa", split_var = "Sepal.Width",
  split_at = c(3.1, 3.8), type = "value"
)

# Perform a categorical split.
iris$strata <- rep(c(rep(1, times = 25), rep(0, times = 25)), times = 3)
x <- split_strata(iris, "Sepal.Length",
  strata = "strata",
  split = NULL, split_var = "Species",
  split_at = c("virginica", "versicolor"), type = "categorical"
)
# Splits each initial stratum (1 and 0) into one stratum with the "virginica"
# and "versicolor" species and one stratum with all of the other species
# not specified in the split_at argument.
Method for summary for class Multiwave
## S4 method for signature 'Multiwave'
summary(object)
object |
object of class "Multiwave" |
Prints a summary of the specified multiwave object in the console.
optimall defines three S4 classes for organizing the multi-wave sampling workflow: Wave, Phase, and Multiwave. An object of class Multiwave holds metadata and a list of objects of class Phase, which in turn holds metadata and a list of objects of class Wave. These three object classes are used together to organize the workflow of multi-wave sampling designs.
metadata
A list containing the metadata for the wave.
design
A dataframe specifying the design of the wave. It is often the output of allocate_wave.
samples
A character vector containing the ids of the units sampled in the wave.
sampled_data
A dataframe holding the data, with ids, collected in this wave of sampling
data
A dataframe holding the updated full data set with all of the Phase 1 sampling units including the samples collected in this wave.