Package 'NestedCategBayesImpute' reference manual

Title:	Modeling, Imputing and Generating Synthetic Versions of Nested Categorical Data in the Presence of Impossible Combinations
Description:	This tool set provides a set of functions to fit the nested Dirichlet process mixture of products of multinomial distributions (NDPMPM) model for nested categorical household data in the presence of impossible combinations. It has direct applications in imputing missing values for and generating synthetic versions of nested household data.
Authors:	Quanli Wang, Olanrewaju Akande, Jingchen Hu, Jerry Reiter and Andres Barrientos
Maintainer:	Olanrewaju Akande <[email protected]>
License:	GPL (>= 3)
Version:	1.2.1
Built:	2025-03-06 04:18:05 UTC
Source:	https://github.com/cran/NestedCategBayesImpute

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Description

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Usage

checkconstraints(data, neededpossiblehh, hh_size)

Arguments

`data`	A household data matrix generated by calling `samplehouseholds`.
`neededpossiblehh`	The number of possible households needed before checking is stopped.
`hh_size`	The household size for the households in `data`.

Details

Given an input household data matrix, this function checks the possible/impossible status of each household and also outputs the desired number of possible and impossible households separately. checkconstraints checks constraints when the household head is included as an individual within the household.

The predefined list of structural zeros currently included should be viewed as an example of a system of constraints. It was derived by treating a subset of the 2012 American Community Survey as a population, and identifying combinations involving the relationship variable that do not appear in the data. This list should not be interpreted as a “true” list of impossible combinations in the target population. We force the combinations of variables in this list to have zero probability to be consistent with the 2012 ACS public use file that we used in the example.

The structural zeros included are:

Each household must contain exactly one head and he/she must be at least 16 years old.
Each household cannot contain more than one spouse and he/she must be at least 16 years old.
Married couples are of opposite sex, and age difference between individuals in the couples cannot exceed 49.
The household head must be older than the oldest child by at least 7.
The youngest parent must be older than the household head by at least 10.
The youngest parent-in-law must be older than the household head by at least 4.
The age difference between the household head and siblings cannot exceed 37.
In a household that contains a grandchild, the household head must be at least 34 years old. Also, the household head must be older than the oldest grandchild by at least 26.

Users can modify the list of structural zeros by downloading the package source, making changes only to the checkconstraints_imp.cpp file and re-building the package. Please note that the structural zeros have been specified according to the structure of our example data, so the column indexes and levels of the age, gender and relationship to household head variables in subsequent data sets must match those in our example data. For more information on the structure of the data, see the documentation of the RunModel function.
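
To make the first four rules above concrete, here is a minimal base R sketch of the kind of check they describe. It is not the package's C++ implementation: the function name `is_possible_household` is hypothetical, ages are assumed to be given in years (not in the 1-based coding of the example data), and `sex` and `relate` are assumed to use the coding of the example data (`relate`: 1 = head, 2 = spouse, 3 = child).

```r
# Hypothetical helper, not part of the package: checks rules 1-4 for one household.
# age is in years; sex and relate follow the coding of the example data.
is_possible_household <- function(age, sex, relate) {
  head <- which(relate == 1)
  spouse <- which(relate == 2)
  child <- which(relate == 3)
  # Rule 1: exactly one head, aged at least 16.
  if (length(head) != 1 || age[head] < 16) return(FALSE)
  # Rule 2: at most one spouse, aged at least 16.
  if (length(spouse) > 1) return(FALSE)
  if (length(spouse) == 1) {
    if (age[spouse] < 16) return(FALSE)
    # Rule 3: opposite sex, and an age difference of at most 49.
    if (sex[spouse] == sex[head]) return(FALSE)
    if (abs(age[spouse] - age[head]) > 49) return(FALSE)
  }
  # Rule 4: the head is at least 7 years older than the oldest child.
  if (length(child) > 0 && age[head] - max(age[child]) < 7) return(FALSE)
  TRUE
}

ok <- is_possible_household(age = c(40, 38, 10), sex = c(1, 2, 1), relate = c(1, 2, 3))
bad <- is_possible_household(age = c(20, 19, 15), sex = c(1, 2, 1), relate = c(1, 2, 3))
```

In the second call the head is only 5 years older than the child, so the household violates rule 4 and counts as impossible.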

Value

A list containing the results of the check.

`outcome`	An indicator vector for the possible/impossible household status under constraints.
`Households`	A data matrix for impossible households.
`Index`	A vector for the original indexes of households when possible households are found. Generally not to be used.
`synHouseholds`	A data matrix for possible households.
`possible`	The actual number of possible households returned.

Author(s)

Quanli Wang, Olanrewaju Akande

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Description

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Usage

checkconstraints_HHhead_at_group_level(data, neededpossiblehh, hh_size, parallel)

Arguments

`data`	A household data matrix generated by calling `samplehouseholds`.
`neededpossiblehh`	The number of possible households needed before checking is stopped.
`hh_size`	The household size for the households in `data`.
`parallel`	Logical indicator for running the function in parallel mode.

Details

Given an input household data matrix, this function checks the possible/impossible status of each household and also outputs the desired number of possible and impossible households separately. checkconstraints_HHhead_at_group_level checks constraints when the household head is moved to the household level. For the list of structural zeros currently included, see the documentation for checkconstraints.

Value

A list containing the results of the check.

`outcome`	An indicator vector for the possible/impossible household status under constraints.
`Households`	A data matrix for impossible households.
`Index`	A vector for the original indexes of households when possible households are found. Generally not to be used.
`synHouseholds`	A data matrix for possible households.
`possible`	The actual number of possible households returned.

Author(s)

Quanli Wang, Olanrewaju Akande

The new implementation of checkconstraints, which will eventually replace checkconstraints.

Description

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Usage

checkSZ(Data_to_check, h)

Arguments

`Data_to_check`	The household data matrix that is to be checked for structural zero constraints.
`h`	The household size for the households to be checked.

Details

The structural zeros included are:

Each household must contain exactly one head and he/she must be at least 16 years old.
Each household cannot contain more than one spouse and he/she must be at least 16 years old.
Married couples are of opposite sex, and age difference between individuals in the couples cannot exceed 49.
The household head must be older than the oldest child by at least 7.
The youngest parent must be older than the household head by at least 10.
The youngest parent-in-law must be older than the household head by at least 4.
The age difference between the household head and siblings cannot exceed 37.
In a household that contains a grandchild, the household head must be at least 34 years old. Also, the household head must be older than the oldest grandchild by at least 26.

Value

A list containing the results of the check.

`outcome`	An indicator vector for the possible/impossible household status under constraints.
`Households`	A data matrix for impossible households.
`Index`	A vector for the original indexes of households when possible households are found. Generally not to be used.
`synHouseholds`	A data matrix for possible households.
`possible`	The actual number of possible households returned.

Author(s)

Quanli Wang, Olanrewaju Akande

Michael: Edit here

Description

Michael: Edit here

Usage

checkSZ2(Data_to_check, h)

Arguments

`Data_to_check`	Michael: Edit here
`h`	Michael: Edit here

Details

Michael: Edit here

Value

Michael: Edit here

Generate the desired number of impossible households required to observe a given number of possible households.

Description

Given model parameters, generate the desired number of impossible households required to observe a given number of possible households. Also generate synthetic (and valid) data of the same size as the observed data when required.
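
The idea of collecting impossible households until enough possible ones have been observed can be sketched as a simple rejection loop. The code below is only an illustration in base R under stated assumptions: `collect_impossible`, `draw_batch` and `is_possible` are hypothetical names, the generator and the checker are toy stand-ins, and the exact stopping rule used by the package may differ.

```r
# Hypothetical helper, not the package implementation.
# draw_batch(b) must return a list of b candidate households;
# is_possible(hh) must return TRUE or FALSE for one household.
collect_impossible <- function(n_needed, blocksize, draw_batch, is_possible) {
  n_possible <- 0
  impossible <- list()
  while (n_possible < n_needed) {
    for (hh in draw_batch(blocksize)) {
      if (is_possible(hh)) {
        n_possible <- n_possible + 1
        if (n_possible == n_needed) break  # enough possible households observed
      } else {
        impossible[[length(impossible) + 1]] <- hh
      }
    }
  }
  impossible
}

# Toy usage: a "household" is an integer, and even values count as possible.
set.seed(1)
draw_batch <- function(b) as.list(sample.int(10, b, replace = TRUE))
is_possible <- function(hh) hh %% 2 == 0
extra <- collect_impossible(n_needed = 20, blocksize = 8, draw_batch, is_possible)
length(extra)  # number of impossible draws seen along the way
```

Drawing candidates in batches of `blocksize` mirrors the batch sampling described for the `blocksize` argument; the number of impossible households returned is random.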

Usage

GetImpossibleHouseholds(d, n_star_h, lambda, omega, phi, pi, blocksize, n, synindex,
                        HHhead_at_group_level,Parallel)

Arguments

`d`	Vector containing the number of levels for each individual-level variable.
`n_star_h`	Vector containing the number of observed households for the different household sizes in the original data.
`lambda`	Multinomial probabilities for each group-level variable.
`omega`	Latent class probabilities for the group-level and individual-level latent class pairs.
`phi`	Multinomial probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.
`pi`	Latent class probabilities for the group-level latent classes.
`blocksize`	Number of households to be generated at a time; batch sampling is used to improve computing speed.
`n`	Number of households in the original input data; equal to the sum of `n_star_h`.
`synindex`	Logical indicator for sampling synthetic data. Set to TRUE when synthetic data is needed.
`HHhead_at_group_level`	Logical indicator for data structure with respect to the household head. Set to TRUE if the household head has been moved to the household level and FALSE otherwise.
`Parallel`	Logical indicator for running the function in parallel mode.

Value

`G_Individuals_and_M_extra`	A data matrix containing both the group-level (in long format) and individual-level latent classes for the impossible households.
`G_extra`	A vector containing the group-level latent classes for the impossible households.
`IndividualData_extra`	A data matrix containing the individual-level data for the impossible households.
`HHdata_extra`	A data matrix containing the group-level data for the impossible households.
`hh_size_new`	A vector for the number of impossible households for the different household sizes.
`synIndividuals_all`	Synthetic data when synindex is TRUE. NULL otherwise.

Author(s)

Quanli Wang

Generate 2D count table for two integer-valued vectors.

Description

Similar to the 'table' function, this function builds a contingency table of the counts at each combination of all possible values from two integer-valued input vectors.

Usage

groupcount(g1, g2, n1, n2)

Arguments

`g1`	The first integer-valued input vector. The max value in g1 is n1.
`g2`	The second integer-valued input vector. The max value in g2 is n2.
`n1`	The maximum value in g1.
`n2`	The maximum value in g2.

Details

This is implemented as a utility function to build a 2D histogram count table. For efficiency, it does not check whether the values in the input vectors exceed the maximum values specified.
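
For comparison, the same kind of 2D count table can be built in base R with `table`, fixing the factor levels so that empty cells are kept. This is a sketch of the computation, not the package code; that `groupcount` returns an n1 by n2 table laid out the same way is an assumption.

```r
set.seed(1)
n1 <- 4
n2 <- 3
g1 <- sample.int(n1, 50, replace = TRUE)
g2 <- sample.int(n2, 50, replace = TRUE)

# Fixing the levels keeps rows/columns for values that never occur.
counts_ref <- table(factor(g1, levels = seq_len(n1)),
                    factor(g2, levels = seq_len(n2)))
dim(counts_ref)  # n1 rows, n2 columns
sum(counts_ref)  # one count per input pair
```

Unlike `groupcount`, `table` on factors silently drops values outside the declared levels (they become NA), so it is safer but slower.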

Value

The count table.

Author(s)

Quanli Wang

Examples

n1 <- 20
n2 <- 10
g1 <- sample.int(n1,1000, replace = TRUE)
g2 <- sample.int(n2,1000, replace = TRUE)
counts <- groupcount(g1,g2,n1,n2)

Generate histogram count for an integer-valued vector.

Description

Generate histogram count for an integer-valued vector.

Usage

groupcount1D(g, n)

Arguments

`g`	An integer-valued input vector. The max value in g is n.
`n`	The max value in g.

Details

This is implemented as a utility function for 1D histogram counts. For efficiency, it does not check whether the values in the input vector exceed the maximum value specified.
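
Base R's `tabulate` performs the same kind of 1D count and can serve as a reference when checking results. This is a sketch for comparison only; that `groupcount1D` returns counts in the same order (values 1 to n) is an assumption.

```r
set.seed(1)
n <- 6
g <- sample.int(n, 100, replace = TRUE)

# counts_ref[k] is the number of entries of g equal to k, for k = 1, ..., n.
counts_ref <- tabulate(g, nbins = n)
length(counts_ref)  # n
sum(counts_ref)     # length(g)
```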

Value

The count values.

Author(s)

Quanli Wang

Examples

n <- 20
g <- sample.int(n,1000, replace = TRUE)
counts <- groupcount1D(g,n)

Convert a household data matrix to the corresponding individual member data matrix.

Description

Convert a household data matrix to the corresponding individual member data matrix.

Usage

households2individuals(data, hh_size)

Arguments

`data`	Household data matrix.
`hh_size`	The household size for the households in `⁠data⁠`.

Value

Individual member data matrix.

Author(s)

Quanli Wang

Initialize the input data structure.

Description

Initialize the input data structure.

Usage

initData(md)

Arguments

`md`	A list that holds all the input data, with optional missing data information.

Value

A list object including all the necessary data variables needed by the sampler.

`origdata`	Original data.
`n_i`	Vector containing the number of individuals in each household in the data.
`n`	Number of households in the data.
`HHdataorigT`	The transposed household-level data; each column represents one household.
`HHserial`	Vector containing the household index for each individual in the data.
`n_individuals`	The total number of individuals N across all n households in the input data.
`n_individuals_real`	The real total number of individuals N across all n households. This is the same as n_individuals if the household head has not been moved to the household level, and different otherwise.
`p`	Number of individual-level variables.
`d`	Vector containing the number of levels for each of the `p` variables.
`dataT`	The transposed individual-level data; each column represents one individual.
`maxd`	The max value in `d`.
`n_star_h`	Vector containing the number of observed households for the different household sizes in the original data.
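
As a rough illustration of how several of these quantities relate to one another, the base R sketch below derives them from a toy vector of household indexes. It is not the code used by initData; in particular, how the package orders or indexes `n_star_h` is an assumption here.

```r
# Toy household index for 10 individuals in 4 households.
HHserial <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4)

n_i <- as.vector(table(HHserial))  # individuals per household
n <- length(n_i)                   # number of households
n_individuals <- length(HHserial)  # total number of individuals
n_star_h <- table(n_i)             # households per household size (named by size)
```

With this toy input there are two households of size 2 and two of size 3, so `n_star_h` sums to `n`.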

Author(s)

Quanli Wang

Initialize the missing data structure from input data.

Description

Initialize the missing data structure from input data.

Usage

initMissing(data,struc_zero_variables,miss_batch)

Arguments

`data`	A list that holds all input data info.
`struc_zero_variables`	Column indexes for the variables that define structural zeros, such as age and relate (including those for the household head).
`miss_batch`	Initial number of batches to sample for each household with missing data.

Set the output structure for saving posterior samples of parameters.

Description

Set the output structure for saving posterior samples of parameters.

Usage

initOutput(data, hyper, mc)

Arguments

`data`	A list object including all the necessary data variables needed by the sampler; output of the `initData` function.
`hyper`	Hyper parameters for priors.
`mc`	MCMC parameters.

Value

A list of output parameters to be saved.

`alphaout`	Vector of posterior samples for the concentration parameter in the Dirichlet process for the group-level latent classes.
`betaout`	Vector of posterior samples for the concentration parameter in the Dirichlet process for the individual-level latent classes. Currently, this is assumed to be the same within all group-level classes.
`piout`	Matrix of posterior samples for the vector of probabilities for the group-level latent classes.
`omegaout`	3D array of posterior samples for the matrix of probabilities for the group-level and individual-level latent class pairs.
`nout`	Vector of posterior samples for the total number of impossible households sampled.
`extrasize`	Matrix of posterior samples for the number of impossible households sampled, split by household size.
`F_occupied`	Vector of posterior samples for the number of occupied household-level latent classes.
`S_occupied_max`	Vector of posterior samples for the max number of occupied individual-level latent classes.
`elapsed_time`	Vector of time taken to run each iteration.
`newphiout`	3D array of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.
`lambdaout`	A list of an array of posterior samples for the group-level probabilities for each group-level variable. Each array in the list is for each group-level variable.

Author(s)

Quanli Wang, Olanrewaju Akande

Initialize the model parameters for the MCMC.

Description

Initialize the model parameters for the MCMC.

Usage

initParameters(data, hyper, HHhead_at_group_level)

Arguments

`data`	A list object including all the necessary data variables needed by the sampler; output of the `⁠initData⁠` function.
`hyper`	Hyper parameters for the prior distributions.
`HHhead_at_group_level`	Logical indicator for data structure with respect to the household head. Set to TRUE if the household head has been moved to the household level and FALSE otherwise.

Value

A list of the initial values of the parameters.

`alpha`	Concentration parameter in the Dirichlet process for the group-level latent classes.
`beta`	Concentration parameter in the Dirichlet process for the individual-level latent classes. Currently, this is assumed to be the same within all group-level classes.
`phi`	Matrix of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.
`HHdata_all`	The transposed household-level data; each column represents one household.
`lambda`	A list of matrices of the group-level probabilities for each group-level variable by the group-level latent classes. Each matrix in the list is for each group-level variable.
`u`	Vector of the beta-distributed variables in the stick breaking representation of the group-level latent classes.
`pi`	Vector of the probabilities for the group-level latent classes.
`v`	Matrix of the beta-distributed variables in the stick breaking representation of the individual-level latent classes by the group-level latent classes.
`omega`	Matrix of the probabilities for the individual-level latent classes by the group-level latent classes.
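
The relationship between `u` and `pi` (and, row by row, between `v` and `omega`) is the usual stick-breaking construction. The sketch below shows that construction in base R under the assumption of a truncated representation in which the last stick variable is set to 1; `stick_breaking`, `pi_g` and the Beta(1, alpha) draws are illustrative and not taken from the package code.

```r
# pi_k = u_k * prod_{j < k} (1 - u_j)
stick_breaking <- function(u) {
  u * c(1, cumprod(1 - u[-length(u)]))
}

set.seed(1)
FF <- 5     # number of group-level latent classes (truncation level)
alpha <- 1  # concentration parameter
u <- c(rbeta(FF - 1, 1, alpha), 1)  # last stick set to 1 under truncation
pi_g <- stick_breaking(u)
sum(pi_g)   # equals 1 up to floating point error
```

Setting the last element of `u` to 1 is what makes the truncated probabilities sum to one.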

Author(s)

Quanli Wang

Run the mcmc sampler for the model.

Description

Run the mcmc sampler for the model.

Usage

RunModel(orig,mc,hyper,para,output,synindex,individual_variable_index,
    household_variable_index,HHhead_at_group_level,weight_option,struc_weight,MissData,
    Parallel)

Arguments

`orig`	A list object including all the necessary data variables needed by the sampler; output of the `initData` function.
`mc`	A list specifying the number of mcmc iterations, burn-in, thinning and the effective sample size.
`hyper`	Hyper parameters for the prior distributions.
`para`	A list of the initial values of the parameters; output of the `⁠initParameters⁠` function.
`output`	A list of output parameters to be saved; output of the `⁠initOutput⁠` function.
`synindex`	A vector of iteration indexes for sampling synthetic data. length(`synindex`) is the number of synthetic data sets needed.
`individual_variable_index`	Vector of column indexes for the individual-level variables.
`household_variable_index`	Vector of column indexes for the group-level variables.
`HHhead_at_group_level`	Logical indicator for whether or not to move the household head to the household level. Set to TRUE to move the household head and FALSE otherwise.
`weight_option`	Logical indicator for whether or not to cap the number of impossible households to sample and re-weight the multinomial counts within each latent class back to the expected truth. Set to TRUE to use the weighting option and FALSE otherwise.
`struc_weight`	Vector specifying the weights to be used for each household size. The weights must be ordered by household sizes and no household must be excluded.
`MissData`	A list that stores all the info related to missing data. Default to NULL for no missing data.
`Parallel`	Logical indicator for running the function in parallel mode.

Details

This function runs the mcmc sampler for the NDPMPM model and generates posterior samples of parameters. It also generates synthetic data when needed.

Please note that:

The minimum household size for this mcmc sampler is 2 because households of size 1 do not violate the structural zeros specified in this package. Also, moving the household head to the household level is not possible for households of size 1.
Each variable included must be recoded to start from 1.
Moving the household head to the household level and setting the HHhead_at_group_level option to TRUE speeds up the sampler significantly.
Setting the weight_option to TRUE and specifying weights also speeds up the sampler but the exact rate of speedup depends on the specific weights.
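
The recoding requirement can be met with a couple of lines of base R. The sketch below uses made-up input vectors (`age_years`, `tenure`) purely for illustration; the target codes follow the variable descriptions of the example data.

```r
# Age in years, recoded so that 1 = 0 years, 2 = 1 year, and so on.
age_years <- c(0, 34, 93)
age_code <- age_years + 1

# A labelled variable, recoded to consecutive integers starting from 1.
tenure <- c("rented", "owned", "owned")
tenure_code <- as.integer(factor(tenure, levels = c("owned", "rented")))  # 1 = owned, 2 = rented
```

Giving the `levels` explicitly fixes which label maps to which code, instead of relying on alphabetical order.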

Our example data set contains a sample of 2000 households and seven variables from the 2012 American Community Survey data. The variables are described below:

ownership (ownership of dwelling): 1 = owned or being bought (loan), 2 = rented.
householdsize (household size): 2 = 2 people, 3 = 3 people, 4 = 4 people, 5 = 5 people, 6 = 6 people.
sex (gender): 1 = male, 2 = female.
race: 1 = white, 2 = black, 3 = American Indian or Alaska Native, 4 = Chinese, 5 = Japanese, 6 = other Asian/Pacific Islander, 7 = other race, 8 = two major races, 9 = three/more major races.
hisp (Hispanic origin): 1 = not Hispanic, 2 = Mexican, 3 = Puerto Rican, 4 = Cuban, 5 = other.
age: 1 = 0 (less than one year old), 2 = 1, 3 = 2, . . . , 94 = 93.
relate (relationship to the household head): 1 = head/householder, 2 = spouse, 3 = child, 4 = child-in-law, 5 = parent, 6 = parent-in-law, 7 = sibling, 8 = sibling-in-law, 9 = grandchild, 10 = other relatives, 11 = partner, friend, visitor, 12 = other non-relatives.

Subsequent data sets must follow this structure because of the predefined list of structural zeros. Alternatively, users can modify the list of structural zeros by downloading the package source, making changes only to the checkconstraints_imp.cpp file and re-building the package.

Value

`synData`	The list of synthetic data sets, returned when length(`synindex`) > 0.
`output`	The list of posterior samples for the parameters included in `output`.

Author(s)

Quanli Wang, Olanrewaju Akande

Update household (group) level latent class indexes.

Description

Update household (group) level latent class indexes.

Usage

sampleG(phi, data, omega, pi, ni, HHdata, lambda, Parallel)

Arguments

`phi`	Matrix of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.
`data`	Individual level data.
`omega`	Matrix of the probabilities for the individual-level latent classes by the group-level latent classes.
`pi`	Vector of the probabilities for the group-level latent classes.
`ni`	Vector containing the number of individuals in each household in the data.
`HHdata`	Household level data.
`lambda`	A list of matrices of the group-level probabilities for each group-level variable by the group-level latent classes. Each matrix in the list is for each group-level variable.
`Parallel`	Logical indicator for running the function in parallel mode.

Details

Function for obtaining a posterior sample of the household-level latent class indexes for all households in the input data based on the corresponding full conditional distribution.

Value

A list with two variables.

`G`	A vector for the updated values of the household-level latent class indexes for all households in the input data.
`G_Individuals`	The vector `⁠G⁠` expanded to a long format to match the number of individuals in `⁠data⁠`.

Author(s)

Quanli Wang

Rcpp implementation for sampling household data without constraints.

Description

Rcpp implementation for sampling household data without constraints.

Usage

samplehouseholds(phi, omega, pi, d, lambda, currrentbatch, nHouseholds, householdsize,
      HeadAtGroupLevel, Parallel)

Arguments

`phi`	Matrix of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.
`omega`	Matrix of the probabilities for the individual-level latent classes by the group-level latent classes.
`pi`	Vector of the probabilities for the group-level latent classes.
`d`	Vector containing the number of levels for each of the individual-level variables.
`lambda`	A list of matrices of the group-level probabilities for each group-level variable by the group-level latent classes. Each matrix in the list is for each group-level variable.
`currrentbatch`	The current batch number for the household data to be generated. The household ID will be generated based on this batch number.
`nHouseholds`	The number of households to be generated by one call to this function.
`householdsize`	The size of the households to be generated.
`HeadAtGroupLevel`	Logical indicator for running the model that codes household head at the group level.
`Parallel`	Logical indicator for running the function in parallel mode.

Details

This function allows the model to generate a batch of ⁠nHouseholds⁠ with each household of size ⁠householdsize⁠. The generated household data will include both possible and impossible households. Use ⁠samplehouseholds⁠ when the household head is included as an individual within the household.
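
The generative steps behind one unconstrained household draw can be sketched in base R as follows. This is an illustration of the nested mixture structure, not the Rcpp implementation: `sample_household_sketch` is a hypothetical name, and the parameter layout (a list of matrices for `lambda`, a list of three-dimensional arrays for `phi`) is chosen for readability and is not the layout the package uses.

```r
# Hypothetical parameter layout:
#   pi_g:   length-FF vector of group-level class probabilities
#   omega:  FF x SS matrix of individual-level class probabilities
#   lambda: list over household variables, each an FF x (levels) matrix
#   phi:    list over individual variables, each an FF x SS x (levels) array
sample_household_sketch <- function(pi_g, omega, lambda, phi, householdsize) {
  # 1. Draw the group-level latent class.
  g <- sample.int(length(pi_g), 1, prob = pi_g)
  # 2. Draw each household-level variable given that class.
  household <- vapply(lambda, function(l) sample.int(ncol(l), 1, prob = l[g, ]),
                      integer(1))
  # 3. For each member, draw an individual-level class, then each variable.
  individuals <- matrix(0L, nrow = householdsize, ncol = length(phi))
  for (j in seq_len(householdsize)) {
    m <- sample.int(ncol(omega), 1, prob = omega[g, ])
    for (k in seq_along(phi)) {
      individuals[j, k] <- sample.int(dim(phi[[k]])[3], 1, prob = phi[[k]][g, m, ])
    }
  }
  list(G = g, household = household, individuals = individuals)
}

set.seed(1)
pi_g <- c(0.6, 0.4)
omega <- matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2, byrow = TRUE)
lambda <- list(matrix(0.5, 2, 2))
phi <- list(array(1 / 2, dim = c(2, 2, 2)), array(1 / 3, dim = c(2, 2, 3)))
hh <- sample_household_sketch(pi_g, omega, lambda, phi, householdsize = 3)
dim(hh$individuals)  # one row per member, one column per individual-level variable
```

Nothing in these steps looks at the structural zeros, which is why the generated batch contains both possible and impossible households.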

Value

A data matrix with each row for one household.

Author(s)

Quanli Wang

Update individual level latent class indexes.

Description

Update individual level latent class indexes.

Usage

sampleM(phi, data, omega, G, serial,  Parallel)

Arguments

`phi`	Matrix of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.
`data`	Input individual-level data.
`omega`	Matrix of the probabilities for the individual-level latent classes by the group-level latent classes.
`G`	Household-level latent class indexes.
`serial`	Vector containing the household index for each individual in the data.
`Parallel`	Logical indicator for running the function in parallel mode.

Details

Function for obtaining a posterior sample of the individual-level latent class indexes for all individuals in the input data based on the corresponding full conditional distribution.
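
For a single individual in a household with group-level class g, a full conditional of this kind weights each individual-level class m by omega[g, m] times the probability of the observed values under class m. The base R sketch below illustrates that computation under a hypothetical layout (`phi_g` as a list of matrices for one fixed g); it is not the package's implementation and ignores any handling of structural zeros.

```r
# Hypothetical layout: omega_g is row g of omega (length SS);
# phi_g[[k]] is an SS x d[k] matrix of probabilities for variable k within group class g.
sample_m_sketch <- function(x, omega_g, phi_g) {
  logw <- log(omega_g)
  for (k in seq_along(x)) {
    logw <- logw + log(phi_g[[k]][, x[k]])
  }
  w <- exp(logw - max(logw))  # subtract the max for numerical stability
  sample.int(length(w), 1, prob = w)
}

set.seed(1)
omega_g <- c(0.5, 0.5)
phi_g <- list(matrix(c(0.9, 0.1,
                       0.1, 0.9), nrow = 2, byrow = TRUE))
draws <- replicate(1000, sample_m_sketch(x = 1, omega_g, phi_g))
mean(draws == 1)  # close to 0.9, the conditional probability of class 1
```

Working on the log scale avoids underflow when many variables are multiplied together.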

Value

A vector for the updated values of the individual-level latent class indexes for all individuals in the input data.

Author(s)

Quanli Wang

Sample and update missing data

Description

Sample and update missing data if missing data are present in the input.

Usage

SampleMissing(MissData, para, orig, G_household, M, hyper)

Arguments

`MissData`	The missing data structure that provides all information related to missing data.
`para`	A list of the initial values of the parameters; output of the `initParameters` function.
`orig`	A list object including all the necessary data variables needed by the sampler.
`G_household`	Group-level household index.
`M`	Individual-level latent class indexes.
`hyper`	Hyper parameters for the prior distributions.

Update alpha.

Description

Update alpha – the concentration parameter in the Dirichlet process for the group-level latent classes.

Usage

UpdateAlpha(aa, ab, u)

Arguments

`aa`	Hyper-parameter a for alpha.
`ab`	Hyper-parameter b for alpha.
`u`	Vector of the beta-distributed variables in the stick breaking representation of the group-level latent classes.

Value

Updated (posterior) value for alpha based on the corresponding full conditional distribution.
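
As an illustration of what such a full conditional looks like, the sketch below uses the standard result for a truncated stick-breaking representation: with a Gamma(aa, ab) prior on alpha (shape aa, rate ab) and stick variables u_g ~ Beta(1, alpha), the conditional of alpha is Gamma with shape aa + F - 1 and rate ab minus the sum of log(1 - u_g) over the first F - 1 sticks. Whether the package uses exactly this parameterization is an assumption, and `update_alpha_sketch` is a hypothetical name.

```r
# Assumes a Gamma(shape = aa, rate = ab) prior and that the last element of u is 1.
update_alpha_sketch <- function(aa, ab, u) {
  FF <- length(u)
  rgamma(1, shape = aa + FF - 1, rate = ab - sum(log(1 - u[-FF])))
}

set.seed(1)
u <- c(rbeta(4, 1, 1), 1)
alpha_new <- update_alpha_sketch(aa = 0.25, ab = 0.25, u = u)
```

The last stick is excluded from the sum because it is fixed at 1 and carries no information about alpha.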

Author(s)

Quanli Wang

Update beta.

Description

Update beta – the concentration parameter in the Dirichlet process for the individual-level latent classes. Currently, this is assumed to be the same within all group-level classes.

Usage

UpdateBeta(ba, bb, v)

Arguments

`ba`	Hyper-parameter a for beta.
`bb`	Hyper-parameter b for beta.
`v`	Matrix of the beta-distributed variables in the stick breaking representation of the individual-level latent classes by the group-level latent classes.

Value

Updated (posterior) value for beta based on the corresponding full conditional distribution.

Author(s)

Quanli Wang

Update lambda.

Description

Update lambda – the list of matrices of the group-level probabilities for each group-level variable by the group-level latent classes when the weighting/capping option is not used. Each matrix in the list is for each group-level variable.

Usage

UpdateLambda(HHdata_all, G_all, dHH, FF)

Arguments

`HHdata_all`	Data matrix for the household-level data from both the original data and the sampled impossible households.
`G_all`	A vector of the household-level latent class indexes for all households both in the original data and the sampled impossible households.
`dHH`	A vector containing the number of levels for each household-level variable.
`FF`	Maximum number of household-level latent classes allowed.

Details

Function for obtaining a posterior sample of lambda when the weighting/capping option is not used.
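
For one household-level variable, a conjugate update of this kind amounts to counting the observed levels within each latent class and drawing from a Dirichlet distribution. The base R sketch below assumes a uniform Dirichlet(1, ..., 1) prior, which is an assumption and not confirmed by this manual; `rdirichlet1` and `update_lambda_sketch` are hypothetical helper names.

```r
# One Dirichlet draw via normalized gamma variables.
rdirichlet1 <- function(a) {
  x <- rgamma(length(a), shape = a)
  x / sum(x)
}

# x:  observed levels of one household-level variable (values in 1..dk)
# G:  household-level latent class index of each household (values in 1..FF)
update_lambda_sketch <- function(x, G, dk, FF) {
  counts <- table(factor(G, levels = seq_len(FF)), factor(x, levels = seq_len(dk)))
  lambda_k <- matrix(0, nrow = FF, ncol = dk)
  for (g in seq_len(FF)) {
    lambda_k[g, ] <- rdirichlet1(1 + counts[g, ])  # prior count of 1 per level
  }
  lambda_k
}

set.seed(1)
x <- sample.int(3, 200, replace = TRUE)
G <- sample.int(4, 200, replace = TRUE)
lambda_k <- update_lambda_sketch(x, G, dk = 3, FF = 4)
rowSums(lambda_k)  # each row is a probability vector
```

Classes with no households assigned simply receive a draw from the prior.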

Value

Updated (posterior) value for lambda based on the corresponding full conditional distribution.

Author(s)

Quanli Wang

Update lambda.

Description

Update lambda – the list of matrices of the group-level probabilities for each group-level variable by the group-level latent classes – when the weighting/capping option is used. The weighting option allows capping the number of impossible households to sample and re-weighting the multinomial counts within each latent class back to the expected truth. Each matrix in the list is for each group-level variable.

Usage

UpdateLambdaWeighted(HHdata_all, G_all, dHH, FF,struc_weight)

Arguments

`HHdata_all`	Data matrix for the household-level data from both the original data and the sampled impossible households.
`G_all`	A vector of the household-level latent class indexes for all households both in the original data and the sampled impossible households.
`dHH`	A vector containing the number of levels for each household-level variable.
`FF`	Maximum number of household-level latent classes allowed.
`struc_weight`	A vector of weights by household sizes used in capping the number of sampled impossible households.

Details

Function for obtaining a posterior sample of lambda when the weighting/capping option is used.

Value

Updated (posterior) value for lambda based on the corresponding full conditional distribution.

Author(s)

Quanli Wang, Olanrewaju Akande

Update omega and v.

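Description

Update omega (the matrix of the probabilities for the individual-level latent classes by the group-level latent classes) and v (the matrix of the beta-distributed variables in the stick breaking representation of the individual-level latent classes by the group-level latent classes) when the weighting/capping option is not used.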

Usage

UpdateOmega(beta, M_all, FF, SS)

Arguments

`beta`	Concentration parameter in the Dirichlet process for the individual-level latent classes. Currently, this is assumed to be the same within all group-level classes.
`M_all`	A vector of both the household-level and individual-level latent class indexes for all households both in the original data and the sampled impossible households.
`FF`	Maximum number of household-level latent classes allowed.
`SS`	Maximum number of individual-level latent classes allowed.

Value

A list containing the updated (posterior) values for omega and v based on the corresponding full conditional distributions.

Author(s)

Quanli Wang

Update omega and v.

Description

Update omega – the matrix of the probabilities for the individual-level latent classes by the group-level latent classes – and v – the matrix of the beta-distributed variables in the stick breaking representation of the individual-level latent classes by the group-level latent classes – when the weighting/capping option is used. The weighting option allows capping the number of impossible households to sample and re-weighting the multinomial counts within each latent class back to the expected truth.

Usage

UpdateOmegaWeighted(beta, M_all, FF, SS, struc_weight)

Arguments

`beta`	Concentration parameter in the Dirichlet process for the individual-level latent classes. Currently, this is assumed to be the same within all group-level classes.
`M_all`	A vector of both the household-level and individual-level latent class indexes for all households both in the original data and the sampled impossible households.
`FF`	Maximum number of household-level latent classes allowed.
`SS`	Maximum number of individual-level latent classes allowed.
`struc_weight`	A vector of weights by household sizes used in capping the number of sampled impossible households.

Value

A list containing the updated (posterior) values for omega and v based on the corresponding full conditional distributions.

Author(s)

Quanli Wang, Olanrewaju Akande

Update phi.

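Description

Update phi – the matrix of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes.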

Usage

UpdatePhi(data, M_all, FF, SS, d, maxd)

Arguments

`data`	Data matrix for the individual-level data from both the original data and the sampled impossible households.
`M_all`	A vector of both the household-level and individual-level latent class indexes for all households both in the original data and the sampled impossible households.
`FF`	Maximum number of household-level latent classes allowed.
`SS`	Maximum number of individual-level latent classes allowed.
`d`	A vector for the number of levels of each individual-level variable.
`maxd`	Maximum value in `⁠d⁠`.

Details

Function for obtaining a posterior sample of phi when the weighting/capping option is not used.

Value

Updated (posterior) value for phi based on the corresponding full conditional distribution.
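A draw of this kind can be sketched as a set of independent Dirichlet updates, one per variable and per pair of group-level and individual-level classes. The sketch below is illustrative only and is not the package's code: the function name, the inputs `G_ind` and `M_ind` (per-individual class indexes standing in for `M_all`), the flat Dirichlet(1, ..., 1) prior, and the layout of the returned array are all assumptions.

```r
# Hypothetical helper, not the package's UpdatePhi.
# data: integer-coded matrix, one row per individual, one column per variable.
sketch_update_phi <- function(data, G_ind, M_ind, FF, SS, d) {
  p <- length(d)
  maxd <- max(d)
  # phi[k, j, c]: probability of level k of variable j in class pair c,
  # with c = (g - 1) * SS + m; entries with k > d[j] are left at 0
  phi <- array(0, dim = c(maxd, p, FF * SS))
  pair <- (G_ind - 1) * SS + M_ind
  for (c_idx in seq_len(FF * SS)) {
    rows <- pair == c_idx
    for (j in seq_len(p)) {
      counts <- tabulate(data[rows, j], nbins = d[j])
      # Dirichlet(1 + counts) draw via normalized gamma variables
      draw <- rgamma(d[j], shape = 1 + counts)
      phi[seq_len(d[j]), j, c_idx] <- draw / sum(draw)
    }
  }
  phi
}

set.seed(3)
n <- 40
d <- c(2, 3)
data <- cbind(sample(1:2, n, replace = TRUE), sample(1:3, n, replace = TRUE))
G_ind <- sample(1:2, n, replace = TRUE)
M_ind <- sample(1:2, n, replace = TRUE)
phi <- sketch_update_phi(data, G_ind, M_ind, FF = 2, SS = 2, d = d)
apply(phi, c(2, 3), sum)  # every variable-by-class-pair column sums to 1
```

Padding each variable to `maxd` levels keeps all probabilities in one rectangular array, which is presumably why `maxd` appears as an argument.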

Author(s)

Quanli Wang

Update phi.

Description

Update phi – the matrix of posterior samples for the individual-level probabilities for each individual-level variable by each pair of group-level and individual-level latent classes – when the weighting/capping option is used. The weighting option allows capping the number of impossible households to sample and re-weighting the multinomial counts within each latent class back to the expected truth.

Usage

UpdatePhiWeighted(data, M_all, FF, SS, d, maxd, struc_weight)

Arguments

`data`	Data matrix for the individual-level data from both the original data and the sampled impossible households.
`M_all`	A vector of both the household-level and individual-level latent class indexes for all households both in the original data and the sampled impossible households.
`FF`	Maximum number of household-level latent classes allowed.
`SS`	Maximum number of individual-level latent classes allowed.
`d`	A vector for the number of levels of each individual-level variable.
`maxd`	Maximum value in `⁠d⁠`.
`struc_weight`	A vector of weights by household sizes used in capping the number of sampled impossible households.

Details

Function for obtaining a posterior sample of phi when the weighting/capping option is used.

Value

Updated (posterior) value for phi based on the corresponding full conditional distribution.

Author(s)

Quanli Wang, Olanrewaju Akande

Update pi and u.

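Description

Update pi – the vector of the probabilities for the group-level latent classes – and u – the vector of the beta-distributed variables in the stick breaking representation of the group-level latent classes.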

Usage

UpdatePi(alpha, G_all, FF)

Arguments

`alpha`	Concentration parameter in the Dirichlet process for the group-level latent classes.
`G_all`	A vector of the household-level latent class indexes for all households both in the original data and the sampled impossible households.
`FF`	Maximum number of household-level latent classes allowed.

Details

Function for obtaining a posterior sample of pi when the weighting/capping option is not used.

Value

A list containing the updated (posterior) values for pi and u based on the corresponding full conditional distributions.
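For intuition, the sketch below draws pi and u from the standard full conditionals of a truncated stick-breaking prior with `FF` classes. It is a minimal illustration under that assumption, not the package's implementation, and the function name `sketch_update_pi` is hypothetical.

```r
# Hypothetical helper, not the package's UpdatePi.
sketch_update_pi <- function(alpha, G_all, FF) {
  # n[g] = number of households currently assigned to class g
  n <- tabulate(G_all, nbins = FF)
  n_after <- sum(n) - cumsum(n)          # households in classes after g
  u <- numeric(FF)
  u[FF] <- 1                             # truncation: last stick takes the remainder
  if (FF > 1) {
    idx <- seq_len(FF - 1)
    u[idx] <- rbeta(FF - 1, 1 + n[idx], alpha + n_after[idx])
  }
  # pi[g] = u[g] * prod_{f < g} (1 - u[f])
  pi_new <- u * c(1, cumprod(1 - u[-FF]))
  list(pi = pi_new, u = u)
}

set.seed(1)
G_all <- sample(1:3, 50, replace = TRUE)
out <- sketch_update_pi(alpha = 1, G_all = G_all, FF = 5)
sum(out$pi)  # the truncated weights sum to 1 up to rounding
```

Classes with no households still receive a draw from the Beta(1, alpha) prior, so empty classes can be repopulated in later iterations.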

Author(s)

Quanli Wang

Update pi and u.

Description

Update pi – the vector of the probabilities for the group-level latent classes – and u – the vector of the beta-distributed variables in the stick breaking representation of the group-level latent classes – when the weighting/capping option is used. The weighting option allows capping the number of impossible households to sample and re-weighting the multinomial counts within each latent class back to the expected truth.

Usage

UpdatePiWeighted(alpha, G_all, FF, struc_weight)

Arguments

`alpha`	Concentration parameter in the Dirichlet process for the group-level latent classes.
`G_all`	A vector of the household-level latent class indexes for all households both in the original data and the sampled impossible households.
`FF`	Maximum number of household-level latent classes allowed.
`struc_weight`	A vector of weights by household sizes used in capping the number of sampled impossible households.

Details

Function for obtaining a posterior sample of pi when the weighting/capping option is used.

Value

A list containing the updated (posterior) values for pi and u based on the corresponding full conditional distributions.
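The re-weighting idea can be illustrated with a small sketch of how weighted class counts might be formed before they enter the Beta draws. Everything here is an assumption made for illustration: the function name, the extra inputs `hh_size` and `is_impossible`, the indexing of `struc_weight` by household size, and the convention that a sampled impossible household of size h counts `1 / struc_weight[h]` times. The package may organize this differently.

```r
# Hypothetical helper, not part of the package.
# hh_size: assumed household size of each entry of G_all.
# is_impossible: assumed flag marking the sampled impossible households.
sketch_weighted_counts <- function(G_all, hh_size, is_impossible, FF, struc_weight) {
  # Observed households count once; a sampled impossible household of size h
  # is assumed to stand in for 1 / struc_weight[h] households.
  w <- ifelse(is_impossible, 1 / struc_weight[hh_size], 1)
  vapply(seq_len(FF), function(g) sum(w[G_all == g]), numeric(1))
}

G_all <- c(1, 1, 2, 2, 2)
hh_size <- c(2, 3, 2, 2, 3)
is_impossible <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
struc_weight <- c(1, 0.5, 0.5)  # assumed: entry h is the weight for size h
n_w <- sketch_weighted_counts(G_all, hh_size, is_impossible, FF = 3, struc_weight)
n_w  # 2 5 0
```

Under these assumptions, the weighted counts would take the place of the raw class counts in the stick-breaking Beta parameters, which accept non-integer values.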

Author(s)

Quanli Wang, Olanrewaju Akande

Package 'NestedCategBayesImpute'

Help Index

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Description

Usage

Arguments

Details

Value

Author(s)

Checking a data matrix of households for the possible/impossible status under a predefined set of structural zeros.

Description

Usage

Arguments

Details

Value

Author(s)

The new implementation of checkconstraints, which will eventually replace checkconstraints.

Description

Usage

Arguments

Details

Value

Author(s)

Michael: Edit here

Description

Usage

Arguments

Details

Value

Generate the desired number of impossible households required to observe a given number of possible households.

Description

Usage

Arguments

Value

Author(s)

Generate 2D count table for two integer-valued vectors.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Generate histogram count for an integer-valued vector.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Convert a household data matrix to the corresponding individual member data matrix.

Description

Usage

Arguments

Value

Author(s)

Initialize the input data structure.

Description

Usage

Arguments

Value

Author(s)

Initialize the missing data structure from input data.

Description

Usage

Arguments

Set the output structure for saving posterior samples of parameters.

Description

Usage

Arguments

Value

Author(s)

Initialize the model parameters for the MCMC.

Description

Usage

Arguments

Value

Author(s)

Run the MCMC sampler for the model.