Package 'plfm' reference manual

Title:	Probabilistic Latent Feature Analysis
Description:	Functions for estimating probabilistic latent feature models with a disjunctive, conjunctive or additive mapping rule on (aggregated) binary three-way data.
Authors:	Michel Meulders [aut, cre], Philippe De Bruecker [ctb]
Maintainer:	Michel Meulders <[email protected]>
License:	GPL (>= 2)
Version:	2.2.6
Built:	2025-03-18 05:18:35 UTC
Source:	https://github.com/cran/plfm

Probabilistic Latent Feature Analysis

Description

Functions for estimating disjunctive, conjunctive or additive probabilistic latent feature models on (aggregated) binary three-way data

Details

Probabilistic latent feature models can be used to model three-way three-mode binary observations (e.g. persons who indicate, for each of a number of products and for each of a set of attributes, whether a product has a certain attribute). A basic probabilistic feature model (referred to as plfm) uses aggregated three-way three-mode binary data as input, namely the two-way two-mode frequency table that is obtained by summing the binary three-way three-mode data across persons. The basic probabilistic feature model (Maris, De Boeck and Van Mechelen, 1996) is based on the assumption that observations are statistically independent and that model parameters are homogeneous across persons. The plfm function can be used to locate the posterior mode(s) of basic probabilistic feature models, to compute information criteria for model selection, and to compute measures of statistical and descriptive model fit. The stepplfm function can be used to fit a series of disjunctive, conjunctive or additive basic probabilistic feature models with different numbers of latent features. In addition, the bayesplfm function can be used to compute a sample of the posterior distribution of the basic probabilistic feature model in the neighbourhood of a specific posterior mode.
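
The aggregation of three-way data into a two-way frequency table can be sketched in base R. The array `x` below is simulated purely for illustration (its name, dimensions and values are assumptions, not objects of the package); summing over the person mode gives a J X K table of the kind the basic model takes as input.

```r
## simulated three-way binary data: I persons x J objects x K attributes
## (illustrative values only, not data from the package)
set.seed(1)
I <- 50; J <- 4; K <- 3
x <- array(rbinom(I * J * K, size = 1, prob = 0.4), dim = c(I, J, K))

## sum across persons to obtain the J x K table of association frequencies
freq1 <- apply(x, c(2, 3), sum)

## in this complete design every cell was judged by all I persons
freqtot <- matrix(I, nrow = J, ncol = K)
```

With missing judgements, the total per cell would instead be the number of non-missing ratings in that cell.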

Latent class extensions of the probabilistic feature model (referred to as LCplfm) take binary three-way three-mode observations as input. In contrast to the basic probabilistic feature model, latent class probabilistic feature models make it possible to model dependencies between (subsets of) observations (Meulders, De Boeck and Van Mechelen, 2003) and/or to account for heterogeneity in model parameters across persons (Meulders, Tuerlinckx, and Vanpaemel, 2013). The LCplfm function can be used to compute posterior mode estimates of different types of latent class probabilistic feature models, as well as information criteria for model selection and measures of descriptive model fit. The stepLCplfm function can be used to compute a series of latent class probabilistic feature models with different numbers of latent features and latent classes.

To see the preferable citation of the package, type citation("plfm").

Author(s)

Michel Meulders

Maintainer: <[email protected]>

References

Candel, M. J. J. M., and Maris, E. (1997). Perceptual analysis of two-way two-mode frequency data: probability matrix decomposition and two alternatives. International Journal of Research in Marketing, 14, 321-339.

Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., and Meulders, M. (2005). Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics, 61, 74-85.

Maris, E., De Boeck, P., and Van Mechelen, I. (1996). Probability matrix decomposition models. Psychometrika, 61, 7-29.

Meulders, M. (2013). An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies. Journal of Statistical Software, 54(14), 1-29. URL http://www.jstatsoft.org/v54/i14/.

Meulders, M., De Boeck, P., Kuppens, P., and Van Mechelen, I. (2002). Constrained latent class analysis of three-way three-mode data. Journal of Classification, 19, 277-302.

Meulders, M., De Boeck, P., and Van Mechelen, I. (2001). Probability matrix decomposition models and main-effects generalized linear models for the analysis of replicated binary associations. Computational Statistics and Data Analysis, 38, 217-233.

Meulders, M., De Boeck, P., and Van Mechelen, I. (2003). A taxonomy of latent structure assumptions for probability matrix decomposition models. Psychometrika, 68, 61-77.

Meulders, M., De Boeck, P., Van Mechelen, I., and Gelman, A. (2005). Probabilistic feature analysis of facial perception of emotions. Applied Statistics, 54, 781-793.

Meulders, M., De Boeck, P., Van Mechelen, I., Gelman, A., and Maris, E. (2001). Bayesian inference with probability matrix decomposition models. Journal of Educational and Behavioral Statistics, 26, 153-179.

Meulders, M. and De Bruecker, P. (2018). Latent class probabilistic latent feature analysis of three-way three-mode binary data. Journal of Statistical Software, 87(1), 1-45.

Meulders, M., Gelman, A., Van Mechelen, I., and De Boeck P. (1998). Generalizing the probability matrix decomposition model: An example of Bayesian model checking and model expansion. In J. Hox, and E. De Leeuw (Eds.), Assumptions, robustness, and estimation methods in multivariate modeling (pp. 1-19). TT Publicaties: Amsterdam.

Meulders, M., Tuerlinckx, F., and Vanpaemel, W. (2013). Constrained multilevel latent class models for the analysis of three-way three-mode binary data. Journal of Classification, 30 (3), 306-337.

Situational determinants of anger-related behavior

Description

The raw data consist of the binary judgments of 101 first-year psychology students who indicated whether or not they would display each of 8 anger-related behaviors when being angry at someone in each of 6 situations. The 8 behaviors consist of 4 pairs of reactions that reflect a particular strategy to deal with situations in which one is angry at someone, namely, (1) fighting (fly off the handle, quarrel), (2) fleeing (leave, avoid), (3) emotional sharing (pour out one's heart, tell one's story), and (4) making up (make up, clear up the matter). The six situations are constructed from two factors with three levels: (1) the extent to which one likes the instigator of anger (like, dislike, unfamiliar), and (2) the status of the instigator of anger (higher, lower, equal). Each situation is presented as one level of a factor, without specifying a level for the other factor.

Usage

data(anger)

Format

The data consist of a list of 5 objects:

freq1: A 6 X 8 matrix of frequencies. The frequency in cell (j,k) indicates how many of 101 respondents would display reaction k in situation j.
freqtot: A 6 X 8 matrix of frequencies. The frequency in cell (j,k) indicates the total number of respondents who judged the situation-response pair (j,k).
rowlabels: A vector of labels for the situations.
columnlabels: A vector of labels for the anger-related reactions.
data: A 101 X 6 X 8 array of binary (0/1) values. The value in cell (i,j,k) equals 1 if person i would display behavior k in situation j, and 0 otherwise.

Source

Meulders, M., De Boeck, P., Kuppens, P., and Van Mechelen, I. (2002). Constrained latent class analysis of three-way three-mode data. Journal of Classification, 19, 277-302.

References

Kuppens, P., Van Mechelen, I., and Meulders, M. (2004). Every cloud has a silver lining: Interpersonal and individual differences determinants of anger-related behaviors. Personality and Social Psychology Bulletin, 30, 1550-1564.

Meulders, M. (2013). An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies. Journal of Statistical Software, 54(14), 1-29. URL http://www.jstatsoft.org/v54/i14/.

Vermunt, J. K. (2007). A hierarchical mixture model for clustering three-way data sets. Computational Statistics and Data Analysis, 51, 5368-5376.

Situational determinants of anger-related behavior

Description

The raw data consist of the binary judgments of 115 first-year psychology students who indicated whether or not they would display each of 14 anger-related behaviors when being angry at someone in each of 9 situations. The 14 behaviors consist of 7 pairs of reactions that reflect a particular strategy to deal with situations in which one is angry at someone:

Anger-out: (a) You flew off the handle, (b) You started a fight
Avoidance: (a) You avoided a confrontation, (b) You went out of the other's way
Social sharing: (a) You unburdened your heart to others, (b) You told others what had happened
Assertive behavior: (a) You said what was bothering you in a direct and sober way, (b) You calmly explained what was bothering you
Indirect behavior: (a) You showed something was bothering you without saying anything, (b) You started to sulk
Anger-in: (a) You suppressed your anger, (b) You bottled up your anger
Reconciliation: (a) You reconciled, (b) You talked things out

The nine situations are constructed by crossing the levels of two factors with three levels each: (1) the extent to which one likes the instigator of anger (like, unfamiliar, dislike), and (2) the status of the instigator of anger (lower status, equal status, higher status).

Usage

data(anger2)

Format

The data consist of a list of 5 objects:

data: A 115 X 9 X 14 array of binary observations (0/1). The observation in cell (i,j,k) equals 1 if person i would display behavior k in situation j, and 0 otherwise.
freq1: A 9 X 14 matrix of frequencies. The frequency in cell (j,k) indicates the number of respondents who indicate that they would display behavior k in situation j.
freqtot: A 9 X 14 matrix of frequencies. The frequency in cell (j,k) indicates the total number of respondents who judged the situation-response pair (j,k).
rowlabels: A vector of labels for the situations.
columnlabels: A vector of labels for the anger-related behaviors.

Source

References

Meulders, M. and De Bruecker, P. (2018). Latent class probabilistic latent feature analysis of three-way three-mode binary data. Journal of Statistical Software, 87(1), 1-45.

Bayesian analysis of probabilistic latent feature models for two-way two-mode frequency data

Description

Computation of a sample of the posterior distribution for disjunctive or conjunctive probabilistic latent feature models with F features.

Usage

bayesplfm(data,object,attribute,rating,freq1,freqtot,F,
          Nchains=2,Nburnin=0,maxNiter=4000,
          Nstep=1000,Rhatcrit=1.2,maprule="disj",datatype="freq",
          start.bayes="best",fitted.plfm=NULL)

Arguments

`data`	A data frame that consists of three components: the variables `object`, `attribute` and `rating`. Each row of the data frame describes the outcome of a binary rater judgement about the association between a certain object and a certain attribute.
`object`	The name of the `object` component in the data frame `data`. The values of the vector `data$object` should be (non-missing) numeric or character values.
`attribute`	The name of the `attribute` component in the data frame `data`. The values of the vector `data$attribute` should be (non-missing) numeric or character values.
`rating`	The name of the `rating` component in the data frame `data`. The elements of the vector `data$rating` should be the numeric values 0 (no association) or 1 (association), or should be specified as missing (NA).
`freq1`	A J X K matrix of observed association frequencies.
`freqtot`	A J X K matrix with the total number of binary ratings in each cell (j,k). If the total number of ratings is the same for all cells of the matrix it is sufficient to enter a single numeric value rather than a matrix. For instance, if N raters have judged J X K associations, one may specify `freqtot`=N
`F`	The number of latent features included in the model.
`Nchains`	The number of Markov-chains that are simulated using a data-augmented Gibbs sampling algorithm.
`Nburnin`	The number of burn-in iterations.
`maxNiter`	The maximum number of iterations that will be computed for each chain.
`Nstep`	The convergence of the chains to the true posterior will be checked for each parameter after c*`Nstep` iterations with c=1,2,... The convergence will only be checked when `Nchains`>1.
`Rhatcrit`	The estimation procedure will be stopped if the Rhat convergence diagnostic is smaller than `Rhatcrit` for each object- and attribute parameter. By default `Rhatcrit`=1.2.
`maprule`	Disjunctive (maprule="disj") or conjunctive (maprule="conj") mapping rule of the probabilistic latent feature model.
`datatype`	The type of data used as input. When `datatype`="freq" one should specify frequency data `freq1` and `freqtot`, and when `datatype`="dataframe" one should specify the name of the data frame `data`, and its components, `object`, `attribute` and `rating`.
`start.bayes`	This argument can be used to define the type of starting point for the Bayesian analysis. If `start.bayes`="best" the best solution of a `plfm` analysis is used as the starting point for the Bayesian analysis, and if `start.bayes` = "fitted.plfm", the starting point is read from the (`plfm`) object assigned to the argument `fitted.plfm`. If `start.bayes`="random", a random starting point is used for the Bayesian analysis.
`fitted.plfm`	The name of the `plfm` object that contains posterior mode estimates for the specified model.
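
The two input formats (`datatype`="dataframe" versus `datatype`="freq") carry the same information at the aggregated level. The sketch below shows, in base R, how a long-format data frame of ratings relates to the tables `freq1` and `freqtot`. The data frame `ratings` and its column names are hypothetical, and this is an illustration of the relationship, not the package's internal code.

```r
## hypothetical long-format ratings: one row per rater-object-attribute judgement
ratings <- expand.grid(rater = 1:5, object = c("A", "B"),
                       attribute = c("x", "y", "z"),
                       stringsAsFactors = FALSE)
set.seed(2)
ratings$rating <- rbinom(nrow(ratings), size = 1, prob = 0.5)
ratings$rating[3] <- NA   # one missing judgement (rater 3, object A, attribute x)

## number of ratings equal to 1 per object-attribute pair
freq1 <- with(ratings,
              tapply(rating, list(object, attribute), sum, na.rm = TRUE))

## number of non-missing ratings per object-attribute pair
freqtot <- with(ratings,
                tapply(!is.na(rating), list(object, attribute), sum))
```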

Details

The function bayesplfm can be used to compute a sample of the posterior distribution of disjunctive or conjunctive probabilistic latent feature models with a particular number of features using a data-augmented Gibbs sampling algorithm (Meulders, De Boeck, Van Mechelen, Gelman, and Maris, 2001; Meulders, De Boeck, Van Mechelen, and Gelman, 2005; Meulders, 2013).

By specifying the parameter Nchains the function can be used to compute one single chain, or multiple chains. When only one chain is computed, no convergence measure is reported. When more than one chain is computed, for each parameter, convergence to the true posterior distribution is assessed using the Rhat convergence diagnostic proposed by Gelman and Rubin (1992).
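
To make the convergence check concrete, the following base R sketch computes a commonly used simplified form of the potential scale reduction factor (Rhat) for a single parameter. The helper `rhat` and the simulated chains are assumptions for illustration; the exact variant computed inside bayesplfm is not shown in this manual and may differ in detail.

```r
## potential scale reduction factor (Rhat) for one scalar parameter
## draws: Niter x Nchains matrix, one column per chain
rhat <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  B <- n * var(colMeans(draws))         # between-chain variance
  W <- mean(apply(draws, 2, var))       # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(3)
## two chains sampling around the same mode
same_mode <- cbind(rnorm(1000, 0.3, 0.05), rnorm(1000, 0.3, 0.05))
## two chains sampling around different modes
diff_mode <- cbind(rnorm(1000, 0.3, 0.05), rnorm(1000, 0.7, 0.05))

rhat(same_mode)   # close to 1
rhat(diff_mode)   # well above the default Rhatcrit of 1.2
```

Chains that settle in different posterior modes inflate the between-chain variance, which is why Rhat stays large in the second case.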

When using bayesplfm for Bayesian analysis, the same starting point is used for each simulated chain. The reason is that the posterior distribution of probabilistic feature models with F>2 is always multimodal (local maxima may exist, and feature labels may be switched), so that the aim of the Bayesian analysis is to compute a sample in the neighbourhood of one specific posterior mode. It is recommended to use the best posterior mode obtained with the plfm function as a starting point for the Bayesian analysis (use start.bayes="best", or specify start.bayes="fitted.plfm" and fitted.plfm=object, with "object" being a plfm object that contains posterior mode estimates for the specified model). As an alternative to using the plfm() function, one may use random starting points for the Bayesian analysis (start.bayes="random") to explore the posterior distribution.

The function bayesplfm() will converge well if the distinct posterior modes are well-separated and if the different chains only visit the same mode during the estimation process. However, if the posterior distribution is multimodal, it may fail to converge if the Gibbs sampler starts visiting different posterior modes within one chain, or if different chains sample from distinct posterior modes.

Value

`call`	Parameters used to call the function.
`sample.objpar`	A J X F X Niter X Nchains array with parameter values for the object parameters. The matrix `sample.objpar[,,i,c]` contains the draw of the object parameters in iteration i of chain c. Note: when `Nchains`=1 the chain length Niter equals `maxNiter`, and when `Nchains`>1 the chain length Niter equals the number of iterations required to obtain convergence.
`sample.attpar`	A K X F X Niter X Nchains array with parameter values for the attribute parameters. The matrix `sample.attpar[,,i,c]` contains the draw of the attribute parameters in iteration i of chain c. Note: when `Nchains`=1 the chain length Niter equals `maxNiter`, and when `Nchains`>1 the chain length Niter equals the number of iterations required to obtain convergence.
`pmean.objpar`	A J X F matrix with the posterior mean of the object parameters computed on all iterations and chains in the sample.
`pmean.attpar`	A K X F matrix with the posterior mean of the attribute parameters computed on all iterations and chains in the sample.
`p95.objpar`	A 3 X J X F array which contains for each object parameter the percentiles 2.5, 50 and 97.5.
`p95.attpar`	A 3 X K X F array which contains for each attribute parameter the percentiles 2.5, 50 and 97.5.
`Rhat.objpar`	A J X F matrix with Rhat convergence values for the object parameters.
`Rhat.attpar`	A K X F matrix with Rhat convergence values for the attribute parameters.
`fitmeasures`	A list with two measures of descriptive fit on the J X K table: (1) the correlation between observed and expected frequencies, and (2) the proportion of the variance in the observed frequencies accounted for by the model. The association probabilities and corresponding expected frequencies are computed using the posterior mean of the parameters.
`convstat`	The number of object-and attribute parameters that do not meet the convergence criterion.
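
The posterior summaries listed above can be reproduced from a sample array with base R. The array `draws` below is a simulated stand-in with the same layout as `sample.objpar` (J X F X Niter X Nchains); it is an assumption for illustration and not output of bayesplfm.

```r
## simulated stand-in for sample.objpar: J x F x Niter x Nchains
set.seed(4)
J <- 3; F <- 2; Niter <- 200; Nchains <- 2
draws <- array(runif(J * F * Niter * Nchains), dim = c(J, F, Niter, Nchains))

## posterior mean per parameter, pooling all iterations and chains (J x F)
pmean <- apply(draws, c(1, 2), mean)

## 2.5, 50 and 97.5 percentiles per parameter (3 x J x F)
p95 <- apply(draws, c(1, 2), quantile, probs = c(0.025, 0.5, 0.975))
```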

Author(s)

Michel Meulders

References

Gelman, A., and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-472.

Meulders, M., De Boeck, P., Van Mechelen, I., and Gelman, A. (2005). Probabilistic feature analysis of facial perception of emotions. Applied Statistics, 54, 781-793.

Meulders, M. and De Bruecker, P. (2018). Latent class probabilistic latent feature analysis of three-way three-mode binary data. Journal of Statistical Software, 87(1), 1-45.

Meulders, M. (2013). An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies. Journal of Statistical Software, 54(14), 1-29. URL http://www.jstatsoft.org/v54/i14/.

Examples


## Not run: 
## example 1: Bayesian analysis using data generated under the model

## define number of objects
J<-10
## define number of attributes
K<-10
## define number of features
F<-2

## generate true parameters
set.seed(43565)
objectparameters<-matrix(runif(J*F),nrow=J)
attributeparameters<-matrix(runif(K*F),nrow=K)

## generate data for conjunctive model using N=100 replications
gdat<-gendat(maprule="conj",N=100,
              objpar=objectparameters,attpar=attributeparameters)

## Use stepplfm to compute posterior mode(s) for 1 up to 3 features 

conj.lst<-stepplfm(minF=1,maxF=3,maprule="conj",freq1=gdat$freq1,freqtot=100,M=5)


## Compute a sample of the posterior distribution 
## for the conjunctive model with two features
## use the posterior mode obtained with stepplfm as starting point
conjbayes2<-bayesplfm(maprule="conj",freq1=gdat$freq1,freqtot=100,F=2,
                      maxNiter=3000,Nburnin=0,Nstep=1000,Nchains=2,
                      start.bayes="fitted.plfm",fitted.plfm=conj.lst[[2]])


## End(Not run)

## Not run: 
## example 2: Bayesian analysis of situational determinants of anger-related behavior

## load data
data(anger)

## Compute one chain of 500 iterations (including 250 burn-in iterations) 
## for the disjunctive model with two features
## use a random starting point

bayesangerdisj2a<-bayesplfm(maprule="disj",freq1=anger$freq1,freqtot=anger$freqtot,F=2,
                      maxNiter=500,Nstep=500,Nburnin=250,Nchains=1,start.bayes="random")

##print a summary of the output 
summary(bayesangerdisj2a)


## Compute a sample of the posterior distribution 
## for the disjunctive model with two features
## compute starting points with plfm
## run 2 chains with a maximum length of 10000 iterations
## compute convergence after each 1000 iterations

bayesangerdisj2b<-bayesplfm(maprule="disj",freq1=anger$freq1,freqtot=anger$freqtot,F=2,
                      maxNiter=10000,Nburnin=0,Nstep=1000,Nchains=2,start.bayes="best")


## print the output of the disjunctive 2-feature model for the anger data
print(bayesangerdisj2b)


## print a summary of the output of the disjunctive 2-feature model 
##for the anger data
summary(bayesangerdisj2b)

## End(Not run)


Ratings of associations between car models and car attributes

Description

The data describe the ratings of 78 respondents about the association between each of 14 car models and each of 27 car attributes.

Usage

data(car)

Format

The data consist of a list of 4 objects:

datalongformat: A data frame that consists of 6 components: Each row of the data frame describes the outcome of a binary rater judgement about the association between a certain car and a certain attribute. The components IDobject and objectlabel contain an ID and label for the car models, the components IDattribute and attributelabel contain an ID and label for the attributes, the component IDrater contains a rater ID, and the component rating contains the binary ratings (1 if the car model has the attribute according to the rater, and 0 otherwise).
data3w: A 78 X 14 X 27 array of binary judgements. The observation in cell (i,j,k) equals 1 if respondent i indicates that car j has attribute k, and 0 otherwise.
freq1: A 14 X 27 matrix of frequencies. The frequency in cell (j,k) indicates how many of 78 respondents indicate an association between car model j and attribute k.
freqtot: A 14 X 27 matrix of frequencies. The frequency in cell (j,k) indicates the total number of respondents who judged the car-attribute pair (j,k).

Source

Van Gysel, E. (2011). Perceptuele analyse van automodellen met probabilistische feature modellen. [translation from Dutch: Perceptual analysis of car models with probabilistic feature models] Master thesis. Hogeschool-Universiteit Brussel.

References

Meulders, M. (2013). An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies. Journal of Statistical Software, 54(14), 1-29. URL http://www.jstatsoft.org/v54/i14/.

Judgements on associations between car models and car attributes

Description

The data consist of the binary judgements of 147 respondents about the association between each of 12 car models and each of 23 car attributes.

Usage

data(car2)

Format

The data consist of a list of 5 objects:

data3w: A 147 X 12 X 23 array of binary judgements. The observation in cell (i,j,k) equals 1 if respondent i indicates that car j has attribute k, and 0 otherwise.
freq1: A 12 X 23 matrix of frequencies. The frequency in cell (j,k) indicates how many of 147 respondents indicate an association between car model j and attribute k.
freqtot: A 12 X 23 matrix of frequencies. The frequency in cell (j,k) indicates the total number of respondents who judged the car-attribute pair (j,k).

Source

Mähler, R. (2014). Analyse van perceptie en preferentie van middelgrote wagens. [translation from Dutch: Analysis of perception and preference for midsize cars.] Master thesis. KU Leuven.

References

Meulders, M. and De Bruecker, P. (2018). Latent class probabilistic latent feature analysis of three-way three-mode binary data. Journal of Statistical Software, 87(1), 1-45.

Data generation

Description

Computation of association probabilities and data generation for disjunctive, conjunctive or additive probabilistic latent feature models.

Usage

gendat(maprule="disj", N, objpar, attpar)

Arguments

`maprule`	Disjunctive (`maprule="disj"`), conjunctive (`maprule="conj"`) or additive (`maprule="add"`) mapping rule of the probabilistic latent feature model.
`N`	Number of replications for which binary associations are generated.
`objpar`	True object parameters. As object parameters are probabilities, they should be between 0 and 1.
`attpar`	True attribute parameters. As attribute parameters are probabilities, they should be between 0 and 1.

Details

The function gendat computes for all pairs of J objects and K attributes association probabilities and it generates association frequencies (i.e. the number of replications N for which an object is associated to an attribute), according to a disjunctive, conjunctive or additive probabilistic latent feature model. In addition, the function computes a matrix with in each cell the total number of replications N. If the requested number of replications N equals 0, the function only computes association probabilities and does not generate new data.

To compute association probabilities the function gendat uses a J X F matrix of object parameters and a K X F matrix of attribute parameters as input. The F object parameters of object j represent, for each of F features, the probability that object j has feature f. Similarly, the F attribute parameters of attribute k reflect, for each of F features, the probability that attribute k is linked to feature f.

According to the disjunctive probabilistic latent feature model, object j is associated to attribute k if the object and the attribute have at least one feature in common. More specifically, the association probability in cell (j,k) for the disjunctive model can be computed as:

$p(j,k)=1-\prod_f(1-objpar[j,f]*attpar[k,f]).$

According to the conjunctive probabilistic latent feature model, object j and attribute k are associated if object j has all the features that are linked to attribute k. For the conjunctive model the association probability in cell (j,k) is computed as:

$p(j,k)=\prod_f(1-(1-objpar[j,f])*attpar[k,f]).$

The additive mapping rule states that an object and an attribute are more likely to be associated if they have more features in common. More specifically, the association probability for the additive model is computed as:

$p(j,k)= \frac{1}{F}*\sum_f objpar[j,f]*attpar[k,f].$
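
The three mapping rules can be written out in a few lines of base R. The helper `assoc_prob` below is hypothetical (it is not a function of the package) and simply evaluates the formulas given in this section; the binomial draw at the end assumes independent replications per cell and is only a sketch of how association frequencies could be generated from the probabilities.

```r
## association probabilities under the three mapping rules
## objpar: J x F matrix, attpar: K x F matrix, entries in [0, 1]
assoc_prob <- function(objpar, attpar, maprule = c("disj", "conj", "add")) {
  maprule <- match.arg(maprule)
  J <- nrow(objpar); K <- nrow(attpar); F <- ncol(objpar)
  p <- matrix(NA_real_, nrow = J, ncol = K)
  for (j in seq_len(J)) {
    for (k in seq_len(K)) {
      o <- objpar[j, ]; a <- attpar[k, ]
      p[j, k] <- switch(maprule,
        disj = 1 - prod(1 - o * a),        # at least one common feature
        conj = prod(1 - (1 - o) * a),      # object has all features of the attribute
        add  = sum(o * a) / F)             # average over features
    }
  }
  p
}

set.seed(43565)
objpar <- matrix(runif(4 * 2), nrow = 4)
attpar <- matrix(runif(3 * 2), nrow = 3)
prob1 <- assoc_prob(objpar, attpar, "disj")

## association frequencies for N = 200 replications, drawn cell by cell
N <- 200
freq1 <- matrix(rbinom(length(prob1), size = N, prob = prob1),
                nrow = nrow(prob1))
```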

Value

`call`	Parameters used to call the function.
`prob1`	J X K matrix of association probabilities.
`freq1`	J X K matrix of association frequencies.
`freqtot`	J X K matrix with number of replications.

Author(s)

Michel Meulders

References

Maris, E., De Boeck, P., and Van Mechelen, I. (1996). Probability matrix decomposition models. Psychometrika, 61, 7-29.

Meulders, M., De Boeck, P., Van Mechelen, I., & Gelman, A. (2005). Probabilistic feature analysis of facial perception of emotions. Applied Statistics, 54, 781-793.

Examples

## define constants
J<-20
K<-15
F<-2

## generate true parameters
set.seed(43565)
objectparameters<-matrix(runif(J*F),nrow=J)
attributeparameters<-matrix(runif(K*F),nrow=K)

## compute association probabilities for a conjunctive model
probconj<-gendat(maprule="conj",N=0,
             objpar=objectparameters,attpar=attributeparameters)

## generate data for a disjunctive model using N=200 replications
gdat<-gendat(maprule="disj",N=200,
             objpar=objectparameters,attpar=attributeparameters)

## generate data for an additive model using N=200 replications
gdat<-gendat(maprule="add",N=200,
             objpar=objectparameters,attpar=attributeparameters)

Data generation

Description

Data generation for disjunctive, conjunctive and additive latent class probabilistic latent feature models.

Usage

gendatLCplfm(N,objpar,attpar,sizepar,maprule="disj",model=1)

Arguments

`N`	Number of replications (e.g. persons) for which binary object-attribute associations are generated.
`objpar`	True object parameters. If `model`=1, `model`=3, `model`=4, or `model`=6, `objpar` is a J X F X T array; if `model`=2 or `model`=5, `objpar` is a J X F matrix. As object parameters are probabilities, they should be between 0 and 1.
`attpar`	True attribute parameters. If `model`=2, `model`=3, `model`=5, or `model`=6, `attpar` is a K X F X T array; if `model`=1 or `model`=4, `attpar` is a K X F matrix. As attribute parameters are probabilities, they should be between 0 and 1.
`sizepar`	A T-vector of true class size parameters.
`maprule`	Disjunctive (`maprule="disj"`), conjunctive (`maprule="conj"`) or additive (`maprule="add"`) mapping rule of the latent class probabilistic latent feature model.
`model`	The type of dependency and heterogeneity assumption included in the model. `model`=1, `model`=2, `model`=3 represent models with a constant object-feature classification per person and with, respectively, class-specific object parameters, class-specific attribute parameters, and class-specific object- and attribute parameters. `model`=4, `model`=5, `model`=6 represent models with a constant attribute-feature classification per person and with, respectively, class-specific object parameters, class-specific attribute parameters, and class-specific object- and attribute parameters.

Details

The function gendatLCplfm generates binary object-attribute associations for N replications according to a disjunctive, conjunctive or additive latent class probabilistic latent feature model of a specific model type. In addition, the function computes the J X K matrix of marginal object-attribute association probabilities and a J X K X T array of class-specific object-attribute association probabilities. To compute association probabilities, the function gendatLCplfm uses a vector of class size parameters (sizepar), a matrix or array of object parameters (objpar) and a matrix or array of attribute parameters (attpar) as input.

According to the disjunctive probabilistic latent feature model, object j is associated to attribute k if the object and the attribute have at least one feature in common. More specifically, for model=1 the class-specific object-attribute association probability in cell (j,k) for the disjunctive model can be computed as:

$p(j,k|t)=1-\prod_f(1-objpar[j,f,t]*attpar[k,f]).$

According to the conjunctive probabilistic latent feature model, object j and attribute k are associated if object j has all the features that are linked to attribute k.

In particular, for model=1, the class-specific object-attribute association probability in cell (j,k) for the conjunctive model can be computed as:

$p(j,k|t)=\prod_f(1-(1-objpar[j,f,t])*attpar[k,f]).$

According to the additive probabilistic latent feature model, an object and an attribute are more likely to be associated if they have more features in common.

In particular, for model=1, the class-specific object-attribute association probability in cell (j,k) for the additive model can be computed as:

$p(j,k|t)= \frac{1}{F} * \sum_f(objpar[j,f,t]*attpar[k,f]).$

The marginal object-attribute association probability can be computed as follows:

$p(j,k)=\sum_t sizepar[t]*p(j,k|t).$

Value

`call`	Parameters used to call the function.
`data`	N X J X K array of generated binary object-attribute associations.
`class`	N-vector that contains the latent class membership of each replication.
`condprob.JKT`	J X K X T array of class-specific conditional object-attribute association probabilities.
`margprob.JK`	J X K matrix of marginal object-attribute association probabilities.

Author(s)

Michel Meulders

References

Meulders, M., Tuerlinckx, F., and Vanpaemel, W. (2013). Constrained multilevel latent class models for the analysis of three-way three-mode binary data. Journal of Classification, 30 (3), 306-337.

Examples

## Not run: 
# define constants
I<-500
J<-10
K<-8
F<-2
T<-2

# model 1

# generate true parameters
objpar<-array(runif(J*F*T),c(J,F,T))
attpar<-matrix(runif(K*F),nrow=K)
sizepar<-rep(1/T,T)
# generate data
d<-gendatLCplfm(N=I,objpar=objpar,attpar=attpar,sizepar=sizepar,maprule="conj",model=1)
# estimate parameters of true model
res<-LCplfm(data=d$data,F=2,T=2,model=1,maprule="conj")


# model 2

# generate true parameters
objpar<-matrix(runif(J*F),nrow=J)
attpar<-array(runif(K*F*T),c(K,F,T))
sizepar<-rep(1/T,T)
# generate data
d<-gendatLCplfm(N=I,objpar=objpar,attpar=attpar,sizepar=sizepar,maprule="conj",model=2)
# estimate parameters of true model
res<-LCplfm(data=d$data,F=2,T=2,model=2,maprule="conj")

# model 3

# generate true parameters
objpar<-array(runif(J*F*T),c(J,F,T))
attpar<-array(runif(K*F*T),c(K,F,T))
sizepar<-rep(1/T,T)
# generate data
d<-gendatLCplfm(N=I,objpar=objpar,attpar=attpar,sizepar=sizepar,maprule="conj",model=3)
# estimate parameters of true model
res<-LCplfm(data=d$data,F=2,T=2,model=3,maprule="conj")

# model 4

# generate true parameters
objpar<-array(runif(J*F*T),c(J,F,T))
attpar<-matrix(runif(K*F),nrow=K)
sizepar<-rep(1/T,T)
# generate data
d<-gendatLCplfm(N=I,objpar=objpar,attpar=attpar,sizepar=sizepar,maprule="conj",model=4)
# estimate parameters of true model
res<-LCplfm(data=d$data,F=2,T=2,model=4,maprule="conj")

# model 5

# generate true parameters
objpar<-matrix(runif(J*F),nrow=J)
attpar<-array(runif(K*F*T),c(K,F,T))
sizepar<-rep(1/T,T)
# generate data
d<-gendatLCplfm(N=I,objpar=objpar,attpar=attpar,sizepar=sizepar,maprule="conj",model=5)
# estimate parameters of true model
res<-LCplfm(data=d$data,F=2,T=2,model=5,maprule="conj")


# model 6
# generate true parameters
objpar<-array(runif(J*F*T),c(J,F,T))
attpar<-array(runif(K*F*T),c(K,F,T))
sizepar<-rep(1/T,T)
# generate data
d<-gendatLCplfm(N=I,objpar=objpar,attpar=attpar,sizepar=sizepar,maprule="conj",model=6)
# estimate parameters of true model
res<-LCplfm(data=d$data,F=2,T=2,model=6,maprule="conj")

## End(Not run)

Self-reported hostile behavior in frustrating situations

Description

The data consist of the judgments of 316 first-year psychology students who indicated on a three-point scale the extent to which they would display each of 4 hostile behaviors in each of 14 frustrating situations (0 = you do not display this response in this situation, 1 = you display this response to a limited extent in this situation, 2 = you display this response to a strong extent in this situation).

Usage

data(hostility)

Format

The data consist of a list of 6 objects:

data: A 316 X 14 X 4 array of dichotomized judgements (0 versus 1 or 2). The observation in cell (i,j,k) equals 1 if person i would display behavior k in situation j to a limited or strong extent and 0 if person i would not display behavior k in situation j.
freq1: A 14 X 4 matrix of frequencies. The frequency in cell (j,k) indicates the number of respondents who indicate that they would display behavior k in situation j.
freqtot: A 14 X 4 matrix of frequencies. The frequency in cell (j,k) indicates the total number of respondents who judged the situation-response pair (j,k).
situation: A vector with descriptions of the situations.
rowlabels: A vector of labels for the situations.
columnlabels: A vector of labels for the anger-related behaviors.
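The relation between a three-way binary array and the two frequency matrices can be sketched as below. The array `x` is simulated, and the handling of missing judgments is an assumption made for illustration; the sketch does not reproduce how the hostility components were actually constructed.

```r
# Simulated stand-in for an I x J x K array of dichotomized judgments
set.seed(1)
I <- 20; J <- 14; K <- 4
x <- array(rbinom(I * J * K, size = 1, prob = 0.4), c(I, J, K))
x[sample(length(x), 15)] <- NA   # assume a few judgments are missing

# Number of respondents reporting the behavior, per situation-behavior pair (j,k)
freq1 <- apply(x, c(2, 3), sum, na.rm = TRUE)

# Number of respondents who judged pair (j,k) at all
freqtot <- apply(!is.na(x), c(2, 3), sum)
```

Matrices of this shape are what plfm expects in its `freq1` and `freqtot` arguments when `datatype="freq"`.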

Source

Vansteelandt, K. (1999). A formal model for the competency-demand hypothesis. European Journal of Personality, 13, 429-442.

References

Vansteelandt, K. and Van Mechelen, I. (1998). Individual differences in situation-behavior profiles: A triple typology model. Journal of Personality and Social Psychology, 75, 751-765.

Latent class probabilistic feature analysis of three-way three-mode binary data

Description

Computation of parameter estimates, standard errors, criteria for model selection, and measures of descriptive fit for disjunctive, conjunctive and additive latent class probabilistic feature models.

Usage

  
LCplfm(data,F=2,T=2,M=5,maprule="disj",emcrit1=1e-3,emcrit2=1e-8,
       model=1,start.objectparameters=NULL,start.attributeparameters=NULL,
       start.sizeparameters=NULL,delta=0.0001,printrun=FALSE,
       update.objectparameters=NULL,update.attributeparameters=NULL,
       Nbootstrap=2000)

Arguments

`data`	A I X J X K data array of binary observations. Observation (i,j,k) (i=1,..,I; j=1,..,J; k=1,..,K) indicates whether object j is associated to attribute k according to rater i.
`F`	The number of latent features included in the model.
`T`	The number of latent classes included in the model.
`M`	The number of times a particular model is estimated using random starting points.
`maprule`	Disjunctive (`maprule`="disj"), conjunctive (`maprule`="conj") or additive (`maprule`="add") mapping rule of the probabilistic latent feature model.
`emcrit1`	Convergence criterion to be used for the estimation of candidate models.
`emcrit2`	Convergence criterion to be used for the estimation of the best model.
`model`	The type of dependency and heterogeneity assumption included in the model. `model`=1, `model`=2, `model`=3 represent models with a constant object-feature classification per person and with, respectively, class-specific object parameters, class-specific attribute parameters, and class-specific object- and attribute parameters. `model`=4, `model`=5, `model`=6 represent models with a constant attribute-feature classification per person and with, respectively, class-specific object parameters, class-specific attribute parameters, and class-specific object- and attribute parameters.
`start.objectparameters`	An array of object parameters to be used as starting value for each run. The size of the array equals J x F x T x M when `model = 1,4,3,6` and J x F x M when `model = 2,5`. If `start.objectparameters=NULL` randomly generated object parameters are used as starting values.
`start.attributeparameters`	An array of attribute parameters to be used as starting value for each run. The size of the array equals K x F x T x M when `model = 2,5,3,6` and K x F x M when `model = 1,4`. If `start.attributeparameters=NULL` randomly generated attribute parameters are used as starting values.
`start.sizeparameters`	A T x M matrix of latent class size parameters to be used as starting value for each run. If `start.sizeparameters=NULL` randomly generated class size parameters are used as starting values.
`delta`	The precision used to compute standard errors of the parameters with the method of finite differences.
`printrun`	`printrun`=TRUE prints the analysis type (disjunctive, conjunctive, additive), the number of features (F), the number of latent classes (T) and the number of the run to the output screen, whereas `printrun`=FALSE suppresses the printing.
`update.objectparameters`	A binary valued array that indicates for each object parameter whether it has to be estimated from the data or constrained to the starting value. A value of 1 means that the corresponding object parameter is estimated and a value of 0 means that the corresponding object parameter is constrained to the starting value provided by the user. The size of the array equals J x F x T when `model = 1,4,3,6` and J x F when `model = 2,5`. If `update.objectparameters=NULL` all object parameters are estimated from the data.
`update.attributeparameters`	A binary valued array that indicates for each attribute parameter whether it has to be estimated from the data or constrained to the starting value. A value of 1 means that the corresponding attribute parameter is estimated and a value of 0 means that the corresponding attribute parameter is constrained to the starting value provided by the user. The size of the array equals K x F x T when `model = 2,5,3,6` and K x F when `model = 1,4`. If `update.attributeparameters=NULL` all attribute parameters are estimated from the data.
`Nbootstrap`	Number of bootstrap iterations to be used for simulating the reference distribution of odds-ratio dependency measures.

Details

Estimation

The estimation algorithm consists of two steps. In a first, exploratory step an EM algorithm is used to conduct M runs from random starting points. Each exploratory run is terminated when the convergence criterion (i.e., the sum of absolute differences between parameter values in subsequent iterations) is smaller than emcrit1. In a second step, the best solution among the M runs (i.e., the one with the highest posterior density) is used as the starting point of the EM algorithm for a final analysis. The final analysis is terminated when the convergence criterion is smaller than emcrit2.

Model selection criteria, goodness-of-fit and statistical dependency measures

To choose among models with different numbers of features, or with different mapping rules, one may use information criteria such as the Akaike Information Criterion (AIC, Akaike, 1973, 1974), or the Schwarz Bayesian Information Criterion (BIC, Schwarz, 1978). AIC and BIC are computed as -2*loglikelihood+k*Npar. For AIC k equals 2 and for BIC k equals log(N), with N the observed number of replications (I) for which object-attribute associations are collected. Npar represents the number of model parameters. Models with the lowest value for AIC or BIC should be selected.
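As a minimal sketch of the formula above, the two criteria can be computed from a log-likelihood, a parameter count and the number of replications. The helper name `info_criteria` and the numeric inputs are made up for illustration; they are not output of LCplfm.

```r
# Hypothetical helper: AIC and BIC as -2*loglikelihood + k*Npar
info_criteria <- function(loglik, Npar, N) {
  c(AIC = -2 * loglik + 2 * Npar,        # k = 2
    BIC = -2 * loglik + log(N) * Npar)   # k = log(N)
}

# Made-up values: log-likelihood, number of parameters, number of replications
ic <- info_criteria(loglik = -1250.4, Npar = 36, N = 500)
ic
```

Because log(N) exceeds 2 once N is 8 or larger, BIC penalizes additional parameters more heavily than AIC in typical applications.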

The descriptive goodness-of-fit of the model is assessed with the correlation between observed and expected frequencies in the J X K table, and the proportion of the variance in the observed frequencies accounted for by the model (VAF) (i.e. the squared correlation between observed and expected frequencies).
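The two descriptive fit measures can be sketched as follows. The observed frequencies and model-implied probabilities are simulated here; in practice the expected frequencies would come from a fitted model, and the object names are assumptions.

```r
# Simulated J x K table: assumed model-implied probabilities and observed counts
set.seed(1)
J <- 5; K <- 3; N <- 100
prob     <- matrix(runif(J * K), nrow = J)
observed <- matrix(rbinom(J * K, size = N, prob = prob), nrow = J)
expected <- N * prob

# Correlation between observed and expected frequencies, and VAF
r   <- cor(c(observed), c(expected))
VAF <- r^2
```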

To assess the extent to which the model can capture observed statistical dependencies between object-attribute pairs with a common object or attribute, a parametric bootstrap procedure is used to evaluate whether observed dependencies are within the simulated 95 or 99 percent confidence interval. Let D(i,j,k) be equal to 1 if rater i indicates that object j is associated to attribute k, and 0 otherwise. The statistical dependency between pairs (j,k) and (j*,k*) is measured with the odds ratio (OR) statistic:

$OR(j,k,j^{*},k^{*})=\mbox{log}\left\lbrack\frac{N_{11}*N_{00}}{N_{10}*N_{01}}\right\rbrack$

with

$N_{11}=\sum_i D(i,j,k) D(i,j^{*},k^{*}) +0.5$

$N_{00}=\sum_i (1-D(i,j,k)) (1-D(i,j^{*},k^{*})) +0.5$

$N_{10}=\sum_i D(i,j,k) (1-D(i,j^{*},k^{*})) +0.5$

$N_{01}=\sum_i (1-D(i,j,k)) D(i,j^{*},k^{*}) +0.5$
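The OR statistic defined above can be sketched directly from a binary I X J X K array. The function name `log_or` and the simulated array `D` are assumptions for illustration; the sketch covers only the observed statistic, not the parametric bootstrap used to obtain its reference distribution.

```r
# Hypothetical helper: log odds ratio between pairs (j,k) and (jstar,kstar),
# with 0.5 added to each cell count as in the formulas above
log_or <- function(D, j, k, jstar, kstar) {
  a <- D[, j, k]
  b <- D[, jstar, kstar]
  N11 <- sum(a * b) + 0.5
  N00 <- sum((1 - a) * (1 - b)) + 0.5
  N10 <- sum(a * (1 - b)) + 0.5
  N01 <- sum((1 - a) * b) + 0.5
  log((N11 * N00) / (N10 * N01))
}

# Simulated binary data: 50 raters, 3 objects, 4 attributes
set.seed(1)
D <- array(rbinom(50 * 3 * 4, size = 1, prob = 0.5), c(50, 3, 4))

# Dependency between two attributes (k = 1 and k = 2) of the same object (j = 1)
log_or(D, 1, 1, 1, 2)
```

Adding 0.5 to each count keeps the statistic finite when one of the four cells is empty.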

The model selection criteria AIC and BIC, the descriptive goodness-of-fit measures (correlation between observed and expected frequencies, and VAF) and a summary of the OR dependency measures (i.e., the proportion of observed OR dependencies of a certain type that are in the simulated 95 or 99 percent confidence interval) are stored in the object fitmeasures of the output list.

Value

`call`	Parameters used to call the function.
`logpost.runs`	A list with the logarithm of the posterior density for each of the M computed models.
`best`	An index which indicates the model with the highest posterior density among each of the M computed models.
`objpar`	Estimated object parameters for the best model.
`attpar`	Estimated attribute parameters for the best model.
`sizepar`	A vector of `T` class size parameters for the best model.
`SE.objpar`	Estimated standard errors for the object parameters of the best model.
`SE.attpar`	Estimated standard errors for the attribute parameters of the best model.
`SE.sizepar`	Estimated standard errors for the class size parameters of the best model.
`gradient.objpar`	Gradient of the object parameters for the best model.
`gradient.attpar`	Gradient of the attribute parameters for the best model.
`gradient.sizepar`	Gradient of the class size parameters for the best model.
`fitmeasures`	A list of model selection criteria, goodness-of-fit measures and OR dependency measures for the model with the highest posterior density.
`postprob`	A I X T matrix of posterior probabilities for the best model.
`margprob.JK`	A J X K matrix of marginal object-attribute association probabilities.
`condprob.JKT`	A J X K X T array of conditional object-attribute association probabilities (i.e., probability of object-attribute association given latent class membership).
`report.OR.attpair`	A matrix that contains for all attribute pairs per object the observed OR dependency (OR.obs), the expected OR dependency (OR.mean) and the upper and lower bounds of the corresponding simulated 95 and 99 percent confidence interval (OR.p025, OR.p975, OR.p005, OR.p995).
`report.OR.objpair`	A matrix that contains for all object pairs per attribute the observed OR dependency (OR.obs), the expected OR dependency (OR.mean) and the upper and lower bounds of the corresponding simulated 95 and 99 percent confidence interval (OR.p025, OR.p975, OR.p005, OR.p995).

Author(s)

Michel Meulders and Philippe De Bruecker

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csaki (Eds.), Second international symposium on information theory (p. 271-283). Budapest: Academiai Kiado.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.

Louis, T. A. (1982). Finding observed information using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 98-130.

Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187-212.

Maris, E., De Boeck, P., and Van Mechelen, I. (1996). Probability matrix decomposition models. Psychometrika, 61, 7-29.

Meulders, M., De Boeck, P., Van Mechelen, I., and Gelman, A. (2005). Probabilistic feature analysis of facial perception of emotions. Applied Statistics, 54, 781-793.

Meulders, M. and De Bruecker, P. (2018). Latent class probabilistic latent feature analysis of three-way three-mode binary data. Journal of Statistical Software, 87(1), 1-45.

Meulders, M. (2013). An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies. Journal of Statistical Software, 54(14), 1-29. URL http://www.jstatsoft.org/v54/i14/.

Meulders, M., Tuerlinckx, F., and Vanpaemel, W. (2013). Constrained multilevel latent class models for the analysis of three-way three-mode binary data. Journal of Classification, 30 (3), 306-337.

Tanner, M. A. (1996). Tools for statistical inference: Methods for the exploration of posterior distributions and likelihood functions (Third ed.). New York: Springer-Verlag.

Tatsuoka, K. (1984). Analysis of errors in fraction addition and subtraction problems. Final Report for NIE-G-81-0002, University of Illinois, Urbana-Champaign.

Schwarz, G. (1978). Estimating the dimensions of a model. Annals of Statistics, 6, 461-464.

Examples


## Not run: 

# example 1: analysis on determinants of anger-related behavior

# load anger data
data(anger)

# estimate a disjunctive LCplfm model with F=2 and T=2 
# assume constant situation-feature classification
# and class-specific situation parameters (i.e. model 1)
# use 10 exploratory runs with random starting points 
anger.LCplfm.disj<-LCplfm(data=anger$data,F=2, T=2, M=10)

# print the output of the model 
print (anger.LCplfm.disj)


# estimate an additive LCplfm model with F=2 and T=2 
# assume constant situation-feature classification
# and class-specific situation parameters (i.e. model 1)
# use 10 exploratory runs with random starting points 
anger.LCplfm.add<-LCplfm(data=anger$data,F=2, T=2, M=10, maprule="add")

# print the output of the model 
print (anger.LCplfm.add)


# estimate a disjunctive LCplfm model with F=4 and T=2
# assume constant situation-feature classifications
# and class-specific situation parameters (i.e. model 1)
# use 20 exploratory runs with random starting points (M=20)
# constrain parameters of subsequent behavior pairs to "load"
# on only one feature

# specify which attribute parameters have to be estimated from the data
update.attribute<-matrix(rep(0,8*4),ncol=4)
update.attribute[1:2,1]<-c(1,1)
update.attribute[3:4,2]<-c(1,1)
update.attribute[5:6,3]<-c(1,1)
update.attribute[7:8,4]<-c(1,1)

# specify starting values for attribute parameters in each of M=20 runs
# for parameters with update.attribute==0 starting values are constrained to 1e-6
# for parameters with update.attribute==1 starting values are sampled from a unif(0,1)
start.attribute<-array(runif(8*4*20),c(8,4,20))
start.attribute[update.attribute%o%rep(1,20)==0]<-1e-6 

# estimate the constrained model
anger.LCplfm.constr<-LCplfm(data=anger$data,F=4, T=2, M=20, 
                     update.attributeparameters=update.attribute,
                     start.attributeparameters=start.attribute)

# estimate a disjunctive LCplfm model with F=4 and T=2
# assume constant situation-feature classifications
# class-specific situation and behavior parameters (i.e. model 3)
# use 20 exploratory runs with random starting points (M=20)
# constrain parameters of subsequent behavior pairs to "load"
# on only one feature

# specify which attribute parameters have to be estimated from the data
 
update.attribute<-matrix(rep(0,8*4),ncol=4)
update.attribute[1:2,1]<-c(1,1)
update.attribute[3:4,2]<-c(1,1)
update.attribute[5:6,3]<-c(1,1)
update.attribute[7:8,4]<-c(1,1)
update.attribute<-update.attribute%o%rep(1,2)

# specify starting values for attribute parameters in each of M=20 runs
# for parameters with update.attribute==0 starting values are constrained to 1e-6
# for parameters with update.attribute==1 starting values are sampled from a unif(0,1)
start.attribute<-array(runif(8*4*2*20),c(8,4,2,20))
start.attribute[update.attribute%o%rep(1,20)==0]<-1e-6 

# estimate the constrained model
anger.LCplfm.m3.constr<-LCplfm(data=anger$data,F=4, T=2, M=20, model=3, 
                     update.attributeparameters=update.attribute,
                     start.attributeparameters=start.attribute)


## End(Not run)

## Not run: 
# example 2: analysis of car perception data

# load car data
data(car)

# estimate a disjunctive LCplfm with F=3 and T=2
# assume constant attribute-feature classification
# and class-specific car parameters (i.e. model 4)
# use 10 exploratory runs with random starting points 
car.LCplfm.disj<-LCplfm(data=car$data3w,F=3, T=2, M=10,model=4)

# print the output of the model 
print(car.LCplfm.disj)

# estimate an additive LCplfm with F=3 and T=2
# assume constant attribute-feature classification
# and class-specific car parameters (i.e. model 4)
# use 10 exploratory runs with random starting points 
car.LCplfm.add<-LCplfm(data=car$data3w,F=3, T=2, M=10, model=4, maprule="add")

# print the output of the model 
print(car.LCplfm.add)


## End(Not run)

## Not run: 

# example 3: estimation of multiple classification latent class 
# model (Maris, 1999) for cognitive diagnosis


# load subtraction data
library(CDM)
data(fraction.subtraction.data)
data(fraction.subtraction.qmatrix)


# create three-way data as input for LCplfm
I<-536
J<-1
K<-20
data3w<-array(c(as.matrix(fraction.subtraction.data)),c(I,J,K))

# add item labels

itemlabel<-c("5/3 - 3/4", 
"3/4 - 3/8", 
"5/6 - 1/9",
"3 1/2 - 2 3/2", 
"4 3/5 - 3 4/10", 
"6/7 - 4/7", 
"3 - 2 1/5", 
"2/3 - 2/3", 
"3 7/8 - 2", 
"4 4/12 - 2 7/12", 
"4 1/3 - 2 4/3", 
"1 1/8 - 1/8", 
"3 3/8 - 2 5/6", 
"3 4/5 - 3 2/5", 
"2 - 1/3", 
"4 5/7 - 1 4/7", 
"7 3/5 - 4/5", 
"4 1/10 - 2 8/10", 
"4 - 1 4/3", 
"4 1/3 - 1 5/3") 

dimnames(data3w)[[3]]<-itemlabel

# estimate multiple classification latent class model (Maris, 1999)

set.seed(537982)
subtract.m1.lst<-stepLCplfm(data3w,minF=3,maxF=5,minT=1,maxT=3,model=1,M=20,maprule="conj")


# print BIC values
sumar<-summary(subtract.m1.lst)
as.matrix(sort(sumar[,5]))

# print output best model
subtract.m1.lst[[5,2]]

# correlation between extracted skills and qmatrix
round(cor(fraction.subtraction.qmatrix,subtract.m1.lst[[5,2]]$attpar),2)

## End(Not run)

Probabilistic latent feature analysis of two-way two-mode frequency data

Description

Computation of parameter estimates, standard errors, criteria for model selection, and goodness-of-fit criteria for disjunctive, conjunctive or additive probabilistic latent feature models with F features.

Usage

plfm(data,object,attribute,rating,freq1,freqtot,F,
     datatype="freq",maprule="disj",M=5,emcrit1=1e-2,
     emcrit2=1e-10,printrun=TRUE)

Arguments

`data`	A data frame that consists of three components: the variables `object`, `attribute` and `rating`. Each row of the data frame describes the outcome of a binary rater judgement about the association between a certain object and a certain attribute.
`object`	The name of the `object` component in the data frame `data`. The values of the vector `data$object` should be (non-missing) numeric or character values.
`attribute`	The name of the `attribute` component in the data frame `data`. The values of the vector `data$attribute` should be (non-missing) numeric or character values.
`rating`	The name of the `rating` component in the data frame `data`. The elements of the vector `data$rating` should be the numeric values 0 (no association) or 1 (association), or should be specified as missing (NA).
`freq1`	A J X K matrix of observed association frequencies.
`freqtot`	A J X K matrix with the total number of binary ratings in each cell (j,k). If the total number of ratings is the same for all cells of the matrix, it is sufficient to enter a single numeric value rather than a matrix. For instance, if N raters have judged J X K associations, one may specify `freqtot`=N.
`F`	The number of latent features included in the model.
`datatype`	The type of data used as input. When `datatype`="freq" one should specify frequency data `freq1` and `freqtot`, and when `datatype`="dataframe" one should specify the name of the data frame `data`, and its components, `object`, `attribute` and `rating`.
`maprule`	Disjunctive (`maprule`="disj"), conjunctive (`maprule`="conj") or additive (`maprule`="add") mapping rule of the probabilistic latent feature model.
`M`	The number of times a particular model is estimated using random starting points.
`emcrit1`	Convergence criterion which indicates when the estimation algorithm should switch from Expectation-Maximization (EM) steps to EM + Newton-Raphson steps.
`emcrit2`	Convergence criterion which indicates final convergence to a local maximum.
`printrun`	`printrun`=TRUE prints the analysis type (disjunctive, conjunctive or additive), the number of features (F) and the number of the run to the output screen, whereas `printrun`=FALSE suppresses the printing.

Details

Estimation

The function plfm uses an accelerated EM-algorithm to locate the posterior mode(s) of the probabilistic latent feature model. The algorithm starts with a series of Expectation-Maximization (EM) steps until the difference between subsequent values of the logarithm of the posterior density becomes smaller than the convergence criterion emcrit1, and then switches to an accelerated algorithm which consists of EM + Newton-Raphson steps. The accelerated algorithm stops when the difference between subsequent values of the logarithm of the posterior density becomes smaller than the convergence criterion emcrit2. Computational details about the implementation of the EM-steps for PLFMs are described in Maris, De Boeck, and Van Mechelen (1996). The general scheme of the accelerated algorithm is described in Louis (1982) and Tanner (1996). Computational details about implementing the accelerated algorithm for PLFMs are described in Meulders (2013).
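
The two-stage stopping rule can be illustrated on a much simpler problem. The sketch below is not the plfm implementation: it estimates only the mixing weight of a two-component normal mixture with known components and a flat prior (so the log posterior equals the log-likelihood up to a constant), and it replaces the combined EM + Newton-Raphson steps of the second stage by plain Newton-Raphson steps. The values chosen for `emcrit1` and `emcrit2` are arbitrary.

```r
set.seed(1)
# Toy data: mixture of N(0,1) and N(3,1); only the mixing weight p is unknown.
x  <- c(rnorm(60, mean = 0), rnorm(40, mean = 3))
f1 <- dnorm(x, mean = 0)
f2 <- dnorm(x, mean = 3)
logpost <- function(p) sum(log(p * f1 + (1 - p) * f2))  # flat prior assumed

emcrit1 <- 1e-3  # switch from EM steps to Newton-Raphson steps (arbitrary value)
emcrit2 <- 1e-8  # final convergence (arbitrary value)

p <- 0.5
old <- logpost(p)
phase <- "EM"
n_em <- 0
n_nr <- 0
repeat {
  mix <- p * f1 + (1 - p) * f2
  if (phase == "EM") {
    # EM step: the new weight is the mean posterior membership probability.
    p_new <- mean(p * f1 / mix)
    n_em <- n_em + 1
  } else {
    # Newton-Raphson step on the log posterior.
    grad <- sum((f1 - f2) / mix)
    hess <- -sum(((f1 - f2) / mix)^2)
    p_new <- p - grad / hess
    n_nr <- n_nr + 1
  }
  new <- logpost(p_new)
  change <- abs(new - old)
  p <- p_new
  old <- new
  if (phase == "EM" && change < emcrit1) {
    phase <- "NR"          # first criterion met: switch algorithm
  } else if (phase == "NR" && change < emcrit2) {
    break                  # second criterion met: stop
  }
}
final_grad <- sum((f1 - f2) / (p * f1 + (1 - p) * f2))
cat("p =", p, " EM steps:", n_em, " NR steps:", n_nr, "\n")
```

The design mirrors the description above: the slower but stable EM steps bring the estimate close to a mode, after which the faster second-order steps finish the job under a stricter criterion.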

When using the function plfm to estimate a particular PLFM (i.e. with a certain number of latent features and a specific mapping rule), one may locate the distinct posterior mode(s) by running the algorithm M times using random starting points. The estimated object and attribute parameters of each run are stored in the objects objpar.runs and attpar.runs of the output list. Next, a number of additional statistics (estimated object and attribute parameters, asymptotic standard errors of object and attribute parameters, model selection criteria and goodness-of-fit measures) are computed for the best model (i.e. the model among the M runs with the highest posterior density).
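
How the best run relates to the per-run output can be sketched with base R. The example below does not call the package; it uses made-up log posterior values and a random M X J X F array standing in for objpar.runs, and only shows the indexing logic.

```r
set.seed(42)
M <- 4; J <- 3; nfeat <- 2   # toy dimensions; nfeat plays the role of F

# Made-up log posterior values for M runs (runs 2 and 3 reach the same mode).
logpost.runs <- list(-512.4, -498.7, -498.7, -530.1)

# Stand-in for the M X J X F array of object parameters.
objpar.runs <- array(runif(M * J * nfeat), dim = c(M, J, nfeat))

# Index of the run with the highest log posterior (first one in case of ties).
best <- which.max(unlist(logpost.runs))

# J X F parameter matrix of that run.
objpar.best <- objpar.runs[best, , ]

print(best)
print(dim(objpar.best))
```

Comparing the values in logpost.runs across runs also gives a quick impression of how many distinct posterior modes the random starts have reached.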

Model selection criteria and goodness-of-fit measures

To choose among models with different numbers of features, or with different mapping rules, one may use information criteria such as the Akaike Information Criterion (AIC, Akaike, 1973, 1974), or the Schwarz Bayesian Information Criterion (BIC, Schwarz, 1978). AIC and BIC are computed as -2*loglikelihood+k*Npar. For AIC k equals 2 and for BIC k equals log(N), with N the observed number of replications for which object-attribute associations are collected. Npar represents the number of model parameters; for probabilistic latent feature models this equals (J+K)F. Models with the lowest value for AIC or BIC should be selected.
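
The formula above can be written out directly. The helper below is a hypothetical illustration, not a function of the package; its name `plfm_ic` and the argument name `nfeat` (standing for F) are assumptions, and the input values are made up.

```r
# Hypothetical helper: AIC and BIC as -2*loglikelihood + k*Npar, with Npar = (J+K)F.
plfm_ic <- function(loglik, J, K, nfeat, N) {
  npar <- (J + K) * nfeat
  c(AIC = -2 * loglik + 2 * npar,        # k = 2
    BIC = -2 * loglik + log(N) * npar)   # k = log(N)
}

# Made-up example: 10 objects, 8 attributes, 2 features, 100 replications.
ic <- plfm_ic(loglik = -1000, J = 10, K = 8, nfeat = 2, N = 100)
print(ic)
```

Because log(N) exceeds 2 as soon as N is at least 8, BIC penalizes additional features more heavily than AIC for all but the smallest numbers of replications.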

To assess the statistical fit of the probabilistic feature model one may use a Pearson chi-square measure on the J X K frequency table to evaluate whether predicted frequencies deviate significantly from observed frequencies (see Meulders et al., 2001). In addition, one may assess the descriptive fit of the model using the correlation between observed and expected frequencies in the J X K table, and the proportion of the variance in the observed frequencies accounted for by the model (VAF) (i.e. the squared correlation between observed and expected frequencies).
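
The descriptive fit measures can be reproduced with base R. The sketch below uses made-up numbers and assumes that expected frequencies are obtained by multiplying the expected association probabilities by the number of ratings per cell; it does not compute the Pearson chi-square test.

```r
# Made-up J X K table of observed association frequencies (J = 2, K = 3).
obs <- matrix(c(12, 3, 7, 15, 9, 1), nrow = 2)

# Made-up expected association probabilities and a common number of ratings.
prob1   <- matrix(c(0.55, 0.20, 0.40, 0.70, 0.40, 0.10), nrow = 2)
freqtot <- 20

# Assumption: expected frequency = number of ratings * expected probability.
expd <- freqtot * prob1

# Correlation between observed and expected frequencies, and VAF (its square).
r   <- cor(as.vector(obs), as.vector(expd))
vaf <- r^2
cat("correlation:", round(r, 3), " VAF:", round(vaf, 3), "\n")
```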

The model selection criteria AIC and BIC, the results of the Pearson goodness-of-fit test, and the descriptive fit measures (correlation between observed and expected frequencies, and VAF) are stored in the object fitmeasures of the output list.

Value

`call`	Parameters used to call the function.
`objpar`	A J X F matrix of object parameters.
`attpar`	A K X F matrix of attribute parameters.
`fitmeasures`	A list of model selection criteria and goodness-of-fit criteria for the model with the highest posterior density.
`logpost.runs`	A list with the logarithm of the posterior density for each of the M computed models.
`objpar.runs`	An M X J X F array which contains the object parameters for each of the M computed models.
`attpar.runs`	An M X K X F array which contains the attribute parameters for each of the M computed models.
`bestsolution`	An index which indicates the model with the highest posterior density among the M computed models.
`gradient.objpar`	A J X F gradient matrix for the object parameters in the best solution.
`gradient.attpar`	A K X F gradient matrix for the attribute parameters in the best solution.
`SE.objpar`	A J X F matrix of asymptotic standard errors for the object parameters in the best solution.
`SE.attpar`	A K X F matrix of asymptotic standard errors for the attribute parameters in the best solution.
`prob1`	A J X K matrix of expected association probabilities for the best solution.