An Introductory Tutorial on Multilevel Models and Their Interaction Effects

September 2, 2023

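Contents

1. Tutorial goals
2. Libraries used in this tutorial (please install them yourself)
3. About the data
4. Data preparation
  4.1. Reverse-coding perceived stress
  4.2. Organizing variables into between-person and within-person variables
5. Mixed-effects models
  5.1. The null model
  5.2. Building the mixed-effects model
  5.3. Visualization
  5.4. Adding neuroticism as a predictor
    5.4.1. Interpreting the results
      5.4.1.1. Fixed effects
      5.4.1.2. Random effects
  5.5. Plotting the moderation effect
  5.6. The pick-a-point approach
6. Some useful functions
7. Data download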

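This tutorial shows how to use multilevel models to analyze nested data, and how to analyze moderation effects in a cross-level structure.
Our example is diary data (repeated occasions nested within persons), meaning that every person is measured multiple times, but the same approach applies to other kinds of nested data.

Tutorial goals

Understand the structure of nested data and how to build a dataset in that form
Build a multilevel model (some call it multilevel regression or a hierarchical model); all of these are mixed-effects models
Use a multilevel model to analyze both between-person and within-person relationships among variables
Visualize interaction effects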

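Libraries used in this tutorial (please install them yourself)

library(ggplot2)       # data visualization
library(lme4)          # fitting mixed-effects (multilevel) models
library(lmerTest)      # p-values for lmer models
library(psych)         # descriptive statistics
library(plyr)          # data wrangling
library(effects)       # computing model-based effects for moderation
library(interactions)  # visualizing and probing moderation effects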

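About the data

Our data come from a repeated-measures survey study. We could not find detailed documentation for the dataset, but we do know what the variables we use mean.
The data consist of two files, "daily-data.csv" and "person-data.csv", and the file names say what they hold: the first file contains the repeated measures,
that is, what each participant reported every day; the second file contains person-level characteristics, which were collected only once.

The variables we use:

negaff: daily negative affect, from the repeated-measures data; negative affect collected each day
stress: the reverse-coded version of the variable pss; it represents the participant's daily stress and is a repeated measure
bfi_n: the neuroticism score from the Big Five Inventory; it is a stable personality trait, so it is not a repeated measure and is a person-level variable

Next we load the two files and keep the variables we need: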

pdata = read.csv("person-data.csv")
ddata = read.csv("daily-data.csv")
pdata <- pdata[ ,c("id","bfi_n")]
ddata <- ddata[ ,c("id","day","negaff","pss")]
head(pdata)

Output (html):

A data.frame: 6 × 2
	id	bfi_n
	<int>	<dbl>
1	101	2.0
2	102	2.0
3	103	2.5
4	104	2.5
5	105	3.5
6	106	1.5

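The daily data deserve a short explanation. The variable day records which day of the study a row belongs to, so each row is not a participant but one participant's data for one day, as you can see below.
Data in this layout are said to be in long format, a common format for storing repeated measures.

head(ddata)

Output (html):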

A data.frame: 6 × 4
	id	day	negaff	pss
	<int>	<int>	<dbl>	<dbl>
1	101	0	3.0	2.50
2	101	1	2.3	2.75
3	101	2	1.0	3.50
4	101	3	1.3	3.00
5	101	4	1.1	2.75
6	101	5	1.0	2.75
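
If your own repeated measures arrive in wide format (one row per person, one column per day), they have to be converted to this long layout first. The following is a minimal sketch using base R's reshape(); the data frame, the column names negaff_day0 to negaff_day2, and the values are made up for illustration and are not part of the tutorial data.

```r
# Hypothetical wide-format data: one row per person, one column per day
wide <- data.frame(id = c(101, 102),
                   negaff_day0 = c(3.0, 2.1),
                   negaff_day1 = c(2.3, 1.8),
                   negaff_day2 = c(1.0, 2.5))

# Convert to long format: one row per person per day
long <- reshape(wide,
                direction = "long",
                idvar = "id",
                varying = c("negaff_day0", "negaff_day1", "negaff_day2"),
                v.names = "negaff",
                timevar = "day",
                times = 0:2)

# Sort by person and day, and drop the automatic row names
long <- long[order(long$id, long$day), ]
rownames(long) <- NULL
long
```

The result has the same shape as ddata: an id column, a day column, and one row for each person-day combination.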

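Data preparation

Reverse-coding perceived stress

pss is perceived stress, but in the data a higher score means less stress, so we reverse-code it
so that a higher score means more stress.

ddata$stress <- 4 - ddata$pss

psych::describe(ddata$stress)

Output (html):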

A psych: 1 × 13
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
X1	1	1445	1.385525	0.6843377	1.25	1.36344	0.7413	0	4	4	0.3549276	0.1266323	0.01800266
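
The expression 4 - pss is a special case of the general reverse-coding rule (scale minimum + scale maximum) - score. Below is a small sketch of that rule; the 0 to 4 scale range is an assumption inferred from the 4 - pss formula and the min/max in the output above, and the helper variables scale_min and scale_max are introduced only for illustration.

```r
# A few pss values from the first rows of ddata, plus one missing value
pss <- c(2.50, 2.75, 3.50, 3.00, NA)

# Assumed scale range (inferred, not documented in the source data)
scale_min <- 0
scale_max <- 4

# General reverse-coding rule: (min + max) - score; NA stays NA
stress <- (scale_min + scale_max) - pss
stress
```

Missing values pass through unchanged, which is why the later plots and models report removed rows.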

It is a good idea to check the distribution of every variable. We can use a histogram:

ggplot(data=ddata, aes(x=stress)) +
  geom_histogram(fill="white", color="black",bins=19) +
  labs(x = "Stress")

Output (stream):
Warning message: "Removed 13 rows containing non-finite values (`stat_bin()`)."

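Organizing variables into between-person and within-person variables

We now separate between-person from within-person variables. A between-person variable does not change over time: it takes the same value on all of one person's rows and differs across persons, so it can be thought of as a trait.
A within-person variable changes over time: it captures how one participant varies from occasion to occasion, so it can be thought of as a state.

We start with stress. It is clearly a state variable, but each person's state fluctuates around that person's own mean,
so we can compute the person mean of the state and use it to represent the trait. The new variable stress_trait is therefore trait stress, which does not vary over time. We can compute it like this:

# compute person means
# negaff_trait is not a variable in the models, but we may need it later, so we compute it here as well
personmeans <- ddply(ddata, "id", summarize,
                       stress_trait = mean(stress, na.rm=TRUE),
                       negaff_trait = mean(negaff, na.rm=TRUE))
head(personmeans)
describe(personmeans)

Output (html):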

A data.frame: 6 × 3
	id	stress_trait	negaff_trait
	<int>	<dbl>	<dbl>
1	101	1.06250	1.500000
2	102	0.78125	2.218750
3	103	1.25000	2.416667
4	104	1.81250	1.550000
5	105	1.75000	2.612500
6	106	1.12500	2.071875

Output (html):

A psych: 3 × 13
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
	<int>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
id	1	190	318.294737	130.4413245	321.50000	318.993421	151.2252000	101.0000	532.0000	431.000	-0.04393582	-1.0945398	9.46320829
stress_trait	2	190	1.395533	0.4788391	1.40625	1.394792	0.5096437	0.1875	2.5625	2.375	-0.04026841	-0.2337593	0.03473864
negaff_trait	3	190	2.478309	0.7335710	2.41250	2.429841	0.7227675	1.1125	5.0875	3.975	0.67593800	0.4505905	0.05321883

# merge the new person-level trait variables into pdata, which has one row per participant
pdata <- merge(pdata, personmeans, by="id")
# we analyze moderation later, so we grand-mean center the person-level predictors here
pdata$bfi_n_c <- scale(pdata$bfi_n,center=TRUE,scale=FALSE)
pdata$stress_trait_c <- scale(pdata$stress_trait,center=TRUE,scale=FALSE)
# descriptive statistics for the person-level data
describe(pdata)

Output (html):

A psych: 6 × 13
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
	<int>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
id	1	190	3.182947e+02	130.4413245	321.50000000	3.189934e+02	151.2252000	101.000000	532.000000	431.000	-0.04393582	-1.0945398	9.46320829
bfi_n	2	190	2.981579e+00	0.9558661	3.00000000	2.996711e+00	1.4826000	1.000000	5.000000	4.000	-0.09238813	-0.8173050	0.06934582
stress_trait	3	190	1.395533e+00	0.4788391	1.40625000	1.394792e+00	0.5096437	0.187500	2.562500	2.375	-0.04026841	-0.2337593	0.03473864
negaff_trait	4	190	2.478309e+00	0.7335710	2.41250000	2.429841e+00	0.7227675	1.112500	5.087500	3.975	0.67593800	0.4505905	0.05321883
bfi_n_c	5	190	1.683047e-16	0.9558661	0.01842105	1.513158e-02	1.4826000	-1.981579	2.018421	4.000	-0.09238813	-0.8173050	0.06934582
stress_trait_c	6	190	1.927149e-17	0.4788391	0.01071742	-7.409148e-04	0.5096437	-1.208033	1.166967	2.375	-0.04026841	-0.2337593	0.03473864
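
One detail worth knowing: scale() returns a one-column matrix rather than a plain vector, which is why these columns later show up with type dbl[,1] and why the effects package warns about one-column matrices. The sketch below compares scale() with plain subtraction of the mean; it uses the six bfi_n values printed earlier, and the variable names are chosen only for this illustration.

```r
# First six bfi_n values from head(pdata)
x <- c(2.0, 2.0, 2.5, 2.5, 3.5, 1.5)

# Grand-mean centering with scale(): the result is a one-column matrix
centered_matrix <- scale(x, center = TRUE, scale = FALSE)

# Grand-mean centering by hand: the result is a plain numeric vector
centered_vector <- x - mean(x)

is.matrix(centered_matrix)                                # TRUE
is.null(dim(centered_vector))                             # TRUE: no matrix dimensions
all.equal(as.numeric(centered_matrix), centered_vector)   # same values
```

Either form gives the same model estimates; wrapping the scale() call in as.numeric() is one way to avoid the matrix-column warning.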

Merging the person-level data into the repeated daily data

ddata_long <- merge(ddata,pdata,by="id")

# compute the state variables by subtracting each person's mean
ddata_long$stress_state <- ddata_long$stress - ddata_long$stress_trait
ddata_long$negaff_state <- ddata_long$negaff - ddata_long$negaff_trait

# inspect the data
head(ddata_long)

Output (html):

A data.frame: 6 × 12
	id	day	negaff	pss	stress	bfi_n	stress_trait	negaff_trait	bfi_n_c	stress_trait_c	stress_state	negaff_state
	<int>	<int>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl[,1]>	<dbl[,1]>	<dbl>	<dbl>
1	101	0	3.0	2.50	1.50	2	1.0625	1.5	-0.9815789	-0.3330326	0.4375	1.5
2	101	1	2.3	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	0.8
3	101	2	1.0	3.50	0.50	2	1.0625	1.5	-0.9815789	-0.3330326	-0.5625	-0.5
4	101	3	1.3	3.00	1.00	2	1.0625	1.5	-0.9815789	-0.3330326	-0.0625	-0.2
5	101	4	1.1	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	-0.4
6	101	5	1.0	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	-0.5
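
The trait/state split above is person-mean centering. As a compact illustration of the same idea, the sketch below does it in base R with ave() on a made-up data frame (the toy values are hypothetical) and checks the defining property: within each person, the state scores sum to zero.

```r
# Hypothetical daily data for two persons, with one missing value
toy <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  stress = c(1.0, 2.0, 3.0, 0.5, NA, 1.5))

# Trait: the person's own mean, repeated on each of that person's rows
toy$stress_trait <- ave(toy$stress, toy$id,
                        FUN = function(x) mean(x, na.rm = TRUE))

# State: the daily deviation from the person's own mean
toy$stress_state <- toy$stress - toy$stress_trait

# Within each person the state scores sum to zero (ignoring missing days)
tapply(toy$stress_state, toy$id, sum, na.rm = TRUE)
toy
```

Because the state variable has a mean of zero within every person, it carries only within-person information, while the trait variable carries only between-person information.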

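We take the data from the first 25 participants and plot each participant's daily data to get a rough sense of how much participants differ.
In the plot produced below, the slopes of the fitted regression lines differ considerably across participants, which suggests that the relationship between stress and negative affect varies from person to person.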

#faceted plot
ggplot(data=ddata_long[which(ddata_long$id <= 125),], aes(x=stress_state,y=negaff)) +
  geom_point() +
  stat_smooth(method="lm", fullrange=TRUE) +
  xlab("Stress State") + ylab("Negative Affect (Continuous)") + 
  facet_wrap( ~ id) +
  theme(axis.title=element_text(size=16),
        axis.text=element_text(size=14),
        strip.text=element_text(size=14))

Output (stream):
`geom_smooth()` using formula = 'y ~ x'
Warning message: "Removed 4 rows containing non-finite values (`stat_smooth()`)."
Warning message: "Removed 4 rows containing missing values (`geom_point()`)."

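Mixed-effects models

We use the "lme4" package to fit mixed-effects models, along with a few helper packages: "lmerTest" provides tools for obtaining p-values for the parameter tests;
"effects" provides tools for computing and plotting model-based predictions; "interactions" provides tools for plotting and probing interaction effects.

lme4 provides the function lmer, which fits multilevel (mixed-effects) models. Its first argument is formula, which specifies the model; the data argument takes your raw data frame,
and the na.action argument specifies how missing values are handled.

The null model

This model contains no predictors, or none of the predictors we care about, which is why it is called the null model.
The null model is also known as the unconditional means model. It is used to examine how much of the variance in the outcome is within persons and how much is between persons.

#unconditional means model
model0_fit <- lmer(formula = negaff ~ 1 + (1|id), 
              data=ddata_long,
              na.action=na.exclude)
summary(model0_fit)

Output (plain):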
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: negaff ~ 1 + (1 | id)
Data: ddata_long

REML criterion at convergence: 3833.5

Scaled residuals:
Min 1Q Median 3Q Max
-3.8739 -0.6123 -0.1608 0.4658 3.9394

Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.4270 0.6535
Residual 0.6627 0.8141
Number of obs: 1441, groups: id, 190

Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.46368 0.05229 185.80793 47.12 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

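# VarCorr extracts the variance components (printed as standard deviations):
# the id row is the between-person variance (random intercept),
# and Residual is the residual variance (the part not explained by differences between participants)
VarCorr(model0_fit)

Output (plain):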
Groups Name Std.Dev.
id (Intercept) 0.65347
Residual 0.81408

Next we compute the intraclass correlation (ICC), the proportion of the total variance that is between-person variance:

$ ICC_{between} = \frac{\sigma^{2}_{u0}}{\sigma^{2}_{u0} + \sigma^{2}_{e}} $
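
The variance symbols in the ICC formula come from the unconditional means model. As a sketch in conventional multilevel notation (the subscripts t for day and i for person, and the symbol \gamma_{00} for the grand mean, are notation assumed here rather than taken from the original text), the model can be written as:

```latex
\begin{aligned}
\text{negaff}_{ti} &= \gamma_{00} + u_{0i} + e_{ti}, \\
u_{0i} &\sim N(0, \sigma^{2}_{u0}), \qquad e_{ti} \sim N(0, \sigma^{2}_{e})
\end{aligned}
```

Here \sigma^{2}_{u0} is the variance of the person-specific intercepts (the id row of the VarCorr output) and \sigma^{2}_{e} is the residual variance within persons (the Residual row).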

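First we need to extract the random-effect variances. We store the result in a data frame to make the values easy to pull out; after all, the data frame is the R data structure we know best.

randEffs <- as.data.frame(VarCorr(model0_fit))
randEffs

Output (html):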

A data.frame: 2 × 5
grp	var1	var2	vcov	sdcor
<chr>	<chr>	<chr>	<dbl>	<dbl>
id	(Intercept)	NA	0.4270294	0.6534749
Residual	NA	NA	0.6627260	0.8140798

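Using the formula above, the ICC is:

u0v <- randEffs[1,4]   # between-person (random intercept) variance
ev <- randEffs[2,4]    # residual (within-person) variance

icc <- u0v / (u0v+ev)
icc

Output (html):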
0.391858025596418

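The ICC from the unconditional means model shows that 39.19% of the total variance in negative affect is due to differences between persons and 60.81% is due to within-person variation.
This means there is a large share of within-person variance for time-varying predictors to explain, and enough clustering within persons that a multilevel (mixed-effects) model is well justified.
A commonly cited recommendation is to consider a mixed-effects model once the ICC exceeds 0.05 (5%).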

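Building the mixed-effects model

We include several predictors in the model and explain them one by one:

1: the intercept
day: our outcome changes over time, so we include time (day)
stress_trait_c: trait stress, which uses the mean of a participant's daily stress to represent that participant's trait; stress_trait_c is the grand-mean-centered version. Why center it? Because we use it as a moderator
stress_state: this variable is also derived: it is the daily stress value minus the participant's mean, i.e. it is centered within persons, so it represents how far a participant's stress on a given day deviates from their own mean
stress_state:stress_trait_c: the moderation (interaction) term; the size of the effect of stress_state on negative affect may depend on stress_trait_c
(1 + stress_state|id): this is how random effects are specified in a mixed-effects model; it means that both the intercept and the effect of stress_state vary across id, where id is the participant identifier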
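
The terms listed above can also be written as a two-level model. This is a sketch in conventional notation; the \beta and \gamma symbols and the subscripts t (day) and i (person) are assumptions introduced here for illustration, chosen to stay consistent with the \sigma^{2}_{u0} and \sigma^{2}_{e} used in the ICC formula.

```latex
\begin{aligned}
\text{Level 1:}\quad \text{negaff}_{ti} &= \beta_{0i} + \beta_{1i}\,\text{stress\_state}_{ti} + \beta_{2}\,\text{day}_{ti} + e_{ti} \\
\text{Level 2:}\quad \beta_{0i} &= \gamma_{00} + \gamma_{01}\,\text{stress\_trait\_c}_{i} + u_{0i} \\
\beta_{1i} &= \gamma_{10} + \gamma_{11}\,\text{stress\_trait\_c}_{i} + u_{1i}
\end{aligned}
```

Substituting level 2 into level 1 shows where each formula term comes from: \gamma_{11} multiplies stress_state times stress_trait_c and is the cross-level interaction, while u_{0i} and u_{1i} are the random intercept and random slope requested by (1 + stress_state|id).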

model1 <- lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      stress_state + stress_state:stress_trait_c + 
                      (1 + stress_state|id), 
                    data=ddata_long,
                    na.action=na.exclude)
summary(model1)

Output (plain):
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
negaff ~ 1 + day + stress_trait_c + stress_state + stress_state:stress_trait_c +
(1 + stress_state | id)
Data: ddata_long

REML criterion at convergence: 3162.4

Scaled residuals:
Min 1Q Median 3Q Max
-3.5368 -0.6127 -0.0729 0.5093 4.4164

Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 0.2135 0.4621
stress_state 0.1257 0.3546 0.53
Residual 0.4038 0.6355
Number of obs: 1438, groups: id, 190

Fixed effects:
                              Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                  2.695e+00  4.583e-02  3.922e+02  58.806   <2e-16 ***
day                         -6.580e-02  7.552e-03  1.250e+03  -8.713   <2e-16 ***
stress_trait_c               1.038e+00  7.946e-02  1.859e+02  13.067   <2e-16 ***
stress_state                 7.647e-01  4.561e-02  1.664e+02  16.765   <2e-16 ***
stress_trait_c:stress_state  1.550e-01  9.780e-02  1.584e+02   1.585    0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
(Intr) day strs__ strss_
day -0.569
strss_trt_c 0.004 0.007
stress_stat 0.216 0.012 0.002
strss_tr_:_ 0.034 -0.057 0.268 -0.118

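Interpreting the results

Fixed effects:

(Intercept): the expected value of the outcome when all predictors are 0; for this model, it means that on day 0 a participant with average trait stress who is at their own usual stress level has an expected negative affect of 2.695
day: negative affect declined gradually over the study days, dropping by about 0.066 per day
stress_trait_c: participants with higher trait stress report more negative affect; a one-unit increase in trait stress goes with a 1.038 increase in negative affect
stress_state: the more stress a participant feels on a given day, the more negative affect they report; a one-unit increase in stress goes with a 0.76 increase in negative affect
stress_trait_c:stress_state: the interaction is not significant (0.16, p = 0.115), which means that the effect of stress_state does not depend significantly on stress_trait_c; in other words, trait stress does not significantly moderate the relationship between state stress and negative affect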
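
To make the interaction coefficient concrete, the sketch below computes the slope of stress_state that the fixed effects imply at low, average, and high trait stress. The coefficients are copied from the summary(model1) output and the standard deviation of stress_trait_c from describe(pdata); the choice of ±1 SD as the low and high levels is an assumption made for illustration.

```r
# Fixed-effect estimates copied from summary(model1)
b_state <- 0.7647   # stress_state
b_inter <- 0.1550   # stress_trait_c:stress_state

# SD of stress_trait_c, from describe(pdata)
sd_trait <- 0.4788

# Illustrative moderator levels: one SD below the mean, the mean, one SD above
trait_levels <- c(low = -sd_trait, mean = 0, high = sd_trait)

# Conditional (simple) slope of stress_state at each level of trait stress
simple_slopes <- b_state + b_inter * trait_levels
round(simple_slopes, 3)
```

The slope is positive at all three levels and changes only modestly across them, in line with the non-significant interaction term.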

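Random effects:

(Intercept): each participant has their own intercept; the variance of these intercepts is 0.2135 (SD = 0.4621)
stress_state: the effect of stress_state on negative affect is not fixed but differs across participants; the variance of this participant-specific effect is 0.1257 (SD = 0.3546)
Corr: the random intercept and the random slope of stress_state are two random effects, and their correlation is 0.53. This fairly large correlation means that participants with higher expected negative affect also tend to show a stronger effect of stress_state on negative affect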

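Predicted values

Given the sample data and the fitted model, we can estimate the value of the outcome, known as the predicted value. To obtain predicted values:

# save the model predictions
ddata_long$pred_m1 <- predict(model1)
head(ddata_long)

Output (html):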

A data.frame: 6 × 13
	id	day	negaff	pss	stress	bfi_n	stress_trait	negaff_trait	bfi_n_c	stress_trait_c	stress_state	negaff_state	pred_m1
	<int>	<int>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl[,1]>	<dbl[,1]>	<dbl>	<dbl>	<dbl>
1	101	0	3.0	2.50	1.50	2	1.0625	1.5	-0.9815789	-0.3330326	0.4375	1.5	2.122456
2	101	1	2.3	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	0.8	1.908520
3	101	2	1.0	3.50	0.50	2	1.0625	1.5	-0.9815789	-0.3330326	-0.5625	-0.5	1.398305
4	101	3	1.3	3.00	1.00	2	1.0625	1.5	-0.9815789	-0.3330326	-0.0625	-0.2	1.628788
5	101	4	1.1	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	-0.4	1.711131
6	101	5	1.0	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	-0.5	1.645335

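Confidence intervals for the parameters

The confint function returns confidence intervals for the parameters, but the output can be hard to read because some parameter names are unfamiliar (.sig01, .sig02, and so on).
These names come from the way the model is parameterized internally, so they may look cryptic if you have not studied how mixed-effects models are estimated.
Here is what they stand for in this model:

.sig01: the standard deviation of the random intercept; to get a confidence interval for the variance, square the limits
.sig02: the correlation between the random intercept and the random slope
.sig03: the standard deviation of the random slope
.sigma: the residual standard deviation

confint(model1)

Output (stream):
Computing profile confidence intervals ...

Output (html):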

A matrix: 9 × 2 of type dbl
	2.5 %	97.5 %
.sig01	0.40390297	0.52245693
.sig02	0.27635505	0.77129752
.sig03	0.25230527	0.45069358
.sigma	0.60962930	0.66244352
(Intercept)	2.60530435	2.78468853
day	-0.08058845	-0.05099124
stress_trait_c	0.88249219	1.19390741
stress_state	0.67339998	0.85449747
stress_trait_c:stress_state	-0.03788643	0.34703329

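Visualization

We start with the person-level variables. negaff_trait is trait negative affect: the within-person mean of daily negative affect, representing a participant's average daily negative affect.
This variable is likely to be related to trait stress (the within-person mean of daily stress), so we plot the relationship between the two: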

ggplot(data=personmeans, aes(x=stress_trait, y=negaff_trait, group=factor(id)), legend=FALSE) +
  geom_point(colour="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, size=2, color="blue") +
  xlab("Trait Stress") + ylab("Trait Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=16),
        axis.text=element_text(size=12),
        plot.title=element_text(size=16, hjust=.5)) +
  ggtitle("Between-Person Association Plot\nTrait Stress & Negative Affect")

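Output (stream):
Warning message: "Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0. ℹ Please use `linewidth` instead."
`geom_smooth()` using formula = 'y ~ x'

Next we plot the relationship between negaff_state and stress_state. Both variables come from the daily data, so we draw a separate regression line for each participant (gray) along with the overall line (blue).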

ggplot(data=ddata_long, aes(x=stress_state, y=negaff_state, group=factor(id), colour="gray"), legend=FALSE) +
  geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1, size=.5, color="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, size=2, color="blue") +
  xlab("Stress State") + ylab("Predicted State Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=18),
        axis.text=element_text(size=14),
        plot.title=element_text(size=18, hjust=.5)) +
  ggtitle("Within-Person Association Plot\nPerceived Stress & Negative Affect")

Output (stream):
`geom_smooth()` using formula = 'y ~ x'
Warning message: "Removed 20 rows containing non-finite values (`stat_smooth()`)."
`geom_smooth()` using formula = 'y ~ x'
Warning message: "Removed 20 rows containing non-finite values (`stat_smooth()`)."

Adding neuroticism as a predictor

bfi_n_c is the neuroticism score from the Big Five Inventory after grand-mean centering.
It is centered for the same reason as before: it takes part in the moderation terms. The model is:

# fit model
model2 <- lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      bfi_n_c + stress_trait_c:bfi_n_c +
                      stress_state + stress_state:stress_trait_c + 
                      stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c + 
                      (1 + stress_state|id),
                    data=ddata_long,
                    na.action=na.exclude)
#Look at results
summary(model2)

Output (plain):
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
Data: ddata_long

REML criterion at convergence: 3161.8

Scaled residuals:
Min 1Q Median 3Q Max
-3.4271 -0.6011 -0.0749 0.5045 4.4732

Random effects:
Groups Name Variance Std.Dev. Corr
id (Intercept) 0.1955 0.4422
stress_state 0.1238 0.3518 0.51
Residual 0.4040 0.6356
Number of obs: 1438, groups: id, 190

Fixed effects:
                                      Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                          2.690e+00  4.556e-02  3.944e+02  59.055  < 2e-16 ***
day                                 -6.545e-02  7.572e-03  1.247e+03  -8.644  < 2e-16 ***
stress_trait_c                       9.695e-01  7.878e-02  1.835e+02  12.307  < 2e-16 ***
bfi_n_c                              1.543e-01  3.917e-02  1.809e+02   3.939 0.000117 ***
stress_state                         7.687e-01  4.682e-02  1.673e+02  16.418  < 2e-16 ***
stress_trait_c:bfi_n_c               3.715e-02  7.832e-02  1.824e+02   0.474 0.635819
stress_trait_c:stress_state          1.254e-01  1.015e-01  1.692e+02   1.235 0.218473
bfi_n_c:stress_state                 7.595e-02  4.845e-02  1.549e+02   1.568 0.119006
stress_trait_c:bfi_n_c:stress_state -3.167e-02  1.024e-01  1.722e+02  -0.309 0.757565
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
(Intr) day strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
day -0.576
strss_trt_c 0.003 0.005
bfi_n_c -0.008 0.007 -0.224
stress_stat 0.204 0.004 0.000 0.001
strss_t_:__ -0.179 0.008 0.008 0.040 -0.054
strss_tr_:_ 0.042 -0.073 0.248 -0.056 -0.087 0.003
bf_n_c:str_ -0.033 0.058 -0.058 0.258 0.016 0.009 -0.244
strs__:__:_ -0.061 0.032 0.004 0.008 -0.233 0.243 -0.103 -0.059

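Interpreting the results

Fixed effects

(Intercept): the intercept; when stress_trait_c and bfi_n_c are both 0 (that is, at their means), the expected value of the outcome is 2.69
stress_trait_c: the higher a participant's trait stress, the more negative affect they report (0.97, p < .001)
bfi_n_c: people with higher neuroticism scores report more negative affect (0.15, p < .001)
stress_state: the more stress felt on a given day, the more negative affect that day (0.77, p < .001)
stress_trait_c:bfi_n_c: neuroticism does not significantly moderate the effect of trait stress (0.04, p = 0.64)
stress_trait_c:stress_state: trait stress does not significantly moderate the effect of daily state stress (0.13, p = 0.22)
bfi_n_c:stress_state: neuroticism does not significantly moderate the effect of daily state stress (0.08, p = 0.12)
stress_trait_c:bfi_n_c:stress_state: the three-way interaction is not significant (-0.03, p = 0.76); neuroticism does not significantly alter how trait stress moderates the effect of state stress

Random effects

The conclusions are the same as before, so we omit them here

Obtaining the model's predicted values

ddata_long$pred_m2 <- predict(model2)
head(ddata_long)

Output (html):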

A data.frame: 6 × 14
	id	day	negaff	pss	stress	bfi_n	stress_trait	negaff_trait	bfi_n_c	stress_trait_c	stress_state	negaff_state	pred_m1	pred_m2
	<int>	<int>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl[,1]>	<dbl[,1]>	<dbl>	<dbl>	<dbl>	<dbl>
1	101	0	3.0	2.50	1.50	2	1.0625	1.5	-0.9815789	-0.3330326	0.4375	1.5	2.122456	2.097006
2	101	1	2.3	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	0.8	1.908520	1.888382
3	101	2	1.0	3.50	0.50	2	1.0625	1.5	-0.9815789	-0.3330326	-0.5625	-0.5	1.398305	1.393416
4	101	3	1.3	3.00	1.00	2	1.0625	1.5	-0.9815789	-0.3330326	-0.0625	-0.2	1.628788	1.614307
5	101	4	1.1	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	-0.4	1.711131	1.692027
6	101	5	1.0	2.75	1.25	2	1.0625	1.5	-0.9815789	-0.3330326	0.1875	-0.5	1.645335	1.626575

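Plotting the moderation effect

In the results above, the stress_state:bfi_n_c interaction did not reach significance (0.08, p = 0.12), but we use it here to show how a moderation effect can be visualized.
There are two common ways to visualize moderation. The first is the pick-a-point approach: choose values of the predictor and the moderator at M ± SD and plot the relationship between the predictor and the outcome at each chosen value of the moderator.
The second is the Johnson-Neyman (JN) plot, which shows the range of moderator values over which the effect of the predictor on the outcome is significant.

The pick-a-point approach

First we look at the descriptive statistics of the predictor and the moderator, because we need their means and standard deviations.

describe(ddata_long$bfi_n_c)
describe(ddata_long$stress_state)

Output (html):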

A psych: 1 × 13
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
X1	1	1458	-0.01210021	0.9564619	0.01842105	0.002582012	1.4826	-1.981579	2.018421	4	-0.07586913	-0.7908849	0.02504891

Output (html):

A psych: 1 × 13
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
X1	1	1445	2.378342e-18	0.4943735	-0.03125	-0.01547413	0.4633125	-1.75	2.125	3.875	0.3580108	0.7924172	0.01300533

We use the effect function to compute the value of the outcome at different values of the predictor and the moderator. The term argument is the interaction of interest, i.e. the product of the predictor and the moderator.
The mod argument is the model we fitted earlier; xlevels sets the chosen points, for example plus and minus one standard deviation of stress_state is c(-0.49,+0.49).

#calculate effect
effects_model2 <- effect(term="bfi_n_c*stress_state", mod=model2,xlevels=list(bfi_n_c=c(-0.96, +0.96), stress_state=c(-0.49,+0.49)))
summary(effects_model2)

Output (stream):
NOTE: bfi_n_c:stress_state is not a high-order term in the model
Warning message in Analyze.model(focal.predictors, mod, xlevels, default.levels, : "the predictors stress_trait_c, bfi_n_c are one-column matrices that were converted to vectors"

Output (plain):

bfi_n_c*stress_state effect
stress_state
bfi_n_c -0.49 0.49
-0.96 1.964450 2.644774
0.96 2.188157 3.011998

Lower 95 Percent Confidence Limits
stress_state
bfi_n_c -0.49 0.49
-0.96 1.858012 2.510688
0.96 2.080619 2.876440

Upper 95 Percent Confidence Limits
stress_state
bfi_n_c -0.49 0.49
-0.96 2.070889 2.778860
0.96 2.295694 3.147556

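The output gives the predicted value of the outcome at each combination of predictor and moderator values, along with its 95% confidence interval. With these numbers we can draw a simple-slopes plot.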

#convert to dataframe
effectsdata <- as.data.frame(effects_model2)
#plotting the effect evaluation (with standard error ribbon)
ggplot(data=effectsdata, aes(x=stress_state, y=fit, group=bfi_n_c), legend=FALSE) + 
  geom_point() +
  geom_line() +
  #geom_ribbon(aes(ymin=lower, ymax=upper), alpha=.3) +
  geom_errorbar(aes(ymin=lower, ymax=upper), width=.15) +
  xlab("Stress State") + xlim(-2,2) +
  ylab("Predicted Negative Affect") + ylim(1,7) +
  ggtitle("Differences in Stress Reactivity across Neuroticism")
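
The simple slopes that the plot displays can also be read off the effect table numerically: within each neuroticism level, the slope is the change in predicted negative affect divided by the change in stress_state. A sketch using the four fitted values copied from the summary(effects_model2) output (the matrix fit and the name slopes are introduced here for illustration):

```r
# Predicted values copied from the effect table:
# rows = bfi_n_c level, columns = stress_state level
fit <- matrix(c(1.964450, 2.644774,
                2.188157, 3.011998),
              nrow = 2, byrow = TRUE,
              dimnames = list(bfi_n_c = c("-0.96", "0.96"),
                              stress_state = c("-0.49", "0.49")))

# Slope of stress_state at each neuroticism level: rise over run
slopes <- (fit[, "0.49"] - fit[, "-0.49"]) / (0.49 - (-0.49))
round(slopes, 3)
```

The slope is somewhat steeper at high neuroticism than at low neuroticism, which is the direction of the (non-significant) bfi_n_c:stress_state coefficient.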

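The johnson_neyman function draws a JN plot. For the idea behind the plot, see the article 《Johnson-Neyman图原理和制作Excel工具分享》, which also explains how to draw a JN plot in Excel.

johnson_neyman(model=model2, pred=stress_state, modx=bfi_n_c)

Output (plain):
JOHNSON-NEYMAN INTERVAL

When bfi_n_c is INSIDE the interval [-4.45, 40.14], the slope of
stress_state is p < .05.

Note: The range of observed values of bfi_n_c is [-1.98, 2.02]

Because the observed range of bfi_n_c lies entirely inside this interval, the slope of stress_state is significant at every observed level of neuroticism.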
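
To show where such an interval comes from, the sketch below approximates the Johnson-Neyman bounds by hand from the printed summary(model2) output. It rests on several assumptions, so treat it as an illustration rather than a reproduction of what johnson_neyman does internally: it uses a normal critical value instead of the t distribution, the rounded correlation 0.016 between the two estimates from the "Correlation of Fixed Effects" table, and it ignores the terms involving stress_trait_c (equivalent to holding stress_trait_c at 0).

```r
# Estimates and standard errors copied from summary(model2)
b_state <- 0.7687;  se_state <- 0.04682   # stress_state
b_inter <- 0.07595; se_inter <- 0.04845   # bfi_n_c:stress_state
r_si    <- 0.016                          # correlation between the two estimates

# Variances and covariance of the two estimates
v_state <- se_state^2
v_inter <- se_inter^2
cov_si  <- r_si * se_state * se_inter

# Assumed critical value (normal approximation)
z_crit <- qnorm(0.975)

# The conditional slope at moderator value m is b_state + b_inter * m, with
# variance v_state + 2 * m * cov_si + m^2 * v_inter. The bounds solve
# (b_state + b_inter * m)^2 = z_crit^2 * variance, a quadratic in m.
qa <- b_inter^2 - z_crit^2 * v_inter
qb <- 2 * (b_state * b_inter - z_crit^2 * cov_si)
qc <- b_state^2 - z_crit^2 * v_state

bounds <- sort((-qb + c(-1, 1) * sqrt(qb^2 - 4 * qa * qc)) / (2 * qa))
round(bounds, 2)
```

The hand-computed bounds land close to the reported interval [-4.45, 40.14]; the small difference in the upper bound is expected given the rounded inputs and the normal approximation.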

Some useful functions

BIC(logLik(model2))

Output (html):
3256.29897530418

logLik(model2)

Output (plain):
'log Lik.' -1580.888 (df=13)
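
These two outputs are linked by the definition BIC = -2 × logLik + k × log(n), where k is the number of estimated parameters (the df shown by logLik) and n is the number of observations. A quick check by hand, using the values printed above and the 1438 observations reported in summary(model2); the AIC line is added only as the analogous formula:

```r
# Values copied from the model output
log_lik <- -1580.888   # logLik(model2), a REML log-likelihood
k       <- 13          # df reported by logLik
n_obs   <- 1438        # "Number of obs" in summary(model2)

# Information criteria computed from their definitions
bic_by_hand <- -2 * log_lik + k * log(n_obs)
aic_by_hand <- -2 * log_lik + 2 * k

bic_by_hand   # agrees with BIC(logLik(model2)) up to rounding of log_lik
aic_by_hand
```

Note that -2 × logLik equals the "REML criterion at convergence" (3161.8) shown in the model summary.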

Data download

All the code and data used in this tutorial can be downloaded here.
