Smallholder farmers decide to adopt SAPs in response to external shocks such as drought, erosion, perceived decline in soil fertility, weeds, pests, and diseases. Both observed factors (e.g., age, gender, education, and farm size) and unobserved factors (e.g., farmers’ innate abilities and motivations) may affect their decisions to adopt a single SAP or a package (Kassie et al. 2013; Teklewold et al. 2013a; Manda et al. 2016; Ehiakpor et al. 2021). Because farmers self-select into technology adoption, those who adopt no SAPs and those who adopt a single SAP or a package may be systematically different. This self-selection gives rise to selection bias, which must be addressed to estimate the effects of SAP adoption consistently.
When technology adoption involves more than two options, previous studies have used either the multi-valued treatment effects (MVT) model (Cattaneo 2010; Ma et al. 2021; Czyżewski et al. 2022) or the multinomial endogenous switching regression (MESR) model (Kassie et al. 2015; Oparinde 2021; Ahmed 2022) to address selection bias. For example, Czyżewski et al. (2022) estimated the long-term impacts of political orientation (economic views and individual value systems) on the environment using the MVT model. They confirmed that local orientation is conducive to long-term environmental care. Using the MESR model, Ahmed (2022) evaluated the impact of improved maize varieties and inorganic fertilizer on productivity and wellbeing. He found that combining the two technologies boosts maize yield and consumption expenditure significantly more than adopting either technology in isolation. Because of its non-parametric nature, the MVT model addresses only selection bias arising from observed factors and does not account for unobserved selection bias. In comparison, the MESR model can help mitigate selection bias arising from both observed and unobserved factors, and thus it is employed in this study.
Multinomial endogenous switching regression
The MESR model is estimated in three stages. The first stage models the factors affecting smallholder farmers’ decisions to adopt a specific SAP technology or a package. Following Teklewold et al. (2013a), this study focuses on three main SAP technologies, namely improved seeds (I), fertilizer (F), and soil and water conservation (cereal-legume rotation/cereal-legume intercropping, manure use, organic input use) (S). The three technologies yield eight possible combinations. Because the group combining improved seed and fertilizer (26 observations) and the group combining improved seed and soil and water conservation (9 observations) are small, we merged them in the empirical estimations. In addition, no household adopted improved seed only. Merging the two small groups and dropping the empty seed-only group leaves six mutually exclusive choices of SAP technology: (1) non-adoption (I0F0S0); (2) fertilizer only (I0F1S0); (3) soil and water conservation only (I0F0S1); (4) improved seed combined with either fertilizer or soil and water conservation (I1F1S0); (5) combination of fertilizer and soil and water conservation (I0F1S1); (6) combination of improved seed, fertilizer, and soil and water conservation (I1F1S1). Farmers choose the one of the six that maximizes their expected benefit.
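The reduction from eight combinations to six choice categories can be traced with a short sketch. The function names and the label format below are illustrative assumptions, not part of the original estimation code; the merging and dropping rules simply mirror the description above.

```python
from itertools import product


def enumerate_packages():
    """Return the 8 raw I/F/S combinations as labels such as 'I1F0S1'."""
    return [f"I{i}F{f}S{s}" for i, f, s in product((0, 1), repeat=3)]


def collapse(label):
    """Map a raw combination to the category used in estimation.

    Assumptions taken from the text: the seed-plus-conservation group
    ('I1F0S1') is merged into the 'I1F1S0' category, and the seed-only
    group ('I1F0S0') has no observations, so it maps to None.
    """
    if label == "I1F0S0":
        return None
    if label == "I1F0S1":
        return "I1F1S0"
    return label


raw = enumerate_packages()
# Drop the empty group and remove duplicates created by the merge.
choices = sorted({collapse(label) for label in raw} - {None})
print(len(raw), len(choices))
print(choices)
```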
Assuming that the error terms are independently and identically Gumbel distributed, the probability that farmer i, with characteristics X, chooses package j is specified using a multinomial logit (MNL) model (McFadden 1973; Teklewold et al. 2013a; Zhou et al. 2020; Ma et al. 2022b):
$$P_{ij} = \Pr \left( {\eta_{ij} < 0|X_{i} } \right) = \frac{{\exp \left( {X_{i} \beta_{j} } \right)}}{{\mathop \sum \nolimits_{m = 1}^{J} \exp \left( {X_{i} \beta_{m} } \right)}}$$
(1)
where Pij represents the probability that farmer i chooses to adopt SAP technology j. Xi is a vector of observed exogenous variables that capture household-, plot-, and location-level characteristics. βj is a vector of parameters to be estimated. The parameters of the latent variable model are estimated by maximum likelihood.
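As a minimal sketch of Eq. (1), the choice probabilities for a single farmer can be computed as follows. The function name, the covariate values, and the coefficient values are hypothetical; setting the base category's coefficients to zero is the usual identifying normalisation for a multinomial logit and is assumed here rather than taken from the text.

```python
import math


def mnl_probabilities(x, betas):
    """Multinomial logit choice probabilities for one farmer.

    x     : list of covariate values X_i (hypothetical)
    betas : list of J coefficient vectors beta_j, one per alternative
    """
    utilities = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    # Subtracting the maximum leaves the ratios unchanged but avoids overflow.
    shift = max(utilities)
    exps = [math.exp(u - shift) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical farmer with an intercept and one covariate, three alternatives.
x_i = [1.0, 0.5]
betas = [
    [0.0, 0.0],   # base category (non-adoption), normalised to zero
    [0.2, -0.4],
    [-0.1, 0.3],
]
probs = mnl_probabilities(x_i, betas)
print(probs)
```

The probabilities are positive and sum to one across the J alternatives, which is what the denominator of Eq. (1) guarantees.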
In the second stage, an ordinary least squares (OLS) model is used to establish the relationship between the outcome variables (farm income and food security) and a set of exogenous variables, denoted by Z, for the chosen SAP technology. Non-adoption of SAPs (i.e., the base category, I0F0S0) is denoted as j = 1, and the other combinations as j = 2, …, 6. The outcome equations for each regime are specified as:
$$\left\{ {\begin{array}{*{20}l} {{\text{Regime}}\,1:Q_{i1} = Z_{i} \alpha_{1} + u_{i1} \quad {\text{if}}\quad I = 1\quad \quad \quad \quad \quad \quad \,\,\,(2{\text{a}})} \hfill \\ \vdots \hfill \\ {{\text{Regime}}\,J: Q_{iJ} = Z_{i} \alpha_{J} + u_{iJ} \quad {\text{if}}\quad I = J\quad \quad \quad \quad \quad \quad (2{\text{b}})} \hfill \\ \end{array} } \right.$$
where I is an index that denotes farmer i’s choice of SAP technology; Qi is the outcome variable for the i-th farmer; Zi is a vector of exogenous variables; α1 and αJ are parameters to be estimated; ui1 and uiJ are the error terms.
By conditioning on the vector of observed covariates Zi, Eqs. (2a) and (2b) help address selection bias arising from observed factors. However, if the same unobserved factors (e.g., farmers’ motivations to adopt SAPs) simultaneously influence farmers’ adoption decisions and the outcome variables, the error terms in Eqs. (2a) and (2b) would be correlated with the error term in Eq. (1). In this case, unobserved selection bias occurs, and failing to address it would generate biased estimates. Within the MESR framework, the selectivity correction terms are calculated after estimating Eq. (1) and then included in Eqs. (2a) and (2b) to mitigate unobserved selection bias. Formally, Eqs. (2a) and (2b) can be rewritten as follows:
$$\left\{ {\begin{array}{*{20}l} {{\text{Regime}}\,1{:}\,Q_{i1} = Z_{i} \alpha_{1} + \lambda_{1} \sigma_{1} + \omega_{i1} \quad {\text{if}}\quad I = 1\quad \quad \quad \quad \quad \quad \,(3{\text{a}})} \hfill \\ \vdots \hfill \\ {{\text{Regime}}\,J{:}\, Q_{iJ} = Z_{i} \alpha_{J} + \lambda_{J} \sigma_{J} + \omega_{iJ} \quad {\text{if}}\quad I = J\quad \quad \quad \quad \quad \quad (3{\text{b}})} \hfill \\ \end{array} } \right.$$
where Qi and Zi are as defined earlier; λ1 and λJ are the selectivity correction terms used to address unobserved selection bias; σ1 and σJ are the covariances between the error term in Eq. (1) and the error terms in Eqs. (2a) and (2b). In the multinomial choice setting, there are J − 1 selectivity correction terms, one for each alternative SAP combination.
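To illustrate how Eqs. (3a) and (3b) are estimated, the sketch below augments the regressors of one regime with a selectivity term and fits the equation by least squares. It assumes the λ values have already been computed from the first stage (their formula is not given here), and all data, names, and coefficient values are synthetic and hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[a] * r[b] for r in X) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


def regime_design(z_rows, lambdas):
    """Build rows [1, Z_i, lambda_i]: intercept, covariates, selectivity term."""
    return [[1.0] + list(z) + [lam] for z, lam in zip(z_rows, lambdas)]


# Synthetic farmers observed in one regime (values are made up).
z_rows = [[0.0], [1.0], [2.0], [3.0], [4.0]]
lambdas = [0.3, -0.1, 0.4, 0.0, -0.2]   # assumed first-stage output
q = [2.0 + 3.0 * z[0] + 0.5 * lam for z, lam in zip(z_rows, lambdas)]

intercept, alpha, sigma = ols(regime_design(z_rows, lambdas), q)
print(intercept, alpha, sigma)
```

Because the synthetic outcome is generated without noise, the fit recovers the coefficients used to build it; the coefficient on the selectivity term plays the role of σ in Eqs. (3a) and (3b).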
To estimate the MESR model consistently, at least one instrumental variable (IV) should be included in Xi in the multinomial logit (MNL) model but excluded from Zi in the outcome equations. In this study, two distance variables, distance to the weekly market and whether the plot is within 30 minutes of the homestead, are employed as IVs for model identification. Distance to the weekly market is a continuous variable measured in minutes. The plot-distance variable is a dummy that equals 1 if the plot is within 30 min of the homestead and 0 otherwise. The two IVs are not expected to affect farm income and food security directly. We checked the validity of the IVs by running a falsification test and a correlation coefficient analysis (Pizer 2016; Liu et al. 2021; Ma et al. 2022a). For brevity, the results are not reported.
The average treatment effect on the treated (ATT) is calculated in the third stage. This involves comparing the expected outcomes (farm income and food security) of SAP adopters in the actual scenario (with adoption) and in the counterfactual scenario (without adoption). Impacts are easier to establish with experimental data; this study, however, relies on observational cross-sectional data, which makes impact evaluation more challenging. The main challenge is estimating the counterfactual outcome, i.e., the outcome of SAP adopters had they not adopted the SAP technology. Following previous studies (Kassie et al. 2015; Oparinde 2021; Ahmed 2022), the study estimates the ATT from the expected outcomes in the actual and counterfactual scenarios using the following equations:
The outcome variables for SAP adopters with adoption (observed):
$$\left\{ {\begin{array}{*{20}l} {E\left( {Q_{i2} |I = 2} \right) = Z_{i} \alpha_{2} + \sigma_{2} \lambda_{2} \quad \quad \quad \quad \quad \quad (4{\text{a}})} \hfill \\ \vdots \hfill \\ {E\left( {Q_{iJ} |I = J} \right) = Z_{i} \alpha_{J} + \sigma_{J} \lambda_{J} \quad \quad \quad \quad \quad \quad (4{\text{b}})} \hfill \\ \end{array} } \right.$$
The outcome variables for SAP adopters had they decided not to adopt (counterfactual):
$$\left\{ {\begin{array}{*{20}l} {E\left( {Q_{i1} {|}I = 2} \right) = Z_{i} \alpha_{1} + \sigma_{1} \lambda_{2} \quad \quad \quad \quad \quad \quad (5{\text{a}})} \hfill \\ \vdots \hfill \\ {E\left( {Q_{i1} {|}I = J} \right) = Z_{i} \alpha_{1} + \sigma_{1} \lambda_{J} \quad \quad \quad \quad \quad \quad (5{\text{b}})} \hfill \\ \end{array} } \right.$$
The difference between Eqs. (4a) and (5a) or Eqs. (4b) and (5b) is the ATT. For example, the difference between Eqs. (4a) and (5a) is given as:
$${\text{ATT}} = E\left[ {Q_{i2} |I = 2} \right] - E[Q_{i1} | I = 2] = Z_{i} \left( {\alpha_{2} - \alpha_{1} } \right) + \lambda_{2} \left( {\sigma_{2} - \sigma_{1} } \right)$$
(6)
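Eq. (6) can be illustrated numerically with the sketch below, which averages the difference between the actual expectation in Eq. (4a) and the counterfactual expectation in Eq. (5a) over the adopters of one package. The function names and all coefficient and covariate values are hypothetical and are not estimates from this study.

```python
def expected_outcome(z, alpha, sigma, lam):
    """Z_i * alpha + sigma * lambda for one farmer, as in Eqs. (4a) and (5a)."""
    return sum(a * v for a, v in zip(alpha, z)) + sigma * lam


def att(z_rows, lambdas, alpha_j, sigma_j, alpha_1, sigma_1):
    """Mean of actual minus counterfactual outcomes over adopters of package j.

    The counterfactual applies the non-adopter coefficients (alpha_1, sigma_1)
    to the adopters' own covariates and selectivity terms.
    """
    diffs = [
        expected_outcome(z, alpha_j, sigma_j, lam)
        - expected_outcome(z, alpha_1, sigma_1, lam)
        for z, lam in zip(z_rows, lambdas)
    ]
    return sum(diffs) / len(diffs)


# Two hypothetical adopters of package j = 2: [intercept, covariate].
z_rows = [[1.0, 2.0], [1.0, 4.0]]
lambdas = [0.5, -0.5]
att_2 = att(
    z_rows, lambdas,
    alpha_j=[1.0, 0.5], sigma_j=0.2,    # regime 2 (adopters)
    alpha_1=[0.5, 0.25], sigma_1=0.1,   # regime 1 (non-adopters)
)
print(att_2)  # approximately 1.25
```

For the first farmer, Z(α2 − α1) = 1.0 and λ2(σ2 − σ1) = 0.05; for the second, the two terms are 1.5 and −0.05. The average of 1.05 and 1.45 gives the value of about 1.25.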
Data and variables
The study used data collected by IITA for its Africa RISING project (https://africa-rising.net/) in three regions of northern Ghana, namely the Northern, Upper East, and Upper West regions. The data were collected in 2014 from 1284 households operating approximately 5500 plots in 50 rural communities. The baseline survey used a stratified two-stage sampling technique, and data were collected using Computer Assisted Personal Interviewing (CAPI) supported by SurveyCTO software on tablets (Tinonin et al. 2016). A structured questionnaire was used to conduct the household interviews. The data cover the various SAP technologies, demographic characteristics, agricultural land holdings, crop outputs and sales, livestock production, farmers’ access to agricultural information and knowledge, access to credit and markets, household assets, and income.
The outcome variables for this study are farm income and food security. Farm income from the crops cultivated is obtained by valuing crop yields at market prices and deducting the costs of all variable inputs. Food security is captured by two variables: the reduced coping strategy index (rCSI) and household dietary diversity (HDD). The rCSI is built from five standardized questions on the coping strategies that households use, and how frequently they use them, when they experience food insecurity; the more strategies a household uses, the more food insecure it is. The rCSI score ranges from 0 to 63, and a higher score indicates a higher level of food insecurity. The HDD variable is based on the food groups a household consumes. The higher the score, the more diverse the household’s diet and the more food secure the household is. Drawing upon previous empirical studies on the adoption of SAPs and related agricultural innovations (Kassie et al. 2013; Teklewold et al. 2013a; Manda et al. 2016; Bopp et al. 2019; Oyetunde Usman et al. 2020; Ma and Wang 2020; Ehiakpor et al. 2021; Pham et al. 2021), we selected a range of control variables that may influence the adoption of SAPs. These include age, gender, education, marital status, household size, farm size, off-farm income, Africa RISING member, extension, extension satisfaction, number of crops, drought and floods, market access, sandy soil, clay soil, flat slope, moderate-to-steep slope, and location dummies.
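The farm income definition above (value of crop yields at market prices minus variable input costs) amounts to the following arithmetic. The function name, quantities, prices, and cost items are hypothetical and serve only to illustrate the calculation.

```python
def farm_income(crops, variable_costs):
    """Value of crop yields at market prices minus variable input costs.

    crops          : list of (yield, market price per unit) pairs
    variable_costs : list of cost amounts (e.g., seed, fertilizer, hired labour)
    """
    gross_value = sum(quantity * price for quantity, price in crops)
    return gross_value - sum(variable_costs)


# Hypothetical household with two crops and three variable cost items.
income = farm_income(
    crops=[(500.0, 1.5), (200.0, 2.0)],
    variable_costs=[150.0, 80.0, 70.0],
)
print(income)  # 850.0
```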