The population of Sub-Saharan Africa is growing fast, and 70% of the population is in rural areas that depend on the agricultural sector as a source of livelihood. The sector is not growing fast enough to meet food adequacy. Much of the agricultural growth achieved to date is by the expansion of agricultural land area. In the face of an increasing population, agricultural land expansion has reached its geographical limits and has become a leading cause of soil fertility decline and environmental degradation (Wiggins 2000; Breisinger et al. 2011). The agricultural sector is still an important economic sector, and it employs over 50% of working adults and over 65% of the labor force (Gollin, Parente, and Rogerson 2002). Improving agricultural production and productivity through adoption of improved agricultural technologies is an important pathway that will improve livelihoods of the majority and enhance food security. Adoption of new and improved practices, expansion of rural financial markets, increased capital and equipment ownership, and development of research and extension linkages could all contribute to increases in productivity, which is a prerequisite for poverty alleviation and enhanced food security (Von Braun, Ruel, and Gillespie 2010; Wesley and Faminow 2014). While many countries in Asia, the Caribbean, and Latin America have registered production and productivity gains from adopting agricultural technologies such as hybrid seeds, inorganic fertilizer, and irrigation, in Sub-Saharan Africa, the adoption of promising agricultural technologies has been far from ubiquitous and has remained particularly low. For example, Gollin, Morris, and Byerlee (2005) show that improved maize varieties accounted for 17% of the total area harvested in Sub-Saharan Africa compared to 90% in East and South East Asia and the Pacific and 57% in Latin America and the Caribbean. 
Primarily cultivated by smallholder farmers for domestic consumption, sorghum thrives in harsh climates, is drought resistant, and can improve food security and mitigate the influence of climate change especially among vulnerable populations (Ahmed, Sanders, and Nell 2000). The sorghum crop is an important source of protein and nutrients for millions of people. In West Africa, sorghum accounts for 70% of total cereal production (Atokple 2003). The adoption rates of improved sorghum varieties (ISVs) vary significantly within Sub-Saharan Africa, with Southern Africa having higher adoption rates than other parts of the region. The sorghum crop consistently accounts for more than 30% of the total cultivated land, and 23% of the total sorghum crop area is planted with improved varieties. In most parts of West Africa, the area with ISVs is less than 2% of the total cultivated land (Cline 2007; Burke, Lobell, and Guarino 2009). As discussed in Gollin, Lagakos, and Waugh (2014), there is also a large gap between what the sub-Saharan farmer produces per unit area and production potential with the available technology.
Worldwide, recent research and extension efforts have resulted in better agricultural practices, new and improved crop varieties, and improvements in soil and water management practices. However, Meinzen-Dick et al. (2004) argue that the only way for sub-Saharan farmers to gain from these new agricultural technologies is through adoption, after perceiving them to be beneficial and profitable. To enhance adoption, several studies have focused on mapping agricultural technology adoption patterns and on finding variables associated with adopters of these technologies. This study extends the latter category by using a two-step cluster analysis to group farmers into subgroups with similar adoption patterns. The generated knowledge is important for formulating specific policies and/or targeting specific groups of farmers to promote the adoption of ISVs in Tanzania and for giving feedback to institutions involved in agricultural research and extension in similar regions in Sub-Saharan Africa.
One of the goals of this study was to quantify the factors influencing the adoption of ISVs developed by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) and tested by the Department of Research and Development (DRD) of Tanzania’s Ministry of Agriculture, Livestock, and Fisheries. The results from this study will allow ICRISAT and DRD to test the validity of their new research strategies and to suggest an efficient mechanism and adoption pathways for other crops. In addition, the present study adds to the literature on the role of a lack of information and capital constraints in the adoption of ISVs. The analysis illustrates how access to information and the availability of capital jointly affect the adoption behavior of sorghum producers. We go beyond the traditional approach of assessing factors affecting adoption by using a two-step cluster analysis and t-distributed stochastic neighbor embedding (t-SNE), which allows visualization of the underlying relationships among farmers with similar adoption patterns (Burke, Lobell, and Guarino 2009). The results support sound decision-making in designing cost-effective agricultural research programs and extension advisory services.
In the following section, we present an overview of sorghum research and development in Tanzania followed by a description of the source of the data analyzed in this study. Then, we present a conceptual framework for technology adoption in the presence of multiple binding constraints, the empirical specifications of a multiple-hurdle Tobit model, and a brief review of two-step cluster analysis. In the last two sections, we present key findings and the policy implications for scaling up the adoption of ISVs in Tanzania.
Sorghum research in Tanzania
Sorghum (Sorghum bicolor (L.) Moench), or Mtama in Swahili, is one of the five most important cereal crops in the world, and because of its broad adaptation, it is one of the climate-ready crops (Association for Strengthening Agricultural Research in East and Central Africa 2013). In Tanzania, sorghum is the second most important staple food after maize, supporting more than 80% of the population (Rohrbach et al. 2002). Most farming systems in Tanzania are increasingly cultivating sorghum as the main crop to address recurring food shortages resulting from other crop failures (Kombe 2012). Sorghum research and development activities in Tanzania trace back to the early 1980s. During that period, ICRISAT began collaborating with DRD as well as some non-governmental organizations (NGOs) to test improved sorghum varieties using both on-station and on-farm trials. Early efforts led to the release of three sorghum varieties: Tegemeo, Pato, and Macia in 1978, 1997, and 1998, respectively (Mgonja et al. 2005). In 2002, they released the Wahi and Hakika varieties, and in 2008, they released NARCO Mtama 1. Seed Co Tanzania Limited also released the Sila variety in 2008 (Monyo et al. 2004). Kilimo (2008), Kanyeka, Kamala, and Kasuga (2007), and Association for Strengthening Agricultural Research in East and Central Africa (2013) summarize the agronomic and physical characteristics of these varieties. The varieties are drought-tolerant and are for human consumption. Agro-pastoralists use crop residues as animal fodder (Rohrbach and Kiriwaggulu 2007; Kombe 2012). Over the past decade, sorghum has been slowly entering the nonfood and value-added markets, with use in the baking, brewery, and animal feed industries. The focus of current research and extension efforts is on linking farmers to this nonfood market to stimulate production and scale up ISV adoption in Tanzania (Monyo et al. 2004).
Source of data
The data for this analysis are from a survey conducted by Selian Agricultural Research Institute (SARI), Arusha, Tanzania, in collaboration with ICRISAT, Nairobi, Kenya. The first author of the present study developed the structured questionnaire. A 2-day enumerator-training workshop, organized by the main author, was conducted in May 2013 to review the questionnaire. Twenty-five extension agents working in major sorghum farming systems and three scientists from ICRISAT participated in the workshop. After the workshop, the questionnaire was pre-tested in the Singida Rural and Rombo Districts. Issues found during the questionnaire pre-test provided guidance for refinement of the final survey instrument used in the study.
We considered the intensity of sorghum production and the importance of sorghum in the farming system to select participating regions and districts. The sample area included the Iramba, Singida, and Manyoni Districts (Singida Region, 435 sample households), Kondoa District (Dodoma Region, 102 sample households), Babati District (Manyara Region, 110 sample households), Rombo District (Kilimanjaro Region, 57 sample households), and Kishapu District (Shinyanga Region, 118 sample households). In each district, we randomly selected two sample wards and then one village from each ward. Administrative subdivisions in Tanzania include regions, districts, wards, and villages. Therefore, the village is the lowest administrative unit (Map 1).
To create a counterfactual (for impact assessment in another study), the sample was designed so that about 60% of the responding households were adopters, that is, households that planted at least one improved sorghum variety during the 2013/2014 farming season. For statistical analysis, the sample size per village was at least 50 households. The survey covered 822 households, of which 505 were adopters (61.44%) and 317 were non-adopters (38.56%). At the village level, we first grouped farmers into adopters and non-adopters using the village register and then randomly selected sample households from each group. Previously trained enumerators collected the data from the respondents, who were knowledgeable farmers at the household level.
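The village-level selection step described above can be sketched in a few lines. This is only an illustration in Python, not the procedure actually run by the survey team: the register layout, the function name `sample_village`, and the default 60% adopter share are assumptions made for the example.

```python
import random

def sample_village(register, n_total, adopter_share=0.6, seed=0):
    """Draw a stratified random sample from one village register.

    register: list of (household_id, is_adopter) tuples (hypothetical layout).
    Returns two lists of household ids: sampled adopters and non-adopters.
    """
    rng = random.Random(seed)
    # Step 1: split the register into the two strata.
    adopters = [hh for hh, is_adopter in register if is_adopter]
    non_adopters = [hh for hh, is_adopter in register if not is_adopter]
    # Step 2: fix the stratum sizes, capped by what the village can supply.
    n_adopters = min(round(n_total * adopter_share), len(adopters))
    n_non_adopters = min(n_total - n_adopters, len(non_adopters))
    # Step 3: simple random sampling without replacement within each stratum.
    return rng.sample(adopters, n_adopters), rng.sample(non_adopters, n_non_adopters)

# Toy register of 120 households in which every third household is a non-adopter.
register = [(f"hh{i:03d}", i % 3 != 0) for i in range(120)]
adopters, non_adopters = sample_village(register, n_total=50)
print(len(adopters), len(non_adopters))
```

Fixing the seed makes the draw reproducible, which helps when the same sample has to be revisited for a later impact assessment.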
Modeling adoption under information and capital constraints
Theoretically, the adoption of agricultural technology occurs when the expected utility from the technology exceeds that of non-adoption (Huffman 1974; Rahm and Huffman 1984). Since utility is not observable, single or multivariate limited dependent variable models have been the workhorse for estimating factors affecting adoption (Huffman and Mercier 1991; Grabowski and Kerr 2013). Cragg’s double-hurdle model (Cragg 1971) extends these models to the case in which a farmer faces two hurdles while deciding to adopt. Croppenstedt, Demeke, and Meschi (2003) modified Cragg’s model to directly model imperfections that create multiple hurdles during the adoption process.
In this study, there are three groups of farmers. The first group passed all hurdles and adopted the improved seeds. The second group had a desired demand but lacked either information or capital. This group included farmers with limited information on ISVs who were not constrained by capital and farmers with enough information on ISVs but not enough capital to buy improved seeds and/or complementary inputs. The third group consisted of non-adopters with access to both information and capital who did not adopt ISVs because of other unknown constraints. Given the standard utility maximization condition for the adoption process, let \( D_i^T \) be a binary variable for the adoption decision (1 if the farmer adopts and 0 otherwise), \( D_i^{c1} \) a binary variable representing the information constraint, and \( D_i^{c2} \) a binary variable representing the capital constraint. The multiple-hurdle Tobit model is:
$$ D_i^{\ast} = D_i^T D_i^{c1} D_i^{c2} = \left\{ \begin{array}{ll} > 0, & \text{if ISVs are adopted} \\ 0, & \text{if ISVs are not adopted} \end{array} \right. $$
(1)
In this equation, \( D_i^{\ast} \) is a latent variable standing for the unobservable intensity of adoption, measured as the proportion of cropland allotted to ISVs. The variable is positive for adopters and zero for non-adopters. Adoption occurs when three conditions hold simultaneously: the discounted expected utility of profit from ISV adoption is positive, the farmer is sufficiently aware of ISVs, and the farmer has access to capital to invest in the new sorghum enterprise (Grabowski and Kerr 2013). The constraints are assumed to be independent, so the probability of allotting land to ISVs is the product of the probabilities of passing each hurdle. We could estimate Eq. (1) using joint maximum likelihood as in Jones (1992), Smith (2003), Moffatt (2005), Teklewold et al. (2006), Shiferaw et al. (2015), and Burke, Myers, and Jayne (2015). The underlying assumption is that a binomial probability model governs whether the outcome variable has a zero or a positive realization. The likelihood function is therefore separable with respect to the different parameters, and the log likelihood is the sum of the log likelihoods from two separate models: a binomial probability model and a zero-truncated model. Maximizing the separate components of the log-likelihood function generates consistent, efficient, and unbiased estimates. The expressions defining farmer groups with desired demand but constrained by a lack of information and capital are as follows:
$$ D_i^{\ast} = \beta^T X_i + \mu_i; \quad I_i^{\ast} = G^{c1} = \alpha^T z_i + \omega_i; \quad \text{and} \quad S_i^{\ast} = G^{c2} = \delta^T h_i + \varepsilon_i. $$
(2)
In Eq. (2), \( D_i^{\ast} \) is the observed demand, which is truncated at zero and excludes non-adopters (Tobin 1958); \( I_i^{\ast} \) and \( S_i^{\ast} \) are the unobservable demands constrained by a lack of information and capital, respectively; \( X_i \) is the vector of covariates that affect demand; \( z_i \) and \( h_i \) are the vectors of covariates that affect access to agricultural information and capital, respectively; and \( \beta \), \( \alpha \), and \( \delta \) are the parameter vectors of the model. The random variable \( \mu_i \) is \( N(0, \sigma^2) \), and the random variables \( \omega_i \) and \( \varepsilon_i \) are \( N(0, 1) \).
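The split of the likelihood into a binomial part and a zero-truncated part can be written out explicitly. The following is a sketch under stated assumptions, not necessarily the exact specification estimated in this study: the three error terms are taken to be independent, \( y_i \) (our notation) denotes the observed share of cropland under ISVs, and \( \Phi \) and \( \phi \) are the standard normal distribution and density functions.

$$ p_i = \Phi\left(\alpha^T z_i\right) \Phi\left(\delta^T h_i\right) \Phi\left(\beta^T X_i / \sigma\right), \qquad f\left(y_i \mid y_i > 0\right) = \frac{\phi\left(\left(y_i - \beta^T X_i\right)/\sigma\right)}{\sigma\, \Phi\left(\beta^T X_i / \sigma\right)} $$

$$ \ln L = \sum_{y_i = 0} \ln\left(1 - p_i\right) + \sum_{y_i > 0} \ln p_i + \sum_{y_i > 0} \ln f\left(y_i \mid y_i > 0\right) $$

The first two sums are the binomial component, and the last sum is the zero-truncated component. Because \( p_i f(y_i \mid y_i > 0) = \Phi(\alpha^T z_i) \Phi(\delta^T h_i) \phi((y_i - \beta^T X_i)/\sigma)/\sigma \), this is the same likelihood as the usual independent hurdle form. In this parameterization \( \beta \) and \( \sigma \) appear in both components, so the two parts can be maximized separately only when the participation probability is given its own parameters, as in Cragg (1971).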
Estimating Eq. (2) using a multiple-hurdle Tobit (Tobin 1958) framework, as explained in Feder, Just, and Zilberman (1985), Roodman (2011), and Croissant, Carlevaro, and Hoareau (2016), allows the prediction of both the intensity and the probability of adoption. The first hurdle, which separates adoption from non-adoption, is modeled as a probability choice in which adoption occurs with probability \( P\left(D_i = 1\right) = P\left(y_i^{\ast} > 0\right) \) and non-adoption with probability \( P\left(D_i = 0\right) = P\left(y_i^{\ast} \le 0\right) = 1 - P\left(y_i^{\ast} > 0\right) \), where P(.) is the probability function and \( y_i^{\ast} \) is the latent variable representing the intensity of adoption. In the second and third hurdles, singular probability choice models replace the second and the third expressions such that \( P\left(I_i^{\ast} = 1\right) = 1 \) and \( P\left(S_i^{\ast} = 1\right) = 1 \). To estimate Eq. (2), Smith (2003) suggests setting the correlations between the random disturbances to zero. The Vuong test (Vuong 1989) tests the hypothesis of no correlation between the incidence and the intensity of adoption.
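To make the two predicted quantities concrete, the sketch below computes the probability of adoption and the expected intensity conditional on adoption for a single farmer. It is written in Python for illustration only (the analysis in this study was run in R), it assumes independent normal errors across the three hurdles, and the function names and coefficient values are hypothetical rather than estimates from this study.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def dot(coefs, values):
    """Linear index, e.g. beta^T X_i."""
    return sum(c * v for c, v in zip(coefs, values))

def predict(beta, x, alpha, z, delta, h, sigma):
    """Return (P(y* > 0), E(y | y* > 0)) for one farmer.

    Assumes the demand, information, and capital errors are independent,
    so the adoption probability is the product of three normal probabilities.
    """
    xb = dot(beta, x)
    p_demand = norm_cdf(xb / sigma)      # desired demand is positive
    p_info = norm_cdf(dot(alpha, z))     # information hurdle is passed
    p_capital = norm_cdf(dot(delta, h))  # capital hurdle is passed
    prob_adopt = p_demand * p_info * p_capital
    # Mean of a normal variable truncated at zero (inverse Mills ratio term).
    expected_intensity = xb + sigma * norm_pdf(xb / sigma) / p_demand
    return prob_adopt, expected_intensity

# Hypothetical coefficients and covariates; the first covariate is a constant.
prob, intensity = predict(
    beta=[0.1, 0.05], x=[1.0, 2.0],
    alpha=[0.3, 0.4], z=[1.0, 1.0],
    delta=[-0.2, 0.5], h=[1.0, 1.0],
    sigma=0.25,
)
print(round(prob, 3), round(intensity, 3))
```

When every index is zero, each hurdle is passed with probability 0.5, so the adoption probability is 0.5 × 0.5 × 0.5 = 0.125, which is a quick sanity check on the product rule.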
The three groups discussed above comprise four subgroups of farmers: adopters (505 sample households); non-adopters with desired demand and no capital constraint who lacked enough information (150 sample households); non-adopters with capital constraints (85 sample households); and non-adopters with no desire to adopt improved sorghum varieties and no capital or information constraints (82 sample households). The average time between learning about ISVs and field testing was 3.76 years, and the third quartile was 4 years. Farmers in the desired demand group who lacked information were either not aware of any improved sorghum varieties or, if they were aware, had known about them for less than the 4-year threshold. Farmers in the desired demand group who were aware of ISVs were asked follow-up questions to identify reasons for non-adoption, and they identified either lack of capital or lack of credit as a major constraint to adoption.
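The grouping rule in this paragraph can be read as a short decision sequence. The Python sketch below encodes one such reading; the function name, argument names, and group labels are hypothetical, and the use of the 4-year awareness threshold as a strict cut-off is our interpretation of the text rather than a documented survey rule.

```python
def classify_household(adopted, years_aware, reports_capital_constraint,
                       threshold_years=4):
    """Assign a household to one of the four subgroups.

    years_aware is None when the farmer has never heard of ISVs.
    """
    if adopted:
        return "adopter"
    # Not aware at all, or aware for less than the threshold: information hurdle.
    if years_aware is None or years_aware < threshold_years:
        return "information-constrained"
    # Aware long enough, but names lack of capital or credit: capital hurdle.
    if reports_capital_constraint:
        return "capital-constrained"
    # Aware and not capital constrained, yet did not adopt.
    return "no desired demand"

print(classify_household(False, 2, False))
print(classify_household(False, 6, True))
```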
There are three types of covariates to include in Eq. (2): farm and farmer associated attributes, attributes associated with the technology, and farming goals. Examples of these variables include human capital represented by the level of education of the farmer, risk and risk management strategies, and access to institutional support systems such as marketing facilities, research and extension services, availability of credit, and transportation. Other variables include production factors, such as farm size, number of livestock, and off-farm income and income sources. Farmers may have different farming goals such as subsistence or market-oriented farming. Feder and Slade (1984); de Janvry, Fafchamps, and Elisabeth (1991); Holden, Shiferaw, and Pender (2001); and Adegbola and Gardebroek (2007) describe these variables in detail.
Apart from finding factors affecting adoption, understanding the diversity of farmers is of critical importance for the successful development of interventions. We extended this study by grouping farmers into homogeneous subgroups with similar adoption patterns through a two-step cluster analysis. There are three main procedures for cluster analysis: hierarchical cluster analysis, k-means cluster analysis, and two-step cluster analysis (Rousseeuw 1987). Hierarchical clustering is useful for small datasets or when examining changes (merging and emerging clusters). With k-means clustering, the number of clusters, k, is specified in advance. It is also efficient when using normally distributed continuous variables and when there is enough data to allow variability among the created clusters (Gower 1971).
Two-step clustering is suitable for large datasets, especially when there is a mixture of continuous and categorical variables (Gorgulu 2010). The goal is to automatically form several clusters based on the mix of categorical and continuous variables. Most algorithms for two-step clustering use the first step to pre-cluster the data into many small sub-clusters. The second step uses the pre-clusters to form the desired number of clusters, or if the desired number of clusters is unknown, these algorithms automatically find the best number of clusters. In this study, we used two-step clustering tools to group sample households into homogeneous groups. The variables used for grouping were both categorical and continuous and included the estimated probability of censoring, \( P(y^{\ast} > 0) \), the estimated expected value of the uncensored dependent variable, \( E(y \mid y^{\ast} > 0) \), and all statistically significant variables in Eq. (2).
The first step involved calculating Gower’s distance matrix to separate households into (dis)similar groups. We could not use the Euclidean distance since it is valid only for continuous variables. For the limitations of using Euclidean distance in cluster analysis, see Gower (1971) and Struyf, Hubert, and Rousseeuw (1997). After calculating Gower’s distance matrix, the second step involved partitioning the households into clusters with the partitioning around medoids (PAM) algorithm and using the silhouette width to determine the optimal number of clusters, as suggested in Rousseeuw (1987), Kaufman and Rousseeuw (1990), and Pollard and van der Laan (2002). The silhouette depends on the actual partition of the objects and not on the type of clustering algorithm. To visualize the formed clusters, we used t-distributed stochastic neighbor embedding (t-SNE). Developed by van der Maaten and Hinton (2008), t-SNE is a non-linear dimensionality reduction technique that tries to preserve the local structure of the data and make clusters visible in a 2D or 3D visualization, which makes it useful for visualizing high-dimensional datasets with many variables. It overcomes a limitation of many linear dimensionality reduction algorithms, which concentrate on placing dissimilar data points far apart in the lower-dimensional representation, by instead keeping similar data points close together. t-SNE is based on probability distributions, with a random walk on neighborhood graphs, to find the structure within the data. Bunte et al. (2012) and Donaldson (2016) show that t-SNE presents high-dimensional data in low dimensions while preserving global geometry at all measurement scales in the dataset. We conducted all analysis in the R environment (R Core Team 2017).
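The clustering steps above (Gower distance, partitioning around medoids, and silhouette-based choice of the number of clusters) can be illustrated on a toy example. The Python sketch below is not the code used in this study, which was run in R: the six households and their three variables are invented, the medoid search is a simplified swap-only version of PAM with a naive starting set, and the t-SNE visualization is left out.

```python
def gower_matrix(rows, is_numeric):
    """Pairwise Gower distances for rows mixing numeric and categorical values."""
    n, p = len(rows), len(is_numeric)
    # Range of each numeric column, used to scale differences to [0, 1].
    ranges = [(max(r[j] for r in rows) - min(r[j] for r in rows)) or 1.0
              if is_numeric[j] else None for j in range(p)]
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if is_numeric[j]:
                    total += abs(rows[a][j] - rows[b][j]) / ranges[j]
                else:
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            dist[a][b] = dist[b][a] = total / p
    return dist

def assign(dist, medoids):
    """Label each row with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def pam_swap(dist, k):
    """Simplified PAM: naive start, then swaps while total distance falls."""
    n = len(dist)
    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in range(n))
    medoids = list(range(k))  # the BUILD phase of full PAM is omitted here
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < best - 1e-12:
                    medoids, best, improved = trial, trial_cost, True
    return medoids, assign(dist, medoids)

def mean_silhouette(dist, labels):
    """Average silhouette width; assumes at least two non-empty clusters."""
    n, total = len(dist), 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # a singleton cluster contributes a width of zero
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Invented households: (predicted ISV land share, household size, has credit).
rows = [(0.10, 2.0, "no"), (0.15, 2.5, "no"), (0.20, 3.0, "no"),
        (0.80, 8.0, "yes"), (0.85, 9.0, "yes"), (0.90, 10.0, "yes")]
dist = gower_matrix(rows, is_numeric=[True, True, False])
scores = {k: mean_silhouette(dist, pam_swap(dist, k)[1]) for k in (2, 3)}
best_k = max(scores, key=scores.get)
medoids, labels = pam_swap(dist, best_k)
print(best_k, labels)
```

Working from a precomputed distance matrix keeps the medoid search and the silhouette independent of how the distances were obtained, which is what allows categorical and continuous variables to be mixed.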