Johansen–Juselius (1992) cointegration model
If the prices are nonstationary and integrated of the same order, the Johansen–Juselius (1992) likelihood ratio test is based on the following vector autoregressive (VAR) specification in error correction form:
$$\Delta P_{t} = \Phi D_{t} + \Pi P_{t - 1} + \sum\limits_{i = 1}^{k - 1} \Gamma_{i} \Delta P_{t - i} + \omega_{t}$$
(1)
where \(P_{t}\) includes all \(n\) variables of the model, which are \(I(1)\); \(\Pi\), \(\Gamma_{i}\) and \(\Phi\) are parameter matrices to be estimated; \(D_{t}\) is a vector of deterministic elements (constant, trend); and \(\omega_{t}\) is a vector of random errors that follows a Gaussian process. Since \(\Delta P_{t}\) is \(I(0)\), \(\Pi\) will be a zero matrix except when a linear combination of the variables in \(P_{t}\) is stationary. If \(\Pi\) has full rank, r = K (where K = n is the number of variables in \(P_{t}\)), the variables are stationary in levels, that is, they are not integrated. If rank(\(\Pi\)) = r = 0, all elements of the adjustment matrix are zero and therefore no linear combination of the variables is stationary. According to the Granger representation theorem (Engle and Granger 1987), when 0 < r < K there are r cointegrating vectors. For example, if rank(\(\Pi\)) = r = 1, there is a single cointegrating vector, that is, one stationary linear combination, and the coefficient matrix \(\Pi\) can be decomposed as \(\Pi = \alpha \beta^{\prime}\), where \(\alpha\) is the vector of loading factors and \(\beta\) is the cointegrating vector, so that \(\beta^{\prime}P_{t - 1}\) is \(I(0)\). The Johansen method estimates the \(\Pi\) matrix from an unrestricted VAR and tests whether the restrictions implied by the reduced rank of \(\Pi\) can be rejected. There are two tests for the reduced rank of \(\Pi\): the trace test and the maximum eigenvalue test. The trace statistic tests the null hypothesis that the number of distinct cointegrating vectors is less than or equal to r against a general alternative. The maximum eigenvalue statistic tests the null hypothesis that the number of cointegrating vectors is r against the alternative of r + 1.
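Both statistics are simple functions of the estimated eigenvalues of the reduced-rank problem. The following is a minimal sketch, not the authors' code: the function name, the eigenvalues and the sample size are made up for illustration, and the resulting values would still have to be compared with critical values tabulated for the chosen deterministic specification.

```python
import math

def johansen_statistics(eigenvalues, n_obs):
    """Return (trace, max_eig), each a list indexed by the null rank r = 0..K-1.

    trace[r]   = -T * sum_{i=r+1}^{K} ln(1 - lambda_i)   (H0: rank <= r)
    max_eig[r] = -T * ln(1 - lambda_{r+1})               (H0: rank = r vs r + 1)
    """
    lam = sorted(eigenvalues, reverse=True)      # largest eigenvalue first
    logs = [math.log(1.0 - l) for l in lam]
    k = len(lam)
    trace = [-n_obs * sum(logs[r:]) for r in range(k)]
    max_eig = [-n_obs * logs[r] for r in range(k)]
    return trace, max_eig

# Hypothetical eigenvalues for a bivariate system with 100 usable observations
trace, max_eig = johansen_statistics([0.20, 0.05], n_obs=100)
for r, (tr, mx) in enumerate(zip(trace, max_eig)):
    print(f"null rank {r}: trace = {tr:.2f}, max-eigenvalue = {mx:.2f}")
```

For the last null rank (r = K − 1) the two statistics coincide, because the trace sum then contains a single term.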
Causality tests from Johansen VECM
The existence of cointegration in a bivariate relationship implies Granger causality in at least one direction, which under certain restrictions can be tested within the framework of the Johansen VECM by a standard Wald test (Mosconi and Giannini 1992; Dolado and Lütkepohl 1996). The underlying principle is that if the \(\alpha\) matrix in the cointegration matrix (\(\Pi\)) has a complete column of zeros, then no causal relationship exists, because there is no cointegrating vector in that particular block. The pairwise causal relationship can be written as in Eq. (2):
$$\begin{bmatrix} \Delta P_{1,t} \\ \Delta P_{2,t} \end{bmatrix} = \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} + \sum\limits_{i = 1}^{k - 1} \begin{bmatrix} \Gamma_{i,11} & \Gamma_{i,12} \\ \Gamma_{i,21} & \Gamma_{i,22} \end{bmatrix} \begin{bmatrix} \Delta P_{1,t - i} \\ \Delta P_{2,t - i} \end{bmatrix} + \begin{bmatrix} \alpha_{1} \\ \alpha_{2} \end{bmatrix} \begin{bmatrix} \beta_{1} & \beta_{2} \end{bmatrix} \begin{bmatrix} P_{1,t - k} \\ P_{2,t - k} \end{bmatrix} + \begin{bmatrix} \omega_{1t} \\ \omega_{2t} \end{bmatrix}$$
(2)
In Eq. (2), the numeric subscripts refer to the markets. There are three possible cases of causality to be tested: (a) \(\alpha_{1} \ne 0\), \(\alpha_{2} \ne 0\); (b) \(\alpha_{1} = 0\), \(\alpha_{2} \ne 0\); and (c) \(\alpha_{1} \ne 0\), \(\alpha_{2} = 0\). The first case is bidirectional causality and the last two imply unidirectional causality. To see how the causality decision is interpreted, suppose \(\alpha_{1} = 0\). The error correction term, which is the third term on the right-hand side of the first equation of Eq. (2), is then eliminated, and the long-run solution to \(\Delta P_{1,t}\) will not be affected by deviations from the long-run equilibrium path defined by the cointegrating vector. In the same way, when \(\alpha_{2} = 0\), \(\Delta P_{1,t}\) will not cause \(\Delta P_{2,t}\).
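As a worked illustration of case (b), setting \(\alpha_{1} = 0\) in Eq. (2) and writing the two rows out separately gives the pair of equations below. This is only a restatement of Eq. (2) under that restriction, not an additional result:

$$\begin{aligned} \Delta P_{1,t} &= \mu_{1} + \sum\limits_{i = 1}^{k - 1} \left( \Gamma_{i,11} \Delta P_{1,t - i} + \Gamma_{i,12} \Delta P_{2,t - i} \right) + \omega_{1t}, \\ \Delta P_{2,t} &= \mu_{2} + \sum\limits_{i = 1}^{k - 1} \left( \Gamma_{i,21} \Delta P_{1,t - i} + \Gamma_{i,22} \Delta P_{2,t - i} \right) + \alpha_{2} \left( \beta_{1} P_{1,t - k} + \beta_{2} P_{2,t - k} \right) + \omega_{2t}. \end{aligned}$$

Only the second market responds to the lagged disequilibrium \(\beta_{1} P_{1,t - k} + \beta_{2} P_{2,t - k}\); the first market reacts to the short-run dynamics alone.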
Threshold cointegration
Early research concentrated on linear price cointegration, while subsequent research has moved to the regime-dependent nature of price relationships (Ihle and von Cramon-Taubadel 2008). The concept of threshold cointegration was first introduced by Balke and Fomby (1997) to account for nonlinear price dynamics induced by transaction costs (TC). It is now well understood in the literature that TC may inhibit price integration across spatially separated markets (for example, see Barrett and Li 2002; Fackler and Goodwin 2001; Goodwin and Piggott 2001; Abdulai 2000, 2002; Goodwin and Harper 2000). A threshold introduces nonlinearities into the functional relationship between the prices of market pairs (Tong 1990). Hansen and Seo (2002) proposed a SupLM test statistic with the null hypothesis of linear cointegration against the alternative hypothesis of threshold cointegration. Hansen and Seo (2002) note that price movements toward a long-run equilibrium might not occur in every time period, due to the presence of TC. Goodwin and Piggott (2001) used a threshold error correction model to estimate spatial integration in US corn and soybean markets. Ben-Kaabia and Jose (2007) estimated price transmission between vertical stages of the Spanish lamb market using a threshold model. Sanogo and Maliki (2010) analyzed integration between Nepalese and Indian rice markets by applying a threshold autoregressive model. One implicit assumption of linear cointegration models, such as Johansen and Juselius (1992) and Engle and Granger (1987), is that price adjustments induced by deviations from a long-term equilibrium are a continuous and linear function of the magnitude of the deviations.
In contrast, a threshold cointegration model that takes TC into account allows price adjustments to differ based on the magnitude of the deviations from a long-run equilibrium. The speed of price adjustment can also differ depending on whether deviations are above or below a specific threshold, which proxies the size of TC.
In Fig. 1, the price adjustment (\(\Delta P_{t}\)) is considered to be a function of deviations from a long-run equilibrium, which can be represented by a two-regime threshold vector error correction model (TVECM). We proceed by estimating the two-regime TVECM proposed by Meyer (2004), which is an extension of Hansen and Seo (2002). Pede and McKenzie (2005) take this approach to estimate market integration in Benin maize markets.
Following Hansen and Seo (2002), let \(p_{t}\) be a two-dimensional \(I(1)\) price series with one 2 × 1 cointegrating vector \(\beta\), and let \(w_{t} (\beta ) = \beta^{\prime}p_{t}\) denote the \(I(0)\) error correction term. Under a linear relationship, the vector error correction model (VECM) can be written as follows:
$$\Delta p_{t} = A^{\prime}P_{t - 1} (\beta ) + u_{t}$$
(3)
where
$$P_{t - 1} (\beta ) = \begin{pmatrix} 1 \\ w_{t - 1} (\beta ) \\ \Delta p_{t - 1} \\ \Delta p_{t - 2} \\ \vdots \\ \Delta p_{t - l} \end{pmatrix}$$
(4)
In Eq. (4), \(P_{t - 1} (\beta )\) is \(k \times 1\) and \(A\) is a \(k \times 2\) matrix of coefficients. The model assumes that the error term \(u_{t}\) is a vector martingale difference sequence with finite covariance matrix \(\Sigma = E(u_{t} u^{\prime}_{t})\). The term \(w_{t - 1}\) represents the error correction term obtained from the estimated long-term relationship between the two market prices. The two prices are simultaneously explained by deviations from the long-term equilibrium (the error correction term), the constant terms and the lagged short-term reactions to previous price changes. The parameters \((\beta ,A,\Sigma )\) are estimated by maximum likelihood (MLE) under the assumption that the errors \(u_{t}\) are independently and identically Gaussian.
A two-regime threshold cointegration model is given as:
$$\Delta p_{t} = \begin{cases} A^{\prime}_{1} P_{t - 1} (\beta ) + u_{t} & \text{if}\;\; w_{t - 1} (\beta ) \le \left| \gamma \right| \\ A^{\prime}_{2} P_{t - 1} (\beta ) + u_{t} & \text{if}\;\; w_{t - 1} (\beta ) > \left| \gamma \right| \end{cases}$$
(5)
where \(\gamma\) represents the threshold parameter. The model in Eq. (5) may also be written as:
$$\Delta p_{t} = A^{\prime}_{1} P_{t - 1} (\beta )d_{1t} (\beta ,\gamma ) + A^{\prime}_{2} P_{t - 1} (\beta )d_{2t} (\beta ,\gamma ) + u_{t}$$
(6)
where
$$d_{1t} \left( {\beta ,\gamma } \right) = 1\;{\text{if}}\;w_{t - 1} (\beta ) \le \left| \gamma \right|,\;{\text{and}}\;0\;{\text{otherwise}}$$
(7)
$$d_{2t} \left( {\beta ,\gamma } \right) = 1\;{\text{if}}\;w_{t - 1} (\beta ) > \left| \gamma \right|,\;{\text{and}}\;0\;{\text{otherwise}}$$
(8)
The coefficient matrices \(A_{1}\) and \(A_{2}\) govern the dynamics in the two regimes. Values of the error correction term, in relation to the level of the threshold parameter \(\gamma\) (in other words, whether \(w_{t - 1}\) is above or below \(\gamma\)), allow all coefficients except the cointegrating vector \(\beta\) to switch between these two regimes.
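The regime indicators can be sketched in a few lines. This is an illustrative sketch only: the function names and the error correction values are made up, `gamma` stands for whatever number appears on the right-hand side of the comparison in Eqs. (7) and (8), and the minimum regime share of 0.05 is an assumed default.

```python
def regime_indicators(w_lag, gamma):
    """Return (d1, d2): d1[t] = 1 if w_{t-1} <= gamma, d2[t] = 1 otherwise."""
    d1 = [1 if w <= gamma else 0 for w in w_lag]
    d2 = [1 - d for d in d1]            # the two regimes are mutually exclusive
    return d1, d2

def regime_share_ok(d1, pi0=0.05):
    """Check that regime 1 holds a share of observations in [pi0, 1 - pi0]."""
    share = sum(d1) / len(d1)
    return pi0 <= share <= 1.0 - pi0, share

# Made-up lagged error correction terms
w_lag = [-0.8, -0.3, 0.1, 0.4, 0.9, 1.2, -0.1, 0.6]
d1, d2 = regime_indicators(w_lag, gamma=0.5)
ok, share = regime_share_ok(d1)
print(d1, d2, ok, share)
```

If every observation fell into a single regime, one of the two indicator series would be identically zero and the coefficients of that regime could not be estimated.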
The threshold effect exists if \(0 < P(w_{t - 1} \le \left| \gamma \right|) < 1\); otherwise the model reduces to the linear cointegration form. We impose this constraint by assuming that \(\pi_{0} < P(w_{t - 1} (\beta ) \le \left| \gamma \right|) < (1 - \pi_{0} )\) and by setting the trimming parameter \(\pi_{0} > 0\) equal to 0.05 (Andrews 1993; see Footnote 1) in the empirical estimation. Further, we ensure that the indicator functions in Eqs. (7) and (8) contain enough sample variation for each choice of \(\gamma\). Under the assumption of iid Gaussian errors \(u_{t}\), the likelihood function of the model in Eq. (6) has the following form:
$$L_{n} (A_{1} ,A_{2} ,\beta ,\Sigma ,\gamma ) = - \frac{n}{2}\log \left| \Sigma \right| - \frac{1}{2}\sum\limits_{t = 1}^{n} {u_{t} } (A_{1} ,A_{2} ,\beta ,\gamma )^{\prime}\Sigma^{ - 1} u_{t} (A_{1} ,A_{2} ,\beta ,\gamma ),$$
(9)
where
$$u_{t} (A_{1} ,A_{2} ,\beta ,\gamma ) = \Delta p_{t} - A^{\prime}_{1} P_{t - 1} (\beta )d_{1t} (\beta ,\gamma ) - A^{\prime}_{2} P_{t - 1} (\beta )d_{2t} (\beta ,\gamma )$$
(10)
The MLE \(( \hat{A}_{1} ,\hat{A}_{2} ,\hat{\beta } ,\hat{\Sigma } ,\hat{\gamma } )\) is obtained by maximizing \(L_{n} (A_{1} ,A_{2} ,\beta ,\Sigma ,\gamma )\). This is achieved by first holding \((\beta ,\gamma )\) fixed and computing the constrained MLE for \((A_{1} ,A_{2} ,\Sigma )\) by OLS regression as follows:
$$\hat{A}_{1} (\beta ,\gamma ) = \left( {\sum\limits_{t = 1}^{n} {P_{t - 1} (\beta )P_{t - 1} (\beta )^{\prime}d_{1t} (\beta ,\gamma )} } \right)^{ - 1} \left( {\sum\limits_{t = 1}^{n} {P_{t - 1} (\beta )\Delta p^{\prime}_{t} d_{1t} (\beta ,\gamma )} } \right),$$
(11)
$$\hat{A}_{2} (\beta ,\gamma ) = \left( {\sum\limits_{t = 1}^{n} {P_{t - 1} (\beta )P_{t - 1} (\beta )^{\prime}d_{2t} (\beta ,\gamma )} } \right)^{ - 1} \left( {\sum\limits_{t = 1}^{n} {P_{t - 1} (\beta )\Delta p^{\prime}_{t} d_{2t} (\beta ,\gamma )} } \right),$$
(12)
$$\begin{aligned} & \hat{u}_{t} (\beta ,\gamma ) = u_{t} (\hat{A}_{1} (\beta ,\gamma ),\hat{A}_{2} (\beta ,\gamma ),\beta ,\gamma )\;{\text{and}} \\ & \hat{\Sigma }(\beta ,\gamma ) = \frac{1}{n}\sum\limits_{t = 1}^{n} {\hat{u}_{t} } (\beta ,\gamma )\hat{u}_{t} (\beta ,\gamma )^{\prime} \\ \end{aligned}$$
Equations (11) and (12) are the OLS regressions of \(\Delta p_{t}\) on \(P_{t - 1} (\beta )\) for the two subsamples where \(w_{t - 1} (\beta ) \le \gamma\) and \(w_{t - 1} (\beta ) > \gamma\). In the next step, the estimates \((\hat{A}_{1} ,\hat{A}_{2} ,\hat{\Sigma } )\) are used to obtain the concentrated likelihood
$$L_{n} (\beta ,\gamma ) = L_{n} \left( {\hat{A}_{1} (\beta ,\gamma ),\hat{A}_{2} (\beta ,\gamma ),\beta ,\hat{\Sigma }(\beta ,\gamma ),\gamma } \right) = - \frac{n}{2}\log \left| {\hat{\Sigma }(\beta ,\gamma )} \right| - \frac{np}{2}$$
(13)
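The constant \(- np/2\) in Eq. (13) comes from the quadratic form of the Gaussian log-likelihood. A brief sketch of the step, assuming that \(\hat{\Sigma }(\beta ,\gamma )\) is the average outer product of the residuals \(\hat{u}_{t} (\beta ,\gamma )\) and that \(p\) denotes the number of price series (here \(p = 2\); this reading of \(p\) is an assumption, since the symbol is not defined explicitly in the text):

$$\sum\limits_{t = 1}^{n} {\hat{u}^{\prime}_{t} \hat{\Sigma }^{ - 1} \hat{u}_{t} } = {\text{tr}}\left( {\hat{\Sigma }^{ - 1} \sum\limits_{t = 1}^{n} {\hat{u}_{t} \hat{u}^{\prime}_{t} } } \right) = {\text{tr}}\left( {\hat{\Sigma }^{ - 1} \cdot n\hat{\Sigma }} \right) = n\,{\text{tr}}(I_{p} ) = np$$

Multiplying by \(- 1/2\) gives the constant, so the only part of the concentrated likelihood that varies with \((\beta ,\gamma )\) is \(\log \left| {\hat{\Sigma }(\beta ,\gamma )} \right|\).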
The maximum likelihood estimator \((\hat{\beta } ,\hat{\gamma } )\) can be obtained by minimizing \(\log \left| {\hat{\Sigma }(\beta ,\gamma )} \right|\) subject to the normalization imposed on \(\beta\) and the constraint:
$$\pi_{0} \le n^{ - 1} \sum\limits_{t = 1}^{n} {1(p^{\prime}_{t} } \beta \le \gamma ) \le 1 - \pi_{0}$$
Hansen and Seo (2002) used a grid search algorithm to obtain the maximum likelihood estimates of \(\beta\) and \(\gamma\). The grid search algorithm is summarized as follows:

Step 1: Construct a grid on \(\left[ {\gamma_{L} ,\;\gamma_{U} } \right]\) and \(\left[ {\beta_{L} ,\;\beta_{U} } \right]\) based on the linear estimate of \(\beta\) and the constraint above.

Step 2: Calculate \(\hat{A}_{1} (\beta ,\gamma )\), \(\hat{A}_{2} (\beta ,\gamma ),\) and \(\hat{\Sigma }(\beta ,\gamma )\) for each value of \((\beta ,\gamma )\) on those grids.

Step 3: Find \((\hat{\beta } ,\hat{\gamma } )\) as the values of \((\beta ,\gamma )\) on those grids that minimize \(\log \left| {\hat{\Sigma }(\beta ,\gamma )} \right|\).

Step 4: Set \(\hat{\Sigma } = \hat{\Sigma }(\hat{\beta } ,\hat{\gamma } )\), \(\hat{A}_{1} = \hat{A}_{1} (\hat{\beta } ,\hat{\gamma } )\), \(\hat{A}_{2} = \hat{A}_{2} (\hat{\beta } ,\hat{\gamma } )\) and \(\hat{u}_{t} = \hat{u}_{t} (\hat{\beta } ,\hat{\gamma } )\) as the final estimates.
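The four steps can be sketched as follows. This is a deliberately stripped-down illustration under stated assumptions, not the procedure used for the empirical results: the only regressor is the lagged error correction term (no constant and no lagged differences), \(\beta\) is normalized as \((1, - \beta )\), the prices are simulated, and the grid values and function names are made up.

```python
import math
import random

def concentrated_criterion(p1, p2, beta, gamma, pi0=0.05):
    """Return log|Sigma_hat(beta, gamma)|, or None if the grid point is inadmissible."""
    n = len(p1) - 1
    # Lagged error correction term w_{t-1}(beta), with beta normalised as (1, -beta)
    w_lag = [p1[t] - beta * p2[t] for t in range(n)]
    dp = [(p1[t + 1] - p1[t], p2[t + 1] - p2[t]) for t in range(n)]
    low = [t for t in range(n) if w_lag[t] <= gamma]
    high = [t for t in range(n) if w_lag[t] > gamma]
    # Trimming constraint: the share of observations in regime 1 lies in [pi0, 1 - pi0]
    if not pi0 <= len(low) / n <= 1.0 - pi0:
        return None
    resid = [None] * n
    for regime in (low, high):
        sww = sum(w_lag[t] ** 2 for t in regime)
        if sww == 0.0:
            return None
        # Regime-specific OLS slope for each of the two price equations
        a = [sum(w_lag[t] * dp[t][j] for t in regime) / sww for j in (0, 1)]
        for t in regime:
            resid[t] = (dp[t][0] - a[0] * w_lag[t], dp[t][1] - a[1] * w_lag[t])
    # 2 x 2 residual covariance matrix and its determinant
    s11 = sum(u[0] * u[0] for u in resid) / n
    s22 = sum(u[1] * u[1] for u in resid) / n
    s12 = sum(u[0] * u[1] for u in resid) / n
    det = s11 * s22 - s12 * s12
    return math.log(det) if det > 0.0 else None

def grid_search(p1, p2, betas, gammas):
    """Steps 2 and 3: evaluate every grid point and keep the minimiser."""
    best = None
    for b in betas:
        for g in gammas:
            crit = concentrated_criterion(p1, p2, b, g)
            if crit is not None and (best is None or crit < best[0]):
                best = (crit, b, g)
    return best

# Simulated cointegrated prices with faster adjustment above a made-up threshold of 0.5
random.seed(0)
p1, p2 = [0.0], [0.0]
for _ in range(200):
    w = p1[-1] - p2[-1]
    speed = 0.1 if w <= 0.5 else 0.6
    p2.append(p2[-1] + random.gauss(0.0, 1.0))
    p1.append(p1[-1] - speed * w + random.gauss(0.0, 1.0))

betas = [0.8 + 0.05 * i for i in range(9)]     # Step 1: grid for beta
gammas = [-1.0 + 0.25 * i for i in range(9)]   # Step 1: grid for gamma
best = grid_search(p1, p2, betas, gammas)
print("criterion, beta, gamma:", best)
```

Step 4 then amounts to re-running the regime-wise regressions at the selected pair. A full implementation would use the complete regressor vector of Eq. (4), which requires solving a small linear system in each regime rather than a scalar division.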
In the empirical application, the grid search procedure is carried out with 130 grid points. Once \(\beta\) and \(\gamma\) have been estimated, the null hypothesis of linear cointegration is tested against the alternative of threshold cointegration by means of the supremum Lagrange multiplier (SupLM) test, following Andrews (1993) and Andrews and Ploberger (1994):
$${\text{Sup}}\;LM^{1} = \mathop {\sup }\limits_{{\gamma_{L} \le \gamma \le \gamma_{U} }} LM(\hat{\beta } ,\gamma )$$
Since the asymptotic distribution of the test statistic is nonstandard and cannot be tabulated in general, it is approximated by means of the residual bootstrap. In the empirical application, the bootstrap uses 5000 replications. The model under the null hypothesis is
$$\Delta p_{t} = A^{\prime} P_{t - 1} (\beta ) + u_{t}$$
and the model under the alternative hypothesis is \(\Delta p_{t} = A^{\prime}_{1} P_{t - 1} (\beta ) \cdot d_{1t} (\beta ,\gamma ) + A^{\prime}_{2} P_{t - 1} (\beta ) \cdot d_{2t} (\beta ,\gamma ) + u_{t}\).
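The bootstrap p-value is the share of replications in which the statistic computed on resampled data is at least as large as the observed one. The sketch below shows only that resampling logic. The callback `stat_under_null` is hypothetical: in an actual application it would rebuild the price series under the linear null from the resampled (two-dimensional) residuals and return the SupLM statistic, whereas here a simple placeholder statistic on scalar residuals stands in for it.

```python
import math
import random

def bootstrap_p_value(observed_stat, residuals, stat_under_null, n_boot=5000, seed=0):
    """Share of bootstrap replications whose statistic is >= the observed one."""
    rng = random.Random(seed)
    n = len(residuals)
    exceed = 0
    for _ in range(n_boot):
        # Residual bootstrap: draw n residuals with replacement
        draw = [residuals[rng.randrange(n)] for _ in range(n)]
        if stat_under_null(draw) >= observed_stat:
            exceed += 1
    return exceed / n_boot

def placeholder_stat(resampled):
    """Stand-in statistic (scaled absolute mean); NOT the SupLM statistic."""
    n = len(resampled)
    return abs(sum(resampled) / n) * math.sqrt(n)

# Made-up residuals and a made-up observed value of the statistic
rng = random.Random(1)
residuals = [rng.gauss(0.0, 1.0) for _ in range(50)]
p_value = bootstrap_p_value(1.5, residuals, placeholder_stat)
print("bootstrap p-value:", p_value)
```

The null of linear cointegration would be rejected at a given level when the p-value falls below that level.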
The empirical results presented in this article were estimated using R. We carried out the tests for all market pairs.