CLASSIFICATION TREE ANALYSIS AS APPLIED TO MENA REGIONS

E. Mine Cinar

Loyola University Chicago

email:mcinar@luc.edu

Key Words: Country Risk Scores, Middle East and North Africa, Decision Trees

JEL Codes: C11, C14, F37, G24

Abstract: IICR and ICRG country risk scores are examined in a decision tree framework to analyze groupings of countries, including those of the Middle East and North Africa.

INTRODUCTION

The purpose of this paper is to examine whether Middle East and North Africa (MENA) countries form a distinct financial risk group, based on financial predictor variables and country risk scores. Country risk scores are calculated internally by banks and other financial institutions to assess the various risks facing international capital flows to countries around the world. What exactly determines each score depends on the perspective of the institution doing the analysis and is kept confidential. The resulting risk scores are published or distributed to clients of the firm.

The scores are calculated as a weighted basket of both financial and non-financial risk. A large category of non-financial risk is political risk, which includes a government takeover of the investment as well as other factors. Other factors may include consumers’ preference, if strong, for domestic goods, legal and political infrastructure for foreign capital, the probability of war or political instability and obstruction of financial transfers. Financial risk variables are also very important in composing the risk indexes. The health of the balance of payments accounts, the possibility of recessions, the health of the banking system, the growth potential of GNP, regulatory and financial environment are some of the variables considered in calculating financial risk.

The use of country risk scores is more an art than a science, especially since the scores are on a continuous scale (from zero to 100 in theory). Bhalla (1983) promoted the use of financial versus socio-political risk matrices, which are still used by some fund managers. The clustering of lower risk countries (in both political and economic terms) has been one of the primary reasons why the majority of financial flows in the world are concentrated in a few developed countries (the EU countries) and in a few developing ones (Mexico, India and China). A previous study examined the relationship between international poverty and country risk scores (Cinar, 2000); this study concentrates on whether the MENA region exhibits special country risks when compared with other regions of the world.

Many services measure country risk, including Bank of America World Information Services, Business Environment Risk Intelligence (BERI) S.A., Control Risks Information Services (CRIS), Economist Intelligence Unit (EIU), Euromoney, Institutional Investor, Standard & Poor's Rating Group (S&P), Political Risk Services: International Country Risk Guide (ICRG), Political Risk Services: Coplin-O'Leary Rating System, and Moody's Investors Service. Each of these institutions provides qualitative and quantitative information as well as a single index or rating. The two most prominent are the Institutional Investor (IICR) and the International Country Risk Guide (ICRG). Institutional Investor credit ratings are based on a survey of leading international bankers, who are asked to rate each country on a scale from 0 to 100 (100 = best credit rating); the ratings are then averaged, with larger weights assigned to institutions with large world-wide exposures. Each institution uses its own internal formula to rank the importance of different factors affecting the countries. Erb et al. (1996) provide a cross-country comparison of S&P and Moody's ratings with the IICR and ICRG ratings as of October 1995. They find that S&P and Moody's ratings have a close correspondence with the IICR credit-risk measure (rank order correlation of 95 percent) and with the ICRG financial rating (rank order correlation of 90 percent), while the correlations are weaker for the other measures. They conclude by noting that the ICRG composite index contains considerable information for forecasting risk-adjusted returns on the portfolios they construct. Erb et al. (1995) also find that country credit ratings have substantial predictive power in differentiating between high and low expected portfolio returns and are very effective tools. Following their studies, this study examines IICR and ICRG credit ratings with respect to MENA countries. Data in the learning sample are available on these two scores (as well as a set of financial and other variables) for 63 countries.
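The exposure-weighted averaging behind a survey-based rating can be illustrated with a minimal sketch. The actual weights used by Institutional Investor are not published in this paper, so the ratings and weights below are purely hypothetical, as is the function name.

```python
def weighted_rating(ratings, weights):
    """Exposure-weighted mean of survey ratings on a 0-100 scale."""
    if len(ratings) != len(weights) or not ratings:
        raise ValueError("ratings and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight


# Hypothetical survey: three banks rate one country. The bank with the
# largest exposure (weight 3.0) pulls the average toward its own rating.
hypothetical_ratings = [40.0, 50.0, 60.0]
hypothetical_weights = [1.0, 1.0, 3.0]
score = weighted_rating(hypothetical_ratings, hypothetical_weights)
unweighted = weighted_rating(hypothetical_ratings, [1.0, 1.0, 1.0])
```

With these illustrative numbers the weighted score lies above the simple average, because the most exposed institution gave the highest rating.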

STUDY METHOD: CLASSIFICATION TREES, A NON-PARAMETRIC APPROACH

Country risk scores do not satisfy the stringent theoretical and distributional assumptions (such as white noise errors and normality) of more traditional methods. Instead, a non-parametric method from artificial intelligence is used in this study: classification trees are used to predict high versus low risk countries, or MENA versus non-MENA countries, using country-based predictor variables.

Classification tree analysis is one of the main techniques used in data mining. The analysis is related to discriminant analysis and cluster analysis, and belongs to nonparametric statistics with nonlinear estimation. The decision process used by classification trees provides an efficient method for sorting countries into high versus low risk, given the black box formulations which are used in calculating the country risk scores.

Currently, classification tree analysis is widely used in applied fields such as diagnosis in medicine, data structures in computer science and decision theory in psychology. Classification trees can be quite complex, and graphical procedures are used to interpret them visually. They are hierarchical in nature (Breiman et al., 1984) and are more flexible than traditional (such as discriminant or regression) analysis. They examine the effects of the predictor variables one at a time, rather than just all at once, and the predictor variables can be a combination of continuous and categorical variables when univariate splits are used. This frees the analysis from stringent assumptions such as non-collinearity and homoskedasticity of the predictor variables.

This study uses the algorithms used in CART, as described by Breiman et al. (1984). CART (Classification And Regression Trees) is a classification tree program that uses an exhaustive grid search of all possible univariate splits to find the splits for a classification tree. CART searches can be lengthy when there are a large number of predictor variables with many levels, and the search is biased toward choosing predictor variables with more levels for splits, but because it is exhaustive, it is guaranteed to find, at each node, the univariate split that most improves the classification. The trees are constructed so that terminal nodes, the points on the tree which show terminal decisions, lie at the ends of the branches. The trees start with the top decision node, which is called the root node. The root node is split, forming two new nodes. The numbers below the root node show the value of the predictor variable at which the split occurs. When univariate splits are performed, the predictor variables can be ranked for their potential importance in determining the classification of the dependent variable.

The purpose of classification tree analysis is to obtain the most accurate classification/prediction, where the most accurate prediction is defined as the prediction with the minimum costs. With equal misclassification costs, the best prediction is therefore the one with the lowest misclassification rate. The relationships between prior probabilities, misclassification costs, and case weights, which can become quite complex, can be found in Breiman et al. (1984) and Ripley (1996). A very important part of classification tree analysis is to select the node splits on the predictor variables which are used to predict the categorical dependent variable. Due to the hierarchical nature of classification trees, these splits are selected one at a time, starting with the split at the root node. CART uses an exhaustive search for univariate splits. With this method, all possible splits for each predictor variable at each node are examined to find the split producing the largest improvement in goodness of fit, which is measured by the Gini measure of node impurity. This measure reaches a value of zero when only one class is present at a node. The tree is stopped and pruned by the rules given in Breiman et al. (1984) for selecting the "right-sized" tree. Resubstitution costs (e.g., the misclassification rate in the learning sample) rather consistently decrease as tree size increases. The Bayesian decision rule, splitting rules and the cost function are not reported here and can be obtained from the author.
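The exhaustive univariate search described above can be sketched in a few lines. This is a minimal illustration, not the CART program used in the study: it assumes numeric predictors, equal priors and misclassification costs, and candidate thresholds placed at midpoints between adjacent distinct values. The function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2; zero when only one class is present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Exhaustively search every predictor and every candidate threshold.

    rows   : list of equal-length lists of numeric predictor values
    labels : list of class labels, one per row
    Returns (improvement, predictor_index, threshold) for the rule
    'value < threshold' giving the largest decrease in Gini impurity,
    or None when no predictor has two distinct values.
    """
    n = len(labels)
    parent_impurity = gini(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Assumption: thresholds are midpoints between adjacent distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] < threshold]
            right = [y for row, y in zip(rows, labels) if row[j] >= threshold]
            child_impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            improvement = parent_impurity - child_impurity
            if best is None or improvement > best[0]:
                best = (improvement, j, threshold)
    return best


# Hypothetical toy data: predictor 0 separates the classes, predictor 1 does not.
toy_rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
toy_labels = ["a", "a", "b", "b"]
improvement, predictor, threshold = best_split(toy_rows, toy_labels)
```

In this toy case the search selects predictor 0, whose split leaves both child nodes pure. Growing a full tree would apply the same search recursively to each child node and then prune, which is omitted here.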

The ability to predict out of sample is the true ‘test’ for any statistical process. Classification trees are no exception. The trees grown in this study were also applied to out-of-sample data for 1996-1998. For parsimony, those results are summarized in the text. The learning sample results are provided in detail below.

DATA AND CLASSIFICATION TREES ON INTERNATIONAL COUNTRY RISK AND MENA COUNTRIES

Learning sample data come from World Development Report 2000/2001. Using the 2000 data, countries were partitioned into categories of ‘risky’ versus ‘non-risky’ based on the values of the scores. ICRG and IICR scores were considered to denote ‘risky’ if they were 70 or below. The MENA countries for which the scores are available are listed in Table 1.

TABLE 1: COUNTRY RISK SCORES OF MENA COUNTRIES

COUNTRY	ICRG	IICR
EGYPT	70.5	45.4
IRAN	66.3	29.2
ISRAEL	69.8	57.6
JORDAN	70.8	38.6
LEBANON	58.5	35.0
TUNISIA	73.5	49.7
TURKEY	54.4	39.0
YEMEN	62.5	34.0*
Mean-MENA	65.8	41.1
Mean-all data	66.2	47.6
Range-all data	50.3-89.3	11.1-93.8

* Yemen’s IICR is from IFC Statistics

From Table 1, none of the countries can be considered ‘high risk’ when we look at ICRG scores, where high risk is defined as a score under 50. However, when we adjust the definition of risk to scores of 70 or below, Iran, Israel, Lebanon, Turkey and Yemen are all included in the ‘risky’ group. When IICR scores are similarly partitioned into ‘risky’ and ‘very risky’ groups, all MENA countries listed in the table fall into the risky group, and all except Israel also fall into the very risky group.
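The partition described above can be reproduced directly from the Table 1 scores. The sketch below assumes the ‘risky’ cutoff of 70 or below stated in the text and the ‘very risky’ cutoff of an IICR score below 50 given in the variable list. The variable names are illustrative only.

```python
# (ICRG, IICR) scores transcribed from Table 1.
mena_scores = {
    "EGYPT": (70.5, 45.4),
    "IRAN": (66.3, 29.2),
    "ISRAEL": (69.8, 57.6),
    "JORDAN": (70.8, 38.6),
    "LEBANON": (58.5, 35.0),
    "TUNISIA": (73.5, 49.7),
    "TURKEY": (54.4, 39.0),
    "YEMEN": (62.5, 34.0),
}

RISKY_CUTOFF = 70.0       # 'risky' if the score is 70 or below (text definition)
VERY_RISKY_CUTOFF = 50.0  # 'very risky' if IICR is below 50 (variable list)

risky_icrg = sorted(c for c, (icrg, _) in mena_scores.items() if icrg <= RISKY_CUTOFF)
risky_iicr = sorted(c for c, (_, iicr) in mena_scores.items() if iicr <= RISKY_CUTOFF)
very_risky_iicr = sorted(
    c for c, (_, iicr) in mena_scores.items() if iicr < VERY_RISKY_CUTOFF
)

# Regional means, for comparison with the 'Mean-MENA' row of Table 1.
mean_icrg = sum(icrg for icrg, _ in mena_scores.values()) / len(mena_scores)
mean_iicr = sum(iicr for _, iicr in mena_scores.values()) / len(mena_scores)
```

Under these cutoffs five of the eight countries are risky on the ICRG score, all eight are risky on the IICR score, and only Israel escapes the very risky IICR group.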

Classification trees were run on ‘risky’ ICRG and both ‘risky’ and ‘very risky’ IICR scores. The data set comprised 63 complete sets of observations on countries. Of these, 8 were in MENA, 14 were ‘developed’, 9 were in Latin America, 12 were in Asia, 6 were in Africa and 14 were Eastern European (post-Soviet bloc) countries. Other country-specific variables used in the study for the 63 countries were as follows. Socio-economic variables included the percentage of children in malnutrition, female and male life expectancies, literacy rates, population, percentage of women in the labor force, percentage of people in total poverty, percentage of the government budget spent on education, health and social services, and per capita expenditures on social services. Financial variables included government budget deficit (percentage and per capita), interest rate spreads (between borrowing and lending), size of domestic financial credit provided by the banking sector with respect to GDP, ICRG risk rating, IICR risk rating, foreign direct investment, international private net capital flows, GDP, GDP per capita, reserves, external debt, and external debt per capita. Regional proxies were for MENA, Latin America, Asia, Africa, Eastern Europe, and developed countries. The variables used in the modeling are listed at the end of the paper.

Three different models were tested in this paper. The first model (Model A) was run to examine whether the MENA countries can be classified as a group by themselves with the relevant predictor variables (IRT99, ICRG300, IICR300, GOVDEF98, BADR270, PRIVCR99, BADR250, GSOCICAP, BADR170), which included categories of risk levels. For parsimony, only the most relevant classification trees are reported in the analysis, and the results are summarized in the tables. Node information, the actual trees and the relevant statistics can be requested from the author. The second model (Model B) used the same set of variables but without the categories of risk levels. Model C added financial flows and population to Model B. Table 2 below summarizes the results and lists the most important predictor variables.

TABLE 2: SUMMARY OF CLASSIFYING MENA COUNTRIES

MODEL	Number of nodes	Primary (P), secondary (S) splitters and node values	Percentage of MENA countries correctly classified	Percentage of non-MENA countries correctly classified	The top three most important predictor variables
A	3	(P)-ICRG < 73.7 and (S)-IRT < 7.55%	100%	76.4%	IRT, ICRG, IICR
B	3	(P)-ICRG < 73.7 (S)-IRT-	100%	76.4%	IRT, ICRG, GSOCICAP
C	3	(P)-FORI < $1,893	100%	49.1%	FORI, ICRG, IICR

This analysis is able to build very short, parsimonious trees in which 100 % of all MENA countries can be classified on the basis of their ICRG scores, interest rate spreads and the direct foreign investment that came in the year before. Hence, values of country risk scores do matter.

Trees in the first and second models give the same results. Therefore, adding the ‘high risk’ categories to the predictor variable set does not make a difference. Many non-MENA countries are also (wrongly) predicted to be in the same class as the MENA countries. Hence, based on these predictor variables, the macro indicators show that the MENA region is not unique: many other countries exhibit the same macro split nodes.
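The class-wise percentages in Table 2 can be checked against the sample sizes given in the text (8 MENA and 63 - 8 = 55 non-MENA countries). The counts of correctly classified non-MENA countries below are not reported in the paper; they are inferred here as the whole numbers that round to the reported percentages, and should be read as an assumption.

```python
def class_rates(n_mena, n_non_mena, mena_correct, non_mena_correct):
    """Percentage of each class that a tree classifies correctly."""
    return (
        100.0 * mena_correct / n_mena,
        100.0 * non_mena_correct / n_non_mena,
    )


N_MENA = 8
N_NON_MENA = 63 - N_MENA  # 55 non-MENA countries in the learning sample

# Inferred counts (assumption): 42 of 55 for Models A and B, 27 of 55 for Model C.
rates_ab = class_rates(N_MENA, N_NON_MENA, 8, 42)
rates_c = class_rates(N_MENA, N_NON_MENA, 8, 27)

# Non-MENA countries placed in the MENA class under each reading.
misclassified_ab = N_NON_MENA - 42
misclassified_c = N_NON_MENA - 27
```

Read this way, roughly a quarter of the non-MENA countries fall on the MENA side of the split in Models A and B, and about half do so in Model C, which is the sense in which the MENA split is not distinctive.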

These results are further reinforced when the analysis is turned around and the country risk scores are predicted by using the different regions, including MENA, as predictor variables. These results are summarized in Table 3 below. Again, the decision tree information can be obtained from the author.

TABLE 3: SUMMARY OF CLASSIFYING HIGH RISK COUNTRIES

MODEL	Number of nodes	Primary (P), secondary (S) splitters and node values	Percentage of high risk countries correctly classified	Percentage of low risk countries correctly classified	The top three most important predictor variables
D (using ICRG risk)		(P)-developed countries=1 (S)-Latin America=1	84.6%	56.8%	Developed, Latin America
E (using IICR risk)		(P)-developed countries=1	97.8%	88.2%	Developed, Latin America, Eastern Europe

This table reinforces the findings of Table 2 above: the MENA region countries do not form a distinct group. The primary splits are determined by the ‘developed’ countries, followed by Latin American countries. In terms of importance, Latin American and then Eastern European countries follow developed countries. Hence, if there is any distinct grouping of regions, it is those above, rather than the MENA region.

Other models, which used financial predictor variables to explain the ICRG and IICR scores, were also run but are not reported here. Suffice it to say that when MENA was added as an explanatory variable in these models, it contributed nothing to the classification trees that were grown.

TESTS OF THE MODEL: OUT-OF-SAMPLE PREDICTIONS

The above trees used what is called the ‘learning sample’ to grow the models. The ability to predict out of sample is very important. The same models were applied to the data set which appeared in World Development Report, 1999-2000, which reported March 1999 country risk scores and 1998 financial statistics. There were 66 countries in the out-of-sample data. The summary tables below give the prediction rates for the older data set.

TABLE 4: OUT OF SAMPLE PREDICTION RATES FOR TREES REPORTED ON TABLE 2: CLASSIFYING MENA COUNTRIES

MODEL	Percentage of MENA countries correctly classified	Percentage of non-MENA countries correctly classified
Using Model A	55.6 %	91.2 %
Using Model B	100 %	96.5 %
Using Model C	100 %	98.2 %

The first out-of-sample prediction (which models MENA countries with respect to levels of high risk and financial variables) gives results different from those reported in Table 2. Almost half of the MENA countries are lumped into the non-MENA group. Again, this shows that the data are fuzzy about distinctions between MENA countries and other regions of the world. The other two models reported in the table show very good predictive power in separating the MENA and non-MENA groups. This result did not appear in the learning sample.

The same regional analysis reported in Table 3 was also applied to the out of sample group. Table 5 below gives the summary results.

TABLE 5: OUT OF SAMPLE PREDICTION RATES FOR TREES REPORTED ON TABLE 3-CLASSIFYING HIGH RISK COUNTRIES

MODEL	Percentage of high risk countries correctly classified	Percentage of low risk countries correctly classified
Using Model D	75.0%	73.1 %
Using Model E	96.2 %	92.9 %

Again, the proxy for MENA was found to be an unimportant node in classifying high versus low risk countries as in the learning sample. The primary splitting nodes remained the same, reinforcing the learning sample results.

CONCLUSIONS

The threshold values for node splitting in the first three trees indicated where MENA nations cluster, and showed that many non-MENA countries were consistently misclassified as MENA, regardless of the model. The latter two trees reported for the learning sample, which examined the two country risk scores, also showed that the MENA proxy is not a good predictor variable. This finding was repeated in the out-of-sample predictions. If there is any clustering of regions, it is among developed countries, Latin America or post-Soviet Eastern Europe. In summary, this paper does not find strong classifiers for MENA countries when examining country risk scores or other financial variables.

Hence, the variation among MENA economies is once more evident from this study: MENA nations have cultural ties but do not necessarily form a uniform economic unit.

BIBLIOGRAPHY

Baird, J. (1983), “Where Did Country Risk Analysis Go Wrong?”, Institutional Investor, May, p. 227-229.

Barro, R., (1994) 'Recent Research on Economic Growth' in National Bureau of Economic Research Reporter, Summer, p. 6-10.

Bhalla, B. (1983), “How Corporations Should Weigh Up Country Risk”, Euromoney, June, p. 66-72.

Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984), Classification and Regression Trees, Wadsworth International Group, Belmont, California.

Burton, F.N., and H. Inoue (1983), “Country Risk Evaluation Methods: A Survey of Systems in Use”, The Banker, January, p. 41-43.

Cinar, E. M., (2000), “Poverty Classification Trees and Threshold Levels of International Capital Flows”, paper presented in Association of International Business Conference, Tucson, Arizona, Fall.

Davis, R.R. (1981), “Alternative Techniques for Country Risk Evaluation”, Business Economics, May, p. 34-41.

Denison, D. (1984), “A Pragmatic Model for Country Risk Analysis”, Journal of Commercial Bank Lending, March, p. 28-37.

Divecha, A. B., J. Drach, and D. Stefek (1992), “Emerging Markets: A Quantitative Perspective”, Journal of Portfolio Management, Fall, p. 41-50.

Erb, C. B., C. R. Harvey, and T. E. Viskanta (1995), “Country Risk and Global Equity Selection”, Journal of Portfolio Management, Winter, p. 74-79.

Erb, C. B., C. R. Harvey, and T. E. Viskanta (1994), “National Risk in Global Fixed-Income Allocation”, Journal of Fixed Income, September, p. 17-26.

Erb, C. B., C. R. Harvey, and T. E. Viskanta (1996), “Political Risk, Economic Risk and Financial Risk”, Financial Analysts Journal, 52, 6, Nov/Dec, p. 29-46.

Harvey, C. R. (1994), “The Risk Exposure of Emerging Market Investments”, World Bank Economic Review,

Keppler, A. M. (1991), “The Importance of Dividend Yields in Country Selection”, Journal of Portfolio Management, Winter, p. 24-29.

Leavy, B. (1984), “Assessing Country Risk for Foreign Investment Decisions”, Long Range Planning, v. 17, 3, p. 141-151.

Ripley, B. D. (1996), Pattern recognition and neural networks, Cambridge University Press, New York.

Saini, K. G. and P. S. Bates (1984), “A Survey of Quantitative Approaches to Country Risk Analysis”, Journal of Banking and Finance, June, p. 341-355.

Speidell, L. S., and R. Sappenfield (1992), “Global Diversification in a Shrinking World”, Journal of Portfolio Management, Fall, p. 57-67.

Steinberg, D. and P. Colla (1997), CART-Classification and Regression Trees, Salford Systems, San Diego, California.

World Bank, (1999) World Development Report 1999-2000, published for the World Bank by Oxford University Press, New York.

World Bank, (2000/2001) World Development Report 2000/2001; Attacking Poverty, published for the World Bank by Oxford University Press, New York.

LIST OF VARIABLES USED IN CONSTRUCTING TREES

GOVDEF98: government budget deficit of 1998

GSOCICAP: per capita government expenditures on social programs

IRT99: interest rate differential between borrowing and lending for 1999

PRIVCR99: the size of the private credit market with respect to GDP, 1999

ICRG300: ICRG risk score for the country, March 2000

BADR170: Categorical variable, assigning the country a ‘high risk’ score if the ICRG score is less than 70.

IICR300: IICR country risk rating, 1/2000

BADR250: Categorical variable, assigning the country a ‘high risk’ score if the IICR score is less than 70.

BADR270: Categorical variable, assigning the country a ‘very high risk’ score if the IICR score is less than 50.

MENA: Categorical variable, assigning the country to the MENA region.

TABLE 1: COUNTRY RISK SCORES OF MENA COUNTRIES

COUNTRY	ICRG	IICR
EGYPT	70.5	45.4
IRAN	66.3	29.2
ISRAEL	69.8	57.6
JORDAN	70.8	38.6
LEBANON	58.5	35
TUNISIA	73.5	49.7
TURKEY	54.4	39
YEMEN	62.5	34*
Mean for MENA countries	65.78	41.06
Mean for the data set	66.23	47.57
Range for the data set	50.3 (Indonesia)- 89.3 (Norway)	11.1 (Georgia)- 93.8 (Sweden)

* Yemen’s IICR is from IFC Statistics.