CLASSIFICATION TREE ANALYSIS AS APPLIED TO MENA REGIONS

 

 

E. Mine Cinar

Loyola University Chicago

Email: mcinar@luc.edu

 

Key Words:  Country Risk Scores, Middle East and North Africa, Decision Trees

 

JEL Codes: C11, C14, F37, G24

 

Abstract: IICR and ICRG country risk scores are examined in a decision tree framework to analyze groupings of countries, including those of the Middle East and North Africa.

 

INTRODUCTION

The purpose of this paper is to examine whether Middle East and North Africa (MENA) countries form a distinct financial risk group, based on financial predictor variables and country risk scores. Country risk scores are calculated internally by banks and other financial institutions to assess the various risks facing international capital flows to countries around the world. Exactly what determines each score depends on the perspective of the institution doing the analysis, and the methodology is kept confidential. The resulting risk scores are published or distributed to the firm's clients.

The scores are calculated as a weighted basket of both financial and non-financial risks. A large category of non-financial risk is political risk, which includes a government takeover of the investment as well as other factors. These other factors may include a strong consumer preference for domestic goods, the legal and political infrastructure for foreign capital, the probability of war or political instability, and obstruction of financial transfers. Financial risk variables are also very important in composing the risk indexes. The health of the balance of payments accounts, the possibility of recessions, the health of the banking system, the growth potential of GNP, and the regulatory and financial environment are some of the variables considered in calculating financial risk.

The use of country risk scores is more an art than a science, especially since the scores lie on a continuous scale (in theory from 0 to 100). Bhalla (1983) promoted the use of financial versus socio-political risk matrices, which are still used by some fund managers. The clustering of lower-risk countries (in both political and economic terms) has been one of the primary reasons why the majority of the world's financial flows are concentrated in a few developed countries (the EU countries) and a few developing ones (Mexico, India and China). A previous study examined the relationship between international poverty and country risk scores (Cinar, 2000); this study concentrates on whether the MENA region exhibits distinctive country risks when compared with other regions of the world.

Many services measure country risk, including Bank of America World Information Services, Business Environment Risk Intelligence (BERI) S.A., Control Risks Information Services (CRIS), Economist Intelligence Unit (EIU), Euromoney, Institutional Investor, Standard & Poor's Rating Group (S&P), Political Risk Services: International Country Risk Guide (ICRG), Political Risk Services: Coplin-O'Leary Rating System, and Moody's Investors Service. Each of these institutions provides qualitative and quantitative information as well as a single index or rating. The two most prominent are the Institutional Investor credit rating (IICR) and the International Country Risk Guide (ICRG). Institutional Investor credit ratings are based on a survey of leading international bankers, who are asked to rate each country on a scale from 0 to 100 (100 = best credit rating); the ratings are then averaged, with larger weights assigned to institutions with large worldwide exposures. Each institution uses its own internal formula to weight the importance of the different factors affecting the countries. Erb et al. (1996) provide a cross-country comparison of S&P and Moody's ratings with the IICR and ICRG ratings as of October 1995. They find that S&P and Moody's ratings correspond closely with the IICR credit-risk measure (rank-order correlation of 95 percent) and with the ICRG financial rating (rank-order correlation of 90 percent), and that the correlations are weaker for the other measures. They conclude by noting that the ICRG composite index contains considerable information for forecasting risk-adjusted returns on the portfolios they construct. Erb et al. (1995) also find that country credit ratings have substantial predictive power in differentiating between high and low expected portfolio returns and are very effective tools. Following their studies, this study examines IICR and ICRG credit ratings with respect to MENA countries.
Data in the learning sample are available on these two scores (as well as a set of financial and other variables) for 63 countries.

STUDY METHOD: CLASSIFICATION TREES, A NON-PARAMETRIC APPROACH

Country risk scores do not satisfy the stringent theoretical and distributional assumptions (such as white-noise errors and normality) of more traditional methods. Instead, this study uses a non-parametric method from artificial intelligence: classification trees are used to predict high- versus low-risk countries, or MENA versus non-MENA countries, from country-based predictor variables.

Classification tree analysis is one of the main techniques used in data mining. It is related to discriminant analysis and cluster analysis, but it is a nonparametric method with nonlinear estimation. Because country risk scores are calculated with black-box formulations, the decision process used by classification trees provides an efficient method for sorting countries into high versus low risk.

Classification tree analysis is currently widely used in applied fields such as diagnosis in medicine, data structures in computer science and decision theory in psychology. Classification trees can be quite complex, and graphical procedures are used to interpret them visually. They are hierarchical in nature (Breiman et al., 1984) and are more flexible than traditional methods such as discriminant or regression analysis. They examine the effects of the predictor variables one at a time rather than all at once, and when univariate splits are used the predictor variables can be a mix of continuous and categorical variables. This frees the analysis from stringent assumptions on the predictor variables, such as non-collinearity and homoskedasticity. This study uses the algorithms of CART, as described by Breiman et al. (1984). CART (Classification And Regression Trees) is a classification tree program that uses an exhaustive grid search of all possible univariate splits to find the splits for a classification tree. CART searches can be lengthy when there are many predictor variables with many levels, and the search is biased toward choosing predictor variables with more levels for splits; but because it is exhaustive, it is guaranteed to find the best univariate split at each node. The tree starts with the top decision node, called the root node. The root node is split, forming two new nodes, and the numbers below the root node give the value of the predictor variable at which the split occurs. Splitting continues down to the terminal nodes, which are the points on the tree where the final classification decisions are made. When univariate splits are performed, the predictor variables can be ranked by their potential importance in determining the classification of the dependent variable.
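The exhaustive univariate search can be illustrated with a minimal Python sketch (not the CART program itself) that enumerates the candidate splits such a search would examine for one predictor. Two conventions are assumptions of the sketch: continuous thresholds are placed at midpoints between consecutive distinct values, and a categorical predictor with k levels is split into every distinct two-group partition, of which there are 2^(k-1) - 1. The function names are hypothetical. The exponential growth of the categorical count is one way to see why the search is lengthy and why it favors predictors with many levels: they simply offer more chances to find a good-looking split.

```python
from itertools import combinations


def continuous_split_points(values):
    """Candidate thresholds for a continuous predictor.

    Assumption: thresholds sit at midpoints between consecutive distinct
    sorted values, so n distinct values yield n - 1 candidate splits.
    """
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def categorical_splits(levels):
    """All distinct binary partitions of a categorical predictor's levels.

    A partition and its mirror image (left/right swapped) are the same split,
    so the first level is always kept on the left, and at least one level is
    kept on the right. k levels therefore give 2**(k-1) - 1 splits.
    """
    levels = sorted(set(levels))
    first, rest = levels[0], levels[1:]
    splits = []
    for r in range(len(rest)):  # r < len(rest) leaves the right side non-empty
        for combo in combinations(rest, r):
            splits.append({first, *combo})
    return splits


# ICRG scores of the eight MENA countries in Table 1: 8 distinct values -> 7 thresholds.
icrg = [70.5, 66.3, 69.8, 70.8, 58.5, 73.5, 54.4, 62.5]
print(len(continuous_split_points(icrg)))  # 7

# Hypothetical: if the six regions were coded as ONE categorical variable,
# it would already offer 31 candidate partitions.
regions = ["MENA", "Developed", "Latin America", "Asia", "Africa", "Eastern Europe"]
print(len(categorical_splits(regions)))  # 31
```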

The purpose of classification tree analysis is to obtain the most accurate classification or prediction, where the most accurate prediction is defined as the one with the minimum expected misclassification cost. When all misclassification costs are equal, the best prediction is therefore the one with the lowest misclassification rate. The relationships between prior probabilities, misclassification costs and case weights, which can become quite complex, are discussed in Breiman et al. (1984) and Ripley (1996). A very important part of classification tree analysis is selecting the node splits on the predictor variables that are used to predict the categorical dependent variable. Because of the hierarchical nature of classification trees, these splits are selected one at a time, starting with the split at the root node. CART uses an exhaustive search for univariate splits: all possible splits for each predictor variable at each node are examined to find the split producing the largest improvement in goodness of fit, measured by the Gini index of node impurity. The Gini index reaches a value of zero when only one class is present at a node. The tree is stopped and pruned by the rules given in Breiman et al. (1984) for selecting the "right-sized" tree. Resubstitution costs (e.g., the misclassification rate in the learning sample) decrease fairly consistently as tree size increases, so they cannot by themselves be used to choose the size of the tree. The Bayesian decision rule, splitting rules and cost function are not reported here and can be obtained from the author.
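The split selection described above can be sketched for a single continuous predictor. This is a minimal illustration, not CART's implementation: it assumes equal priors and unit misclassification costs (so class proportions are simply node frequencies), places candidate thresholds at midpoints, and the helper names `gini` and `best_threshold_split` are hypothetical. As a toy check only, it is run on the Table 1 ICRG scores labeled with the paper's "70 or below" rule, so the search should recover a threshold between 69.8 and 70.5 and remove all impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_j p_j**2 of a node; zero when only one class is present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold_split(x, y):
    """Exhaustive search over one continuous predictor.

    Every midpoint between consecutive distinct x values is tried. The
    improvement is the parent's Gini impurity minus the size-weighted
    impurity of the two child nodes (x < t goes left, x >= t goes right).
    Returns (threshold, improvement) for the best split found.
    """
    n = len(y)
    parent = gini(y)
    best_t, best_gain = None, 0.0
    xs = sorted(set(x))
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain


# Toy check: Table 1 ICRG scores, labeled with the "70 or below" risky rule.
icrg = [70.5, 66.3, 69.8, 70.8, 58.5, 73.5, 54.4, 62.5]
risky = [s <= 70 for s in icrg]
print(best_threshold_split(icrg, risky))  # threshold between 69.8 and 70.5, gain 0.46875
```

With 5 risky and 3 non-risky countries the parent impurity is 1 - (5/8)^2 - (3/8)^2 = 0.46875, and a perfect split reduces both children to zero impurity, so the whole 0.46875 is the improvement.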

The ability to predict out of sample is the true ‘test’ for any statistical process.  Classification trees are no exception.  The trees grown in this study were also applied to out-of-sample data for 1996-1998.  For parsimony, those results are summarized in the text.  The learning sample results are provided in detail below.

 

 

DATA AND CLASSIFICATION TREES ON INTERNATIONAL COUNTRY RISK AND MENA COUNTRIES

 

Learning sample data come from World Development Report 2000/2001. Using the 2000 data, countries were partitioned into ‘risky’ and ‘non-risky’ categories based on the values of the scores: ICRG and IICR scores of 70 or below were considered to denote ‘risky’. The MENA countries for which the scores are available are listed in Table 1.

 

 

TABLE 1: COUNTRY RISK SCORES OF MENA COUNTRIES

COUNTRY             ICRG           IICR
EGYPT               70.5           45.4
IRAN                66.3           29.2
ISRAEL              69.8           57.6
JORDAN              70.8           38.6
LEBANON             58.5           35.0
TUNISIA             73.5           49.7
TURKEY              54.4           39.0
YEMEN               62.5           34.0*
Mean, MENA          65.8           41.1
Mean, all data      66.2           47.6
Range, all data     50.3-89.3      11.1-93.8

* Yemen’s IICR is from IFC Statistics.
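The MENA means in Table 1 can be reproduced from the eight country rows as a quick arithmetic check; the all-data means and ranges require the full 63-country sample, which is not reproduced here.

```python
# ICRG and IICR scores of the eight MENA countries, as listed in Table 1
# (Yemen's IICR comes from IFC Statistics).
scores = {
    "Egypt": (70.5, 45.4),
    "Iran": (66.3, 29.2),
    "Israel": (69.8, 57.6),
    "Jordan": (70.8, 38.6),
    "Lebanon": (58.5, 35.0),
    "Tunisia": (73.5, 49.7),
    "Turkey": (54.4, 39.0),
    "Yemen": (62.5, 34.0),
}

icrg_mean = sum(icrg for icrg, _ in scores.values()) / len(scores)
iicr_mean = sum(iicr for _, iicr in scores.values()) / len(scores)
print(f"ICRG mean {icrg_mean:.4f}, IICR mean {iicr_mean:.4f}")
# ICRG mean 65.7875, IICR mean 41.0625
```

Rounded to one decimal, these give the 65.8 and 41.1 reported in the table.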

 

From Table 1, none of the countries can be considered ‘high risk’ on the basis of ICRG scores when high risk is defined as a score under 50. However, when the definition of risk is relaxed to scores of 70 or below, Iran, Israel, Lebanon, Turkey and Yemen all fall into the ‘risky’ group. When IICR scores are similarly partitioned into ‘risky’ (70 or below) and ‘very risky’ (under 50) groups, all MENA countries listed in the table fall into the risky group, and all except Israel fall into the very risky group.
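These partitions can be checked directly against Table 1. The sketch below applies the cut-offs as the text states them (‘70 or below’ and ‘under 50’); the function name `flag` is hypothetical.

```python
# Country risk scores of the MENA countries, from Table 1.
ICRG = {"Egypt": 70.5, "Iran": 66.3, "Israel": 69.8, "Jordan": 70.8,
        "Lebanon": 58.5, "Tunisia": 73.5, "Turkey": 54.4, "Yemen": 62.5}
IICR = {"Egypt": 45.4, "Iran": 29.2, "Israel": 57.6, "Jordan": 38.6,
        "Lebanon": 35.0, "Tunisia": 49.7, "Turkey": 39.0, "Yemen": 34.0}


def flag(scores, cutoff, inclusive):
    """Countries scoring at or below (inclusive=True) or strictly below the cut-off."""
    return sorted(c for c, s in scores.items()
                  if (s <= cutoff if inclusive else s < cutoff))


print(flag(ICRG, 50, inclusive=False))  # []
print(flag(ICRG, 70, inclusive=True))   # ['Iran', 'Israel', 'Lebanon', 'Turkey', 'Yemen']
print(flag(IICR, 70, inclusive=True))   # all eight countries
print(flag(IICR, 50, inclusive=False))  # every country except Israel
```

The variable list at the end of the paper defines the BADR indicators with "less than 70"; since no MENA score equals exactly 70, the inclusive and strict versions of the 70 cut-off give the same MENA groups.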

Classification trees were run on ‘risky’ ICRG scores and on both ‘risky’ and ‘very risky’ IICR scores. The data set comprised complete sets of observations on 63 countries. Of these, 8 were in MENA, 14 were ‘developed’, 9 were in Latin America, 12 were in Asia, 6 were in Africa and 14 were in Eastern Europe (the post-Soviet bloc). The other country-specific variables used in the study for the 63 countries were as follows. Socio-economic variables included the percentage of children suffering from malnutrition, female and male life expectancies, literacy rates, population, the percentage of women in the labor force, the percentage of people in total poverty, the percentage of the government budget spent on education, health and social services, and per capita expenditures on social services. Financial variables included the government budget deficit (as a percentage and per capita), interest rate spreads (between borrowing and lending rates), domestic credit provided by the banking sector relative to GDP, the ICRG risk rating, the IICR risk rating, foreign direct investment, net private international capital flows, GDP, GDP per capita, reserves, external debt and external debt per capita. Regional proxies were included for MENA, Latin America, Asia, Africa, Eastern Europe and the developed countries. The variables used in the models in this paper are listed at the end of the paper.
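The regional proxies enter the trees as 0/1 indicator (dummy) variables, so a split such as "developed countries = 1" separates the countries flagged as developed from all the others. A minimal sketch of that coding follows, using only the regional counts given above; the country names are omitted, the row order is a placeholder, and the helper `dummies` is hypothetical.

```python
# Regional composition of the 63-country learning sample, as stated in the text.
counts = {"MENA": 8, "Developed": 14, "Latin America": 9,
          "Asia": 12, "Africa": 6, "Eastern Europe": 14}

# One region label per country (placeholder order; country names omitted).
region_of = [region for region, n in counts.items() for _ in range(n)]


def dummies(labels, categories):
    """One 0/1 indicator column per region: 1 if the country belongs to it, else 0."""
    return {cat: [1 if lab == cat else 0 for lab in labels] for cat in categories}


proxies = dummies(region_of, counts)
print(len(region_of))             # 63
print(sum(proxies["MENA"]))       # 8
print(sum(proxies["Developed"]))  # 14
```

Each country has exactly one indicator equal to 1, so each regional proxy is a two-level variable with a single possible split.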

Three different models were tested in this paper. The first model (Model A) was run to examine whether the MENA countries can be classified as a group by themselves using the relevant predictor variables (IRT99, ICRG300, IICR300, GOVDEF98, BADR270, PRIVCR99, BADR250, GSOCICAP, BADR170), which include the risk-level categories. For parsimony, only the most relevant classification trees are reported, and the results are summarized in the summary tables. Node information, the actual trees and the relevant statistics can be requested from the author. The second model (Model B) used the same set of variables but without the risk-level categories. Model C added financial flows and population to Model B. Table 2 summarizes the results and also lists the most important predictor variables.

 

TABLE 2: SUMMARY OF CLASSIFYING MENA COUNTRIES

Model   Number     Primary (P) and secondary (S)   MENA countries   Non-MENA countries   Top three most important
        of nodes   splitters and node values       correctly        correctly            predictor variables
                                                   classified       classified
A       3          (P) ICRG < 73.7 and             100%             76.4%                IRT, ICRG, IICR
                   (S) IRT < 7.55%
B       3          (P) ICRG < 73.7                 100%             76.4%                IRT, ICRG, GSOCICAP
                   (S) IRT
C       3          (P) FORI < $1,893               100%             49.1%                FORI, ICRG, IICR
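Because the MENA group is small (8 of the 63 countries), Table 2 reports the correct-classification rate separately for each class rather than a single overall rate. A minimal sketch of that calculation follows. The predictions are hypothetical: they are chosen only to reproduce Model A's rates (76.4% of the 55 non-MENA countries corresponds to 42 correct), since the paper does not report which individual countries were misclassified.

```python
def per_class_accuracy(y_true, y_pred):
    """Percentage of the cases in each true class that are assigned to that class."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return {c: 100.0 * hits[c] / totals[c] for c in totals}


# Hypothetical predictions matching Model A's counts:
# all 8 MENA correct; 42 of 55 non-MENA correct, 13 wrongly put in MENA.
y_true = ["MENA"] * 8 + ["non-MENA"] * 55
y_pred = ["MENA"] * 8 + ["MENA"] * 13 + ["non-MENA"] * 42

rates = per_class_accuracy(y_true, y_pred)
print({c: round(r, 1) for c, r in rates.items()})  # {'MENA': 100.0, 'non-MENA': 76.4}

overall = 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(overall, 1))  # 79.4
```

The same arithmetic for Model C gives 27 of 55 non-MENA countries, or 49.1%. The overall rate alone (79.4% here) would hide the fact that every misclassification falls on the non-MENA side.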

 


This analysis builds very short, parsimonious trees in which 100% of the MENA countries are classified correctly on the basis of their ICRG scores, interest rate spreads and the foreign direct investment received in the previous year. Hence, the values of country risk scores do matter.

Trees in the first and second models give the same results. Therefore, adding the ‘high risk’ categorical variables to the predictor set makes no difference. Many non-MENA countries are also (wrongly) predicted to be in the MENA class. Hence, based on these predictor variables, the macro indicators show that the MENA region is not unique: many other countries exhibit the same macro split nodes.

These results are further reinforced when the analysis is turned around and the country risk scores are predicted using the different regions, including MENA, as predictor variables. These results are summarized in Table 3 below. Again, the decision tree information can be obtained from the author.

TABLE 3: SUMMARY OF CLASSIFYING HIGH RISK COUNTRIES

Model                 Number     Primary (P) and secondary (S)   High risk     Low risk      Top three most important
                      of nodes   splitters and node values       countries     countries     predictor variables
                                                                 correctly     correctly
                                                                 classified    classified
D (using ICRG risk)   3          (P) developed countries = 1     84.6%         56.8%         Developed, Latin America
                                 (S) Latin America = 1
E (using IICR risk)   2          (P) developed countries = 1     97.8%         88.2%         Developed, Latin America, Eastern Europe

 

 

This table reinforces the findings of Table 2 above: the MENA countries do not form a distinct group. The primary splits are determined by the ‘developed’ countries indicator, followed by the Latin American indicator. In terms of importance, Latin American and then Eastern European countries follow the developed countries. Hence, if there is any distinct grouping of regions, it is these regions rather than the MENA region.

Other models, which used financial predictor variables to explain the ICRG and IICR scores, were also estimated but are not reported here. Suffice it to say that when MENA was added as an explanatory variable in all of these models, it contributed nothing to the classification trees that were grown.

 

TESTS OF THE MODEL: OUT-OF-SAMPLE PREDICTIONS

 

            The above trees  used what is called the ‘learning sample’ to grow the models.  The ability to predict out of sample is very important.  The same models were applied to the data set which appeared in World Development Report, 1999-2000, which reported March 1999 country risk scores and 1998  financial statistics. There were 66 countries in the out-of-sample data.   The summary tables below give the prediction rates for the older data set.
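The key feature of this out-of-sample test is that the trees are not re-grown: the splits found on the learning sample are applied unchanged to the new observations, each of which is sent from the root node down to a terminal node. A minimal sketch of such a frozen tree follows. Its thresholds are borrowed from Model A in Table 2 (ICRG < 73.7, then an interest rate spread split at 7.55%), but the paper does not report which branch of the secondary split leads to the MENA class, so that assignment, the `Node` class and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal nodes test `variable < threshold`; terminal nodes carry a class label."""
    variable: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when value < threshold
    right: Optional["Node"] = None   # branch taken when value >= threshold
    label: Optional[str] = None      # class assigned at a terminal node


def predict(node: Node, record: dict) -> str:
    """Send one observation from the root node down to a terminal node."""
    while node.label is None:
        node = node.left if record[node.variable] < node.threshold else node.right
    return node.label


# Hypothetical frozen tree shaped like Model A; the MENA/non-MENA labels on the
# secondary split are an assumption, not taken from the paper.
tree = Node("ICRG300", 73.7,
            left=Node("IRT99", 7.55,
                      left=Node(label="non-MENA"),
                      right=Node(label="MENA")),
            right=Node(label="non-MENA"))

# Hypothetical out-of-sample records; the tree is applied without refitting.
new_data = [{"ICRG300": 66.0, "IRT99": 9.0},
            {"ICRG300": 66.0, "IRT99": 4.0},
            {"ICRG300": 80.0, "IRT99": 9.0}]
print([predict(tree, r) for r in new_data])  # ['MENA', 'non-MENA', 'non-MENA']
```

The predicted labels would then be compared with the true labels of the new sample, class by class, to obtain rates like those in Tables 4 and 5.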

 

TABLE 4: OUT-OF-SAMPLE PREDICTION RATES FOR TREES REPORTED IN TABLE 2: CLASSIFYING MENA COUNTRIES

Model           MENA countries          Non-MENA countries
                correctly classified    correctly classified
Using Model A   55.6%                   91.2%
Using Model B   100%                    96.5%
Using Model C   100%                    98.2%

 

The first out-of-sample prediction (which models MENA countries with respect to levels of high risk and financial variables) gives different results on this data set from those reported in Table 2: almost half of the MENA countries are lumped into the non-MENA group. Again, this shows that the data are fuzzy about the distinctions between MENA countries and other regions of the world. The other two models reported in the table show very good predictive power between the MENA and non-MENA groups, a finding not observed in the learning sample, where Models B and C correctly classified only 76.4% and 49.1% of the non-MENA countries.

The same regional analysis reported in Table 3 was also applied to the out-of-sample group. Table 5 below gives the summary results.

 

TABLE 5: OUT-OF-SAMPLE PREDICTION RATES FOR TREES REPORTED IN TABLE 3: CLASSIFYING HIGH RISK COUNTRIES

Model           High risk countries     Low risk countries
                correctly classified    correctly classified
Using Model D   75.0%                   73.1%
Using Model E   96.2%                   92.9%

 

Again, as in the learning sample, the MENA proxy was found to be an unimportant splitter in classifying high versus low risk countries. The primary splitting nodes remained the same, reinforcing the learning sample results.

 

CONCLUSIONS

The threshold values for node splitting in the first three trees (Models A to C) identified cut-off values for clustering MENA nations, but many non-MENA countries were consistently misclassified as MENA, regardless of the model. The latter two trees reported for the learning sample (Models D and E), which examined the two country risk scores, also showed that the MENA indicator is not a good predictor variable. This finding was repeated in the out-of-sample predictions. If there is any clustering of regions, it is among the developed countries, Latin America or post-Soviet Eastern Europe. In summary, this paper does not find strong classifiers for MENA countries when examining country risk scores or other financial variables.

Hence, the diversity of MENA economies is once more evident from this study: MENA nations have cultural ties but do not necessarily form a uniform economic unit.

 


BIBLIOGRAPHY

Baird, J. (1983), “Where Did Country Risk Analysis Go Wrong?”, Institutional Investor, May, p. 227-229.

 

Barro, R. (1994), “Recent Research on Economic Growth”, National Bureau of Economic Research Reporter, Summer, p. 6-10.

 

Bhalla, B. (1983), “How Corporations Should Weigh Up Country Risk”, Euromoney, June, p. 66-72.

 

Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984), Classification and Regression Trees, Wadsworth International Group, Belmont, California.

 

Burton, F.N.,  and  H. Inoue (1983), “Country Risk Evaluation Methods: A Survey of Systems in Use”, The Banker, January, p. 41-43.

 

Cinar, E. M., (2000), “Poverty Classification Trees and Threshold Levels of International Capital Flows”,  paper presented in  Association of International  Business Conference, Tucson, Arizona, Fall.

 

Davis, R.R. (1981), “Alternative Techniques for Country Risk Evaluation”,  Business Economics, May, p. 34-41.

 

Denison, D. (1984), “A Pragmatic Model for Country Risk Analysis”, Journal of Commercial Bank Lending,  March, p. 28-37.

 

Divecha, A. B., J. Drach and D. Stefek (1992), “Emerging Markets: A Quantitative Perspective”, Journal of Portfolio Management, Fall, p. 41-50.

Erb, C. B., C. R. Harvey and T. E. Viskanta (1995), “Country Risk and Global Equity Selection”, Journal of Portfolio Management, Winter, p. 74-79.

 

Erb, C. B., C. R. Harvey and T. E. Viskanta (1994), “National Risk in Global Fixed-Income Allocation”, Journal of Fixed Income, September, p. 17-26.

Erb, C. B., C. R. Harvey and T. E. Viskanta (1996), “Political Risk, Economic Risk and Financial Risk”, Financial Analysts Journal, 52, 6, Nov/Dec, p. 29-46.

 

Harvey, C. R. (1994), “The Risk Exposure of Emerging Market Investments”, World Bank Economic Review.

Keppler, A. M. (1991), “The Importance of Dividend Yields in Country Selection”, Journal of Portfolio Management, Winter, p. 24-29.

Leavy, B. (1984), “Assessing Country Risk for Foreign Investment Decisions”, Long Range Planning, 17, 3, p. 141-151.

 

Ripley, B. D. (1996), Pattern Recognition and Neural Networks, Cambridge University Press, New York.

 

Saini, K. G. and P. S. Bates (1984), “A Survey of Quantitative Approaches to Country Risk Analysis”, Journal of Banking and Finance,  June, p. 341-355.

 

Speidell, L. S. and R. Sappenfield (1992), “Global Diversification in a Shrinking World”, Journal of Portfolio Management, Fall, p. 57-67.

Steinberg, D. and P. Colla (1997), CART-Classification and Regression Trees, Salford Systems, San Diego, California.

 

World Bank, (1999)  World Development Report 1999-2000, published for the World Bank by Oxford University Press, New York.

 

World Bank, (2000/2001)  World Development Report 2000/2001; Attacking Poverty, published for the World Bank by Oxford University Press, New York.


LIST OF VARIABLES USED IN CONSTRUCTING TREES

 

 

 

 

GOVDEF98: government budget deficit of 1998

GSOCICAP:  per capita government expenditures on social programs

IRT99: interest rate differential between borrowing and lending for 1999

PRIVCR99: the size of the private credit market with respect to GDP, 1999

ICRG300: ICRG risk score for the country, March 2000

BADR170 : Categorical variable, assigning the country a ‘high risk’ score if the ICRG score is less than 70.

IICR300:  IICR country risk rating, 1/2000

BADR250: Categorical variable, assigning the country a ‘high risk’ score if the IICR score is less than 70.

BADR270: Categorical variable, assigning the country a ‘very high risk’ score if the IICR score is less than 50.

MENA: Categorical variable, assigning the country to the MENA region.


 

TABLE 1: COUNTRY RISK SCORES OF MENA COUNTRIES

COUNTRY                    ICRG                              IICR
EGYPT                      70.5                              45.4
IRAN                       66.3                              29.2
ISRAEL                     69.8                              57.6
JORDAN                     70.8                              38.6
LEBANON                    58.5                              35
TUNISIA                    73.5                              49.7
TURKEY                     54.4                              39
YEMEN                      62.5                              34*
Mean for MENA countries    65.78                             41.06
Mean for the data set      66.23                             47.57
Range for the data set     50.3 (Indonesia)-89.3 (Norway)    11.1 (Georgia)-93.8 (Sweden)

* Yemen’s IICR is from IFC Statistics.