CLASSIFICATION TREE ANALYSIS AS APPLIED TO MENA REGIONS
Loyola University Chicago
email: mcinar@luc.edu
Key Words: Country Risk Scores, Middle East and North Africa, Decision Trees
JEL Codes: C11, C14, F37, G24
Abstract: IICR and ICRG country risk scores are examined in a decision tree framework to analyze groupings of countries, including those of the Middle East and North Africa.
INTRODUCTION
The purpose of this paper is to examine whether Middle East and North Africa (MENA) countries form a distinct financial risk group, based on financial predictor variables and country risk scores. Country risk scores are calculated internally by banks and other financial institutions to assess the various risks facing international capital flows to countries around the world. What exactly determines each score depends on the perspective of the institution that does the analysis and is kept confidential. The resulting risk scores are published or distributed to clients of the firm.
The scores are calculated as a weighted basket of both financial and non-financial risk. A large category of non-financial risk is political risk, which includes a government takeover of the investment as well as other factors. Other factors may include consumers’ preference, if strong, for domestic goods, the legal and political infrastructure for foreign capital, the probability of war or political instability, and the obstruction of financial transfers. Financial risk variables are also very important in composing the risk indexes. The health of the balance of payments accounts, the possibility of recessions, the health of the banking system, the growth potential of GNP, and the regulatory and financial environment are some of the variables considered in calculating financial risk.
The use of country risk scores is more an art than a science, especially since the scores lie on a continuous scale (from zero to 100 in theory). Bhalla (1983) promoted the use of financial versus socio-political risk matrices, which are still used by some fund managers. The clustering of lower-risk countries (in both political and economic terms) has been one of the primary reasons why the majority of financial flows in the world are concentrated in a few developed countries (the EU countries) and in a few developing ones (Mexico, India and China). A previous study examined the relationship between international poverty and country risk scores (Cinar, 2000); this study concentrates on whether the MENA region exhibits special country risks when compared with other regions of the world.
Many services measure country risk, including Bank of America World Information Services, Business Environment Risk Intelligence (BERI) S.A., Control Risks Information Services (CRIS), Economist Intelligence Unit (EIU), Euromoney, Institutional Investor, Standard & Poor's Rating Group (S&P), Political Risk Services: International Country Risk Guide (ICRG), Political Risk Services: Coplin-O'Leary Rating System, and Moody's Investors Service. Each of these institutions provides qualitative and quantitative information as well as a single index or rating. The two most prominent among them are the Institutional Investor (IICR) and International Country Risk Guide (ICRG) ratings. Institutional Investor credit ratings are based on a survey of leading international bankers, who are asked to rate each country on a scale from 0 to 100 (100 being the best credit rating); the ratings are then averaged, with larger weights assigned to institutions with large worldwide exposures. Each institution uses its own internal formula to rank the importance of the different factors affecting the countries. Erb et al. (1996) provide a cross-country comparison of S&P and Moody's ratings with the IICR and ICRG ratings as of October 1995. They find that S&P and Moody's ratings correspond closely with the IICR credit-risk measure (rank-order correlation of 95 percent) and with the ICRG financial rating (rank-order correlation of 90 percent), and that the correlations are weaker for the other measures. They conclude by noting that the ICRG composite index contains considerable information for forecasting risk-adjusted returns on the portfolios they construct. Erb et al. (1995) also find that country credit ratings have substantial predictive power in differentiating between high and low expected portfolio returns and are very effective tools. Following their studies, this study examines IICR and ICRG credit ratings with respect to MENA countries. Data in the learning sample are available on these two scores (as well as a set of financial and other variables) for 63 countries.
STUDY METHOD: CLASSIFICATION TREES, A NON-PARAMETRIC APPROACH
Country risk scores do not satisfy the stringent theoretical and distributional assumptions (such as white-noise errors and normality) of more traditional methods. Instead, this study uses a non-parametric method from artificial intelligence: classification trees are used to predict high- versus low-risk countries, or MENA versus non-MENA countries, from country-based predictor variables.
Classification tree analysis is one of the main techniques used in data mining. It serves a purpose similar to that of discriminant analysis and cluster analysis, and it is a nonparametric method with nonlinear estimation. The decision process used by classification trees provides an efficient method for sorting countries into high versus low risk, given the black-box formulations that are used in calculating the country risk scores. Currently, classification tree analysis is widely used in applied fields such as diagnosis in medicine, data structures in computer science and decision theory in psychology. Classification trees can be quite complex, and graphical procedures are used to interpret them visually. They are hierarchical in nature (Breiman et al., 1984) and are more flexible than traditional methods (such as discriminant or regression analysis). They examine the effects of the predictor variables one at a time, rather than all at once, and when univariate splits are used the predictor variables can be a combination of continuous and categorical variables. This frees the analysis from stringent assumptions on the predictor variables, such as non-collinearity and homoskedasticity. This study uses the algorithms used in CART, as described by Breiman et al. (1984). CART (Classification And Regression Trees) is a classification tree program that uses an exhaustive grid search of all possible univariate splits to find the splits for a classification tree. CART searches can be lengthy when there are a large number of predictor variables with many levels, and the search is biased toward choosing predictor variables with more levels for splits, but because it is exhaustive, it is guaranteed to find, at each node, the split producing the best classification. The trees are constructed with terminal nodes, which are the points on the tree that show terminal decisions. The trees start with the top decision node, which is called the root node. The root node is split, forming two new nodes. The numbers below the root node show the value of the predictor variable at which the split occurs. When univariate splits are performed, the predictor variables can be ranked by their potential importance in determining the classification of the dependent variable.
The purpose of classification tree analysis is to obtain the most accurate classification or prediction, where the most accurate prediction is defined as the prediction with the minimum cost. When all misclassifications are equally costly, the best prediction is therefore the one with the lowest misclassification rate. The relationships between prior probabilities, misclassification costs and case weights, which can become quite complex, can be found in Breiman et al. (1984) and Ripley (1996). A very important part of classification tree analysis is selecting the node splits on the predictor variables that are used to predict the categorical dependent variable. Due to the hierarchical nature of classification trees, these splits are selected one at a time, starting with the split at the root node. CART uses an exhaustive search for univariate splits. With this method, all possible splits for each predictor variable at each node are examined to find the split producing the largest improvement in goodness of fit, which is measured by the Gini measure of node impurity. This measure reaches a value of zero when only one class is present at a node. The tree is stopped and pruned by the rules given in Breiman et al. (1984) for selecting the "right-sized" tree. Pruning is needed because resubstitution costs (e.g., the misclassification rate in the learning sample) rather consistently decrease as tree size increases, so they cannot by themselves indicate the right size. The Bayesian decision rule, splitting rules and the cost function are not reported here and can be obtained from the author.
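The split search described above can be illustrated with a minimal sketch. It is not the CART program used in this study: the function names, the use of midpoints between adjacent values as candidate thresholds, and the six-observation toy data set are all assumptions made for illustration, and priors, misclassification costs and pruning are left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares.
    It equals zero when only one class is present at the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_univariate_split(rows, labels):
    """Examine every predictor and every candidate threshold and return
    (impurity decrease, predictor index, threshold) for the best split."""
    n = len(labels)
    parent_impurity = gini(labels)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds: midpoints between adjacent observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] < threshold]
            right = [y for row, y in zip(rows, labels) if row[j] >= threshold]
            child_impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent_impurity - child_impurity
            if decrease > best[0]:
                best = (decrease, j, threshold)
    return best


# Hypothetical toy data (not from the study): each row is
# (risk score, interest rate spread); labels are the classes to predict.
rows = [(55.0, 9.0), (60.0, 4.0), (65.0, 8.0), (80.0, 3.0), (85.0, 7.0), (90.0, 2.0)]
labels = ["risky", "risky", "risky", "safe", "safe", "safe"]

decrease, predictor, threshold = best_univariate_split(rows, labels)
print(decrease, predictor, threshold)
```

In this toy example the first predictor separates the two classes completely, so both child nodes reach a Gini value of zero and the search selects that predictor for the root split.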
The ability to predict out of sample is the true ‘test’ for any statistical process. Classification trees are no exception. The trees grown in this study were also applied to out-of-sample data for 1996-1998. For parsimony, those results are summarized in the text. The learning sample results are provided in detail below.
DATA AND CLASSIFICATION TREES ON INTERNATIONAL COUNTRY RISK AND MENA COUNTRIES
Learning sample data come from World Development Report 2000/2001. Using the 2000 data, countries were partitioned into categories of ‘risky’ versus ‘non-risky’ based on the values of the scores. ICRG and IICR scores were considered to denote ‘risky’ if they were 70 or below. The MENA countries for which the scores are available are listed in Table 1.
TABLE 1: COUNTRY RISK SCORES OF MENA COUNTRIES
COUNTRY ICRG IICR
EGYPT 70.5 45.4
IRAN 66.3 29.2
ISRAEL 69.8 57.6
JORDAN 70.8 38.6
LEBANON 58.5 35.0
TUNISIA 73.5 49.7
TURKEY 54.4 39.0
YEMEN 62.5 34.0*
Mean-MENA 65.8 41.1
Mean-all data 66.2 47.6
Range-all data 50.3-89.3 11.1-93.8
* Yemen’s IICR is from IFC Statistics
From Table 1, none of the countries can be considered ‘high risk’ on the basis of ICRG scores when high risk is defined as a score under 50. However, when the definition of risk is adjusted to scores of 70 or below, Iran, Israel, Lebanon, Turkey and Yemen are all included in the ‘risky’ group. When IICR scores are similarly partitioned into ‘risky’ and ‘very risky’ groups, all MENA countries listed in the table fall into the risky category, and all except Israel also fall into the very risky category.
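The partition described above can be reproduced directly from the Table 1 scores. The sketch below is only an illustration: the helper name is hypothetical, and the cutoff of 50 for ‘very risky’ IICR scores is taken from the definition of BADR270 in the variable list rather than from this paragraph.

```python
# ICRG and IICR scores for the eight MENA countries, copied from Table 1.
scores = {
    "Egypt": (70.5, 45.4),
    "Iran": (66.3, 29.2),
    "Israel": (69.8, 57.6),
    "Jordan": (70.8, 38.6),
    "Lebanon": (58.5, 35.0),
    "Tunisia": (73.5, 49.7),
    "Turkey": (54.4, 39.0),
    "Yemen": (62.5, 34.0),
}
ICRG, IICR = 0, 1  # positions of the two scores in each tuple


def countries_where(index, condition):
    """Names of the countries whose score at `index` satisfies `condition`."""
    return sorted(name for name, pair in scores.items() if condition(pair[index]))


icrg_under_50 = countries_where(ICRG, lambda s: s < 50)     # 'high risk' cutoff
icrg_risky = countries_where(ICRG, lambda s: s <= 70)       # 'risky': 70 or below
iicr_risky = countries_where(IICR, lambda s: s <= 70)
iicr_very_risky = countries_where(IICR, lambda s: s < 50)   # assumed from BADR270

print("ICRG under 50:", icrg_under_50)
print("ICRG risky:", icrg_risky)
print("IICR risky:", iicr_risky)
print("IICR very risky:", iicr_very_risky)
```

No score in Table 1 lies exactly on a cutoff, so the choice between strict and non-strict inequalities does not change the groups here.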
Classification trees were run on ‘risky’ ICRG and both ‘risky’ and ‘very risky’ IICR scores. The data set comprised 63 countries with complete sets of observations. Of these, 8 were in MENA, 14 were ‘developed’, 9 were in Latin America, 12 were in Asia, 6 were in Africa and 14 were in Eastern Europe (post-Soviet bloc countries). Other country-specific variables used in the study for the 63 countries were as follows. Socio-economic variables included the percentage of children in malnutrition, female and male life expectancies, literacy rates, population, the percentage of women in the labor force, the percentage of people in total poverty, the percentage of the government budget spent on education, health and social services, and per capita expenditures on social services. Financial variables included the government budget deficit (percentage and per capita), interest rate spreads (between borrowing and lending), the size of domestic financial credit provided by the banking sector with respect to GDP, the ICRG risk rating, the IICR risk rating, foreign direct investment, international private net capital flows, GDP, GDP per capita, reserves, external debt and external debt per capita. Regional proxies were for MENA, Latin America, Asia, Africa, Eastern Europe and developed countries. The variables used in the modeling in this paper are listed at the end of the paper.
Three different models were tested in this paper. The first model (Model A) was run to examine if the MENA countries can be classified as a group by themselves with the relevant predictor variables (IRT99, ICRG300, IICR300, GOVDEF98, BADR270, PRIVCR99, BADR250, GSOCICAP, BADR170) which included categories of risk levels. For parsimony, only the most relevant classification trees are reported in the analysis and the results are summed up in the summary tables. Node information, the actual trees and the relevant statistics can be requested from the author. The second model (Model B) used the same set of variables but without the categories of risk levels. Model C added financial flows and population to Model B. Table 2 given below summarizes the results and also includes the most important variable selection.
TABLE 2-SUMMARY OF CLASSIFYING MENA COUNTRIES
MODEL | Number of nodes | Primary (P), secondary (S) splitters and node values | Percentage of MENA countries correctly classified | Percentage of non-MENA countries correctly classified | The top three most important predictor variables
A | 3 | (P)-ICRG < 73.7 and (S)-IRT < 7.55% | 100% | 76.4% | IRT, ICRG, IICR
B | 3 | (P)-ICRG < 73.7 (S)-IRT- | 100% | 76.4% | IRT, ICRG, GSOCICAP
C | 3 | (P)-FORI < $1,893 | 100% | 49.1% | FORI, ICRG, IICR
This analysis is able to build very short, parsimonious trees in which 100 % of all MENA countries can be classified on the basis of their ICRG scores, interest rate spreads and the direct foreign investment that came in the year before. Hence, values of country risk scores do matter.
Trees in the first and the second models give the same results. Therefore, adding ‘high risk’ categories to the predictor variable set does not make a difference. Many non-MENA countries are also (wrongly) predicted to be in the same classification. Hence, based on these predictor variables, macro indicators show that the MENA region is not unique: many other countries exhibit the same macro split nodes.
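The percentages in Table 2 can be translated into approximate country counts. The sketch below assumes the learning sample split stated earlier (8 MENA and 55 non-MENA countries out of 63); the counts of correctly classified non-MENA countries are not reported in the paper and are inferred here as the only whole numbers that reproduce the published percentages.

```python
def percent_correct(correct, total):
    """Share of a class that is correctly classified, in percent (one decimal)."""
    return round(100.0 * correct / total, 1)


MENA_TOTAL = 8                     # stated in the data description
NON_MENA_TOTAL = 63 - MENA_TOTAL   # 55 non-MENA countries in the learning sample

# Inferred, not reported: correctly classified non-MENA countries per model.
inferred_non_mena_correct = {"A": 42, "B": 42, "C": 27}

summary = {}
for model, correct in inferred_non_mena_correct.items():
    summary[model] = {
        "mena_pct": percent_correct(MENA_TOTAL, MENA_TOTAL),
        "non_mena_pct": percent_correct(correct, NON_MENA_TOTAL),
        "non_mena_misclassified": NON_MENA_TOTAL - correct,
    }
    print(model, summary[model])
```

Under these assumptions, Models A and B each place 13 non-MENA countries in the MENA group, and Model C places 28 there, which is what the text means by other countries sharing the same split nodes.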
These results are further reinforced when the analysis is turned around and the country risk scores are predicted by using the different regions, including MENA, as predictor variables. These results are summarized in Table 3 below. Again, the decision tree information can be obtained from the author.
TABLE 3-SUMMARY OF CLASSIFYING HIGH RISK COUNTRIES
MODEL | Number of nodes | Primary (P), secondary (S) splitters and node values | Percentage of high risk countries correctly classified | Percentage of low risk countries correctly classified | The top three most important predictor variables
D (using ICRG risk) | 3 | (P)-developed countries=1 (S)-Latin America=1 | 84.6% | 56.8% | Developed, Latin America
E (using IICR risk) | 2 | (P)-developed countries=1 | 97.8% | 88.2% | Developed, Latin America, Eastern Europe
This table reinforces the findings of Table 2 above: the MENA region countries do not form a distinct group. The primary splits are determined by the ‘developed’ countries, followed by Latin American countries. In terms of importance, Latin American and then Eastern European countries follow developed countries. Hence, if there is any distinct grouping of regions, it is among those above rather than in the MENA region.
Other models, which used financial predictor variables to explain the ICRG and IICR scores, were also run but are not reported here. Suffice it to say that when MENA was added as an explanatory variable in these models, it contributed nothing to the classification trees that were grown.
TESTS OF THE MODEL: OUT-OF-SAMPLE PREDICTIONS
The above trees used what is called the ‘learning sample’ to grow the models. The ability to predict out of sample is very important. The same models were applied to the data set which appeared in World Development Report, 1999-2000, which reported March 1999 country risk scores and 1998 financial statistics. There were 66 countries in the out-of-sample data. The summary tables below give the prediction rates for the older data set.
TABLE 4: OUT OF SAMPLE PREDICTION RATES FOR TREES REPORTED ON TABLE 2: CLASSIFYING MENA COUNTRIES
MODEL | Percentage of MENA countries correctly classified | Percentage of non-MENA countries correctly classified
Using Model A | 55.6% | 91.2%
Using Model B | 100% | 96.5%
Using Model C | 100% | 98.2%
The first out-of-sample prediction (which models MENA countries with respect to levels of high risk and financial variables) gives results that differ from those reported in Table 2. Almost half of the MENA countries are lumped into the non-MENA group. Again, this shows that the data are fuzzy about distinctions between MENA countries and other regions of the world. The other two models reported in the table show very good predictive power in separating the MENA and non-MENA groups. This result was not found in the learning sample.
The same regional analysis reported in Table 3 was also applied to the out of sample group. Table 5 below gives the summary results.
TABLE 5: OUT OF SAMPLE PREDICTION RATES FOR TREES REPORTED ON TABLE 3-CLASSIFYING HIGH RISK COUNTRIES
MODEL | Percentage of high risk countries correctly classified | Percentage of low risk countries correctly classified
Using Model D | 75.0% | 73.1%
Using Model E | 96.2% | 92.9%
Again, the proxy for MENA was found to be an unimportant node in classifying high versus low risk countries as in the learning sample. The primary splitting nodes remained the same, reinforcing the learning sample results.
CONCLUSIONS
The threshold values for node splitting in the first three trees indicated where MENA nations cluster, but they also showed that many non-MENA countries were consistently and wrongly classified as MENA, regardless of the model. The latter two trees reported for the learning sample, which examined the two country risk scores, also showed that the MENA proxy is not a good predictor variable. This finding was repeated in the out-of-sample predictions. If there is any clustering of regions, it is in the developed countries, Latin America or post-Soviet Eastern Europe. In summary, this paper does not find strong classifiers for MENA countries when examining country risk scores or other financial variables.
Hence, the variation among MENA economies is once more evident from this study: MENA nations have cultural ties but do not necessarily form a uniform economic unit.
BIBLIOGRAPHY
Baird, J. (1983), “Where Did Country Risk Analysis Go Wrong?”, Institutional Investor, May, p. 227-229.
Barro, R. (1994), “Recent Research on Economic Growth”, National Bureau of Economic Research Reporter, Summer, p. 6-10.
Bhalla, B. (1983), “How Corporations Should Weigh Up Country Risk”, Euromoney, June, p. 66-72.
Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984), Classification and Regression Trees, Wadsworth International Group, Belmont, California.
Burton, F.N., and H. Inoue (1983), “Country Risk Evaluation Methods: A Survey of Systems in Use”, The Banker, January, p. 41-43.
Cinar, E. M., (2000), “Poverty Classification Trees and Threshold Levels of International Capital Flows”, paper presented in Association of International Business Conference, Tucson, Arizona, Fall.
Davis, R.R. (1981), “Alternative Techniques for Country Risk Evaluation”, Business Economics, May, p. 34-41.
Denison, D. (1984), “A Pragmatic Model for Country Risk Analysis”, Journal of Commercial Bank Lending, March, p. 28-37.
Divecha, A. B., J. Drach and D. Stefek (1992), “Emerging Markets: A Quantitative Perspective”, Journal of Portfolio Management, Fall, p. 41-50.
Erb, C. B., C. R. Harvey and T. E. Viskanta (1994), “National Risk in Global Fixed-Income Allocation”, Journal of Fixed Income, September, p. 17-26.
Erb, C. B., C. R. Harvey and T. E. Viskanta (1995), “Country Risk and Global Equity Selection”, Journal of Portfolio Management, Winter, p. 74-79.
Erb, C. B., C. R. Harvey and T. E. Viskanta (1996), “Political Risk, Economic Risk and Financial Risk”, Financial Analysts Journal, 52, 6, Nov/Dec, p. 29-46.
Harvey, C. R. (1994), “The Risk Exposure of Emerging Market Investments”, World Bank Economic Review,
Keppler, A. M. (1991), “The Importance of Dividend Yields in Country Selection”, Journal of Portfolio Management, Winter, p. 24-29.
Leavy, B. (1984), “Assessing Country Risk for Foreign Investment Decisions”, Long Range Planning, v. 17, 3, p. 141-151.
Ripley, B. D. (1996), Pattern recognition and neural networks, Cambridge University Press, New York.
Saini, K. G. and P. S. Bates (1984), “A Survey of Quantitative Approaches to Country Risk Analysis”, Journal of Banking and Finance, June, p. 341-355.
Speidell, L. S. and R. Sappenfield (1992), “Global Diversification in a Shrinking World”, Journal of Portfolio Management, Fall, p. 57-67.
Steinberg, D. and P. Colla (1997), CART-Classification and Regression Trees, Salford Systems, San Diego, California.
World Bank, (1999) World Development Report 1999-2000, published for the World Bank by Oxford University Press, New York.
World Bank, (2000/2001) World Development Report 2000/2001; Attacking Poverty, published for the World Bank by Oxford University Press, New York.
LIST OF VARIABLES USED IN CONSTRUCTING TREES
GOVDEF98: government budget deficit of 1998
GSOCICAP: per capita government expenditures on social programs
IRT99: interest rate differential between borrowing and lending for 1999
PRIVCR99: the size of the private credit market with respect to GDP, 1999
ICRG300: ICRG risk score for the country, March 2000
BADR170: Categorical variable, assigning the country a ‘high risk’ score if the ICRG score is less than 70.
IICR300: IICR country risk rating, 1/2000
BADR250: Categorical variable, assigning the country a ‘high risk’ score if the IICR score is less than 70.
BADR270: Categorical variable, assigning the country a ‘very high risk’ score if the IICR score is less than 50.
MENA: Categorical variable, assigning the country to the MENA region.
TABLE 1: COUNTRY RISK SCORES OF MENA COUNTRIES
COUNTRY | ICRG | IICR
EGYPT | 70.5 | 45.4
IRAN | 66.3 | 29.2
ISRAEL | 69.8 | 57.6
JORDAN | 70.8 | 38.6
LEBANON | 58.5 | 35
TUNISIA | 73.5 | 49.7
TURKEY | 54.4 | 39
YEMEN | 62.5 | 34*
Mean for MENA countries | 65.78 | 41.06
Mean for the data set | 66.23 | 47.57
Range for the data set | 50.3 (Indonesia) - 89.3 (Norway) | 11.1 (Georgia) - 93.8 (Sweden)
* Yemen’s IICR is from IFC Statistics.