2.2 Historical background

2.3 Development since 2000

2.4 Future developments

2.5 Training PhDs

The major objective of the current programme is the analysis of individual or group differences, or subject-based variation, in multivariate data. This focus is of the utmost interest in the behavioral sciences, such as the educational sciences and psychology, but it can also be found in a whole range of other disciplines, such as medicine, biology, and the life sciences. Considering the active research in these fields, there will be a continuing need for further methodological research on subject-based techniques.

The classical statistical approach to analyzing observations of persons on a (large) number of characteristics relies on very strong assumptions. A very important one is that the individual units in the analysis (the persons) belong to a homogeneous population and form a random sample from this population. In the behavioral sciences, however, observations are usually not independent, and are obtained from heterogeneous, often unidentified, populations. This situation calls for a multivariate analysis approach that develops methods which do not rely on multivariate normality and homogeneous samples. Because observations usually arise from subgroups, individual or group differences are central, which calls for the study of interaction.

Three major aspects can be distinguished in the research project:

1. Development of methods for the analysis of multivariate data, in particular in the subareas

1a. Nonlinear optimal scaling methods

1b. Three-way multivariate methods

1c. Clustering and multidimensional scaling methods

1d. Regression and classification trees

2. Their implementation in user-oriented software

3. Their exemplary application in educational, psychological, medical, and life sciences

The main research group carrying out the present program was originally part of an independent research department, called Data Theory, within the Faculty of Social and Behavioral Sciences. Both multidimensional scaling and optimal scaling techniques for multivariate categorical data have been recognized as major innovations within data analysis, and a project proposal under the title "Subject-Oriented Multivariate Analysis" under the direction of Meulman was awarded an NWO-PIONEER grant for the period 1994-1999. As a result of a large-scale reorganization of the Faculty, part of the Department of Data Theory was merged with the Psychometrics group in the Department of Psychology. However, the larger part of the group, including the staff involved in the former PIONEER project and the major part of the software development section, joined the Department of Education in 1997. In the same year, Kroonenberg, from the Department of Education, joined the program for half of his research input, while the other half remained in the Attachment program. The Data Theory group under the direction of Professor Meulman had co-operated with empirical researchers from various disciplines before it became part of the Department of Education, and it has continued to do so since. This has led to a considerable number of papers exemplifying the developed methods with data collected within those disciplines.

When the work of the Data Theory group was previously assessed in 2001 by the international site visit committee for the Educational Sciences, the group received the following marks: quality (5), productivity (4.5), relevance (4), and viability (4.5), as well as the following commendation: "The research program excels in every respect and is worldwide at the forefront in this field. In all three research areas, the group has taken up methodological questions that are of central significance for educational research, presented unusually innovative solutions and ensured their practical application. The impact of this research program on the field has been extraordinarily high. The program is justifiably one of the priority projects of the Department of Education."

In an assessment conducted in 1999 of the research included in the research school ISED, an international examination committee concluded that the work done by the group directed by Professor Meulman is "outstanding" in two ways. "First, because the group willingly supplies large amounts of consultation to researchers and students in the empirical research clusters". Second, because members of this group "contribute substantively to the field through an excellent research program, mostly in multivariate statistics".

In 1993 the work of the Data Theory group was assessed as part of the Psychology Quality Assessment. The group received the highest marks for quality, productivity, relevance, and viability, as well as the following commendation: "The group from Leiden is world-wide recognized as the group at the basis of a major innovation in data analysis methods. It is of international quality. The innovations are more than technical. They have reshaped the field and have contributed much to a unifying theory of data analysis. The idea is that variables of any type, including categories, can be quantified, that is, they can be given quantitative values through analysis, such as to optimize some criterion, for instance the total percentage of variance explained by principal components. This is a major achievement, and the group has received world-wide recognition for this."

The research in the past five years has been partly supported by a grant awarded to Meulman by the Leiden University Board of Directors (275 KEuro) for a three-year project (2000-2003) designed to further develop the field of Relational data theory for the behavioral sciences. This has resulted in a number of developments in all four major subareas of the research program, i.e. nonlinear optimal scaling methods, three-way methods, clustering and multidimensional scaling (including multidimensional unfolding), and regression and classification trees, which will be described below.

The position of three-way analysis in the Faculty of Social and Behavioral Sciences has been strengthened by the recent appointment (1-11-2004) of Kroonenberg as Professor of Multivariate Analysis, in particular of three-way data, by the Foundation for the Advancement of the Science of Data Theory.

Method development

Research on nonlinear optimal scaling methods (1a) will continue with the investigation and development of techniques for the representation of multivariate categorical and qualitative data, focusing on visual exploration, classification, and prediction. In particular, research will continue on resampling methods to study stability, permutation methods to study inference, and regularization methods to improve both prediction accuracy and efficiency in large (both wide and long) data sets. Moreover, in co-operation with Heiser (Department of Psychology, Leiden University), Meulman has signed a contract with Springer-Verlag to write a book on the current status of nonlinear optimal scaling methods, including multidimensional scaling and unfolding.

Ongoing research on the methodology of three-way multivariate methods (1b) focuses, and will continue to focus, on the development of methods for three-way analysis of longitudinal data, analysis of variance models, and the inclusion of optimal scaling in three-way models. Furthermore, the experience gained over the years in the practice of three-mode analysis is at present being collected in a book that is currently in preparation. The intention is to distribute the developed software in conjunction with this book. An ongoing project involves testing three-way methods with respect to their applicability in the educational sciences, the social sciences, and other disciplines. Long-term plans exist to evaluate the role that robust procedures can play in three-way analysis, but such research is still in the planning stage.

Research on clustering and multidimensional scaling methods (1c) will be continued through cooperation with the Department of Psychology, Leiden University (Heiser, Busing) and the Department of Econometrics, Erasmus University, Rotterdam (Groenen). Research on combinatorial optimization using dynamic programming in clustering, partitioning and ordering will continue through cooperation with the Department of Psychology, University of Illinois at Urbana-Champaign (Hubert) and the Department of Marketing, Rutgers University, Newark (Arabie). In addition, the Data Theory group in Leiden will focus on two extensions: heuristic variants of dynamic programming that allow the analysis of larger data sets, and methods that use dynamic programming to find non-exhaustive partitionings in order to reduce the influence of outliers. Research on clustering on subsets of attributes will continue through cooperation with Stanford University (Friedman).
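As a rough illustration of what dynamic programming for partitioning can look like, the following Python sketch finds the best split of an ordered sequence of values into k contiguous clusters, minimizing the total within-cluster sum of squares. It is a textbook-style example that assumes the objects are already ordered and measured on a single variable; the function name `optimal_contiguous_partition` and the data are hypothetical and are not taken from the group's software or publications.

```python
from itertools import accumulate


def optimal_contiguous_partition(values, k):
    """Split `values` (kept in the given order) into k contiguous clusters
    that minimize the total within-cluster sum of squared deviations."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums make the cost of any segment available in constant time.
    s1 = [0.0] + list(accumulate(values))
    s2 = [0.0] + list(accumulate(v * v for v in values))

    def cost(i, j):
        # Sum of squared deviations from the mean for values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[m][j]: start index of the last cluster in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    cut[m][j] = i

    # Walk the stored cut points backwards to recover the clusters.
    clusters = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        clusters.append(list(values[i:j]))
        j = i
    clusters.reverse()
    return best[k][n], clusters


total_cost, clusters = optimal_contiguous_partition([1, 2, 3, 10, 11, 12], 2)
print(total_cost, clusters)  # 4.0 [[1, 2, 3], [10, 11, 12]]
```

The triple loop makes the running time grow with k times the square of the number of objects, which gives some intuition for why heuristic variants become attractive for larger data sets.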

Further research on regression and classification trees (1d) will follow two different paths. On the one hand, research on the integration approach to trees and multiple regression will be continued in the VENI project by Dusseldorp (supported by the Netherlands Organization for Scientific Research until 2007, and to be continued when Dusseldorp obtains a tenured position in the Data Theory research group afterwards). On the other hand, research will address (multiple additive) optimal regression trees that can be fitted using dynamic programming (Van Os).

Software development

The development of commercially available computer software for multivariate categorical analysis will include enhancements and extensions of the existing programs for nonlinear optimal scaling methods in the SPSS module CATEGORIES. Also, in the near future, the current package will be extended with PREFSCAL, a program for multidimensional unfolding that includes optimal scaling, various individual differences models, and an optimization approach that solves the notorious degeneration problem in nonmetric multidimensional unfolding (Busing, Groenen, & Heiser, 2001). Development of the PREFSCAL program to be included in SPSS CATEGORIES is a major activity in the cooperation with the Department of Psychology, Leiden University.

Exploration of new avenues in designing software architecture, driven by subject-oriented programming, is currently called for, and this will require a major investment of resources within the Data Theory group. The first emphasis will be on developing a pilot project with GROUPER, a program that will combine clustering with the optimal scaling approach, and that can handle large multivariate data sets such as are common in data mining.

Further extensions and improvements of the three-way software are under way through the development of new programs for 3WAYPACK, the suite of programs for three-way analysis, and the updating of the present ones. Special emphasis is placed on missing data procedures and on redesigning the user interface to enhance its user-friendliness.

Research schools

Members of the research group are associated with the research schools IOPS: Interuniversity Research School for Psychometrics and Sociometrics (at present: Meulman, Kroonenberg, Van Os, Dusseldorp; in the past: Groenen, Bensmail and Commandeur; Meulman was scientific director from 2000 to 2003) and ISED: Interuniversity School for Education and Development (Kroonenberg and Meulman). In this capacity, they are involved with courses in the general area of multivariate data analysis. Some courses have a more applied emphasis (for PhD students who will write their thesis in one of the behavioral or related sciences), while others are much more theoretical (for PhD students who will write a thesis on a psychometric or related topic).

Erasmus Programme

Members of the research group (Meulman, Dusseldorp, Van Os) are part of the Socrates programme under the Erasmus agreement between Leiden University and the University of Cassino, Italy (2004-2006).

