The course focuses on methods for exploring and analyzing categorical longitudinal data describing life courses, such as family trajectories or professional careers. The aim is twofold: (i) to explain the whole process of sequence analysis, from the preparation of longitudinal data and the exploration of sequences to the use of more advanced explanatory analyses, and (ii) to train participants in the practice of sequence analysis by means of the TraMineR package for the R graphical and statistical environment.
Covered topics include, for state sequences:
- the visual rendering of sequence data,
- transversal and longitudinal sequence descriptive statistics,
- optimal matching and other ways of measuring the dissimilarity between sequences,
- clustering individual sequences,
- identifying representative trajectories,
- discrepancy analysis and regression trees for sequence data;
and, for event sequences:
- rendering the sequencing,
- mining typical subsequences and associations between those subsequences,
- finding the subsequences that best discriminate between groups, such as women and men,
- measuring the dissimilarity between event sequences and performing dissimilarity-based analyses of them.
The course is user-oriented and includes an introduction to R that provides the basic knowledge required for using TraMineR. The scope of sequence analysis will be illustrated with real data from the Swiss Household Panel (http://www.swisspanel.ch) and other datasets that come with the TraMineR package. Participants are encouraged to practice the methods on their own data.
About R and TraMineR
R is free, open-source software available at http://www.r-project.org. TraMineR is distributed through CRAN (http://cran.r-project.org). See http://mephisto.unige.ch/traminer for details about TraMineR.
Abbott, A. and A. Tsay (2000). Sequence analysis and optimal matching methods in sociology : Review and prospect. Sociological Methods and Research 29(1), 3–33. (With discussion, pp. 34–76).
Aisenbrey, S. and A. E. Fasang (2010). New life for old ideas : The “second wave” of sequence analysis bringing the “course” back into the life course. Sociological Methods and Research 38(3), 430–462.
Billari, F. C. (2001). Sequence analysis in demographic research. Canadian Studies in Population 28(2), 439–458. Special Issue on Longitudinal Methodology.
Billari, F. C. (2005). Life course analysis : Two (complementary) cultures? Some reflections with examples from the analysis of transition to adulthood. In R. Levy, P. Ghisletta, J.-M. Le Goff, D. Spini, and E. Widmer (Eds.), Towards an Interdisciplinary Perspective on the Life Course, Advances in Life Course Research, Vol. 10, pp. 267–288. Amsterdam : Elsevier.
Elzinga, C. H. (2010). Complexity of categorical time series. Sociological Methods and Research 38(3), 463–481.
Elzinga, C. H. and A. C. Liefbroer (2007). De-standardization of family-life trajectories of young adults : A cross-national comparison using sequence analysis. European Journal of Population 23, 225–250.
Gabadinho, A., G. Ritschard, N. S. Müller, and M. Studer (2011a). Analyzing and visualizing state sequences in R with TraMineR. Journal of Statistical Software 40(4), 1–37.
Gabadinho, A., G. Ritschard, M. Studer, and N. S. Müller (2011b). Extracting and rendering representative sequences. In A. Fred, J. L. G. Dietz, K. Liu, and J. Filipe (Eds.), Knowledge Discovery, Knowledge Engineering and Knowledge Management, Volume 128 of Communications in Computer and Information Science (CCIS), pp. 94–106. Springer-Verlag.
Maindonald, J. H. (2008). Using R for data analysis and graphics : Introduction, code and commentary. Manual, Centre for Mathematics and Its Applications, Australian National University.
Piccarreta, R. and F. C. Billari (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society : Series A (Statistics in Society) 170(4), 1061–1078.
Piccarreta, R. and O. Lior (2010). Exploring sequences : a graphical tool based on multi-dimensional scaling. Journal of the Royal Statistical Society : Series A (Statistics in Society) 173(1), 165–184.
Pollock, G. (2007). Holistic trajectories : A study of combined employment, housing and family careers by using multiple-sequence analysis. Journal of the Royal Statistical Society : Series A (Statistics in Society) 170(1), 167–183.
Ritschard, G., A. Gabadinho, N. S. Müller, and M. Studer (2008). Mining event histories : A social science perspective. International Journal of Data Mining, Modelling and Management 1(1), 68–90.
Ritschard, G., A. Gabadinho, M. Studer, and N. S. Müller (2009). Converting between various sequence representations. In Z. Ras and A. Dardzinska (Eds.), Advances in Data Management, Volume 223 of Studies in Computational Intelligence, pp. 155–175. Berlin : Springer-Verlag.
Studer, M. (2012). Le manuel de la librairie WeightedCluster : un guide pratique pour la création de typologie de séquences avec R. In Étude des inégalités de genre en début de carrière académique à l’aide de méthodes innovatrices d’analyse de données séquentielles, PhD Thesis. Faculté des SES, Université de Genève.
Studer, M., N. S. Müller, G. Ritschard, and A. Gabadinho (2010). Classer, discriminer et visualiser des séquences d’événements. Revue des nouvelles technologies de l’information RNTI E–19, 37–48.
Studer, M., G. Ritschard, A. Gabadinho, and N. S. Müller (2011). Discrepancy analysis of state sequences. Sociological Methods and Research 40(3), 471–510.