Semi-Supervised Psychometric Scoring of Document Collections

Burak Suyunu, Gonul Ayci, Mine Öğretir, Ali Taylan Cemgil, Suzan Uskudarli, Hamza Zeytinoglu, Bulent Ozel, Arman Boyacı

International Conference on Data Mining Workshops (ICDMW)

Abstract

We describe a generic computational approach that can be used to develop methods for psychometric profiling. Our approach is based on semi-supervised analysis of document collections using topic modeling. The method relies on a supervisor who provides a set of seed documents grouped by abstract themes, such as Schwartz values or personality traits, and, optionally, a separate background document corpus. Instead of casting the problem as a standard classification task, we interpret the group labels as a guide for finding distinguishing features. During training, we factorize the documents associated with each theme separately using nonnegative matrix factorization to obtain theme-specific topic distributions. During analysis, we decompose a new document with the model learned in training to arrive at its theme scores. We demonstrate our approach on two psychometric profiling theories (Schwartz and Big Five): we evaluate our Schwartz scores using leave-one-out cross-validation, and we compare our Big Five scores to independent surveys, which are much more costly to carry out.
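The per-theme training and the decomposition of a new document can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several details are assumptions: each theme's seed documents are given as a nonnegative term-by-document count matrix, NMF minimizes the squared Frobenius error with Lee and Seung multiplicative updates, every theme gets the same number of topics (`rank`), the background corpus is ignored, and a theme's score is its share of the new document's total topic weight. The helper names `nmf`, `train_theme_topics` and `theme_scores` are hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def nmf(X, rank, n_iter=500, seed=0):
    """Factorize X (terms x docs) as W @ H with W, H >= 0, using
    Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    V, N = X.shape
    W = rng.random((V, rank)) + EPS
    H = rng.random((rank, N)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    # Make each topic (column of W) a distribution over terms; rescale H
    # so that the product W @ H is unchanged.
    scale = W.sum(axis=0, keepdims=True) + EPS
    return W / scale, H * scale.T


def train_theme_topics(groups, rank, n_iter=500):
    """Fit a separate NMF to each theme's seed documents (assumption: same
    rank per theme) and stack the topic matrices: W = [W_1 | ... | W_K]."""
    blocks = [nmf(X, rank, n_iter, seed=k)[0] for k, X in enumerate(groups)]
    return np.hstack(blocks)


def theme_scores(W, x, n_themes, rank, n_iter=500):
    """Decompose a new document x (term counts) onto the fixed topics W,
    then sum the nonnegative coefficients belonging to each theme.
    Normalizing the scores to sum to one is an assumption of this sketch."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ x) / (W.T @ W @ h + EPS)  # W stays fixed; only h is updated
    per_theme = h.reshape(n_themes, rank).sum(axis=1)
    return per_theme / (per_theme.sum() + EPS)


if __name__ == "__main__":
    # Hypothetical toy data: theme 0 uses words 0-4, theme 1 uses words 5-9.
    rng = np.random.default_rng(42)
    X0 = np.zeros((10, 20))
    X0[:5] = rng.poisson(3.0, size=(5, 20)) + 1
    X1 = np.zeros((10, 20))
    X1[5:] = rng.poisson(3.0, size=(5, 20)) + 1
    W = train_theme_topics([X0, X1], rank=2)
    doc = np.array([4, 2, 3, 5, 1, 0, 0, 0, 0, 0], dtype=float)
    print(theme_scores(W, doc, n_themes=2, rank=2))
```

Because the two toy themes use disjoint vocabularies, a document built only from the first theme's words places nearly all of its score mass on that theme. Keeping each theme's topics in its own block of `W` is what turns an ordinary nonnegative decomposition into per-theme scores; with real seed documents the blocks overlap in vocabulary, and the split of weight between them is what carries the signal.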

BibTeX

@inproceedings{suyunu2018semi,
  title        = {Semi-Supervised Psychometric Scoring of Document Collections},
  author       = {Suyunu, Burak and Ayci, Gonul and {\"O}{\u{g}}retir, Mine and Cemgil, Ali Taylan and Uskudarli, Suzan and Zeytinoglu, Hamza and Ozel, Bulent and Boyac{\i}, Arman},
  year         = 2018,
  booktitle    = {International Conference on Data Mining Workshops ({ICDMW})},
  pages        = {1367--1374},
  doi          = {10.1109/ICDMW.2018.00194},
  issn         = {2375-9232},
  group        = {conference},
  abstract     = {We describe a generic computational approach that can be used in developing methods for psychometric profiling. Our approach is based on semi-supervised analysis of document collections using topic modeling. The method depends on a supervisor providing a set of seed documents, grouped by abstract themes, such as Schwartz values or personality traits; and possibly a separate background document corpus. Instead of casting the problem into a standard classification framework, we interpret the group labels as a guide for finding distinguishing features. During training, we train each group of documents associated with a theme separately by using nonnegative matrix factorization to obtain theme specific topic distributions. In the analysis, we decompose a new document using the model learned during training to arrive at the theme scores. We demonstrate our approach on two psychometric profiling theories (Schwartz and Big Five) and evaluate our Schwartz scores with leave-one-out cross-validation method and compare Big Five scores to independent surveys, which are much more costly to carry out.},
  keywords     = {document handling;matrix decomposition;abstract themes;Schwartz values;personality traits;group labels;nonnegative matrix factorization;theme specific topic distributions;theme scores;psychometric profiling theories;Schwartz scores;leave-one-out cross-validation method;Big Five scores;semisupervised psychometric scoring;document collections;semisupervised analysis;topic modeling;seed documents;background document corpus;Feature extraction;Task analysis;Training;Matrix decomposition;Semantics;Conferences;Encyclopedias;non-negative matrix factorization;semi-supervised learning;Schwartz theory of basic human values;big five personality traits;psychometric profiling;personality recognition},
  link         = {https://ieeexplore.ieee.org/abstract/document/8637398},
  organization = {IEEE}
}