When confronted with a large set of data, particularly qualitative data,
I think most people's first reaction is to work out how it can be
grouped for ease of interpretation.
Clustering manually is acceptable for small amounts of data: we are so
good at pattern recognition that we generally produce very good subsets
of the whole data set with which to do further work.
With large data sets, however, manual clustering becomes uneconomical
unless we distribute the task, so analysing the data requires some form
of automation.
There are many ways of clustering data, whether quantitative or
qualitative, and the field remains an area of active research.
Areas of research
Currently I am looking, on an occasional basis, at dynamic
self-organising maps (SOMs), n-tuple approaches, weighted update rules,
kernel methods in association with SOMs, and the potential for using
stochastic diffusion search and/or climate space modelling (CSM) as
clustering techniques.
I hope to do more on this than time currently allows, but for now it is
a sideline, albeit one that happens to be applicable to most of what I
do.
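
For readers unfamiliar with SOMs, the basic idea behind the maps
mentioned above can be sketched roughly as follows. This is only a
minimal illustration of a standard one-dimensional SOM, not the dynamic,
weighted or kernel variants listed above and not code from the work
described here. The function names (train_som, best_matching_unit,
assign_clusters), the linear decay schedules and the default parameter
values are all assumptions chosen for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit closest to x (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organising map on a list of equal-length vectors.

    Assumed (illustrative) choices: units lie on a line, the neighbourhood
    is Gaussian, and both learning rate and width decay linearly.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit's weight vector from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate shrinks
        sigma = max(sigma0 * (1.0 - frac), 0.01)   # neighbourhood narrows
        for x in rng.sample(data, len(data)):      # one shuffled pass
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Units near the winner on the map are pulled more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights


def assign_clusters(weights, data):
    """Label each data point with the index of its best matching unit."""
    return [best_matching_unit(weights, x) for x in data]
```

A call such as `weights = train_som(points, n_units=2)` followed by
`assign_clusters(weights, points)` gives one cluster label per point.
How well the labels match any intended grouping depends on the data,
the number of units and the decay schedules, so the defaults here
should be treated as starting points only.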