ILO Working paper 102

A Technological Construction of Society: Comparing GPT-4 and Human Respondents for Occupational Evaluation in the UK

The paper systematically compares GPT-4's evaluations of occupations with those from a high-quality human survey in the UK, finding a high correlation between the two while highlighting both the potential and the risks of using LLMs in sociological and occupational research.

Despite initial research on the biases and perceptions of Large Language Models (LLMs), we lack evidence on how LLMs evaluate occupations, especially in comparison to human evaluators. In this paper, we present a systematic comparison of occupational evaluations by GPT-4 with those from an in-depth, recent, and high-quality survey of human respondents in the United Kingdom. Covering the full ISCO-08 occupational landscape (expanded to 580 occupational groups) and two distinct metrics (prestige and social value), our findings indicate that GPT-4 and human scores are highly correlated across all ISCO-08 major groups. Our analyses show both the potential and the risks of using LLM-generated data for sociological and occupational research. The potential includes LLMs’ efficiency, cost-effectiveness, speed, and accuracy in capturing general tendencies. The risks include bias, contextual misalignment, and downstream issues, for example when problematic and opaque occupational evaluations by LLMs feed back into working life. We also discuss the policy implications of our findings for the integration of LLM tools into the world of work.