A potential application of large language models (LLMs) in software engineering (SE) is the generation of sets of requirements for a given problem. In this paper, we survey how researchers evaluate the quality of LLM-generated requirements and assess the adequacy of these evaluation approaches. In particular, we show that metrics traditionally employed to measure the similarity of LLM-generated artifacts to ground truths do not perform well for requirements. The underlying reason is that similarity should be assessed not on the textual output but on the problem space captured by the requirements.