https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/Head
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://www.nanopub.org/nschema#hasAssertion
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://www.nanopub.org/nschema#hasProvenance
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/provenance
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://www.nanopub.org/nschema#hasPublicationInfo
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/pubinfo
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.nanopub.org/nschema#Nanopublication
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
http://id.crossref.org/issn/1868-1158
http://purl.org/dc/terms/title
Studies on the Semantic Web
https://doi.org/10.3233/SSW240006
http://purl.org/dc/terms/abstract
Traditional dataset retrieval systems rely on metadata for indexing, rather than on the underlying data values. However, high-quality metadata creation and enrichment often require manual annotation, a labour-intensive process that is challenging to automate. In this study, we propose a method to support metadata enrichment using topic annotations generated by three Large Language Models (LLMs): ChatGPT-3.5, GoogleBard, and GoogleGemini. Our analysis focuses on classifying column headers based on domain-specific topics from the Consortium of European Social Science Data Archives (CESSDA), a Linked Data controlled vocabulary. Our approach operates in a zero-shot setting, integrating the controlled topic vocabulary directly within the input prompt. This integration serves as a Large Context Windows approach, with the aim of improving the results of the topic classification task. We evaluated the performance of the LLMs in terms of internal consistency, inter-machine alignment, and agreement with human classification. Additionally, we investigated the impact of contextual information (i.e., dataset description) on the classification outcomes. Our findings suggest that ChatGPT and GoogleGemini outperform GoogleBard in terms of internal consistency as well as LLM-human agreement. Interestingly, we found that contextual information had no significant impact on LLM performance. This work proposes a novel approach that leverages LLMs for topic classification of column headers using a controlled vocabulary, presenting a practical application of LLMs and Large Context Windows within the Semantic Web domain. This approach has the potential to facilitate automated metadata enrichment, thereby enhancing dataset retrieval and the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data on the Web.
https://doi.org/10.3233/SSW240006
http://purl.org/dc/terms/date
2024-09-11
https://doi.org/10.3233/SSW240006
http://purl.org/dc/terms/isPartOf
http://id.crossref.org/issn/1868-1158
https://doi.org/10.3233/SSW240006
http://purl.org/dc/terms/title
Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment
https://doi.org/10.3233/SSW240006
http://purl.org/ontology/bibo/authorList
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/author-list
https://doi.org/10.3233/SSW240006
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/spar/fabio/Article
https://orcid.org/0000-0001-8004-0464
http://schema.org/affiliation
https://ror.org/008xxew50
https://orcid.org/0000-0001-8004-0464
http://xmlns.com/foaf/0.1/name
Margherita Martorana
https://orcid.org/0000-0002-1267-0234
http://schema.org/affiliation
https://ror.org/008xxew50
https://orcid.org/0000-0002-1267-0234
http://xmlns.com/foaf/0.1/name
Tobias Kuhn
https://orcid.org/0000-0002-2146-4803
http://schema.org/affiliation
https://ror.org/008xxew50
https://orcid.org/0000-0002-2146-4803
http://xmlns.com/foaf/0.1/name
Lise Stork
https://orcid.org/0000-0002-7748-4715
http://schema.org/affiliation
https://ror.org/008xxew50
https://orcid.org/0000-0002-7748-4715
http://xmlns.com/foaf/0.1/name
Jacco van Ossenbruggen
https://ror.org/008xxew50
http://xmlns.com/foaf/0.1/name
Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/author-list
http://www.w3.org/1999/02/22-rdf-syntax-ns#_1
https://orcid.org/0000-0001-8004-0464
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/author-list
http://www.w3.org/1999/02/22-rdf-syntax-ns#_2
https://orcid.org/0000-0002-1267-0234
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/author-list
http://www.w3.org/1999/02/22-rdf-syntax-ns#_3
https://orcid.org/0000-0002-2146-4803
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/author-list
http://www.w3.org/1999/02/22-rdf-syntax-ns#_4
https://orcid.org/0000-0002-7748-4715
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/provenance
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0001-8004-0464
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0002-1267-0234
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0002-2146-4803
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0002-7748-4715
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/assertion
http://www.w3.org/ns/prov#wasDerivedFrom
https://doi.org/10.3233/SSW240006
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/pubinfo
https://orcid.org/0000-0001-8004-0464
http://xmlns.com/foaf/0.1/name
Margherita Martorana
https://orcid.org/0000-0002-1267-0234
http://xmlns.com/foaf/0.1/name
Tobias Kuhn
https://orcid.org/0000-0002-2146-4803
http://xmlns.com/foaf/0.1/name
Lise Stork
https://orcid.org/0000-0002-7748-4715
http://xmlns.com/foaf/0.1/name
Jacco van Ossenbruggen
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://purl.org/dc/terms/created
2026-02-22T17:36:53.000+01:00
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://purl.org/dc/terms/creator
https://w3id.org/np/RAkkUz7qBJ-BIOCHV_4WCTgHCdTyI25_bnRuw166SXjwM/DOI-bot
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://purl.org/dc/terms/license
https://creativecommons.org/publicdomain/zero/1.0/
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://purl.org/nanopub/x/hasNanopubType
http://purl.org/spar/fabio/ScholarlyWork
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://purl.org/nanopub/x/introduces
https://doi.org/10.3233/SSW240006
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
http://www.w3.org/2000/01/rdf-schema#label
Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/sig
http://purl.org/nanopub/x/hasAlgorithm
RSA
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/sig
http://purl.org/nanopub/x/hasPublicKey
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArRL5MjH1KfuE89dpKsIiscF/THrJ4uSvhl0NgaC8x3TdTDrL00kCnlH+2g7PMYhaUQIGWq27TTXHAGp7ehO8yLjRNeDCc8zjUCQJqLbzay3DB51PCiz50OsMgxiZC1+e0bVdk/CAQV4oVo+VgI+awHI1bTT4Yp7pR2I67imf1PIcwczGVhn8EQwtNdWQOZ63wDgUCY+6IubHBQzjLfbYh0828UETEyIV28T7fvf5+y4A5M590InmgkLGpJbRXoL0pnCm1BtFOoxeAVqfivbxIZWPYN2Yd0cSfqwIIUYyaLFpjDrBwc4iJdOus4UQ9OYqkeZDMpU3opU8jWKDIm77jwIDAQAB
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/sig
http://purl.org/nanopub/x/hasSignature
PKv1MVA/4qMw7Pemeg1xGGWWB9QTFjx2dSM9Ac7Az82ZzS1xGY3GAGyyKELrEnwAIk7tlZZNrTeAurar045rIYnU6cg6EV8E37ZfdV+LgC5FHj+wsHT9h86VjBgToDkio69gicCP+KWr8vQnTHPmo1lTx6zgkPiTxLuIE/UGyQ6acgkVgQFg4M0+c60qdnXPLGKU331tJW60IxRa1dZQo7c54dKJSE+Xk6HVHoI8MCA4s4e6xw6U42qUiouLHLY5yOe+Pw1haAAo1URhftNDhE+5huycBlKEVjOKfsvInPhJ9HentE9l9Tt8LnzTpjZgAXBxlYB2igi41dcjVfH3nQ==
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/sig
http://purl.org/nanopub/x/hasSignatureTarget
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY
https://w3id.org/np/RAjg_CiZk1KUas15hpkYrxZRoro-umA0aTl_l2IEIjplY/sig
http://purl.org/nanopub/x/signedBy
https://w3id.org/np/RAkkUz7qBJ-BIOCHV_4WCTgHCdTyI25_bnRuw166SXjwM/DOI-bot