Application of small language models for semantic analysis of Web interface accessibility

B.O. Kuzikov, O.A. Shovkoplias, P.O. Tytov, S.R. Shovkoplias

Abstract


Web accessibility remains a critical aspect of ensuring equal opportunities for using internet resources, especially for people with disabilities. The Web Content Accessibility Guidelines 2.5.3 criterion “Label in Name” requires that the accessible name of an interface component include the text that is presented visually. Existing automated checks for this criterion are predominantly based on primitive string comparison, which does not account for semantic context. Objective: to investigate the possibilities of using small language models with up to 1 billion parameters for automated semantic analysis of compliance with Web Content Accessibility Guidelines 2.5.3 as an alternative to resource-intensive large language models and limited algorithmic methods. Methodology: the research involved creating synthetic datasets (7,200 English-language and 5,615 Ukrainian-language samples) and using real-world datasets (Top500 with 380 samples, UaUniv with 319). Sentence Bidirectional Encoder Representations from Transformers models were tested for computing semantic similarity, and the google/electra-base-discriminator model was fine-tuned for 3-class classification of semantic relationships (“similar”, “unrelated”, “opposite”). Results: the trained 437 MB model demonstrated high accuracy on synthetic data (0.96) and sufficient accuracy on real datasets (Top500: 0.77, UaUniv: 0.73). The model effectively identifies all three classes of semantic relationships, with an accuracy of 95.1 % for “opposite”, 92.7 % for “unrelated”, and 97.4 % for “similar” texts in the validation sample. Conclusions: the research confirmed the feasibility of using small language models for automated verification of semantic compliance with Web Content Accessibility Guidelines 2.5.3. The proposed approach provides acceptable classification accuracy at significantly lower computational cost than large language models, allowing semantic analysis to be integrated into standard development and testing processes. Despite certain limitations, the developed solution can substantially improve web accessibility testing.

Problems in programming 2025; 2: 77-86
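To make the pipeline summarized in the abstract concrete, the sketch below shows one way such a check could be assembled in Python: a multilingual Sentence-BERT model supplies a cosine-similarity score between the visible label and the computed accessible name, and a fine-tuned ELECTRA-style pair classifier assigns one of the three semantic relations (“similar”, “unrelated”, “opposite”). The sentence-transformers checkpoint, the local path ./electra-wcag253, and the label order are illustrative assumptions only, not the authors' published artifacts.

# Hedged sketch of a WCAG 2.5.3 "Label in Name" semantic check.
# Checkpoint names, the local model path, and the label order are assumptions.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSequenceClassification

visible_label = "Search"            # text rendered on the control
accessible_name = "Find products"   # name exposed to assistive technologies

# 1) Baseline: cosine similarity between multilingual sentence embeddings.
sbert = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
embeddings = sbert.encode([visible_label, accessible_name], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

# 2) Fine-tuned ELECTRA-style classifier over the (label, accessible name) pair,
#    predicting one of three semantic relations.
LABELS = ["similar", "unrelated", "opposite"]   # assumed label order
MODEL_DIR = "./electra-wcag253"                 # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
classifier = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)

inputs = tokenizer(visible_label, accessible_name, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = classifier(**inputs).logits
relation = LABELS[logits.argmax(dim=-1).item()]

print(f"cosine similarity = {similarity:.2f}, predicted relation = {relation}")

In an actual audit, the two inputs would be taken from each interactive element's rendered text and its accessible name as computed from the browser's accessibility tree.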

 


Keywords


Small Language Models; Semantic Analysis; Text Classification; Web Accessibility; Web Content Accessibility Guidelines; Model Fine-Tuning; Natural Language Processing


References


Web Content Accessibility Guidelines (WCAG) 2.1 [Electronic resource]. 2025. URL: https://www.w3.org/TR/WCAG21/

Suarez C. Comprehensive Guide to Automated Accessibility Testing [Electronic resource]. 2024. URL: https://kobiton.com/blog/comprehensive-guide-to-automated-accessibility-testing/

Prasad M. DigitalA11Y. Automated Accessibility Testing Is Not a Silver Bullet [Electronic resource]. 2025. URL: https://www.digitala11y.com/automated-accessibility-testing-is-not-a-silver-bullet/

Wieland R. Limitations of an Automated-Only Web Accessibility Plan [Electronic resource]. 2024. URL: https://allyant.com/blog/limitations-of-an-automated-only-web-accessibility-plan/

Intelligence Community Design System. Limitations of automated testing [Electronic resource]. URL: https://design.sis.gov.uk/accessibility/testing/automated-testing-limitation

Barrell N. Enhancing Accessibility with AI and ML [Electronic resource]. 2023. URL: https://www.deque.com/blog/enhancing-accessibility-with-ai-and-ml/

Du N. et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts // Proc Mach Learn Res. ML Research Press, 2021. Vol. 162. P. 5547–5569.

Achary S. The Rise of Small Language Models: A New Era of AI Accessibility and Efficiency [Electronic resource]. 2024. URL: https://medium.com/small-language-models/the-rise-of-small-language-models-a-new-era-of-ai-accessibility-and-efficiency-752322d82656

Aralimatti R. et al. Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective. Preprints, 2025.

Abdin M. et al. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. 2024.

Moz. Top 500 Most Popular Websites [Electronic resource]. URL: https://moz.com/top500

Tytov P.O., Shovkoplias O.A., Kuzikov B.O. Web accessibility analysis of the websites of Ukrainian higher education institutions // System Research and Information Technologies. 2025. No. 2.

Gu J. et al. A Survey on LLM-as-a-Judge. 2024. Vol. 1.

Reimers N., Gurevych I. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation // EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL), 2020. P. 4512–4525.

Muennighoff N. et al. MTEB: Massive Text Embedding Benchmark // EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference. Association for Computational Linguistics (ACL), 2022. P. 2006–2029.

Clark K. et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators // 8th International Conference on Learning Representations, ICLR 2020. International Conference on Learning Representations, ICLR, 2020.

Tytov P. Semantic Language Models for WCAG [Electronic resource]. URL: https://www.kaggle.com/datasets/tytovpavel/semantic-language-models-for-wcag

