Master thesis: Multimodal representations for structured data like flowcharts and sequence diagrams

Location: Luleå
Published: 2025-09-23
Apply by: Open until further notice

About the job

Join our Team

About this Opportunity

Many images carry structured information, e.g., flowcharts and sequence diagrams, and such images are critically important in telecom processes. The latest large language models, especially the commercial ones, are quite good at parsing these images for question answering, but open-source models, especially the smaller variants, perform relatively poorly. More importantly, these images need to be converted into a structured format such as a graph so that they are amenable to further processing: graph queries, embeddings, etc.

Previous work on this topic (https://openreview.net/forum?id=OSpqs2PdpG) was accepted at a KDD workshop. In that work, we found that a graph-based approach gave better retrieval results when it was used with interspersed text (i.e., text and images processed in the same way).

What You Will Do

Project Goals:
  • Processing of sequence diagrams (and other UML diagrams) to extract a structured graph.
  • Evaluation of alternative techniques for multimodal embeddings on structured data images.
  • Time permitting, exploring options such as building graph auto-encoders on top of the generated structured data.
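
To make the first goal concrete, here is a minimal sketch of what a structured graph for a sequence diagram could look like. The schema (participants as nodes, messages as ordered edges), the function name `sequence_to_graph`, and the example messages are all assumptions made for illustration; the project does not prescribe a format.

```python
import json


def sequence_to_graph(messages):
    """Build a graph dict from (sender, receiver, label) tuples in diagram order."""
    nodes = []
    for sender, receiver, _ in messages:
        for name in (sender, receiver):
            if name not in nodes:  # keep first-appearance order
                nodes.append(name)
    edges = [
        {"source": s, "target": t, "label": label, "order": i}
        for i, (s, t, label) in enumerate(messages, start=1)
    ]
    return {"nodes": [{"id": n} for n in nodes], "edges": edges}


# Hypothetical three-message exchange, invented for illustration only.
messages = [
    ("UE", "gNB", "Connection request"),
    ("gNB", "Core", "Forward request"),
    ("Core", "gNB", "Accept"),
]
graph = sequence_to_graph(messages)
print(json.dumps(graph, indent=2))
```

Keeping an explicit `order` field on each edge preserves the temporal ordering of messages, which a plain edge list would lose.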

Work Description:
  • Using a large public image dataset for UML diagrams. One option is https://zenodo.org/records/15103682, although we are free to explore other options. An advantage of this dataset is that it contains PlantUML diagrams, which gives the flexibility to modify the source code and to generate further diagrams programmatically.
  • Evaluate both commercial and open-source models on generating structured output, such as a graph in JSON format.
  • Fine-tune open-source models for better performance based on the public data used.
  • Evaluate retrieval and question answering by considering embeddings based on generated descriptions vs. structured data processing.
  • Time permitting, evaluate graph embedding models for processing.
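
Because the diagrams come with PlantUML source, one possible way to score a model's extracted graph is to derive reference edges from the source and compare them with the predicted edges. The sketch below assumes simple `A -> B : label` message lines only; the regular expression, the function names `parse_plantuml_sequence` and `edge_f1`, and the exact-match scoring are illustrative assumptions, not the project's evaluation protocol.

```python
import re

# Matches simple PlantUML sequence messages such as "Alice -> Bob : hello".
# Only plain "->" and "-->" arrows are handled; this is a deliberate simplification.
MESSAGE_RE = re.compile(r"^\s*(\w+)\s*(-->|->)\s*(\w+)\s*(?::\s*(.*\S))?\s*$")


def parse_plantuml_sequence(source):
    """Return (sender, receiver, label) tuples for each message line."""
    edges = []
    for line in source.splitlines():
        match = MESSAGE_RE.match(line)
        if match:
            sender, _arrow, receiver, label = match.groups()
            edges.append((sender, receiver, label or ""))
    return edges


def edge_f1(predicted, reference):
    """Exact-match precision, recall and F1 over edge sets (duplicates collapse)."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


source = """@startuml
Alice -> Bob : request
Bob --> Alice : response
Alice -> Bob : ack
@enduml"""
reference = parse_plantuml_sequence(source)

# Hypothetical model output with the direction of the last message wrong.
predicted = [
    ("Alice", "Bob", "request"),
    ("Bob", "Alice", "response"),
    ("Bob", "Alice", "ack"),
]
precision, recall, f1 = edge_f1(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching on labels is strict; a real evaluation would probably need fuzzy label matching, since models often paraphrase or slightly misread the text in an image.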


The Skills You Bring

Qualifications and Experience:
  • MSc student in EE, Computer Science, Mathematics, or a related field.
  • Whilst not mandatory, some experience in image processing and computer vision would help.
  • Proficiency in machine learning and deep learning.
  • Good programming skills are required, including knowledge of C++ and Python.
  • Familiarity with programming in a Linux environment.


Additional Details

The primary contact for this work will be Sujoy Roychowdhury, Principal Data Scientist, based in Bangalore, India. Regular online interactions will take place, and a local contact in Kista will also be available for support if anything urgent comes up.

Ericsson AB