Master thesis: AI-Assisted Nightly Test Result Triage and Fault Analysis
Actia Nordic AB
Published: 2026-05-18
Apply by: 2027-05-31
Description
Background
Modern software development relies heavily on continuous integration and automated testing to ensure software quality and system stability. In large-scale embedded and automotive software systems, nightly test executions may generate thousands of test results across multiple platforms, configurations, and environments.
A significant challenge for engineering teams is the manual triage of failing tests. Engineers often need to inspect logs, test reports, metadata, source code, and historical execution results to determine whether a failure originates from production code, unstable tests, infrastructure issues, or configuration problems. This process is both time-consuming and difficult to scale.
Recent advances in Artificial Intelligence, Machine Learning, and Large Language Models (LLMs) open new possibilities for intelligent analysis of software testing artifacts and automated fault triage. However, the practical usefulness, robustness, and limitations of such systems in real-world CI environments remain largely unexplored.
This thesis aims to investigate how AI-based methods can support engineers by automatically analyzing nightly test failures and assisting with root-cause identification and decision-making.
Scope
The objective of this thesis is to design, implement, and evaluate an AI-assisted system for automated triage of nightly test failures.
The system should be capable of ingesting and analyzing information such as:
- Test logs
- Test reports
- Test metadata
- Test source code
- Relevant production/source code
- Historical test execution results
Based on this information, the assistant should support engineers by:
- Classifying failures into likely fault categories
- Identifying probable root-cause locations, such as file, module, or function
- Producing concise human-readable explanations
- Recommending suitable next actions
Example recommendations may include:
- Fixing production code
- Stabilizing flaky tests
- Re-running tests
- Investigating the test environment
- Reviewing configuration or dependency issues
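As an illustration only, the outputs listed above could be captured in a single record per failing test. The sketch below is a minimal assumption about how such a record might look, not a required design: the names `FaultCategory`, `NextAction`, `TriageResult`, and the test name `test_can_bus_init` are hypothetical. The fault categories mirror the failure origins named in the Background, and the actions mirror the example recommendations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FaultCategory(Enum):
    # Hypothetical categories, based on the failure origins named in the Background.
    PRODUCTION_CODE = "production code"
    UNSTABLE_TEST = "unstable test"
    INFRASTRUCTURE = "infrastructure"
    CONFIGURATION = "configuration"


class NextAction(Enum):
    # Hypothetical actions, based on the example recommendations.
    FIX_PRODUCTION_CODE = "fix production code"
    STABILIZE_FLAKY_TEST = "stabilize flaky test"
    RERUN_TEST = "re-run test"
    INVESTIGATE_ENVIRONMENT = "investigate test environment"
    REVIEW_CONFIGURATION = "review configuration or dependencies"


@dataclass
class TriageResult:
    test_name: str
    category: FaultCategory
    explanation: str  # concise human-readable explanation
    action: NextAction
    location: Optional[str] = None  # file, module, or function, if one was identified

    def summary(self) -> str:
        """One-line summary suitable for a nightly report."""
        where = f" at {self.location}" if self.location else ""
        return f"{self.test_name}: {self.category.value}{where} -> {self.action.value}"


# Hypothetical example of a single triage result.
result = TriageResult(
    test_name="test_can_bus_init",
    category=FaultCategory.INFRASTRUCTURE,
    explanation="Test rig did not respond before the timeout.",
    action=NextAction.INVESTIGATE_ENVIRONMENT,
)
print(result.summary())
```

Keeping the category and action as closed enumerations, rather than free text, would make the assistant's output directly comparable with labels assigned by engineers during evaluation.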
The thesis should include both:
- A practical prototype implementation
- A scientific evaluation of the proposed approach
The exact scope and research direction can be refined together with the students based on interests and background.
Examples of research questions include:
- How accurately can ML- or LLM-based methods classify nightly test failures into meaningful fault categories?
- How does the selection of context - such as log fragments, source code snippets, and metadata - affect triage accuracy?
- Can an AI assistant identify likely root-cause locations with useful precision?
- How robust is the assistant when logs are incomplete, noisy, ambiguous, or contain multiple simultaneous failures?
- How does AI-assisted triage compare to manual triage performed by engineers in terms of:
  - accuracy
  - time to diagnosis
  - agreement on root cause?
- Which methods are most suitable for detecting flaky or intermittent tests using historical execution data?
- What are the main risks of introducing AI into CI pipelines, including:
  - hallucinations
  - misclassification
  - confidentiality concerns
  - overreliance on generated explanations?
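To make the question about flaky tests concrete, one simple baseline is a flip-rate heuristic over historical pass/fail outcomes. The sketch below is only one possible starting point, not a proposed solution: the function names, the sample history, and the threshold of 0.3 are all assumptions chosen for illustration.

```python
from typing import Dict, List


def flip_rate(outcomes: List[bool]) -> float:
    """Fraction of consecutive runs whose pass/fail outcome differs."""
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return flips / (len(outcomes) - 1)


def likely_flaky(history: Dict[str, List[bool]], threshold: float = 0.3) -> List[str]:
    """Names of tests whose flip rate meets the (assumed) threshold, sorted."""
    return sorted(name for name, runs in history.items() if flip_rate(runs) >= threshold)


# Hypothetical nightly history: True = pass, False = fail, oldest run first.
history = {
    "test_boot": [True, True, True, True, True],
    "test_network": [True, False, True, True, False],
    "test_regression": [True, True, True, False, False],
}
print(likely_flaky(history))  # ['test_network']
```

In this sample, `test_regression` fails consistently after a single change and is therefore not flagged, whereas `test_network` alternates between passing and failing. The heuristic ignores whether the code under test changed between runs, which is one of the limitations a thesis could examine.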
We are looking for one or two students with interest in software quality, AI, and developer tooling.
Relevant background includes:
- Software Engineering
- Computer Science
- Embedded Systems
- DevOps
- AI / Machine Learning
Experience or interest in the following areas is considered valuable:
- Python development
- ML/LLM application development
- Log parsing and data processing
- CI/CD systems and automated testing
- C/C++ or similar programming languages
- Basic understanding of static analysis
- Classification metrics and evaluation methods
- Automotive embedded systems (meritorious)










