TL;DR
Portugal’s €5.5 million European Portuguese language model, AMÁLIA, is operational and outperforms earlier open models on European Portuguese benchmarks. However, critical questions about its openness, data use, and objectives remain unanswered, and the same questions apply to other European sovereign LLMs.
Portugal’s €5.5 million AMÁLIA language model is now operational, marking a significant milestone in the country’s AI development efforts. While the model outperforms previous open models on Portuguese benchmarks, key questions about its openness, data sources, and strategic goals remain unanswered, raising broader concerns for European national LLM initiatives.
AMÁLIA, a consortium project involving around 60 researchers from leading Portuguese institutions, was officially launched in December 2024. The base model, built by continuing pre-training from the EuroLLM multilingual foundation model, was completed on September 30, 2025, and is now accessible through the FCT’s IAedu platform to 450,000 academic users. It handles text input, with multimodal capabilities expected in future versions.
Technically, AMÁLIA was trained via extended pre-training on 107 billion tokens, with approximately 5.8 billion tokens from Portugal’s national web archive, Arquivo.pt. The model outperforms previous open models on European Portuguese benchmarks and surpasses Qwen 3-8B on most Portuguese-specific tests, though it still trails Qwen on some tasks like ALBA, the primary benchmark.
However, despite these technical achievements, questions persist about how open the model truly is, how much native-language data is enough, and what strategic goals the project is prioritizing. These questions are central to evaluating the project’s broader impact and future development.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument. Each question applies concretely to AMÁLIA, and each generalizes to the other European sovereign-LLM projects.
The three questions form a structural feedback loop. Q3 (optimization target) determines Q2 (data volume needed), which in turn conditions Q1 (openness sufficient for community contribution). The European sovereign-LLM movement collectively benefits when answering these questions becomes standard methodology disclosure rather than exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is the most surprising. For a model whose entire stated purpose is to prioritize European Portuguese, the native-language share of extended pre-training is about 5.4% (roughly 5.8 billion of 107 billion tokens). The implications cascade into every other question.
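The share follows directly from the two token counts quoted in this article. Below is a minimal Python sketch of that arithmetic. The constant and function names are illustrative assumptions, and both counts are approximate, so the result is only meaningful to about one decimal place.

```python
# Token counts as quoted in the article (both are approximate figures).
TOTAL_TOKENS = 107e9   # extended pre-training corpus
PT_PT_TOKENS = 5.8e9   # tokens from Arquivo.pt, i.e. clearly European Portuguese


def native_share(native_tokens: float, total_tokens: float) -> float:
    """Return the native-language share of a corpus as a percentage."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * native_tokens / total_tokens


share_pct = native_share(PT_PT_TOKENS, TOTAL_TOKENS)
print(f"pt-PT share of extended pre-training: {share_pct:.1f}%")
```

The same function can be applied to any corpus breakdown a project publishes, which makes native-language share easy to compare across national models.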

The Olmo standard. AMÁLIA’s current state.
Allen Institute for AI’s Olmo project defines what “fully open” operationally requires. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.

Four strategic positions. AMÁLIA between two and three.
More than €100M in European sovereign-LLM funding has been publicly disclosed across the major initiatives. The structural question every project faces: what competitive position is it actually staking? There are four options, not mutually exclusive, but each requires different commitments.

Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign-Language Models
The case of AMÁLIA exemplifies the broader challenges faced by European countries developing national LLMs. The questions about openness, data sufficiency, and strategic goals are not only technical but also political, affecting transparency, national sovereignty, and AI policy. How these questions are answered will influence the future landscape of European AI and the ability of these models to serve local languages and communities effectively.
European Sovereign LLM Efforts and Structural Challenges
Across Europe, several countries now host large language model efforts, including Italy’s Minerva, models from Germany’s Aleph Alpha and France’s Mistral, and the OpenEuroLLM consortium. Most efforts are at early stages, with models often trained from scratch or based on multilingual foundations. A common issue is the lack of clarity on how open these models are, how much native-language data is sufficient, and which strategic objectives guide their development. Portugal’s AMÁLIA stands out because it is publicly funded and has a clear national scope, which makes these questions particularly relevant.
Public discourse has largely focused on technical benchmarks, but experts like Duarte O.Carmo have begun raising critical questions about the structural assumptions underpinning these projects. The debate is still in early stages, with many uncertainties about how these models will evolve and be integrated into national AI strategies.
“The three questions—openness, native data sufficiency, and strategic focus—are fundamental to understanding the true impact and future of national LLM efforts.”
— Duarte O.Carmo
Unanswered Questions About Openness and Strategy
It remains unclear how open AMÁLIA truly is—specifically, the extent of its training data transparency and licensing. Additionally, the strategic goals guiding its development—whether prioritizing openness, performance, or sovereignty—are not fully articulated. The final version’s capabilities and how these questions will be addressed in future iterations are still uncertain.
Next Milestones and Ongoing Evaluation
The final version of AMÁLIA is scheduled for release in June 2026, which will likely clarify some of the current uncertainties. Over the next 12-24 months, researchers and policymakers will scrutinize the model’s openness, data transparency, and strategic alignment. Broader European efforts will also be influenced by these developments, shaping the future of sovereign-language models across the continent.
Key Questions
What makes AMÁLIA different from other European LLMs?
AMÁLIA is a publicly funded Portuguese language model built as a continuation of a multilingual foundation, with a focus on serving Portugal’s national interests and academic community. What sets it apart is its explicit European Portuguese focus, its public funding, and the public accountability that comes with that funding.
Why are questions about openness important for AMÁLIA?
Openness affects transparency, reproducibility, and trust. Knowing how the model was trained, what data was used, and under which licensing terms it is released is crucial for assessing its fairness, bias, and suitability for public deployment.
What are the main challenges facing European sovereign LLMs?
The key challenges include defining clear openness standards, determining the necessary amount and type of native-language data, and establishing strategic goals that balance performance, sovereignty, and accessibility.
When will we know more about AMÁLIA’s final capabilities?
The final version is expected in June 2026, which will provide more clarity on its features, data transparency, and strategic positioning.
Source: ThorstenMeyerAI.com