After a summer streak of releasing powerful open-source reasoning and coding models that match or, in some cases, beat closed-source American rivals, Alibaba's Qwen Team of AI researchers is back today with the release of a new AI image generator, Qwen-Image, which is also open source.
Qwen-Image stands out in a crowded field of generative image models because of its focus on rendering text accurately within generated images, an area where many competitors still struggle.
With support for both alphabetic and logographic scripts, the model is particularly well-suited to handling complex typography, multi-line layouts, paragraph-level semantics, and bilingual content (for example, English-Chinese).
In practice, this lets users create content such as movie posters, presentation slides, storefront scenes, handwritten poetry, and diagrams, with crisp text that matches their prompts.
Example Qwen-Image outputs span a wide range of real-world use cases:
- Marketing and branding: Bilingual posters with brand slogans, stylized lettering, and consistent design layouts
- Presentation design: Layout-aware slides with hierarchical titles and topic-appropriate visuals
- Education: Classroom materials featuring diagrams and accurately rendered instructional text
- Retail and e-commerce: Storefront scenes where product names, banners, and environmental context need to be legible
- Creative content: Handwritten poetry, scene narratives, and anime-style illustrations with embedded story text
Users can try the model on the Qwen Chat website by selecting the "Image Generation" mode from the buttons below the input field.

However, my brief initial tests showed that its text rendering and prompt adherence were not noticeably better than Midjourney, the popular AI image generator from the U.S. company of the same name. My session through Qwen Chat produced multiple errors in prompt understanding and text fidelity, dashing my hopes even after repeated attempts and prompt rewording:


However, Midjourney offers only a limited number of free generations and requires a paid subscription for more, whereas Qwen-Image, thanks to its open-source license and weights published on Hugging Face, can be adopted by any enterprise or third-party provider.
Licensing and availability
Qwen-Image is distributed under the Apache 2.0 license, which allows commercial and non-commercial use, redistribution, and modification, although attribution and inclusion of the license text are required for derivative works.
This may make it attractive to enterprises looking for an open-source image generation tool to produce internal or external collateral such as flyers, ads, notices, newsletters, and other digital communications.
But the fact that the model's training data remains undisclosed, as with most leading AI image generators, may sour some enterprises on the idea of using it.
Unlike Adobe Firefly or OpenAI's GPT-4o native image generation, for example, Qwen does not offer indemnification for commercial uses of its outputs (that is, if a user is sued for copyright infringement, Adobe and OpenAI will help defend them in court).
The model and associated assets, including demo notebooks, evaluation tools, and supporting scripts, are available through multiple repositories.
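For teams that want to work with the open weights directly rather than through Qwen Chat, a minimal loading sketch is below. It assumes the checkpoint is published under the Hugging Face model id Qwen/Qwen-Image and is compatible with the standard diffusers DiffusionPipeline interface; the official repositories document the exact, supported usage.

```python
# Minimal sketch: generating an image with the open Qwen-Image weights.
# Assumes the "Qwen/Qwen-Image" model id and diffusers compatibility; see the
# official repositories for the supported loading code and recommended settings.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",           # assumed Hugging Face model id
    torch_dtype=torch.bfloat16,  # reduced precision to fit on a single GPU
).to("cuda")

prompt = (
    'A storefront window at dusk with a neon sign that reads "Grand Opening, '
    '50% Off" and a bilingual English-Chinese banner below it.'
)
image = pipe(prompt=prompt, num_inference_steps=50).images[0]
image.save("qwen_image_demo.png")
```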
In addition, a live evaluation portal called AI Arena lets users compare image generations in pairwise rounds, with results feeding an Elo-style leaderboard.
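AI Arena's exact rating parameters are not spelled out in the release, but pairwise leaderboards of this kind typically use a standard Elo update along these lines (the K-factor and starting rating below are conventional defaults, not AI Arena's published values):

```python
# Illustrative Elo update for one pairwise image comparison. The K-factor,
# starting rating, and tie handling are conventional defaults, not AI Arena's
# documented configuration.
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if model A's image wins, 0.0 if it loses, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Two models start at 1500; model A wins a head-to-head round.
print(elo_update(1500.0, 1500.0, 1.0))  # -> (1516.0, 1484.0)
```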
Training and Development
Behind Qwen-Image's performance is a wide-ranging training process built on progressive learning, multimodal task alignment, and aggressive data curation, according to the technical paper the research team released today.
The training corpus includes billions of image-text pairs drawn from four domains: natural images, human portraits, artistic and design content (such as posters and user interface layouts), and synthetic text-focused data. The Qwen team did not specify the exact size of the training dataset beyond "billions of image-text pairs," but it did provide a rough percentage breakdown of each content category:
- Nature: ~55%
- Design (user interfaces, posters, art): ~27%
- People (portraits, human activity): ~13%
- Synthetic text rendering data: ~5%
Notably, Qwen states that all synthetic data was generated in-house and that no images created by other AI models were used. But despite the detailed pipeline stages and filtering it describes, the documentation does not clarify whether any of the data was licensed or drawn from public or proprietary datasets.
Unlike many generative models that exclude synthetic text because of the risk of noise, Qwen-Image uses tightly controlled synthetic rendering pipelines to improve character coverage, especially for low-frequency characters.
A curriculum-style strategy is used: the model starts with simple captioned images and non-text content, then progresses to text-sensitive scenarios, mixed-language rendering, and dense paragraphs. This gradual exposure helps the model generalize across scripts and formatting types.
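The paper describes this staging only at a high level. As a rough sketch of what such a curriculum schedule could look like in code, the stage lengths and data-mix weights below are purely illustrative, not Qwen's actual values:

```python
# Illustrative curriculum schedule: begin with simple, text-free data and
# progressively mix in harder text-rendering sources. All stage lengths and
# sampling weights are invented for illustration.
CURRICULUM = [
    # (training steps in stage, {data source: sampling weight})
    (100_000, {"captioned_images": 1.0}),
    (100_000, {"captioned_images": 0.7, "simple_text": 0.3}),
    (150_000, {"captioned_images": 0.5, "simple_text": 0.3, "mixed_language": 0.2}),
    (150_000, {"captioned_images": 0.4, "simple_text": 0.2,
               "mixed_language": 0.2, "dense_paragraphs": 0.2}),
]

def mix_for_step(step: int) -> dict:
    """Return the sampling weights in effect at a given global training step."""
    boundary = 0
    for num_steps, weights in CURRICULUM:
        boundary += num_steps
        if step < boundary:
            return weights
    return CURRICULUM[-1][1]  # stay on the final, hardest mix

print(mix_for_step(250_000))  # falls in the third stage
```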
Qwen-Image combines three core components:
- Qwen2.5-VL: The multimodal language model extracts contextual meaning and guides generation through system prompts.
- VAE encoder/decoder: Trained on high-resolution documents and real-world layouts, it handles detailed visual representations, especially small or dense text.
- MMDiT: The diffusion-model backbone coordinates joint learning across image and text modalities. A new MSRoPE scheme (a multimodal positional encoding) improves spatial alignment between tokens.
Together, these components allow Qwen-Image to operate effectively across tasks involving image understanding, generation, and precise editing.
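To make the division of labor concrete, here is a highly simplified, hypothetical sketch of how the three components could fit together in a generation loop; every class and method name is a placeholder for illustration, not the actual Qwen-Image API:

```python
# Conceptual sketch of Qwen-Image's three-part design. All names are
# placeholders meant to show the data flow, not real library calls.
class QwenImageSketch:
    def __init__(self, text_encoder, vae, mmdit, scheduler):
        self.text_encoder = text_encoder  # Qwen2.5-VL: turns the prompt into conditioning tokens
        self.vae = vae                    # VAE: maps between pixels and a compact latent space
        self.mmdit = mmdit                # MMDiT: denoises latents conditioned on text tokens
        self.scheduler = scheduler        # diffusion noise schedule

    def generate(self, prompt, steps=50):
        cond = self.text_encoder.encode(prompt)        # contextual guidance from the prompt
        latents = self.scheduler.initial_noise()       # start from random noise in latent space
        for t in self.scheduler.timesteps(steps):
            noise_pred = self.mmdit(latents, cond, t)  # joint image-text attention (MSRoPE positions)
            latents = self.scheduler.step(noise_pred, latents, t)
        return self.vae.decode(latents)                # decode the final latents back to pixels
```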
Performance benchmarks
Qwen-Image was evaluated against several public benchmarks:
- GenEval and DPG for prompt alignment
- OneIG-Bench and TIIF for compositional reasoning and layout fidelity
- CVTG-2K, ChineseWord, and LongText-Bench for text rendering, especially in multilingual contexts
In nearly every case, Qwen-Image matches or exceeds closed-source models such as GPT Image 1 (High), Seedream 3.0, and FLUX.1 Kontext (Pro). Notably, its Chinese text rendering performance was far better than that of all the systems it was compared against.
On AI Arena, a public leaderboard based on more than 10,000 human pairwise comparisons, Qwen-Image ranks among the top models overall and is the best-performing open-source model.
Implications for enterprise technical decision-makers
For enterprise AI teams managing complex media workflows, Qwen-Image offers several functional advantages that align with the operational needs of different roles.
Those who manage the lifecycle of vision-language models, from training through deployment, will find value in Qwen-Image's output quality and its integration-ready components. Its open-source nature cuts licensing costs, while the modular architecture (Qwen2.5-VL + VAE + MMDiT) makes it easier to adapt to custom datasets or fine-tune for domain-specific outputs.
The documented training data mix, curriculum-style training stages, and clear benchmark results help teams evaluate fitness for purpose. Whether producing marketing imagery, document renderings, or e-commerce product graphics, Qwen-Image allows rapid experimentation without royalty restrictions.
Engineers building AI pipelines or serving models across distributed systems will appreciate the deployment details. The model was trained with a producer-consumer architecture, supports multi-resolution processing (256p to 1328p), and was designed to run with Megatron-LM and tensor parallelism. That makes Qwen-Image a candidate for deployment in hybrid cloud environments where reliability and throughput matter.
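The report gives the multi-resolution support as a range (256p to 1328p) rather than a concrete API. As one hypothetical way an integration layer might snap user requests onto supported sizes, the bucket list and rounding rule below are assumptions, not a published specification:

```python
# Hypothetical resolution-bucketing helper. The 256p-1328p range comes from the
# report; the specific buckets and the multiple-of-16 rounding are assumptions.
SUPPORTED_SHORT_SIDES = [256, 512, 768, 1024, 1328]

def pick_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap a requested size to the nearest supported short side, keeping aspect ratio."""
    short, long = sorted((width, height))
    target_short = min(SUPPORTED_SHORT_SIDES, key=lambda s: abs(s - short))
    scale = target_short / short
    target_long = int(round(long * scale / 16) * 16)  # keep dimensions a multiple of 16
    if width <= height:
        return target_short, target_long
    return target_long, target_short

print(pick_resolution(600, 900))  # -> (512, 768)
```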
In addition, support for text-image-to-image (TI2I) editing workflows and task-specific prompting enables use in real-time or interactive applications.
Professionals focused on data ingestion, validation, and transformation can use Qwen-Image to create synthetic datasets for training or augmenting computer vision models. Its ability to generate high-resolution images with multilingual text annotations can improve models for text recognition, object detection, or layout analysis.
And because Qwen-Image was also trained to avoid artifacts such as QR codes, distorted text, and watermarks, it provides higher-quality synthetic inputs than many general-purpose models, helping enterprise teams maintain the integrity of their training sets.
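As a rough illustration of that synthetic-data use case, a batch-generation loop could pair each generated image with its ground-truth text for downstream text-recognition or layout training. The prompts, file names, and JSONL annotation format below are arbitrary choices, and the model id and diffusers compatibility are the same assumptions as in the earlier loading sketch:

```python
# Illustrative synthetic-data loop for augmenting text-recognition training sets.
# Assumes the "Qwen/Qwen-Image" model id and diffusers compatibility; the prompt
# templates and JSONL annotation format are arbitrary illustrative choices.
import json
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

PROMPTS = [
    ('A shop banner that reads "{text}" in bold sans-serif lettering', "Summer Sale"),
    ('A handwritten note on lined paper saying "{text}"', "Meeting at 3 PM"),
    ('A bilingual street sign showing "{text}" in English and Chinese', "No Parking"),
]

with open("synthetic_annotations.jsonl", "w", encoding="utf-8") as f:
    for i, (template, text) in enumerate(PROMPTS):
        prompt = template.format(text=text)
        image = pipe(prompt=prompt, num_inference_steps=40).images[0]
        path = f"synthetic_{i:04d}.png"
        image.save(path)
        # Record the ground-truth text so downstream training can use it as a label.
        f.write(json.dumps({"image": path, "text": text, "prompt": prompt}) + "\n")
```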
Seeking feedback and collaboration
The Qwen team emphasizes openness and community collaboration in the model's release.
Developers are encouraged to test and fine-tune Qwen-Image, submit pull requests, and contribute to the evaluation leaderboard. Feedback on text rendering, editing fidelity, and multilingual use cases will inform future iterations.
With a stated goal of "lowering the technical barriers to visual content creation," the team hopes Qwen-Image will serve not just as a model, but as a foundation for further research and practical deployment across industries.