Visual mental imagery, or “seeing with the mind’s eye,” plays
an important role in many cognitive processes, such as memory, spatial
navigation, and reasoning. Text-to-image generation is a technique for creating
images that correspond to textual descriptions. It
supports a wide range of applications and research fields (e.g., photo editing,
photo search, art creation, computer-aided design, image reconstruction,
captioning, and portrait drawing). With the development of text-to-image
generation models, artificial intelligence (AI) has reached a turning point
where machines can now convert human language into aesthetically
pleasing and coherent images, creating new opportunities for creativity and
innovation. The introduction of Stable Diffusion models is one of this field's most
noteworthy developments. These models provide a strong framework for producing
realistic images that are semantically aligned with the given textual
descriptions. Despite their remarkable abilities, however, conventional
text-to-image models frequently have serious shortcomings, especially in
training time and computing cost: they usually need large amounts of
processing power and long training runs, which makes them costly
and time-consuming to train. The main goal of this work is to
develop an improved Stable Diffusion model that overcomes these shortcomings and
produces high-quality images from text. The proposed model is intended to
substantially reduce training time and processing requirements without
sacrificing the quality of the output images. The results show that fine-tuning
the Stable Diffusion model considerably improves its ability to produce
images that are closer to the original images. The fine-tuned model
achieved a lower Fréchet Inception Distance (FID; lower is better) of 212.52,
compared with 251.22 for the base model. The fine-tuning process gradually
improves the model's overall ability to produce better and more diverse
digital imagery.
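
To make the FID comparison concrete, the following is a minimal Python sketch and not the authors' evaluation code. FID is normally computed from the means and full covariance matrices of Inception-v3 features for real and generated images. The sketch assumes diagonal covariances so that it needs only the standard library, and the function names frechet_distance_diag and relative_reduction are hypothetical helpers introduced for illustration.

```python
import math


def frechet_distance_diag(mu_a, var_a, mu_b, var_b):
    """Frechet distance between two Gaussians with DIAGONAL covariances.

    Simplifying assumption: real FID uses full covariance matrices of
    Inception-v3 features; with diagonal covariances the trace term
    reduces to a per-dimension sum.
    """
    # Squared Euclidean distance between the two mean vectors.
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    # Trace term: var_a + var_b - 2 * sqrt(var_a * var_b), per dimension.
    cov_term = sum(
        va + vb - 2.0 * math.sqrt(va * vb) for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term


def relative_reduction(base, improved):
    """Fractional reduction of a lower-is-better score relative to base."""
    return (base - improved) / base


if __name__ == "__main__":
    # Identical feature statistics give a distance of zero.
    print(frechet_distance_diag([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0]))
    # FID scores reported above: base model 251.22, fine-tuned model 212.52.
    print(f"{relative_reduction(251.22, 212.52):.1%}")  # prints 15.4%
```

Under these assumptions, the reported scores correspond to a reduction of roughly 15.4% in FID relative to the base model.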
Author(s) Details
Sara Faez Abdulghani
Department of Radiological Techniques, College of Health and Medical Technology,
Al-Zahraa University for Women, Karbala, Iraq.
Ashwan Anwer Abdulmunem
Department of Computer Science, College of Computer Science and Information
Technology, Karbala University, Karbala, Iraq.
Please see the book here: https://doi.org/10.9734/bpi/mcsru/v5/4707