Literature Review
R. Leupers et al. [1] examined the
importance of software in digital signal processing (DSP) applications,
highlighting the need for automated tools to support DSP-based
software development. They reviewed techniques for high-level
block-diagram-based modeling of DSP applications, for translating
block-diagram specifications into efficient C programs through global
target-independent optimization methods, and for compiling C programs into
optimized machine code for programmable DSP processors. Ansong Ni et al. [2] introduced LEVER, a straightforward
approach to enhance language-to-code generation by training verifiers to
assess program correctness based on natural language input, the program
itself, and execution results. The sampled programs were ranked by
integrating the verification score with the LLM generation probability.
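The ranking step can be illustrated with a minimal sketch. It assumes that the combined score is the product of the LLM generation probability and the verifier probability, and that scores of candidates sharing the same execution result are summed; the `Candidate` type, the `rerank` function, and all numbers are hypothetical and are not taken from [2].

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    program: str          # sampled program text
    lm_logprob: float     # log-probability assigned by the code LLM
    verifier_prob: float  # verifier's estimate that the program is correct
    exec_result: str      # printed result of executing the program


def rerank(candidates):
    """Order candidates by the aggregated (LM probability x verifier probability) score."""
    totals = defaultdict(float)
    for c in candidates:
        # Assumption: programs with identical execution results pool their scores.
        totals[c.exec_result] += math.exp(c.lm_logprob) * c.verifier_prob
    return sorted(candidates, key=lambda c: totals[c.exec_result], reverse=True)


# Hypothetical samples for the prompt "average of xs".
candidates = [
    Candidate("sum(xs) / len(xs)", math.log(0.30), 0.90, "2.5"),
    Candidate("statistics.mean(xs)", math.log(0.20), 0.85, "2.5"),
    Candidate("sum(xs) // len(xs)", math.log(0.40), 0.20, "2"),
]
ranked = rerank(candidates)
best = ranked[0]
```

In this toy data the third sample has the highest generation probability on its own, but the low verifier score and the pooled support for the result `2.5` move it to the bottom of the ranking.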
LEVER improved on the base code LLM by 4.6% to 10.9% with
code-davinci-002 across four datasets spanning table QA,
math QA, and basic Python programming. Chanchai
Supaartagorn et al. [3] introduced an automatic code generator tool
based on structured flowcharts: basic shapes are combined to form
structured flowcharts that can be converted into source code.
The tool’s performance was evaluated with two groups: 5 experts and 93
general users. Both groups reported high satisfaction, with mean scores
of 4.48 (standard deviation 0.59) for experts and 4.27 (standard
deviation 0.64) for general users, indicating acceptable
performance. Tomasz Szydło et al.
[4] noted that programming libraries for machine learning often demand
excessive resources, making them impractical to deploy on embedded
processors. They introduced the concept of source
code generation for machine learning models, along with algorithms for
generating commonly used machine learning methods. The concept
was validated through several use cases.
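As an illustration of the general idea only, the following sketch turns a small, already-trained decision tree into dependency-free C source. The nested-dictionary tree format and the `emit_tree` and `generate_c` helpers are hypothetical; this is not the generator described in [4].

```python
def emit_tree(node, indent=1):
    """Recursively emit nested C if/else statements for one tree node."""
    pad = "    " * indent
    if "label" in node:  # leaf: return the predicted class
        return f"{pad}return {node['label']};\n"
    threshold = float(node["threshold"])  # ensures a valid C float literal such as 2.5f
    return (
        f"{pad}if (x[{node['feature']}] <= {threshold}f) {{\n"
        + emit_tree(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + emit_tree(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def generate_c(tree, name="predict"):
    """Wrap the emitted branches in a C function with no library dependencies."""
    return f"int {name}(const float *x) {{\n" + emit_tree(tree) + "}\n"


# Hypothetical trained tree: split on feature 0, then on feature 1.
tree = {
    "feature": 0, "threshold": 2.5,
    "left": {"label": 0},
    "right": {
        "feature": 1, "threshold": 1.0,
        "left": {"label": 1},
        "right": {"label": 2},
    },
}
c_source = generate_c(tree)
print(c_source)
```

The generated function contains only comparisons and returns, so it needs no runtime library on the target processor, which is the motivation stated above.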
Batuhan Aşıroğlu et al. [5] observed that the web design
process starts with creating mock-ups for individual web pages, either
manually or using graphic design tools. These mock-ups are then
converted into structured HTML or similar code by software engineers,
undergoing refinements until the desired template is achieved. Their
research aimed to automate the code generation process from hand-drawn
mock-ups, using computer vision techniques and incorporating select deep
learning methods. Their system demonstrated a method accuracy of 96%
and a validation accuracy of 73%. Samantha Ray et al. [6] observed that existing user-interface-driven solutions for creating
flowcharts challenge learners with intricate drag-and-drop
menus, while sketching-based alternatives lack support beyond initial
pseudocode generation. The researchers created Flow2Code to facilitate
the translation of hand-drawn flowcharts into code. Flow2Code recognizes
various flowchart shapes, and then converts the flowcharts into
executable code. It has an intuitive, interactive interface for users to
modify both their flowchart and resulting code. Humans have the ability
to comprehend technical documents effortlessly. Researchers have made
several attempts to impart such comprehension capabilities to AI-based
systems. N. G. Bourbakis et al. [7] exploited the interaction between two technical
document (TD) modalities, block diagrams and their associated natural language
text, to develop a system that automatically generates pseudocode defining the
functionality of the system under examination. The
methodology maps the TD modalities into Stochastic
Petri-nets (SPN) to enhance the system diagrams used for pseudocode
generation. With this method, the researchers aimed to achieve automatic
deep comprehension of technical documents. Aspects such as the use of
diagram images and the automated understanding of mathematical formulas
in technical documents remain relatively understudied. Gkorgkolis Nikolaos et al. [8] introduced a new formal
scheme for modeling digital diagram images, extending to a generative
framework for creating artificial images and annotations. They proposed
a method to convert the pseudocode generation problem into an image
captioning task, employing a range of techniques based on adaptive image
partitioning. They addressed the semantic understanding of mathematical
formulas by first conducting an evaluative survey and then
introducing a formal synthesis framework that uses formula
graphs as metadata to produce valuable formulas. This synthesis
framework was validated using a deep geometric learning mechanism that
uses formula data to simulate missing a priori knowledge.
Enrique Dehaerne et al. [9] analyzed 37 publications
sourced from the arXiv and IEEE Xplore databases, each describing a
project in which machine learning (ML) models were trained on programming language data to
produce code. They identified three main paradigms of code generation:
description-to-code, code-to-description, and code-to-code. These papers
primarily focused on ML applications such as generating code from
natural language descriptions, documentation generation, and automatic
program repair. Commonly used ML models in these projects
included recurrent neural networks, transformers, and convolutional
neural networks, along with various other neural network architectures
and non-neural techniques. Comparisons of model types, tokenizers, data
volume and quality, and evaluation methods for synthesized code were
also discussed in this review. Presently, researchers are focused on
generating code from requirement documents; however, existing methods
often struggle with requirements that demand intricate problem-solving
abilities. Zejie Liu et al. [10] introduced a novel method
for generating source code from flowcharts along with textual
descriptions. The researchers manually curated a benchmark dataset
comprising 320 flowcharts paired with their source code. Adapting
existing approaches to this new task is challenging because
flowcharts contain various kinds of elements and
multiple connections between nodes. To address these
challenges, the researchers proposed a two-stage code generation
model. In the first stage, a structure recognition algorithm
translates the flowchart into pseudocode. In the second stage, a code
generation model converts the pseudocode into executable code. To
ensure a comprehensive understanding of algorithms, it is necessary to
devise methods for generating corresponding text descriptions. Sagarika
Ghosh et al. [11] aligned algorithms
in various forms, such as pseudocode and hand-drawn flowcharts, with
their textual explanations. The researchers proposed rules for generating
pseudocode from hand-drawn flowcharts and a transfer learning method
based on S-DistilBERT to find the similarity score between different
forms of algorithms and their text descriptions. Block and line
identification, along with optical character recognition (OCR), was used to generate pseudocode from
hand-drawn flowcharts. Experimental results indicate an 85% success
rate in generating equivalent pseudocode. Their fine-tuned S-DistilBERT
model achieved accuracies of 75.59% for matching existing pseudocode
and 74.57% for generated pseudocode with their corresponding textual
descriptions. The rules devised by the researchers were found to be
suitable only for non-recursive flowcharts. Xiang-Hu Wu et al. [12] proposed a structure
identification algorithm for structured flowcharts and an automatic code
generation algorithm, both verified for correctness using enumeration
iteration. Finally, they developed an integrated development platform
based on these algorithms that incorporates flowchart modeling,
automatic code generation, and support for
CDT/GCC/GDB. The effectiveness of the
proposed algorithms was evaluated through practical implementation.
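Several of the reviewed systems generate code from structured flowcharts. The sketch below illustrates that generation step under a simplifying assumption: structure identification has already been performed, so the flowchart is available as a nested list of process, decision, and loop nodes. The node format and the `generate` function are hypothetical and do not reproduce the algorithms of [3], [10], or [12].

```python
def generate(nodes, depth=0):
    """Emit Python source lines for a list of structured flowchart nodes."""
    pad = "    " * depth
    lines = []
    for node in nodes:
        kind = node["kind"]
        if kind == "process":        # rectangle: a single statement
            lines.append(pad + node["text"])
        elif kind == "decision":     # diamond: two-way selection
            lines.append(f"{pad}if {node['cond']}:")
            lines += generate(node["yes"], depth + 1)
            if node.get("no"):
                lines.append(f"{pad}else:")
                lines += generate(node["no"], depth + 1)
        elif kind == "loop":         # diamond with a back edge: pre-test loop
            lines.append(f"{pad}while {node['cond']}:")
            lines += generate(node["body"], depth + 1)
        else:
            raise ValueError(f"unknown node kind: {kind}")
    return lines or [pad + "pass"]   # an empty branch still needs a statement


# Hypothetical flowchart: sum the even numbers below n.
flowchart = [
    {"kind": "process", "text": "total = 0"},
    {"kind": "process", "text": "i = 0"},
    {"kind": "loop", "cond": "i < n", "body": [
        {"kind": "decision", "cond": "i % 2 == 0",
         "yes": [{"kind": "process", "text": "total += i"}], "no": []},
        {"kind": "process", "text": "i += 1"},
    ]},
]
source = "\n".join(generate(flowchart))
print(source)
```

Because each structured construct has a single entry and a single exit, the generator can recurse on nested blocks without tracking arbitrary jumps, which is why these approaches restrict themselves to structured flowcharts.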