In a technological environment that evolves by the day and by the hour, we provide the technology best suited to our customers' needs.
Here we introduce the cutting edge of our R&D, theme by theme, to the extent that we can disclose it at present.
Designing AI: Deep Learning
Here we introduce an example of neural network and machine learning flow design. The following two diagrams show an example of using deep learning to build a neural network that detects the center line of roads. The left diagram is the configuration diagram of the designed neural network. It has the shape of a CNN (convolutional neural network), commonly used for image recognition, configured with three convolution layers and two fully connected layers. In a typical CNN, there is a pooling layer between the convolution layers, but here the design omits the pooling layers in order to retain location information as much as possible. The diagram on the right shows the results of passing actual road images through the neural network to detect the center line of the road.
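As a rough illustration of this kind of configuration work, the following sketch computes how the feature map shrinks through three pooling-free convolution layers before the two fully connected layers. The input size, channel counts, and kernel size are hypothetical placeholders, not the actual production network.

```python
# Dimensioning sketch: 3 convolution layers + 2 fully connected layers,
# with no pooling layers so that spatial (location) information is kept.

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of one convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

h = w = 64                  # hypothetical input image size
channels = [1, 8, 16, 32]   # hypothetical channel counts per conv layer
kernel = 3

for c_in, c_out in zip(channels, channels[1:]):
    h, w = conv_out(h, kernel), conv_out(w, kernel)
    print(f"conv: {c_in}->{c_out} ch, feature map {h}x{w}")

# Without pooling, the map shrinks only by the kernel "border"
# (2 px per 3x3 conv), so location information is largely preserved.
flat = channels[-1] * h * w
fc_sizes = [flat, 128, 2]   # 2 FC layers; hypothetical (x, y) line output
print(f"flatten: {flat} -> fc1: {fc_sizes[1]} -> fc2: {fc_sizes[2]}")
```

With pooling layers inserted instead, the map would halve at each stage and the fine location of the line would be harder to recover.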
The most difficult of the four realistic hurdles mentioned at the beginning is adding large volumes of correct information. Deep learning belongs to the machine learning category known as "supervised learning," so it requires training data in order to learn. For example, say that you wish to make an image recognition program that uses deep learning to identify the location of cardboard boxes in pictures taken inside a factory. In that case, human staff would prepare vast numbers of pictures containing cardboard boxes and add markers showing the location of the boxes in those images. These markers are the correct information. The set of images and correct information together forms the "training data." This correct information is also known as "ground truth."
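One item of such training data might be organized as follows; the field names and file path below are hypothetical, purely to make the image-plus-markers structure concrete.

```python
# Minimal sketch of one training sample for the cardboard-box example:
# an image plus the human-drawn markers (ground truth).
from dataclasses import dataclass, field

@dataclass
class GroundTruthBox:
    x: int   # marker rectangle drawn by a human annotator
    y: int
    w: int
    h: int

@dataclass
class TrainingSample:
    image_path: str
    boxes: list = field(default_factory=list)  # the "correct information"

sample = TrainingSample("factory/img_000001.png",
                        [GroundTruthBox(120, 80, 60, 45),
                         GroundTruthBox(300, 210, 58, 50)])
print(len(sample.boxes), "labeled boxes in", sample.image_path)
```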
Generally, it is said that a minimum of 100,000 images with correct information added is required to secure accuracy adequate for practical use. Depending on the application, tens, hundreds, or thousands of times as much data may be necessary. This is an extraordinary amount, but unfortunately, the work must be done manually. Securing the human resources for this is a major hurdle to realizing deep learning. For this reason, we have formed a correct-information production team at our Vietnamese subsidiary Sanei Hytechs Vietnam Co., Ltd., and we have already built a track record of handling high-volume orders from major corporate customers.
Implementing AI: Edge Computing
What is needed in this case is to realize part of the AI algorithm on a microcontroller or FPGA. This involves connecting the microcontroller or FPGA to a high-performance machine, processing data in real time on the microcontroller or FPGA, and processing data with a tolerable delay on the high-performance machine. This concept of performing processing that was previously done only on high-performance machines on microcontrollers and FPGAs at the edges of the overall network system is gaining popularity, and Sanei Hytechs has already refined the necessary implementation technology. In order to implement AI algorithms on edge devices, technology is needed to make algorithms light enough to run on microcontrollers or FPGAs and to appropriately divide the processing between software and digital hardware. Of course, using the devices correctly is also required as basic technology.
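The split described above can be modeled conceptually as follows. This is a pure-Python sketch with hypothetical stand-in workloads, not actual microcontroller or FPGA code: a lightweight "edge" stage handles every frame in real time, while heavier "host" processing runs on a sampled subset where some delay is tolerable.

```python
# Conceptual model: real-time edge stage + delay-tolerant host stage.
import queue
import threading

host_queue = queue.Queue()
edge_results, host_results = [], []

def edge_process(frame):
    edge_results.append(frame * 2)   # lightweight per-frame work (edge)
    if frame % 4 == 0:               # forward only some frames to the host
        host_queue.put(frame)

def host_worker():
    while True:
        frame = host_queue.get()
        if frame is None:            # sentinel: no more frames
            break
        host_results.append(frame ** 2)  # heavier, delay-tolerant work

t = threading.Thread(target=host_worker)
t.start()
for f in range(8):                   # 8 incoming "frames"
    edge_process(f)
host_queue.put(None)
t.join()
print(len(edge_results), "frames at edge,", len(host_results), "at host")
```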
Based on our estimated figures, we examine how far resources must be reduced in order to fit the design on the target device. The bit width used for arithmetic is particularly important. When digital circuits are parallelized, arithmetic is normally done in fixed point rather than floating point, so the neural network arithmetic must also be converted to fixed point. We also check the error that arises in the neural network's output after converting to fixed-point arithmetic. For example, in the case of the neural network that detects the center line in images of roads, we input road images into the fixed-point version of the network and have it output the location of the center line. The error is the difference between these values and the floating-point results from a high-performance machine. We determine the final bit width from the trade-off between this error and the resources each bit width requires. However, a bit width that satisfies the specified accuracy sometimes requires more resources than will fit on the target device. In this case, we return to the AI design phase, revise the neural network configuration, and conduct deep learning again.
Once we have an approximation of resources and processing time, we can start concrete implementation. Using high level synthesis tools like SDSoC® from Xilinx, Inc., we produce digital circuits. If the size or latency of these digital circuits is larger than expected, then we design RTL by hand.
The concrete implementation flow does not differ significantly from typical systems, but in order to understand the trade-off relationship and make proper judgments, the staff in charge of implementation must have a solid understanding of AI. The decision of whether to return to the AI design phase when we cannot reach the target performance is largely entrusted to the staff in charge of implementation.
Improving Design Efficiency: Advanced Automation
For example, we are developing a program that automatically analyzes circuit diagram topology with a search algorithm based on graph theory to categorize the transistors (MOSFETs) on circuit diagrams as "digital elements" or "analog elements," and a program that determines circuit types (operational amplifier, etc.). By recognizing patterns in the same way a human does, these programs can make advanced judgments that are difficult to reach by simply comparing character strings and numbers. We feed the information gained from them into automatic verification and automatic design programs that we are developing separately. Automation technology enables human engineers to focus on more difficult, more interesting design work.
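To give a toy flavor of topology-based recognition, the sketch below finds CMOS inverters in a netlist by matching a connection pattern rather than comparing names. The netlist format and device records are hypothetical, and a real tool would handle far more patterns and corner cases.

```python
# Toy topology recognition: an inverter is a PMOS/NMOS pair sharing
# both the gate net and the drain net, tied to VDD and GND respectively.

# Each MOSFET record: (name, type, gate_net, drain_net, source_net)
netlist = [
    ("M1", "PMOS", "in",   "out", "VDD"),
    ("M2", "NMOS", "in",   "out", "GND"),
    ("M3", "NMOS", "bias", "n1",  "GND"),   # e.g. part of a current mirror
]

def find_inverters(devices):
    """Pair every PMOS/NMOS that shares both gate and drain nets."""
    pmos = [d for d in devices if d[1] == "PMOS" and d[4] == "VDD"]
    nmos = [d for d in devices if d[1] == "NMOS" and d[4] == "GND"]
    pairs = []
    for p in pmos:
        for n in nmos:
            if p[2] == n[2] and p[3] == n[3]:  # same gate and drain nets
                pairs.append((p[0], n[0]))
    return pairs

print(find_inverters(netlist))  # [('M1', 'M2')]
```

M3 is left unmatched, so a classifier built on such patterns could mark M1/M2 as digital elements and flag M3 for analog-pattern checks.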
These programs sometimes spend an enormous amount of time on graph-search processing. For the arithmetic in this processing that has a high degree of independence, we apply parallelization with parallel computing APIs (application programming interfaces) such as OpenMP, greatly reducing processing time. In addition, we are working to automate analog layout design, which is commonly considered extremely difficult, and have realized parallelization with OpenMP for its graphic-operation processing. We flexibly procure this parallelization software technology and the IT infrastructure for parallel-processing workstations (servers) from our internal Software Department and IT Service Department, helping to accelerate design automation.
Improving Analog Performance: From the Bottom and the Top
Even if the circuit configuration (circuit topology) is already determined, the properties of semiconductor elements vary greatly depending on the type of semiconductor process (and fab) used, such as CMOS 0.25 μm, 0.18 μm, 90 nm, or 40 nm, often making it necessary to change the design plan. It takes a great deal of time to simulate the entire circuit to check its properties each and every time, and it is not easy to identify which elements in the circuit diagram fail to satisfy specific specifications. Therefore, we simulate individual transistor elements, resistor elements, and capacitor elements under various conditions in advance to build a database of element properties, then make a quantitative design plan based on it. By designing in accordance with the properties of the elements, we can predict what impact each element will have on the overall circuit, making it easier to improve analog circuit performance. Using our analysis platform developed in house, we can efficiently build element property databases and visually check the properties.
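The idea of a pre-characterized element database can be sketched as a lookup table with interpolation. All of the numbers below are hypothetical illustration values, as if extracted from per-element simulations, and are not real process data.

```python
# Sketch: pre-simulated transconductance (gm) at a grid of gate
# voltages, looked up with linear interpolation during design.
from bisect import bisect_left

# Hypothetical (Vgs [V], gm [mS]) characterization points.
gm_table = [(0.4, 0.05), (0.6, 0.40), (0.8, 1.10), (1.0, 2.00)]

def gm_lookup(vgs):
    """Linear interpolation within the characterized range (clamped outside)."""
    xs = [v for v, _ in gm_table]
    i = bisect_left(xs, vgs)
    if i == 0:
        return gm_table[0][1]
    if i == len(xs):
        return gm_table[-1][1]
    (x0, y0), (x1, y1) = gm_table[i - 1], gm_table[i]
    return y0 + (y1 - y0) * (vgs - x0) / (x1 - x0)

print(gm_lookup(0.7))  # midway between 0.40 and 1.10 -> 0.75
```

With such tables built once per process, a designer can estimate each element's contribution quantitatively instead of re-simulating the whole circuit for every question.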
Digital/Software Cooperative Design: Zynq & Altera SoC
We have found that conventional design, where digital and software are designed separately and considered acceptable as long as the whole system runs when they are connected, cannot exploit the full potential of Zynq and Altera SoC. In other words, to optimize the system overall, we need engineers who understand both digital and software, and technology that combines the two into a single system. However, there are very few engineers in the world with these skills. Therefore, Sanei Hytechs is developing its own digital/software cooperative design service. We independently create systems that seem likely to be in demand and accumulate technology so that we can provide the necessary design services quickly when specific customer needs arise.
For example, we have developed an "image recognition system," a market that has recently experienced rapid growth, in the configuration shown in the following diagram.
In developing such a system, the key is not to make every single part ourselves, but to use available libraries and IP as much as possible to minimize development man-hours (the development period). For example, with this system, we ran Linux on the CPU and ran our software on that OS. By using Linux, we can use open programming libraries such as Qt (a GUI library) and OpenCV (an image processing library). In addition, Linux comes with various device drivers, so when connecting USB cameras we can use the USB driver included with Linux, and it is not necessary to prepare a USB interface independently. On the FPGA side as well, we obtain IP for high-speed interfaces such as HDMI, which are difficult to realize in software, from the Xilinx, Inc. website and install it. However, because HDMI and our image recognition algorithm use different image standards, we make a digital circuit for format conversion and insert it in front of the HDMI IP. To conduct development in this efficient manner, we must have the technology to skillfully combine existing IP with our own circuits.
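As a software analogy for that format-conversion block, the sketch below converts a frame from one hypothetical standard (8-bit RGB) into the form a recognition algorithm might expect (grayscale). The real conversion block sits in front of the HDMI IP and would do this per pixel in RTL; the formats here are illustrative only.

```python
# Software analogy of a format-conversion stage between two image
# standards: 8-bit RGB in, 8-bit grayscale out.

def rgb_to_gray(frame):
    """frame: rows of (R, G, B) tuples -> rows of 8-bit gray values."""
    # Integer approximation of the ITU-R BT.601 luma weights.
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in frame]

frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
print(rgb_to_gray(frame))  # [[76, 149], [29, 255]]
```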