Research & Development

In a technological environment that evolves day by day and hour by hour, we provide the technology best suited to our customers' needs.

Sanei Hytechs conducts R&D for its design services. The pillars of our R&D are "Designing AI," "Pioneering new technology," and "Streamlining design."
Here we introduce the cutting edge of our R&D by theme, to the extent that we can disclose it at present.

Designing AI: Deep Learning

In order to meet recent market needs, we have developed an "AI design service." Within AI technology, "deep learning" is becoming essential for realizing relatively complex image recognition, so we are focusing our efforts on deep learning as well. Anyone wishing to use deep learning in business faces several practical hurdles: designing an appropriate neural network, designing an appropriate machine learning flow, building a machine environment for machine learning, and adding a large volume of correct information. Sanei Hytechs covers all of these elements to provide an integrated deep learning service.
 
Here we introduce an example of neural network and machine learning flow design. The following two diagrams show an example of using deep learning to build a neural network that detects the center line of a road. The left diagram shows the configuration of the designed neural network. It takes the form of a CNN (convolutional neural network), commonly used for image recognition, configured with three convolution layers and two fully connected layers. In a typical CNN there are pooling layers between the convolution layers, but this design omits them in order to retain location information as much as possible. The diagram on the right shows the result of passing actual road images through the neural network to detect the center line of the road.
Configuration of a neural network for detecting the center line
Road center line detection results (Displayed with a red line)
In this example, we conducted machine learning using Chainer, a deep learning framework released as open source by Preferred Networks, Inc. (PFN). The massive calculations involved in machine learning are done on GPU machines on AWS, the cloud service provided by Amazon. Communication with the machines on AWS is done over a highly secure VPN connection, so we can design in the same kind of environment as we have for our in-house machines. Sanei Hytechs also offers IT services, so we are skilled at this sort of environment building. Depending on the design work, we sometimes purchase physical machines and install them in house instead of using a cloud service. We have experience continuously operating tens of workstations at a time, primarily for semiconductor design, so we are also capable of building environments on hardware we have purchased ourselves. Our strength lies in our ability to build machine environments for machine learning flexibly and swiftly.
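As a rough illustration of such a network, a Chainer definition might look like the following; the channel counts, kernel sizes, and output format here are illustrative assumptions, not our actual design:

```python
import chainer
import chainer.functions as F
import chainer.links as L


class CenterLineCNN(chainer.Chain):
    """Illustrative CNN: three convolution layers and two fully connected layers, no pooling."""

    def __init__(self):
        super(CenterLineCNN, self).__init__()
        with self.init_scope():
            # Channel counts and kernel sizes are hypothetical examples.
            self.conv1 = L.Convolution2D(3, 16, ksize=3, pad=1)
            self.conv2 = L.Convolution2D(16, 32, ksize=3, pad=1)
            self.conv3 = L.Convolution2D(32, 32, ksize=3, pad=1)
            self.fc1 = L.Linear(None, 256)   # input size inferred on first call
            self.fc2 = L.Linear(256, 2)      # e.g. a center-line position estimate

    def __call__(self, x):
        # No pooling layers between convolutions, so location information is retained.
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        h = F.relu(self.fc1(h))
        return self.fc2(h)
```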
 
The most difficult of the four practical hurdles mentioned at the beginning is adding large volumes of correct information. Deep learning belongs to the category of machine learning known as "supervised learning," so it requires training data in order to learn. For example, suppose you want to make an image recognition program that uses deep learning to identify the locations of cardboard boxes in pictures taken inside a factory. In that case, human staff must prepare vast numbers of pictures containing cardboard boxes and add markers showing the location of each box in those images. These markers are the correct information, and the set of images and correct information together forms the "training data." This correct information is also known as "ground truth."
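As a concrete, hypothetical illustration (the field names and bounding-box convention below are assumptions, not a fixed format), one piece of training data for the cardboard-box example might be represented as an image path paired with its markers:

```python
# One hypothetical training-data record: an image plus its correct information (ground truth).
annotation = {
    "image": "factory/cam01/frame_000123.png",
    "ground_truth": [
        # Each marker gives the location of one cardboard box as a bounding box:
        # x and y of the top-left corner, then width and height, in pixels.
        {"label": "cardboard_box", "bbox": [412, 230, 96, 88]},
        {"label": "cardboard_box", "bbox": [180, 305, 110, 74]},
    ],
}
```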
 
Generally, it is said that a minimum of 100,000 images with correct information added is required to achieve accuracy adequate for practical use. Depending on the application, tens, hundreds, or even thousands of times as much data may be necessary. This is an extraordinary amount, but unfortunately the work must be done manually, and securing the human resources for it is a major hurdle to realizing deep learning. For this reason, we have formed a correct-information production team at our Vietnamese subsidiary, Sanei Hytechs Vietnam Co., Ltd., and we have already built a track record of handling high-volume orders from major corporate customers.

Implementing AI: Edge Computing

In addition to designing AI, we are also developing an AI implementation service. Generally, AI algorithms are designed on machines with high processing power, and the AI designer's role ends with software that runs on such high-performance machines. Yet AI is ultimately expected to be installed on moving devices such as robots and cars. The obvious solution would be to mount a high-performance machine on the moving device itself, but depending on its size and weight it can seriously impair the mobility of the device, so this is not a realistic answer. Another option is to connect the moving device to an external high-performance machine wirelessly, but this method has a bottleneck in the form of communication latency, which slows down processing in the system overall; as a result, a carefully crafted and very clever AI algorithm may only be executable at a speed that is not viable for practical use.

What is needed in this case is to realize part of the AI algorithm on a microcomputer or FPGA. The microcomputer or FPGA is connected to a high-performance machine: data that must be handled in real time is processed on the microcomputer or FPGA, and data that can tolerate some delay is processed on the high-performance machine. This concept of performing processing, previously done only on high-performance machines, on microcomputers and FPGAs at the edges of the overall network system is gaining popularity, and Sanei Hytechs has already refined the necessary implementation technology. Implementing AI algorithms on edge devices requires the technology to make algorithms light enough to fit on microcomputers or FPGAs and to divide the processing appropriately between software and digital circuits. Of course, the basic technology of using the devices correctly is also required.
Implementing CNN on CPU and FPGA
High Level Synthesis Tool SDSoC®
Sanei Hytechs has a track record of implementing the type of neural networks described in the previous section, "Designing AI: Deep Learning," on Zynq® from Xilinx, Inc., and is steadily building up its implementation technology (for information about developing systems using Zynq, see the section "Digital/Software Cooperative Design: Zynq & Altera SoC"). In order to reduce a neural network to a size that can fit on a target device such as Zynq, we first estimate the resources it would consume when computed at the same precision as on a high-performance machine, for example the required memory (register size) and the required digital circuit scale (DFF and lookup table counts) for realizing the main arithmetic units. In parallel, we estimate the processing time (latency and throughput) under several assumptions. CNN-type neural networks contain many product-sum (multiply-accumulate) operations that can be parallelized in digital circuits, so there is a trade-off between processing time and the number of parallel circuits (digital circuit scale).
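The sketch below illustrates this kind of rough estimate; the layer dimensions, the 100 MHz clock, and the assumption that each parallel unit completes one multiply-accumulate per cycle are all illustrative:

```python
def conv_mac_count(out_h, out_w, out_ch, in_ch, k):
    """Number of multiply-accumulate (MAC) operations in one convolution layer."""
    return out_h * out_w * out_ch * in_ch * k * k


def latency_ms(total_macs, parallel_units, clock_mhz):
    """Rough latency if each parallel MAC unit completes one MAC per clock cycle."""
    cycles = total_macs / parallel_units
    return cycles / (clock_mhz * 1e6) * 1e3


# Hypothetical first layer: 160x120 output, 16 output channels, 3 input channels, 3x3 kernel.
macs = conv_mac_count(120, 160, 16, 3, 3)

# Trade-off: more parallel MAC units (larger digital circuit) -> shorter processing time.
for units in (1, 8, 64):
    print(units, "parallel MAC units:", round(latency_ms(macs, units, 100), 2), "ms")
```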
 
Based on these estimates, we examine how much the resources must be reduced to fit on the target device. The bit width used for arithmetic is particularly important. When digital circuits are parallelized, arithmetic is normally done in fixed point rather than floating point, so the neural network arithmetic must also be converted to fixed point. We then check the error that arises in the neural network's output after the conversion to fixed-point arithmetic. For example, in the case of the neural network that detects the center line in road images, we feed road images into the fixed-point version of the network and have it output the location of the center line; the error is the difference between these values and the floating-point results from a high-performance machine. We determine the final bit width from the trade-off between this error and the resources each bit width requires. However, the bit width that satisfies the specified accuracy sometimes requires more resources than will fit on the target device. In that case, we return to the AI design phase, revise the neural network configuration, and conduct deep learning again.
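A minimal sketch of this kind of error check is shown below; it uses NumPy, a simple signed fixed-point format, and a single random matrix standing in for the trained network, all of which are illustrative assumptions:

```python
import numpy as np


def to_fixed_point(x, frac_bits, total_bits):
    """Quantize a float array to signed fixed point: scale, round, then saturate."""
    scale = 2 ** frac_bits
    max_int = 2 ** (total_bits - 1) - 1
    min_int = -(2 ** (total_bits - 1))
    return np.clip(np.round(x * scale), min_int, max_int) / scale


# Stand-ins for one trained layer and one feature vector (random values for illustration).
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(2, 256))   # output: e.g. a center-line position estimate
features = rng.normal(size=256)

float_result = weights @ features                # floating-point reference result
for total_bits in (16, 12, 8):
    w_fx = to_fixed_point(weights, frac_bits=total_bits - 4, total_bits=total_bits)
    f_fx = to_fixed_point(features, frac_bits=total_bits - 4, total_bits=total_bits)
    error = np.max(np.abs(w_fx @ f_fx - float_result))   # error versus floating point
    print(total_bits, "bits -> max error", error)
```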
 
Once we have an approximation of the resources and processing time, we can start the concrete implementation. Using high level synthesis tools such as SDSoC® from Xilinx, Inc., we produce the digital circuits. If the size or latency of the generated circuits is larger than expected, we design the RTL by hand.
 
The concrete implementation flow does not differ significantly from that of typical systems, but in order to understand the trade-offs and make proper judgments, the staff in charge of implementation must have a solid understanding of AI. The decision of whether to return to the AI design phase when the target performance cannot be reached is largely entrusted to the staff in charge of implementation.

Improving Design Efficiency: Advanced Automation

To improve design efficiency, we focus particularly on analog circuit design. In addition to automating typical routine work, such as simulations that sweep PVT (process, voltage, temperature) conditions and checks for circuit diagram errors that were conventionally performed visually, we are also working on more advanced automation. Although still only partial, this automation includes judgment functions that imitate the human thought process using relatively advanced numerical processing, so we refer to it as "Simple AI" (it is "simple" in contrast to the full-scale AI being developed by companies such as Google and IBM).
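The routine part of this automation is easy to picture; the short sketch below simply enumerates a hypothetical set of PVT corners of the kind such a script would hand to the circuit simulator (the corner values themselves are illustrative):

```python
import itertools

# Hypothetical PVT corners; the real lists depend on the process and the specification.
processes = ["tt", "ss", "ff"]          # typical, slow, fast process corners
voltages = [1.62, 1.80, 1.98]           # e.g. nominal 1.8 V plus or minus 10 %
temperatures = [-40, 27, 125]           # degrees Celsius

# Enumerate every combination; in the real flow each one is passed to the simulator.
for proc, vdd, temp in itertools.product(processes, voltages, temperatures):
    print(f"run simulation: process={proc}, VDD={vdd} V, temperature={temp} C")
```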
 
For example, we are developing a program that automatically analyzes circuit diagram topology with a graph-theory-based search algorithm to categorize the transistors (MOSFETs) on a circuit diagram as "digital elements" or "analog elements," and a program that determines circuit types (operational amplifiers, etc.). By recognizing patterns the way a human does, these programs can make advanced judgments that are difficult to achieve simply by comparing character strings and numbers. The information they produce is used by the automatic verification and automatic design programs that we are developing separately. This automation technology lets human engineers focus on more difficult and more interesting design work.
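As a greatly simplified illustration of the idea (the netlist format and the single CMOS inverter pattern below are hypothetical examples, far simpler than the actual programs), the circuit is treated as connectivity data and searched for known patterns rather than compared as character strings:

```python
# Hypothetical netlist: device name -> (device type, gate net, drain net, source net).
netlist = {
    "MP1": ("pmos", "in", "out", "vdd"),
    "MN1": ("nmos", "in", "out", "gnd"),
    "MN2": ("nmos", "bias", "tail", "gnd"),
}


def find_inverters(devices):
    """Find the CMOS inverter pattern: a PMOS and an NMOS that share their gate and
    drain nets, with the PMOS source on vdd and the NMOS source on gnd."""
    pmos = [(name, d) for name, d in devices.items() if d[0] == "pmos"]
    nmos = [(name, d) for name, d in devices.items() if d[0] == "nmos"]
    matches = []
    for p_name, (_, p_gate, p_drain, p_src) in pmos:
        for n_name, (_, n_gate, n_drain, n_src) in nmos:
            if p_gate == n_gate and p_drain == n_drain and p_src == "vdd" and n_src == "gnd":
                matches.append((p_name, n_name))
    return matches


print(find_inverters(netlist))   # -> [('MP1', 'MN1')]
```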
 
These programs sometimes spend an enormous amount of time running search algorithms over graph data. For the computations within this processing that are highly independent of one another, we parallelize using parallel computing APIs (application programming interfaces) such as OpenMP, greatly reducing processing time. In addition, we are working to automate analog layout design, which is commonly considered extremely difficult, and have parallelized its graphic operation processing with OpenMP. We flexibly draw on software technology for parallelization and IT technology for parallel-processing workstations (servers) from our internal Software Department and IT Service Department, which helps accelerate design automation.
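The parallelization itself is done in C/C++ with OpenMP, but the idea can be sketched in Python with an analogous multiprocessing pool: independent pieces of the graph search are distributed across CPU cores (the placeholder workload and chunking below are illustrative):

```python
from multiprocessing import Pool


def search_chunk(nodes):
    """Stand-in for one independent, compute-heavy search over a chunk of graph nodes."""
    return sum(n * n for n in nodes)   # placeholder workload


if __name__ == "__main__":
    all_nodes = list(range(1_000_000))
    # Split the work into independent chunks and process them in parallel,
    # analogous to an OpenMP "parallel for" over independent iterations.
    chunks = [all_nodes[i::8] for i in range(8)]
    with Pool(processes=8) as pool:
        partial_results = pool.map(search_chunk, chunks)
    print(sum(partial_results))
```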

Improving Analog Performance: From the Bottom Up and the Top Down

We are in the process of building a design environment for improving the performance of analog circuits from both bottom-up and top-down perspectives. From a bottom-up perspective, we are developing an analysis platform for finding the properties of devices such as CMOS transistor elements and resistor elements.
 
Even when the circuit configuration (circuit topology) is already determined, the properties of semiconductor elements vary greatly depending on the semiconductor process (and fab) used, such as CMOS 0.25 μm, 0.18 μm, 90 nm, or 40 nm, often making it necessary to change the design plan. Running simulations of the entire circuit to check its properties every time takes a great deal of time, and it is not easy to identify which elements in the circuit diagram fail to satisfy a given specification. Therefore, we simulate individual transistor, resistor, and capacitor elements under various conditions in advance to build a database of element properties, and then make a quantitative design plan based on it. By designing in accordance with the properties of the elements, we can predict the impact each element will have on the overall circuit, making it easier to improve analog circuit performance. Using our analysis platform developed in house, we can efficiently build element property databases and check the properties visually.
Element Properties Database
Visualization of MOS properties (web browser)
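The sketch below gives a highly simplified picture of how such a database is used for quantitative planning; the column layout and the two stored operating points are hypothetical, and real characterization covers far more elements and conditions:

```python
# Hypothetical pre-simulated MOS operating points:
# (process, W in um, L in um, VGS in V) -> simulated properties.
mos_db = {
    ("cmos90n", 1.0, 0.1, 0.6): {"Id_uA": 42.0, "gm_uS": 310.0},
    ("cmos90n", 2.0, 0.1, 0.6): {"Id_uA": 85.0, "gm_uS": 620.0},
}


def gm_over_id(process, w, l, vgs):
    """Transconductance efficiency (gm/Id), a typical quantity for quantitative design planning."""
    row = mos_db.get((process, w, l, vgs))
    return row["gm_uS"] / row["Id_uA"] if row else None


print(gm_over_id("cmos90n", 1.0, 0.1, 0.6))   # -> roughly 7.4 (1/V) for this stored point
```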
From a top-down perspective, we are building environments around system-level design tools such as MATLAB® and Simulink®. By running system-level simulations and identifying the performance trade-offs before drawing concrete circuit diagrams, we can improve analog performance as much as possible. Clarifying the trade-offs sometimes reveals that the performance our customers require would be difficult to realize with conventional circuits. However, by understanding the difficulty of the design as early as possible, we can concentrate our design effort on solving the problem and create a new circuit architecture within the limited design period. As part of our R&D, we are spreading know-how on using these tools and building a reusable library so that analog designers can easily carry out this kind of system-level design. In addition, we regularly gather information on new tools and actively introduce them.
System Level Design (Simulink®)
Checking the trade-off relationship

Digital/Software Cooperative Design: Zynq & Altera SoC

Recently, LSI devices that integrate an FPGA and a CPU, such as Zynq® from Xilinx, Inc. and Altera SoC from Altera, have become more common. Digital design can be done on the FPGA side and software design on the CPU side, so that the digital circuits and software operate as a single unified system. The advantage of putting the FPGA and CPU, conventionally separate chips, on a single chip is not simply a more compact design with a smaller mounting area. Normally, communication between the FPGA and CPU has to leave the respective chips and travel across the circuit board, but on Zynq and Altera SoC they are connected within a single chip, so the communication speed is greatly improved. As a result, systems that realize a single function by combining digital circuits and software can eliminate the typical bottleneck of slow communication and greatly improve the processing speed of the system overall.
 
We have found that with the conventional approach, in which digital and software are designed separately and deemed acceptable as long as the whole system runs once they are connected, it is impossible to exploit the full potential of Zynq and Altera SoC. In other words, engineers must understand both digital and software and build technology that combines the two into a single system optimized as a whole. However, very few engineers in the world have these skills. Sanei Hytechs is therefore developing its own digital/software cooperative design service. We independently create systems that seem likely to be in demand and accumulate the technology, so that we can provide the necessary design services quickly when specific customer needs arise.
 
For example, we have developed an image recognition system, an area that has recently seen rapid market growth, in the configuration shown in the following diagram.
We implemented this system on the Zynq evaluation board known as Zedboard® (made by AVNET). By processing the basic part of the image recognition algorithm in software and offloading part of the processing to digital circuits, we accelerated the overall processing. The software runs on the CPU inside Zynq, and the digital circuits run on the FPGA inside Zynq. The key to good design here is understanding the algorithm and then deciding which parts to implement in software and which parts in digital circuits, so as to optimize the system as a whole and maximize its performance.
 
In developing such a system, an important aspect of the technology is not building every single part ourselves, but using available libraries and IP as much as possible to minimize development man-hours (and the development period). For example, in this system we installed a Linux OS on the CPU and ran the software on it. By using Linux, we can use open-source libraries such as Qt (a GUI library) and OpenCV (an image processing library). In addition, Linux comes with a variety of device drivers, so when connecting a USB camera we can use the USB driver included in Linux and do not need to develop a USB interface ourselves. On the FPGA side as well, we obtain IP for high-speed interfaces such as HDMI, which are difficult to realize in software, from the Xilinx, Inc. website and install it. However, because the HDMI IP and our image recognition algorithm use different image formats, we made a digital circuit for format conversion and inserted it in front of the HDMI IP. Conducting development efficiently in this way requires the technology to skillfully combine existing IP with our own circuits.
Example of image recognition algorithm
(circle detection)
Zynq evaluation board “Zedboard®”
(AVNET, Inc.)
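As an illustration of the software side of this system, the kind of circle detection shown above can be prototyped in a few lines with OpenCV; the file names and parameters are illustrative, and on the actual system the equivalent processing runs on the CPU inside Zynq with parts offloaded to the FPGA:

```python
import cv2

# Illustrative input and parameters; in the real system the frames come from the camera.
frame = cv2.imread("camera_frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)                       # reduce noise before detection

# Hough-transform-based circle detection provided by OpenCV.
circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
    param1=100, param2=50, minRadius=10, maxRadius=100,
)

if circles is not None:
    for x, y, r in circles[0]:
        # Draw each detected circle on the original frame in red.
        cv2.circle(frame, (int(x), int(y)), int(r), (0, 0, 255), 2)
cv2.imwrite("detected_circles.png", frame)
```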