# Silicon IP Cores

for your FPGA or ASIC design



# **Product Catalog**

September 2015



ALMA TECHNOLOGIES® SILICON IP CORES 2015 PRODUCT CATALOG

Copyright © 2015 ALMA TECHNOLOGIES S.A. All rights reserved.

XILINX and ARTIX are registered trademarks of Xilinx, Inc. ALTERA, Avalon-ST and CYCLONE are registered trademarks of Altera Corporation. All other trademarks and trade names are the property of their respective owners.



Addressing the ever-increasing resolution and frame rate needs of the imaging and videography industries, and driven by the growing number and range of new applications that must handle these massive pixel rates, Alma Technologies launched a separate product line of Ultra High Throughput (UHT<sup>™</sup>) image and video compression IP. The large amount of data produced by these applications makes compression even more important, while silicon requirements must still remain at realistic, readily available levels. The UHT series of IP is designed to handle these very high throughputs and to offer uncompromised image quality on mainstream, highly cost-effective FPGA and ASIC silicon.



## **UHT™ JPEG Encoding / Decoding IP Cores**

The Ultra High Throughput (UHT<sup>™</sup>) JPEG series of IP is designed to enable the massive pixel rates of 4K/8K resolutions and high frame rate video applications in highly cost-effective FPGA and ASIC technologies. The key advantage of JPEG compression is the simplicity of the resulting implementation, which keeps system costs to a minimum even when multiple parallel processing engines are used to speed up processing. JPEG is capable of visually lossless compression at low compression ratios, while retaining very good image quality up to medium compression ratios. In addition, the Constant Bitrate feature of the Alma Technologies UHT JPEG encoder permits minimal memory buffering on the decoder side, enabling sub-frame latency in a live video system with compression, transmission and decompression. Applications such as video surveillance, operating remote machinery over video feedback, ultra high speed video recording and playback, or upgrading an existing transmission channel from Full HD to Ultra HD video can benefit from the UHT JPEG encoder/decoder technology.
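A minimal arithmetic sketch of why a constant bitrate keeps decoder buffering small: if every horizontal stripe of a frame is coded to the same byte budget, a decoder can start as soon as one stripe has arrived instead of waiting for a whole frame. The stripe height, bitrate and the `cbr_stripe_budget` helper are illustrative assumptions, not documented parameters of the UHT-JPEG-E core.

```python
import math


def cbr_stripe_budget(bitrate_bps: float, fps: float, height: int, stripe_lines: int):
    """Per-stripe byte budget and duration under a constant bitrate.

    Hypothetical model: the frame is divided into horizontal stripes of
    `stripe_lines` lines and the encoder spends the same number of bits on
    every stripe, so a decoder needs to buffer roughly one stripe (a fraction
    of a frame) before it can start decoding.
    """
    stripes_per_frame = math.ceil(height / stripe_lines)
    bits_per_frame = bitrate_bps / fps
    bytes_per_stripe = bits_per_frame / stripes_per_frame / 8
    stripe_time_s = 1.0 / (fps * stripes_per_frame)
    return bytes_per_stripe, stripe_time_s


# Example: 2160-line video at 60 fps over a 400 Mbit/s channel, 16-line stripes.
budget, duration = cbr_stripe_budget(400e6, 60, 2160, 16)
print(f"{budget:.0f} bytes per stripe, {duration * 1e6:.1f} us per stripe")
```

Under these assumptions each stripe is about 6.2 KB and spans roughly 123 µs, compared with about 16.7 ms for a full 60 fps frame.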

### Scalable and Transparent Parallel Processing

Powered by multiple internal processing engines, the UHT Image and Video Compression IP Cores deliver the speed needed by today's 4K/8K Ultra HD and ultra high frame rate applications through their scalable and transparent parallel architectures. Each input image or video frame is first split internally into chunks, and each chunk is then assigned to one of the available internal compression units. This is done in a way that is completely transparent to the system using the IP, as if a single encoding or decoding instance were used. By always presenting a single uncompressed data interface and a single, standalone, ready-to-use and standard-compliant compressed stream interface, the UHT series of IP hides all of the parallelization complexity from the system.
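The split-and-merge idea can be sketched in a few lines. This sketch assumes horizontal stripes as the chunks and round-robin assignment to units; the catalog does not state how the UHT cores actually form or schedule chunks, and the function names are hypothetical. The point it illustrates is that results are merged back in the original chunk order, which is what keeps the parallelism invisible at the interfaces.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[int, int]  # (first_line, end_line_exclusive)


def split_into_stripes(height: int, stripe_lines: int) -> List[Chunk]:
    """Split a frame of `height` lines into horizontal stripes."""
    return [(y, min(y + stripe_lines, height)) for y in range(0, height, stripe_lines)]


def assign_round_robin(chunks: List[Chunk], num_units: int) -> Dict[int, List[Chunk]]:
    """Give chunk i to unit i % num_units, preserving per-unit order."""
    schedule: Dict[int, List[Chunk]] = {u: [] for u in range(num_units)}
    for i, chunk in enumerate(chunks):
        schedule[i % num_units].append(chunk)
    return schedule


def merge_in_order(schedule: Dict[int, List[Chunk]], num_units: int) -> List[Chunk]:
    """Collect unit outputs back into the original chunk order."""
    merged: List[Chunk] = []
    i = 0
    while True:
        unit, slot = i % num_units, i // num_units
        if slot >= len(schedule[unit]):
            break
        merged.append(schedule[unit][slot])
        i += 1
    return merged


stripes = split_into_stripes(height=40, stripe_lines=16)
plan = assign_round_robin(stripes, num_units=2)
print(plan)  # → {0: [(0, 16), (32, 40)], 1: [(16, 32)]}
```

Round-robin keeps the merge trivial: the output side only has to visit the units in the same fixed order.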

### Silicon Resources Versatility

Packed with configurable, expanded features and designed for scalable trade-offs between silicon speed and size, the UHT series of IP also optimizes resource usage by sharing resources among the multiple internal compression units wherever possible. The maximum number of internal compression units is configurable before synthesis, giving the best match between the throughput requirements and the corresponding silicon footprint on a given target implementation technology. For example, increasing the number of units enables the encoding of Ultra HD 4K/8K video even in low-end FPGA devices, while decreasing the number of units on higher speed silicon achieves the same throughput with far fewer resources.
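The speed versus size trade-off is simple arithmetic: the number of units must cover the pixel rate divided by what one unit sustains at the achievable clock. The per-unit rate below (`pixels_per_clock_per_unit`) is a hypothetical parameter; the catalog does not publish it.

```python
import math


def units_needed(width: int, height: int, fps: float,
                 f_clk_hz: float, pixels_per_clock_per_unit: float) -> int:
    """Smallest number of compression units that sustains the pixel rate.

    `pixels_per_clock_per_unit` is a hypothetical figure of merit for one
    internal unit; substitute the vendor's value for a real estimate.
    """
    pixel_rate = width * height * fps
    unit_rate = f_clk_hz * pixels_per_clock_per_unit
    return math.ceil(pixel_rate / unit_rate)


# 4K60 (about 498 Mpixel/s): slower silicon needs more units than faster silicon.
print(units_needed(3840, 2160, 60, f_clk_hz=150e6, pixels_per_clock_per_unit=1.0))  # → 4
print(units_needed(3840, 2160, 60, f_clk_hz=300e6, pixels_per_clock_per_unit=1.0))  # → 2
```

With the assumed one pixel per clock per unit, doubling the clock halves the number of units, which is the resource saving described above.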

## UHT-JPEG-E Ultra High Throughput Baseline and Extended JPEG Encoder



The UHT-JPEG-E IP core is a standalone, high-performance JPEG encoder designed to enable ultra high frame rate SD and HD encoding, and Ultra HD video encoding (4K/8K and beyond), even in low-end ASIC or FPGA silicon. The UHT-JPEG-E core implements an advanced 8-bit Baseline and 12-bit Extended Sequential DCT JPEG encoder, compliant with the ITU T.81 and ISO/IEC 10918-1 standards. The core supports encoding of 4:4:4, 4:2:2, 4:2:0 and 4:0:0 (grayscale) video streams at sample depths of 8, 10 or 12 bits.

UHT-JPEG-E is very easy to use and integrate into a system, requiring minimal host intervention, as it only needs to be programmed once per video sequence. Once programmed, it encodes an arbitrary number of video frames without any further intervention or assistance from the host system CPU.

UHT-JPEG-E accepts uncompressed raw video data in interleaved scan format and outputs a standalone, standard-compliant JPEG byte stream. The host needs no post-processing of the output stream other than, e.g., saving, muxing or transmitting it. The output JPEG byte stream can be decoded as is by any ISO/IEC 10918-1 compliant JPEG decoder.
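Because the output is a plain ISO/IEC 10918-1 byte stream, a host can sanity-check it with a few lines of code before saving or transmitting it. The sketch below walks the marker segments (SOI, table and frame headers, SOS, EOI) and skips the entropy-coded data, where `0xFF00` is a stuffed byte and `0xFFD0` to `0xFFD7` are restart markers. The `list_markers` function is hypothetical and deliberately minimal: it checks structure only and decodes nothing.

```python
from typing import List

# Markers without a length field: TEM, RST0..RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def list_markers(data: bytes) -> List[int]:
    """Return the marker codes (second byte of each 0xFFxx) found in a JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("stream does not start with SOI (0xFFD8)")
    markers = [0xD8]
    pos = 2
    while True:
        if pos + 1 >= len(data) or data[pos] != 0xFF:
            raise ValueError(f"no marker at offset {pos}")
        while data[pos + 1] == 0xFF:  # optional 0xFF fill bytes before a marker
            pos += 1
            if pos + 1 >= len(data):
                raise ValueError("truncated stream")
        code = data[pos + 1]
        markers.append(code)
        pos += 2
        if code == 0xD9:  # EOI ends the image
            return markers
        if code in STANDALONE:
            continue
        # The segment length counts its own two bytes but not the marker.
        pos += int.from_bytes(data[pos:pos + 2], "big")
        if code == 0xDA:  # SOS: skip entropy-coded data up to the next real marker
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1


# Synthetic stream: SOI, empty APP0, SOS with stuffed byte and RST0, EOI.
stream = (b"\xff\xd8" + b"\xff\xe0\x00\x02" + b"\xff\xda\x00\x02"
          + b"\x12\xff\x00\x34\xff\xd0\x56" + b"\xff\xd9")
print([hex(m) for m in list_markers(stream)])  # → ['0xd8', '0xe0', '0xda', '0xd9']
```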

The UHT-JPEG-E core implements a simple yet flexible, request-based external memory interface with independent read and write data paths. The external memory may be off-chip or on-chip. This makes the UHT-JPEG-E independent of memory type, supporting most common types of memory. Simple add-on bridges for glueless connection to many standard FPGA external memory controllers, or to on-chip memory, are also available. UHT-JPEG-E is designed to tolerate memory delays and latencies, such as those present in shared memory system architectures.

### **Features**



## UHT-JPEG-D Ultra High Throughput Baseline and Extended JPEG Decoder



The UHT-JPEG-D IP core is a standalone, high-performance JPEG decoder designed to enable ultra high frame rate SD and HD decoding, and Ultra HD video decoding (4K/8K and beyond), even in low-end ASIC or FPGA silicon. The UHT-JPEG-D core implements an advanced 8-bit Baseline and 12-bit Extended Sequential DCT JPEG decoder, compliant with the ITU T.81 and ISO/IEC 10918-1 standards. The core supports decoding of 4:4:4, 4:2:2, 4:2:0 and 4:0:0 (grayscale) video streams at sample depths of 8, 10 or 12 bits.

UHT-JPEG-D accepts the standalone, standard-compliant JPEG byte stream generated by the Alma Technologies UHT-JPEG-E IP core, or any other compatible JPEG byte stream. The decoded raw video data is output in interleaved raster scan format.

UHT-JPEG-D is very easy to use and integrate into a system, requiring minimal host intervention, as it only needs to be programmed once per video sequence. Once programmed, it decodes an arbitrary number of video frames without any further intervention or assistance from the host system CPU.

The UHT-JPEG-D core implements a simple yet flexible, request-based (optional) external memory interface with independent read and write data paths; the decoder can also be configured to use only on-chip memory. This makes the UHT-JPEG-D independent of memory type, supporting most common types of memory. When an external memory device is used, simple add-on bridges for glueless connection to many standard FPGA external memory controllers are also available. UHT-JPEG-D is designed to tolerate memory delays and latencies, such as those present in shared memory system architectures.

### **Features**





Alma Technologies offers one of the finest hardware H.264 implementations available today as a silicon IP core. Fine-tuned for optimum performance, both in terms of pixel throughput and image quality, the H.264 IP cores perform exceptionally well in a variety of demanding video content situations.

The H.264 IP cores are remarkably easy to operate and ready to encode video right out of the box. They are designed for easy integration into customer designs, with simple and comprehensive programming and input/output interfacing.



## H.264 Encoding IP Cores

The H.264 Encoding family of our silicon Intellectual Property Cores offers standalone, full hardware implementations of the ITU-T H.264 specification. The family includes Baseline, Main and High profile encoders for real-time encoding of video streams up to Level 5.2.

Our H.264 encoders require minimal host intervention as they only need to be programmed once per video sequence. Once programmed, an arbitrary number of video frames can be encoded without needing any further intervention or assistance from the host system CPU (CPU-less operation).

### **Ease of Use**



The H.264 encoders use very simple interfaces to communicate with the external host system logic and are set up through a simple control register set. The default values for most parameters may be pre-configured to custom specifications. The input and output data interfaces support continuous streaming while also allowing the external logic to fully control the flow of data. A generic memory controller interface is used, supporting a variety of available memory controllers.

Compressing video is as simple as this: connect an external memory controller with the required amount of memory, then program the source video parameters (if they differ from the default values) using the control registers of the core. The core is then ready to accept and compress the input video stream, producing a compliant H.264 Annex B NAL byte stream on its output. The external application may monitor the core's status by polling the encoder's status registers.

### **High Performance**

The H.264 Encoding IP Cores are designed to achieve high throughput at low clock frequencies. At the same time, considerable effort went into limiting the core's silicon area, so that the core can fit and process Full HD video even in low-end FPGA devices. For example, the core can encode Full HD 1080p30 video on Altera Cyclone III or Xilinx Artix-7 devices. In ASIC implementations, the core can exceed 120 fps for Full HD video.
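The link between the worst-case rate of 2.5 clock cycles per pixel (quoted in the features list) and the frame rates above can be checked with simple arithmetic. This sketch assumes that rate applies to every luma pixel of a frame padded to whole 16x16 macroblocks and ignores any other overhead; the `required_clock_hz` helper is hypothetical.

```python
def required_clock_hz(width: int, height: int, fps: float,
                      cycles_per_pixel: float = 2.5, mb_size: int = 16) -> float:
    """Clock frequency needed to sustain `fps` at `cycles_per_pixel`.

    Dimensions are rounded up to whole macroblocks, because H.264 codes
    complete 16x16 macroblocks (1080 lines are coded as 1088).
    """
    coded_w = -(-width // mb_size) * mb_size   # ceiling to a multiple of mb_size
    coded_h = -(-height // mb_size) * mb_size
    return coded_w * coded_h * fps * cycles_per_pixel


for fps in (30, 60, 120):
    print(f"1080p{fps}: {required_clock_hz(1920, 1080, fps) / 1e6:.1f} MHz")
```

Under these assumptions 1080p30 needs about 157 MHz and 1080p120 about 627 MHz, which is consistent with the FPGA and ASIC figures quoted above.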



### **Specifications**

The H.264 Encoding IP Cores accept the uncompressed input video stream in planar, interleaved or macroblock scan format. The output is a standard-compliant Annex B NAL byte stream. The host needs no post-processing of the output stream other than saving or transmitting it. The output NAL byte stream can be decoded as is by any ITU-T H.264 compliant decoder that satisfies the level requirements of the stream and conforms to the respective ITU-T H.264 profile.
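As an illustration of what an Annex B NAL byte stream means for the host, this sketch splits a stream on the `0x000001` start code prefix (the four-byte form `0x00000001` simply adds a leading zero byte) and reads `nal_unit_type` from each one-byte NAL header. It is a generic parser with hypothetical function names, not part of the cores' deliverables, and it does not remove emulation prevention bytes.

```python
from typing import List, Tuple

NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def split_annex_b(stream: bytes) -> List[bytes]:
    """Split an H.264 Annex B byte stream into NAL units.

    NAL units follow the 3-byte start code prefix 0x000001. Zero bytes in
    front of the next prefix (the zero_byte of the 4-byte form, or trailing
    zero padding) are stripped; this is safe because H.264 forbids a NAL
    unit from ending in 0x00.
    """
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        starts.append(pos + 3)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    nals = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        nals.append(stream[begin:end].rstrip(b"\x00"))
    return nals


def nal_header(nal: bytes) -> Tuple[int, int]:
    """Return (nal_ref_idc, nal_unit_type) from the first NAL byte."""
    return (nal[0] >> 5) & 0x03, nal[0] & 0x1F


stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1f"   # SPS
          b"\x00\x00\x00\x01\x68\xce"           # PPS
          b"\x00\x00\x01\x65\x88\x84")          # IDR slice
for nal in split_annex_b(stream):
    ref_idc, nal_type = nal_header(nal)
    print(NAL_TYPES.get(nal_type, "other"), len(nal), "bytes")
```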

The H.264 encoding cores implement a simple yet flexible, request-based external memory interface with independent read and write data paths. This makes the cores independent of memory type, supporting, for example, operation with SRAM, SDRAM, DDR, DDR2 or DDR3 memory. Glueless external memory controllers are also available. The encoders are designed to tolerate memory delays and latencies, such as those present in shared memory system architectures.

### **Features**

#### Advanced H.264 Implementation

- High throughput implementation: sustained worst-case processing rate of 2.5 clock cycles per pixel.
- Superior compression and video quality from QCIF to HD resolutions.
- CQP VBR encoding mode:
  - Rate-distortion optimized output.
  - Up to 240 Mbit/s output.
- CBR encoding mode:
  - HRD CPB compliant CBR NAL output.
  - Sub-frame operation on a tunable number of macroblocks.
  - Further micro-adjustment of quantization per macroblock maximizes the perceived video quality.
  - Based on run-time, continuously self-trained models for excellent adaptability to spatial and temporal video variations.
  - Tunable operation, independent of the video GOP size, enables video system latency control with respect to initial
  - On-the-fly rate changes are supported.
  - Up to 240 Mbit/s output.
- Full search, variable block size, sub-pixel motion estimation engine:
  - Significantly more robust to video content than partial searches.
  - Eliminates local minima traps.
  - 32x20 or larger search area around the partition origin.
  - Up to 4 motion vectors per macroblock.
  - Full, half and quarter pixel accuracy.
- Single reference frame.
- Sophisticated block skipping logic enables advanced low bit rate encoding with minimized motion artifacts.
- Advanced Intra prediction: all prediction modes supported.
- Error resilience.
- Optional advanced thresholding of quantized transform coefficients.
- Run-time tunable operation enables decoder compatibility trade-offs.
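Why an exhaustive search eliminates local minima traps is easy to see in code: it evaluates the matching cost at every candidate displacement, so it cannot be misled by a cost surface with several dips the way step-wise searches can. The sketch below is a textbook integer-pel full search with a fixed block size and the sum of absolute differences (SAD) as cost. The actual engine additionally uses variable block sizes and half/quarter-pel refinement, its cost function is not documented here, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                rx: int, ry: int) -> Optional[Tuple[int, int, int]]:
    """Test every integer displacement in [-rx, rx] x [-ry, ry]; return (dx, dy, sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-ry, ry + 1):
        for dx in range(-rx, rx + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic test: the current frame is the reference shifted by (2, 1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=4, rx=2, ry=2))  # → (2, 1, 0)
```

The cost is (2rx + 1)(2ry + 1) block comparisons per partition, which is why hardware engines parallelize the SAD computation rather than shrinking the search.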

#### Smooth System Integration

- Full abstraction of the internal implementation details and the H.264 complexity from the top-level I/O and its functionality.
- Simple, microcontroller-like programming interface.
- High speed, flow controllable, streaming I/O data interfaces:
  - Simple and FIFO-like.
  - Full data flow control without any extra clock cycle penalty.
  - Avalon-ST<sup>™</sup> compliant (ready latency 0).
- Registered I/O ports.
- Low native encoding latency:
  - With macroblock scan video input, the encoding latency is approximately 1280 video pixels.
  - With interleaved scan video input, the encoding latency is approximately 16 video lines.
  - With planar scan video input, the encoding latency is approximately one frame.
- Low external memory bandwidth requirements.
- Flexible external memory interface:
  - Independent of external memory type.
  - Tolerant to latencies.
  - Allows shared memory access.
  - Can optionally operate in an independent clock domain.
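The "ready latency 0" behaviour of the streaming interfaces can be modelled at cycle level: a word moves on exactly the clock cycle in which the source asserts valid and the sink asserts ready, so back-pressure costs no cycles beyond the stalled ones. This is a generic model of the Avalon-ST valid/ready handshake with a ready latency of 0, not the cores' RTL; the `simulate_handshake` function is hypothetical.

```python
from collections import deque
from typing import Iterable, List, Sequence, Tuple


def simulate_handshake(words: Iterable[int], valid: Sequence[bool],
                       ready: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (cycle, word) for every transfer of a valid/ready stream.

    With ready latency 0, a transfer happens in cycle t only if the source
    drives valid and the sink drives ready in that same cycle. Otherwise the
    source keeps presenting the same word. Valid with no data left is ignored.
    """
    pending = deque(words)
    transfers = []
    for cycle, (v, r) in enumerate(zip(valid, ready)):
        if v and r and pending:
            transfers.append((cycle, pending.popleft()))
    return transfers


# The sink stalls in cycle 1 and the source has a gap in cycle 2.
print(simulate_handshake([10, 20, 30],
                         valid=[True, True, False, True, True],
                         ready=[True, False, True, True, True]))
# → [(0, 10), (3, 20), (4, 30)]
```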

## H.264 Encoding IP Cores Selection Matrix

|                                                         | H264-BP-E               | H264-BPI-E              | H264-MP-E | H264-MPI-E | H264-HP-E                                      | H264-HPI-E                                            |
|---------------------------------------------------------|-------------------------|-------------------------|-----------|------------|------------------------------------------------|-------------------------------------------------------|
| Profile                                                 | Constrained<br>Baseline | Constrained<br>Baseline | Main      | Main       | High 10<br>High 4:2:2<br>High 4:4:4 Predictive | High 10 Intra<br>High 4:2:2 Intra<br>High 4:4:4 Intra |
| Chroma format(s)                                        | 4:2:0                   | 4:2:0                   | 4:2:0     | 4:2:0      | 4:2:0, 4:2:2                                   | 4:2:0, 4:2:2                                          |
| Sample depth (bits)                                     | 8                       | 8                       | 8         | 8          | 8, 10, 12                                      | 8, 10, 12                                             |
| Slice type(s)                                           | IDR, P                  | IDR                     | IDR, P    | IDR        | IDR, P                                         | IDR                                                   |
| Multiple slices<br>encoding option                      | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| CAVLC                                                   | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| CABAC                                                   | -                       | -                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Rate control<br>(CBR & CQP-VBR)                         | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Separate Luma /<br>Chroma QP control                    | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Ultra low sub-frame<br>transmission latency<br>encoding | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Annex B NAL byte<br>stream compliant<br>output          | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| AVC-Intra 50 / 100 /<br>Ultra suitability               | -                       | -                       | -         | -          | ✓                                              | ✓                                                     |
| Standalone<br>CPU-less operation                        | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Throughput<br>(clocks/pixel)                            | 2.5                     | 2.5                     | 2.5       | 2.5        | 2.5 - 2.75                                     | 2.5 - 2.75                                            |
| Silicon<br>requirements                                 | high                    | medium                  | high      | medium     | high                                           | medium                                                |
| Available for<br>ASICs                                  | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Available for<br>Altera FPGAs                           | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Available for<br>Lattice FPGAs                          | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Available for<br>Microsemi FPGAs                        | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |
| Available for<br>Xilinx FPGAs                           | ✓                       | ✓                       | ✓         | ✓          | ✓                                              | ✓                                                     |

✓ : Supported

\- : Not Applicable



A complete series of IP Cores covering almost every application requiring image compression is available.

All our still image compression IP Cores have been carefully crafted to offer maximum image quality and performance, while maintaining efficiency and simplicity during the integration.

Still image compression algorithms are equally important for video compression applications. Applications sensitive to latency, throughput, power or silicon area can benefit from the efficiency of the image compression algorithm implementations we offer.



## **Still Image Compression IP Cores**

Alma Technologies offers a complete family of still image compression IP Cores, covering almost every application requiring image compression. Most well-established still image compression algorithms are implemented, such as JPEG, JPEG 2000, Lossless JPEG and JPEG-LS.

The following selection matrix can help in deciding which still image compression scheme is best suited for each application.

## Still Image Compression Selection Matrix

|                                       | Baseline<br>JPEG | Extended<br>JPEG | Lossless<br>JPEG | JPEG-LS        | JPEG 2000 |
|---------------------------------------|------------------|------------------|------------------|----------------|-----------|
| Lossy compression                     | ✓                | ✓                | × <sup>1</sup>   | × <sup>1</sup> | ✓         |
| Numerically lossless compression      | ×                | ×                | ✓                | ✓              | ✓         |
| Lossy compression efficiency          | very good        | very good        | -                | -              | excellent |
| Lossless compression efficiency       | -                | -                | good             | excellent      | excellent |
| Maximum bits per sample               | 8                | 12               | 16               | 16             | 16        |
| Grayscale                             | ✓                | ✓                | ✓                | ✓              | ✓         |
| Color                                 | ✓                | ✓                | ✓                | ✓              | ✓         |
| Rate control                          | ○                | ○                | ×                | ×              | ✓         |
| Multiple quality layers               | ×                | ×                | ×                | ×              | ○         |
| Region of interest                    | ×                | ×                | ×                | ×              | ○         |
| Standalone CPU-less operation         | ✓                | ✓                | ✓                | ✓              | ✓         |
| Requires external memory <sup>2</sup> | no               | no               | no               | no             | yes       |
| Silicon requirements                  | low              | low              | very low         | low            | high      |
| Available for ASICs                   | ✓                | ✓                | ✓                | ✓              | ✓         |
| Available for Altera FPGAs            | ✓                | ✓                | ✓                | ✓              | ✓         |
| Available for Lattice FPGAs           | ✓                | ✓                | ✓                | ✓              | ✓         |
| Available for Microsemi FPGAs         | ✓                | ✓                | ✓                | ✓              | ✓         |
| Available for Xilinx FPGAs            | ✓                | ✓                | ✓                | ✓              | ✓         |



The JPEG cores provide an efficient and low cost image compression / decompression solution based on the widely adopted JPEG image compression standard.

A full range of JPEG IP core solutions is available, combining high performance with area-efficient architectures and support for Motion JPEG streaming with Rate Control. The JPEG cores are standalone: no extra software pre-processing or post-processing is required. Additional functions that may be needed, such as Color Space Conversion or a Raster to Block Scan Converter, are also available.
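A Raster to Block Scan Converter is needed because JPEG codes 8x8 blocks, while cameras and video interfaces deliver pixels line by line, so at least eight lines must be buffered before the first block is complete. The sketch below reorders a single image component; it assumes dimensions that are multiples of the block size and ignores chroma subsampling and MCU interleaving, which a real converter must handle. The `raster_to_blocks` name is hypothetical.

```python
from typing import List


def raster_to_blocks(plane: List[int], width: int, height: int, n: int = 8) -> List[List[int]]:
    """Reorder one raster-scan image plane into n x n blocks.

    Blocks are emitted left to right, top to bottom; samples inside each
    block are in raster order. Assumes width and height are multiples of n
    (a real converter pads partial blocks at the right and bottom edges).
    """
    if width % n or height % n:
        raise ValueError("width and height must be multiples of the block size")
    blocks = []
    for by in range(0, height, n):          # one block row needs n buffered lines
        for bx in range(0, width, n):
            blocks.append([plane[(by + y) * width + bx + x]
                           for y in range(n) for x in range(n)])
    return blocks


print(raster_to_blocks(list(range(16)), width=4, height=4, n=2))
# → [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```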

### **Features**

#### Baseline ISO/IEC 10918-1 JPEG Compliance

- Programmable Huffman Tables (two DC, two AC).
- Programmable Quantization Tables (up to four).
- Up to four color components.
- Supports all possible scan configurations and all JPEG formats for input and output data.
- Supports any image size up to 64K x 64K.
- Supports DNL and restart markers.

#### Additional Processing Capabilities

- Motion JPEG payload support.
- Video mode Rate-Control engine (optional).

#### Ease of Integration

- Registered I/O ports.
- Simple, microcontroller-like programming interface.
- High speed, flow controllable, streaming I/O data interfaces:
  - Simple and FIFO-like.
  - Avalon-ST<sup>™</sup> compliant (ready latency 0).
- Decoding:
  - Standalone operation.
  - Automatic self-programming by JPEG marker parsing.
  - Marker error catching.
  - Broadcasting of decoded image parameters for controlling peripherals.
- Encoding:
  - Processing rate of a single clock per input sample.
  - Fully programmable through standard JPEG marker segments.
  - Automatic JPEG marker generation on the output.
  - Automatic program-once, encode-many operation.
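Self-programming by marker parsing means the decoder takes its image parameters from the stream itself, chiefly from the SOFn frame header. The sketch below decodes that header using the field layout of ISO/IEC 10918-1 (Annex B.2.2, frame header syntax). The `FrameHeader` type and `parse_sof` function are hypothetical, and a real decoder also parses DQT, DHT, DRI and SOS segments.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrameHeader:
    precision: int                               # P: sample precision in bits
    height: int                                  # Y: number of lines (0 means a DNL marker follows)
    width: int                                   # X: samples per line
    components: List[Tuple[int, int, int, int]]  # (Ci, Hi, Vi, Tqi)


def parse_sof(segment: bytes) -> FrameHeader:
    """Parse an SOFn segment body (the bytes after the 0xFFCn marker).

    Layout: Lf(16) P(8) Y(16) X(16) Nf(8), then Nf x [Ci(8) Hi(4) Vi(4) Tqi(8)].
    """
    lf = int.from_bytes(segment[0:2], "big")
    nf = segment[7]
    if lf != 8 + 3 * nf or len(segment) < lf:
        raise ValueError("inconsistent SOF segment length")
    components = []
    for i in range(nf):
        ci, hv, tq = segment[8 + 3 * i: 11 + 3 * i]
        components.append((ci, hv >> 4, hv & 0x0F, tq))
    return FrameHeader(precision=segment[2],
                       height=int.from_bytes(segment[3:5], "big"),
                       width=int.from_bytes(segment[5:7], "big"),
                       components=components)


# 8-bit 1920x1080 frame, three components, 4:2:0 (luma sampled 2x2).
sof = bytes([0x00, 0x11, 8, 0x04, 0x38, 0x07, 0x80, 3,
             1, 0x22, 0, 2, 0x11, 1, 3, 0x11, 1])
print(parse_sof(sof))
```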

Our JPEG implementations are fully compliant with the Baseline or the Extended Sequential DCT mode of the ISO/IEC 10918-1 JPEG standard. This makes the JPEG cores ideal for interoperable systems and devices such as digital cameras, camcorders, office automation equipment, medical imaging systems, video conference systems and remote surveillance systems.

In addition to generating standalone JPEG streams, the JPEG encoding cores can also produce the (de facto) standard video payload of many Motion JPEG container formats (e.g. USB Video Class devices). Furthermore, bandwidth-constrained applications can benefit from our unique and advanced Rate Control engine.
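Alma's Rate Control engine is not documented in this catalog. For orientation only, the most common JPEG bitrate knob is scaling the quantization tables; the sketch below uses the well-known IJG (libjpeg) quality-scaling formula with the example luminance table from ISO/IEC 10918-1 Annex K. A rate controller could, for instance, adjust `quality` from frame to frame; that feedback loop is not shown, and none of this describes Alma's engine.

```python
from typing import List

# Example luminance quantization table from ISO/IEC 10918-1 Annex K (Table K.1),
# listed in natural (row-major) order rather than zig-zag order.
LUMA_Q50 = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_scale(table: List[int], quality: int) -> List[int]:
    """Scale a quantization table with the IJG (libjpeg) quality formula.

    quality 50 returns the table unchanged; higher quality gives smaller
    steps (more bits), lower quality larger steps (fewer bits). Entries are
    clamped to 1..255 so the result stays valid for 8-bit Baseline tables.
    """
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(max((q * scale + 50) // 100, 1), 255) for q in table]


print(ijg_scale(LUMA_Q50, 75)[:8])  # → [8, 6, 5, 8, 12, 20, 26, 31]
```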

All cores feature easy-to-use, fully controllable, FIFO-like streaming input and output interfaces. Carefully designed, rigorously verified and repeatedly silicon-proven, the JPEG cores offer a reliable and easy to integrate solution.



## JPEG-E, JPEG-E-X

Baseline JPEG Encoder, Extended JPEG Encoder



The JPEG-E and JPEG-E-X IP cores are standalone high performance JPEG encoders for still image and video compression applications.

The cores are able to encode at Full HD (1080p30) or higher rates, even in low-end FPGA devices. The JPEG-E-X encoder additionally supports up to 12 bits sample depth, making it well suited for high dynamic range applications such as professional video, machine vision and medical applications.

The cores optionally include an advanced Rate Control engine, which provides important flexibility for video applications with constrained bandwidth specifications.

Besides generating standalone Baseline or Extended JPEG streams, the cores can also produce the (de facto) standard video payload of many Motion JPEG container formats. In addition, the JPEG-E-X IP produces JPEG streams that also conform to the Digital Imaging and Communications in Medicine (DICOM) requirements.

## JPEG-D, JPEG-D-X

Baseline JPEG Decoder, Extended JPEG Decoder



The JPEG-D and JPEG-D-X cores are standalone high performance JPEG decoders for still image and video decompression applications.

The JPEG-D / JPEG-D-X cores can decode at Full HD (1080p30) or higher rates, even in FPGA devices. The JPEG-D-X decoder additionally supports decoding at sample depths of up to 12 bits.

Besides decoding standard-compliant Baseline and Extended JPEG streams, the cores can also decompress the video payload of many (de facto) standard Motion JPEG container formats.





The JPEG-C core is a standalone, high-performance, half-duplex Baseline JPEG codec for still image and video compression applications.

The JPEG-C can operate at Full HD (1080p30) or higher rates, even in FPGA devices.

The core optionally includes an advanced Rate Control engine, which provides important flexibility for video applications with constrained bandwidth specifications.



## **Lossless JPEG IP Cores**

We offer Lossless JPEG coding solutions suitable for applications that need to compress images and reproduce them without any loss of quality. In fact, reproduced images are bit by bit identical to the original uncompressed images. The Lossless JPEG IP Cores are ideal for medical imaging applications, high-end digital photography or professional video equipment applications.

The lossless coding mode of JPEG is an efficient, lightweight image compression algorithm that achieves good compression results without the complexity overhead of more sophisticated solutions such as JPEG 2000. Lossless JPEG was added to the ITU-T JPEG recommendations in 1995 (ISO/IEC 10918-1 standard, ITU T.81 recommendation).

### **Features**

#### ISO/IEC 10918-1 Compliance

- Conforms to the spatial (sequential) lossless encoding mode (SOF3), of the ISO/IEC 10918-1 standard (CCITT T.81 recommendation).
- Standalone operation:
  - Pixel sample input / output.
  - Standalone ISO/IEC 10918-1 compliant JPEG stream input / output.
- Easily programmable through a standard JPEG marker stream:
  - Programmable image dimensions.
  - Programmable image sample precision (2 to 16 bits/sample).
  - Up to 4 programmable Huffman tables.
  - Programmable Restart Interval.
  - Programmable Point Transform function.
  - Programmable APPn and COM markers.
- Programming error catch-up features.

#### Limitations

- Up to three image components are supported (Nf field of the SOF3 marker segment = 1, 2 or 3).
- Single scan encoding (only one SOS marker segment, with Ns field = Nf).
- No DNL marker insertion (Y field of the SOF3 marker segment > 0).
- Fixed parameters:
  - No subsampling (Hi and Vi fields of the SOF3 marker segment = 1).
  - Prediction function fixed to the left-hand predictor (predictor 1) (Ss field of the SOS marker segment = 1).

The LJPEG cores implement Lossless JPEG compression / decompression in a compact, high-performance, stand-alone package ideal for applications where bit-by-bit accurate reproduction of an image is essential.

The LJPEG cores conform to the spatial (sequential) lossless encoding mode (SOF3) of the ISO/IEC 10918-1 standard (ITU T.81 recommendation). Rather than the Discrete Cosine Transform (DCT) used in lossy JPEG compression, which can introduce round-off errors, the LJPEG cores employ a reversible predictor function as described in the specification. Images can thus be encoded and decoded with no information loss, while requiring a significantly smaller silicon footprint than standard lossy JPEG implementations.
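The reversible prediction step can be sketched as follows for the left-hand predictor (selection value 1) that the LJPEG cores use. Following ISO/IEC 10918-1 Annex H, the first sample is predicted by 2<sup>P-Pt-1</sup>, the rest of the first line by the left neighbour, and the first sample of each later line by the sample above; differences are taken modulo 2<sup>16</sup>. The sketch assumes a single component, no restart intervals and point transform Pt = 0, omits the Huffman coding of the residuals, and uses hypothetical function names.

```python
from typing import List

Image = List[List[int]]
MOD = 1 << 16  # prediction differences are computed modulo 2**16


def _prediction(img: Image, y: int, x: int, precision: int) -> int:
    """Predictor for sample (y, x) with selection value 1 (left neighbour, Ra)."""
    if y == 0 and x == 0:
        return 1 << (precision - 1)  # first sample: 2**(P - Pt - 1) with Pt = 0
    if y == 0:
        return img[0][x - 1]         # first line always uses Ra (left)
    if x == 0:
        return img[y - 1][0]         # first sample of a later line uses Rb (above)
    return img[y][x - 1]             # selection value 1: Ra (left)


def encode_residuals(img: Image, precision: int) -> Image:
    """Signed prediction residuals, wrapped into [-32768, 32767]."""
    return [[(img[y][x] - _prediction(img, y, x, precision) + MOD // 2) % MOD - MOD // 2
             for x in range(len(img[0]))] for y in range(len(img))]


def decode_residuals(res: Image, precision: int) -> Image:
    """Rebuild the image; each prediction uses already reconstructed samples."""
    out = [[0] * len(res[0]) for _ in res]
    for y in range(len(res)):
        for x in range(len(res[0])):
            out[y][x] = (_prediction(out, y, x, precision) + res[y][x]) % MOD
    return out


image = [[100, 102, 101], [99, 100, 104]]
residuals = encode_residuals(image, precision=8)
print(residuals)  # → [[-28, 2, -1], [-1, 1, 4]]
```

Because the decoder forms exactly the same predictions from already reconstructed samples, adding the residuals back reproduces the image bit for bit.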

The cores are full, standalone, CPU-less solutions and feature easy-to-use fully controllable and FIFO-like streaming input and output interfaces. Being carefully designed, rigorously verified and silicon-proven, the LJPEG cores offer a reliable and easy to integrate solution.





The LJPEG-E core implements the Lossless JPEG compression in a compact, high-performance, stand-alone package, ideal for applications where bit-by-bit accurate reproduction of an image is essential.

Implementation results show that the core requires just 21K equivalent NAND2 gates in an ASIC (90 nm process, typical conditions) and that it also fits in a variety of low-end and high-end FPGA devices. Its heavily optimized architecture also enables very high throughput, reaching 500 MSamples/s on ASICs and 200 MSamples/s on FPGA devices.





The LJPEG-D core implements a Lossless JPEG decoder in a compact, high-performance, standalone package, ideal for applications where bit-by-bit accurate reproduction of an image is essential.

Implementation results show that the core requires just 36K equivalent NAND2 gates in an ASIC (90 nm process, typical conditions) and that it also fits in a variety of low-end and high-end FPGA devices. Its optimized architecture also enables high throughput, reaching 250 MSamples/s on ASICs and 95 MSamples/s on FPGA devices.



## **JPEG-LS IP Cores**

We offer JPEG-LS coding solutions suitable for applications that need numerically lossless or near-lossless image compression, such as professional video equipment, machine vision, medical and satellite imaging devices. JPEG-LS is an excellent lossless image compression algorithm. It achieves results highly competitive with those of far more complex solutions, such as JPEG 2000. The JPEG-LS cores are compliant with the ISO/IEC 14495-1 standard.

### Features

#### ISO/IEC 14495-1 JPEG-LS Compliance

- Programmable image dimensions from 8 x 8 up to 64K x 64K.
- Grayscale, 4:4:4, 4:2:2, 4:1:1 and 4:2:0 chroma subsampling formats.
- Programmable sample precision (BPP) from 2-bits up to 16-bits.
- Programmable point transform (up to BPP-1).
- Programmable local gradient thresholds (up to 2<sup>BPP</sup>-1) and context parameter reset threshold (up to 127).
- Header error catch-up features.

#### Ease of Integration

- Registered I/O ports.
- Simple, microcontroller-like programming interface.
- High-speed, flow-controllable streaming I/O data interfaces:
  - Simple and FIFO-like.
  - Avalon-ST<sup>™</sup> compliant (ready latency 0).
- Single clock per input sample processing rate.
- Fully programmable through standard JPEG-LS marker segments.
- Automatic JPEG-LS markers generation on the output.
- Automatic program-once encode-many operation.





JPEG-LS was developed to provide a low-complexity lossless image compression specification with better compression potential than Lossless JPEG. The algorithm at the core of JPEG-LS is LOCO-I (Low Complexity Lossless Compression for Images). It combines a non-linear predictive scheme with rudimentary edge-detecting capability, based on the four nearest causal neighbors (left, upper-left, upper and upper-right), with an entropy encoder that uses adaptively selected Golomb-type codes. The low complexity of JPEG-LS rests on the assumption that prediction residuals follow a two-sided geometric distribution, and on the fact that Golomb codes are optimal for geometric distributions, so the modeling and coding stages are well matched.
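The two halves of LOCO-I described above, edge-detecting prediction and adaptive Golomb coding, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions. It omits context modeling, bias correction, run mode and the length-limited code of the full standard, and uses only the basic error mapping. The function names are hypothetical.

```python
def med_predict(a: int, b: int, c: int) -> int:
    """Median edge detector: a = left, b = above, c = above-left."""
    if c >= max(a, b):
        return min(a, b)      # edge detected: take the smaller neighbor
    if c <= min(a, b):
        return max(a, b)      # edge detected: take the larger neighbor
    return a + b - c          # smooth region: planar prediction

def map_error(e: int) -> int:
    """Fold signed errors 0, -1, 1, -2, 2, ... onto 0, 1, 2, 3, 4, ..."""
    return 2 * e if e >= 0 else -2 * e - 1

def choose_k(a_sum: int, n: int) -> int:
    """Smallest k with n * 2**k >= a_sum, where a_sum accumulates the
    absolute errors seen in a context over n samples."""
    k = 0
    while (n << k) < a_sum:
        k += 1
    return k

def golomb_rice(m: int, k: int) -> str:
    """Golomb code with parameter 2**k: the quotient in unary
    (q zeros followed by a 1), then the k low-order bits."""
    q, r = m >> k, m & ((1 << k) - 1)
    return "0" * q + "1" + (format(r, f"0{k}b") if k else "")

if __name__ == "__main__":
    a, b, c, x = 100, 120, 98, 119           # neighbors and actual sample
    err = x - med_predict(a, b, c)           # c <= min(a, b), so prediction is 120
    k = choose_k(a_sum=10, n=4)              # small average error, so small k
    print(err, map_error(err), k, golomb_rice(map_error(err), k))
```

Because each context tracks its own average error magnitude, `choose_k` picks short codes where residuals are small and longer ones where the image is busy. This is the "adaptively selected" part of the coder.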

The cores are full, standalone, CPU-less solutions and feature easy-to-use fully controllable and FIFO-like streaming input and output interfaces. Being carefully designed, rigorously verified and silicon-proven, the JPEG-LS cores offer a reliable and easy to integrate solution.

The JPEGLS-E core is a JPEG-LS encoder that forms a high performance solution for lossless image and video compression applications. The JPEGLS-E can encode at Full HD (1080p30) or higher rates, even in FPGA devices. Full compliance to the ISO/IEC 14495-1 standard makes the JPEG-LS encoder core ideal for any open platform application where interoperability is a critical factor.
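As a rough check of what "Full HD (1080p30) or higher" means in samples, the sketch below computes the active-video sample rate for the supported chroma formats. It assumes the one-clock-per-input-sample processing rate from the feature list and ignores blanking, marker and interface overhead. The result is therefore a lower bound on the required clock frequency in MHz. The function name is hypothetical.

```python
from fractions import Fraction

# Samples per pixel, counting luma plus both chroma planes.
SAMPLES_PER_PIXEL = {
    "grayscale": Fraction(1),
    "4:2:0": Fraction(3, 2),
    "4:1:1": Fraction(3, 2),
    "4:2:2": Fraction(2),
    "4:4:4": Fraction(3),
}

def msamples_per_second(width: int, height: int, fps: int, fmt: str) -> float:
    """Active-video sample rate in MSamples/s (exact arithmetic, then float)."""
    return float(width * height * fps * SAMPLES_PER_PIXEL[fmt] / 10**6)

if __name__ == "__main__":
    for fmt in SAMPLES_PER_PIXEL:
        print(f"1080p30 {fmt:>9}: {msamples_per_second(1920, 1080, 30, fmt):8.3f} MSamples/s")
```

For 1080p30 this gives 62.208 (grayscale), 93.312 (4:2:0 or 4:1:1), 124.416 (4:2:2) and 186.624 (4:4:4) MSamples/s. At one sample per clock, these are also the minimum clock frequencies in MHz.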

The JPEGLS-E core is available in two configurations: the numerically lossless-only configuration offers the highest throughput, while the combined numerically lossless and near-lossless configuration offers maximum application flexibility.
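The difference between the two configurations comes down to the near-lossless error quantization of ISO/IEC 14495-1. With a tolerance NEAR, each prediction error is quantized with a step of 2·NEAR+1, so every reconstructed sample lies within ±NEAR of the original, and NEAR = 0 reduces to the lossless case. The Python sketch below shows only this quantizer. In a real encoder, the reconstructed values also feed back into prediction and context modeling, with clamping and modular reduction, all of which is omitted here. The function names are hypothetical.

```python
def quantize(err: int, near: int) -> int:
    """Near-lossless quantization of a prediction error (NEAR = near)."""
    if err > 0:
        return (near + err) // (2 * near + 1)
    return -((near - err) // (2 * near + 1))

def dequantize(q: int, near: int) -> int:
    """Error value the decoder adds back to the prediction."""
    return q * (2 * near + 1)

if __name__ == "__main__":
    for near in (0, 1, 3):
        worst = max(abs(e - dequantize(quantize(e, near), near)) for e in range(-255, 256))
        print(f"NEAR={near}: worst-case reconstruction error {worst}")
```

Larger NEAR values shrink the quantized residuals, which then code into shorter Golomb words. This is where the extra compression of near-lossless mode comes from.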



## JPEG 2000 IP Cores

Alma Technologies was one of the first companies worldwide to offer a hardware JPEG 2000 IP core solution, back in 2002. Today, the latest generation of our JPEG 2000 encoder is one of the most capable and flexible JPEG 2000 solutions available. Several objectives were set and achieved in the core's design: high performance, scalability, robustness and ease of integration.

### Features

#### ISO/IEC 15444-1 JPEG 2000 Image Coding System compliance

- Grayscale and three or four component color images.
- 4:4:4, 4:2:2, 4:1:1 and 4:2:0 chroma subsampling formats.
- Up to 16-bits per component sample precision.
- Up to 65535 x 65535 image resolution.
- Up to 8192 x 8192 tile resolution.
- Lossless or lossy compression.
- Advanced rate control engine.
- Single or multiple quality layers encoding.
- CPRL progression order.
- LRCP progression order (grayscale only).
- Error resilient encoding features.
- Standard compliant code stream (JPC) or file (JP2) output.

#### Programmable JPEG 2000 Encoding Options

- Image and pixel input format (frame and tile size, number of components, pixel depth, subsampling format and scan order).
- Wavelet filter type (5/3 or 9/7) and number of wavelet transform levels.
- Code-block size (64 or 32 or 16 on each dimension).
- Quantization tables.
- Entropy coding switches (reset, restart, segmark).
- Number of quality layers (grayscale only, up to 30).
- Output bitrate (per quality layer).
- Progression order and JPEG 2000 output format (proprietary or JPC or JP2).
- Operation on an entire-image, tile-component or tile basis.
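Lossless coding in JPEG 2000 relies on the reversible 5/3 wavelet filter, which ISO/IEC 15444-1 defines with integer lifting steps. The 9/7 filter is irreversible, so it supports lossy coding only. Below is a minimal one-dimensional Python sketch of the 5/3 lifting steps with whole-sample symmetric extension. A 2-D transform applies it to rows and then columns at each decomposition level. Output coefficients stay interleaved (even indices low-pass, odd indices high-pass). The function names are hypothetical, and this is not the core's implementation.

```python
def _mirror(i: int, n: int) -> int:
    """Whole-sample symmetric extension of index i into [0, n)."""
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return i

def fwd_53(x):
    """Forward reversible 5/3 lifting on an integer list (length >= 2)."""
    n = len(x)
    if n < 2:
        raise ValueError("this sketch handles signals of length >= 2 only")
    y = list(x)
    # Predict step: odd samples become high-pass coefficients.
    for i in range(1, n, 2):
        y[i] = x[i] - (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) // 2
    # Update step: even samples become low-pass coefficients.
    for i in range(0, n, 2):
        y[i] = x[i] + (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    return y

def inv_53(y):
    """Inverse lifting: undo the update step, then the predict step."""
    n = len(y)
    x = list(y)
    for i in range(0, n, 2):
        x[i] = y[i] - (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    for i in range(1, n, 2):
        x[i] = y[i] + (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) // 2
    return x

if __name__ == "__main__":
    row = [12, 15, 15, 14, 200, 201, 199, 50, 52]
    coeffs = fwd_53(row)
    print(coeffs)
    assert inv_53(coeffs) == row
```

Each lifting step adds a floored function of the other half of the samples, so the inverse subtracts exactly the same integers. The transform is therefore exactly reversible despite the rounding.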

#### Ease of Integration

- Registered I/O ports.
- Simple, microcontroller-like programming interface.
- High-speed, flow-controllable streaming I/O data interfaces:
  - Simple and FIFO-like.
  - Full data flow control without an extra clock cycle penalty.
  - Avalon-ST<sup>™</sup> compliant (ready latency 0).
- Single-port external memory interface compatible with most (multi-port) memory controllers.
- Tunable design, enabling silicon area, frequency and throughput trade-offs.
- Flexible external memory interface:
  - Independent of external memory type.
  - Glue-less connection to external memory controllers.
  - Tolerant to latencies.
  - Allows shared memory access.
  - Can optionally operate in an independent clock domain.

Full compliance with the ISO/IEC 15444-1 JPEG 2000 standard makes the JPEG2K-E core ideal for interoperable systems and devices. The JPEG2K-E supports up to 8K x 8K image or tile resolution and can sustain a very high throughput, above 1080p30. The JPEG2K-E also includes an advanced post-compression, rate-distortion optimized rate control engine, which provides full control over the bandwidth of the JPEG 2000 output stream. The bitrate of the JPEG 2000 stream can be accurately adjusted while preserving the maximum image fidelity possible within the available bandwidth.
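The catalog does not describe the rate control algorithm itself. For orientation, here is a sketch of the classic post-compression rate-distortion (PCRD) optimization from the EBCOT literature. Each code-block offers candidate truncation points, and a Lagrange multiplier λ trades remaining distortion against bytes. λ is then found by bisection so that the total size fits the budget. All names and example numbers are hypothetical, and this is not the JPEG2K-E engine.

```python
from typing import List, Tuple

# One code-block: candidate truncation points as (cumulative_bytes, remaining_distortion),
# starting with (0, initial_distortion) for "include nothing".
CodeBlock = List[Tuple[int, float]]

EXAMPLE_BLOCKS: List[CodeBlock] = [
    [(0, 100.0), (10, 40.0), (20, 10.0)],
    [(0, 50.0), (5, 30.0), (15, 5.0)],
]

def best_point(block: CodeBlock, lam: float) -> int:
    """Index of the truncation point minimizing the Lagrangian cost D + lam * R."""
    return min(range(len(block)), key=lambda i: block[i][1] + lam * block[i][0])

def total_bytes(blocks: List[CodeBlock], lam: float) -> int:
    return sum(b[best_point(b, lam)][0] for b in blocks)

def pcrd_allocate(blocks: List[CodeBlock], budget: int, iters: int = 50) -> List[int]:
    """Pick one truncation point per code-block so that the total size fits the budget."""
    lo, hi = 0.0, 1.0
    # A larger lambda penalizes rate more; grow it until the budget is met.
    while total_bytes(blocks, hi) > budget:
        lo, hi = hi, hi * 2
        if hi > 1e12:
            raise ValueError("budget too small for the given truncation points")
    # Bisection: the smallest lambda that still fits gives the lowest distortion.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_bytes(blocks, mid) > budget:
            lo = mid
        else:
            hi = mid
    return [best_point(b, hi) for b in blocks]

if __name__ == "__main__":
    for budget in (0, 25, 1000):
        print(budget, pcrd_allocate(EXAMPLE_BLOCKS, budget))
```

With `EXAMPLE_BLOCKS` and a 25-byte budget, the sketch selects points 2 and 1 (distortion 10 + 30 = 40) rather than the equally sized alternative 1 and 2 (40 + 5 = 45). Spending bytes where they reduce distortion the most is the idea behind "rate-distortion optimized".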

The JPEG2K-E core implements a simple yet flexible, request-based external memory interface with independent read and write data paths. This makes the JPEG2K-E independent of the memory type, supporting, for example, operation with SRAM, SDRAM, DDR, DDR2 and DDR3 memories. External memory controllers that connect glue-lessly to the core are also available. The JPEG2K-E is designed to tolerate memory delays and latencies, such as those present in shared-memory system architectures.

The core is designed with easy to use, fully controllable and FIFO-like, streaming input and output interfaces. Being carefully designed, rigorously verified and silicon-proven, the JPEG2K-E is a reliable and easy to integrate core.



The JPEG2K-E core is a video and still image encoder that implements the JPEG 2000 lossy and lossless image compression standard. The JPEG 2000 standard offers an advanced quality and feature set, lending itself to a wide range of uses from digital cameras through to space imaging and other key sectors.





Alma Technologies offers a series of cryptographic functions suitable for encrypting or decrypting a data stream or message, as well as authenticating the encrypted or decrypted data. The series includes highly efficient implementations of the Advanced Encryption Standard (AES), as well as hash functions such as SHA-256, SHA-1 and MD5.

The AES implementation is highly flexible and supports the most commonly used Block Cipher modes of operation (ECB, CBC, CFB, OFB and CTR), as well as the GCM Authenticated Encryption / Decryption mode.



The product family of our AES IP cores includes implementations of the most commonly used Block Cipher modes of operation in two different bitwidths which affect the footprint and the throughput of the solution.

The following selection matrix illustrates what is featured in each IP core variation and can be used as a guide to select the best suited product for a given application.

### **AES Products Selection Matrix**

*[Selection matrix: compares the AES products by supported Block Cipher modes, internal data path width (32 or 128 bits) and clock cycles per 128-bit block (44/52/60, 11/13/15 or 12/14/16, depending on product and cipher-key size).]*





The AES-C core family implements the FIPS-197 Advanced Encryption Standard, and can be programmed to either encrypt or decrypt 128-bit blocks of data, with a 128-bit, 192-bit or 256-bit cipher-key.

The AES-C core is available in two variations: the standard AES32-C and the fast AES128-C. The AES32-C has a 32-bit internal data path, while the AES128-C has a 128-bit internal data path. The AES32-C is more compact than the AES128-C but offers lower throughput. It requires 44/52/60 clock cycles to encrypt or decrypt an input block with a 128/192/256-bit cipher key respectively, while the AES128-C requires 11/13/15 clock cycles.
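The cycle counts translate into raw throughput as 128 bits × f<sub>clk</sub> / cycles per block. The Python sketch below only restates the published cycle counts as bit rates. It assumes a hypothetical clock frequency and ignores I/O, key loading and mode-wrapper overhead. The names are hypothetical.

```python
# Clock cycles per 128-bit block for 128/192/256-bit cipher keys (from the text above).
CYCLES = {"AES32-C": (44, 52, 60), "AES128-C": (11, 13, 15)}
KEY_INDEX = {128: 0, 192: 1, 256: 2}

def throughput_mbps(core: str, key_bits: int, f_mhz: float) -> float:
    """Raw throughput in Mbit/s at a given (hypothetical) clock frequency in MHz."""
    return 128 * f_mhz / CYCLES[core][KEY_INDEX[key_bits]]

if __name__ == "__main__":
    for core in CYCLES:
        for key_bits in (128, 192, 256):
            print(f"{core} AES-{key_bits} @ 100 MHz: {throughput_mbps(core, key_bits, 100):7.1f} Mbit/s")
```

At a hypothetical 100 MHz clock, this gives about 1163.6 Mbit/s for the AES128-C and about 290.9 Mbit/s for the AES32-C with a 128-bit key. This is the 4:1 ratio expected from the data path widths.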

During each step of the encryption / decryption processing, the core requires a previously calculated Round Key Value, derived from the cipher-key using a key expansion algorithm. The Round Key Values must be stored in the internal Round Key Table, from which the core fetches the value required for each processing step. Alternatively, instead of programming the Round Key Values directly into the Round Key Table, an optional Key Expander module can be used: the cipher-key is given to the core, and the Key Expander module automatically calculates the Round Key Values and fills the internal Round Key Table.
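The key expansion that fills the Round Key Table is defined in FIPS-197. The compact Python sketch below derives the S-box from its definition (multiplicative inverse in GF(2<sup>8</sup>) followed by the affine transform) instead of a lookup table. It returns Nr + 1 = 11, 13 or 15 round keys of 128 bits for 128-, 192- or 256-bit cipher keys. The names are hypothetical. It is a software reference for the algorithm, not a model of the Key Expander module.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def _build_sbox() -> list:
    sbox = []
    for i in range(256):
        inv = 0
        if i:
            inv = 1
            for _ in range(254):          # i^254 == i^-1 in GF(2^8)
                inv = gf_mul(inv, i)
        # Affine transform from FIPS-197.
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = _build_sbox()

def expand_key(key: bytes) -> list:
    """FIPS-197 KeyExpansion: returns Nr + 1 round keys of 16 bytes each."""
    if len(key) not in (16, 24, 32):
        raise ValueError("AES keys are 128, 192 or 256 bits")
    nk = len(key) // 4
    nr = nk + 6
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]            # RotWord
            temp = [SBOX[b] for b in temp]        # SubWord
            temp[0] ^= rcon                       # Rcon: 01, 02, 04, ..., 1B, 36
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]        # extra SubWord for 256-bit keys
        w.append([x ^ y for x, y in zip(w[i - nk], temp)])
    return [bytes(sum(w[4 * r:4 * r + 4], [])) for r in range(nr + 1)]

if __name__ == "__main__":
    keys = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
    for r, rk in enumerate(keys):
        print(f"round {r:2d}: {rk.hex()}")
```

The printed table for the FIPS-197 example key is what a Round Key Table would hold when programmed directly, one 128-bit value per round.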

The AES-C cores are surrounded by an included, configurable wrapper that implements a Block Cipher mode of operation. The supported Block Cipher modes are ECB, CBC, CFB, OFB and CTR.
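What such a mode wrapper does around the block cipher can be sketched in Python for two of the listed modes, CBC and CTR, parameterized by any 128-bit block cipher function. Python's standard library has no AES, so the sketch uses a deliberately trivial, insecure toy permutation (`toy_enc` / `toy_dec`, hypothetical) as a stand-in. CBC padding is omitted, and the CTR counter is assumed to be a full 128-bit big-endian value, one simple choice permitted by NIST SP 800-38A.

```python
from typing import Callable

BlockFn = Callable[[bytes], bytes]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(enc: BlockFn, iv: bytes, pt: bytes) -> bytes:
    if len(pt) % 16:
        raise ValueError("CBC needs whole 128-bit blocks (no padding in this sketch)")
    out, prev = bytearray(), iv
    for i in range(0, len(pt), 16):
        prev = enc(xor(pt[i:i + 16], prev))   # each ciphertext block chains into the next
        out += prev
    return bytes(out)

def cbc_decrypt(dec: BlockFn, iv: bytes, ct: bytes) -> bytes:
    out, prev = bytearray(), iv
    for i in range(0, len(ct), 16):
        block = ct[i:i + 16]
        out += xor(dec(block), prev)
        prev = block
    return bytes(out)

def ctr_crypt(enc: BlockFn, counter0: int, data: bytes) -> bytes:
    """Encrypts and decrypts: XOR with E(counter), counter incremented per block."""
    out = bytearray()
    for i in range(0, len(data), 16):
        ctr = (counter0 + i // 16) % (1 << 128)
        keystream = enc(ctr.to_bytes(16, "big"))
        out += xor(data[i:i + 16], keystream)  # zip() truncates for a short last block
    return bytes(out)

# Toy, INSECURE stand-in for AES so that the sketch runs with the standard library only.
TOY_KEY = bytes(range(16))

def toy_enc(b: bytes) -> bytes:
    r = xor(b, TOY_KEY)
    return r[1:] + r[:1]        # rotate left by one byte

def toy_dec(b: bytes) -> bytes:
    r = b[-1:] + b[:-1]         # rotate right by one byte
    return xor(r, TOY_KEY)

if __name__ == "__main__":
    msg = b"sixteen byte msg" * 2
    ct = cbc_encrypt(toy_enc, bytes(16), msg)
    assert cbc_decrypt(toy_dec, bytes(16), ct) == msg
    data = b"counter mode, odd length!"
    assert ctr_crypt(toy_enc, 7, ctr_crypt(toy_enc, 7, data)) == data
    print(ct.hex())
```

CTR, like CFB and OFB, only uses the forward (encryption) direction of the block cipher, so the same function both encrypts and decrypts. CBC decryption needs the inverse cipher.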



## AES32-P, AES128-P

AES Encryption / Decryption, Programmable Block Cipher Mode



The AES-P core family implements the FIPS-197 Advanced Encryption Standard and can be programmed to either encrypt or decrypt 128-bit blocks of data, with a 128-bit, 192-bit or 256-bit cipher-key. The Block Cipher mode of operation can also be programmed at run-time to one of ECB, CBC, CFB, OFB or CTR.

The AES-P core is available in two variations: the standard AES32-P and the fast AES128-P. The AES32-P has a 32-bit internal data path, while the AES128-P has a 128-bit internal data path.

During each step of the encryption / decryption processing, the core requires a previously calculated Round Key Value, derived from the cipher-key using a key expansion algorithm. The Round Key Values must be stored in the internal Round Key Table, from which the core fetches the value required for each processing step. Alternatively, instead of programming the Round Key Values directly into the Round Key Table, an optional Key Expander module can be used: the cipher-key is given to the core, and the Key Expander module automatically calculates the Round Key Values and fills the internal Round Key Table.

The AES-P cores include run-time programmable Block-Cipher mode selection between ECB, CBC, CFB, OFB and CTR modes.

## **AES-GCM128** GCM-AES, Authenticated Encryption / Decryption



The AES-GCM128 core implements the GCM-AES authenticated encryption / decryption function, as specified in NIST's SP 800-38D recommendation for GCM and GMAC and in the FIPS-197 Advanced Encryption Standard. The core can be programmed to either encrypt or decrypt 128-bit blocks of data, with a 128-bit, 192-bit or 256-bit cipher key. In addition, a hash value (the Tag) is calculated with the GHASH algorithm over the ciphertext and the additional authenticated (plaintext) data. In decryption mode, the calculated Tag is compared with the Tag that accompanies the ciphertext data, and a Pass or Fail flag is generated.

The AES-GCM128 core has a 128-bit data path, so one clock cycle is required to load or unload each 128-bit plaintext / ciphertext block. The core includes an AES Key Expander that automatically generates the AES Round Key Values. Since the data path is 128 bits wide and all internal operations are performed on 128-bit words, 12/14/16 clock cycles per 128-bit block are required, hash calculation included. The AES-GCM128 core supports 96-bit Initialization Vectors and input / output Tags of configurable length.
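The Tag calculation can be sketched in Python following the GHASH definition of NIST SP 800-38D. The zero-padded additional data and ciphertext blocks, followed by a block holding both bit lengths, are folded together by multiplication with the hash subkey H = E(K, 0<sup>128</sup>) in GF(2<sup>128</sup>). This is a bit-serial reference sketch, not a model of the core's 128-bit data path. AES itself is not included, so H must be supplied. The function names are hypothetical. The final Tag is the GHASH output XORed with E(K, J0), truncated to the configured Tag length.

```python
import hmac

R = 0xE1 << 120  # reduction constant for x^128 + x^7 + x^2 + x + 1 in GCM's bit order

def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) (SP 800-38D, Algorithm 1)."""
    z, v = 0, y
    for i in range(127, -1, -1):          # bits of x, leftmost (x_0) first
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def _blocks(data: bytes):
    for i in range(0, len(data), 16):
        yield int.from_bytes(data[i:i + 16].ljust(16, b"\0"), "big")

def ghash(h: bytes, aad: bytes, ct: bytes) -> bytes:
    """GHASH over zero-padded AAD, zero-padded ciphertext and the length block."""
    hkey, x = int.from_bytes(h, "big"), 0
    for blk in list(_blocks(aad)) + list(_blocks(ct)):
        x = gf128_mul(x ^ blk, hkey)
    lengths = ((8 * len(aad)) << 64) | (8 * len(ct))   # [len(A)]64 || [len(C)]64 in bits
    x = gf128_mul(x ^ lengths, hkey)
    return x.to_bytes(16, "big")

def tag_matches(expected: bytes, computed: bytes) -> bool:
    """Constant-time comparison, standing in for the core's Pass / Fail flag."""
    return hmac.compare_digest(expected, computed)

if __name__ == "__main__":
    h = bytes.fromhex("66e94bd4ef8a2c3b884cfa59ca342b2e")    # E(0^128, 0^128)
    ct = bytes.fromhex("0388dace60b6a392f328c2b971b2fe78")
    print(ghash(h, b"", ct).hex())
```

The check against the published test vector for an all-zero key, IV and plaintext block is included with the sketch. Hardware implementations typically replace the bit-serial loop with a parallel multiplier.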



Expanding into the area of memory controllers, Alma Technologies currently offers a feature-rich SPI Flash Memory Controller that provides reliable communication up to the very high bit rates supported by the most recent devices.

The SPI Flash Memory Controller is a mature design, proven in multiple ASICs, and is highly configurable so that it can support future devices as well.



## **SPI Flash Memory Controller**

The SPI Memory Controller provides the necessary functionality to a host application in order to communicate with a serial SPI Flash memory device. The controller supports three types of memory accesses: read, write and erase, as well as custom instructions programmable during run-time.

### **Features**

#### Device Independent

- Automatic identification of a variety of memories.
- Configurable memory features to allow support for more serial flash devices.
- Programmable custom SPI instructions provide support for new SPI Flash device features.

#### High Performance

- Supports single SPI, dual output SPI, dual input / output SPI, quad input / output SPI.
- Separate Clock domains for SPI subsystem and host data buses.
- Programmable number of dummy cycles on Quad mode Read operations.
- Programmable capture delay for MISO inputs permits large SPI bus round-trip delays.
- Configurable internal FIFOs and programmable thresholds enable full bandwidth utilization.
- Automatic identification of maximum bandwidth access mode among single SPI, dual output SPI, dual input / output SPI, quad input / output SPI.
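Why the access mode matters can be estimated by counting SPI clock cycles per read access. The Python sketch below assumes typical read command formats: an 8-bit instruction on one line, as in common Fast Read (0x0B), Dual Output (0x3B), Dual I/O (0xBB) and Quad I/O (0xEB) commands, with any mode bits folded into the dummy cycles. Real devices differ in these details. The names are hypothetical, and `addr_bits` can be set to 32 for 4-byte addressing.

```python
from math import ceil

# (address lines, data lines) per read access mode; instruction always on one line.
READ_MODES = {
    "single": (1, 1),
    "dual_output": (1, 2),
    "dual_io": (2, 2),
    "quad_io": (4, 4),
}

def read_clocks(nbytes: int, mode: str, dummy_cycles: int, addr_bits: int = 24) -> int:
    """SPI clock cycles for one read access: instruction + address + dummy + data."""
    addr_lines, data_lines = READ_MODES[mode]
    return (8                                  # instruction byte on a single line
            + ceil(addr_bits / addr_lines)     # address phase
            + dummy_cycles                     # programmable dummy cycles
            + ceil(8 * nbytes / data_lines))   # data phase

def read_time_us(nbytes: int, mode: str, dummy_cycles: int, sck_mhz: float,
                 addr_bits: int = 24) -> float:
    return read_clocks(nbytes, mode, dummy_cycles, addr_bits) / sck_mhz

if __name__ == "__main__":
    for mode, dummy in (("single", 8), ("dual_output", 8), ("dual_io", 4), ("quad_io", 6)):
        print(f"{mode:12s}: {read_clocks(4096, mode, dummy):6d} clocks, "
              f"{read_time_us(4096, mode, dummy, 50):7.2f} us at 50 MHz")
```

For a 4 KB read, the data phase dominates: 32808 clocks in single mode versus 8212 in quad I/O mode (with 6 dummy cycles), roughly a fourfold gain. This is why selecting the widest mode a device supports pays off.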

#### Flexible Access Model

- Register-mapped I/O.
- Read access sizes from 1 byte up to the memory density.
- Read accesses starting from any address offset.
- Write access sizes from 4 bytes up to the memory density.
- Write accesses starting from any address offset that is a multiple of 4.
- Erasure of:
  - any sector (4KB),
  - any block (64KB),
  - the whole chip.

#### Ease of Integration

- Auto-detection of a wide set of serial flash devices to minimize programming overhead.
- Auto-detection of the fastest way to read or program the memory, to maximize bandwidth and minimize programming overhead.
- Deep Power-down Mode support to minimize power consumption.
- Optional Execute-on-the-Fly™ interface for better CPU connectivity.
- Optional Block Read interface that can transfer a block of data with minimal programming. Block Read can be initiated automatically after Power-On, or on demand during run-time.

A host can interface to the Serial Flash in a number of ways. Data can be transferred from the Flash memory to the host with minimal effort through the Block Read interface, which uses a DMA mechanism to move a block of data into the host's memory space. Alternatively, the host can issue a read request through the core's programmable registers, and the core serves the request by sending the necessary instructions to the Serial Flash device. An additional RAM-like interface that permits on-the-fly code execution is also available.

The design uses two clock domains. The first is used for the host interface of the core, while the second is used to generate the SPI clock and must run at least twice as fast as the target SPI clock.

The SPI-MEM-CTRL core features configurable size read and write FIFOs and an interface to an external configuration memory. The external configuration memory is used to store parameters specific to the devices to be auto-detected by the core.



The SPI-MEM-CTRL core is a highly flexible Quad SPI mode controller that can be configured before synthesis or programmed at run-time to support a large number of SPI Serial Flash memories, including less common ones. Support for newer serial flash devices with densities larger than 128 Mbit is also included.





### Alma Technologies S.A.

Leoforos Marathonos 2, 190 09 Pikermi, Greece

T: +30 210 6039 850 F: +30 210 6036 034

info@alma-tech.com www.alma-tech.com