As an Amazon Associate I earn from qualifying purchases.

CUDA GPU Concurrent (parallel) Programming

C, C++, and Python 3 code running asynchronously using

  • threads
  • queues
  • other concurrent programming techniques
Relevance
  • CUDA
  • OpenCL
  • Metal
  • OpenACC
  • PyCUDA
  • JCuda
Hardware:
  • AMD
  • Apple
  • FPGA
  • multi-core CPUs

Pitfalls of Concurrent Programming

  • race conditions
    • the expected order of thread operations is not followed
  • resource contention
    • two or more threads attempt to modify the same memory
  • deadlock
    • two or more processes block forever, each waiting for a resource that another one holds
  • livelock
    • two or more processes keep running in a loop, reacting to each other, but none can finish while waiting for resources
  • resource over-utilization
    • too few or too many threads, context switching
    • memory required is too large
    • memory changes too often
  • resource under-utilization
    • sitting idle
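
The first two pitfalls above can be illustrated with a minimal Python 3 sketch. It assumes a shared counter incremented by several threads; the function name `count_with_lock` and its parameters are hypothetical, chosen only for this example. The lock turns the read-modify-write into a critical section, so no update is lost.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The read-modify-write below is the critical section.
            # Without the lock, updates from different threads could be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

Removing the `with lock:` line is a simple way to experiment with the race condition, although whether lost updates actually show up depends on the interpreter and on timing.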


A semaphore is, for all intents and purposes, an atomic counter shared by several threads: a predefined number of threads can acquire it at the same time to enter a critical section of code.
A lock is the more restrictive synchronization mechanism: it lets only a single thread at a time enter a critical section of code. Thus a semaphore is a more relaxed form of lock.
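
A minimal sketch of that difference, using Python's `threading.Semaphore`. The helper `max_concurrent` and its parameters are hypothetical names for this example: it starts several threads and reports the highest number that were inside the critical section at the same moment, which should never exceed the semaphore's limit.

```python
import threading
import time


def max_concurrent(limit=2, num_threads=6):
    """Return the peak number of threads observed inside the critical section."""
    semaphore = threading.Semaphore(limit)  # at most `limit` threads may enter
    state_lock = threading.Lock()           # protects the bookkeeping counters
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with semaphore:
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # simulated work inside the critical section
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print(max_concurrent(limit=2))
```

With `limit=1` the semaphore behaves like a plain lock: only one thread is ever inside.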


Concurrent Programming Algorithms

  • Dining Philosophers
    • multiple processes require the same resources to complete their jobs
  • Producer-Consumer
    • consumers need to read the data
      • in order
      • no duplication
    • producers add data in the order in which it needs to be processed
  • Sleeping Barber
    • customers are waiting
    • single barber
    • if the barber is sleeping, a customer should wake him
    • if there is no space in the queue, customers are not added
  • Data and Code Synchronization
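
The Producer-Consumer pattern from the list above can be sketched with Python's thread-safe `queue.Queue`. This is one possible arrangement, assuming a single producer and a single consumer; `run_pipeline` and `SENTINEL` are hypothetical names for this example. The bounded queue makes the producer wait when the consumer falls behind, and the FIFO order gives in-order delivery without duplication.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def run_pipeline(items, maxsize=2):
    """Pass items from a producer thread to a consumer thread, in order."""
    q = queue.Queue(maxsize=maxsize)
    consumed = []

    def producer():
        for item in items:
            q.put(item)      # blocks while the queue is full
        q.put(SENTINEL)

    def consumer():
        while True:
            item = q.get()   # blocks while the queue is empty
            if item is SENTINEL:
                break
            consumed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c", "d"]))
```

For the Sleeping Barber rule that customers are turned away when the waiting room is full, `q.put_nowait(item)` raises `queue.Full` instead of blocking.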



FPGA

FPGA = Field-Programmable Gate Array

An FPGA is not a DSP processor, but it adds flexibility

ASIC = Application Specific Integrated Circuit


An FPGA is used for designing ASIC microchips (e.g., DSP, CPU, TPU)


Please see the video: https://www.youtube.com/watch?v=EVy4KEj9kZg&

De Morgan's laws (named after Augustus De Morgan) state that NOT (A AND B) equals (NOT A) OR (NOT B), and NOT (A OR B) equals (NOT A) AND (NOT B).

All digital logic can be built from:

  • AND gate
  • OR gate 
  • NOT inverter
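
As a quick check, the three gates can be modeled as small Python functions and De Morgan's laws verified over the full truth table. The function names below are hypothetical, chosen for this sketch.

```python
from itertools import product


def and_gate(a, b):
    return a and b


def or_gate(a, b):
    return a or b


def not_gate(a):
    return not a


def de_morgan_holds():
    """Check both De Morgan laws for every combination of two inputs."""
    for a, b in product([False, True], repeat=2):
        # NOT (A AND B) == (NOT A) OR (NOT B)
        if not_gate(and_gate(a, b)) != or_gate(not_gate(a), not_gate(b)):
            return False
        # NOT (A OR B) == (NOT A) AND (NOT B)
        if not_gate(or_gate(a, b)) != and_gate(not_gate(a), not_gate(b)):
            return False
    return True


if __name__ == "__main__":
    print(de_morgan_holds())
```

One practical consequence is that an OR gate can be replaced by an AND gate with inverters on its inputs and output, and vice versa.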




Image from Intel Technology: Architecture All Access: Modern FPGA Architecture


What is a clock?


A clock (a square wave) is a signal that rises (ON) and falls (OFF) at a set frequency.
The time between rising edges is called the period, and it is constant.
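
A minimal sketch of such a signal as a list of 0/1 samples. It assumes an even period measured in samples and a 50% duty cycle; `clock_samples` and `rising_edges` are hypothetical helper names.

```python
def clock_samples(period, num_samples):
    """Return 0/1 samples of a square wave, high for the first half of each period.

    Assumes `period` is an even number of samples (50% duty cycle).
    """
    half = period // 2
    return [1 if (t % period) < half else 0 for t in range(num_samples)]


def rising_edges(samples):
    """Return the sample indices where the signal goes from 0 to 1."""
    return [t for t in range(1, len(samples))
            if samples[t - 1] == 0 and samples[t] == 1]


if __name__ == "__main__":
    clk = clock_samples(period=4, num_samples=12)
    print(clk)
    print(rising_edges(clk))
```

The distance between consecutive rising edges is the period, which stays constant.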



What is a Flip Flop?

A D flip-flop (DFF) is a storage element that can be built from the aforementioned gates.
The data input IN is captured only when the clock signal rises, and it is stored as output OUT.
The value in OUT is not replaced until the clock rises again.
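
This behavior can be sketched as a small behavioral model in Python. It is a simulation of the input/output behavior only, not a gate-level design; the class name `DFlipFlop` and the method `tick` are hypothetical.

```python
class DFlipFlop:
    """Behavioral model of a rising-edge-triggered D flip-flop."""

    def __init__(self):
        self.out = 0          # stored value (OUT)
        self._prev_clk = 0    # clock level seen on the previous call

    def tick(self, clk, data_in):
        """Feed one sample of the clock and data inputs; return OUT."""
        if self._prev_clk == 0 and clk == 1:
            # Rising edge: capture IN.
            self.out = data_in
        # On any other clock state, OUT keeps its old value.
        self._prev_clk = clk
        return self.out


if __name__ == "__main__":
    dff = DFlipFlop()
    clk_signal = [0, 1, 1, 0, 1, 0]
    data_signal = [1, 1, 0, 0, 0, 1]
    print([dff.tick(c, d) for c, d in zip(clk_signal, data_signal)])
```

Changes on the data input between rising edges are ignored, which is what makes the flip-flop usable as one bit of storage.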

What is LUT?


A LUT is a Lookup Table, built with Flip Flops, that allows logic to be programmed.
We create it by storing the outputs of a logic function, for some number of input variables, in a fixed number of memory locations, which we call the LUT mask.
Multiplexers then select the stored output for the current inputs, just as in a Truth Table.  See image below:
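
The idea can be sketched in Python: the LUT mask is a list holding one output bit per input combination, and a lookup uses the inputs as an index, which is the role the multiplexers play in hardware. The helper names `make_lut` and `lut_lookup` are hypothetical.

```python
def make_lut(func, num_inputs):
    """Populate a LUT mask: one stored output bit per input combination."""
    mask = []
    for index in range(2 ** num_inputs):
        # Bit i of the index is the value of input i.
        bits = [(index >> i) & 1 for i in range(num_inputs)]
        mask.append(func(*bits) & 1)
    return mask


def lut_lookup(mask, *inputs):
    """Select the stored bit addressed by the inputs (the multiplexer step)."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return mask[index]


if __name__ == "__main__":
    xor_mask = make_lut(lambda a, b: a ^ b, 2)
    print(xor_mask)
    print(lut_lookup(xor_mask, 1, 0))
```

Reprogramming the logic means writing a different mask; the lookup circuitry stays the same.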





Images from Intel Technology: Architecture All Access: Modern FPGA Architecture t=342s


What is ALM?


An ALM is an Adaptive Logic Module (Intel's term; the comparable Xilinx unit is the Configurable Logic Block),
which is composed of an adaptive LUT, full adders, and DFFs.
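
To make the full adder concrete, here is a minimal bit-level sketch in Python. It models the arithmetic only and is not a description of any vendor's ALM; `full_adder` and `ripple_add` are hypothetical names.

```python
def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(x, y, width=4):
    """Add two `width`-bit numbers by chaining full adders bit by bit."""
    carry = 0
    result = 0
    for i in range(width):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry


if __name__ == "__main__":
    print(ripple_add(5, 6))
```

Chaining the carry from one bit position to the next is why dedicated adders sit next to the LUTs: the carry path is faster than routing it through general logic.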


What is Programmable Routing & Interconnect?

This is programmable one-way wiring between logic blocks.
You buy an FPGA for the logic, but you pay for the routing.



Which programming languages do we use for FPGA?


  • Data Parallel C++ using oneAPI
  • VHDL
  • Verilog

How to Begin a Simple FPGA Design






Topics to describe in the future


  • Oscillator
  • Integrator (scaled accumulator)
  • low-pass filters
  • LSB
  • comb filters
    • highly modified FIR filter
    • self-contained and can work in isolation
  • CIC filter
  • IIR Filter
  • Elliptic filter
  • Chebyshev Type II filter
  • Bessel filter
  • Butterworth filter
  • Parallel BiQuad IIR filter
  • TDM Time Division Multiplexing
  • MAC (multiply-accumulate) filters
  • transpose filters
  • Fast Fourier Transform (FFT)
    • Jean-Baptiste Joseph Fourier, 1807
    • any periodic signal can be made by adding together a series of pure tones
      • square(x) ≈ sin(x) + sin(3x)/3 + sin(5x)/5 + sin(7x)/7 + ... (up to a constant scale factor)
  • ping-pong buffers
    • write to one while processing another one
    • uses twice as much memory
  • Windowing
  • FFT, recall DFT
  • Single Cycle Butterfly
  • RADIX-2 or RADIX-4 butterfly
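
Fourier's observation about the square wave in the list above can be checked numerically. The sketch below sums the odd sine harmonics and scales by 4/π so that the result approaches a square wave of amplitude 1; the function name `square_wave_partial_sum` is hypothetical.

```python
import math


def square_wave_partial_sum(x, num_terms):
    """Approximate a unit square wave at x with `num_terms` odd harmonics."""
    total = 0.0
    for k in range(num_terms):
        n = 2 * k + 1                  # odd harmonics: 1, 3, 5, 7, ...
        total += math.sin(n * x) / n   # each harmonic is weighted by 1/n
    return 4.0 / math.pi * total


if __name__ == "__main__":
    for terms in (1, 5, 50, 500):
        print(terms, square_wave_partial_sum(math.pi / 2, terms))
```

The more harmonics are added, the closer the value at the middle of the positive half-cycle gets to 1.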
Datapath vs Post Process
  • In datapath 
    • Each pixel/point delayed by X lines 
    • Frame buffers
    • Line buffers
    • filter operations
    • Camera pipeline
      • image sensor
      • image processing
      • video processing
      • compression
      • I/O
  • Stored Image - no delays
  • Interlaced
  • Progressive
Video Formats
  • SD 4:3 720x480i (interlaced)
  • SD 16:9 960x480p (progressive)
  • HD 720p 1280x720p  60fps
  • HD 1920x1080p
  • YUV - color encoding that separates brightness (Y) from color (U, V), so color video can still be shown on a black and white display
  • OSD on-screen display
  • PIP picture-in-picture
  • VDMA - video-specialized version of DMA
  • AXI interface in Xilinx
  • Motion Adaptive Noise Reduction (MANR) 
  • Xilinx FPGA has hardware acceleration for Object Segmentation for video, but it might be used for range image
  • Defective Pixel Correction
  • edge-adaptive image correction (not perfect) 
  • pixel adaptive, 
  • color correction 
    • D65 - daylight
    • 3x3 matrix correction
  • gamma correction
    • non-linear image brightness
    • LUT
    • better contrast
  • lens control
    • focus
    • auto-focus 
      • passive, looks for edges 
      • active, emits IR signal to gauge distance in the center of the lens
    • auto-exposure
    • auto-white balance
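
The gamma correction item above pairs a non-linear brightness curve with a LUT. A minimal sketch of that combination follows, assuming 8-bit pixel values and a gamma of 2.2; both are assumptions for illustration, and `build_gamma_lut` and `apply_lut` are hypothetical names.

```python
def build_gamma_lut(gamma=2.2, levels=256):
    """Precompute the gamma-corrected output for every possible pixel value."""
    max_value = levels - 1
    return [round(max_value * (i / max_value) ** (1.0 / gamma))
            for i in range(levels)]


def apply_lut(pixels, lut):
    """Correct a sequence of pixel values with one table lookup per pixel."""
    return [lut[p] for p in pixels]


if __name__ == "__main__":
    lut = build_gamma_lut()
    print(apply_lut([0, 16, 64, 128, 255], lut))
```

The power function is evaluated only once per possible pixel value; per-pixel work is a single lookup, which is why a LUT suits a streaming datapath.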













IR vision

Color Spectrum

As we all know, regular daylight cameras capture three colors: red, green, and blue (RGB). The choice of these colors is dictated by what the human eye can perceive. Some animals can see, or sense, other parts of the spectrum, from ultraviolet for bees to infrared for vipers.

A computer, however, has no such limitation: given the right sensors, it can "see" across the full wavelength spectrum.

Far Infrared

In this article, we will focus on infrared (IR), and specifically far-infrared (FIR), or long-wave infrared (LWIR). 

The infrared spectrum is especially interesting because it corresponds to emitted heat, which makes warm objects stand out against the surrounding environment.

Examples of objects visible in the infrared include:
- people 
- animals
- car engines
- heaters and chimneys
- hot fluid and gas leaks
- and anything that is hotter than the surroundings

It is also worth noting that some objects, especially metal conductors, can be "seen" as colder than their surroundings because they dissipate any local heat.

Not just the night vision

Many people equate infrared with night vision, but that is not the whole truth. With an infrared camera, a well-camouflaged animal is just as visible in daylight as it is at night.

The RGB + FIR systems.

The combination of visible light (RGB) and heat sensing (IR) is especially useful: RGB lets us perceive the shapes of terrain, vegetation, and other objects, while IR lets us spot people, animals, and other heat-emitting objects.


Practical applications

The primary goal of my research is automotive safety: detecting pedestrians and large animals may significantly reduce the number of accidents.
As a personal anecdote, in Michigan, where I live, I observe at least one deer collision for every 50 km (or miles) traveled. This is obvious to anyone who travels "Up North" on weekends.

Deer, moose, bear, elk, antelope, wolves, coyotes, raccoons, and many smaller animals are very difficult for humans to detect, especially at dawn or dusk and in poor visibility, resulting in catastrophic and gruesome collisions.

Obviously, RGB + IR has other uses, such as search and rescue; the military and other organizations have been developing this technology for decades.


Convolutional Neural Networks (CNN)

  • MFNet
  • RTFNet
  • PST900: dual-stream (RGB and IR) method 


Representative Data Collection

You can train machine learning models on a relatively modest dataset, but for them to generalize well, the datasets have to be vast and representative.

At this time, open-source datasets are rare and of limited use, as training should be performed on a dataset captured with specific, calibrated hardware.

Data Annotation
The collected data has to be annotated, usually by humans, in order to train the models and validate the results.


The biggest part of the effort
Collecting the datasets is by far the most time-consuming and expensive part of the effort.












