Head_Report - DarshanNalwal/hello_world GitHub Wiki

Index

Phase	Topic
Phase 1: Literature survey	Introduction
	Research
	Low-level Vision Processing
	Visual Pathway
	Important features of different vision sensors
	Comparison of vision sensors

Phase 2: Requirement analysis	Requirements
	Engineering Specifications

Phase 3: Functional analysis	Black Box
	White Box
	Hardware architecture
	Software architecture

Phase 4: Conceptualization	Morphological chart
	Simulation
	Motor Selection

Phase 5: Prototyping	Bill of Materials

Phase 1 Literature survey

Introduction

Vision is perhaps the most important sense in human beings, and it is also a perceptual modality of crucial importance for building artificial creatures. In the human brain, visual processing starts early in the eye, and is then roughly organized according to functional processing areas in the occipital lobe. Fundamental problems that remain to be solved in several fields, such as
- Computer Vision: object segmentation and recognition;event/goal/action identification; scene recognition
- Robotics: Robot tasks involving differentiated actions;learning and executing tasks without initial manual setup
- Cognitive Science and AI: understanding and creation of grounded knowledge structures, situated in the world
- Neurosciences: a more profound understanding of biological brains and their macro and micro structures
Research
- DESIGN AND SIMULATION OF NEW LOW COST HUMANOID ROBOT
  - INTRODUCTION
    - CALUMA stands for CAssino Low Cost HUMAnoid Robot
    - The maximum dimensions of the CALUMA design are 962 mm height, 839 mm width and 413 mm depth
    - PLC is used to to control actuators
    - 14 actuators are used to achieve desired Degrees Of Freedom
  - HEAD MODULE
    - Consists of Telescopic Manipulator with 2-DOF
    - DC motor actuates pitch DOF of neck
    - Linear actuators actuates the up and down DOF of head
    - Head sub-system is a combination of serial telescopic manipulator + parallel manipulator
    - The parallel manipulator helps in attaching head to the trunk module
    - Commercial web-cam is used for vision capability, it is installed as end-effector on telescopic manipulator
    - Range of actuators = -90mm to 90mm
    - Dynamic Characteristics of sub-system(head) Weight = 7.24N ,Ixx=2.82E-3 kg-m^2,Iyy=2.42E-3 kg-m^2,Izz=3.78E-4 kg-m^2
  - VISION SYSTEM
    - Input from camera [obstacles] are categorized as
    1. Mountable 2. Unsurmountable 3. Step
    - The obstacle detection system is evaluated by ViGWaM(Vision Guided Walking Machine) emulation environment
    - Vision system acquires information for collision-free and goal-oriented locomotion
    - Installing cameras with different zooms on one camera head for precise calssification
    - Height = 1.8 m,Weight = 40 kg , DOF =17 ,Maximum walking speed = 2km/h , step length = 45cm
  - PERFORMANCE OF MOTION CONTROL BOARD
    - Maximum sampling rate that can be achieved
      - For sufficient minimal angle resolution 16-bit representation is used to index 360� workspace
      - With PD position control loop for 4 motors, time required to read actual position of motors of external pulse decoders,compute,apply new control signal=1300us USB communication = 240us In USB protocol packets can be sent only by every 4ms Therefore,Overall sampling rate for position control loop is 250Hz
Low level vision processing
- Cognitive Issues
  
  Transformation of visual representations starts at the very basic input sensor - in the eye - by having light separated into gray-scale tones by rod receptors, and in the color visible spectrum by cones (Goldstein, 1996). The brain's primary visual cortex receives then the visual input from the eye. These low-level visual brain structures change the visual input representation yielding more organized percepts. Such changes can be modelled through mathematical tools like Daubechies-Wavelet or Fourier transforms for spectral analysis. Multi-scale representations fine-tune such organization.
  
  Edge detection and spectral analysis are known to occur on the hypercolumns of V1 occipital brain's area. Cells at V1 have the smallest receptive fields. Vision and action are also highly interconnected at low visual processing levels in the human brain.
- Spectral Analysis
  - Spectral co-efficient of an image are often used in computer vision for segregation or for coding contextual information of a scene.
  - The selection of both the spatial and frequency resolutions poses interesting problems, as started by Heisenbergs Uncertainity Principle.
  - Short-Time Fourier Transform
    
    The Fourier transform is one of the techniques often applied to analyze a signal into its frequency components . A variant of it is the Short- Time Fourier transform (STFT). This transform divides the signal into segments. The signal truncated by each of these segments is convolved by a window such as a rectangular or Hamming window - and the Fourier Transform is then applied to it. When applied over different window sizes, this transform obtains spatial information concerning the distribution of the spectral components.
  - Wavelets
    - Another method widely used to extract the spatial distribution of spectral components is the Wavelet transform.
    - Features: Texture segmentation,image/video encoding and image noise reduction.
    - Distinct feature of wavelet encoding is that reconstruction of the original image is possible without loosing information.
    - The Continuous Wavelet Transform (CWT) consist of convolving a signal with a mathematical function at several scales.
    - CWT drawback: Not suited for discrete computation.
    - Discrete Wavelet Transform (DWT) overcomes the quantization problems and allows fast computation of the transform for the digitized signals.
    - Features: Provides perfect signal reconstruction, DWT produces sparse results, which is desired for image compression.
  - Gabor Filters
    
    Gabor filters are yet another technique for spectral analysis. These filters provide the best possible tradeoff for Heisenberg's uncertainty principle, since they exactly meet the Heisenberg Uncertainty Limit. They offer the best compromise between spatial and frequency resolution. Their use for image analysis is also biologically motivated, since they model the response of the receptive fields of the orientation-selective cells in the human visual cortex.
  - Comparative analysis and Implementation
    - STFT
      
      Windowing dilemma, processing windows and signal cannot be reconstructed.
    - Wavelets
      
      Wavelets output at a higher frequencies are generally not so smooth and orientation selectivity is poor. Wavelets are narrow in their capabilities.
      
      However, wavelets are much faster to compute than Gabor filters and provide a more compact representation for the same number of orientations, which motivated their selection over Gabor filters.
  - Chrominance/Luminance Attributes
    
    Color is one of the most obvious and pervasive qualities in our surrounding world, having important functions for perceiving accurately forms, identifying objects and carrying out tasks important for our survival.
    - Two of the main functions of color vision are:
      - Creating perceptual segregation
        
        This is the process of estimating the boundaries of objects, which is a crucial ability for survival in many species.
      - Signaling
        
        Color also has a signaling function. Signals can result from the detection of bright colors or from the health index given by a person's skin color or a peacock's plumage.
Visual Pathway
- Developmental Issues
  
  Object detection and segmentation are key abilities on top of which other capabilities might be developed, such as object recognition. Both object segmentation and recognition are intertwined problems which can then be cast under a developmental framework: models of objects are first learned from experimental human/robot manipulation, enabling their a-posteriori identification with or without the agents actuation.
- Object Recognition from Visual Local Features
  
  Recognition of objects has to occur over a variety of scene contexts. This led tothe development of an object recognition scheme to recognize objects from color, luminance and shape cues, or from combinations of them. The object recognition algorithm consists therefore of three independent algorithms. Each recognizer operates along (nearly) orthogonal directions to the others over the input space. This approach offers the possibility of priming specific information, such as searching for a specific object feature (color, shape or luminance) independently of the others. For instance, the recognizer may focus the search on a specific color or sets of colors, or look into both desired shapes and luminance.
Important features of different vision sensors
- Time Of Flight Imagers
  - Measures time of flight required for a single laser pulse to leave a camera and reflect back
  - Records full 3D scenes using single laser pulse
  - Rapid acquisition and rapid real-time processing of scene information
  - Phase-shift time of flight imagers measurement working range = 10-30m Direct time of flight imagers measurement working range = 1500m
  - Costs lower than that of LIDAR
  - Lower power consumptions with respect to laser scanners
  - Unlike stereo imaging, depth accuracy is practically independent of textural appearance
- Stereo Cameras
  - Type of cameras with 2 or more lenses with separate image sensor for each lens
  - Uses a method called stereo photography to capture images Stereo Photography: Method of distilling a noisy video signal into a coherent data set that a computer can process into actionable symbolic objects
  - They perform well in both indoor and outdoor applications
  - The sensor system generates a depth map in which each pixel represents a distance to the sensor. As a result, it can detect not only the presence of objects in the danger zone, but also their relative size and position
  - Because of its simple design, a stereo camera has considerable cost advantages over currently approved technologies. Two standard industrial cameras are fixed at a defined angle to one another. The adjustable lenses can be quickly adapted to fit the required scenario
  - A stereo camera can provide a higher reliability than is currently required. The sensor system has no moving parts, thus reducing wear and tear and making it less prone to errors.
- 3D LIDARS
  - 3D laser imaging systems operate at night in all ambient illuminations and weather conditions. These techniques can perform the strategic surveillance of the environment for various worldwide operations (up to long ranges)
  - Vertical accuracy 5-15 cm (1s), Horizontal accuracy 30-50 cm
  - Data collection independent of sun inclination and at night and slightly bad weather
  - LiDAR pulses can reach beneath the canopy thus generating measurements of points there unlike photogrammetry
  - Up to 167,000 pulses per second. More than 24 points per m2 can be measured. Multiple returns to collect data in 3D.
  - LiDAR also observes the amplitude of back scatter energy thus recording a reflectance value for each data point. This data, though poor spectrally, can be used for classification, as at the wavelength used some features may be discriminated accurately.
Comparison of vision sensors
- Vision Sensors
  - Intel RealSense 3D camera SR300
    - Components placement:
    - Components used:
      - Cable connector
      - Imaging ASIC
      - IR camera
      - LED
      - Colour camera
      - IR laser projector
      - Colour camera ISP
    - 3D Imaging system
      - The IR projector and IR camera operate in tandem using coded light patterns to produce 2D array of monochromatic pixel values
      - The values are processed by imaging ASIC to generate depth
      - The colour camera consists of chromatic sensor and image signal processor which capture and processes chromatic pixel values
      - These values generate colour video frames which are transmitted to the imaging ASIC and then transmitted to client values
      - Colour camera can function independently from thr infrared camera or synchronously to create colour+IR+ depth video frame
  - Intel RealSense 3D camera ZR300
    - Components used:
      - Left IR camera
      - Colour camera
      - IR laser projection
      - USB 3.0 connector
      - Right IR camera
      - Fisheye camera
      - IMU
      - MIPI + I^2C connector
        
        Components placement:
    - 3D imaging system:
      - They use left IR, right IR and IR laser projector to calculate the depth and then the left and right camera data is sent to ASIC
      - ASIC calculates the depth value for each pixel in the image
      - IR projector is used to enhance the ability of the system to calculate depth in scenes with low amount of texture
      - Camera module output type is a depth measurement from the parallel plane of the module and not the absolute range from the module cameras
    - Advantages of using fisheye camera
      - It creates an image that replicates a circle or barrel with 180 degree or 360 degree wide FOV.
      - The curve of the fisheye lens can see in all the directions at the same time

Phase 2 Requirement analysis

Requirements

Recognize objects and operators(users)
Calculate relative distance of objects from itself
Identify orientation of floor to aid in COG balancing and path planning
Recognize colors of objects and provide audio confirmation of color detected
Track object movement, and change its orientation to fit object into its FOV
Produce 3D-model of the environment
Work in real-time
Fit into the head model which resembles head of a six year old child
Have wide FOV
Compact size
Have noise cancellation feature so that no wrong commands are interpreted by the sensor
Detect command to be executed
Resemble human eyes
Show emotions by moving eyes while thinking(processing)
Provide audio output
Audio output should be clean and clear
Sustain load of all the components
Have inbuilt gear system so that complexity of the design is reduced
Have feedback system
Accurate and precise
Durable
Motor size should be small and compact, such that it fits into the space available
Motors placement should be modular
DOF should be such that it can resemble human head movement
Cost-efficient
Light-weight
Respond quickly
Have independent and modular controller
Design should be stable
Upgradability feature so that any changes necessary can be made
Follow robot laws to ensure safety of users and itself
Cognitive behavior

Engineering Specifications

Serial Number	Requirement	Nominal value	Ideal value
1	Recognize objects and operators(users)	-	-
2	Calculate relative distance of objects from itself	2.5m	3m
3	Identify orientation of floor to aid in COG balancing and path planning	-	-
4	Recognize colors of objects and provide audio confirmation of color detected	390nm-700nm	390nm-700nm
5	Track object movement, and change its orientation to fit object into its FOV
6	Produce 3D-model of the environment	-	-
7	Work in real-time
8	Fit into the head model which resembles head of a six year old child	14x12x10	12x12x12
9	Have wide FOV	166(deg)	180(deg)
10	Compact in size	<14x12x10	12x12x12
11	Have noise cancellation feature so that no wrong commands are interpreted by the sensor	-	-
12	Detect command to be executed	80dB	140dB
13	Resemble human eyes	-	-
14	Show emotions by moving eyes while thinking(processing)	-	-
15	Provide audio output	144dB	140dB
16	Audio output should be clean and clear	-	-
17	Sustain load of all the components	3.5kg	2kg
18	Have inbuilt gear system so that complexity of the design is reduced	-	-
19	Have feedback system	-	-
20	Accurate and precise	-	-
21	Durable	8yrs	10yrs
22	Motor size should be small and compact, such that it fits into the space available	<14x12x10	12x12x12
23	Motors placement should be modular	-	-
24	DOF should be such that it can resemble human head movement	2	6
25	Cost-efficient	25000	20000
26	Light-weight	3kg	2kg
27	Respond quickly
28	Have independent and modular controller	-	-
29	Design should be stable	-	-
30	Upgradability feature so that any changes necessary can be made	-	-
31	Follow robot laws to ensure safety of users and itself	-	-
32	Cognitive behavior	-	-

Phase 3 Functional analysis

Black Box

White Box

Hardware architecture

Software architecture

Phase 4 Conceptualization

Morphological chart

Functions	Solution 1	Solution 2	Solution 3	Solution 4
Actuation	Maxon geared DC motor	Cube servo
Converting electrical signals into sound signals	VEX Speaker
Converting sound signals into electrical signals	Lapel microphone
Perception	Intel realsense SR300	Intel realsense ZR300	ZED
Connectivity	ethernet	wifi	Bluetooth	usb
Power transmission	Gears	belt
Controllers	Arduino	Raspberry PI	Beaglebone	Toradex
Motors	Cytron G-15 cube servo	MG996R digital servo	Dynamexell

Simulation

Importing robot to gazebo

Placing obstacle in gazebo

Depth image of obstacle in rviz

Path generation

Obstacle avoidance

Motor Selection

	Dynamixel MX28-t	Dynamixel MX28-t	Cube Servo
Operating Voltage	12v	14.8v	6.5-17.8v
Torque	25.5kg-cm	31.6kg-cm	15kg-cm
No-load Speed	55 rpm	67 rpm	63 rpm
Current	1.4 A	1.4 A	1.5 A
Weight	72gm	72gm	63gm
Operating Angle	360deg	360deg	360deg
Protocol	TTL Asynchronous Serial	TTL Asynchronous Serial	Half duplex serial with daisy chain
Position Feedback	Yes	Yes	Yes
Load voltage Feedback	Yes	Yes	Yes
Temperature Feedback	Yes	Yes	Yes

Conceptualization

Head_Report - DarshanNalwal/hello_world GitHub Wiki

Index

Phase 1 Literature survey

Introduction

Research

DESIGN AND SIMULATION OF NEW LOW COST HUMANOID ROBOT

INTRODUCTION

HEAD MODULE

VISION SYSTEM

PERFORMANCE OF MOTION CONTROL BOARD

Maximum sampling rate that can be achieved

Low level vision processing

Cognitive Issues

Spectral Analysis

Short-Time Fourier Transform

Wavelets

Gabor Filters

Comparative analysis and Implementation

STFT

Wavelets

Chrominance/Luminance Attributes

Two of the main functions of color vision are:

Creating perceptual segregation

Signaling

Visual Pathway

Developmental Issues

Object Recognition from Visual Local Features

Important features of different vision sensors

Time Of Flight Imagers

Stereo Cameras

3D LIDARS

Comparison of vision sensors

Vision Sensors

Intel RealSense 3D camera SR300

3D Imaging system

Intel RealSense 3D camera ZR300

3D imaging system:

Advantages of using fisheye camera

Phase 2 Requirement analysis

Requirements

Engineering Specifications

Phase 3 Functional analysis

Black Box

White Box

Hardware architecture

Software architecture

Phase 4 Conceptualization

Morphological chart

Simulation

Importing robot to gazebo

Placing obstacle in gazebo

Depth image of obstacle in rviz

Path generation

Obstacle avoidance

Motor Selection

Conceptualization

Phase 5: Prototyping

Bill of Materials