
Open Standard APIs for Vision and Camera Processing


Neil Trevett
Vice President Mobile Ecosystem, NVIDIA
President, Khronos Group
Copyright Khronos Group 2014 - Page 1

Khronos Connects Software to Silicon


Open Consortium creating
ROYALTY-FREE, OPEN STANDARD
APIs for hardware acceleration
Defining the roadmap for
low-level silicon interfaces
needed on every platform
Graphics, compute, rich media,
vision, sensor and camera
processing
Rigorous specifications AND
conformance tests for cross-vendor portability

Acceleration APIs
BY the Industry
FOR the Industry

Well over a BILLION people use Khronos APIs Every Day

Khronos Standards
3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format
with compression

Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing

Acceleration in HTML5
- 3D in browser, no plug-in
- Heterogeneous computing for JavaScript

Sensor Processing
- Vision Acceleration
- Camera Control
- Sensor Fusion

Over 100 companies defining royalty-free APIs to connect software to silicon


Visual Computing = Graphics PLUS Vision

Vision
Processing

Imagery

Enhanced sensor
and vision
capability deepens
the interaction
between real and
virtual worlds

Data
Graphics
Processing

Real-time GPU Compute


Research project on CUDA-enabled laptop
High-Quality Reflections, Refractions, and Caustics in Augmented
Reality and their Contribution to Visual Coherence
P. Kán, H. Kaufmann, Institute of Software Technology and Interactive
Systems, Vienna University of Technology, Vienna, Austria
https://www.youtube.com/watch?v=i2MEwVZzDaA


Mobile Visual Computing = New Experiences


Need for advanced sensors and the GPU throughput to process them
- Computational Photography and Videography
- Face, Body and Gesture Tracking
- 3D Scene/Object Reconstruction
- Augmented Reality

Vision Pipeline Challenges and Opportunities


Growing Camera Diversity
- Capturing color, range and lightfields
- Camera sensors >20MPix
- Novel sensor configurations: Stereo pairs, Plenoptic Arrays, Active Structured Light, Active TOF
- Flexible sensor and camera control to generate required image stream

Diverse Vision Processors
- Driving for high performance and low power
- Camera ISPs
- Dedicated vision IP blocks
- DSPs and DSP arrays
- Programmable GPUs
- Multi-core CPUs
- Use best processing available for image stream processing with code portability

Sensor Proliferation
- Diverse sensor awareness of the user and surroundings
- Light / Proximity
- 2 cameras
- 3 microphones
- Touch
- Position: GPS, WiFi (fingerprint), Cellular trilateration, NFC/Bluetooth Beacons
- Accelerometer
- Magnetometer
- Gyroscope
- Pressure / Temp / Humidity
- Control/fuse vision data by/with all other sensor data on device

Vision Processing Power Efficiency


Depth sensors = significant processing
- Generate/use environmental information
Wearables will need always-on vision
- With smaller thermal limit / battery than phones!
GPUs have 10x the imaging power efficiency of CPUs
- GPUs architected for efficient pixel handling
Traditional cameras have dedicated hardware
- ISP = Image Signal Processor on all SOCs today
Potential for dedicated sensor/vision silicon
- Can trigger full CPU/GPU complex
SOCs have space for more transistors
- But can't turn on at same time = Dark Silicon
But how to program specialized processors?
Performance and Functional Portability

[Chart: power efficiency (X1, X10, X100) versus computation flexibility for multi-core CPU, GPU compute and dedicated hardware; advanced sensors and wearables are marked on the chart]

OpenVX - Power Efficient Vision Acceleration


Out-of-the-Box vision acceleration framework
- Low-power, real-time, mobile and embedded
Performance portability for diverse hardware
- ISPs, Dedicated vision blocks,
DSPs and DSP arrays, GPUs, Multi-core CPUs
Suited for low-power, always-on acceleration
- Can run solely on dedicated vision hardware
Foundational API for vision acceleration
- Can be used by middleware or applications

Complementary to OpenCV
- Which is great for prototyping
Khronos open source sample implementation
- To be released with final specification

[Diagram: software stack. Application; OpenCV open source library and other higher-level CV libraries; open source sample implementation and hardware vendor implementations]


OpenVX Graphs - The Key to Efficiency


Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Processing can be tiled to keep data entirely in local memory/cache
VXU Utility Library for access to single nodes
- Easy way to start using OpenVX by calling each node independently
EGLStreams can provide data and event interop with other Khronos APIs
- BUT use of other Khronos APIs is not mandated
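
The benefit of fusing nodes can be illustrated with a small model in plain Python. This is only a sketch and not the OpenVX API: `Node`, `run_unfused` and `run_fused` are hypothetical names, the nodes are simple per-pixel operations, and "memory transfers" are approximated by counting how many full buffers get written.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One per-pixel processing step (hypothetical stand-in for a graph node)."""
    name: str
    op: Callable[[int], int]


def run_unfused(nodes: List[Node], image: List[int]) -> Tuple[List[int], int]:
    """Run nodes one at a time; every node writes a full intermediate buffer."""
    buffers_written = 0
    data = image
    for node in nodes:
        data = [node.op(px) for px in data]  # one full pass over memory per node
        buffers_written += 1
    return data, buffers_written


def run_fused(nodes: List[Node], image: List[int]) -> Tuple[List[int], int]:
    """Apply the whole chain to each pixel; only the final buffer is written."""
    def chain(px: int) -> int:
        for node in nodes:
            px = node.op(px)
        return px
    return [chain(px) for px in image], 1


graph = [
    Node("scale", lambda px: px * 2),
    Node("offset", lambda px: px + 10),
    Node("clamp", lambda px: min(px, 255)),
]
image = [0, 50, 100, 200]
unfused_out, unfused_writes = run_unfused(graph, image)
fused_out, fused_writes = run_fused(graph, image)
```

Both paths produce the same pixels, but the fused path writes one buffer instead of three. A real implementation can only make this choice because it sees the whole graph before execution.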

[Figure: Example OpenVX Graph. Native Camera Control feeds a graph of OpenVX Nodes whose output goes to Downstream Application Processing]


OpenVX 1.0 Function Overview


Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping

Core Computer Vision


- Pyramid computation
- Integral Image computation
Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow
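
As an illustration of one function in this list, the integral image, here is a minimal plain-Python sketch. It does not use OpenVX; `integral_image` and `box_sum` are illustrative names, and the image is a small list of lists. Once the table is built, the sum over any rectangle takes at most four lookups.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a table S where S[y][x] is the sum of img[0..y][0..x], inclusive."""
    height, width = len(img), len(img[0])
    table = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y][x] = row_sum + (table[y - 1][x] if y > 0 else 0)
    return table


def box_sum(table: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of the pixels in an inclusive rectangle, using at most four lookups."""
    total = table[bottom][right]
    if top > 0:
        total -= table[top - 1][right]
    if left > 0:
        total -= table[bottom][left - 1]
    if top > 0 and left > 0:
        total += table[top - 1][left - 1]
    return total


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
table = integral_image(img)
lower_right = box_sum(table, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```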

OpenVX Specification Evolution
- OpenVX 1.0 defines framework for creating, managing and executing graphs
- Focused set of widely used functions that are readily accelerated
- Implementers can add functions as extensions
- Widely used extensions adopted into future versions of the core

Example Graph - Stereo Machine Vision


[Figure: OpenVX Graph. Camera 1 and Camera 2 each pass through a Stereo Rectify with Remap node; the other nodes shown are Compute Depth Map (User Node), Detect and track objects (User Node), Image Pyramid, Compute Optical Flow and Delay, with Object coordinates as output]

Tiling extension enables user nodes (extensions) to also optimally run in local memory


OpenVX and OpenCV are Complementary

Governance
- OpenCV: Community driven open source with no formal specification
- OpenVX: Formal specification defined and implemented by hardware vendors

Conformance
- OpenCV: No conformance tests for consistency and every vendor implements different subset
- OpenVX: Full conformance test suite / process creates a reliable acceleration platform

Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability

Scope
- OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API

Efficiency
- OpenCV: Memory-based architecture; each operation reads and writes memory
- OpenVX: Graph-based execution; optimizable computation, data transfer

Use Case
- OpenCV: Rapid experimentation
- OpenVX: Production development & deployment

OpenVX Participants and Timeline


Provisional 1.0 specification released November 2013 for industry feedback
- An update to the provisional spec published in July
OpenVX 1.0 final release planned for 2014
- With conformance tests

Itseez is working group chair (the convener of OpenCV)


- Qualcomm and TI are specification editors


NVIDIA VisionWorks Uses OpenVX


VisionWorks library contains diverse vision and imaging primitives
Leverages OpenVX for optimized primitive execution
Can extend VisionWorks nodes through CUDA accelerated primitives
Provided with sample library of fully accelerated pipelines

[Diagram: software stack. Applications and Middleware; Vision Pipeline Samples (Object Detection, SLAM, 3rd Party Pipelines); VisionWorks Framework; VisionWorks Primitives (Classifier, Corner Detection, 3rd Party); CUDA Libraries; Tegra K1]

OpenCL - Portable Heterogeneous Computing


Portable Heterogeneous programming of diverse compute resources
- Targeting supercomputers -> embedded systems -> mobile devices
One code tree can be executed on CPUs, GPUs, DSPs and hardware
- Dynamically interrogate system load and balance work across available processors
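
One simple way to picture balancing work across processors is a proportional split. The sketch below is plain Python, not OpenCL host code: `split_work` is a hypothetical helper, and the throughput figures are made-up estimates that a real application would have to measure or query for itself.

```python
from typing import Dict


def split_work(total_items: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Divide work-items across devices in proportion to estimated throughput."""
    total_rate = sum(throughput.values())
    shares = {
        device: int(total_items * rate / total_rate)
        for device, rate in throughput.items()
    }
    # Rounding down can leave a few items unassigned; give them to the fastest device.
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares


# Assumed relative throughputs, for illustration only.
shares = split_work(1000, {"cpu": 1.0, "gpu": 6.0, "dsp": 3.0})
```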

OpenCL = Two APIs and C-based Kernel language


- Platform Layer API to query, select and initialize compute devices
- Kernel language - Subset of ISO C99 + language extensions
- C Runtime API to build and execute kernels across multiple devices

[Diagram: OpenCL Kernel Code running on CPUs, a GPU, a DSP and other hardware (HW)]


OpenCL as Parallel Language Backend

- JavaScript binding for initiation of OpenCL C kernels
- Language for image processing and computational photography
- MulticoreWare open source project on Bitbucket
- Embedded array language for Haskell
- Java language extensions for parallelism
- River Trail: Language extensions to JavaScript
- Compiler directives for Fortran, C and C++
- PyOpenCL: Python wrapper around OpenCL
- Harlan: High level language for GPU programming

SPIR
Standard Portable
Intermediate Representation
(extending LLVM for parallel computation)

SPIR 2.0 Released here at SIGGRAPH

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources


Mixamo - Avatar Videoconferencing


Real time facial animation capture on mobile ported directly from PC
Animate an avatar while conferencing
Full GPU acceleration of vision processing using OpenCL

NVIDIA Tegra K1 Development Board


Khronos APIs for Vision Processing


GPU Compute Shaders (OpenGL 4.X and OpenGL ES 3.1)
- Pervasively available on almost any mobile device or OS
- Easy integration into graphics apps, no compute API interop needed
- Program in GLSL not C
- Limited to acceleration on a single GPU

General Purpose Heterogeneous Programming Framework
- Flexible, low-level access to any devices with OpenCL compiler
- Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
- Open standard for any device or OS, used as a backend by many languages and frameworks
- Needs full compiler stack and IEEE precision

Out of the Box Vision Framework - Operators and graph framework library
- Can run on dedicated hardware, no compiler needed
- Easier performance portability to diverse hardware
- Suited for low-power, always-on acceleration
- Fixed set of operators but can be extended

It is possible to use OpenCL or GLSL to build OpenVX Nodes on programmable devices



Kari Pulli, NVIDIA Research


Advanced Camera Control Use Cases


High-dynamic range (HDR) and computational flash photography
- High-speed burst with individual frame control over exposure and flash
Subject isolation and depth detection
- High-speed burst with individual frame control over focus

Rolling shutter elimination


- High-precision intra-frame synchronization between camera and motion sensor
Augmented Reality
- 60Hz, low-latency capture with motion sensor synchronization
- Multiple Region of Interest (ROI) capture
- Synchronized stereo sensors for scene scaling
- Detailed feedback on camera operation per frame

Time-of-flight or structured light depth camera processing


- Aligned stacking of data from multiple sensors
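
To make the camera and motion sensor synchronization use case concrete, the sketch below estimates a gyroscope reading at the moment a given image row was captured. It is plain Python under stated assumptions: rows are read out at a constant rate (a simple rolling shutter model), both devices share one clock, and `row_timestamp`, `interpolate` and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def row_timestamp(frame_start: float, readout_time: float, row: int, rows: int) -> float:
    """Capture time of one row, assuming rows are read out at a constant rate."""
    return frame_start + readout_time * row / rows


def interpolate(samples: List[Tuple[float, float]], t: float) -> float:
    """Linearly interpolate time-sorted (timestamp, value) samples at time t."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    if i == 0:
        return samples[0][1]        # before the first sample: hold first value
    if i == len(samples):
        return samples[-1][1]       # after the last sample: hold last value
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Made-up gyro samples: (seconds, angular rate in rad/s).
gyro = [(0.000, 0.0), (0.010, 1.0), (0.020, 3.0)]
t_mid = row_timestamp(frame_start=0.0, readout_time=0.030, row=540, rows=1080)
rate_mid = interpolate(gyro, t_mid)
```

The middle row of this 1080-row frame is captured about 15 ms after the frame starts, so the interpolated rate falls between the second and third gyro samples.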


Typical Imaging Pipeline


[Diagram: Lens, Color Filter Array and CMOS sensor, under lens, sensor and aperture control -> Bayer data -> Pre-processing -> Image Signal Processor (ISP) -> RGB / YUV -> Post-processing -> App]

Pre-processing is non-existent in basic use-cases


Pre- and Post-processing can be done on CPU, GPU, DSP
ISP controls camera via 3A algorithms
Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)

High Dynamic Range (HDR)


HDR works by combining differing exposures into the same image
A variety of methods for HDR, based on application
- Multiple frame HDR (requires frame memory)
- Interlace HDR
- Multiple Zone HDR
[Diagram: short exposure, optional mid exposure and long exposure frames are combined by HDR processing]

HDR requires
- Precise control over camera parameters (exposure)
- Fast capture and processing of multiple images
- Note: with interlace HDR, only 1 image is needed
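
A minimal sketch of multiple frame HDR merging follows, in plain Python. It is one common weighted-average approach and not necessarily the method behind these slides. It assumes aligned 8-bit frames, a linear sensor response and known exposure times; `weight` and `merge_hdr` are illustrative names.

```python
from typing import List, Sequence


def weight(pixel: int) -> float:
    """Hat weighting: trust mid-range values, distrust near-black and near-white."""
    return float(min(pixel, 255 - pixel))


def merge_hdr(frames: Sequence[Sequence[int]], exposures: Sequence[float]) -> List[float]:
    """Merge aligned 8-bit frames into relative radiance (linear response assumed)."""
    merged = []
    for samples in zip(*frames):
        num = den = 0.0
        for pixel, exposure in zip(samples, exposures):
            w = weight(pixel)
            num += w * pixel / exposure   # radiance estimate from this exposure
            den += w
        if den == 0.0:
            # Every sample is clipped: fall back to the shortest exposure.
            shortest = min(range(len(exposures)), key=lambda i: exposures[i])
            merged.append(samples[shortest] / exposures[shortest])
        else:
            merged.append(num / den)
    return merged


short_frame = [10, 100, 255]   # exposure time 1 (arbitrary units)
long_frame = [80, 255, 255]    # exposure time 8
radiance = merge_hdr([short_frame, long_frame], [1.0, 8.0])
```

The second pixel is saturated in the long exposure, so its weight is zero and the result comes entirely from the short exposure. This is why precise, known per-frame exposure settings matter for HDR.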

Image stitching, panoramic images


Requires processing of multiple images


Requires position / geometry information
Requires control of camera (e.g. AE lock)

Typical Burst Sequence Applications


Pipelined Sensor Model


Traditional one-shot sensor model
- Need to know which parameters were used
- Resetting the pipeline between shots is slow
Viewfinding / video mode:
- Pipelined, high frame rate
- Settings changes take effect later
Need new model for Computational Photography
- Need parameterized SEQUENCE of images to feed advanced algorithms
Real image sensors are pipelined
- While one frame is exposing
- Next one is being prepared
- Previous one is being read out
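
The effect of pipelining on settings changes can be shown with a toy simulation in plain Python. It assumes exactly the three stages named above, each taking one frame time; `run_pipeline` is a hypothetical name and the settings are just exposure values.

```python
from collections import deque
from typing import Deque, List, Optional

STAGES = ("prepare", "expose", "readout")  # stage order assumed from the list above


def run_pipeline(requests: List[int]) -> List[Optional[int]]:
    """Feed one setting per frame time; return the setting read out at each frame time."""
    depth = len(STAGES)
    in_flight: Deque[Optional[int]] = deque([None] * depth, maxlen=depth)
    read_out: List[Optional[int]] = []
    # Extra empty slots at the end drain the frames still in flight.
    for setting in requests + [None] * (depth - 1):
        in_flight.appendleft(setting)    # new setting enters "prepare"
        read_out.append(in_flight[-1])   # oldest entry is in "readout"
    return read_out


# The exposure request changes from 10 to 40 at frame time 2.
timeline = run_pipeline([10, 10, 40, 40])
```

The change requested at frame time 2 first shows up in the read-out at frame time 4. An application that assumes the newest setting applies to the newest frame will mislabel two frames, which is the motivation for attaching parameters to each image.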

Need for Camera Control API - OpenKCAM


Advanced control of ISP and camera subsystem with cross-platform portability
- Generate sophisticated image stream for advanced imaging & vision apps
No platform API currently fulfills all developer requirements
- Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
- Cross sensor synch: e.g. synch of camera and MEMS sensors
- Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
- Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture
- Auto Exposure (AE)
- Auto White Balance (AWB)
- Auto Focus (AF)

[Diagram: Image Signal Processor (ISP) output reaches Image/Vision Applications through EGLStreams]


OpenKCAM API Requirements


Provide functional portability for advanced camera applications
- Reduce extreme fragmentation for ISVs wanting more than point and shoot
Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs

Control multiple sensors with synch and alignment


- E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
Enhanced per frame detailed control
- Format flexibility, Region of Interest (ROI) selection
Global timing & synchronization
- E.g. Between cameras and MEMS sensors
Flexible processing/streaming
- Multiple input and output streams with RAW, Bayer or YUV Processing
- Streaming of rows (not just frames)
Enable advanced camera functionality not available on current platforms

OpenKCAM is FCAM-based
FCAM (2010) Stanford/Nokia, open source
Capture stream of camera images with precision control
- A pipeline that converts requests into image stream
- All parameters packed into the requests - no visible state
- Programmer has full control over sensor settings for each frame in stream
Control over focus and flash
- No hidden daemon running
Control ISP
- Can access supplemental statistics from ISP if available
No global state
- State travels with image requests
- Every pipeline stage may have different state
- Enables fast, deterministic state changes
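
The idea that state travels with image requests can be sketched as follows. This is plain Python and not the FCAM or OpenKCAM API: `Shot`, `Frame` and `capture` are hypothetical names, and no real sensor is involved. Each request carries every parameter it needs, and each returned frame carries the exact request that produced it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Shot:
    """Hypothetical per-frame request: all sensor state lives here."""
    exposure_us: int
    gain: float
    flash: bool = False


@dataclass(frozen=True)
class Frame:
    """Hypothetical result: pixel payload plus the request that produced it."""
    index: int
    shot: Shot
    pixels: bytes


def capture(shots: Iterable[Shot]) -> Iterator[Frame]:
    """Stand-in for a sensor pipeline: turns a stream of requests into frames."""
    for index, shot in enumerate(shots):
        yield Frame(index=index, shot=shot, pixels=b"")  # no real sensor in this sketch


# A three-frame burst with different settings per frame.
burst = [Shot(1_000, 1.0), Shot(8_000, 1.0), Shot(8_000, 2.0, flash=True)]
frames = list(capture(burst))
```

Because there is no global camera state to query, the application never has to guess which settings applied to which frame in a burst.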

OpenKCAM Design Philosophy


C-language API starting from proven designs
- e.g. FCAM, Android camera platform
Design alignment with widely used hardware standards
- e.g. MIPI CSI

Focus on mobile, power-limited devices


- But do not preclude other use cases such as automotive, DSLR
Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not required
Provide support for vendor-specific extensions


Potential Adoption on Android


Android Exposes Java camera APIs to developers
- Controls underlying Camera HAL
Camera HAL v1 API simplified basic point and shoot apps
- Difficult or impossible to do much else

Camera HAL v3 API is a fundamentally different API


- Streams-based to enable more sophisticated camera applications
OpenKCAM builds on FCAM with a goal of
being forward compatible with Android
architecture

Camera API

Open source
project developed
by Nokia and
Stanford
HAL V3 adopts many
FCAM ideas and can use
EGL in its implementation

OpenKCAM may be used to IMPLEMENT Android Camera HAL and provide an advanced native camera API in NDK

Participating Companies and Milestones

[Timeline: milestones shown are group charter approved, specification ratification, and sample implementation and tests; dates shown are Apr13, Jul13, 3Q14 and 1Q15]


OpenKCAM Working Group


Royalty free API for portable access to advanced mobile camera functionality
- Reduce fragmentation and encourage more advanced camera applications
Control for the new wave of sensors to enable advanced imaging and vision
- Multiple sensors, depth cameras, synchronized sensors

Provide sophisticated camera functionality not available on today's platforms


- But work to enable easy adoption by platform vendors
Eager to contribute? Join Khronos OpenKCAM WG!
- http://www.khronos.org/camera
Mikaël Bourges-Sévenier
- msevenier@aptina.com


Neil Trevett, NVIDIA


Sensor Industry Fragmentation


Sensor Types
Basic sensor data:
- Acceleration, Magnetic Field, Angular Rates
- Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light
- Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer
Sensor fusion
- Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration
- Position
Context awareness
- Device Motion: general movement of the device: still, free-fall,
- Carry: how the device is being held by a user: in pocket, in hand,
- Posture: how the body holding the device is positioned: standing, sitting, step,
- Transport: about the environment around the device: in elevator, in car,
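
The two orientation representations listed under sensor fusion are related by a standard conversion. The sketch below, in plain Python, converts a unit quaternion to Euler angles using the common Z-Y-X (yaw, pitch, roll) convention. The convention and the function name `quaternion_to_euler` are assumptions for illustration; a given sensor API may define its axes differently.

```python
import math
from typing import Tuple


def quaternion_to_euler(w: float, x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Unit quaternion -> (roll, pitch, yaw) in radians, Z-Y-X convention."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding pushing the value just outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


# A 90 degree rotation about the vertical (z) axis.
half_angle = math.radians(90.0) / 2.0
roll, pitch, yaw = quaternion_to_euler(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
```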


Low-level Sensor Abstraction API


Apps Need Sophisticated Access to Sensor Data
- Without coding to specific sensor hardware

Apps request semantic sensor information
- StreamInput defines possible requests, e.g.
- Read Physical or Virtual Sensors e.g. Game Quaternion
- Context detection e.g. Am I in an elevator?

Advanced Sensors Everywhere
- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors

Sensor Discoverability
Sensor Code Portability

StreamInput processing graph provides optimized sensor data stream
High-value, smart sensor fusion middleware can connect to apps in a portable way
Apps can gain magical situational awareness


StreamInput: Platform Integration


Applications
OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion)
Middleware (E.g. Context-awareness engines, gaming engines)
Flexible native API to integrate where needed depending on existing platform sensor stacks
Low-level native API defines portable access to fused sensor data stream and context-awareness

[Diagram: Sensors and Sensor Hubs sit beneath the native API]


Sensor OSP Announcement


Proposal to converge OSP (Open Sensor Platform) APIs with StreamInput
- Sensor Platforms is StreamInput Spec Editor


EGL 1.5 Released at GDC 2014


EGL 1.5 brings functionality from multiple extensions into core
- Increased reliability and portability
EGLImages
- Sharing textures and renderbuffers
Context Robustness
- Defending against malicious code
EGLSync objects
- Improved OpenGL / OpenCL interop
Platform extensions
- Standardized interactions for multiple OS e.g. Android and 64-bit platforms
sRGB colorspace rendering

API Interop
- EGL provides efficient transfer of data and events between Khronos APIs
Application Portability
- EGL abstracts graphics context management, surface and buffer binding and rendering synchronization

[Diagram: EGL sits between Applications and OS and Display Platforms]

Potential EGL Future Directions


EGLImageStream extensions are very powerful today
- But need wider implementation in drivers
- Stream other types of data: unformatted buffers for metadata and more
- GPU-to-GPU streaming and invoking client API activities directly from other client APIs without CPU intervention
Separation of traditional context/surface functionality from hub functionality
Support for new Khronos APIs where appropriate
- Streaming video + image processing + display use case


Khronos APIs for Augmented Reality


AR needs not just advanced sensor processing, vision
acceleration, computation and rendering - but also for
all these subsystems to work efficiently together

[Diagram: MEMS Sensors feed Sensor Fusion, with precision timestamps on all sensor samples; Advanced Camera Control and stream generation feeds Vision Processing; the Application runs on CPUs, GPUs and DSPs, alongside Audio Rendering and 3D Rendering and Video Composition on GPU; EGLStream streams data between APIs]


Summary
Khronos is building a family of interoperating APIs for portable and
power-efficient vision processing
OpenVX 1.0 has been provisionally released and non-members are invited to
provide feedback on the forums
- http://www.khronos.org/message_boards/forumdisplay.php/110-OpenVX-General

OpenKCAM and StreamInput APIs are currently in design and complement and
integrate with OpenVX
Any company is welcome to join Khronos to influence the direction of mobile
and embedded vision processing!
- $15K annual membership fee for access to all Khronos API working groups
- Well-defined IP framework protects your IP and conformant implementations

www.khronos.org
- ntrevett@nvidia.com
