
ARCHITECTURAL QUALITIES AND PRINCIPLES OF MULTIMODAL SYSTEMS

Ms. R. Grace, M.Sc., P.G.D.C.A., M.C.A., M.Phil., Prof., Department of Computer Science, Adaikalamatha College.
E-mail: graceraju0303@hotmail.com

ABSTRACT
Multimodal systems process two or more combined user input modes in a coordinated manner with multimedia system output. The development of novel multimodal systems has been enabled by the myriad input and output technologies currently available, including new devices and improvements in recognition-based technologies. Multimodal systems can be classified based on concurrency of information processing and sensory fusion. The design space includes both input and output modalities. A multimodal input system corresponds to the meaning level of abstraction. Modalities can be used simultaneously or sequentially on the temporal axis. Information from different communication channels can be used independently or combined at different data fusion levels. Four possible values, Alternate, Synergistic, Exclusive and Concurrent, represent the classes used for the taxonomy of multimodal systems. The proposed generic platform for synergistic multimodal systems defines three components. Concurrency of data processing is achieved at different levels of data abstraction. Sensory information fusion is defined at three levels. Syntactic and semantic fusion requires a unified textual representation. The resulting advantage is that natural language processing methods can be applied to extract the semantics of human-computer interaction.

INTRODUCTION
The growing interest in multimodal interface design is inspired largely by the goal of supporting more transparent, flexible, efficient and powerfully expressive means of human-computer interaction. Multimodal systems process two or more combined user input modes, such as speech, pen, manual gestures, touch, gaze, and head and body movements, in a coordinated manner with multimedia system output. This new class of interfaces aims to recognize naturally occurring forms of human language and behaviour, and incorporates at least one recognition-based technology. The development of novel multimodal systems has been enabled by the myriad input and output technologies currently becoming available, including new devices and improvements in recognition-based technologies. Multimodality allows erroneous speech recognition to be screened out, and ambiguous gestures to be resolved. Probabilities are associated with each modal input, and the interpretation with the highest unified score is selected. This allows grammar-based techniques employed in natural language processing to be used in the semantic integration process. Multimodal interfaces are expected to be easier to learn and use, and are preferred by users for many applications. They have the potential to expand computing to more challenging applications, to be used by a broader spectrum of everyday people, and to accommodate more adverse usage conditions than in the past. Future adaptive multimodal-multisensor interfaces have the potential to support new functionality, to achieve unparalleled robustness, and to perform flexibly as multifunctional and personalized mobile systems.
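To make the scoring idea concrete, here is a minimal sketch, assuming hypothetical recognizer outputs and a simple product-of-probabilities score; the phrases, coordinates and values are invented for illustration only, and a real integrator would also check that the paired hypotheses are semantically compatible before scoring them.

```python
# Hypothetical sketch of selecting the joint speech-gesture interpretation
# with the highest unified score. Each recognizer returns ranked hypotheses
# with probabilities; the best-scoring pair is kept.

from itertools import product

speech_hypotheses = [("move the tank here", 0.72), ("move the bank here", 0.21)]
gesture_hypotheses = [("point:(130,245)", 0.85), ("circle:(130,245)", 0.10)]

def unified_score(speech_prob, gesture_prob):
    # Simple product of modality probabilities; real systems typically
    # weight modalities and reject semantically incompatible pairs.
    return speech_prob * gesture_prob

best = max(
    ((s, g, unified_score(sp, gp))
     for (s, sp), (g, gp) in product(speech_hypotheses, gesture_hypotheses)),
    key=lambda triple: triple[2],
)
print(best)  # highest-scoring joint interpretation (score about 0.61)
```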

ARCHITECTURAL QUALITIES AND PRINCIPLES


A framework for multimodal systems was proposed by Nigay and Coutaz. Multimodal systems can be classified based on concurrency of information processing and sensory fusion. The resulting design space is organized along three dimensions: level of abstraction, use of modality, and data fusion, each a two-level variable.

TYPICAL INFORMATION PROCESSING FLOW IN A MULTIMODAL ARCHITECTURE DESIGNED FOR SPEECH AND GESTURE
[Figure: input devices (microphone, pen, glove, laser, hand) feed speech recognition and gesture recognition; their outputs pass through natural language processing and gesture understanding, then context management and multimodal integration, and on to a dialogue manager responsible for response planning and application invocation and coordination across App1, App2 and App3.]
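A rough sketch of this flow is given below, with every stage reduced to a placeholder function; the recognized phrase, gesture label and coordinates are invented, and a real system would run full recognizers at each stage.

```python
# Minimal sketch of the speech-and-gesture processing flow in the figure above.
# Every function is a stand-in for a real recognizer, parser or integrator.

def recognize_speech(audio):          # microphone -> word string
    return "put that there"

def parse_language(words):            # natural language processing
    return {"action": "put", "object": "that", "location": "there"}

def recognize_gesture(samples):       # pen/glove/laser/hand -> gesture label
    return "point"

def understand_gesture(gesture):      # gesture understanding
    return {"deixis": (120, 80)}

def integrate(speech_frame, gesture_frame):   # multimodal integration
    # Resolve deictic words ("that", "there") with the pointing coordinates.
    merged = dict(speech_frame)
    merged["location"] = gesture_frame["deixis"]
    return merged

command = integrate(parse_language(recognize_speech(b"...")),
                    understand_gesture(recognize_gesture([])))
print(command)   # handed to the dialogue manager for application invocation
```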

The design space includes both input and output modalities. The level of abstraction is a two-level variable: meaning, which applies to symbolic information extraction (such as in speech and gesture recognition), and no meaning, which applies to the plain representation of input/output information. A multimodal input system corresponds to the meaning level of abstraction. Modalities can be used simultaneously or sequentially on the temporal axis. Information from different communication channels can be used independently or combined at different data fusion levels. Four possible values, Alternate, Synergistic, Exclusive and Concurrent, represent the classes used for the taxonomy of multimodal systems.
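The taxonomy can be summarized as a simple lookup over the two axes; the sketch below assumes plain string labels for each axis value.

```python
# Sketch of the design space as a lookup: the use-of-modalities axis
# (sequential vs. parallel) and the fusion axis (combined vs. independent)
# jointly determine the class of a multimodal system.

TAXONOMY = {
    ("sequential", "combined"):    "Alternate",
    ("sequential", "independent"): "Exclusive",
    ("parallel",   "combined"):    "Synergistic",
    ("parallel",   "independent"): "Concurrent",
}

def classify(use_of_modalities: str, fusion: str) -> str:
    return TAXONOMY[(use_of_modalities, fusion)]

print(classify("parallel", "combined"))   # Synergistic
```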

The proposed generic platform for synergistic multimodal systems defines three components, described below.
DESIGN SPACE FOR MULTIMODAL SYSTEMS
[Figure: along the use-of-modalities axis (sequential vs. parallel) and the fusion axis (combined vs. independent), the four classes are Alternate (sequential, combined), Exclusive (sequential, independent), Synergistic (parallel, combined) and Concurrent (parallel, independent); the levels of abstraction (meaning vs. no meaning) subdivide each class.]

A. The low-level interaction component, which denotes the software and hardware platforms related to user-computer interaction; it manages events from different media and performs lexical analysis.
B. The presentation techniques component, which defines the application interface with the multimodal interaction objects.
C. The dialogue controller, responsible for task-level sequencing and syntactic/semantic multimodal fusion.

A layered model for multimodal input in virtual environments was proposed by Astheimer (1994). The low-level interaction component techniques convert interaction items into application command language units. The interface dialogue controller processes command language units and generates application tasks. The three levels of sensory information fusion defined are:
a) Lexical fusion, which is the binding of hardware primitives with software events. It involves only temporal issues, such as data synchronization.
b) Syntactic fusion, which represents the sequencing of information units to obtain meaningful commands.
c) Semantic fusion, which is the binding of information units at the functional level.

Multimodal information blending can be obtained through syntactic and semantic fusion. Modality blending needs to be implemented at the lower levels to ensure the separation of multimodal interface modeling from the application. Syntactic and semantic fusion requires a unified textual representation of modal input. The resulting advantage is that natural language processing methods can be applied to extract the semantics of human-computer interaction.
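The three fusion levels can be illustrated with a small sketch; the event names, token format and one-rule "grammar" below are assumptions made purely for illustration, not part of the layered model itself.

```python
# Hedged sketch of lexical, syntactic and semantic fusion chained together.

import time

def lexical_fusion(raw_events):
    # Bind hardware primitives to software events and time-stamp them.
    return [{"token": e, "t": time.time()} for e in raw_events]

def syntactic_fusion(tokens):
    # Sequence the time-ordered tokens into a candidate command string.
    ordered = sorted(tokens, key=lambda tok: tok["t"])
    return " ".join(tok["token"] for tok in ordered)

def semantic_fusion(sentence):
    # Bind the command to a functional unit of the application.
    grammar = {"delete object_3": lambda: print("object_3 deleted")}
    return grammar.get(sentence, lambda: print("unknown command"))

action = semantic_fusion(syntactic_fusion(lexical_fusion(["delete", "object_3"])))
action()
```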

MULTIMODAL INPUT INTEGRATION AND SYNCHRONIZATION


Integration and synchronization of multimodal input consist of mapping the modal input data into a semantic interpretation.

The steps of multimodal input integration are:
1. Unified Representation
2. Temporal Alignment
3. Semantic Analysis

The first step is unification of data representation. A unified representation of the modal input information stream considers each modality as a sequence of tokens contributing to the semantics of human-computer interaction. Each token is time-stamp marked. The second step is data fusion, which can take place at the three levels defined earlier; here the tokens are aligned on the time axis and input to the semantic analysis module. The third step, semantic analysis, uses natural language processing methods to extract control sentences. Several mechanisms have been proposed for multimodal integration: frames, neural nets and agents. Integration and synchronization of input modalities are the foundation of intelligent user interface (IUI) design. Modality cognition generates intelligent interfaces which increase user interaction in virtual environments. Software agents mediate interaction modalities in order to provide natural interaction. Grammar-based methods and expert systems, added on top of the syntactic representation of multimodal input, extract the semantics of human-VE interaction, enhancing the interface. According to Oviatt (1999), users have a strong preference for multimodal interaction over unimodal interaction. This preference is most pronounced in the spatial application domain. Experiments with a 2D map application show that users are likely to express commands multimodally when describing spatial information (location, number, size, orientation, or shape of objects). For many applications speech is not the primary input mode; user commands carrying spatial information are best sent to the computer through gestures. Simple speak-and-point interfaces were considered too rigid and have limited practical use. Robust multimodal architectures integrate the input modes synergistically for disambiguation of signals; they take advantage of modality-specific information content, of redundancy, as well as of complementary input modalities.
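A minimal sketch of the three steps is shown below, assuming an invented (type, value, timestamp) token format and a 500 ms alignment window; neither is prescribed by the text.

```python
# Illustrative sketch of unified representation, temporal alignment and
# semantic analysis over two hypothetical token streams.

SPEECH = [("word", "delete", 1.20), ("word", "that", 1.45)]
GESTURE = [("point", "object_7", 1.50)]

def unify(*streams):
    # Step 1: unified representation - every modality becomes (type, value, timestamp).
    return [token for stream in streams for token in stream]

def align(tokens, window=0.5):
    # Step 2: temporal alignment - order tokens and group those within the window.
    tokens = sorted(tokens, key=lambda t: t[2])
    groups = [[tokens[0]]]
    for tok in tokens[1:]:
        if tok[2] - groups[-1][-1][2] <= window:
            groups[-1].append(tok)
        else:
            groups.append([tok])
    return groups

def interpret(group):
    # Step 3: semantic analysis - resolve deictic speech with the aligned gesture.
    words = [v for kind, v, _ in group if kind == "word"]
    targets = [v for kind, v, _ in group if kind == "point"]
    if "delete" in words and targets:
        return {"command": "delete", "target": targets[0]}
    return None

for group in align(unify(SPEECH, GESTURE)):
    print(interpret(group))   # {'command': 'delete', 'target': 'object_7'}
```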

MULTIMODAL ARCHITECTURES
Development of multimodal systems presents several hardware and software integration challenges. There is no toolkit available to help developers integrate different hardware interfaces for multimodal interaction. Even in cases where the input device has an associated API, integration is not easy. Most of the APIs were not developed for multimodal input interfaces and have limitations when used as research tools. Additionally, real-time speech recognizers, gesture recognizers and haptic rendering systems represent a significant computational load. These require either multi-processor single platforms or distributed computing platforms. The main workload components of a multimodal system typically run on separate machines. Input devices attached to their own computation platforms, running a driver and data processing software, communicate through a network interface.
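This distributed pattern can be sketched as an input-device host shipping time-stamped tokens to the integration host over the network; the host address, port and JSON message format below are arbitrary choices for illustration, not a prescribed protocol.

```python
# Hedged sketch of a device host sending tokens to the integration machine.

import json
import socket
import time

INTEGRATION_HOST = ("127.0.0.1", 9999)   # machine running multimodal integration

def send_token(kind, value):
    # Package a time-stamped token and send it over UDP.
    message = json.dumps({"type": kind, "value": value, "t": time.time()})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), INTEGRATION_HOST)

# e.g. the gesture host, after its local recognition step:
send_token("point", {"x": 130, "y": 245})
```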

VE WITH MULTIMODAL FEEDBACK


Most of the early VR systems aimed at improving the quality of the visual channel, since it provides additional simulation cues and increased user interactivity. 3D sound rendering was incorporated in a head-mounted display by Wenzel et al. This system displayed 3D soundscapes correlated with the 3D graphics environment. Architectural acoustics for CAD/CAM uses direct room geometry and incorporates absorption/reflection properties. Storms et al. implemented a 3D sound system to be used in the distributed virtual environment of NPSNET.

Several medical applications incorporate force feedback to enhance the user-VR interface. A dexterous haptic interface was used for liver palpation training for the detection of malignancies. Another medical application is a training system for the diagnosis of prostate cancer. The VR simulator of prostate cancer palpation used a PHANToM haptic interface coupled with an SGI workstation rendering the graphics and the force feedback. The blood vessels were modeled using the Visible Human dataset and TELEOS. Physical modeling was used to create realistic physiological responses, including varying blood flow, heart rate and contractility, puncturing of the blood vessel, and arteriosclerotic plaque.

CONCLUSION
Multimodal interfaces are just beginning to model human-like sensory perception. They are recognizing and identifying actions, language and people that have been seen, heard, or in other ways experienced in the past. They literally reflect and acknowledge the existence of human users, empower them in new ways, and create a voice. They also can be playful and self-reflective interfaces that suggest new forms of human identity as we interact face to face with animated persons representing our own kind. In all of these ways, novel multimodal interfaces, as primitive as their early bimodal instantiations may be, represent a new multidisciplinary science, a new art form, and a socio-political statement about our collective desire to humanize the technology we create.

