Offscreen Interaction
Karampis Panagiotis
Supervisor: Sebastian Boring
Department of Computer Science, University of Copenhagen
Universitetsparken 1, DK-2100 Copenhagen East, Denmark
lwh738@diku.alumni.dk
April 2015
Abstract
Modern desktop paradigms are operated through a set of keyboard combinations, mouse clicks and even mouse-pad gestures, to which users are tied and with which, after so many years of usage, they interact fluently. Despite the vast evolution of the ways available to interact with virtual reality, the fundamental principle of interaction remains the same: the use of concrete, well-known physical devices (keyboard, mouse) attached to the computer. We present Offscreen Interaction, a system that utilizes the spatial area around the screen as a window storage area while we interact with the computer screen through a pluggable, gesture-recognizing device. The aim is to understand how users react to the presence or absence of visual feedback, and whether grouping windows while positioning them affects performance when no feedback is given. In a user study, we found that interaction was most efficient and effective when visual feedback was given; when no visual feedback was given, we observed that participants achieved the highest performance by grouping windows or applying some subjective methodology.
Acknowledgements
To my family and to Lela, who endlessly supported me through all this effort...
Contents

1 Introduction
    1.1 Problem Description
    1.4 Thesis Structure
2 Related Work
    2.1 Secondary Means
    2.2 Virtual Space Management
    2.3 Space Around Displays
    2.4 Mid-air Interaction
    2.5 Summary
3 Design
    3.1 Considerations
    3.3 Gestures
    3.4 Feedback
    3.5 Frameworks
4 Implementation
    4.1 Application Description
    4.3 Hardware
    4.5 Basic workflow
    4.8 Application Identification
        4.9.2 Implementation of boxes
5 User Study
    5.1 Experimental Set-up
    5.2 Experiments Procedure
        5.2.3 Experiment
        5.2.4 Demographic Information
        5.2.5 Assessment
    5.3 Participants
6 Results
    6.1 Completion Time
    6.2 Error Rate
    6.3 Peculiar Observations
    6.4 Subjective Preferences
7 Discussion
8 Conclusion
    8.1 Conclusion
    8.2 Future Work
A Code snippets
List of Figures

1.4 The user drags and drops a window into the off-screen area; the foreground is occupied by another window.
6.2 Participant 1's grouping.
List of Tables

5.3 Demographic information
Chapter 1
Introduction
Window switching is one of the most frequent tasks a user performs and can occur several hundred times per day. Numerous window operations, such as moving, resizing and switching, are performed when we work with multiple windows, and managing this activity well would enhance users' computer experience and effectiveness. Window switching can become a genuinely complicated task: a developer, for example, may need to switch to the browser to look up documentation, switch back to the IDE to write code, switch to the terminal to test and push the code, check emails and update completed tasks. Window switching is unavoidable; even on larger screens, users tend to consume all available screen space and create even more windows to navigate to [23]. Generally, users rely on the operating system's window manager to provide a convenient way to manage open windows according to their needs, so that windows are easy to retrieve. Switching windows divides into two subtasks: finding the desired window, and then bringing it to the foreground. Widely accepted techniques for both subtasks use either the mouse directly or keyboard key combinations.
1.1
Problem Description
Interacting with objects scattered across the screen using the mouse can be troublesome, as the mouse's movement is limited to the area of the desk. It gets even more problematic when drag and drop is involved, where one miss-click restarts the whole process. Furthermore, the mouse cursor is confined to the screen, so any interaction is restricted to this plane. 3D interaction with the screen has been researched, and several input devices have been developed that let users manipulate virtual reality (VR), for example through virtual hand and depth cursor techniques. Such techniques, along with their corresponding input devices, have been considered inadequate and appropriate only under specific circumstances. Additionally, the hardware used is often far too expensive, such as the body-tracking systems used in the cinema industry [53], [42].
Several window managers (WMs) have been implemented to visualize this process. Windows 7 presents each window in its own frame, even for windows of the same application. The Mac's Mission Control tiles open windows so that they are all visible at once, while stacking windows of the same application, for all applications that belong to the same desktop or workspace. Gnome 3 assigns each window its own frame but presents windows of the same application stacked. The way to navigate between open windows varies between window managers. The combination Alt+Tab (Cmd+Tab) is used across all systems for consistency, although cycling between windows of the same application varies from one window manager to another. However, Robertson et al. [47] showed that stacking windows by application confuses many users, because an application's windows may not be related to the same task, where a task is defined as a collection of applications organized around a particular activity.
1.2
We propose a different technique for switching between tasks, one that brings human-computer interaction (HCI) into the window-switching function. Among the many HCI modalities, including body posture, hand/finger gestures, speech recognition, eye movement and so on, we chose hand gestures, as they are the most natural and the easiest for users to become familiar with.
Using Mac OS X's Mission Control to switch between applications can be cumbersome. For example, windows are rearranged every time another window gets focus or a new one is created. Consider the following scenario: the user is writing a document, so the application Pages is selected; the user then wants to switch to Sublime, activates Mission Control by pressing F3, and selects Sublime in the upper left corner. Figure 1.1 shows the windows' initial arrangement.
At that point the user wants to switch back to Pages and presses F3 again. Figure 1.2 now shows a completely different screen, where the user has to gaze around and find where the Pages window is now located.
From that point on, switching between Pages and Sublime does not trigger rearrangement, unless a third task interleaves. Then, again, the user has to find the correct window in a rearranged list.
We experienced the same behavior with the common Cmd+Tab paradigm: after switching between tasks using the keyboard, Mission Control had rearranged the windows. A preliminary experiment conducted with users both familiar and unfamiliar with Mac OS X showed that users were frustrated by this mechanic, were unable to understand the pattern, and stated that they expected the pattern of window ordering to be clear at first sight, rather than having to spend time investigating how windows are arranged. All users reacted positively to arranging windows in a linear form, such as in rows or columns.
The contribution of this research is to design a new human-computer interaction utilizing gestures in the off-screen spatial space. This allows a more effective task-switching paradigm in which desired windows are placed in, and retrieved from, the spatial space around the computer screen. This spatial space is defined by the cone-shaped area in which the LEAP Motion operates. Figures 1.3, 1.4 and 1.5 demonstrate this functionality.
Figure 1.4: The user drags and drops a window into the off-screen area. The foreground is occupied by another window.
Figure 1.5: A window that has been dragged into the off-screen area is now available to be retrieved by reversing the process.
1.3
By the term 2.5D we mean that although we interact with a typical 2D computer screen, the actual pointing happens in 3D space.
Another factor that would vastly improve performance and effectiveness is grouping according to subjective patterns, or following some specific subjective methodology, when positioning windows in the off-screen area. As we explain in the experiment (Section 5.2.3), we would like to see whether such formations exist, as we consider that they would give us directions for future research.
1.4
Thesis Structure
The structure and content of the thesis are briefly described below. This work consists of 8 chapters, including this one, structured as follows:
Chapter 2: Publications related to several forms of interaction are presented (secondary means, 2.1; virtual space management, 2.2; using the space around displays, 2.3; mid-air interaction, 2.4). We quote papers that are highly relevant to our work and others in similar fields, as this study merges a variety of considerations. This chapter aims to introduce the reader to the field and to help them comprehend its current limitations and approaches.
Chapter 3: We present the design of our implementation and analyze the considerations behind Offscreen Interaction, backing the choices made with scientific facts. The major design choices (off-screen design, gesture choice and feedback modes), which are listed and explained, help the reader understand our work better and act as a preliminary but required step before reading about the implementation.
Chapter 4: We present the implementation of the prototype application built to test Offscreen Interaction. We provide thorough details on how it works, its architecture and workflows, and how the prototype was built. Small code snippets can be found in this chapter, while the functions we consider most important can be found in Appendix A.
Chapter 5: All information regarding the user study we performed is included here. We present and explain the phases of the experiment, along with its modes, for each participant who took part in the study. Demographic statistics are shown at the end of the chapter.
Chapter 6: We present the results of the user study, separated into corresponding categories, each representing a specific metric we tested. We also visualize the results through graphs and state the observations we made after investigating the log files containing data on the positioning and ordering of windows during the experiment.
Chapter 7: First, we state our hypotheses about the results. Second, we discuss the results based on the findings of the ANOVA analysis performed earlier and on the participants' assessments. Finally, we comment on whether the overall concept and the assumed hypotheses hold.
Chapter 8: The final conclusions are presented, along with identified and proposed research areas for future work.
Chapter 2
Related Work
The work described in this thesis aims to develop a prototype model for interacting with computer displays using an external device that bridges the human's 3D environment with the desktop metaphor of the computer screen, in order to perform task-switching operations.
This chapter provides an overview of existing techniques, technologies and implemented frameworks. Although there is no clear distinction between them, we categorized existing work into four themes: using secondary means, space around displays, virtual space management, and mid-air interaction.
2.1
Interacting with a display by using secondary means indicates that the user can manipulate and interact with what is shown on a display through another interface. Such an interface can be the screen of a mobile phone, PDA or touchpad, or a tabletop, using gestures with the dominant hand or with both hands.
A secondary display, such as a PDA's, allows a user to remotely switch between top-level tasks and scroll in context using the non-dominant hand while the dominant hand operates the mouse [39]. While the system provides such advantages, it is worth noting that switching tasks through a PDA's screen can be cumbersome and prone to selecting the wrong task, due to the small display resolution. Although current technology gives PDAs higher resolutions, fast switching between tasks is limited by having to focus on the PDA's screen, identify the correct task in the list, select it, and focus back on the computer's screen.
Numerous researchers have explored the manipulation of objects in the area of computer design and animation. Multi-point touch tablets have existed for a long time [34], yet the multi-finger cursor techniques [38], implemented on a modified touchpad, and the Visual Touchpad [36], which projects hand gestures onto a computer screen, allow
2.2
Virtual space management refers to the concept of enhancing the user experience (UX) by applying algorithms and operations to existing or newly opened windows on the desktop. Several studies have investigated optimizing the way windows resize, categorizing windows according to users' needs, or even providing an entirely new desktop metaphor.
The pioneer in this area, Rooms [21], is a system in which similar tasks are assigned to a separate virtual space (a room), while sharing windows between similar tasks is supported. The Task Gallery, a 3D window manager [48] and successor of Rooms, provides a 3D virtual environment for window and task management where a 3D stage shows the current task, while other tasks are placed at the edges of the stage,
namely on the floor, ceiling and walls. Each task comprises its related windows. To give users an overview of open tasks, a navigational system is introduced in which users can move backward or forward, and thus see more tasks or focus on one. While the system provides advantages and animation techniques, it also has disadvantages for users who want to complete many different tasks in a short time span. If, for example, a user wants to switch between two or more tasks, they have to move backwards until they find the required task, select it, find the desired window in the loose stack, and operate on it. To switch to the previous task the user has to reverse the whole process. The complexity of such navigation thus grows with the number of tasks the user needs to perform. Furthermore, the time required to switch between tasks is increased by the one-second animations for moving forward/backward and opening/closing tasks.
Elastic Windows [29] and New Operations for Display Space Management and Window Management [25] endorse the need for new, advanced windowing operations for users with many tasks. The Elastic Windows implementation is based on managing and displaying windows hierarchically, with the desktop as the root window. The system supports inheritance, where some characteristics of a parent window can be passed on to its children. Its philosophy is that windows under the same parent cannot overlap but must consume all the available space. One side effect of this philosophy is that when one window resizes, all of its children and its siblings are also resized, so a single resize can rearrange the whole desktop. Another side effect, arising directly from the no-overlap rule, is that as the number of child windows increases, the effectiveness of the displayed information decreases. Similarly, QuickSpace [24] automatically moves windows so that they are not overlapped by a resizing window. All these techniques rely on the existence of empty space, which may often be unavailable even on multi-monitor systems, as shown by Hutchings et al. [23].
Scwm [8] is a parameterized window manager for the X11 window system based on defining constraints between windows. These constraints are presented as user-defined relationships between windows. While the system offers advantages, such as operating on two related windows as one, it scales poorly with multitasking: as the number of windows grows, so do the number and complexity of the relationships that must be defined and maintained. Yamanaka et al. [60] approach virtual space management differently. Instead of algorithms that adjust window space on window creation or resize, their Switchback Cursor exploits the z-axis ordering of overlapped windows on a Windows 7 operating system: the mouse cursor, upon a specific movement combined with a specific key press, traverses and selects windows that lie below the visible one.
One approach to the fact that users work on different tasks in parallel, switching back and forth between different windows, is to analyze user activities and assign windows to tasks automatically based on whether windows overlap [59]. Another approach addresses fast switching by analyzing window content, relocation,
2.3
The theme of space around displays refers to a concept in which the physical (empty) space close to the interaction target, such as a computer or mobile screen, is used in conjunction with external sensors, such as depth cameras and hand-gesture-recognizing devices, to provide an interaction channel between the human and the computer.
The Unadorned Desk [20], which is our inspiration, is an interaction system that utilizes the free space on a regular desk enhanced with a sensor. It virtually places the off-screen, secondary workspace onto the desk, freeing more screen space for the primary workspace and thus acting as an extra input canvas. The experiments conducted with the Unadorned Desk showed that interaction with virtual items on the desk is possible when the items are of reasonable size and number, with or without on-screen feedback.
The use of a Kinect depth camera mounted above the desk limits the mobility of the system and requires the user to stay in a specific place at the desk. As the Kinect camera is by nature prone to false detections in sunlight, it has to be placed carefully. Although the system works well when few items are virtually placed on the extra input canvas, the desk's surface has to be clear of physical objects, which is not always the case, as desks tend to be messy.
Virtual Shelves [35] and Chen et al. [12] combine touch and orientation sensors to create a virtual spatial space around the user that allows invoking commands, selecting menus and interacting with mobile applications in an eyes-free manner. Wang et al. [55] demonstrated the benefits of hand tracking for manipulating virtual reality and 3D data in a CAD environment, using two webcams to track six degrees of freedom (6DOF) for both hands. However, such tasks are restricted to controlled, small areas targeting specific frameworks (CAD). The Kinect camera is widely used for augmented and virtual reality. MirageTable [9], HoloDesk [22], FaceShift [57] and KinectFusion [26] form a set of applications supporting real-time, physics-based interactions; however, as noted by Spindler et al. [51], tracking with depth cameras still has limited precision and reliability. SpaceTop [33] also uses a Kinect depth camera, together with a see-through display. Although it allows 3D spatial input, a 2D touchscreen and keyboard are also available for input. The unclear visual representation for guidance, as noted by the authors, is a subject we need to take seriously. Physical space requires extra sensors, extra cameras and, most importantly, has to be free of objects. Our goal
is to move the interaction area from solid objects, such as a desk, into a virtual area around the screen, and thus eliminate the additional sensors and cameras that make a system less portable.
2.4
Mid-air Interaction
There are occasions where interaction with a display has to happen from a greater distance than standing in front of the screen. By mid-air interaction we mean that the communication channel between user and display is the air/empty space, operated through laser pointers, a Wiimote, virtual screens or even worn gloves.
Uses of mid-air interaction vary from point-and-click to the manipulation and selection of 3D data. The latter was implemented by Gallo et al. [14] using a Wiimote controller to manipulate and select 3D data in a medical environment. Although the system was not formally evaluated, it was able to differentiate between two states: pointing and manipulation. In the pointing state, the Wiimote acts as a point-and-click laser pointer, while in the manipulation state it interacts with a 3D object through the available actions of rotating, translating and scaling.
In [28], the authors use both a Kinect depth camera and skeletal tracking [30] to identify pointing gestures made by a person standing in front of the camera. The spatial area in front of the camera is treated as a virtual touch screen at which the user can point. To detect the direction of a pointing gesture, they detect and track the pointing finger using a minimum bounding rectangle and Kalman filtering.
Interaction techniques using lasers have the advantage of low cost and constitute the best-known perspective-based technique. A laser beam can act as a pointing control in a multi-display system. At the same time, laser beams suffer from the limitation that there are no buttons to augment the single point of tracked input, rendering mouse operations impossible. Additionally, laser pointer techniques suffer from poor accuracy and stability [41] and can be very tiring for sustained interactions. Olsen et al. [41] proposed a non-direct input system and explored the use of dwell time to perform mouse operations. However, the installation cost and complexity of most such systems are prohibitive at scale.
Kim et al. [32] tried to address these limitations by researching ways to embed the sensor on the body, specifically in a wrist-worn device that recognizes body movements and thus reduces the need for external sensors. The wrist-worn device consists of a depth camera combined with an IMU, and recognizes finger movements through the use of biomechanics.
Laser interaction generally does not allow the recognition of gestures, and although Jing et al. [28] implemented a point-and-click system, that system is stationary. The Wiimote lacks the ability to identify finger gestures, while worn sensors and gloves require the user to attach an external device to the skin, which can be uncomfortable. We propose a system that is mobile, with no skin-attached devices, which identifies
hand gestures for both hands and, by extension, provides the functionality of augmented buttons if required.
2.5
Summary
We have seen that there is a variety of channels, among them the air, third interfaces and cameras, for interacting with a computer screen, using a huge range of methods and tools such as worn devices, finger biomechanics, laser pointers, mobile phones, game controllers and so on. Each of the aforementioned themes has situational advantages and drawbacks, and comes with implementations that provide an alternative user experience and interaction. This interaction with the computer screen comes at the cost of introducing either rather complex, non-mobile interaction systems, or window management that relies on algorithms rather than on the user's wishes.
We therefore propose a work that combines features from virtual space management and mid-air interaction, keeping the interaction medium (mid-air) as simple as possible. At the same time, we provide a trivial window-management metaphor of select and then show or hide, giving users the ability to resize and position windows the way they want.
Chapter 3
Design
In this chapter we describe the considerations to follow in a virtual reality interaction suited to our needs: the off-screen area, the gestures, the feedback modes, and the choice of framework supporting the interaction.
3.1
Considerations
Selection on screen: The initial event, which enables the interaction by identifying the collision of a virtual object (any at least partially visible window) with the mouse cursor upon a grab gesture.
Selection off screen: The initial event, which enables the interaction by identifying the hand coordinates in the off-screen area upon a grab gesture.
Drop in off-screen: The second-stage event that, upon a release gesture in the off-screen area, removes focus from the selected window (hide).
Drop in screen: The second-stage event that, upon a release gesture in the main screen area, gives focus to the selected window (show).
Select off screen & drop off screen: A two-stage event that cognitively moves an already hidden window to another box in the off-screen area, allowing the organisation of windows.
It is worth mentioning that since our system is a 2.5D pointing system with no need for interaction along the Z axis, we manipulate virtual objects with four degrees of freedom (4DOF), namely up/down and left/right.
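The events above form a small state machine: a grab starts a selection (on screen or off screen), and the subsequent release decides whether the window is hidden, shown, or moved between boxes. The following Python sketch illustrates that logic only; class and action names are hypothetical, and the actual implementation is in Objective-C.

```python
class InteractionStateMachine:
    """Tracks one grab/release cycle of the 2.5D interaction."""

    def __init__(self):
        self.selected = None          # window (or box) grabbed, if any
        self.grabbed_offscreen = False

    def on_grab(self, in_offscreen, target):
        # Selection on screen / off screen: remember what was grabbed.
        self.selected = target
        self.grabbed_offscreen = in_offscreen

    def on_release(self, in_offscreen):
        # Decide the second-stage event from where the release happens.
        if self.selected is None:
            return "no-op"
        if not self.grabbed_offscreen and in_offscreen:
            action = "hide"       # drop in off-screen: remove focus
        elif self.grabbed_offscreen and not in_offscreen:
            action = "show"       # drop in screen: give focus
        elif self.grabbed_offscreen and in_offscreen:
            action = "move-box"   # reorganize hidden windows
        else:
            action = "no-op"      # grabbed and released on screen
        self.selected = None
        return action
```

For example, grabbing a visible window and releasing the hand above the screen yields "hide", while the reverse cycle yields "show".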
3.2
The interaction box of the physical input device gives us enough space to cognitively extend the screen area on its top side; this space serves for cognitively saving data for assigned windows. The area is as wide as the screen and is scalable (up to a limit) to the screen proportions.
Smith et al. [50] observed that the average number of open windows a user keeps per session is 4 on a single display, while dual-monitor users keep 12 windows open. Based on these observations, we chose to divide the off-screen area into 8 boxes, allowing us to save state for 8 windows.
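Since the off-screen area spans the screen width and is split into 8 equal boxes, mapping a hand position to a box reduces to scaling the x coordinate. A minimal sketch of that mapping (illustrative, not the thesis code):

```python
NUM_BOXES = 8

def box_for_hand(hand_x, screen_width):
    """Map a hand x coordinate (in screen pixels, 0..screen_width-1)
    to one of the off-screen boxes, numbered 0..NUM_BOXES-1."""
    if screen_width <= 0:
        raise ValueError("screen width must be positive")
    # Clamp so positions slightly outside the screen edges still map
    # to the first/last box instead of an invalid index.
    hand_x = min(max(hand_x, 0), screen_width - 1)
    box_width = screen_width / NUM_BOXES
    return int(hand_x // box_width)
```

On a 1600-pixel-wide screen, each box covers 200 pixels, so a hand at x = 799 falls into box 3.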
3.3
Gestures
The gestures implemented are classified as concrete: they are evaluated only after they have been completely performed; e.g., selection on screen is only valid once the grab gesture has been fully completed.
Figure 3.2: Grab gesture sequence. The release sequence is the grab reversed.
Furthermore, the mouse cursor can only be over one window at a time, so only one window can be identified per grab. In addition, interaction is performed with one hand only, either the dominant or the secondary hand, according to user preference.
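Concrete gestures such as grab and release can be derived from a continuous grab-strength signal of the kind the Leap SDK reports (0 for an open hand, 1 for a fist), using two thresholds so the state does not flicker near the boundary. A sketch with illustrative threshold values, not the thesis's actual detector:

```python
GRAB_ON = 0.9    # hand must close at least this far to count as a grab
GRAB_OFF = 0.3   # and open at least this far again to count as a release

class GrabDetector:
    """Turn a continuous grab-strength signal (0 = open, 1 = fist)
    into discrete 'grab' / 'release' events using hysteresis."""

    def __init__(self):
        self.grabbing = False

    def update(self, grab_strength):
        if not self.grabbing and grab_strength >= GRAB_ON:
            self.grabbing = True
            return "grab"
        if self.grabbing and grab_strength <= GRAB_OFF:
            self.grabbing = False
            return "release"
        return None  # no state change on this frame
```

The gap between the two thresholds means a half-closed hand never triggers spurious grab/release pairs.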
3.4
Feedback
We designed the application to operate in three different modes: full feedback, single feedback, and no feedback.
The full feedback mode, shown in Figure 3.3, provides a window with information about all boxes, which is shown while the hand is in the off-screen area. The selected box is visually identified by a red border and, if a window has been cognitively saved, a screenshot taken at the grab event is shown to help the user identify the window when required.
The single feedback mode (Figure 3.4) provides a window with information about the specific box associated with the current hand position in the off-screen area. Apart from the screenshot, the user gets extra visual help in the form of the application's title, e.g. Chrome, shown as the window title.
The no feedback mode provides no informational window, nor any other information, while the hand is in the off-screen area.
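The three modes differ only in how much of the box state is rendered while the hand is off screen. Dispatching on the mode could look like the following sketch (the function and its return shape are hypothetical; the real application renders native OS X windows):

```python
def feedback_view(mode, boxes, selected_index):
    """Return a description of what the feedback window shows.
    boxes: list of saved window titles (None for an empty box)."""
    if mode == "none":
        return None  # no informational window at all
    if mode == "single":
        # Only the box under the current hand position, with its title.
        return {selected_index: boxes[selected_index]}
    if mode == "full":
        # All boxes at once, with the selected one highlighted.
        return {"boxes": list(boxes), "selected": selected_index}
    raise ValueError("unknown feedback mode: %s" % mode)
```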
3.5
Frameworks
The nature of this research forces us to use low-level programming languages, frameworks and libraries as close to the operating system as possible, while still using up-to-date programming paradigms (object-oriented programming). For this reason, languages that would require a wrapper to access native calls, such as Java, or languages mainly targeting web development, such as JavaScript, were excluded from consideration, even though the physical input device supports them.
Chapter 4
Implementation
Offscreen Interaction is an application that operates on a standard Mac with any screen size running OS X version 10.7 or higher. It cannot be ported to Windows or Linux because (a) it uses native OS X libraries and (b) it is intended as a replacement for Mission Control.
The implementation was developed in the Objective-C language using Xcode. Xcode, developed by Apple for both OS X and iOS, contains on-the-fly documentation for all libraries, alongside other utilities that enhance coding efficiency and experience, such as a debugger, tester, profiler and analysis tool. Introduced on June 2, Swift is a new programming language for writing Apple software. Xcode also supports the development of AppleScript, a scripting language built into OS X since version 7.
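AppleScript can drive the show/hide part of window switching from outside an application, for example by building a one-line script and running it with the standard `osascript` tool. The following is an illustrative sketch only (the thesis's actual snippets are in Appendix A); the helper names are hypothetical.

```python
import subprocess

def activate_script(app_name):
    """Build the AppleScript line that brings an application's
    windows to the foreground."""
    return 'tell application "%s" to activate' % app_name

def run_applescript(script):
    """Run a script via osascript (works on OS X only)."""
    return subprocess.run(["osascript", "-e", script], check=True)
```

For example, `run_applescript(activate_script("Pages"))` would ask OS X to focus the Pages windows.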
Important code snippets that were crucial to the implementation can be found in Appendix A.
4.1
Application Description
Offscreen Interaction is an application built to test the capabilities and the limits of interacting with desktop applications. It aims to provide an area above the screen plane that the user can interact with in order to organize and switch between windows. The interaction with the user is performed through hand gestures received by a motion-tracking input device. Although Offscreen Interaction is not a complete application, it embeds all the infrastructure required to support the new interaction we are proposing. The basic scenarios are the following:
When the user grabs a window and releases it in the off-screen area, we should be able to identify which window it is, save its information in memory and hide it.
When the user grabs a window from the off-screen area and releases it on the screen, we should be able to identify which window it was by checking the memory data and
4.2
Objective-C [40]. The primary programming language for developing OS X applications. Objective-C, as the name states, is an object-oriented programming (OOP) language and a superset of C.
Cocoa [13]. A high-level API that combines three other frameworks (Foundation, AppKit and CoreData), included via the header file Cocoa.h, automating many application features to comply with Apple's human interface guidelines. [50]
Quartz [46]. Provides access to lower-level graphics services; composed of the Quartz 2D API and the Quartz Extreme windowing environment.
AppleScript [7]. A scripting and narrative language for automating repetitive tasks.
Accessibility API [5]. Extra libraries targeting the assistance of users with disabilities.
Leap SDK ver. 2.2.2 [49]. Provides access through Objective-C calls to the physical input device chosen for this application (the LEAP Motion).
Carbon (Legacy) [6]. A legacy API acting as a bridge between Cocoa and the Accessibility API.
4.3
Hardware
For identifying gestures and bridging the real with the virtual space, the Leap Motion Controller is used. The controller is a small peripheral, measuring 3 x 1.2 x 0.5 inches and not much heavier than a common USB stick. It utilizes two cameras and three infra-red LEDs serving as light sources to capture motion and gesture information. The system is capable of tracking the movement of hands, fingers and several other objects within an area of 60cm around the device in real time. It can detect small motions and has an accuracy of 0.01mm [15].
The small cameras that the Leap Motion Controller comes with cannot extract as much information as systems that come with large cameras. Because the embedded algorithms extract only the data required, the computational analysis of images is modest, and therefore the latency introduced by the Leap Motion Controller is negligible. The fact that the controller is small and mostly software-based makes it suitable for embedding in other, more complicated VR systems.
Figure 4.1: The Leap Motion's size, and hand skeletal tracking in operation.
Although few details are known regarding the Leap Motion's algorithms and their advanced principles, as they are protected by patent restrictions, Guna et al. [16] attempt to analyze its precision and reliability for static and dynamic tracking. Official documentation [1] states that it recognizes hands, fingers, and tools, reporting discrete positions, gestures, and motion.
J. Samuel [27] categorizes the Controller as an optical tracking system based on the stereo vision principle; the interaction box of the controller is shown in Figure 4.3 below. The size of the InteractionBox is determined by the Leap Motion field of view and the user's interaction height setting (in the Leap Motion control panel). The controller software adjusts the size of the box based on the height to keep the bottom corners within the field of view [1], with a maximum height of 25cm.
Figure 4.3: Leap Motion interaction box: a reversed pyramid shape formed by the cameras and LEDs. [2]
4.4
The Leap Motion software runs over the USB port as a system service that receives motion tracking data from the controller. Using dylibs (dynamic libraries) on the Mac platform exposes these data to the Objective-C programming language. Furthermore, the software supports connections over a WebSocket interface in order to communicate with web applications.
Figure 4.4 shows the architecture of the Leap Motion Controller, which consists of:
Leap Service, which receives and processes data from the controller over the USB bus and makes the data available to a running Leap-enabled application.
Leap control panel, which runs separately from the service, allowing configuration of the controller, calibration and troubleshooting.
Foreground application, which receives motion tracking data directly from the service while the application has focus and is in the foreground.
4.5
Basic workflow
module offers the variation between the three feedback modes, thus allowing the user to first get familiarized with the cognitive offscreen area before actually conducting the experiment.
4.6
In this work, we have implemented an algorithm that performs hide/show actions on open applications, as well as organizing them in the offscreen area, based on hand coordinates and the two gestures of grab and release. Figures 4.6 and 4.7 demonstrate how the algorithm works in general, before we get into details in the next sections. The algorithm starts once a hand is recognized by the Leap service, and it executes per frame based on the tracking data given (coordinates and handStrength).
It is vital to refer to a variable exposed by the API named handStrength, part of the LeapHand class, which indicates how close a hand is to a fist by measuring how curled the fingers are. Based on that, we can say that fingers that are not curled will reduce the grab strength and therefore the probability of identifying a grab gesture, whilst fingers that are curled will increase the grab strength and therefore reduce the probability of identifying a release gesture. This variable takes values in [0..1]; we experimentally determined that a value >= 0.8 indicates a grab and a value <= 0.2 indicates a release, allowing us to identify the gesture in real time and not only when a gesture suddenly appears.
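The two thresholds above form a small hysteresis state machine: intermediate strength values keep the current state, so a gesture fires only once per crossing. The following is an illustrative Python sketch of this logic, not the prototype's Objective-C code; the class and method names are our own.

```python
# Illustrative sketch of the grab/release hysteresis described above.
# handStrength in [0..1]: >= 0.8 registers a grab, <= 0.2 a release.
GRAB_THRESHOLD = 0.8
RELEASE_THRESHOLD = 0.2

class GestureTracker:
    def __init__(self):
        self.holding = False  # are we currently in a "grab" state?

    def on_frame(self, hand_strength):
        """Process one frame; return 'grab', 'release' or None."""
        if not self.holding and hand_strength >= GRAB_THRESHOLD:
            self.holding = True
            return "grab"
        if self.holding and hand_strength <= RELEASE_THRESHOLD:
            self.holding = False
            return "release"
        return None  # in-between values keep the current state

tracker = GestureTracker()
events = [tracker.on_frame(s) for s in (0.1, 0.5, 0.85, 0.9, 0.4, 0.15)]
# Only the 0.85 frame triggers a grab and only the 0.15 frame a release.
```

The gap between the two thresholds prevents hand tremor around a single cutoff from generating spurious grab/release pairs.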
Figure 4.6: Gesture algorithm workflow: From initial state to grab state.
Figure 4.7: Gesture algorithm workflow: From grab state to release state.
23
4.7
The Leap Controller provides finger coordinates in real-world units (mm); thus it is vital to translate these coordinates into screen pixels according to the screen resolution. The SDK provides methods to normalize values into the [0..1] range and obtain screen coordinates. We need to keep in mind that the top-left corner in OS X is (0,0), whilst (0,0) in the Leap is at the bottom left; thus we need to flip the Y coordinate.
LeapInteractionBox *iBox = frame.interactionBox;
LeapVector *normalizedPoint = [iBox normalizePoint:leapPoint clamp:YES];
int appX = normalizedPoint.x * screenWidth;
int appY = normalizedPoint.y * screenHeight;
appY = screenHeight - appY;
Having the screen coordinates, it is then trivial to control the mouse with the hand:
CGEventRef move1 = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved,
    CGPointMake(appX, appY), kCGMouseButtonLeft);
CGEventPost(kCGHIDEventTap, move1);
4.8
Application Identification
The next algorithm shows the steps to identify the window under the mouse cursor after a grab gesture is performed in the in-screen area:
GrabIsFinished AND isInScreen
  Get mouse location using Cocoa
  Convert mouse location to CGPoint using Carbon API
  Get AXUIElementRef by CGPoint using Accessibility API
  Extract application PID from AXUIElementRef
  Extract application Title from AXUIElementRef
  Generate image of application given the window Title:
    Scan windows in screen
    If kCGWindowOwnerName is equal to extracted Title
      Get the windowID
      Create bitmap of application by windowID
  Save data in custom object for use upon release.
As we can see, three APIs (Cocoa, Carbon, Accessibility) cooperate for a single window identification.
4.9
In the Design chapter (3.2), we defined the offscreen area abstractly. Now we show how this cognitive area is implemented by combining screen coordinates and the Leap Controller's coordinate system.
4.9.1
In section 4.7 we normalized the hand coordinates to fit the screen dimensions, thus limiting the X and Y values to within the screen. In order to overcome this, we also save the hand coordinates (X,Y) to variables before the normalization takes place. Given that the interaction box of the Leap starts at 82.5mm, which corresponds to pixel 0, and ends at 317.5mm, which corresponds to the screen height (fig. 4.8), we can say that pre-normalized values higher than 317.5 refer to pixels cognitively greater than the screen size. Indeed, using the formula p = (yk - y0) * h / (yn - y0), where p is the (cognitive) pixel, yk is the hand position in millimeters, h is the screen height (=900) and y0, yn are the bottom and upper heights of the interaction box, we find that a hand position of e.g. 350mm refers to pixel 1024, which in our case is in the offscreen area.
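As a numerical sanity check of this mapping, here is a small Python sketch using the values stated in the text (y0 = 82.5mm, yn = 317.5mm, h = 900px):

```python
# Cognitive pixel position from hand height in millimeters:
#   p = (yk - y0) * h / (yn - y0)
Y0 = 82.5    # bottom of the interaction box (mm) -> pixel 0
YN = 317.5   # top of the interaction box (mm)    -> pixel h
H = 900      # screen height in pixels

def cognitive_pixel(yk_mm):
    """Map a hand height in mm to a (possibly offscreen) pixel row."""
    return (yk_mm - Y0) * H / (YN - Y0)

print(round(cognitive_pixel(350)))   # hand at 350mm -> pixel 1024
print(cognitive_pixel(350) > H)      # True: beyond the screen, i.e. offscreen
```

Values above h mark the hand as being in the offscreen area, exactly as derived above.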
Having explained mathematically the translation from millimeters to pixels, we can now define the offscreen area code-wise by applying if statements:
if (appY == 0 && yy >= LOWER_BOUND && yy <= UPPER_BOUND && appX > 0 && appX < screenWidth)
4.9.2
Implementation of boxes
We thus have the unique coordinates of any box, allowing us to build an Objective-C class and store its instances in an NSMutableDictionary using these unique coordinates as the identifying key. An instance of the box class (Figure 4.9) is created upon a release gesture in order to save the specific window's data, such as title, PID and screenshot, which were generated when the user performed a grab on an open window.
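A minimal Python sketch of such a coordinate-keyed box store follows; the names are illustrative rather than the prototype's Objective-C class, and the pop-out behavior (a release into an occupied box displacing its previous occupant, visible in the log listings of Chapter 6) is modeled explicitly.

```python
# Illustrative sketch of the offscreen box store: a dictionary keyed by
# box coordinates (col, row), mirroring the NSMutableDictionary described above.
class Box:
    def __init__(self, title, pid, screenshot=None):
        self.title = title            # window title
        self.pid = pid                # owning application PID
        self.screenshot = screenshot  # bitmap captured at grab time

boxes = {}

def release_in_box(coords, box):
    """Store a window's data in a box; if the box is occupied, the
    previous window 'pops out' and is returned to be shown on screen."""
    popped = boxes.get(coords)
    boxes[coords] = box
    return popped

def grab_from_box(coords):
    """Remove and return the window stored in a box, or None
    (the 'Nothing' error case when the box is empty)."""
    return boxes.pop(coords, None)

release_in_box((2, 1), Box("Pages", 211))
popped = release_in_box((2, 1), Box("Calendar", 212))  # Pages pops out
```

Keying on box coordinates makes both the pop-out case and the "Nothing" error a single dictionary lookup.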
4.10
Having defined the key algorithms and workflows above, we can now define the abstract workflow of our implementation, as shown in 4.4.
Initialization
  Load modules (as described in Chapter 4.5)
End
Main Loop (per frame) onFrame()
  While hand found do
    Move mouse according to normalized hand position
    If hand is in offscreen area AND feedback mode is not none
      Show feedback window
    End
    Measure hand grab strength
    When grab strength < 0.2 OR > 0.8, gesture is finished
      Get gesture and position of event
      If gesture was grab
        Apply algorithm as described in Fig. 4.6
      End
      If gesture was release
        Apply algorithm as described in Fig. 4.7
        Free resources
      End
    End
  End
End
Chapter 5
5.1
Experimental Set-up
In order to interact with the content and evaluate the interaction, we set up a testing area in which a fixed Mac with a 17" Retina display was placed on a table, easily accessible by the participants sitting in front of it. In front of the laptop, the Leap Motion controller is placed in such a way that a potential cable twist will not lean the device forwards or backwards, affecting the user experience.
The device is attached either on the left or on the right side of the laptop, depending on the participant's dominant hand and their preference.
5.2
Experiment Procedure
In this chapter, the procedure of the experiment is discussed and thoroughly analyzed. We conducted a user study to test and analyze Offscreen Interaction. Each user participating in this study should follow the instructions given to them, complete the experimental part and finally provide us with an assessment. The procedure followed for each participant consists of the following parts:
Welcome - Brief explanation of interaction - User Learn Mode - Experiment - Demographic information - Assessment
These parts will be discussed, interpreted and visualized with appropriate figures. Each experiment was calculated to last around 25 minutes, plus time for the instructor to explain the concept and for the participant to perform the final assessment; we thus ended up with a study that requires approximately 40 minutes. However, the time required for the whole procedure varied, as it depended on how much time the participant spent in the User Learn Mode in order to feel comfortable with the offscreen area. We also logged data other than timings and errors, data that might help us understand potential positioning patterns of windows placed in the offscreen area.
5.2.1
At the beginning of the experiment the experimenter introduces himself, welcomes the participant and explains their rights. Participants are allowed to withdraw from the experiment at any time, or if they feel uncomfortable or tired. After that, we thank them for participating and helping us in this study, and finally introduce them to its purpose, as well as stating the goals we are trying to achieve.
At that point, we start explaining the various aspects of the experiment. We first explain the ways they can interact with the system and the two gestures. Afterwards, the three feedback modes are thoroughly presented, in conjunction with an in-depth explanation of the offscreen area. Before proceeding to the actual experiment, we made sure that the participant understood the concepts of the interaction by encouraging them to use the User Learn Mode (5.2.2) in all feedback modes and familiarize themselves before the experiment.
Furthermore, participants were asked to take a close look at the specific applications they would operate on, as our prototype application could take screenshots of the majority of applications but not all. Thus, we were forced to use specific applications, some of which a participant might not be familiar with.
5.2.2
User Learn Mode
For reasons we will explain below (5.2.3), it is vital for the participant to spend a few minutes in the User Learn Mode, where no timings or errors are logged. In this part, the user tries in turn the Full Feedback, Single Feedback and No Feedback modes while receiving a brief explanation of the screen visualizations and their components, where applicable. When participants were ready, they could initiate the experiment by pressing the ESC key.
From this point onwards, the experimenter no longer interferes with the participant, as on-screen instructions are given.
5.2.3
Experiment
Principles
The experiment itself consists of cycling between the three feedback modes, where in each mode each participant is asked to operate on a random window sequence. When the experiment is initiated, the three modes follow the Balanced Latin Square algorithm [52]:
Participant    First Mode         Second Mode        Third Mode
1              Full Feedback      Single Feedback    No Feedback
2              No Feedback        Full Feedback      Single Feedback
3              Single Feedback    No Feedback        Full Feedback
Table 5.1: Feedback cycling for the first 3 participants; afterwards, the pattern repeats.
The reason for encouraging participants to spend time in the User Learn Mode is that even-numbered participants (2, 4, 6, 8) would have to conduct the experiment starting with the No Feedback mode, a mode that is considered hard, especially if one does not understand the position of the boxes in the offscreen area.
The same behavior is applied to the window list. The windows to operate on are stored in an 8x8 structure where each row contains window titles following the Balanced Latin Square algorithm, as shown in Table 5.2.
#    Window Titles
1    Wnd 1   Wnd 2   Wnd 8   Wnd 3   Wnd 7   Wnd 4   Wnd 6   Wnd 5
2    Wnd 2   Wnd 3   Wnd 1   Wnd 4   Wnd 8   Wnd 5   Wnd 7   Wnd 6
3    Wnd 3   Wnd 4   Wnd 2   Wnd 5   Wnd 1   Wnd 6   Wnd 8   Wnd 7
4    Wnd 4   Wnd 5   Wnd 3   Wnd 6   Wnd 2   Wnd 7   Wnd 1   Wnd 8
they want. Simultaneously, they get a notification of the number of windows that are still inscreen and their titles (fig. 5.3). In order to allow participants to pick windows of their choice, all windows are shown, randomly resized and then shuffled. This ensures that there are overlapping windows, and thus a participant can grab partially visible windows if they desire. Fig. 5.5 demonstrates this concept. When no windows are left visible on the desktop, the prototype asks the participant to show a specific window (fig. 5.4), based on the previously picked random row of the window order. At any time, the participant can ask the system which window was requested by pressing the ESC key.
To successfully complete a trial, the participant has to grab the requested application from the offscreen area and then release it in the inscreen area. Two errors can happen:
Wrong window. The participant grabbed a window other than the one requested; this error is visualized by a notification only after a release.
Nothing. The participant performed a grab gesture in a box that does not contain a window. This happens when one makes a gesture in a box which is now empty because its window already became visible in a previous trial. The error notification is visualized immediately.
(a) Chrome, (b) Firefox, (c) Eclipse, (d) Sublime, (e) Xcode, (f) Pages, (g) Skype, (h) Calendar
5.2.4
Demographic Information
3. Number of tasks
4. Computer experience
The Demographic form, as a subsection of the Assessment form, can be found in Appendix B.
5.2.5
Assessment
Having completed all trials in all feedback modes, participants were instructed to fill in a questionnaire on a five-point Likert scale, where they rated and optionally commented on each of the feedback modes they encountered. The questionnaire (Appendix C) was divided into three parts, one for each feedback mode. In particular, the participants had to provide an assessment for the following types of feedback:
1) Full feedback, 2) Single feedback, 3) No feedback
The questionnaire assessment has the same structure and is independent of the feedback mode. Each assessment consists of 10 questions, which are explained below along with their importance.
1. Grabbing windows during the experiment was: The participant had to rate how easily they could grab a window in the specific feedback mode. This question will help us identify whether the grab gesture affects question #5.
2. Releasing windows during the experiment was: The participant had to rate how easily they could release a window in the specific feedback mode. This question will help us identify whether the release gesture affects question #6.
3. Mouse movement smoothness during the experiment was: The participant had to rate the smoothness during the experiment. Smoothness is a subjective factor, and with this question we can assess whether the smoothness of the mouse while moving the hand inscreen interferes with selecting the window the user wants.
4. Dividing the offscreen area cognitively was: The participant had to rate how difficult it was to imagine the offscreen area and split it into boxes. The importance of this question is that we can assess whether splitting the offscreen area into two rows is a feasible solution.
5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: The participant had to rate how difficult it was to perform an operation from offscreen to inscreen.
6. It was easy to release (offscreen) in the area I wanted: The participant had to rate how difficult it was to target the box they wanted in the offscreen area.
Questions 5 and 6 are of high importance, since they give us feedback about the nature of the interaction.
5.3
Participants
(a) Distribution of participants' age. (b) Distribution of participants' gender. (c) Distribution of participants' computer experience. (d) Distribution of simultaneous tasks.
implies interaction with these tasks, as it is also possible that windows composing other tasks are open but the user is not using them (inactive).
Chapter 6
Results
This chapter presents the results collected from the user study on Offscreen Interaction. We decided to focus on the two basic categories that include the most significant information we collected and analyzed: Completion Time and Error Rate. We also present and illustrate the mean values for all parts of the assessment, as discussed in Section 5.2.5. We refer to the feedback modes of the trials with the abbreviations presented in Table 6.1.
Feedback mode    Abbreviation
Full             FF
Single           SF
None             NF
6.1
Completion Time
Completion time is a parameter that counts the time a participant spent from the moment they were asked to show a window until they actually showed that window. That means the timer is triggered upon the system notification that informs the participant of the requested window title, and it stops when this window is shown on screen. When the correct window is shown, a new system notification is triggered and the participant proceeds with the next timed interaction.
We applied the ANalysis Of VAriance (ANOVA) [4] statistical model, using the aggregated time values, to determine whether the mean completion times per feedback mode are statistically different. We could extract basic information about the means by simply comparing them, but we want to know how the differences in the mean values affect our results and whether these differences are significant.
Using repeated measures ANOVA, we found a significant main effect of feedback mode (F(1.064, 8.512) = 9.905, p < 0.012, with Greenhouse-Geisser correction). Post-hoc comparisons using Bonferroni-corrected p-values found significant differences between FF and NF (p < 0.034) and between SF and NF (p < 0.047), but no significant difference between FF and SF at a confidence level of 95%.
We found that Mauchly's Test of Sphericity was violated, which is why the Greenhouse-Geisser correction was applied; the ANOVA shows that the feedback mode has a significant effect on completion time. The Grand Mean is 12.903 seconds (STD=3.248s). Full Feedback (M=2.750s, STD=0.426s) was by far the fastest, followed by Single Feedback (M=5.111s, STD=2.060s), with No Feedback the slowest (M=30.847s, STD=8.733s).
Table 6.2 presents the mean time differences between all possible combinations of feedback modes. As we can see, and as already stated, the mean differences are significant between Full/Single Feedback and No Feedback.
Feedback mode (i)    Feedback mode (j)    Sig.
FF                   SF                   0.644
FF                   NF                   0.034
SF                   NF                   0.047
6.2
Error Rate
Error rate is a parameter that counts the number of errors a participant made from the moment they were asked to show a window until they actually showed that window. That means an error is registered when the user is required to show window X but instead grabs and releases window Y, or grabs Nothing. As described before, Nothing is an error where the participant performs a grab gesture within a box that previously contained a window but is now empty. The user is informed about the error by a system notification; otherwise they proceed with the next interaction.
We applied the ANOVA statistical model again, using the aggregated error rates, to determine whether the mean errors per feedback mode are statistically different.
Using repeated measures ANOVA, we found a significant main effect of feedback mode (F(2, 7) = 79.620, p < 0.001). Post-hoc comparisons using Bonferroni-corrected p-values found significant differences between FF and NF (p < 0.001) and between SF and NF (p < 0.001), but no significant difference between FF and SF at a confidence level of 95%.
We found that Mauchly's Test of Sphericity was not violated for the feedback modes, and the Grand Mean is 0.312 errors (STD=0.035). Full Feedback (M=0.090, STD=0.033) was the least error-prone interaction, followed by Single Feedback (M=0.145, STD=0.062), with the most errors reported in No Feedback (M=0.701, STD=0.062).
Table 6.3 presents the mean error differences between all possible combinations of feedback modes. As we can see, and as already stated, the mean differences are significant between Full/Single Feedback and No Feedback.
Feedback mode (i)    Feedback mode (j)    Sig.
FF                   SF                   1
FF                   NF                   <0.001
SF                   NF                   0.047
6.3
Peculiar Observations
In the process of applying ANOVA, we observed two cases worth mentioning, cases we were specifically looking for. Two of our participants outperformed the rest in terms of speed in the NF mode. The other 7 participants have an average completion time of 38s, whilst the two special cases report 3.63s and 6.88s respectively, while still keeping high error rates of 42.85% and 40%. These two events intrigued us to find the reason behind this observation, and after an in-depth examination of the log file we discovered a) subjective grouping when positioning windows in the offscreen area, and b) a pattern in positioning.
Figure 6.1 shows that participant one managed to position windows in all feedback modes in the same boxes. This was achieved by their own will and is not random behavior, as is confirmed by the log file entries (Listing 6.1). The final result of the grouped applications is shown in Figure 6.2.
To give meaning to the log entries presented below, we explain them with comments:
1: C: Chrome: (2,0)          // Put Chrome in box (2,0)
1: C: Skype: (0,0)           // Put Skype in box (0,0)
1: C: Sublime Text 2: (0,1)  // Put Sublime Text 2 in box (0,1)
1: C: eclipse: (1,1)         // Put eclipse in box (1,1)
1: C: Pages: (2,1)           // Put Pages in box (2,1)
1: C: Firefox: (3,0)         // Put Firefox in box (3,0)
1: C: Calendar: (2,1)        // Put Calendar in box (2,1); Pages pops out
1: C: Xcode: (0,0)           // Put Xcode in box (0,0); Skype pops out
1: C: Pages: (0,0)           // Put Pages in box (0,0); Xcode pops out
1: C: Xcode: (1,0)           // Put Xcode in box (1,0)
1: C: Skype: (3,1)           // Put Skype in box (3,1)
Participant 2, on the other hand, followed a different methodology: they filled the bottom row first and then the upper row. We were unable to identify whether positioning in the pattern shown in Figure 6.3 was accidental or planned, as there are no pop-outs (Listing 6.2).
2: C: Pages: (0,0)
2: C: Chrome: (1,0)
2: C: Skype: (2,0)
2: C: eclipse: (3,0)
2: C: Sublime Text 2: (0,1)
2: C: Firefox: (1,1)
2: C: Calendar: (2,1)
2: C: Xcode: (3,1)
6.4
Subjective Preferences
In Figure 6.4, the mean values for all parts of the assessment presented in Section 5.2.5 are illustrated. In each diagram the vertical axis represents the mean values of the corresponding category, ranging from 1 to 5 (5-point Likert scale), whilst the X axis represents the mode. Generally, higher values mean that participants rated the specified category positively.
The FF mode repeatedly scored well across all categories, better than any other mode. Precisely, Full Feedback (blue) was ranked highest concerning the Overall Interaction Experience (3.66/5, Figure 6.4h) and General Comfort (3.11/5, Figure 6.4g). Most importantly, however, NF (orange) performed considerably worse than the rest of the modes across all categories except wrist fatigue (Figure 6.4d), a category in which all modes performed mediocrely, with STDmean = 0.231. Arm fatigue (Figure 6.4c), especially in NF, indicates a possible future research topic with the target of diminishing this effect. Finally, grabbing from offscreen/inscreen and releasing inscreen/offscreen respectively, as well as smoothness, which behaves uniformly (mean = 3.92, STDmean = 0.16), ranked high in all modes (Figures 6.4e, 6.4f, 6.4a), which suggests that our prototype application is well structured and responds well to hand movement and gesture identification.
Figure 6.4: Mean Values assessed by participants for each feedback mode.
Chapter 7
Discussion
We conducted a study and presented the results; now we need to discuss and interpret them. What glues the experiment (Chapter 5) and the results (Chapter 6) together are the hypotheses made; thus we commence the discussion by stating the hypotheses and discussing whether they hold.
We analyzed the results in Section 6.3, where we presented two cases varying significantly from the others. Indeed, our analysis of the data validates H3, as two participants managed to improve completion time (best=3.73s) by a factor of 5 over the fastest remaining participant, who achieved 15.13s.
Taking into consideration the hypotheses discussed above, we perceive that all of them hold. It is, though, imperative to discuss the influence of H3 on the No Feedback mode before we can conclude:
Since our window list was chosen so as to allow grouping of windows, and strengthened by our findings, we could assume directly from H3 that if there is correlation between tasks, systems that provide no feedback might behave not significantly worse than systems with visual feedback. This assumption, though, does not take into consideration the error rates and longer-lasting usage of a No Feedback system, which, due to high arm fatigue (Fig. 6.4c), would significantly increase the error rate and user frustration, resulting in an unusable system with slow completion times, prone to errors.
In the additional assessment users provided us with (Appendix C), we asked them to put the 3 different modes in ascending order depending on which was easiest to use. Users dominantly rejected the No Feedback mode as the most tiring and least effective to use, which clearly depicts user preferences. Observing the results we got from the error and completion time parameters, we realize that No Feedback is outperformed by all other modes in all measured parameters. Results indicate no significant difference between Full Feedback and Single Feedback, but nonetheless Full Feedback is slightly more efficient.
Our findings contradict those of Gustafson et al. [17], who developed and studied a viable system without any visual feedback. This contradiction is explained by the fact that our interaction is a mixed interaction between a computer screen and the spatial space, whilst their system interacts only in the spatial space. Similarly, Po et al. [45] state, and we quote:
Our findings suggest that pointing without visual feedback is a potentially viable technique for spatial interaction with large-screen display environments that should be seriously considered.
We would like here to differentiate our study from studies of large-screen displays, as our mechanics do not apply to such displays, a research area we address in the section below.
After stating the above observations, and combining H1, H2 and H3, we conclude that a system for interacting between desktop windows and the offscreen space can a) be seamlessly achieved as long as there is a form of visual feedback to the user, and b) that the existence of some (subjective) form of grouping windows or a positioning methodology in the offscreen area decreases the interaction time but not the error rate when no visual feedback is provided.
Chapter 8
8.1
Conclusion
the procedure and the context of the user study. We had 9 participants taking part in this study. Each participant had to go through the verbal instructions, fill in the demographic information form, familiarize themselves with the interaction through the User Learn Mode (5.2.2), complete the three feedback modes (3.4) of the experiment, consisting of 8 open windows, and finally fill in an assessment form for each of the 3 different modes. We then presented statistical information regarding the demographics of the participants who took part in our study.
We then presented the computed results for the two basic parameters: completion time and error rate. We stated our observations from the log file regarding positioning in the offscreen area, followed by their significance. The results and arguments on the validity of the hypotheses were also discussed.
Finally, we provided evidence supporting that all the hypotheses hold, and concluded that a) a system for interacting between desktop windows and the offscreen space is feasible given that the user receives some form of feedback, and b) no feedback is a feasible solution for short-term usage when windows are placed offscreen after planning, but it is not optimal.
8.2
Future Work
Based on the results obtained, future work includes several improvements and additional features to the current prototype, with the aim of achieving a seamless interaction. One way to achieve improvement would be to minimize fatigue as much as possible and increase general comfort. Given that our prototype introduces no significant wrist fatigue, we believe that translating the mouse movement to wrist movement would be a promising topic for researching the interaction's behavior when utilizing the wrist and not the whole arm.
Although the No Feedback mode gives no visual feedback to the user, the user indirectly gets some form of feedback from the mouse cursor. This indirect feedback could be researched to identify to what extent we can get useful information about the X position in the offscreen area. We know that when the hand is in the offscreen area, the mouse cursor is on the top edge of the screen. If there were some mechanism that stuck the mouse cursor to the X center of the underlying imaginary box, the user would know exactly which box their hand is pointing at.
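Such a sticky-cursor mechanism could be prototyped by snapping the cursor's X to the center of the box under the hand. A hedged Python sketch follows; the screen width and the number of box columns are assumed values for illustration, not taken from the prototype:

```python
# Sketch of the proposed "sticky cursor": while the hand is offscreen,
# snap the cursor's X to the center of the imaginary box it points at.
# A 4-column box row over a 1440px-wide screen is assumed for illustration.
SCREEN_WIDTH = 1440
NUM_COLS = 4
BOX_WIDTH = SCREEN_WIDTH / NUM_COLS

def snapped_x(hand_x):
    """Return the X center of the box containing hand_x (in pixels)."""
    col = min(int(hand_x // BOX_WIDTH), NUM_COLS - 1)
    return col * BOX_WIDTH + BOX_WIDTH / 2

print(snapped_x(100))   # box 0 -> 180.0
print(snapped_x(800))   # box 2 -> 900.0
```

Because the cursor can then only occupy one of a few discrete X positions while offscreen, the user could read off the target column directly from the cursor, even with no other feedback.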
Since the Leap Controller connects over USB, this provides a maximum distance of 30 meters with repeaters, or even further through USB-over-Ethernet technology. Therefore, another research area would be to further investigate interaction with large displays from a greater distance. Finally, we would like to introduce more offscreen areas, not only above the screen but also to its left and right, which would give more spatial space for the interaction. This would allow us to support more windows, in order to further investigate the influence of grouping windows when no visual feedback is given.
Bibliography
[1] Leap Motion Controller (accessed on 6 February 2015). https://www.leapmotion.com.
https://developer.apple.com/technologies/mac/
[14] Luigi Gallo, Giuseppe De Pietro, and Ivana Marra. 3d interaction with volumetric
medical data: experiencing the wiimote. In Proceedings of the 1st international
conference on Ambient media and systems, page 14. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2008.
[15] Lee Garber. Gestural technology: Moving interfaces in a new direction [technology news]. Computer, 46(10):22–25, 2013.
[16] Jože Guna, Grega Jakus, Matevž Pogačnik, Sašo Tomažič, and Jaka Sodnik. An analysis of the precision and reliability of the leap motion sensor and its suitability for static and dynamic tracking. Sensors, 14(2):3702–3720, 2014.
[17] Sean Gustafson, Daniel Bierwirth, and Patrick Baudisch. Imaginary interfaces: Spatial interaction with empty hands and without visual feedback. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST '10, pages 3–12, New York, NY, USA, 2010. ACM.
[18] Mark Hancock, Sheelagh Carpendale, and Andy Cockburn. Shallow-depth 3d interaction: design and evaluation of one-, two- and three-touch techniques. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 1147–1156. ACM, 2007.
[19] Mark Hancock, Thomas Ten Cate, and Sheelagh Carpendale. Sticky tools: full 6dof force-based interaction for multi-touch tables. In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, pages 133–140. ACM, 2009.
[20] Doris Hausen, Sebastian Boring, and Saul Greenberg. The unadorned desk: Exploiting the physical space around a display as an input canvas. In Human-Computer Interaction – INTERACT 2013, pages 140–158. Springer, 2013.
[21] D Austin Henderson Jr and Stuart Card. Rooms: the use of multiple virtual workspaces to reduce space contention in a window-based graphical user interface. ACM Transactions on Graphics (TOG), 5(3):211–243, 1986.
[22] Otmar Hilliges, David Kim, Shahram Izadi, Malte Weiss, and Andrew Wilson. Holodesk: direct 3d interactions with a situated see-through display. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2421–2430. ACM, 2012.
[23] Dugald Ralph Hutchings, Greg Smith, Brian Meyers, Mary Czerwinski, and George Robertson. Display space usage and window management operation comparisons between single monitor and multiple monitor users. In Proceedings of the working conference on Advanced visual interfaces, pages 32–39. ACM, 2004.
[24] Dugald Ralph Hutchings and John Stasko. Quickspace: New operations for the desktop metaphor. In CHI '02 extended abstracts on Human factors in computing systems, pages 802–803. ACM, 2002.
[25] Dugald Ralph Hutchings and John T Stasko. New operations for display space
management and window management. 2002.
[26] Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison, et al. Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. In Proceedings of the 24th annual ACM symposium on User interface software and technology, pages 559–568. ACM, 2011.
[27] Samuel Jimenez. Physical interaction in augmented environments.
[28] Pan Jing and Guan Yepeng. Human-computer interaction using pointing gesture based on an adaptive virtual touch screen. International Journal of Signal Processing, Image Processing, 6(4):81–92, 2013.
[29] Eser Kandogan and Ben Shneiderman. Elastic windows: Improved spatial layout and rapid multiple window operations. In Proceedings of the Workshop on Advanced Visual Interfaces, AVI '96, pages 29–38, New York, NY, USA, 1996. ACM.
[30] Abhishek Kar. Skeletal tracking using microsoft kinect. Methodology, 1:1–11, 2010.
[31] Jeroen Keijser, Sheelagh Carpendale, Mark Hancock, and Tobias Isenberg. Exploring 3d interaction in alternate control-display space mappings. In 3D User Interfaces, 2007. 3DUI '07. IEEE Symposium on. IEEE, 2007.
[32] David Kim, Otmar Hilliges, Shahram Izadi, Alex D Butler, Jiawen Chen, Iason Oikonomidis, and Patrick Olivier. Digits: freehand 3d interactions anywhere using a wrist-worn gloveless sensor. In Proceedings of the 25th annual ACM symposium on User interface software and technology, pages 167–176. ACM, 2012.
[33] Jinha Lee, Alex Olwal, Hiroshi Ishii, and Cati Boulanger. Spacetop: integrating 2d and spatial 3d interactions in a see-through desktop environment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 189–192. ACM, 2013.
[34] SK Lee, William Buxton, and KC Smith. A multi-touch three dimensional touch-sensitive tablet. In ACM SIGCHI Bulletin, volume 16, pages 21–25. ACM, 1985.
[35] Frank Chun Yat Li, David Dearman, and Khai N Truong. Virtual shelves: interactions with orientation aware devices. In Proceedings of the 22nd annual ACM
[48] George Robertson, Maarten van Dantzich, Daniel Robbins, Mary Czerwinski, Ken Hinckley, Kirsten Risden, David Thiel, and Vadim Gorokhovsky. The task gallery: A 3d window manager. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, pages 494–501, New York, NY, USA, 2000. ACM.
[49] LEAP SDK. https://developer.leapmotion.com/documentation/csharp/
devguide/Leap_Overview.html.
[50] Greg Smith, Patrick Baudisch, George Robertson, Mary Czerwinski, Brian Meyers, Daniel Robbins, and Donna Andrews. Groupbar: The taskbar evolved. In
Proceedings of OZCHI, volume 3, page 10, 2003.
[51] Martin Spindler, Wolfgang Büschel, and Raimund Dachselt. Use your head: tangible windows for 3d information spaces in a tabletop environment. In Proceedings of the 2012 ACM international conference on Interactive tabletops and surfaces, pages 245–254. ACM, 2012.
[52] Balanced Latin Square. http://en.wikipedia.org/wiki/Latin_square.
[53] TSeries Vicon. http://www.vicon.com/System/TSeries.
[54] Manuela Waldner, Markus Steinberger, Raphael Grasset, and Dieter Schmalstieg. Importance-driven compositing window management. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 959–968. ACM, 2011.
[55] Robert Wang, Sylvain Paris, and Jovan Popović. 6d hands: markerless hand tracking for computer aided design. In Proceedings of the 24th annual ACM symposium on User interface software and technology, pages 549–558. ACM, 2011.
[56] Frank Weichert, Daniel Bachmann, Bartholomäus Rudak, and Denis Fisseler. Analysis of the accuracy and robustness of the leap motion controller. Sensors, 13(5):6380–6393, 2013.
[57] Thibaut Weise, Sofien Bouaziz, Hao Li, and Mark Pauly. Realtime performance-based facial animation. In ACM Transactions on Graphics (TOG), volume 30, page 77. ACM, 2011.
[58] Pierre Wellner. Interacting with paper on the digitaldesk. Communications of the ACM, 36(7):87–96, 1993.
[59] Quan Xu and Géry Casiez. Push-and-pull switching: window switching based on window overlapping. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1335–1338. ACM, 2010.
[60] Shota Yamanaka and Homei Miyashita. Switchback cursor: mouse cursor operation for overlapped windowing. In Human-Computer Interaction – INTERACT 2013, pages 746–753. Springer, 2013.
Appendix A
Code snippets
A.1
A.2
A.3 Inscreen area
if (appY >= 0 && appY <= screenHeight && appX > 0 && appX <= screenWidth)
{
    // Do rest of algorithm
}
A.4 Offscreen area
if (appY == 0 && yy >= LOWER_BOUND && yy <= UPPER_BOUND && appX > 0 && appX < screenWidth)
{
    // Do rest of algorithm
}
A.5
- (void)updateCurrentUIElement {
    NSPoint cocoaPoint = [NSEvent mouseLocation];
    if (!NSEqualPoints(cocoaPoint, _lastMousePoint)) {
        CGPoint pointAsCGPoint =
            [self carbonScreenPointFromCocoaScreenPoint:cocoaPoint];
        AXUIElementRef newElement = NULL;
        if (AXUIElementCopyElementAtPosition(_systemWideElement,
                pointAsCGPoint.x, pointAsCGPoint.y,
                &newElement) == kAXErrorSuccess
            && newElement
            && ([self currentAppID] != [self extractPIDFromUIElement:newElement])
            && ([self currentUIElement] == NULL
                || !CFEqual([self currentUIElement], newElement))) {
            [self setCurrentUIElement:newElement];
            [self setCurrentAppID:[self extractPIDFromUIElement:newElement]];
            NSImage *tmpImg = [self generateImage:[self getTitle]];
            [self setCurrentImage:tmpImg];
        }
        _lastMousePoint = cocoaPoint;
    }
}
A.6
{
    NSString *applicationName = [entry objectForKey:(id)kCGWindowOwnerName];
    int pid = [[entry objectForKey:(id)kCGWindowOwnerPID] intValue];
    NSRunningApplication *app = [NSRunningApplication
        runningApplicationWithProcessIdentifier:pid];
    [app activateWithOptions:NSApplicationActivateAllWindows];
    NSString *script = [NSString stringWithFormat:
        @"tell application \"System Events\" to tell process \"%@\" to "
         "perform action \"AXRaise\" of window \"%@\"\n"
         "tell application \"%@\" to activate\n",
        applicationName, subWindowName, applicationName];
    NSAppleScript *as = [[NSAppleScript alloc] initWithSource:script];
    [as compileAndReturnError:NULL];
    [as executeAndReturnError:NULL];
    returnValue = TRUE;
}
}
CFRelease(windowList);
return returnValue;
}
A.7
- (NSString *)getTitle {
    CFTypeRef _title;
    AXUIElementRef app = AXUIElementCreateApplication([self currentAppID]);
    if (AXUIElementCopyAttributeValue(app,
            (CFStringRef)NSAccessibilityTitleAttribute,
            (CFTypeRef *)&_title) == kAXErrorSuccess) {
        NSString *title = (__bridge NSString *)_title;
        return title;
    }
    return @"";
}
A.8 Generate image
}
imageOptions = kCGWindowImageDefault;
singleWindowListOptions = kCGWindowListOptionIncludingWindow;
imageBounds = CGRectNull;
CGImageRef windowImage = CGWindowListCreateImage(imageBounds,
    singleWindowListOptions, windowID, imageOptions);
NSBitmapImageRep *bitmapRep =
    [[NSBitmapImageRep alloc] initWithCGImage:windowImage];
NSImage *image = [[NSImage alloc] init];
[image addRepresentation:bitmapRep];
return image;
}
A.9
Appendix B
Demographic Information Form
Participant ID
_____________
General Information
Age:
Gender:
Job title:
Yes ☐    No ☐
Yes ☐    No ☐
Yes ☐    No ☐
Highly experienced
Appendix C
Ranking
Please rank the aforementioned systems:
Full Feedback:
_______
Single Feedback:
_______
No Feedback:
_______