Research – Multimedia & Sensors Lab, Georgia Tech

Research Topics

  • Perceptual Video Quality Assessment in Streaming and Video-Conferencing Applications
  • Color-based Quality and Aesthetics Assessment and Enhancement
  • Directional Transforms and Their Applications in Image Processing
  • Seismic Interpretation: Fault and Salt Dome Detection
  • Similarity Indexes for Large Visual Data Sets
  • Enabling Technologies for Autonomous Vehicles Perception
  • Multimedia processing and communications
  • Multi-sensor processing and networking
  • Immersive processing and communications

Tools and Codes

  • HHF [GUI] [Paper] This code is an implementation of the Hierarchical Hole Filling (HHF) demo.
  • MIQM [Code] [GUI] [Paper] This code is an implementation of the Multi-Camera Image Quality Measure (MIQM) between two images.


Current Projects

Perceptual Video Quality Assessment in Streaming and Video-Conferencing Applications

In this project, we analyze the impact of network losses on high efficiency video coding (HEVC) and the resulting error propagation. We develop no-reference and reduced-reference quality assessment measures for HEVC and H.26X videos. Our work estimates channel-induced distortion in the video assuming we have access to the decoded video only without access to the bitstream or the decoder. We also do not make any assumptions on the coding conditions, network loss patterns or error concealment techniques in our work. The proposed approaches rely only on the temporal variation of the power spectrum and motion across the decoded frames. We show that certain statistical features can be used to capture and quantify channel-induced errors.
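
As a rough illustration of the idea of tracking the power spectrum over time (not the quality measure developed in this project), the sketch below computes a naive 2D DFT power spectrum for each decoded frame and reports how much it changes from one frame to the next. The function names, the tiny frame size, and the sum-of-absolute-differences comparison are all assumptions made for the example.

```python
import cmath

def power_spectrum(frame):
    """Naive 2D DFT power spectrum of a small grayscale frame (list of rows)."""
    h, w = len(frame), len(frame[0])
    spec = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * cmath.pi * (u * y / h + v * x / w)
                    acc += frame[y][x] * cmath.exp(1j * angle)
            row.append(abs(acc) ** 2)
        spec.append(row)
    return spec

def spectral_change(prev_spec, curr_spec):
    """Sum of absolute differences between two power spectra (hypothetical feature)."""
    return sum(abs(a - b)
               for rp, rc in zip(prev_spec, curr_spec)
               for a, b in zip(rp, rc))

def temporal_spectrum_profile(frames):
    """One value per frame transition; sudden jumps hint at abrupt changes."""
    specs = [power_spectrum(f) for f in frames]
    return [spectral_change(specs[i - 1], specs[i]) for i in range(1, len(specs))]
```

A frame in which part of the picture is lost changes the spectrum abruptly, so the profile stays at zero for identical frames and jumps at the damaged one. A real measure would have to separate such jumps from ordinary scene motion.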

Color-based Quality and Aesthetics Assessment and Enhancement

The way we perceive the world is not just black and white. However, color information is often ignored and structure becomes the center of attention. At MSL, in addition to structural information, we use color not only to assess quality and aesthetic value, but also to enhance images and videos, leading to a higher quality of experience.

High Color Range (HCR) Imaging

HCR is a method developed at MSL that uses the color information of images and videos to provide a better quality of experience. We show that HCR can be used for various applications including, but not limited to, vision enhancement, aesthetics enhancement and HDR-like applications.

1. Vision Enhancement

Enhancement algorithms usually consider the structural components of the scene to enhance the view. At MSL, we integrate enhancement algorithms with HCR to take color information into account. In the following figure, you can see snapshots of the videos. The original video is captured in a dark environment. When we perform contrast-based enhancement, we obtain the RGB profile result. However, the RGB profile leads to saturation in frames with high motion. When we embed HCR into contrast-based enhancement, we obtain LCH and CMYK profile results that partly eliminate the saturation effect and lead to better vision.

2. Aesthetics Enhancement

HCR can be used to perform aesthetics enhancement to make images and videos more appealing to the end user. In the following figure, we have the original image and the modified images that are processed with HCR in different configurations. Aesthetics is a subjective concept, so aesthetics enhancement needs to be customized. Therefore, HCR provides enhancement in different profiles to respond to variations in user preferences.

As an example of aesthetics enhancement using HCR, we show the original video and the processed ones using different profiles of HCR algorithm.

Original Video

HCR-CMYK Profile

HCR-Lab Profile

3. HCR as an alternative to HDR

We can use HCR as an alternative to HDR. Here, we show several images of the same scene that are captured under various exposure settings. HDR methods fuse these images to generate a high-dynamic-range image. As an example, we use state-of-the-art HDR algorithms to illustrate what the HDR images look like. The input images that are fed to the HDR algorithms are shown below.

Input Images

Output Images

Here, we show the resulting images using HDR, HDR with tone mapping, and HCR algorithms.

Directional Transforms and Their Applications in Image Processing

With the huge increase in the amount of visual data in the last two decades, there is a growing need to develop tools that can efficiently represent large amounts of visual data. Sparsity-promoting multi-scale directional transforms, such as Curvelets and Contourlets, allow for the efficient representation of images by compressing the energy of such images into a small subset of coefficients. We study the applications of such directional transforms in image classification and retrieval, and focus specifically on their application to seismic signal processing.

The curvelet transform is a recently introduced directional transform that provides an efficient representation of edges. The transform works by dividing the spatial content of images into different frequency sub-bands. In this project, our goal is to adapt the divisions of the frequency domain to better represent a given set of images.

Seismic Interpretation: Fault and Salt Dome Detection

A fault, a common geological structure, is formed by a displacement between neighboring tectonic plates and is closely related to the formation of petroleum reservoirs. The goal of this project is to implement semi-automatic detection of faults in seismic datasets using digital image processing techniques. In 3D seismic datasets, we detect fault lines in seismic sections and time sections separately. In seismic sections, we first apply the Hough transform to the highlighted fault points to detect fault features. Then we remove false features by considering the geological constraints of faults and obtain an initial fault line by connecting the remaining fault features. In the last stage, by incorporating the discontinuity information, we refine the initial fault line to obtain more accurate and reliable results.
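
The Hough transform step can be sketched as follows. This is a minimal, generic line-voting example, assuming the highlighted fault points are given as (x, y) pairs; the function names, the 180 angle bins, and the unit rho resolution are illustrative choices, and the geological filtering and refinement stages are not shown.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Each point votes for every (theta bin, rho bin) line passing through it."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(t, round(rho / rho_step))] += 1
    return acc

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, votes) for the accumulator cell with the most votes."""
    acc = hough_lines(points, n_theta, rho_step)
    (t, r), votes = acc.most_common(1)[0]
    return math.pi * t / n_theta, r * rho_step, votes
```

For points lying on one straight segment, all of them vote for the same cell, so the vote count of the strongest line equals the number of points. Nearby angle bins can tie with it, which is one reason a later filtering stage is useful.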

In time sections, we first apply local thresholds on the discontinuity map of every time section to highlight the likely fault regions. Then, we enhance the highlighted fault regions within one time section by combining the similar fault regions from neighboring time sections. This similarity measure is based on the strong geological coherency of faults. Finally, we perform a thinning process to extract fault lines by using a weighted skeletonization method based on geological constraints.
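
The first step, local thresholding of a discontinuity map, might look like the sketch below. It assumes the map is a 2D list of values and marks a pixel when it exceeds the mean of its neighborhood by a margin; the window radius and the offset are hypothetical parameters, and the cross-section enhancement and weighted skeletonization steps are not shown.

```python
def local_threshold(disc_map, radius=1, offset=0.1):
    """Mark pixels whose value exceeds their local window mean by `offset`."""
    h, w = len(disc_map), len(disc_map[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [disc_map[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            mask[y][x] = disc_map[y][x] > local_mean + offset
    return mask
```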

Similarity Indexes for Large Visual Data Sets

Imaging and computing techniques have been advancing at an increasingly fast pace. As a result, a tremendous amount of visual data in the form of images and videos is being created and distributed every day. Retrieval is one of the most common applications for exploring these data; it aims to identify, within an available database, data sets that are similar to a given data set according to a chosen criterion.

In this work, our objective is to develop similarity indexes for retrieval of large scale visual data sets. Naturally, a good similarity index should be able to produce accurate retrieving results. To achieve high accuracy, we first explore perception inspired low level visual features that adapt to the human visual system. We also utilize high level semantic attributes to help further characterize the visual data. At the same time, we try to incorporate useful knowledge or constraints specific to a certain application domain to enhance the similarity index.

In addition to the accuracy, a reasonable computational efficiency is also very critical, especially when large data sets are of interest. To accomplish this, we explore several aspects involved in the similarity evaluation. First of all, simple low-dimensional features may be available for visual characterization. Secondly, compact representation of high-dimensional features may be obtained by applying techniques such as dimensionality reduction, sketch construction, etc. Moreover, it is also beneficial to employ computationally inexpensive techniques for comparing the features. When developing efficient similarity indexes, our emphasis is on the scalability and parallelizability.
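
As one possible example of sketch construction with an inexpensive comparison, the code below builds a binary sketch from random hyperplane projections and compares sketches by Hamming distance. This is a generic technique chosen for illustration, not necessarily the one used in this work; the function names, the bit count, and the seed are assumptions.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional feature space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def sketch(feature, projections):
    """One bit per hyperplane: the sign of the feature's projection onto it."""
    return tuple(sum(p * f for p, f in zip(proj, feature)) >= 0
                 for proj in projections)

def hamming(a, b):
    """Number of differing bits; a cheap proxy for the angle between features."""
    return sum(x != y for x, y in zip(a, b))
```

Because each bit depends only on a sign, sketches can be computed independently per data item and compared with bit operations, which suits both scalability and parallel execution.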

Initially, we applied our work on seismic data sets, where established databases are retrieved for seismic structures similar to a given query. The purpose is to enable the re-use of historical findings to assist new explorations. Our research plan will include applications such as online visual search, video surveillance, and medical imaging, just to name a few.

Enabling Technologies for Autonomous Vehicles Perception

In this project, we investigate different technologies to enable better perception of the surrounding environment for autonomous vehicle applications. The perception module performs many operations, including scene understanding and context-based detection and tracking, among other tasks shown in the figure, which ultimately enable an intelligent controller in the vehicle to make informed decisions about its surroundings. Multiple sensing technologies can capture different aspects of the surrounding environment; among the candidate technologies are thermal infrared, laser scanner, RADAR, and stereo vision. First, raw captured data are processed to provide scene understanding, which describes what kinds of structures and objects are present in a given scene. Then, objects of interest are detected and tracked across the scene. The detection and tracking operations can be improved when they are based on the context of the given scene. For example, scenes captured during daylight with good lighting conditions need to be handled differently from ones captured at night with limited lighting sources. Along with these issues, real-time detection and tracking in urban environments, pedestrian detection and tracking, and sensor fusion algorithms are addressed in this project.

Previous Projects

Quality Enhancement for Depth-based 3D Videos

In this project, we propose a new approach for disocclusion removal in depth image-based rendering (DIBR) for 3DTV. The new approach, Hierarchical Hole-Filling (HHF), eliminates the need for any preprocessing of the depth map. HHF uses a pyramid-like approach to estimate the hole pixels from lower-resolution estimates of the 3D warped image. The lower-resolution estimates involve pseudo zero canceling plus Gaussian filtering of the warped image. Then, starting from the lowest-resolution hole-free estimate in the pyramid and working backwards, we interpolate and use the pixel values to fill in the holes in the next higher-resolution image. The procedure is repeated until the estimated image is hole-free. Experimental results show that HHF yields virtual images that are free of geometric distortions, which is not the case for other algorithms that preprocess the depth map. Experiments have also shown that, unlike previous DIBR techniques, HHF is not sensitive to depth maps with a high percentage of bad matching pixels. For more details, please refer to our publications on the subject.
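
The pyramid idea can be sketched in a few lines. This is a simplified illustration, not the published HHF implementation: it assumes holes are marked with 0, replaces the Gaussian-weighted reduction with a plain 2x2 average over non-hole pixels, and uses nearest-neighbor lookup in place of interpolation. All function names are hypothetical.

```python
HOLE = 0  # assumption: hole pixels in the warped image are marked with 0

def reduce_ignoring_holes(img):
    """Halve the resolution; each output pixel averages the non-hole pixels of a 2x2 block."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = [img[j][i]
                     for j in range(y, min(y + 2, h))
                     for i in range(x, min(x + 2, w))
                     if img[j][i] != HOLE]
            row.append(sum(block) / len(block) if block else HOLE)
        out.append(row)
    return out

def has_holes(img):
    return any(p == HOLE for row in img for p in row)

def hierarchical_fill(img):
    """Fill holes from coarser estimates, recursing until a level is hole-free."""
    if not has_holes(img) or (len(img) == 1 and len(img[0]) == 1):
        return [row[:] for row in img]
    coarse = hierarchical_fill(reduce_ignoring_holes(img))
    # Keep known pixels; take hole pixels from the coarser, already filled level.
    return [[p if p != HOLE else coarse[y // 2][x // 2]
             for x, p in enumerate(row)]
            for y, row in enumerate(img)]
```

Known pixels are never modified, which reflects the point made above that the depth map and the valid image content need no preprocessing.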

Quality Measurement of Depth-based 3D Videos

In this project, we present a new method for objectively evaluating the quality of stereoscopic 3D videos generated by depth-image-based rendering (DIBR). First we derive an ideal depth estimate at each pixel value that would constitute a distortion-free rendered video. The ideal depth estimate is then used to derive three distortion measures to objectify the visual discomfort in the stereoscopic videos. The three measures are temporal outliers (TO), temporal inconsistencies (TI), and spatial outliers (SO). The combination of the three measures will constitute a vision-based quality measure for 3D DIBR-based videos, 3VQM. 3VQM was presented and verified against a fully conducted subjective evaluation. The results show that our proposed measure is significantly accurate, coherent and consistent with the subjective scores. We also developed full-reference, reduced-reference and no-reference versions of this quality measure.

6DMG: 6D Motion Gesture Database

The goal of this database is to provide the user with comprehensive data of motion gestures, including the position, orientation, accelerations, and angular speeds. The 6DMG database also comes with sample programs (C++) to access and visualize these recorded motion gestures. 6DMG is published at, and we hope this motion gesture database can be a useful platform for researchers and developers to build their recognition algorithms as well as a common test bench for performance comparisons.

An Integrated Framework for Universal Motion Control

Motion-based interactions have become popular nowadays. Taking into account all the general interactions required on graphic user interfaces, we propose an integrated framework for motion control, which seamlessly supports 2D, 3D and motion gesture interactions. We categorize the general tasks and define four corresponding operating modes: 2D cursor, 3D manipulation, 3D navigation, and motion gesture. In the implementation, a hybrid of optical and inertial sensing is used to achieve precise 6 DOF motion tracking. We develop two interesting applications to demonstrate the usability of the integrated motion control framework between the first three operating modes. The motion gesture mode is proposed but still under implementation.

Statistical Analysis and Encoder Optimization Tools for H.264-Like Video Coders

Video coding standards such as H.264/AVC specify how a raw video source is represented as a standard-compliant bitstream. A typical compliant video coding system consists of several modules that perform operations such as motion-compensated prediction, transform coding, rate control, run-length coding and entropy coding. Although the encoded bitstream syntax is specified by a standard, many of the encoder modules are left open. This allows manufacturers to optimize the encoding operations or to tailor them to a particular application. In this project, we aim to develop encoder optimization tools through statistical analysis of the video sources. We also aim to develop new video coding methods for emerging video coding standards such as High Efficiency Video Coding (HEVC).

Nonparametric Multimodal Characterization of Social Networks and Implicit Communication Disclosure

Over the past decade, social networks have become the most influential phenomenon to reshape the Internet and communication systems. However, the characterization and modeling of these networks remain an open field of investigation, which calls for rigorous measurement of the variations and parameters that describe the behavior of entities in these systems. This project introduces a new paradigm to characterize and understand the dynamics of a complex social network, in which we set up a mathematical platform that captures the network dynamics. We introduce a novel generic non-parametric model to characterize a general system of social communicators. We divide the network into low-level entities, each of which has some independent features. The different entities are then combined using Bayesian nonparametric statistics, namely Dirichlet process mixture models (DPMM). This statistical theory is used to flexibly and precisely characterize the random variates in the system. In addition to being generic enough to serve as an engine for any social network, this approach has the efficiency needed for unsupervised learning scenarios. This setup was tested using a simulated case study, where we show examples of its utility for behavior characterization and prediction. We also show how this approach allows the discovery of hidden social and behavioral associations between users. Furthermore, this modeling framework takes into account rare and unpredictable social occurrences such as Black Swan events. We also address the important role of the media in characterizing a social network and how this framework is amenable to media inputs and interpretations.
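
To give a feel for why a Dirichlet process prior does not require fixing the number of groups in advance, the sketch below samples cluster assignments from the Chinese restaurant process, the standard sequential view of the Dirichlet process clustering prior. It covers only the prior, not the mixture likelihood or the inference used in this project; the function name, the concentration value, and the seed are assumptions.

```python
import random

def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    assignments, counts = [], []
    for i in range(n_items):
        r = rng.uniform(0, i + alpha)
        cum = 0.0
        for k, c in enumerate(counts):
            cum += c
            if r < cum:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            # No existing cluster was selected: open a new one.
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments, counts
```

Larger values of alpha make new clusters more likely, so the number of clusters grows with the data rather than being set beforehand.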

Compression of Seismic Data

Seismic surveys scan the earth's subsurface in search of particular subsurface structures. These data are relied upon in the exploration for oil, natural gas, and mineral deposits. An efficient compression technique is essential to develop communication systems for transmitting such data for storage and processing. Due to the huge volume of the data sets resulting from those surveys (10-100 Gbytes per square kilometer), development of a successful compression algorithm is also necessary for subsequent data management and storage. In this project, our goal is to develop algorithms tailored to the nature of the data acquired from these surveys. Both “lossy” and “lossless” compression approaches are investigated.

Hand Gesture Control

This work is concerned with the development of hand gesture recognition systems. To spare users the annoyance of looking for a remote controller, a vision-based hand gesture recognition system built on advanced signal processing algorithms can provide control of electronic products such as TV sets, air conditioners, and lighting systems. We also envision the software module as a plug-in for many applications that require interaction with multimedia, such as IPTV, Facebook, and YouTube. The key to developing a successful system is to develop recognition algorithms that are robust and reliable in various conditions, such as extreme lighting conditions, complex settings with many individuals’ movements, and non-optimized positioning of the capturing cameras.

Measuring Visual Quality of Experience in Immersive Systems

Although several subjective and objective quality assessment methods have been proposed in the literature for images and videos from single cameras, no comparable effort has been devoted to quality assessment of multi-camera images and videos for immersive systems. The quality of images captured by a multi-view system is affected by multiple factors such as camera configuration, number of cameras, and the calibration process. In this work, we investigate the quality of experience for multi-camera images and videos for 3DTV, free viewpoint TV, and multiview panorama.

Analysis of Tracking Systems for Immersive Applications

Motion trackers nowadays utilize different types of sensing technologies. According to our survey, there is no standard for comparing these trackers. Our goal is to develop measurement methods and define metrics to evaluate the performance of motion trackers. This understanding will help in processing collected tracking data within demanding systems such as collaborative and immersive environments.

Processing and Transmission of Interaction Signals

The rise in popularity of virtual environments has signaled the need for new and more efficient methods to network together three dimensional worlds. Currently dead reckoning is the preferred transmission algorithm for virtual environments and is used to keep state information synchronized between hosts. The purpose of this research is to design and test dead reckoning and convergence algorithms for use in collaborative virtual environments (CVEs).
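
A basic form of dead reckoning can be sketched as follows: the sender and the receivers both extrapolate an object's position from the last transmitted state, and the sender transmits a new update only when the true position drifts from that prediction by more than a threshold. This is a one-dimensional, first-order illustration with hypothetical names and parameters, not the algorithms designed in this research; convergence (smoothing at the receiver) is not shown.

```python
def dead_reckoning_updates(true_positions, dt, threshold):
    """Return the time-step indices at which the sender transmits a state update."""
    sent = [0]                      # the initial state is always transmitted
    last_pos = true_positions[0]
    last_vel = 0.0                  # assumption: velocity unknown at start
    last_t = 0
    for t in range(1, len(true_positions)):
        # Prediction that every remote host makes from the last update.
        predicted = last_pos + last_vel * (t - last_t) * dt
        if abs(true_positions[t] - predicted) > threshold:
            last_vel = (true_positions[t] - true_positions[t - 1]) / dt
            last_pos = true_positions[t]
            last_t = t
            sent.append(t)
    return sent
```

For an object moving at constant velocity, a single update after the start is enough, because every later position is predicted correctly from that update.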

Multi-Array Imaging in Distant Learning

In this project, we utilize a multi-array camera that consists of twenty-four VGA cameras. A video capture system grabs the twenty-four streams at 30 Hz onto a single computer. The captured streams are processed in real time and composited into a single high-resolution video at the pixel level. The resulting mosaic is a 7.5-megapixel panoramic video, which is four times the resolution of a single HDTV video camera. The current system is installed in an experimental classroom where the camera looks at an instructor’s area of 32′ by 8′. An interactive desktop application is demonstrated in which the student can digitally manipulate a virtual camera with pan, tilt, zoom, and roll, and select the whole or part of the streamed video. The video stream is broadcast over the Internet, where students can connect to the video after identity verification. Our current work focuses on image enhancement, developing novel real-time image processing techniques for this special camera such as sharpening, uniform contrast, color smoothing, and image denoising.

Next Generation Mobile IMS Applications

The increasing penetration of smart mobile devices is shifting the balance of how users access information. More users are connecting to sources of information from smaller mobile devices than ever before. To adapt to these new trends, mobile service providers have heavily adopted current networking protocols for a next generation IMS network. The new network infrastructure requires new software techniques and paradigms to be designed for application development. A typical next generation application that harnesses the power of the new infrastructure and new devices has yet to be designed. We are currently researching different end user applications that connect users to information in new and unique ways. The research merges the field of user interaction tightly with the networking infrastructure, allowing users to interpret the Internet as a truly ubiquitous entity.

Transmission Algorithms for Collaborative Virtual Environments

Although single user virtual environments have become useful in learning and training, the networking of multiple virtual environments adds a greater sense of immersion to virtual reality. When multiple virtual environments are networked, users have the opportunity to cooperate or compete with other live users. Interacting with human users more realistically models the actual world on which the virtual environment is based. To network virtual environments a networking protocol must be designed and implemented. The networking protocol is concerned with the proper transmission and reception of the virtual environment state information. An important part of the networking protocol is the transmission algorithm. The transmission algorithm dictates when and what state information is to be transmitted across the network. Currently we are studying the effects the network delay, jitter, and loss will have on different transmission algorithms used in collaborative virtual environments. Because of the different input devices, such as data gloves and magnetic trackers, collaborative virtual environments state information changes in a different fashion than other 3D environments. How well traditional transmission algorithms work as well as newly designed algorithms is the primary focus of this research.

Remote Rehabilitation with Virtual Reality

Every day, more individuals are injured through sports, accidents, and disease. To recover from an injury, many individuals need some form of rehabilitation therapy. Often the rehabilitation therapy involves the patient performing various physical exercises or activities to rehabilitate the injured region of their body, as well as its motor control. In the course of the patient’s rehabilitation, two general problems occur. For some patients, such as elderly stroke victims, commuting to the physical therapist’s office is often difficult due to the injury’s effect on coordination. The patient often requires another individual to bring them in for the therapy session. The ability to perform rehabilitation at home or at another remote location is therefore important to the patient. Another common problem in a patient’s rehabilitation is that the patient often does not perform the exercises they are supposed to at home. By not exercising, or exercising improperly, at home, patients increase their recovery time and reduce the quality of their recovery. In this case, the ability of the physical therapist to record the user’s progress at home would ensure that more users perform the exercises as prescribed. In addition, recording the user’s sessions at home would give therapists more data on the progress of the patient’s recovery. To combat these problems, a remote virtual reality system for rehabilitation has been designed and built. The system focuses on arm and hand movements for patients who need upper extremity rehabilitation. For example, stroke victims often perform arm exercises to recover from damage to certain portions of their brain affecting hand and arm coordination. The system is composed of two data gloves and magnetic tracking equipment to interact with the virtual environment software so that the user can perform various exercises.
Specifically the software implements connecting blocks which the user is asked to manipulate with their arms in various fashions. Currently we are researching networking and overall software system design to build better and easier to use rehabilitation virtual reality systems. A special emphasis of this research is to make the complex system easier to use for non computer savvy individuals.

Networking Collaborative Environments

Through speech, gestures, and diagrams people are able to communicate their thoughts to others. With the advent of telecommunication systems people were better able to communicate over vast distances. First through speech and later through video conferencing and the Internet telecommunications has brought individuals together as never before. As technologies have synergized a new form of communication is emerging. This new form of communication is effectively known as remote collaboration. By harnessing many different forms of communication collaboration creates an environment that mimics the actual world. Users are brought together in a virtual environment to communicate, share ideas, and ultimately collaborate with each other. This collaboration can take many different forms. From a simple remote desktop application to an immersive networked virtual environment modern collaborative systems are bringing people together as never before. To network these complex systems an intelligent networking protocol is needed. Currently we are researching networking technologies that make collaboration more immersive and practical. We are specifically interested in the study of 3D collaborative environments.

Networked Mobile Gaming

Game designers and programmers have worked for years to make video games challenging and engaging to users. Sophisticated AI techniques and multiplayer game play are used to add more excitement and intelligent interaction. Until recently, gamers had to be in the same location to play games together. The advent of the Internet has made it possible for gamers to play with others connected from around the world. Designing a networking protocol for modern video games is a challenge because each game is a unique application. Games are designed to be unique and creative to interest hardcore as well as casual gamers. Currently, each game design has its own networking protocol designed specifically to keep game play natural across typical Internet connections. As the complexity of games increases, users will request richer games with larger network bandwidth and real-time requirements. In addition, the ubiquitous nature of games has brought them to even the smallest handheld devices. Designing a networked game for a mobile handheld device requires the networking protocol to be tolerant of higher amounts of jitter and to make better use of the network bandwidth. To deal with these difficulties, a game networking protocol needs to be developed that will utilize the bandwidth effectively and deal with a large amount of jitter. Currently, we are researching the networking of current and next generation games that stress the network but still allow ubiquitous mobile play.

Hybrid Variable Length Coding (HVLC) for Image and Video Compression

Existing video coding standards, such as MPEG-2/4, H.263, and H.264/AVC, commonly adopt the so-called block-based hybrid video coding approach, where motion-compensated prediction is used to exploit the temporal redundancy, transform coding of the prediction residual is used to exploit the spatial redundancy, and entropy coding is adopted to exploit the statistical redundancy of the quantized transform coefficients. Variable length coding (VLC) is widely used for entropy coding due to its efficiency and simplicity: the entropy encoder assigns one variable-length codeword to each symbol, and VLC tables are designed such that symbols appearing more often are encoded by shorter codewords, resulting in a short average code length. However, both Run-Length VLC in H.263 and CAVLC in H.264 are inefficient in coding clustered nonzero coefficients, which commonly appear in the low-frequency region of transform coefficients, especially for high-resolution, high-complexity images and video. Instead, Hybrid variable length coding (HVLC) is proposed to take advantage of the clustered nature of the quantized nonzero coefficients in the low-frequency (LF) region and the scattered nature of the quantized nonzero coefficients in the high-frequency (HF) region by employing two types of VLC schemes. Since conventional RL-VLC is efficient at coding scattered nonzero coefficients, it is adopted by HVLC for coding the HF region, while several new efficient schemes for coding the LF region are proposed, such as 2DP1DA to code the run of zero coefficients and the run of nonzero coefficients as a pair, JPAC to jointly code the 2D position and amplitude information, and 3D-VLC to code the run of zero coefficients, the run of nonzero coefficients, and the number of trailing ones in the nonzero cluster as a triple. Compared with CAVLC in H.264 with a fixed 8×8 transform, a 3-4% coding reduction is achieved by the HVLC scheme. A context-adaptive HVLC scheme is under investigation.
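
The difference between conventional run-level symbols and the cluster-oriented pairing described above can be illustrated by how the symbols are formed from a scanned coefficient sequence. The sketch below only builds the symbols; the VLC tables, the amplitude coding, and the LF/HF split are not shown, and both function names are hypothetical.

```python
def run_level_pairs(coeffs):
    """Conventional symbols: (number of zeros preceding, nonzero coefficient value)."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are left to an end-of-block signal

def zero_nonzero_run_pairs(coeffs):
    """Cluster-oriented symbols: (run of zeros, run of nonzeros that follows it)."""
    pairs, i, n = [], 0, len(coeffs)
    while i < n:
        zeros = 0
        while i < n and coeffs[i] == 0:
            zeros += 1
            i += 1
        nonzeros = 0
        while i < n and coeffs[i] != 0:
            nonzeros += 1
            i += 1
        if nonzeros:
            pairs.append((zeros, nonzeros))
    return pairs
```

For the sequence 9, 5, 3, 0, 0, 2, 1, 0, 0, 0, run-level coding produces five symbols, three of them with a zero run of length 0, whereas the paired runs describe the same positions with two symbols, (0, 3) and (2, 2), leaving the amplitudes to be coded separately.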

Distributed Estimation in Resource-Limited Wireless Sensor Networks

A common goal in most WSN applications is to reconstruct the underlying physical phenomenon, e.g., temperature, based on sensor measurements. Distributed estimation of unknown deterministic parameters by a set of distributed sensor nodes and a fusion center has become an important topic in signal processing research for wireless sensor networks. Sensor nodes collect real-valued data, perform local data compression, and send the resulting messages to the fusion center, while the fusion center combines the received messages to produce a final estimate of the observed parameter. Subject to the severe resource (bandwidth and energy) constraints in wireless sensor networks, each sensor is allowed to transmit only a quantized version of its raw measurement with limited transmission energy, so we need to optimally design the quantization rule, the fusion rule, and the bandwidth and energy allocation to minimize the estimation distortion. In this project, we first introduced the concept of an equivalent unit-resource MSE function, where the resource can be bandwidth for rate-constrained distributed estimation, or energy for energy-constrained distributed estimation. Based on the equivalent unit-resource MSE function, quasi-optimal distributed estimation algorithms are proposed that perform within a small factor (about 2) of the theoretical lower bound.
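
The overall pipeline (local quantization at each sensor, then fusion) can be sketched with the simplest possible choices. The uniform quantizer and the plain averaging fusion rule below are assumptions made for illustration; they are not the quasi-optimal quantization, fusion, or resource allocation rules developed in this project.

```python
def quantize(x, bits, lo, hi):
    """Uniform quantizer on [lo, hi] with 2**bits cells; returns the cell midpoint."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = min(levels - 1, max(0, int((x - lo) / step)))
    return lo + (idx + 0.5) * step

def fuse(messages):
    """Fusion center: average the received quantized measurements."""
    return sum(messages) / len(messages)
```

With this quantizer, each message differs from its raw measurement by at most half a step, so the fused estimate differs from the average of the raw measurements by at most half a step as well. Spending more bits per sensor shrinks the step, which is the bandwidth and distortion trade-off that the allocation problem addresses.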

Network Lifetime Maximization in Multi-Hop Wireless Sensor Networks

Network lifetime is a critical concern in the design of wireless sensor networks. In the literature, many different lifetime definitions are used, such as the duration of time until the first sensor failure due to battery depletion, the fraction of surviving nodes in a network, and the mean expiration time. However, these notions of network lifetime mainly focus on the time until the first node or a fraction of nodes is depleted, even though the remaining network may still be functional from the application perspective. Instead, we introduce a notion of function-based network lifetime, which focuses on whether the network can perform a given task: the network is considered functional if it can accomplish a task within the distortion requirement, and nonfunctional otherwise. The function-based network lifetime is defined as the number of task cycles accomplished before the network becomes nonfunctional. Subject to the flow conservation and energy constraints in wireless sensor networks, the function-based network lifetime maximization problem generally turns out to be a non-linear programming problem, which calls for joint source coding and routing optimization.

Energy-Efficient Cluster-Based Distributed Estimation in Wireless Sensor Networks

In this project, we consider cluster-based distributed estimation in wireless sensor networks, where the whole sensor field is divided into several clusters. The cluster members communicate only with their cluster head, and the cluster head makes a local estimate and communicates with the fusion center, where the final estimate is made. The major challenge is to determine the optimal cluster size and number of clusters in order to minimize the energy cost. We first introduce a hybrid cluster-based estimator, which can potentially save energy while maintaining the estimation performance. We then address the optimal tradeoff between the cluster size and the number of clusters for a special network topology, the ring network, where all the sensors are uniformly located on a circle whose center is the fusion center. Further, we propose two greedy algorithms that cluster general sensor networks to minimize the total energy cost of the cluster-based estimation scheme by modeling the network as a directed or undirected graph. Simulation results show not only that the cluster-based estimation scheme reduces the energy cost compared with the parallel estimation scheme, but also that our proposed clustering methods save more energy than the k-means clustering method.
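
The tradeoff between cluster size and number of clusters on the ring network can be explored with a toy energy model. The sketch below is not the hybrid estimator or the greedy clustering from this project. It assumes that transmit energy grows with the square of the distance, that every cluster is a contiguous arc with its middle sensor as cluster head, and that each head sends a single message to the fusion center.

```python
import math

def ring_cluster_energy(n_sensors, n_clusters, radius=1.0):
    """Total transmit energy under an assumed distance-squared cost model.

    Sensors are evenly spaced on a circle around the fusion center. Setting
    n_clusters == n_sensors reproduces the parallel scheme, in which every
    sensor reports directly to the fusion center.
    """
    if n_sensors % n_clusters != 0:
        raise ValueError("n_clusters must divide n_sensors in this sketch")
    size = n_sensors // n_clusters
    step = 2.0 * math.pi / n_sensors
    head = size // 2
    intra = 0.0
    for member in range(size):
        angle = abs(member - head) * step
        chord = 2.0 * radius * math.sin(angle / 2.0)
        intra += chord ** 2  # member to cluster head (zero for the head itself)
    # Each cluster also pays for one head-to-fusion-center transmission.
    return n_clusters * (intra + radius ** 2)

def best_cluster_count(n_sensors, radius=1.0):
    """Exhaustive search over the cluster counts that divide n_sensors."""
    candidates = [k for k in range(1, n_sensors + 1) if n_sensors % k == 0]
    return min(candidates, key=lambda k: ring_cluster_energy(n_sensors, k, radius))
```

Few large clusters pay for long member-to-head links, while many small clusters pay for many head-to-fusion-center links, so under this model the minimum lies between the two extremes.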

Progressive Streaming for Textured 3D Models

A well-known technique in 3D graphics is texture mapping, where an image is mapped to a mesh surface. Texture mapping is effective in adding realism or desired surface details that are expensive to represent using geometry alone. The nature of texture mapping, however, complicates the compression of textured 3D models. Essentially, the geometric accuracy of the mesh may no longer be the primary fidelity metric, since geometric errors can be compensated by the mapped texture. The problem becomes more complex when the textured model is transmitted in a rate-constrained environment, such as a bandwidth-limited channel, or rendered by devices with limited computing capability. In these cases, the textured model needs to be compressed into a hierarchical bitstream, where the number of bits transmitted to the client depends on the available resources. In this project, a joint mesh and texture optimization framework was studied for rate-constrained transmission and rendering of textured 3D models. We developed a fast quality measure (FQM) to estimate the quality difference of textured models with simplified meshes and resolution-reduced textures. Based on the proposed quality measure, bit-allocation algorithms were developed to find optimal bit distributions between the mesh and the texture under constrained bit rates.
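
The bit-allocation step can be pictured as a search over pairs of mesh and texture resolutions. The sketch below uses exhaustive search and a caller-supplied quality table; it is a stand-in for the project's bit-allocation algorithms, and the table is a hypothetical placeholder for FQM scores, whose definition is not given here.

```python
def allocate_bits(mesh_bits, texture_bits, quality, budget):
    """Pick the (mesh LOD, texture LOD) pair with the best quality under a budget.

    mesh_bits[i] and texture_bits[j] are the sizes of each resolution level.
    quality[i][j] is a hypothetical score for that pair (higher is better).
    Returns (score, i, j), or None when no pair fits the budget.
    """
    best = None
    for i, m in enumerate(mesh_bits):
        for j, t in enumerate(texture_bits):
            if m + t > budget:
                continue  # this pair exceeds the available bits
            score = quality[i][j]
            if best is None or score > best[0]:
                best = (score, i, j)
    return best
```

A table in which texture resolution matters more than mesh resolution would lead the search to a coarse mesh paired with a fine texture, which mirrors the observation that the mapped texture can compensate for geometric error.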

Latency-Minimized Delivery of 3D Models in Lossy Networks

Three-dimensional (3D) meshes are used intensively in distributed graphics applications, where model data is transmitted on demand to users’ terminals and rendered for interactive manipulation. For real-time rendering and high-resolution visualization, the transmission system should adapt to both data properties and transport link characteristics while providing scalability to accommodate terminals with disparate rendering capabilities. In this project, we study transmission systems and protocols for on-demand delivery of 3D models over lossy networks. The first result of this research is a 3D transmission protocol, named 3TP, which transmits important data reliably using TCP and the remaining, less important data using UDP. Building on 3TP, we studied a transmission system that uses hybrid unequal error protection (UEP) and selective retransmission for multi-resolution 3D models. In the proposed system, hierarchical data batches of the multi-resolution mesh are protected preferentially according to their distortion-rate performance, network parameters, and channel statistics estimated by the transport layer. To minimize the response time, the transmission mechanism is designed to have linear computational complexity. In addition, by integrating TCP-friendly congestion control, the proposed system achieves smooth performance over time as well as bandwidth fairness for parallel applications in the network. Compared with 3TP, the proposed system achieves a 20-30% reduction in transmission latency while delivering the same level of rendering quality.

Multi-Streaming of 3D Scenes with Scalable Partial Reliability

Preceding research efforts focused on the scalable coding and transmission of individual 3D models. When considering 3D graphical scenes, which contain multiple objects in one geometric space, the interactions of the objects with and within the view space need to be taken into account. Depending on factors such as the object coordinates and the view space, the objects may require different levels of detail (LODs) to be displayed with the desired quality, or may not need to be rendered at all if, for example, the object falls outside the view space. Scalable and high-quality presentation of the 3D scene under resource constraints therefore requires joint consideration of the 3D objects in combination with multi-resolution coding. In this research, we studied an application and transport cross-layer mechanism for streaming 3D scenes. Incorporating a new IP transport protocol, the Stream Control Transmission Protocol (SCTP), we developed a multi-streaming framework for scalably encoded 3D scenes with rate-distortion optimized partial reliability. To preserve the manipulation independence of multiple objects in data delivery while providing preferential treatment for different objects as well as different layers of each object, the objects are transmitted over separately sequenced streams. A rate-distortion optimization framework is then developed, which determines an optimal level of transport reliability for every chunk of data in each stream, taking into account the rendering importance of the object, the distortion-rate performance of the data chunks, and the statistics of the network link. Empirical studies showed that the proposed framework maximizes the display quality of the scene while minimizing the amount of data that needs to be processed by the client’s rendering engine.
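
The dependence of the required LOD on object position and view space can be sketched as follows. This covers only the scene-side decision, not the SCTP multi-streaming or the rate-distortion optimization. The view space is approximated by a cone around the viewing direction, and the distance thresholds and function name are assumptions made for the example.

```python
import math

def select_lods(objects, camera, view_dir, half_fov_deg, lod_distances):
    """Map each object name to an LOD index, or None when it is not visible.

    objects: dict of name -> (x, y, z) position.
    view_dir: unit vector giving the viewing direction.
    lod_distances: ascending distance thresholds; index 0 is the finest LOD.
    """
    cos_limit = math.cos(math.radians(half_fov_deg))
    result = {}
    for name, pos in objects.items():
        offset = [p - c for p, c in zip(pos, camera)]
        dist = math.sqrt(sum(o * o for o in offset))
        if dist == 0.0:
            result[name] = 0  # object at the camera position: finest LOD
            continue
        cos_angle = sum(o * v for o, v in zip(offset, view_dir)) / dist
        if cos_angle < cos_limit:
            result[name] = None  # outside the view cone: nothing to stream
            continue
        # Each threshold the object lies beyond coarsens the LOD by one step.
        result[name] = sum(1 for d in lod_distances if dist > d)
    return result
```

Objects mapped to None would not be streamed at all, while the remaining objects would each get a stream whose layers are sent up to the selected LOD.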

Parity-Object Embedded Streaming for Synthetic Graphics

In scalably coded 3D graphics, multiple objects are encoded and decoded independently, and multiple objects, as well as multiple LODs of each object, have unequal importance with respect to the display quality. For example, our earlier research has shown that, under the same overall bit rate, two pairs of meshes and textures with different LODs yield a substantial quality difference for the displayed 3D model. Efficient transmission of 3D data in a lossy environment should properly account for both properties. On one hand, the coding independence of multiple objects suggests that objects be packetized separately and that packets from different objects be delivered in separate sequences. Thus, losing one packet will only corrupt or delay the decoding of a particular object, while decoding of other objects can still proceed. On the other hand, unequal error resilience is desired for multi-resolution objects in order to provide preferential error protection for more important objects as well as more important layers of each object. In this project, unequal error protection (UEP) schemes were first proposed for a single mesh or image object. Joint unequal error protection for multiple graphic objects was then studied. An object-oriented mechanism was proposed, in which the packets of graphic objects are protected concurrently, and also preferentially, by multiple FEC codes. The parity data of each FEC code is treated as a separate object parallel to the graphic objects. Based on weighted distortion-rate properties, an optimization framework performs rate allocation between graphic objects and parity objects and generates parity data correspondingly. Finally, all the objects are transmitted in an interleaved manner to allow equally fast access to each object at the receiving end.
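
The idea of treating parity data as an object of its own and interleaving all objects can be sketched with the simplest possible code. A single XOR parity packet stands in here for the FEC codes used in the project, and the rate allocation between graphic and parity objects is not modeled. The function names and the packet layout are assumptions made for the example.

```python
from itertools import zip_longest

def xor_parity(packets):
    """XOR of equal-length packets; it can rebuild any one missing packet."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("packets must have equal length")
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for k, byte in enumerate(packet):
            parity[k] ^= byte
    return bytes(parity)

def interleave(objects):
    """Round-robin the packet queues of all objects, parity objects included.

    objects: dict of object name -> list of packets, in sending order.
    """
    order = []
    for group in zip_longest(*objects.values()):
        order.extend(p for p in group if p is not None)
    return order
```

Because XOR is its own inverse, a lost packet is recovered by calling xor_parity on the surviving packets together with the parity packet. Interleaving means that a burst of losses is spread across objects instead of wiping out one of them.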

Vector Quantization for Multi-Resolution Mesh Compression

Multi-resolution mesh compression typically proceeds by simplifying the mesh using edge-collapse operations, predicting the coordinates of collapsed vertices, and coding the prediction residuals along with connectivity information that tells which edges should be split to recover the collapsed vertex. Entropy coding was conventionally used to code the coordinate residuals in separate spatial dimensions, while vector quantization (VQ) was used only in single-resolution mesh compression to code the vertex geometry jointly. In this project, we studied the incorporation of vector quantization into a multi-resolution hierarchy and proposed the VQM algorithm for multi-resolution mesh compression, which improves compression efficiency considerably compared to preceding algorithms. VQM focuses on coding individual mesh objects with a codebook that is pre-generated from separate training models. In contrast to individual models, a 3D scene database comprises various objects interacting in the same view space. Simply applying VQM to code all the objects independently would not be optimal, as different objects have unequal display importance resulting from their interactions. For maximized scene quality, generation of the codebook should be considered jointly with the coding process, taking into account the unequal importance of different objects. To provide a solution, we proposed a scene-adaptive coding system, which uses the mesh objects contained in the scene database to generate the codebook. To account for the unequal display importance of different objects, a weighted codebook training algorithm is designed, and a rate-distortion optimization framework is developed to code multiple objects jointly under rate constraints.
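
The two building blocks mentioned above, coding a residual vector by its nearest codeword and training a codebook with per-vector weights, can be sketched as below. This is a generic weighted Lloyd iteration and not the VQM or scene-adaptive algorithm itself. The weights stand for a hypothetical display importance of the object each residual comes from.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def dist2(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def weighted_update(vectors, weights, codebook):
    """One weighted Lloyd iteration over the training vectors.

    Each codeword moves to the weighted centroid of the vectors assigned to it.
    A codeword with no assigned vectors is left unchanged.
    """
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    totals = [0.0] * len(codebook)
    for vec, w in zip(vectors, weights):
        i = nearest_codeword(vec, codebook)
        totals[i] += w
        for d in range(dim):
            sums[i][d] += w * vec[d]
    return [
        [s / totals[i] for s in sums[i]] if totals[i] > 0 else list(codebook[i])
        for i in range(len(codebook))
    ]
```

Giving residuals from important objects larger weights pulls the codewords toward them, so those objects are quantized with smaller error at the same codebook size.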