Recently, 3D display technology and content creation tools have undergone rigorous development and, as a result, have been widely adopted by home and professional users. 3D digital repositories are growing and becoming ubiquitously available. However, searching and visualizing 3D content remains a great challenge. In this paper, we propose and present the development of a novel approach for creating hypervideos, which eases 3D content search and retrieval. It is called the dynamic hyperlinker for the 3D content search and retrieval process. It advances 3D multimedia navigability and searchability by creating dynamic links for selectable and clickable objects in the video scene while the user consumes the 3D video clip. The proposed system involves 3D video processing, such as detecting and tracking clickable objects and annotating objects, and metadata engineering, including a 3D content descriptive protocol. Such a system is of interest to both home and professional users, and more specifically to broadcasters and digital content providers. The experiment is conducted on full-parallax holoscopic 3D videos, also known as integral images.
Three-dimensional (3D) imaging systems remain an attractive topic for the scientific community and for the entertainment and display industries, opening a new market [
H3D imaging was first proposed by Lippmann [
This paper presents the dynamic hyperlinker, a software tool that enables the visualization and search of an H3D multimedia repository. It simplifies and eases the H3D multimedia search and retrieval process by creating hypervideos. A hypervideo is a holoscopic 3D (H3D) video clip with an associated XML-based header file, which contains three-dimensional descriptions of the H3D video, such as positioning information, the objects in the scene, search results for those objects, object annotations, related video links and several other metadata.
The XML-header file is pre-processed and prepared by multiple 3D modules: center-view extraction, segmentation, depth map generation, content-based search and retrieval, and metadata synchronization and engineering.
Digital information processing, especially search and retrieval, has been a popular research topic for the past few years. A number of EU funded projects have been completed that aimed to develop frameworks for multimodal processing, unedited multimedia indexing/annotating, and information extraction [
2D multimedia search and retrieval faces great challenges in reliability and accuracy. 3D imaging technology overcomes some of the limitations of 2D multimedia processing by providing depth, size measurement and more than a single perspective. It also increases the demand that accompanies many new practical applications, such as multimedia search on media asset management systems. The diverse requirements derived from these applications impose great challenges and incentives for research in the field.
The dynamic hyperlinker is an advanced holoscopic [
ease the 3D search and retrieval. It links objects in the scene to similar or matching object(s) in the repository while the video clip is being played. The user clicks on an object in the scene to search for similar objects in the repository. It also allows professional users to add new annotations, such as an object description and relevant video links. The changes are synchronized throughout the video clip for the selected objects, so the user does not need to repeat the process for every frame.
Because hypervideo generation involves several intensive 3D processing stages, the system pre-processes the H3D video: it analyzes and annotates the video content, then creates and synchronizes metadata to prepare a header for the H3D video, in the following steps:
・ Generation of center viewpoint images and 3D depth map images of H3D video images that are used by the object segmentation module to generate a segmentation mask with its bounding box.
・ Generation of metadata of H3D video images using a segmentation mask and bounding box in the content based search and retrieval module.
・ Synchronization of the generated metadata files to create an xml-header file, which is associated with the H3D video to produce a hypervideo.
・ Application of the hypervideo in the hyperlinker tool, which creates hyperlinks and eases 3D content search and visualization.
Apart from the final playback step, the above processes are performed offline to create hypervideos using the 3D content descriptive protocol, as shown in
Center views and 3D depth maps are generated from the H3D videos and are used for segmenting 3D objects as well as creating their bounding boxes. The 3D depth maps, segmentation masks and bounding-box information are then used to perform a content-based search that creates linking information (metadata) for identical objects in the H3D videos. The content-based search and retrieval module generates a single XML index file for every H3D video frame. The index file holds complete information about each frame, such as the position of objects in the scene, the object search results, the search result descriptions and the target URLs/paths. All these metadata files are fed into the metadata synchronization module, which reprocesses and merges them to create a single optimized xml-header file for the input H3D video clip. The xml-header file and the H3D video are combined to create a hypervideo, which is replayed by the hyperlinker tool to facilitate interactive 3D content search, retrieval and visualization, as shown in
The dynamic hyperlinker player loads the H3D video clip with the associated xml-header file. The dynamic linker module uses the xml-header file to create hot links on the screen while the H3D video is being played. It highlights clickable 3D objects with a red-line box if the highlight feature is enabled in the settings. In addition, it monitors the mouse cursor movements, and when the cursor hovers over the region of a selectable or clickable object, the cursor icon changes from the default one to a hand icon to alert the user that the object is clickable or hyperlinked, as shown in
・ Open and play a hypervideo.
・ Play/pause/stop H3D video clip.
・ Add/remove/update the xml-header file that is automatically updated.
・ Add/remove/update title of bounding boxes.
・ Export/import/save/save-as the xml-header file.
・ Highlight selectable objects.
・ Show search and retrieval results including thumbnail.
・ Link selectable objects to any destination resource.
・ Play search result items on user click from the particular scene/frame.
・ Preview hyperlinked video.
・ User feedback e.g. adding/deleting search results.
・ Create a new bookmark for selectable objects that is automatically synchronized to the whole scene.
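The hover and click behaviour of the player can be illustrated with a simple bounding-box hit test. The following is a minimal sketch, not the tool's actual implementation: the `BoundingBox` class, the `hit_test` and `cursor_for` functions, and the rule of preferring the smallest box when boxes overlap are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixels; min/max mirror the Min/Max metadata values."""
    object_id: str  # hypothetical identifier of the clickable object
    min_x: int
    min_y: int
    max_x: int
    max_y: int

    def contains(self, x: int, y: int) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

    def area(self) -> int:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)


def hit_test(boxes: Sequence[BoundingBox], x: int, y: int) -> Optional[BoundingBox]:
    """Return the box under the cursor, or None if no object is hit.

    Assumption: when boxes overlap, the smallest one wins, so that a small
    object in front of a large one stays selectable.
    """
    hits = [box for box in boxes if box.contains(x, y)]
    return min(hits, key=BoundingBox.area) if hits else None


def cursor_for(boxes: Sequence[BoundingBox], x: int, y: int) -> str:
    """Pick the cursor icon: a hand over clickable objects, the default otherwise."""
    return "hand" if hit_test(boxes, x, y) is not None else "default"
```

The player would call `cursor_for` on every mouse move with the boxes of the current frame, and `hit_test` on a click to find the object whose search results should be shown.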
The H3D video is paused automatically when a selectable object is clicked: the system invokes the content-based search and retrieval, and the results found appear in the list box. At this stage, the user can remove any irrelevant results from the result list if necessary, or add a new bookmark for the selected object. The changes are synchronized to the xml-header automatically and saved, overwriting the existing file unless the “save as” feature is selected. The system then re-indexes the whole sequence of frames.
The re-indexing applies only to objects in the scene, and it is valid until the object disappears from the scene. If the object leaves the scene and then comes back, the system treats it as a new object. This is because the content-based search and retrieval module retrieves objects based on their visual information.
All the components process the H3D video offline, including the metadata engineering and synchronization. The hyperlinker tool plays a hypervideo and performs real-time hyperlinking using the xml-header. The hyperlinker’s performance is monitored and analyzed.
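The validity rule above can be sketched as follows. This is an illustrative model under stated assumptions, not the tool's code: the `Annotation` class and its field names are hypothetical, and each appearance of an object is modelled as a separate annotation with its own frame range, so an edit made during one appearance does not carry over to a later reappearance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One continuous appearance of an object (all names are hypothetical)."""
    annotation_id: str
    object_key: str
    start_frame: int
    end_frame: int
    search_results: List[str] = field(default_factory=list)

    def is_valid_at(self, frame: int) -> bool:
        return self.start_frame <= frame <= self.end_frame


def find_annotation(annotations: List[Annotation], object_key: str,
                    frame: int) -> Optional[Annotation]:
    """Find the annotation covering this object at the clicked frame."""
    for annotation in annotations:
        if annotation.object_key == object_key and annotation.is_valid_at(frame):
            return annotation
    return None


def add_search_result(annotations: List[Annotation], object_key: str,
                      frame: int, target: str) -> bool:
    """Attach a result to the appearance that contains `frame`.

    Because the result is stored once per annotation, it is implicitly
    available on every frame between start_frame and end_frame.
    """
    annotation = find_annotation(annotations, object_key, frame)
    if annotation is None:
        return False
    if target not in annotation.search_results:
        annotation.search_results.append(target)
    return True
```

Storing the edit on the annotation rather than on each frame is one way to avoid repeating it for every frame; the actual tool may organize this differently.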
The 3D content descriptive protocol (3DCDP) is a descriptive meta-language that describes and annotates the content of holoscopic video images using the Extensible Markup Language (XML). It is used for 3D video content indexing, including tagging selectable objects in the video clip.
In addition, it is used for annotating H3D video content and for exchanging meta-information between components, e.g. the segmentation module, the content-based search and retrieval module, and the hyperlinker tool, which use it to exchange action messages in a single format across the system.
The proposed 3DCDP has enough elements and attributes shown in
Description | Time (s) |
---|---|
Bounding box loading | 0.0009766 |
Highlighting a single object | 0.0019531 |
Adding a new item (re-indexing) | 0.0742187 |
Removing an item (re-indexing) | 0.0019532 |
Loading and playing the first frame | 0.2792969 |
Loading and playing the first frame with highlighting enabled | 0.2841797 |
Search and retrieval on an object click | 0.0175782 |
Element | Description |
---|---|
CDP | 3D Content descriptive protocol element |
Content Info | Content information |
Search Result Item | Search result item |
Max | Maximum x, y values of bounding box |
Min | Minimum x, y values of bounding box |
Bounding Box | Bounding box values |
Operator | Operator/classifier |
Operator Desc | Operator description |
Annotation | Annotation element |
Scene | Scene element |
Content | Content element |
Search Result | Search result item |
UUID | Unique ID |
An annotation is given an ID as well as a start-frame and an end-frame, which define its validity. These attributes are used to identify the annotation if there is more than one annotation. As seen in
The proposed 3DCDP shown in
An H3D image offers multi-angular views. To reduce exhaustive visual processing and complexity, we propose using the center view of the H3D image(s) (see
In addition, in an extreme situation with a wide viewing angle, some users might see a certain object sooner than others would, e.g. if one object appears from behind another object: while a viewer seeing the scene from the side might already see two objects, those watching from a front view would not yet see the object in the background. In other words, for every available perspective, any given object would be visible and thus clickable at a different time. In that case it would be very difficult to determine when an object should be clickable (frame x to frame z from view A vs. frame v to frame y from view B, etc.), and an editor could not edit the video for every possible view. We therefore neglect this theoretically complex situation and assume a simpler one in which all viewers see any given object at roughly the same time. To this end, center viewpoints were extracted, which form the basis of all subsequent steps of the hyperlinking process.
Before the high-resolution 2D center view is extracted from a holoscopic 3D video image, the barrel distortion is corrected, so that the H3D image is distortion free and no errors are introduced into the center view.
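As an illustration of viewpoint extraction, the sketch below picks the pixel at the same offset under every micro-lens and assembles those pixels into one viewpoint image. It is a basic textbook-style scheme under strong assumptions (a square micro-lens array aligned with the pixel grid, an integer lens pitch in pixels, and distortion already corrected); it yields a low-resolution view and is not the high-resolution extraction method used in this work. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def extract_viewpoint(h3d: Image, pitch: int, u: int, v: int) -> Image:
    """Collect pixel (u, v) from under every micro-lens.

    `pitch` is the assumed micro-image size in pixels; (u, v) is the offset
    inside each micro-image. Incomplete micro-images at the borders are ignored.
    """
    if pitch <= 0 or not (0 <= u < pitch and 0 <= v < pitch):
        raise ValueError("offset must lie inside one micro-image")
    lens_rows = len(h3d) // pitch
    lens_cols = len(h3d[0]) // pitch if h3d else 0
    return [
        [h3d[row * pitch + v][col * pitch + u] for col in range(lens_cols)]
        for row in range(lens_rows)
    ]


def extract_center_view(h3d: Image, pitch: int) -> Image:
    """The center view uses the middle pixel of every micro-image."""
    center = pitch // 2
    return extract_viewpoint(h3d, pitch, center, center)
```

The resulting image has one pixel per micro-lens, which is why practical systems add up-sampling or multi-pixel patches to reach a usable resolution.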
The depth map is generated from H3D video images and this work has been successfully published recently in [
The segmentation process finds objects and separates them from the other objects and the background in the scene. It uses the center-view images of the H3D images, which simplifies the process considerably because the angular information of the H3D images does not have to be handled. In addition, it generates bounding-box metadata that describes the position of the segmented object(s) in the scene, and its output is consumed by the hyperlinker and by the content-based search and retrieval module. The hyperlinker uses a bounding box to detect mouse hover and to pick up the object information when the user clicks on the object, whereas content-based search and retrieval uses the segmented object(s) to search for similar objects in the repository.
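The step from a segmentation mask to bounding-box metadata can be sketched as below. The mask format is an assumption made for illustration (a 2D grid of integer labels with 0 as background and one positive label per object); the segmentation module's real output format is not specified here.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)


def bounding_boxes(mask: List[List[int]]) -> Dict[int, Box]:
    """Compute one axis-aligned bounding box per non-zero label in the mask."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label == 0:  # background pixel
                continue
            if label not in boxes:
                boxes[label] = (x, y, x, y)
            else:
                min_x, min_y, max_x, max_y = boxes[label]
                boxes[label] = (min(min_x, x), min(min_y, y),
                                max(max_x, x), max(max_y, y))
    return boxes
```

Each resulting tuple corresponds to the minimum and maximum x, y values that a bounding-box metadata entry stores.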
The Search and Retrieval Tool is executed offline to prepare the metadata of H3D video sequences for interactive search and navigation. In particular, it performs low-level feature similarity matching at the multimodal level to find associated similar objects in the H3D videos and generates metadata that represents the H3D video content. This metadata is what enables the user to click on an object in the scene to search for similar objects.
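To make the idea of low-level feature similarity concrete, the sketch below ranks repository objects by histogram intersection. This is a stand-in technique chosen for illustration: the features and similarity measure actually used by the Search and Retrieval Tool are not described in this section, and the function names and the histogram representation are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [value / total for value in hist]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1 means the normalized histograms are identical."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(normalize(a), normalize(b)))


def rank_similar(query: Sequence[float],
                 repository: Dict[str, Sequence[float]],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k repository entries most similar to the query."""
    scored = [(name, histogram_intersection(query, features))
              for name, features in repository.items()]
    # Highest score first; ties are broken by name for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

In an offline setting, a ranking like this would be computed once per segmented object and its results written into the metadata, so that a click at playback time only needs a lookup.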
The necessary input files are: 1) the H3D video sequence; 2) the center-view images of the H3D video frames; 3) the depth maps for all H3D video frames; 4) the corresponding segmentation mask with its metadata (which contains the bounding-box information for the clickable objects in the scene). A brief schematic overview of the data flow is presented in
The input data files follow the S&R data structure format, which covers the various visual data descriptions, e.g. H3D images, depth maps, viewpoint images, and low-level features. This facilitates visual parsing by the Search and Retrieval Framework.
The system analyzes every frame of the input H3D video clip and generates a single metadata file for each frame. The metadata file contains a single annotation, which has one or more bounding boxes depending on the objects in the scene. The system performs a visual similarity search for the object(s) and embeds the search result as an element in the bounding-box node.
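A per-frame metadata file of this shape could be produced as sketched below with Python's standard `xml.etree.ElementTree` module. The element names follow the 3DCDP element table with spaces removed; the attribute names (`id`, `startFrame`, `endFrame`, `title`, `x`, `y`, `target`), the nesting order and the input dictionary format are assumptions, since the exact schema is not reproduced in this text.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_frame_metadata(frame_no: int, boxes: List[Dict]) -> ET.Element:
    """Build one CDP document holding a single annotation for one frame."""
    root = ET.Element("CDP")
    scene = ET.SubElement(root, "Scene")
    # A per-frame annotation is valid for exactly one frame.
    annotation = ET.SubElement(scene, "Annotation", {
        "id": f"frame-{frame_no}",
        "startFrame": str(frame_no),
        "endFrame": str(frame_no),
    })
    for box in boxes:
        node = ET.SubElement(annotation, "BoundingBox", {"title": box["title"]})
        ET.SubElement(node, "UUID").text = str(uuid.uuid4())
        ET.SubElement(node, "Min", {"x": str(box["min"][0]), "y": str(box["min"][1])})
        ET.SubElement(node, "Max", {"x": str(box["max"][0]), "y": str(box["max"][1])})
        # Search results are embedded inside the bounding-box node.
        results = ET.SubElement(node, "SearchResult")
        for target in box.get("results", []):
            ET.SubElement(results, "SearchResultItem", {"target": target})
    return root


if __name__ == "__main__":
    example = build_frame_metadata(12, [{
        "title": "cup",
        "min": (10, 20),
        "max": (110, 220),
        "results": ["videos/kitchen.h3d#frame=40"],  # hypothetical target path
    }])
    print(ET.tostring(example, encoding="unicode"))
```

Keeping the search results inside the bounding-box node means a click on an object can be resolved from the metadata alone, without re-running the search.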
H3D video frames are analyzed and processed by the segmentation and content-based search and retrieval modules, which generate a metadata file that describes the H3D video frames in a structured manner using the 3DCDP meta-language (see
A bounding box is valid for a single frame only, because the positions of video objects change constantly from frame to frame. In addition, 3D objects in the scene appear differently from different perspectives, so the content-based search and retrieval system may match a particular perspective of the object. As a result, there are hundreds of metadata files containing meta-descriptions of scenes. To overcome the exhausting and complex management of these metadata files, we propose a multimodal metadata synchronization technique that re-engineers the metadata files into a single optimized xml-header for the H3D video. The proposed synchronization merges all metadata files by removing redundant sections and arranging the meta-nodes in a structured, non-overlapping way that is easier to manipulate later. The process creates a single xml-header file for the H3D video, and its file structure is shown in
It imports the metadata files of all frames and decomposes their nodes down to the annotation level. It then creates a new metadata file (the xml-header), which has a single scene with multiple annotations. Each frame metadata file has a single annotation; it is therefore treated as an annotation node in the newly created file. The process forms a new scene and new annotations with the correct frame locations, which also provide statistical data such as the scene duration and the annotation start/end frames. The statistical data can easily be processed to generate visualization graphs, e.g. the lifetime of a particular object (bounding box) in the scene, the number of objects in the scene/video, and the complexity of the video content. It can also be used for navigating the video content without replaying the video; such video summarization techniques are widely adopted by consumers who like to view a summary of a video, such as a movie trailer, before they start watching it.
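The merging idea can be sketched as follows: consecutive frames in which the same object is visible are collapsed into one annotation with a start and end frame, and a gap starts a new annotation. This is a simplified illustration, not the synchronization module itself. It assumes that each object carries a stable key across consecutive frames (how the real system establishes this is not described here), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[str, int, int]  # (object_key, start_frame, end_frame)


def merge_frames(frames: Dict[int, List[str]]) -> List[Range]:
    """Collapse per-frame object lists into per-object frame ranges."""
    open_ranges: Dict[str, List[int]] = {}  # object_key -> [start, end]
    merged: List[Range] = []
    for frame in sorted(frames):
        visible = set(frames[frame])
        # Close ranges of objects that vanished or whose frames are not contiguous.
        for key in list(open_ranges):
            start, end = open_ranges[key]
            if key not in visible or frame != end + 1:
                merged.append((key, start, end))
                del open_ranges[key]
        # Extend surviving ranges and open new ones.
        for key in visible:
            if key in open_ranges:
                open_ranges[key][1] = frame
            else:
                open_ranges[key] = [frame, frame]
    merged.extend((key, start, end) for key, (start, end) in open_ranges.items())
    return sorted(merged, key=lambda r: (r[1], r[0]))


def objects_per_frame(ranges: List[Range], frame: int) -> int:
    """Simple statistic: how many annotations are valid at a given frame."""
    return sum(1 for _, start, end in ranges if start <= frame <= end)
```

The resulting ranges are the kind of start/end-frame data from which lifetime and object-count graphs can be derived without replaying the video.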
The proposed tool supports holoscopic 3D videos and
It has been further developed to support editing the metadata of the xml-header file, as well as importing, exporting and saving it as a new project, which can be opened later without losing the changes and without having to overwrite the original version. In addition, it has been revised to support the center views of H3D videos, because holoscopic 3D content requires a special H3D display, which is not widely available; this also opens the tool to those who want to replay a 2D version of H3D content. In the playback of the center view of an H3D video, objects in the scene are highlighted, and titled if the object(s) has an associated title.
As the tool allows editing the metadata of the H3D video, the user can select an object and add a new associated search result or delete an existing one. The system propagates the changes to the whole scene automatically, so the next time the object is clicked, the updated search result is shown. In addition, the user can back up the changes by exporting the metadata without saving it to the original file. The exported file can be imported later if necessary. The search result screen lists the similar objects found in the repository, excluding the current video. It also shows the search result descriptions, the video names and the scene frame number in which each object is found. At this stage, any of the search result items can be clicked to play it, and the system will start playing the video from that particular scene/frame.
This paper presented the dynamic hyperlinker, an innovative solution for 3D video search and retrieval that includes a 3D content descriptive protocol. It enables users to search, retrieve and visualize holoscopic 3D video clips by clicking on selectable objects in the scene while the video clip is being played. The holoscopic 3D videos are pre-processed by 3D operators, namely center viewpoint extraction, depth map creation, segmentation, and the content-based search and retrieval module. The 3D operators use the 3D content descriptive protocol to exchange meta-messages as well as to annotate the media content. The proposed system advances user interaction and eases multimedia content search and retrieval. The experiment is conducted on holoscopic 3D content as well as on its 2D center-view content, and the approach is applicable and scalable to any other 3D content. It is worth mentioning that the dynamic hyperlinker performs well on 3D video sequences and that it is an interactive tool for 3D content search, retrieval and visualization. Furthermore, it enhances 3D data visualization and retrieval for content providers such as broadcasters, as it allows objects to be bookmarked and allows tags, textual descriptions and/or links to be attached to the objects in a scene.
This work was supported by the EU under the ICT program as Project 3D VIVANT (3D LiVe Immerse Video-Audio Interactive Multimedia) under EU-FP7 ICT-2010-248420.
Mohammad Rafiq Swash, Amar Aggoun, Obaidullah Abdul Fatah, Bei Li (2016) Dynamic Hyperlinker: Innovative Solution for 3D Video Content Search and Retrieval. Journal of Computer and Communications, 04, 10-23. doi: 10.4236/jcc.2016.46002