Exploring the World of Multimedia Information Retrieval Systems: A Comprehensive Guide
The article provides an introduction to Multimedia Information Retrieval (MIR) systems, which are designed to search and retrieve information from multimedia sources like images, audio, and video.
It explains the different components of MIR systems, such as feature extraction, indexing, and retrieval, and discusses some of the challenges associated with MIR, including the subjective nature of multimedia data and the need for efficient algorithms to handle large datasets.
Overall, the article offers a helpful overview for those interested in learning about MIR systems.
What is Multimedia?
We live in a digital age in which the development of the Internet has made finding information as simple as a single click. We can now search for information from any place.
A high-speed Internet connection allows everyone to use software to search for and manage information from any source. Every day, people engage with digital data, and this interaction produces specific types of data, such as multimedia data. Data that spans multiple media is referred to as multimedia. It often refers to data that represents the various media formats used to record details and impressions about specific things and occasions. The most widely used data formats are numbers, alphanumeric characters, text, images, audio and video.
In common usage, people refer to a data set as multimedia only when time-dependent data such as audio and video are involved.
Three subclasses of the multimedia data are distinguished: multidimensional (also known as spatial), dynamic and static.
As a medium for interpersonal communication, text information has predominated. However, as computing capabilities such as disk and memory space and processor power increase, other media formats, including audio, image, and video, are gaining greater value. The amount of photo content available worldwide is enormous, since many individuals own smartphones with cameras, and webcams, action cameras and tablets with cameras are widely used all over the world.
This year people will shoot an estimated 1.5 trillion images, according to predictions [How Many Photos Will Be Taken in 2021?]. Because there is a massive amount of multimedia material in the world, and often a lot of multimedia content (e.g., images) in a typical family, there is a need to search among various types of media. The systems that meet this demand are referred to as Multimedia Information Retrieval (Multimedia IR) systems.
Multimedia IR systems often offer services like multimedia information storage, indexing, searching and distribution. Additionally, they might include functions such as the extraction of descriptive information from multimedia data. Textual and non-textual information differ greatly from one another. Therefore, depending on the nature of the multimedia data, different strategies and engines may be employed.
In practice, text content is what Multimedia IR systems use most frequently. Examples include Bing, Google and others. Because finding information of any kind online is among the most valued human activities, other kinds of information are becoming increasingly significant. Statistics show that numerous resources, including photos, videos and other content, constantly appear in and disappear from search requests on the Internet.
What are the Multimedia IR systems?
Let's take a brief look at Multimedia IR systems. According to Wikipedia, information retrieval (IR) is the process of finding information system resources, from a collection of those resources, that are relevant to an information need. Also according to Wikipedia, multimedia information retrieval (MMIR or MIR) is a research area of computer science that aims to extract semantic information from multimedia data sources. Information is gathered from a variety of sources, including directly perceivable media like audio, image, and video, indirectly perceivable sources like text, semantic descriptions, and biosignals, as well as non-perceivable sources such as bioinformation, stock prices, etc.
A computer system for browsing, searching, and retrieving images from a sizable database of digital images is called an Image Retrieval system. Most traditional image retrieval techniques add information to the images, such as captions, keywords, titles or descriptions, so that retrieval can be executed over the words annotated to the photos.
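Annotation-based retrieval of this kind can be sketched with a small inverted index. The example below is only an illustration: the file names, the captions and the function names build_index and search are assumptions made for this sketch, not part of any particular product.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?"


def build_index(annotations):
    """Map each lowercase annotation word to the set of images that carry it."""
    index = defaultdict(set)
    for image_name, text in annotations.items():
        for word in text.lower().split():
            index[word.strip(PUNCTUATION)].add(image_name)
    return index


def search(index, query):
    """Return the images whose annotations contain every word of the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical captions; a real system would read them from image metadata.
annotations = {
    "img_001.jpg": "Family picnic at the lake",
    "img_002.jpg": "Birthday cake, family dinner",
    "img_003.jpg": "Sunset over the lake",
}
index = build_index(annotations)
matches = search(index, "family lake")
```

Here the query is treated as a conjunction of words, so only images annotated with both "family" and "lake" are returned. Ranking, stemming and synonyms are deliberately left out.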
Therefore, a private Multimedia IR system that searches through images and photos is understood to search based on the metadata, tags and content of those images and photos. In response to market demand, Multimedia IR systems also implement indexing and storage functionality for multimedia information, and they might also have features like the ability to extract descriptions from multimedia data. Without a Multimedia IR system, an average family would find it almost impossible to locate one item among thirty thousand photos by manual search. For this reason, the prospects offered by Multimedia IR systems should be of great interest to regular people.
From the end user's perspective, the essential requirement when visiting a photo album is the capacity to search through it swiftly and simply. In addition to search functionality, the system should also provide cataloging and browsing of search results. Obviously, search systems must deliver accurate and complete search results.
Users' needs in Multimedia IR systems
Considering Multimedia IR Systems, it is appropriate to bear in mind the potential users and their information needs. Therefore, the Multimedia IR system should:
be able to store various types of multimedia data (or be integrated with other tools to do so);
be able to represent any type of multimedia information (photo, video, and audio) to the user via UI;
provide a simple way to edit the multimedia data;
be able to quickly and easily search through the content;
allow filtering of information by various sets of criteria;
provide relevance feedback;
include indexing and cataloging functions;
provide the possibility to find information by image example;
include a function of browsing through search results.
Professionals' needs in Multimedia IR systems
Moreover, not only ordinary people but also businesses might find Multimedia IR systems useful for their professional needs. For example, people in various professions often access photos or images in their everyday work, such as:
professional photographers who work for businesses;
journalists who use cameras to produce multimedia news content;
individuals in various professions who require access to images, such as doctors looking for medical images or architects needing image examples to create buildings;
car engineers needing images and audio of car engines;
video content engineers searching for specific video segments and movies by their titles.
Multimedia IR systems analysis
Let's compare several Multimedia IR systems against the requirements and needs listed above. The best approach here would be to include both offline and online systems, since some users are comfortable with Multimedia IR systems that are integrated with online storage (such as cloud storage), while others prefer offline Multimedia IR systems. Similarly, both proprietary and open systems should be included in the comparison. Below is the list of them with brief descriptions:
Synology Moments – a system that is aimed at collecting all of a user's images and videos in one secure location and organizing them in an entirely new way. Synology Moments is a photo solution for personal and home use, offering a modern browsing experience with an image recognition technique.
Mylio – a free application that can be used to organize images, videos and other data types. This program can be used on any Mac, iOS, Windows and Android device. Mylio can automatically arrange files based on the calendar app and can operate without an Internet connection.
Google Photos – a photo sharing and storage service, developed by Google. The service automatically analyzes photos, identifying various visual features and subjects. Users can search for anything in photos, with the service returning results in three major categories: People, Places, and Things.
digiKam – a free and open-source photo organizing software that can handle more than 100K images. The program has all the photo organizing functionality needed, such as uploading, deleting and sorting images.
Apple Photos – a photo management and editing application, developed by Apple.
Well implemented features
The comparison table demonstrates how well the following features have been implemented:
keeping/browsing functions for the multimedia content of private photo albums;
extensive photo editing capabilities;
search by date/time (using metadata);
search by location (using metadata);
search by people (using face recognition technology over multimedia data content);
search by rating/marks (using metadata);
search by feedback and by comments.
From the user's perspective, all of the aforementioned features are well-developed and do not need to be improved upon or replaced by any other software.
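Metadata-based search of the kind listed above (by date/time, location and rating) amounts to filtering photo records by the criteria the user supplies. The following is a minimal sketch under stated assumptions: the Photo record, its field names and the sample values are hypothetical, and a real system would fill them from the metadata stored with each file.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Photo:
    # Hypothetical record; field names are assumptions made for this sketch.
    name: str
    taken_at: datetime
    location: Optional[str] = None
    rating: int = 0


def filter_photos(photos, start=None, end=None, location=None, min_rating=0):
    """Return the photos that satisfy every criterion that was supplied."""
    result = []
    for photo in photos:
        if start is not None and photo.taken_at < start:
            continue
        if end is not None and photo.taken_at > end:
            continue
        if location is not None:
            # Case-insensitive match; photos without a location never match.
            if (photo.location or "").lower() != location.lower():
                continue
        if photo.rating < min_rating:
            continue
        result.append(photo)
    return result


photos = [
    Photo("a.jpg", datetime(2021, 5, 1, 10, 0), "Paris", 5),
    Photo("b.jpg", datetime(2021, 7, 14, 18, 30), "Rome", 3),
    Photo("c.jpg", datetime(2022, 1, 2, 9, 15), "Paris", 4),
]
summer_2021 = filter_photos(
    photos, start=datetime(2021, 6, 1), end=datetime(2021, 8, 31)
)
top_paris = filter_photos(photos, location="paris", min_rating=4)
```

Criteria that are left unset are simply ignored, which makes it easy to combine date, location and rating filters in a single query.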
Features to be developed
In order to meet user demands, the following set of searching features must be developed:
search by events (currently possible through tagging only);
search by objects/subjects (only a few systems can recognize the objects in the photos);
search by emotions;
similar photos search (only a few systems are able to provide the search by a similar photo);
search by photo author (currently possible through tagging only);
search by a combination of color or texture (only a few systems can perform this).
Although the most recent Multimedia IR systems may provide such search operations, many of them rely on tagging, and tagging must be carried out at a much earlier stage than import.
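To illustrate what a similar photos search or a search by color involves, the sketch below compares images by their color histograms using histogram intersection. This is one simple, well-known technique chosen for illustration, not the method used by any of the systems above. Images are represented as plain lists of (r, g, b) tuples, and all function names and sample data are assumptions made for this example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Build a normalized RGB histogram from (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        hist[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1
        count += 1
    if count:
        hist = [value / count for value in hist]
    return hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query_pixels, library):
    """Rank library images (name -> pixels) by color similarity to the query."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in library.items()
    ]
    return [name for score, name in sorted(scored, reverse=True)]


# Tiny synthetic "images": mostly red, mostly red again, and all green.
red_sky = [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2
library = {
    "sunset": [(240, 20, 20)] * 7 + [(20, 20, 240)] * 3,
    "forest": [(10, 200, 10)] * 10,
}
ranking = most_similar(red_sky, library)
```

A coarse histogram ignores where colors appear in the picture, so it only approximates visual similarity. Recognizing objects, events or emotions requires much richer features than this.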
What are the Photo Organizers?
Multimedia Organizer functions
A program that helps organize photos is known as a photo organizer. There are a variety of photo organizer software programs available on the market, and each one provides various features and tools to assist in organizing photos. Thus, let's consider the main functions multimedia organizers perform:
they provide the possibility to search through a photo set by organizing photos based on their date, location, categories, etc.;
they offer an option of creating a folder structure and placing the files in it;
they automate the import of new photos into Multimedia IR systems.
Additionally, they make photo albums more searchable (search being driven by Multimedia IR systems) by enriching the metadata and improving its consistency with minimal human involvement.
Therefore, they are chosen as a subject for the overview and comparison. The list covers photo organizers intended for professional photographers, which can also be used by an ordinary user with a big photo album. Relevant online courses are available to teach users how to arrange their photos. A more detailed comparison of photo organizers is provided by Wikipedia.
Multimedia Organizer requirements
a user will be working with a large number of items, so an automated approach is recommended;
a toolset that is available for a typical user and not for enterprises shall be used;
they shall provide integration with such Multimedia IR Systems as digiKam and Synology Moments;
they shall be able to identify inconsistent or nonexistent photo metadata;
they shall be able to suggest ways to solve inconsistency and absence.
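The last two requirements can be sketched as a simple rule-based check over one photo's metadata. Everything in this example is an assumption made for illustration: the field names (taken_at, camera, location, modified_at), the rules themselves and the suggested fixes. A real organizer would read such fields from the photo files and apply its own rules.

```python
from datetime import datetime

# Hypothetical field names; not taken from any specific metadata standard.
REQUIRED_FIELDS = ("taken_at", "camera", "location")

# Hypothetical suggestions shown to the user for each kind of issue.
SUGGESTIONS = {
    "missing taken_at": "fall back to the file modification time",
    "missing camera": "copy the camera from photos in the same folder",
    "missing location": "copy the location from photos taken around the same time",
    "taken_at is in the future": "check the camera clock and shift the date",
    "modified_at precedes taken_at": "check the camera clock or the time zone",
}


def check_metadata(record, now=None):
    """Return a list of issues found in one photo's metadata dictionary."""
    now = now or datetime.now()
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing {field}")
    taken_at = record.get("taken_at")
    modified_at = record.get("modified_at")
    if taken_at is not None and taken_at > now:
        issues.append("taken_at is in the future")
    if taken_at is not None and modified_at is not None and modified_at < taken_at:
        issues.append("modified_at precedes taken_at")
    return issues


def suggest_fixes(issues):
    """Pair every issue with a suggestion, defaulting to manual review."""
    return [(issue, SUGGESTIONS.get(issue, "review manually")) for issue in issues]


record = {
    "taken_at": datetime(2020, 1, 1, 12, 0),
    "camera": "",
    "location": "Home",
    "modified_at": datetime(2019, 12, 31, 8, 0),
}
issues = check_metadata(record, now=datetime(2021, 1, 1))
fixes = suggest_fixes(issues)
```

Keeping detection and suggestion separate lets the tool report problems automatically while leaving the decision about each fix to the user.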
Multimedia Organizer approaches to create folder structure
The goal is to prevent duplicate photos from being stored in the same folder. The folder name may be something like Date Shoot-Type Event Name; it could also contain a location or any other information the user desires. Within such a folder, subfolders can be created for each photographer or camera.
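A naming scheme like Date Shoot-Type Event Name can be generated automatically. The sketch below is one possible interpretation: the function name folder_for, the ISO date format and the optional camera subfolder are assumptions made for this example rather than a convention prescribed by any particular organizer.

```python
from datetime import date
from pathlib import PurePosixPath


def _clean(part):
    """Replace path separators so each part stays a single folder name."""
    return part.replace("/", "-").replace("\\", "-").strip()


def folder_for(shot_on, shoot_type, event_name, camera=None):
    """Build 'YYYY-MM-DD Shoot-Type Event Name', plus a camera subfolder if given."""
    name = f"{shot_on.isoformat()} {_clean(shoot_type)} {_clean(event_name)}"
    path = PurePosixPath(name)
    if camera:
        path = path / _clean(camera)
    return path


# Hypothetical shoot details used only to show the resulting folder names.
single = folder_for(date(2021, 6, 12), "Wedding", "Anna and Mark")
per_camera = folder_for(date(2021, 6, 12), "Wedding", "Anna and Mark", camera="Camera-A")
```

Starting the folder name with an ISO date keeps folders in chronological order when sorted by name, and a per-camera subfolder keeps files from different devices apart even when their file names collide.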