Content recognition technology can detect music tracks and protect them from copyright infringement.
Image credit: stock.xchng
You're standing in line for a movie when a fantastic song starts playing through the speakers. You love the track but have no clue what it’s called or who the artist is. You take out your cell phone, dial a number, and point it towards the speakers. Within moments, you receive a text message with the song title, artist name, and even a link to purchase it.
The service you used employs content-recognition software to identify the song. These tools are useful for discovering music playing around you and can also help combat copyright violations, a major concern for both independent musicians and large corporations.
File-sharing platforms, peer-to-peer networks, and giants like YouTube offer ample opportunities for people to access content without paying for it. Companies have traditionally relied on humans to identify copyright infringements and take action. Sites like YouTube typically depend on users to flag inappropriate content, but some users don’t regard copyright violations as violations at all. As a result, many businesses still depend on staff to search for and report copyrighted footage, a tedious and inefficient process. Content-recognition software could soon change that. In this article, we’ll break down how it works and how it benefits both individuals and companies.
Creating the Software
Limewire is just one of many file-sharing applications that are causing significant problems for media companies.
Photo credit: GNU Free Documentation License
A number of software firms plan to introduce tools capable of analyzing audio and video files, comparing them with a content database, and determining whether they are copyrighted. This software offers a fast and cost-effective way to sift through the massive amount of content on the Internet, and it is far more reliable than asking a friend whether they recognize the song playing on the radio.
You might assume that developing a program to identify audio or video content would be simple, but it’s proving to be quite difficult. For example, there are numerous ways to encode a file, so a program that looks for byte-for-byte matches in the encoded data won’t work very well. From a program's perspective, a WAV file and an MP3 of the same song do not look identical. Additionally, songs and videos may be encoded at different bit rates, meaning that even two MP3 versions of the same track might not match. Software that identifies songs through a cell phone also needs to work regardless of recording quality or background noise.
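To make the encoding problem concrete, here is a minimal sketch that stores the same short tone in two ways and hashes the resulting bytes. The tone, the sample rate, and the helper names (`sine_samples`, `encode_pcm16`, `encode_pcm8`) are illustrative assumptions, and plain 16-bit and 8-bit PCM stand in for real formats such as WAV and MP3.

```python
import hashlib
import math
import struct

def sine_samples(freq_hz, sample_rate, seconds):
    """Return float samples in [-1, 1] for a pure tone."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def encode_pcm16(samples):
    """Pack samples as signed 16-bit little-endian PCM."""
    return b"".join(struct.pack("<h", int(s * 32767)) for s in samples)

def encode_pcm8(samples):
    """Pack samples as unsigned 8-bit PCM, a coarser encoding of the same sound."""
    return bytes(int((s + 1.0) * 127.5) for s in samples)

# The same 440 Hz tone, stored two different ways (assumed example values).
tone = sine_samples(440.0, 8000, 0.1)
digest_16 = hashlib.sha256(encode_pcm16(tone)).hexdigest()
digest_8 = hashlib.sha256(encode_pcm8(tone)).hexdigest()

same_file_hash = digest_16 == digest_8
print(same_file_hash)  # False: identical sound, different bytes
```

Because the two byte streams differ, any comparison based on the stored data alone treats them as unrelated files, which is why recognition software has to work from the sound itself.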
Other issues exist as well. Some video pirates sneak recording devices into theaters and capture movies on their own cameras. There have been cases where projectionists set up digital video cameras in the projection room, recording a film on its opening night. Others try to evade detection by cropping or modifying videos. Any software designed to detect such recordings must look beyond identical files or matching encoded data.
In the upcoming section, we will explore how audio files are identified and how the process addresses these challenges.
Content Recognition Software - Audio
The software breaks down a song into segments, searching for specific tags to identify it.
Image credit: stock.xchng
The first step in content identification is creating a database of material that other files can be compared to. For a record label, this would include their entire music catalog. The content-recognition software scans each track and generates a digital tag that uniquely identifies it. These tags are referred to as fingerprints or signatures.
The software analyzes the actual sound of the song, not just its encoding. Some programs look at the song's tempo and beat, while others focus on its amplitude and frequency. Fingerprinting software typically takes several short samples, only a few seconds long, from a single recording. Some software packages analyze entire audio clips to create a more detailed fingerprint. One current tool scans for landmarks -- distinct acoustic moments in the song -- then evaluates the sound surrounding these landmarks. Ideally, these landmarks will be easily recognizable when scanning other songs.
The programs use algorithms to process sound. A common approach is the Fast Fourier Transform (FFT), a mathematical method that breaks a complex signal down into its component frequencies. Applied to one short stretch of the recording after another, it lets the software track how amplitude and frequency change over time, along with related features such as tempo or beats per minute. These fluctuations are mapped and converted into a digital fingerprint, typically in numeric form.
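As a rough illustration of frequency-based fingerprinting, the sketch below splits a signal into fixed windows and records the strongest frequency bin in each one. It is an assumption-laden toy, not any vendor's method: `dft_magnitudes`, `peak_fingerprint`, and `tone` are hypothetical helpers, and a naive discrete Fourier transform stands in for the FFT (it produces the same values, only more slowly) so the example needs nothing beyond the standard library.

```python
import cmath
import math

def dft_magnitudes(window):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2-1."""
    n = len(window)
    mags = []
    for k in range(n // 2):
        total = sum(window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def peak_fingerprint(samples, window_size=64):
    """Hypothetical fingerprint: the strongest non-DC frequency bin in each window."""
    peaks = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        mags = dft_magnitudes(samples[start:start + window_size])
        peaks.append(max(range(1, len(mags)), key=mags.__getitem__))
    return tuple(peaks)

def tone(bin_index, window_size=64):
    """One window of a sine wave that completes `bin_index` cycles."""
    return [math.sin(2 * math.pi * bin_index * n / window_size) for n in range(window_size)]

# Three "notes" in a row, then the same clip played more quietly.
clip = tone(5) + tone(12) + tone(5)
fingerprint = peak_fingerprint(clip)
quiet_fingerprint = peak_fingerprint([0.3 * s for s in clip])

print(fingerprint)                       # (5, 12, 5)
print(fingerprint == quiet_fingerprint)  # True
```

The quieter copy yields the same fingerprint because only the position of the strongest frequency is recorded, not its loudness. Real systems use far richer features, but the principle of describing the sound rather than the file is the same.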
After a record label builds its database, it’s ready to help identify songs for customers or to pursue cases of copyright infringement. In both scenarios, the software processes unknown audio clips in the same way as the songs in the catalog. It generates a digital fingerprint for each clip, often stored as a hash (a short code derived from the audio content), and compares it against the fingerprints in the database. In the next section, we’ll see how it determines if the songs are identical.
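The catalog lookup can be sketched as a dictionary keyed by a hash of each fingerprint. Everything here is assumed for illustration: the fingerprints are made-up tuples of integers, the track titles are placeholders, and `fingerprint_hash`, `build_catalog`, and `identify` are hypothetical helper names.

```python
import hashlib

def fingerprint_hash(peaks):
    """Collapse a sequence of integer features into a short hex code."""
    raw = ",".join(str(p) for p in peaks).encode("ascii")
    return hashlib.sha1(raw).hexdigest()[:16]

def build_catalog(tracks):
    """Map each track's fingerprint hash to its title."""
    return {fingerprint_hash(peaks): title for title, peaks in tracks.items()}

def identify(catalog, peaks):
    """Return the matching title, or None if the clip is unknown."""
    return catalog.get(fingerprint_hash(peaks))

# Made-up fingerprints for two catalog tracks.
tracks = {
    "Song A": (5, 12, 5, 9),
    "Song B": (7, 7, 3, 14),
}
catalog = build_catalog(tracks)

print(identify(catalog, (7, 7, 3, 14)))  # Song B
print(identify(catalog, (1, 2, 3, 4)))   # None
```

An exact lookup like this only succeeds when the clip's fingerprint is identical to one in the catalog. Clips that are shortened or noisy need a looser, similarity-based comparison.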
To make sure content-recognition software identifies songs regardless of their format, developers focus on analyzing sounds within the human hearing range, much as the MP3 format does. MP3 files are so compact in part because they encode only the sounds within the human hearing range and discard everything else. By not relying on the full range of sounds that may exist in the original recording, content-recognition software avoids missing MP3 versions of the track.
Recognizing the Sound
The software compares fingerprints that represent sound waves, searching for a match.
Photo courtesy of stock.xchng
Frequently, the sound clips being analyzed are not perfect versions of the original song. The song may be cut short, or it could bear resemblance to a different track. This is when algorithms are especially useful. Their task is to compare the digital fingerprints and determine if the incoming audio clip matches a song (or part of a song) in the database, based on a specific probability threshold.
The identification method is similar to the way forensic experts match a suspect's fingerprints to those found at a crime scene. Before modern computer software and advanced fingerprint examination techniques were developed, specialists would look for points of similarity between prints. In most instances, experts required at least 16 matching points for a print to be confirmed as a match.
There isn’t a universal standard for the probability range in content-recognition software. Many programs offer users the flexibility to adjust the degree of similarity needed to confirm a match. For example, users can configure the program to only return results if the algorithm determines a 95% or higher probability that it’s a match. If the incoming clip doesn’t meet that threshold, an error message will be displayed.
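A configurable match threshold might look something like the following sketch. The scoring rule (the fraction of fingerprint values that line up at the best offset) is an assumption chosen for simplicity, as are the catalog contents and the names `similarity` and `best_match`. Here an unmatched clip returns `None` where a real product would show its error message.

```python
def similarity(clip, reference):
    """Best fraction of clip features that line up with the reference at any offset."""
    if not clip or len(clip) > len(reference):
        return 0.0
    best = 0.0
    for offset in range(len(reference) - len(clip) + 1):
        hits = sum(1 for i, value in enumerate(clip) if reference[offset + i] == value)
        best = max(best, hits / len(clip))
    return best

def best_match(clip, catalog, threshold=0.95):
    """Return (title, score) for the best candidate at or above threshold, else None."""
    scored = [(similarity(clip, ref), title) for title, ref in catalog.items()]
    score, title = max(scored, default=(0.0, None))
    if title is not None and score >= threshold:
        return (title, score)
    return None

# Made-up fingerprints; the clips are excerpts from the middle of "Song A".
catalog = {
    "Song A": [5, 12, 5, 9, 3, 3, 8, 1, 14, 2],
    "Song B": [7, 7, 3, 14, 6, 2, 2, 9, 11, 4],
}
clean_clip = [9, 3, 3, 8, 1]
noisy_clip = [9, 3, 0, 8, 1]  # one value corrupted by background noise

print(best_match(clean_clip, catalog))                  # ('Song A', 1.0)
print(best_match(noisy_clip, catalog))                  # None
print(best_match(noisy_clip, catalog, threshold=0.75))  # ('Song A', 0.8)
```

Lowering the threshold lets the noisy clip through, which shows the trade-off users face: a strict setting misses degraded copies, while a loose one risks false matches.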
When a match is found, a connected application can take over. This could involve sending song details to someone looking for the track's name, or it might flag a song on a website and alert the relevant record company’s legal department. Some record labels use this software to scan file-sharing websites or track content on audio-streaming platforms. The entire matching and analysis process typically takes only a few seconds.
In the upcoming section, we will discuss the unique challenges that video content poses compared to audio files.
Content Recognition Software - Video
Video analysis is far more complex than sound analysis.
Photo courtesy of stock.xchng
Recently, Time Warner and Disney teamed up with YouTube to trial Google’s video content-recognition software. Much like audio recognition systems, this software creates a fingerprint for the video content and compares it to a database of known fingerprints. Yet, video analysis presents distinct hurdles that complicate the process.
For example, YouTube limits videos to a maximum of 10 minutes or 100 megabytes. Given that a clip might feature any segment from a film or television show that is copyrighted, the recognition software must analyze the full original content in a way that allows it to identify matches from a small snippet. While Google remains tight-lipped about the software’s approach, it’s likely that the system analyzes overlapping sections of the original content, generating multiple fingerprints for better match identification.
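Since Google has not described its approach, the following is only a guess at what overlapping-section fingerprinting could look like. Each frame is reduced to a single made-up feature value, and `window_fingerprints` and `locate_snippet` are hypothetical helpers. The reference is hashed in overlapping windows, and a snippet is located by checking its own windows against those hashes.

```python
import hashlib

def _window_key(chunk):
    """Short hash of one run of per-frame features."""
    return hashlib.sha1(repr(tuple(chunk)).encode("utf-8")).hexdigest()[:12]

def window_fingerprints(frame_features, window=4, step=2):
    """Hash each overlapping run of `window` features, advancing by `step`."""
    prints = {}
    for start in range(0, len(frame_features) - window + 1, step):
        prints[_window_key(frame_features[start:start + window])] = start
    return prints

def locate_snippet(snippet_features, reference_prints, window=4):
    """Return the reference offsets whose windows also occur in the snippet."""
    offsets = set()
    for start in range(len(snippet_features) - window + 1):
        key = _window_key(snippet_features[start:start + window])
        if key in reference_prints:
            offsets.add(reference_prints[key])
    return sorted(offsets)

# A 20-frame "film" with one made-up feature per frame, and a clip cut from it.
film = [(i * 7) % 23 for i in range(20)]
reference_prints = window_fingerprints(film)
snippet = film[7:14]

print(locate_snippet(snippet, reference_prints))  # [8, 10]
print(locate_snippet([1, 1, 1, 1, 1], reference_prints))  # []
```

Because the windows overlap, a clip cut from an arbitrary point still contains at least one complete window that was fingerprinted in the original, provided the clip is long enough.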
Video recognition software must also detect content even if it's been edited by the person uploading it. For instance, altering the color saturation in a video can deceive software that matches videos on their color values. Cropping a video or uploading footage recorded with a video camera further complicates the task. Pirated films, often filmed from an angled position in theaters, present additional challenges for recognition systems.
One method developers are exploring is to build fingerprints from changes in a video's motion characteristics. However, this may not work if someone uploads a pirated video recorded with a hand-held camera, since the camera's own movement adds motion that is not in the original. In some instances, the match probability threshold might need to be looser to catch all potential piracy cases. Film studios may find that they still require a human to manually review video clips to confirm infringement. Even so, the initial detection of possible video piracy will be far more efficient.
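One way to picture a motion-based fingerprint is the sketch below, which is an illustrative assumption rather than a description of any shipping product. Frames are tiny lists of grayscale pixel values, and the hypothetical `motion_signature` records only whether the amount of frame-to-frame change rose or fell at each step.

```python
def motion_energy(frame_a, frame_b):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))

def motion_signature(frames):
    """One bit per step: 1 if motion energy rose compared with the previous step."""
    energies = [motion_energy(a, b) for a, b in zip(frames, frames[1:])]
    return [1 if later > earlier else 0 for earlier, later in zip(energies, energies[1:])]

# Five made-up 4-pixel grayscale frames.
frames = [
    [10, 10, 10, 10],
    [10, 20, 10, 10],
    [10, 60, 40, 10],
    [10, 60, 50, 10],
    [90, 60, 50, 80],
]
# The same footage with every pixel dimmed to half brightness.
dimmed = [[pixel * 0.5 for pixel in frame] for frame in frames]

print(motion_signature(frames))  # [1, 0, 1]
print(motion_signature(frames) == motion_signature(dimmed))  # True
```

Dimming scales every difference by the same factor, so the rise-and-fall pattern survives an edit that would defeat a fingerprint built on raw pixel values. Shake from a hand-held camera changes the differences unevenly, which is why that case remains hard.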
Video-identification software is still in the testing phase, although some companies are already showcasing their programs effectively. But even after the software is fully developed, identification challenges will persist. The enormous amount of video content remains a significant obstacle. Movie and television studios will constantly need to refresh their databases with fingerprints for new content released daily. While the piracy detection process may become more streamlined, ongoing maintenance and updates will still be necessary.
