
Imagine playing a new, slightly modified version of the game GeoGuessr. You are faced with a photo of an average American house, perhaps two stories with a front yard on a cul-de-sac and an American flag flying proudly out front. But there’s nothing particularly distinctive about this house, nothing to indicate what condition it’s in or where the owners are from.
You have two tools at your disposal: your brain and 44,416 low-resolution, bird’s-eye view photos of random places in the United States and their associated location data. Could you match the house with an aerial image and locate it correctly?
You definitely couldn’t, but a new machine learning model probably could. The software, created by researchers at China University of Petroleum (East China), searches a database of remotely sensed photographs with associated location information to match a street-side image (of a house, a commercial building, or anything else that can be photographed from a road) with an aerial image in the database. While other systems can do the same, this one is far lighter on memory and computation than its competitors.
In the best case, when faced with an image that has a 180-degree field of view, it succeeds up to 97 percent of the time in the first stage of narrowing down the location. That is better than, or within two percentage points of, every other model available for comparison. Even in less-than-ideal conditions, it outperforms many competitors. When pinpointing an exact location, it is correct 82 percent of the time, about three percentage points behind the other models.
But this model stands out for its speed and memory savings. It is at least twice as fast as similar models and uses less than a third of the memory they need, according to the researchers. That combination makes it valuable for applications in navigation systems and the defense industry.
“We train the AI to ignore superficial differences in perspective and focus on extracting the same ‘key landmarks’ from both views, turning them into a simple, shared language,” explains Peng Ren, who develops signal processing and machine learning algorithms at China University of Petroleum.
The software is based on a method called deep cross-view hashing. Instead of trying to compare every pixel of a street image with every image in the giant bird’s-eye-view database, this method relies on hashing, which means transforming a collection of data (in this case, aerial and street-level photographs) into a string of numbers unique to the data.
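To make the idea concrete, here is a minimal sketch of that hashing step, assuming a random projection stands in for the learned network; the dimensions, function names, and toy data are illustrative and are not taken from the paper.

```python
import numpy as np

def hash_code(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project a feature vector and binarize it into a short 0/1 code."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the bits where two codes disagree; smaller means more similar."""
    return int(np.sum(a != b))

rng = np.random.default_rng(0)
projection = rng.normal(size=(512, 64))       # 512-D features -> 64-bit codes
street_feat = rng.normal(size=512)            # stand-in for learned street-view features
aerial_feat = street_feat + 0.1 * rng.normal(size=512)  # a near-matching aerial view

street_code = hash_code(street_feat, projection)
aerial_code = hash_code(aerial_feat, projection)
print(hamming_distance(street_code, aerial_code))  # small when the views match
```

In the real system the projection is replaced by a trained network, but the payoff is the same: every image, aerial or street-level, collapses into a short code that can be compared in a handful of bit operations.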
To do this, the university’s research group uses a type of deep learning model called a vision transformer that divides images into small units and finds patterns between the pieces. The model can find in a photograph what it has been trained to identify as a tall building, a circular fountain or a roundabout, and then encodes its findings into numerical strings. ChatGPT is based on a similar architecture, but finds patterns in text instead of images. (The “T” in “GPT” stands for “transformer.”)
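As a rough illustration of that front end, the snippet below splits an image into fixed-size patches the way a vision transformer does before any attention layers run; the image and patch sizes here are common defaults, not values reported by the authors.

```python
import numpy as np

def to_patches(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened patch vectors of length patch*patch*C."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    return (
        image[: rows * patch, : cols * patch]
        .reshape(rows, patch, cols, patch, c)
        .swapaxes(1, 2)
        .reshape(rows * cols, patch * patch * c)
    )

image = np.zeros((224, 224, 3), dtype=np.float32)  # dummy street-level photo
print(to_patches(image).shape)                      # (196, 768): 196 patch tokens
```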
The number representing each image is like a fingerprint, says Hongdong Li, who studies computer vision at the Australian National University in Canberra. The numerical code captures unique characteristics of each image, allowing the geolocation process to quickly narrow down potential matches.
In the new system, the code associated with a given ground-level photograph is compared to those of all aerial images in the database (for testing, the team used satellite images of the United States and Australia), yielding the five closest candidates for aerial matches. The geographic coordinates of those closest matches are then averaged using a technique that gives more weight to the nearest candidates to reduce the impact of outliers, producing an estimated location for the street-view image.
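The snippet below sketches that retrieval-and-averaging step under simplifying assumptions: random codes and coordinates stand in for the real database, and the inverse-distance weighting is just one plausible choice rather than the exact scheme in the paper.

```python
import numpy as np

def estimate_location(query_code, aerial_codes, aerial_latlon, k=5):
    """Return a weighted average of the coordinates of the k closest aerial codes."""
    dists = np.sum(aerial_codes != query_code, axis=1)   # Hamming distance to every aerial code
    top = np.argsort(dists)[:k]                          # indices of the five best matches
    weights = 1.0 / (dists[top] + 1.0)                   # closer matches count for more
    weights /= weights.sum()
    return weights @ aerial_latlon[top]                  # estimated (lat, lon)

rng = np.random.default_rng(1)
aerial_codes = rng.integers(0, 2, size=(44_416, 64), dtype=np.uint8)  # toy 64-bit codes
aerial_latlon = np.column_stack([
    rng.uniform(25, 49, 44_416),     # latitudes roughly spanning the continental US
    rng.uniform(-124, -67, 44_416),  # longitudes
])
query = rng.integers(0, 2, size=64, dtype=np.uint8)
print(estimate_location(query, aerial_codes, aerial_latlon))
```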
The new geolocation mechanism was published last month in IEEE Transactions on Geoscience and Remote Sensing.
Fast and memory-efficient
“Although not a completely new paradigm,” this paper “represents a clear advance within this field,” Li says. Because this problem has been tackled before, some experts, such as computer scientist Nathan Jacobs at Washington University in St. Louis, aren’t as enthusiastic. “I don’t think this is a particularly innovative article,” he says.
But Li disagrees with Jacobs: He believes this approach is innovative in using hashing to make image matching faster and more memory-efficient than conventional techniques. It uses just 35 megabytes, while the next smallest model Ren’s team examined requires 104 MB, about three times as much space.
According to the researchers, the method is more than twice as fast as the next fastest. When comparing street-level images to a dataset of aerial photographs from the United States, the runner-up took around 0.005 seconds to make the comparison. The China University of Petroleum group’s system found a location in about 0.0013 seconds, almost four times faster.
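One way to see where that speed comes from: once codes are packed into machine words, comparing two images reduces to an XOR and a bit count, as in this generic illustration (not a benchmark of the published system).

```python
# Toy 16-bit codes; real systems use longer codes, but the operation is the same.
a = 0b1011_0110_0101_1100
b = 0b1011_0100_0111_1100
print(bin(a ^ b).count("1"))  # Hamming distance: 2 differing bits
```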
“As a result, our method is more effective than conventional image geolocation techniques,” says Ren, and Li confirms that these claims are credible. Hashing “is a well-established route to speed and compactness, and the reported results align with theoretical expectations,” Li says.
Although these efficiencies seem promising, more work is required to ensure the method works at scale, Li says. The group did not fully study realistic challenges such as seasonal variation or clouds blocking the image, which could affect the robustness of geolocation matching. In the future, this limitation could be addressed by training on images from a wider range of locations, Ren says.
Still, experts say it’s now worth considering long-term applications (beyond a super-advanced GeoGuessr).
There are some trivial uses for efficient image geolocation, such as automatically geotagging old family photos, Jacobs says. More seriously, navigation systems could also take advantage of a geolocation method like this: if GPS fails in an autonomous vehicle, another way to find locations quickly and accurately could be useful, Jacobs says. Li also suggests it could play a role in emergency response within the next five years.
There may also be applications in defense systems. Discoverer, a 2011 project of the Office of the Director of National Intelligence, aimed to help intelligence analysts learn everything they could about photographs lacking metadata by using reference data from sources including aerial images, a goal that could be achieved with models similar to this new geolocation method.
Jacobs puts the defense application in context: if a government agency were handed a photograph of a terrorist training camp with no metadata, how could the site be geolocated quickly and efficiently? Deep cross-view hashing might be of some help.