Big Data and Data-Driven Methods in Computer Vision
- Historical Context:
- Geometric methods and hand-designed features dominated early computer vision, but modern methods rely heavily on data-driven approaches fueled by large datasets and deep learning.
- What Changed?
- The Internet: Provided billions of labeled and unlabeled images.
- Crowdsourcing: Allowed rapid collection of annotations for datasets.
- Deep Learning: Enabled automatic feature learning from raw data.
- Increased Computational Power: Compute has grown roughly exponentially (Moore’s Law), making it practical to train on very large datasets.
The Unreasonable Effectiveness of Data
- Big Idea:
- AI systems don’t necessarily need strong reasoning skills (e.g., human-like cognition). Instead, they can use brute-force methods if provided with enough data.
- This is exemplified by systems like Google Search, where "intelligence" lies in data volume and relevance, not deep reasoning.
- Quotes:
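- Eugene Wigner, in "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" (the essay this section's title echoes): "The miracle of the appropriateness of the language of mathematics..."
- Peter Norvig, in "The Unreasonable Effectiveness of Data" (written with Alon Halevy and Fernando Pereira): "Stop acting as if our goal is to author extremely elegant theories... embrace the unreasonable effectiveness of data."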
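The "brute force plus data" idea can be illustrated with a toy experiment. The sketch below is not taken from the lecture: it assumes a synthetic 2-D checkerboard labeling task and a plain nearest-neighbor lookup (all names are hypothetical), and it only shows the general trend that a method with no reasoning at all gets better as the stored dataset grows.

```python
import numpy as np

def checkerboard_label(points, cells=4):
    """Toy ground truth (an assumption for this demo): colour of the
    checkerboard cell that contains each 2-D point in [0, 1)^2."""
    idx = np.floor(points * cells).astype(int)
    return (idx[:, 0] + idx[:, 1]) % 2

def nearest_neighbor_predict(train_x, train_y, query_x):
    """Brute force: give each query the label of its closest stored example."""
    diffs = query_x[:, None, :] - train_x[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return train_y[sq_dists.argmin(axis=1)]

def error_rate(n_train, n_test=500, seed=0):
    """Fraction of test points mislabeled when n_train examples are stored."""
    rng = np.random.default_rng(seed)
    train_x = rng.random((n_train, 2))
    test_x = rng.random((n_test, 2))
    pred = nearest_neighbor_predict(train_x, checkerboard_label(train_x), test_x)
    return float((pred != checkerboard_label(test_x)).mean())

if __name__ == "__main__":
    # The lookup rule never changes; only the amount of stored data does.
    for n in (10, 100, 1000, 5000):
        print(n, error_rate(n))
```

The classifier knows nothing about checkerboards; any improvement comes purely from having more labeled examples close to each query.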
Scene Completion
- Problem:
- Filling in missing regions of an image using parts of other images.
- How It Works:
- Step 1: Match the input image to similar images in a large dataset (e.g., millions of images downloaded from Flickr).
- Step 2: Extract and blend regions from the matched images to fill the missing part.
- Key Techniques:
- Scene Gist Descriptors: Low-dimensional representations of images used for matching.
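- Graph Cuts and Poisson Blending: A graph cut chooses the seam along which the matched region is pasted in, and Poisson blending smooths the transition across that seam so the fill looks seamless.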
- Dataset Scale:
- Results improve significantly as the dataset grows (e.g., 2 million images vs. 20,000 images), because a larger collection is more likely to contain a close match for the input scene.
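The two steps above can be sketched on small grayscale arrays. This is a simplified illustration under stated assumptions, not the published pipeline: `coarse_descriptor` is a crude block-average stand-in for a real gist descriptor, the graph-cut seam search is omitted, and `poisson_fill` is a basic Jacobi solver that assumes the hole does not touch the image border. All function names are hypothetical.

```python
import numpy as np

def coarse_descriptor(image, grid=4):
    """Mean value per cell of a grid x grid layout (crude stand-in for gist)."""
    h, w = image.shape
    cropped = image[: h - h % grid, : w - w % grid].astype(float)
    cells = cropped.reshape(grid, cropped.shape[0] // grid,
                            grid, cropped.shape[1] // grid)
    return cells.mean(axis=(1, 3)).ravel()

def find_best_match(query, hole, candidates, grid=4):
    """Step 1: index of the candidate closest to the query, comparing only
    descriptor cells that contain no missing pixels. `hole` is a bool mask."""
    known_fraction = coarse_descriptor(~hole, grid)
    weights = (known_fraction > 0.999).astype(float)
    q = coarse_descriptor(query, grid)
    costs = [np.sum(weights * (coarse_descriptor(c, grid) - q) ** 2)
             for c in candidates]
    return int(np.argmin(costs))

def poisson_fill(target, source, hole, iterations=500):
    """Step 2: fill `hole` so the result keeps the source's gradients inside
    the hole while agreeing with the target on the hole's border."""
    src = source.astype(float)
    result = np.where(hole, src, target.astype(float))
    # Discrete Laplacian of the source is the guidance term.
    lap = (4 * src[1:-1, 1:-1] - src[:-2, 1:-1] - src[2:, 1:-1]
           - src[1:-1, :-2] - src[1:-1, 2:])
    inner = hole[1:-1, 1:-1]
    for _ in range(iterations):
        neighbours = (result[:-2, 1:-1] + result[2:, 1:-1]
                      + result[1:-1, :-2] + result[1:-1, 2:])
        update = (neighbours + lap) / 4.0
        # Jacobi step: only pixels inside the hole are changed.
        result[1:-1, 1:-1] = np.where(inner, update, result[1:-1, 1:-1])
    return result

if __name__ == "__main__":
    rows = np.linspace(0.0, 1.0, 16) ** 2
    scene = np.add.outer(rows, 0.2 * np.sin(np.arange(16)))
    hole = np.zeros((16, 16), dtype=bool)
    hole[5:11, 5:11] = True
    query = np.where(hole, 0.0, scene)
    rng = np.random.default_rng(0)
    # Candidate 1 is the same scene with a brightness offset.
    candidates = [rng.random((16, 16)), scene + 0.05, scene.T]
    best = find_best_match(query, hole, candidates)
    filled = poisson_fill(query, candidates[best], hole)
    print(best, np.abs(filled - scene).max())
```

In the demo the matched image is brighter than the query, so a direct paste would leave a visible patch. Because Poisson blending copies gradients rather than raw intensities, the brightness offset disappears.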
im2gps: Image Geolocation
- Problem:
- Predict the location of a photo based solely on visual features.
- How It Works:
- Match the photo to a database of geo-tagged images using features like:
- Gist descriptors.
- Bag-of-SIFT features.
- Color histograms.
- Results:
- With a database of about 6 million geo-tagged Flickr images, the system achieved reasonable accuracy, mainly on photos of frequently photographed tourist locations.
- Modern Approaches:
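- Deep learning models further improve performance by replacing hand-designed descriptors with learned features (e.g., PlaNet, which treats geolocation as classification over geographic cells, and CLIP, whose image embeddings have also been applied to the task).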
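The retrieval idea behind im2gps can be sketched as a nearest-neighbor lookup followed by a great-circle error measurement. The sketch below makes several assumptions: it uses only a joint RGB color histogram as the feature (gist and bag-of-SIFT are left out), the query simply inherits the coordinates of its closest database image, and the function names and demo coordinates are illustrative rather than taken from the original system.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def color_histogram(image, bins=4):
    """Joint RGB histogram, normalized to sum to 1.
    `image` is H x W x 3 with values in [0, 1]."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                             range=[(0.0, 1.0)] * 3)
    return hist.ravel() / hist.sum()

def geolocate(query_feature, db_features, db_latlon, k=1):
    """Return the (lat, lon) rows of the k database images whose features
    are closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(db_features - query_feature, axis=1)
    nearest = np.argsort(dists)[:k]
    return db_latlon[nearest]

if __name__ == "__main__":
    def flat(rgb):
        # Tiny synthetic "photo" filled with one color.
        return np.ones((8, 8, 3)) * np.array(rgb)

    database = [flat((0.9, 0.1, 0.1)), flat((0.1, 0.1, 0.9))]
    db_latlon = np.array([[48.8566, 2.3522],     # Paris
                          [51.5074, -0.1278]])   # London
    db_features = np.stack([color_histogram(im) for im in database])

    query = flat((0.8, 0.2, 0.2))                # reddish, like entry 0
    guess = geolocate(color_histogram(query), db_features, db_latlon)[0]
    print(guess, haversine_km(guess[0], guess[1], 51.5074, -0.1278))
```

Accuracy for this kind of system is naturally reported as the fraction of queries whose predicted location falls within some distance of the true one, which is why the distance helper is included.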
Recognition via Tiny Images