What is ImageNet and Why 2012 Was So Important

“We’re going to map out the entire world of objects.”1 That promise from Fei-Fei Li,2 a Princeton alumna and now professor of computer science at Stanford University, triggered events that are changing the future of medical imaging for the better. It started with an idea in 2006 that came to be known as ImageNet. 

What is ImageNet? 

The roots of this extensive database for visual object recognition research date back to the late 1980s, when Princeton’s National Medal of Science recipient, George Miller,3 pioneered a hierarchical structure for the English language. Providing more than dictionary definitions, Miller’s WordNet organized words according to their relationships to other words. The objective was to align language with machine-readable logic.

Li, then teaching computer science at the University of Illinois Urbana-Champaign, was grappling with a core obstacle in machine learning. An “overfitted” model learns the quirks of a limited data set, memorizing “noise” rather than relevant characteristics. Because it can only recognize what it has already seen, it cannot perform reliably on unfamiliar data. The opposite problem, an underfitted model, over-generalizes and cannot pick out the correct patterns in the data at all.
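The trade-off can be seen in a few lines of code. The sketch below uses made-up data (noisy samples of a sine curve, nothing from ImageNet): a degree-1 polynomial underfits, while a degree-9 polynomial memorizes the training noise and does worse on points it has never seen.

```python
# Toy illustration of overfitting vs. underfitting on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Small, noisy training set -- the "limited data set" described above.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)

# Held-out points the model has never seen.
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)

def fit_and_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 3, 9):
    train_err, test_err = fit_and_errors(degree)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The degree-9 fit passes through every training point (near-zero training error) yet generalizes poorly, while the degree-1 fit is too rigid to capture the curve at all.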

The dilemma was teaching machines to make better decisions. Li’s solution was to build an improved dataset. Her early inspiration was the fact that WordNet indexed over 155,000 words. Li’s dataset, however, would encompass visual images of objects (such as animals) and concepts (like love). From these benchmarks, the machine would learn to identify new data.

The ImageNet project officially started in 2007 with a team of enterprising Princeton faculty and students. There were serious obstacles, including the cost of human labeling and flawed computer-vision algorithms. Then a graduate student introduced Li to Amazon Mechanical Turk, a platform for cost-effective outsourcing of small tasks. Less than three years later, the dataset contained more than three million images, each carefully labeled and sorted into over five thousand categories.

ImageNet partnered with the European PASCAL Visual Object Classes (VOC) challenge,4 a competition built around standardized image datasets for object class recognition. The collaboration set the stage for a breakthrough in 2012. 


The start of something much bigger 

In the 2012 competition, a team from the University of Toronto submitted AlexNet5 – a deep convolutional neural network architecture.

In the first year of the competition, every team had an error rate of at least 25%. In 2012, the AlexNet team, the first to use deep learning, was the only competitor to push its error rate below 25%. The next year, all of the high-scoring entries used deep neural networks, and nearly every team beat the 25% mark. By 2017, the best error rates had fallen below five percent, approaching two percent.

This meant much more than winning a competition. It proved that a model pretrained on ImageNet needed only fine-tuning to perform well on other recognition tasks, a technique known as transfer learning. Convolutional neural networks trained in this manner find patterns starting at the pixel level, building through successive layers of increasing abstraction. 
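A minimal sketch of the idea, in plain NumPy rather than a real network: the “pretrained” feature extractor below is just a fixed random projection standing in for ImageNet-trained convolutional layers, and the images and labels are fabricated. Only the small new classifier head is fit for the new task.

```python
# Sketch of transfer learning: reuse a frozen feature extractor,
# train only a new head. Everything here is a synthetic stand-in.
import numpy as np

rng = np.random.default_rng(1)

def pretrained_features(images):
    """Stand-in for an ImageNet-trained network's convolutional layers.
    Here it is a fixed random projection plus ReLU; in practice these
    weights come from large-scale pretraining and stay frozen."""
    W = np.random.default_rng(42).normal(size=(images.shape[1], 16))
    return np.maximum(images @ W, 0)

# A tiny new "task": 100 fake 64-pixel images with binary labels.
images = rng.normal(size=(100, 64))
labels = (images[:, 0] > 0).astype(float)

# Fine-tuning step: fit only the new head (here, by least squares)
# on top of the frozen features -- far cheaper than training from scratch.
feats = pretrained_features(images)
head, *_ = np.linalg.lstsq(feats, labels, rcond=None)

predictions = (feats @ head > 0.5).astype(float)
accuracy = (predictions == labels).mean()
print(f"accuracy of the new head on the small task: {accuracy:.2f}")
```

The point is the division of labor: the expensive, general-purpose representation is learned once on a huge dataset, and each new task only fits a small, cheap layer on top of it.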

These convolutional neural networks are finding their way into everyday life: Facebook automatically suggesting photo tags, voice recognition on smartphones, and autonomous vehicles detecting surrounding objects. Deep learning is also becoming an important component of medical imaging. 

What ImageNet 2012 means to radiology 

Image-recognition deep neural networks (DNNs) are already making inroads in areas of medical diagnostics such as:

  • Diabetic retinopathy screening
  • Classification of skin lesions
  • Detection of lymph node metastasis

Convolutional neural network technology is improving the efficiency of protocol determination, calculating optimal contrast medium dose without a reduction in image quality, and many other specific areas of radiology practice.6

The potential for this technology in image reconstruction, the mathematical process of turning raw X-ray projection data into cross-sectional and three-dimensional images, is immense. The importance of reconstruction lies in creating the highest-quality images possible using the lowest possible radiation dose.

Early analytical reconstruction produced clear images but required a higher dose. Iterative reconstruction allowed lower doses, but the images could look artificial and the process took longer. Deep learning is the newest evolution in image reconstruction: trained on a large dataset of high-quality reference images, the system learns to recognize genuine anatomical patterns and reproduce that clarity from lower-dose data.
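To make the idea concrete, here is a toy, one-dimensional sketch of learning-based reconstruction (entirely synthetic data, and a single linear operator where a real scanner would use a deep network): a denoising map is fit from paired “low-dose” (noisy) and “full-dose” (clean) signals, then applied to a noisy signal it has never seen.

```python
# Toy sketch of the idea behind learned reconstruction/denoising.
# Synthetic 1-D signals stand in for image rows; a linear least-squares
# operator stands in for a deep network. Not a clinical algorithm.
import numpy as np

rng = np.random.default_rng(2)
n_pixels, n_train, noise_std = 64, 500, 0.5
t = np.linspace(0, 1, n_pixels)

def smooth_signals(n):
    """Random mixes of low-frequency sines -- a crude stand-in for
    the smooth structure of real anatomy."""
    amps = rng.normal(size=(n, 3))
    basis = np.stack([np.sin(2 * np.pi * k * t) for k in (1, 2, 3)])
    return amps @ basis

# Paired training data: "full-dose" (clean) vs. "low-dose" (noisy) rows.
clean = smooth_signals(n_train)
noisy = clean + rng.normal(0, noise_std, clean.shape)

# "Training": fit an operator mapping noisy rows to clean rows.
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)

# Apply it to a new low-dose signal the model has never seen.
test_clean = smooth_signals(1)
test_noisy = test_clean + rng.normal(0, noise_std, test_clean.shape)
test_denoised = test_noisy @ W

err_before = np.mean((test_noisy - test_clean) ** 2)
err_after = np.mean((test_denoised - test_clean) ** 2)
print(f"MSE before: {err_before:.3f}, after learned denoising: {err_after:.3f}")
```

Because the operator has seen many clean/noisy pairs, it learns which patterns are signal and which are dose-related noise, the same principle a deep reconstruction network exploits at much larger scale.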

Today’s radiologist has the benefit of a rapid accumulation of electronic data, with many more images per exam. With patient volumes also increasing, however, extracting the relevant findings from this flood of data can feel like drinking from a fire hose.

Artificial intelligence does not replace the radiologist’s expertise, nor does it diagnose patients. A properly trained algorithm can do a terrific job of identifying an object in the context of its immediate surroundings, but it does not actually understand what it sees. Rather, it performs as a software assistant, efficiently assessing patterns in visual data and marking abnormalities for the radiologist’s further diagnosis.


  1. The data that transformed AI research – and possibly the world. Quartz. June 17, 2019.
  2. People. Stanford Vision Lab. June 17, 2019.
  3. George Miller, Princeton psychology professor and cognitive pioneer, dies. Princeton University. June 17, 2019.
  4. The PASCAL Visual Object Classes (VOC) Challenge. Springer. June 17, 2019.
  5. Understanding AlexNet. Big Vision LLC. June 17, 2019.
  6. An overview of deep learning in medical imaging focusing on MRI. Elsevier. June 17, 2019.