Apache Hadoop is an open-source software framework for storing and processing massive datasets across clusters of computers, designed for scalability and reliability. It works by breaking down large data jobs into smaller ones that run in parallel, using a distributed file system (HDFS) and a processing model called MapReduce. Key components include HDFS, YARN (resource management), Hadoop Common, and MapReduce.