Higher plants, algae and cyanobacteria absorb light energy to drive oxygenic photosynthesis. Light harvesting is the first step in the photosynthetic process and is mediated by pigment-binding proteins that form light-harvesting antenna systems. However, excess light is harmful: it can damage proteins through the formation of reactive oxygen species (ROS). This has exerted strong evolutionary pressure on photosynthetic organisms to develop potent photoprotective mechanisms [1–3]. In photosynthetic eukaryotes, both functions, light harvesting and photoacclimation/photoprotection, are mediated by members of the extended light-harvesting complex (LHC) protein superfamily [1, 3–9]. The eukaryotic members of the extended LHC protein superfamily share a common origin. They evolved from a cyanobacterial one-helix ancestor with a characteristic chlorophyll-binding motif that is strongly conserved across the entire extended LHC protein superfamily [1, 4–6, 8, 10, 11]. Proteins outside the LHC superfamily can also bind chlorophyll; examples are the prochlorophyte Chl a/b-binding proteins and the IsiA chlorophyll-binding protein of cyanobacteria. The chlorophyll-binding motifs of these proteins are not homologous to the motifs found in the extended LHC protein superfamily [5, 12].
Eukaryotic photosynthetic organisms evolved through the uptake of an ancient cyanobacterium and the subsequent reduction of this endosymbiont to an organelle. Soon after primary plastids evolved, photosynthetic eukaryotes split into three lineages: chlorophytes (green algae and land plants), rhodophytes (red algae) and glaucophytes [14, 15]. During this process, the structure and composition of the light-harvesting systems changed. Phycobilisomes, the main light-harvesting systems of cyanobacteria, were lost in chlorophytes, and their function was taken over by members of the extended LHC protein superfamily. Rhodophytes and glaucophytes, however, retained phycobilisomes as part of their light-harvesting machinery [5, 15].
Diatoms and cryptophytes (along with related algal groups collectively termed “Chromista”) evolved via secondary endocytobiosis, the uptake of a eukaryotic alga by a eukaryotic host cell [14–16]. The secondary endosymbiont was phylogenetically related to extant red algae. Red algae and algae with secondary plastids of red algal origin are therefore often collectively referred to as the “red lineage” of photosynthetic eukaryotes, as opposed to the “green lineage” (chlorophytes and organisms with secondary plastids of chlorophyte origin).
Secondary endocytobiosis also led to drastic changes in the structure and function of the light-harvesting systems in the red lineage. In cryptophytes, phycobilins are present but are not organised into phycobilisomes, whereas diatoms use LHC superfamily proteins exclusively for light harvesting [5, 15].
Across all extant bacterial and eukaryotic photosynthetic organisms, the extended LHC protein superfamily comprises the LHC, LHC-like and PSBS protein families. In the red lineage, the LHC protein family is represented by several groups. LHCR proteins occur in red algae (“R” for Rhodophyta). Chlorophyll (Chl) a/c-binding (CAC) proteins occur in algal groups with secondary plastids of red algal origin; in diatoms and brown algae they are also called fucoxanthin CAC proteins (FCPs) or LHCF (“F” for fucoxanthin). The remaining groups are the LI818 proteins, also called LHCX in diatoms, and a less well-known clade, LHCZ, described for some algae with complex plastids [4–6, 8, 9, 18]. In the green lineage, the LHC protein family is represented by Chl a/b-binding (CAB) proteins and LI818 proteins, also called LHCSR in green algae [4–6, 9, 19].
The LHC-like protein family is divided into early light-induced proteins (ELIPs), stress-enhanced proteins (SEPs, also called light-harvesting-like (LIL) proteins), one-helix proteins (OHPs, also called high light-induced proteins, HLIPs) and high light (HL) intensity-inducible LHC-like 4 (LHL4) proteins [1, 20]. While ELIPs and LHL4 proteins are found exclusively in the green lineage, SEPs and OHPs are shared between red and green algae [8, 11, 20]. Two types of OHP can be distinguished. The OHP1/HLIP type is present in cyanophages, cyanobacteria and photosynthetic eukaryotes. The OHP2 type is restricted to eukaryotes [8, 11]. Members of the PSBS protein family are present only in the green lineage [5, 11].
Proteins of the CAC, LHCR and CAB protein families mainly fulfil a light-harvesting function, whereas members of the LHC-like, LI818/LHCX/LHCSR, PSBS and LHL4 families are mainly involved in photoprotection and photoacclimation. These photoprotective proteins have been proposed to participate in regulating Chl and tocopherol biosynthesis. They may also transiently bind released free chlorophyll, thus preventing the formation of ROS. A further proposed role is to act as a sink for excess excitation energy in a process called non-photochemical quenching (NPQ) [9, 20].
Four novel sequences belonging to the extended LHC protein superfamily were recently reported from the red algae Galdieria sulphuraria and Griffithsia japonica and from the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana. These sequences were analysed by sequence similarity (hidden Markov model analysis and BLAST searches) and by predicted secondary structure (three predicted transmembrane α-helices). On this basis, they did not fall into any previously described group of the extended LHC protein superfamily but instead formed a new group, termed red lineage CAB-like proteins (RedCAPs). Here, we elucidate the taxonomic distribution, phylogeny, localisation, expression and potential function of these as yet uncharacterised RedCAPs.