Owing to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. The goal of cross-modal hashing is to map data points from the original space into a Hamming space of binary codes such that similarity in the original space is preserved in the Hamming space. Representing the original data with binary hash codes dramatically reduces the storage cost. In recent years, thanks to the strong feature-extraction capabilities of deep learning, neural networks can be used to learn effective representations of different modalities from scratch and to establish high-level semantic associations between modalities, which has motivated research on deep-learning-based cross-modal hash retrieval methods.
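To make the mapping into Hamming space concrete, the following minimal sketch hashes a real-valued feature vector to a binary code and compares codes by Hamming distance. It is an illustration only: it uses fixed random projections followed by a sign function as a stand-in for a learned hash function, and the names `make_projection`, `hash_code`, and `hamming_distance`, as well as the 128-dimensional features and 32-bit code length, are assumptions rather than part of any particular CMH method. In a cross-modal setting, each modality would have its own learned hash function producing codes of the same length.

```python
import random

def make_projection(dim, n_bits, seed):
    """Hypothetical stand-in for a learned hash function: n_bits random hyperplanes."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(x, projection):
    """Map a real-valued vector to a binary code packed into a Python int."""
    code = 0
    for row in projection:
        dot = sum(w * v for w, v in zip(row, x))
        # One bit per hyperplane: 1 if the projection is non-negative, else 0.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming_distance(a, b):
    """Number of bit positions in which two codes differ (XOR, then count ones)."""
    return bin(a ^ b).count("1")

# Assumed sizes for illustration: 128-dimensional features, 32-bit codes.
DIM, N_BITS = 128, 32
projection = make_projection(DIM, N_BITS, seed=0)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
x_scaled = [2.0 * v for v in x]   # same direction as x
x_opposite = [-v for v in x]      # opposite direction to x

code_x = hash_code(x, projection)
code_scaled = hash_code(x_scaled, projection)
code_opposite = hash_code(x_opposite, projection)

# Storage per item: float32 features versus the packed binary code.
feature_bytes = DIM * 4
code_bytes = N_BITS // 8
```

Under these assumptions, each item takes 4 bytes as a code instead of 512 bytes as float32 features, and comparing two items reduces to an XOR followed by a bit count. Vectors pointing in the same direction receive identical codes, while opposite vectors differ in every bit, which is the sense in which such a code can reflect similarity in the original space.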