In process mining, many tasks such as trace clustering, anomaly detection, and subset identification rely on a simplified representation of a single case. These representations may capture the control flow of the process as well as the context in which a case is executed. However, most of these representations are hand-crafted, which is time-consuming in practice, and incorporating event and case attributes as contextual factors is challenging. In this paper, we propose a neural network architecture for representation learning that automates the generation of such representations. Our network is trained in a supervised fashion to learn the most meaningful features and thereby obtain dense and accurate vector representations of the cases of an event log. We implemented our approach and conducted trace clustering experiments on publicly available event logs to show its applicability. The results show improved separation of cases and that the process models discovered from the identified subsets are of high quality.