Context. Weak lensing cosmological surveys have traditionally been analyzed using summary statistics motivated either by their analytically tractable likelihoods (e.g., the power spectrum) or by their ability to access higher-order information (e.g., peak counts), at the cost of requiring a simulation-based inference approach. In both cases, even if these statistics can be very informative, they are neither designed nor guaranteed to be statistically sufficient (i.e., to capture the full cosmological information content of the data). With the rise of deep learning, however, it has become possible to build summary statistics that are specifically optimized to extract this full information content. Yet a fairly wide range of loss functions has been used in the weak lensing literature to train such neural networks, raising the natural questions of whether a given loss should be preferred and whether sufficient statistics can be achieved, in theory and in practice, under these different choices.

Aims. We compare the neural summarization strategies proposed in the literature in order to identify the loss function that leads to theoretically optimal summary statistics for full-field cosmological inference. In doing so, we aim to provide guidelines and insights to help guide future neural network-based cosmological inference analyses.

Methods. We designed an experimental setup that isolates the specific impact of the loss function used to train neural summary statistics on weak lensing data, at fixed neural architecture and simulation-based inference pipeline. To achieve this, we developed the sbi_lens JAX package, which implements an automatically differentiable lognormal weak lensing simulator together with the tools needed to perform explicit full-field inference over this model with a Hamiltonian Monte Carlo (HMC) sampler.
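The explicit full-field approach mentioned above relies on HMC, which exploits gradients of the log-posterior supplied by a differentiable simulator. As a minimal, self-contained sketch (illustrative only, not the sbi_lens API), the following JAX code implements one HMC transition with leapfrog integration and a Metropolis correction, targeting a stand-in Gaussian log-density in place of the lognormal weak lensing posterior:

```python
import jax
import jax.numpy as jnp

def log_prob(x):
    # Stand-in for the full-field log-posterior; in practice this would be
    # the differentiable lognormal weak lensing model conditioned on data.
    return -0.5 * jnp.sum(x ** 2)

grad_lp = jax.grad(log_prob)

def leapfrog(x, p, step_size, n_steps):
    # Symplectic leapfrog integration of Hamiltonian dynamics.
    p = p + 0.5 * step_size * grad_lp(x)
    for i in range(n_steps):
        x = x + step_size * p
        if i < n_steps - 1:
            p = p + step_size * grad_lp(x)
    p = p + 0.5 * step_size * grad_lp(x)
    return x, p

def hmc_step(key, x, step_size=0.2, n_steps=10):
    key_p, key_u = jax.random.split(key)
    p = jax.random.normal(key_p, x.shape)  # resample momentum
    x_new, p_new = leapfrog(x, p, step_size, n_steps)
    # Metropolis accept/reject corrects for integration error.
    log_accept = (log_prob(x_new) - log_prob(x)
                  - 0.5 * jnp.sum(p_new ** 2) + 0.5 * jnp.sum(p ** 2))
    accept = jnp.log(jax.random.uniform(key_u)) < log_accept
    return jnp.where(accept, x_new, x)

key = jax.random.PRNGKey(0)
x = jnp.zeros(4)
for _ in range(50):
    key, sub = jax.random.split(key)
    x = hmc_step(sub, x)
```

In a full-field analysis, `x` would be the high-dimensional latent field plus cosmological parameters, which is precisely why gradient-based samplers such as HMC are needed.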
Using sbi_lens, we simulated a wCDM LSST Year 10 weak lensing analysis scenario in which the full-field posterior obtained by HMC sampling provides a ground truth against which different neural summarization strategies can be compared.

Results. We provide theoretical insight into the loss functions used in the literature, including mean squared error (MSE) regression, and show that some do not necessarily lead to sufficient statistics, whereas losses motivated by information theory, in particular variational mutual information maximization (VMIM), can in principle do so. Our numerical experiments confirm these insights: on our simulated wCDM scenario, the figure of merit (FoM) of an analysis using neural summary statistics optimized under VMIM achieves 100% of the reference Ωc−σ8 full-field FoM, while an analysis using summary statistics trained under a simple MSE loss achieves only 81% of that reference FoM.
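To make the contrast between the two objectives concrete, the VMIM loss maximizes a variational lower bound on the mutual information I(θ; t) by jointly training a summary t = f_φ(x) and a conditional density q_ψ(θ | t), whereas MSE simply regresses θ. The sketch below (all parameter names and the diagonal-Gaussian q are illustrative assumptions; real analyses typically use a normalizing flow for q and a CNN for f) shows both losses in JAX, with additive constants of the Gaussian log-likelihood dropped:

```python
import jax
import jax.numpy as jnp

def summary(phi, x):
    # Toy summary network t = f_phi(x); a CNN in a real analysis.
    return jnp.tanh(x @ phi["W"] + phi["b"])

def vmim_loss(params, x_batch, theta_batch):
    # VMIM: minimize E[-log q_psi(theta | t)], a lower bound on -I(theta; t)
    # up to a constant. Here q is a diagonal Gaussian with t-dependent
    # mean and log-std (constants of the Gaussian NLL omitted).
    t = summary(params["phi"], x_batch)
    mu = t @ params["psi"]["Wm"] + params["psi"]["bm"]
    log_std = t @ params["psi"]["Ws"] + params["psi"]["bs"]
    nll = 0.5 * ((theta_batch - mu) / jnp.exp(log_std)) ** 2 + log_std
    return jnp.mean(jnp.sum(nll, axis=-1))

def mse_loss(params, x_batch, theta_batch):
    # MSE: regress theta directly from the summary.
    t = summary(params["phi"], x_batch)
    pred = t @ params["psi"]["Wm"] + params["psi"]["bm"]
    return jnp.mean((pred - theta_batch) ** 2)

# Illustrative shapes: 8-dim data, 4-dim summary, 2 cosmological parameters.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
params = {
    "phi": {"W": 0.1 * jax.random.normal(k1, (8, 4)), "b": jnp.zeros(4)},
    "psi": {"Wm": 0.1 * jax.random.normal(k2, (4, 2)), "bm": jnp.zeros(2),
            "Ws": jnp.zeros((4, 2)), "bs": jnp.zeros(2)},
}
x_batch = jax.random.normal(k3, (32, 8))
theta_batch = jax.random.normal(k4, (32, 2))
loss, grads = jax.value_and_grad(vmim_loss)(params, x_batch, theta_batch)
```

Note that both objectives train the same summary network; only the loss differs, which mirrors the fixed-architecture comparison performed in this work.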