Fraunhofer IAIS
Responsibilities
Sep 08, 2024
As a postdoctoral researcher at Fraunhofer, my role blends engineering and research, allowing me to contribute to a range of exciting projects within the data team. Our work focuses on delivering high-quality data, primarily text data for training large language models (LLMs). This involves identifying potential datasets that meet our requirements and building efficient data pipelines to process and optimize the data.
Currently, much of my work revolves around ensuring the datasets we use are of the highest quality, relevant to our models, and meet our performance expectations. A significant part of this involves working on compute clusters, particularly with Slurm, to make the most of the available computational resources when processing large-scale data.
In addition to my technical responsibilities, I’m involved in various multi-partner projects with universities and research institutes. This includes managing our team’s contributions and ensuring everything runs smoothly when the team lead is not available.
The following is a non-comprehensive list of the projects I’m currently (09/2024) involved in, along with a brief description of my responsibilities in each:
- Rhine-Ruhr Center for Scientific Data Literacy (DKZ.2R): I serve as the lead consultant and primary representative for the Fraunhofer IAIS institute, coordinating efforts to advance data literacy in scientific research.
- OpenGPT-X: My work focuses on data processing, where I ensure that large datasets are optimized and ready for training advanced language models.
- EuroLingua: In this project, my key responsibility is maintaining the highest standards of data quality to ensure the accuracy and efficiency of our language models.
- TrustLLM: I support the data governance aspect of this project, helping to ensure that all data used is managed responsibly and aligns with governance standards.
- Data Task-Force Team: As the lead of this team, I work to achieve vertical synchronization across our department, ensuring that our data strategies and practices are aligned and efficient.
- OpenFLaaS: I serve as the lead on the Fraunhofer side.