YC_202408103
No.: YC_202408103
Full name | ABERCHIH Anas
National ID (CIN) | JM93844
Internship | AI & DATA
Duration | 2 months
Date of issue | 28/09/2024
"Bringing generative AI to Darija"
Dataset Creation: Development of innovative techniques to create a Darija-English translation dataset and a raw Darija text corpus, categorized by themes such as sports and education.
Data Collection and Preprocessing: Utilizing methods like web scraping to gather Darija text from various sources, then transforming this data into usable formats, including Darija-English sentence pairs and text files (.txt).
Model Development and Training: Training multiple models for the translation task and comparing their performance. This includes using basic training algorithms and fine-tuning techniques.
Model Evaluation: Researching and implementing advanced evaluation methods to compare the performance of translation models, followed by drafting a report detailing the methodologies and findings.
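As an illustration of the preprocessing step described above (turning collected text into Darija-English sentence pairs and themed .txt files), here is a minimal sketch in Python. It is not the code used during the internship: the function names, the tab-separated layout, the column headers, and the one-file-per-theme convention are all assumptions made for the example.

```python
import csv
from pathlib import Path


def normalize(text):
    """Collapse runs of whitespace (including tabs and newlines) into single spaces."""
    return " ".join(text.split())


def write_sentence_pairs(pairs, out_path):
    """Write (darija, english) pairs to a tab-separated file.

    Hypothetical helper: skips pairs with an empty side and exact duplicates.
    Returns the number of pairs actually written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["darija", "english"])  # assumed column names
        for darija, english in pairs:
            row = (normalize(darija), normalize(english))
            if not all(row) or row in seen:
                continue
            seen.add(row)
            writer.writerow(row)
            kept += 1
    return kept


def write_theme_corpus(texts_by_theme, out_dir):
    """Write one raw-text file per theme, e.g. sports.txt and education.txt.

    Hypothetical helper: each non-empty line of text becomes one line in the file.
    Returns a dict mapping each theme to the path that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for theme, lines in texts_by_theme.items():
        cleaned = [normalize(line) for line in lines]
        path = out_dir / f"{theme}.txt"
        path.write_text(
            "".join(line + "\n" for line in cleaned if line),
            encoding="utf-8",
        )
        paths[theme] = path
    return paths
```

Calling `write_sentence_pairs(pairs, "pairs.tsv")` with a list of scraped and aligned sentence tuples would produce a file ready for a translation training script, while `write_theme_corpus` keeps the raw corpus separated by theme.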

