preprocesing package¶
Subpackages¶
Submodules¶
preprocesing.config_file module¶
preprocesing.convert_images module¶
-
preprocesing.convert_images.convert_images_to_bw()¶ Convert the anime illustration images to black and white, processing them concurrently and in parallel
-
preprocesing.convert_images.convert_single_image(image_path)¶ Open an anime illustration image and convert it to black and white
preprocesing.extract_and_verify_fonts module¶
-
preprocesing.extract_and_verify_fonts.create_character_test_string(dataframe_file, render_text_test_file)¶ Create a string of the unique characters in the Japanese text corpus, used to test whether the fonts can render enough of the text
-
preprocesing.extract_and_verify_fonts.extract_fonts()¶ Get the font files that are in zip format and extract them
-
preprocesing.extract_and_verify_fonts.get_font_files(fonts_zip_output, fonts_raw_dir, font_file_dir)¶ Find the .otf and .ttf font files among the scraped font files
- Parameters
fonts_zip_output – Path for the zip files of font files
fonts_raw_dir – Directory where all the raw font files exist, whether zipped or not
font_file_dir (str) – Output directory for font files
-
preprocesing.extract_and_verify_fonts.has_glyph(font, glyph)¶ Check whether a font file has the specified character glyph
- Parameters
font (TTFont) – A TTFont object from fontTools
glyph (str) – A character glyph
- Returns
1 if the font has the glyph, otherwise 0
- Return type
int
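A minimal sketch of how such a check could work, not the package's actual implementation. It assumes font is a fontTools TTFont (or any object exposing the same getBestCmap() method), which returns a mapping from Unicode code points to glyph names. The font path in the usage comment is hypothetical.

```python
def has_glyph(font, glyph):
    """Return 1 if the font maps the character to a glyph, otherwise 0."""
    # getBestCmap() returns {code point: glyph name} for the preferred
    # Unicode cmap subtable, or None if the font has no Unicode subtable.
    cmap = font.getBestCmap() or {}
    return int(ord(glyph) in cmap)


# Assumed usage with fontTools (the path is hypothetical):
#   from fontTools.ttLib import TTFont
#   font = TTFont("fonts/example.otf")
#   has_glyph(font, "あ")
```

Returning an int rather than a bool makes it easy to sum the results over many characters when counting how much of a corpus a font covers.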
-
preprocesing.extract_and_verify_fonts.make_char_list(row)¶ Helper function to make a set of characters from a row in the dataframe of the text corpus
- Parameters
row – A row in the dataframe
- Returns
The unique characters in the row
- Return type
list
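As an illustration only, the helper and the way its output could be merged into a test string might look like the sketch below. The column name text, the sample rows, and the sorting are assumptions made for the example. The real dataframe layout is not documented here.

```python
def make_char_list(row, text_column="text"):
    """Return the unique characters in one row's text as a sorted list."""
    # `text_column` is a hypothetical column name; use whichever column
    # holds the Japanese text in the real dataframe.
    return sorted(set(row[text_column]))


# Merge the per-row character lists into one string of unique characters.
rows = [{"text": "こんにちは"}, {"text": "こんばんは"}]
unique_chars = set()
for row in rows:
    unique_chars.update(make_char_list(row))
test_string = "".join(sorted(unique_chars))
```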
-
preprocesing.extract_and_verify_fonts.move_files(paths)¶ Wrapper to move files used for parallel execution
- Parameters
paths (list) – A pair of paths: index 0 is the source, index 1 is the destination
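The single paths argument suits pool-style map() calls, which pass one item per task. A hedged sketch of that pattern follows. The thread pool and the move_all helper are assumptions for illustration, and the package may use a different executor.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def move_files(paths):
    """Move paths[0] to paths[1]; takes one argument so it works with map()."""
    source, destination = paths
    return shutil.move(source, destination)


def move_all(path_pairs, max_workers=4):
    """Hypothetical helper: move every (source, destination) pair in a pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(move_files, path_pairs))
```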
-
preprocesing.extract_and_verify_fonts.unzip_file(paths)¶ Unzip a file
- Parameters
paths – Paths to unzip the file from and to
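One way to implement this with the standard library is sketched below. It assumes paths is a pair whose first item is the zip archive and whose second item is the output directory. That ordering is inferred from the description, not confirmed by the source.

```python
import zipfile


def unzip_file(paths):
    """Extract the archive at paths[0] into the directory paths[1]."""
    zip_path, output_dir = paths
    with zipfile.ZipFile(zip_path) as archive:
        # extractall() creates output_dir if it does not exist yet.
        archive.extractall(output_dir)
```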
-
preprocesing.extract_and_verify_fonts.verify_font_files(dataframe_file, render_text_test_file, font_file_dir, font_dataset_path)¶ Test whether the scraped font files meet the benchmark of rendering at least x% (as specified in the config) of the unique characters in the text corpus
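The benchmark amounts to a coverage ratio compared against a threshold. The sketch below shows that arithmetic under stated assumptions: coverage and passes_benchmark are hypothetical names, the font is represented by the set of code points it supports, and the 0.9 default is a placeholder for whatever percentage the config actually specifies.

```python
def coverage(font_code_points, test_string):
    """Fraction of the unique characters in test_string the font supports."""
    chars = set(test_string)
    if not chars:
        return 0.0
    supported = sum(1 for ch in chars if ord(ch) in font_code_points)
    return supported / len(chars)


def passes_benchmark(font_code_points, test_string, threshold=0.9):
    """Return True if the font covers at least `threshold` of the characters."""
    # `threshold` stands in for the configured percentage; 0.9 is an
    # arbitrary placeholder, not the project's value.
    return coverage(font_code_points, test_string) >= threshold
```

With a fontTools font, the supported code points could be taken from the keys of getBestCmap(), assuming that method returns a mapping for the font in question.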
preprocesing.text_dataset_format_changer module¶
-
preprocesing.text_dataset_format_changer.convert_jesc_to_dataframe()¶ Convert the CSV file of the text corpus to a Dask DataFrame