preprocesing package
Subpackages
Submodules
preprocesing.config_file module
preprocesing.convert_images module
preprocesing.convert_images.convert_images_to_bw()
Convert the anime illustration images to black and white, concurrently and in parallel.
preprocesing.convert_images.convert_single_image(image_path)
Opens an anime illustration image and converts it to black and white.
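As a rough illustration of the two functions in this module, the sketch below converts one image to grayscale and maps that step over a list of paths with a thread pool. Several things here are assumptions rather than the module's actual code: the use of Pillow, the `"L"` (8-bit grayscale) mode as the meaning of "black and white", overwriting the file in place, and the hypothetical names `convert_single_image_sketch` and `convert_images_sketch`.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from PIL import Image  # Pillow; assumed here, the module may use another library


def convert_single_image_sketch(image_path):
    """Open one image and overwrite it with a grayscale version."""
    path = Path(image_path)
    with Image.open(path) as img:
        # "L" is 8-bit grayscale; this is an assumed reading of "black and white".
        gray = img.convert("L")
    gray.save(path)
    return path


def convert_images_sketch(paths, max_workers=4):
    """Convert many images concurrently using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_single_image_sketch, paths))
```

A process pool could be swapped in for CPU-bound workloads; the thread pool is used here only to keep the sketch short.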
preprocesing.extract_and_verify_fonts module
preprocesing.extract_and_verify_fonts.create_character_test_string(dataframe_file, render_text_test_file)
Create a string of the unique characters in the Japanese text corpus, used to test whether the fonts can render enough of the text.
preprocesing.extract_and_verify_fonts.extract_fonts()
Get the font files, which are in zip format, and extract them.
preprocesing.extract_and_verify_fonts.get_font_files(fonts_zip_output, fonts_raw_dir, font_file_dir)
Find the .otf and .ttf font files among the scraped font files.
- Parameters
fonts_zip_output – Path for the zip files of font files
fonts_raw_dir – Directory where all the raw font files exist, whether zipped or not
font_file_dir (str) – Output directory for font files
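The search for `.otf` and `.ttf` files can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function name `find_font_files`, the recursive search, and the case-insensitive suffix match are all assumptions.

```python
from pathlib import Path

FONT_SUFFIXES = {".otf", ".ttf"}  # the two extensions named in the description


def find_font_files(root_dir):
    """Return all .otf/.ttf files under root_dir, sorted for stable output."""
    root = Path(root_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    )
```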
preprocesing.extract_and_verify_fonts.has_glyph(font, glyph)
Check whether a font file has the specified character glyph.
- Parameters
font (TTFont) – A TTFont object from fontTools
glyph (str) – A character glyph
- Returns
1 if the font has the glyph, 0 otherwise
- Return type
int
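One common way to perform such a check with fontTools is to look for the character's code point in the font's `cmap` subtables. The sketch below follows that approach under the assumption that `font` is a `fontTools.ttLib.TTFont`; the name `has_glyph_sketch` is hypothetical, and the package's own `has_glyph` may work differently.

```python
def has_glyph_sketch(font, glyph):
    """Return 1 if any cmap subtable of `font` maps the character, else 0.

    `font` is expected to behave like a fontTools.ttLib.TTFont: indexing it
    with "cmap" yields a table whose `.tables` are subtables, each exposing
    a `.cmap` dict that maps code points to glyph names.
    """
    codepoint = ord(glyph)
    for table in font["cmap"].tables:
        if codepoint in table.cmap:
            return 1
    return 0
```

With fontTools installed, a call might look like `has_glyph_sketch(TTFont("some_font.ttf"), "あ")`, where the font path is a placeholder.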
preprocesing.extract_and_verify_fonts.make_char_list(row)
Helper function to make a set of characters from a row in the dataframe of the text corpus.
- Parameters
row – A row in the dataframe
- Returns
A set of characters
- Return type
list
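The idea of collecting unique characters per row and merging them into one test string can be sketched as follows. The sketch works on plain strings rather than dataframe rows, since the row layout is not documented here; the names `unique_chars` and `build_test_string`, the whitespace filtering, and the sorted output are assumptions.

```python
def unique_chars(text):
    """Return the set of distinct non-whitespace characters in one text line."""
    return {ch for ch in text if not ch.isspace()}


def build_test_string(lines):
    """Union the per-line character sets and join them into one sorted string."""
    chars = set()
    for line in lines:
        chars |= unique_chars(line)
    return "".join(sorted(chars))
```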
preprocesing.extract_and_verify_fonts.move_files(paths)
Wrapper to move files, used for parallel execution.
- Parameters
paths (list) – A pair of paths: index 0 is the source, index 1 is the destination
preprocesing.extract_and_verify_fonts.unzip_file(paths)
Unzip a file.
- Parameters
paths – Paths to unzip the file from and to
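Both `move_files` and `unzip_file` take a single `paths` argument holding a from/to pair, which suits pool-style `map` calls that pass one item per task. The sketch below shows that pattern with the standard library; the names `unzip_pair`, `move_pair`, and `run_on_pairs`, and the choice of a thread pool, are assumptions rather than the package's code.

```python
import shutil
import zipfile
from concurrent.futures import ThreadPoolExecutor


def unzip_pair(paths):
    """Extract the archive at paths[0] into the directory paths[1]."""
    src, dst = paths
    with zipfile.ZipFile(src) as archive:
        archive.extractall(dst)
    return dst


def move_pair(paths):
    """Move the file at paths[0] to paths[1]."""
    src, dst = paths
    return shutil.move(src, dst)


def run_on_pairs(func, pairs, max_workers=4):
    """Apply a one-argument wrapper to many (from, to) pairs concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, pairs))
```

Packing both paths into one argument keeps each wrapper compatible with `map`, which hands every worker exactly one item.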
preprocesing.extract_and_verify_fonts.verify_font_files(dataframe_file, render_text_test_file, font_file_dir, font_dataset_path)
Test whether the scraped font files meet the benchmark of rendering at least x% (as specified in the config) of the unique characters in the text corpus.
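The benchmark described above amounts to a coverage ratio compared against a threshold. A minimal sketch, assuming the set of code points a font supports has already been obtained: the names `coverage_ratio` and `passes_benchmark` and the default threshold of 0.9 are hypothetical, and the real threshold comes from the package's config.

```python
def coverage_ratio(supported_codepoints, test_string):
    """Fraction of the unique characters in test_string that the font covers."""
    needed = {ord(ch) for ch in test_string}
    if not needed:
        return 1.0  # nothing to render, so treat the font as fully covering
    return len(needed & set(supported_codepoints)) / len(needed)


def passes_benchmark(supported_codepoints, test_string, threshold=0.9):
    """True if the font covers at least `threshold` of the unique characters.

    The 0.9 default is a placeholder, not the value used by the package.
    """
    return coverage_ratio(supported_codepoints, test_string) >= threshold
```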
preprocesing.text_dataset_format_changer module
preprocesing.text_dataset_format_changer.convert_jesc_to_dataframe()
Convert the CSV file of the text to a Dask DataFrame.