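H5 Image Recognition

Recognition comparison

1. Baidu recognition

Baidu's image search turned out to have a fairly low recognition rate. Below are the test image and the results:

Test image: Image 1

Results:

Image 2

Image 3


2. Results with tesseract.js

Comparison image


H5 image recognition (using Tesseract.js)

Simple text is recognized reasonably well, but accuracy drops noticeably on anything slightly more complex. Still learning...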

Installation
<script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>

or

npm install tesseract.js --save

PS: If installing with npm fails, you can use cnpm instead.

Usage

Demo 1: using then

var Tesseract = require('tesseract.js')

// myImage: an image path/URL, an <img> element, or another ImageLike value
Tesseract.recognize(myImage).then(function(result){
    console.log(result)
})
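The `then` form in demo 1 can also be written with async/await. A minimal sketch, assuming the object returned by `Tesseract.recognize` is thenable (it has a `then` method, as demo 1 shows) and resolves to a result with a `text` field. `recognizeText` is a hypothetical helper name; the engine is passed in as a parameter so the function can be exercised with a stub.

```javascript
// Hypothetical helper: await a recognition job and return only the trimmed text.
// Assumption: engine.recognize(image, options) returns a thenable that resolves
// to an object with a `text` property, like Tesseract.recognize in demo 1.
async function recognizeText(engine, image, options) {
  const result = await engine.recognize(image, options || {});
  return (result && result.text ? result.text : "").trim();
}

// Usage in Node (tesseract.js installed via npm):
//   const Tesseract = require('tesseract.js');
//   recognizeText(Tesseract, myImage, { lang: 'eng' }).then(console.log);
```

Passing the engine explicitly keeps the helper independent of whether Tesseract comes from the `<script>` tag or from `require`.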

Demo 2: switching lang

Tesseract.recognize(myImage, {
    lang: 'spa',                  // recognize Spanish text
    tessedit_char_blacklist: 'e'  // never output the character "e"
}).then(function(result){
    console.log(result)
})
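Demo 2 passes the language code and a Tesseract parameter in the same options object. The sketch below builds that object and rejects codes outside an allow-list. `buildOptions` and the five-entry `SUPPORTED` list are illustrative assumptions (a subset of the language table further down), not part of tesseract.js.

```javascript
// Illustrative subset of the language codes listed in the language table.
const SUPPORTED = ["chi_sim", "chi_tra", "eng", "jpn", "spa"];

// Hypothetical helper: build the options object passed as the second argument
// of Tesseract.recognize. The blacklist is optional.
function buildOptions(lang, blacklist) {
  if (SUPPORTED.indexOf(lang) === -1) {
    throw new Error("Unsupported lang: " + lang);
  }
  const options = { lang: lang };
  if (blacklist) {
    // Characters Tesseract should never output, e.g. 'e' as in demo 2
    options.tessedit_char_blacklist = blacklist;
  }
  return options;
}

// Usage: Tesseract.recognize(myImage, buildOptions('spa', 'e')).then(...)
```

Failing early on an unknown code gives a clearer error than waiting for the library to try loading language data that does not exist.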

Demo 3: progress, catch, then, finally

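// src: the image to recognize (see the parameter description below)
Tesseract.recognize(src, {
        lang: "chi_sim"
    })
    .progress(function(message) {
        // called with progress updates while the job runs
        console.log(message)
    })
    .catch(function(err) {
        console.error(err)
    })
    .then(function(result) {
        // result.text holds the recognized text
        console.log(result.text)
    })
    .finally(function(resultOrError) {
        // called on success or failure
        console.log(resultOrError)
    })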
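The progress callback in demo 3 receives status messages. A small sketch that turns one into a readable line, assuming each message carries a `status` string and, while a step is running, a numeric `progress` between 0 and 1 (log a few messages from demo 3 to confirm the shape in your version). `formatProgress` is a hypothetical name.

```javascript
// Hypothetical helper: format one progress message for display.
// Assumption: message = { status: string, progress?: number in [0, 1] }.
function formatProgress(message) {
  if (!message) return "";
  if (typeof message.progress !== "number") return String(message.status || "");
  const percent = Math.round(message.progress * 100);
  return message.status + ": " + percent + "%";
}

// Usage inside demo 3:
//   .progress(function(message) { console.log(formatProgress(message)) })
```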
Parameters:

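1. image

image is any ImageLike object; which types count as ImageLike depends on whether the code runs in the browser or in Node.js.

In practice, the first argument can be an image path or URL, a base64-encoded image, an Image object, and so on.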
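When the image arrives as a bare base64 string (for example from an API response), it is usually safer to hand the library a complete data URL, which is the form browsers themselves use for inline base64 images. A minimal sketch; `toDataUrl` is a hypothetical helper and the `image/png` fallback is an assumption.

```javascript
// Hypothetical helper: turn a bare base64 payload into a data URL that can be
// passed as the first argument of Tesseract.recognize.
function toDataUrl(base64, mimeType) {
  if (/^data:/.test(base64)) {
    return base64; // already a data URL, leave it unchanged
  }
  const type = mimeType || "image/png"; // assumption: PNG when the type is unknown
  // Strip whitespace (e.g. MIME line breaks) so the URL stays on one line
  return "data:" + type + ";base64," + base64.replace(/\s+/g, "");
}

// Usage: Tesseract.recognize(toDataUrl(payload, "image/jpeg"), { lang: "eng" })
```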

Here is the full implementation:

<!DOCTYPE html>
<html>

    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no,user-scalable=no,minimum-scale=1,maximum-scale=1">
        <title>Image recognition</title>
        <style>body{margin:0 auto;width:500px;font-size:12px;font-family:arial,helvetica,sans-serif}fieldset{margin-bottom:10%;border:1px solid #ddd;border-radius:5px}img,select,button{cursor:pointer}img{background:#ddd}h2{font-weight:500;font-size:16px}fieldset legend{margin-left:33%}</style>
    </head>

    <body>

        <fieldset>
            <legend>
                <h2> Before recognition </h2>
            </legend>
            Source image: <img src="img/1.png" title="Image" />
            <p>
                Select language:
                <select id="langsel" onchange="recognizeFile()">
                    <option value='afr'> Afrikaans </option>
                    <option value='ara'> Arabic </option>
                    <option value='aze'> Azerbaijani </option>
                    <option value='bel'> Belarusian </option>
                    <option value='ben'> Bengali </option>
                    <option value='bul'> Bulgarian </option>
                    <option value='cat'> Catalan </option>
                    <option value='ces'> Czech </option>
                    <option value='chi_sim' selected> Chinese (Simplified) </option>
                    <option value='chi_tra'> Traditional Chinese </option>
                    <option value='chr'> Cherokee </option>
                    <option value='dan'> Danish </option>
                    <option value='deu'> German </option>
                    <option value='ell'> Greek </option>
                    <option value='eng'> English </option>
                    <option value='enm'> English (Old) </option>
                    <option value='meme'> Internet Meme </option>
                    <option value='epo'> Esperanto </option>
                    <option value='epo_alt'> Esperanto (alternative) </option>
                    <option value='equ'> Math </option>
                    <option value='est'> Estonian </option>
                    <option value='eus'> Basque </option>
                    <option value='fin'> Finnish </option>
                    <option value='fra'> French </option>
                    <option value='frk'> Frankish </option>
                    <option value='frm'> French (Old) </option>
                    <option value='glg'> Galician </option>
                    <option value='grc'> Ancient Greek </option>
                    <option value='heb'> Hebrew </option>
                    <option value='hin'> Hindi </option>
                    <option value='hrv'> Croatian </option>
                    <option value='hun'> Hungarian </option>
                    <option value='ind'> Indonesian </option>
                    <option value='isl'> Icelandic </option>
                    <option value='ita'> Italian </option>
                    <option value='ita_old'> Italian (Old) </option>
                    <option value='jpn'> Japanese </option>
                    <option value='kan'> Kannada </option>
                    <option value='kor'> Korean </option>
                    <option value='lav'> Latvian </option>
                    <option value='lit'> Lithuanian </option>
                    <option value='mal'> Malayalam </option>
                    <option value='mkd'> Macedonian </option>
                    <option value='mlt'> Maltese </option>
                    <option value='msa'> Malay </option>
                    <option value='nld'> Dutch </option>
                    <option value='nor'> Norwegian </option>
                    <option value='pol'> Polish </option>
                    <option value='por'> Portuguese </option>
                    <option value='ron'> Romanian </option>
                    <option value='rus'> Russian </option>
                    <option value='slk'> Slovakian </option>
                    <option value='slv'> Slovenian </option>
                    <option value='spa'> Spanish </option>
                    <option value='spa_old'> Old Spanish </option>
                    <option value='sqi'> Albanian </option>
                    <option value='srp'> Serbian (Latin) </option>
                    <option value='swa'> Swahili </option>
                    <option value='swe'> Swedish </option>
                    <option value='tam'> Tamil </option>
                    <option value='tel'> Telugu </option>
                    <option value='tgl'> Tagalog </option>
                    <option value='tha'> Thai </option>
                    <option value='tur'> Turkish </option>
                    <option value='ukr'> Ukrainian </option>
                    <option value='vie'> Vietnamese </option>
                </select>
            </p>
            <p align="center">
                <button onclick="btn()">Run</button>
            </p>
        </fieldset>
        <fieldset>
            <legend>
                <h2> Output </h2>
            </legend>
            <div id="result"></div>
        </fieldset>
        <script src='img/tesseract.js'></script>
        <script>
            var src = document.querySelector("img").src,
                selectOption = "",
                output = document.querySelector("#result");

            // Remember the language the user picked in #langsel
            function recognizeFile() {
                var select = document.querySelector("#langsel");
                selectOption = select.options[select.selectedIndex].value;
            }

            function btn() {
                // Fall back to Simplified Chinese if the selection was never changed
                Tesseract.recognize(src, {
                        lang: selectOption || "chi_sim"
                    })
                    .progress(function(message) {
                        console.log(message);
                    })
                    .catch(function(err) {
                        // Show the error instead of a result
                        output.textContent = err;
                        console.error(err);
                    })
                    .then(function(data) {
                        // data.text is the recognized text; the name `data` avoids
                        // shadowing the #result element held in `output`
                        output.textContent = data.text;
                        console.log(data.text);
                    })
                    .finally(function(resultOrError) {
                        console.log(resultOrError);
                    });
            }
        </script>
    </body>

</html>
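With `chi_sim`, the recognized text can come back with spaces between individual Chinese characters. If that happens with your images, a small post-processing step on the recognized text helps before displaying it. A sketch; `cleanChineseText` is a hypothetical helper, and the regular expression only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```javascript
// Hypothetical helper: remove spaces/tabs that sit between two Chinese characters,
// keeping spaces between Latin words and keeping line breaks intact.
// Only the basic CJK Unified Ideographs block (U+4E00..U+9FFF) is handled.
function cleanChineseText(text) {
  return String(text || "")
    .replace(/([\u4e00-\u9fff])[ \t]+(?=[\u4e00-\u9fff])/g, "$1")
    .trim();
}

// Usage inside then(), where r is the recognition result:
//   document.querySelector("#result").textContent = cleanChineseText(r.text);
```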


2. Supported languages:

lang | Language
'afr' | Afrikaans
'ara' | Arabic
'aze' | Azerbaijani
'bel' | Belarusian
'ben' | Bengali
'bul' | Bulgarian
'cat' | Catalan
'ces' | Czech
'chi_sim' | Chinese
'chi_tra' | Traditional Chinese
'chr' | Cherokee
'dan' | Danish
'deu' | German
'ell' | Greek
'eng' | English
'enm' | English (Old)
'epo' | Esperanto
'epo_alt' | Esperanto alternative
'equ' | Math
'est' | Estonian
'eus' | Basque
'fin' | Finnish
'fra' | French
'frk' | Frankish
'frm' | French (Old)
'glg' | Galician
'grc' | Ancient Greek
'heb' | Hebrew
'hin' | Hindi
'hrv' | Croatian
'hun' | Hungarian
'ind' | Indonesian
'isl' | Icelandic
'ita' | Italian
'ita_old' | Italian (Old)
'jpn' | Japanese
'kan' | Kannada
'kor' | Korean
'lav' | Latvian
'lit' | Lithuanian
'mal' | Malayalam
'mkd' | Macedonian
'mlt' | Maltese
'msa' | Malay
'nld' | Dutch
'nor' | Norwegian
'pol' | Polish
'por' | Portuguese
'ron' | Romanian
'rus' | Russian
'slk' | Slovakian
'slv' | Slovenian
'spa' | Spanish
'spa_old' | Old Spanish
'sqi' | Albanian
'srp' | Serbian (Latin)
'swa' | Swahili
'swe' | Swedish
'tam' | Tamil
'tel' | Telugu
'tgl' | Tagalog
'tha' | Thai
'tur' | Turkish
'ukr' | Ukrainian
'vie' | Vietnamese
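The `<select>` in the demo page hard-codes one `<option>` per row of this table, which makes labels easy to get wrong. A sketch that generates the options from a code-to-name map instead; `LANGUAGES` holds only a few rows for brevity and `buildLangOptions` is a hypothetical helper.

```javascript
// A few rows from the table above; extend with the remaining codes as needed.
const LANGUAGES = {
  chi_sim: "Chinese",
  chi_tra: "Traditional Chinese",
  eng: "English",
  jpn: "Japanese",
  spa: "Spanish"
};

// Hypothetical helper: render <option> elements, marking `selected` on one code.
function buildLangOptions(languages, selectedCode) {
  return Object.keys(languages)
    .map(function (code) {
      const selected = code === selectedCode ? " selected" : "";
      return "<option value='" + code + "'" + selected + ">" + languages[code] + "</option>";
    })
    .join("\n");
}

// Usage in the browser:
//   document.querySelector("#langsel").innerHTML = buildLangOptions(LANGUAGES, "chi_sim");
```

Keeping the codes and names in one map means the dropdown and any validation logic read from the same source.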


Supported Tesseract parameters:

Parameter | Default value | Description
ambigs_debug_level | 0 | Debug level for unichar ambiguities
applybox_debug | 1 | Debug level
applybox_exposure_pattern | .exp | Exposure value follows this pattern in the image filename. Image file names are expected to be in the form [lang].[fontname].exp[num].tif
applybox_learn_chars_and_char_frags_mode | 0 | Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.
applybox_learn_ngrams_mode | 0 | Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.
applybox_page | 0 | Page number to apply boxes from
assume_fixed_pitch_char_segment | 0 | Include fixed-pitch heuristics in char segmentation
bestrate_pruning_factor | 2 | Multiplying factor of current best rate to prune other hypotheses
bidi_debug | 0 | Debug level for BiDi
bland_unrej | 0 | unrej potential with no checks
certainty_scale | 20 | Certainty scaling factor
chop_center_knob | 0.15 | Split center adjustment
chop_centered_maxwidth | 90 | Width of (smaller) chopped blobs above which we don't care that a chop is not near the center.
chop_debug | 0 | Chop debug
chop_enable | 1 | Chop enable
chop_good_split | 50 | Good split limit
chop_inside_angle | -50 | Min Inside Angle Bend
chop_min_outline_area | 2000 | Min Outline Area
chop_min_outline_points | 6 | Min Number of Points on Outline
chop_new_seam_pile | 1 | Use new seam_pile
chop_ok_split | 100 | OK split limit
chop_overlap_knob | 0.9 | Split overlap adjustment
chop_same_distance | 2 | Same distance
chop_seam_pile_size | 150 | Max number of seams in seam_pile
chop_sharpness_knob | 0.06 | Split sharpness adjustment
chop_split_dist_knob | 0.5 | Split length adjustment
chop_split_length | 10000 | Split Length
chop_vertical_creep | 0 | Vertical creep
chop_width_change_knob | 5 | Width change adjustment
chop_x_y_weight | 3 | X / Y length weight
chs_leading_punct | ('`" | Leading punctuation
chs_trailing_punct1 | ).,;:?! | 1st Trailing punctuation
chs_trailing_punct2 | )'`" | 2nd Trailing punctuation
classify_adapt_feature_threshold | 230 | Threshold for good features during adaptive 0-255
classify_adapt_proto_threshold | 230 | Threshold for good protos during adaptive 0-255
classify_adapted_pruning_factor | 2.5 | Prune poor adapted results this much worse than best result
classify_adapted_pruning_threshold | -1 | Threshold at which classify_adapted_pruning_factor starts
classify_bln_numeric_mode | 0 | Assume the input is numbers [0-9].
classify_char_norm_range | 0.2 | Character Normalization Range …
classify_character_fragments_garbage_certainty_threshold | -3 | Exclude fragments that do not look like whole characters from training and adaption
classify_class_pruner_multiplier | 15 | Class Pruner Multiplier 0-255:
classify_class_pruner_threshold | 229 | Class Pruner Threshold 0-255
classify_cp_angle_pad_loose | 45 | Class Pruner Angle Pad Loose
classify_cp_angle_pad_medium | 20 | Class Pruner Angle Pad Medium
classify_cp_angle_pad_tight | 10 | Class Pruner Angle Pad Tight
classify_cp_cutoff_strength | 7 | Class Pruner CutoffStrength:
classify_cp_end_pad_loose | 0.5 | Class Pruner End Pad Loose
classify_cp_end_pad_medium | 0.5 | Class Pruner End Pad Medium
classify_cp_end_pad_tight | 0.5 | Class Pruner End Pad Tight
classify_cp_side_pad_loose | 2.5 | Class Pruner Side Pad Loose
classify_cp_side_pad_medium | 1.2 | Class Pruner Side Pad Medium
classify_cp_side_pad_tight | 0.6 | Class Pruner Side Pad Tight
classify_debug_character_fragments | 0 | Bring up graphical debugging windows for fragments training
classify_debug_level | 0 | Classify debug level
classify_enable_adaptive_debugger | 0 | Enable match debugger
classify_enable_adaptive_matcher | 1 | Enable adaptive classifier
classify_enable_learning | 1 | Enable adaptive classifier
classify_font_name | UnknownFont | Default font name to be used in training
classify_integer_matcher_multiplier | 10 | Integer Matcher Multiplier 0-255:
classify_learn_debug_str |  | Class str to debug learning
classify_learning_debug_level | 0 | Learning Debug Level:
classify_max_certainty_margin | 5.5 | Veto difference between classifier certainties
classify_max_norm_scale_x | 0.325 | Max char x-norm scale …
classify_max_norm_scale_y | 0.325 | Max char y-norm scale …
classify_max_rating_ratio | 1.5 | Veto ratio between classifier ratings
classify_max_slope | 2.41421 | Slope above which lines are called vertical
classify_min_norm_scale_x | 0 | Min char x-norm scale …
classify_min_norm_scale_y | 0 | Min char y-norm scale …
classify_min_slope | 0.414214 | Slope below which lines are called horizontal
classify_misfit_junk_penalty | 0 | Penalty to apply when a non-alnum is vertically out of its expected textline position
classify_nonlinear_norm | 0 | Non-linear stroke-density normalization
classify_norm_adj_curl | 2 | Norm adjust curl …
classify_norm_adj_midpoint | 32 | Norm adjust midpoint …
classify_norm_method | 1 | Normalization Method …
classify_num_cp_levels | 3 | Number of Class Pruner Levels
classify_pico_feature_length | 0.05 | Pico Feature Length
classify_pp_angle_pad | 45 | Proto Pruner Angle Pad
classify_pp_end_pad | 0.5 | Proto Prune End Pad
classify_pp_side_pad | 2.5 | Proto Pruner Side Pad
classify_save_adapted_templates | 0 | Save adapted templates to a file
classify_training_file | MicroFeatures | Training file
classify_use_pre_adapted_templates | 0 | Use pre-adapted classifier templates
conflict_set_I_l_1 | Il1[] | Il1 conflict set
crunch_accept_ok | 1 | Use acceptability in okstring
crunch_debug | 0 | As it says
crunch_del_cert | -10 | POTENTIAL crunch cert lt this
crunch_del_high_word | 1.5 | Del if word gt xht x this above bl
crunch_del_low_word | 0.5 | Del if word gt xht x this below bl
crunch_del_max_ht | 3 | Del if word ht gt xht x this
crunch_del_min_ht | 0.7 | Del if word ht lt xht x this
crunch_del_min_width | 3 | Del if word width lt xht x this
crunch_del_rating | 60 | POTENTIAL crunch rating lt this
crunch_early_convert_bad_unlv_chs | 0 | Take out ~^ early?
crunch_early_merge_tess_fails | 1 | Before word crunch?
crunch_include_numerals | 0 | Fiddle alpha figures
crunch_leave_accept_strings | 0 | Dont pot crunch sensible strings
crunch_leave_lc_strings | 4 | Dont crunch words with long lower case strings
crunch_leave_ok_strings | 1 | Dont touch sensible strings
crunch_leave_uc_strings | 4 | Dont crunch words with long lower case strings
crunch_long_repetitions | 3 | Crunch words with long repetitions
crunch_poor_garbage_cert | -9 | crunch garbage cert lt this
crunch_poor_garbage_rate | 60 | crunch garbage rating lt this
crunch_pot_garbage | 1 | POTENTIAL crunch garbage
crunch_pot_indicators | 1 | How many potential indicators needed
crunch_pot_poor_cert | -8 | POTENTIAL crunch cert lt this
crunch_pot_poor_rate | 40 | POTENTIAL crunch rating lt this
crunch_rating_max | 10 | For adj length in rating per ch
crunch_small_outlines_size | 0.6 | Small if lt xht x this
crunch_terrible_garbage | 1 | As it says
crunch_terrible_rating | 80 | crunch rating lt this
cube_debug_level | 0 | Print cube debug info.
dawg_debug_level | 0 | Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages
debug_acceptable_wds | 0 | Dump word pass/fail chk
debug_file |  | File to send tprintf output to
debug_fix_space_level | 0 | Contextual fixspace debug
debug_noise_removal | 0 | Debug reassignment of small outlines
debug_x_ht_level | 0 | Reestimate debug
devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process.
devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process.
disable_character_fragments | 1 | Do not include character fragments in the results of the classifier
doc_dict_certainty_threshold | -2.25 | Worst certainty for words that can be inserted into the document dictionary
doc_dict_pending_threshold | 0 | Worst certainty for using pending dictionary
docqual_excuse_outline_errs | 0 | Allow outline errs in unrejection?
edges_boxarea | 0.875 | Min area fraction of grandchild for box
edges_childarea | 0.5 | Min area fraction of child outline
edges_children_count_limit | 45 | Max holes allowed in blob
edges_children_fix | 0 | Remove boxy parents of char-like children
edges_children_per_grandchild | 10 | Importance ratio for chucking outlines
edges_debug | 0 | Turn on debugging for this module
edges_max_children_layers | 5 | Max layers of nested children inside a character outline
edges_max_children_per_outline | 10 | Max number of children inside a character outline
edges_min_nonhole | 12 | Min pixels for potential char in box
edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline
edges_use_new_outline_complexity | 0 | Use the new outline complexity module
editor_dbwin_height | 24 | Editor debug window height
editor_dbwin_name | EditorDBWin | Editor debug window name
editor_dbwin_width | 80 | Editor debug window width
editor_dbwin_xpos | 50 | Editor debug window X Pos
editor_dbwin_ypos | 500 | Editor debug window Y Pos
editor_debug_config_file |  | Config file to apply to single words
editor_image_blob_bb_color | 4 | Blob bounding box colour
editor_image_menuheight | 50 | Add to image height for menu bar
editor_image_text_color | 2 | Correct text colour
editor_image_win_name | EditorImage | Editor image window name
editor_image_word_bb_color | 7 | Word bounding box colour
editor_image_xpos | 590 | Editor image X Pos
editor_image_ypos | 10 | Editor image Y Pos
editor_word_height | 240 | Word window height
editor_word_name | BlnWords | BL normalized word window
editor_word_width | 655 | Word window width
editor_word_xpos | 60 | Word window X Pos
editor_word_ypos | 510 | Word window Y Pos
enable_new_segsearch | 0 | Enable new segmentation search path.
enable_noise_removal | 1 | Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise
equationdetect_save_bi_image | 0 | Save input bi image
equationdetect_save_merged_image | 0 | Save the merged image
equationdetect_save_seed_image | 0 | Save the seed image
equationdetect_save_spt_image | 0 | Save special character image
file_type | .tif | Filename extension
fixsp_done_mode | 1 | What constitutes done for spacing
fixsp_non_noise_limit | 1 | How many non-noise blbs either side?
fixsp_small_outlines_size | 0.28 | Small if lt xht x this
force_word_assoc | 0 | Force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.
fragments_debug | 0 | Debug character fragments
fragments_guide_chopper | 0 | Use information from fragments to guide chopping process
fx_debugfile | FXDebug | Name of debugfile
gapmap_big_gaps | 1.75 | xht multiplier
gapmap_debug | 0 | Say which blocks have tables
gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2 quanta wide
gapmap_use_ends | 0 | Use large space at start and end of rows
heuristic_max_char_wh_ratio | 2 | Max char width-to-height ratio allowed in segmentation
heuristic_segcost_rating_base | 1.25 | Base factor for adding segmentation cost into word rating. It's a multiplying factor, the larger the value above 1, the bigger the effect of segmentation cost.
heuristic_weight_rating | 1 | Weight associated with char rating in combined cost of state
heuristic_weight_seamcut | 0 | Weight associated with seam cut in combined cost of state
heuristic_weight_width | 1000 | Weight associated with width evidence in combined cost of state
hocr_font_info | 0 | Add font info to hocr output
hyphen_debug_level | 0 | Debug level for hyphenated words.
il1_adaption_test | 0 | Dont adapt to i/I at beginning of word
include_page_breaks | 0 | Include page separator string in output text after each image/page.
interactive_display_mode | 0 | Run interactively?
language_model_debug_level | 0 | Language model debug level
language_model_fixed_length_choices_depth | 3 | Depth of blob choice lists to explore when fixed length dawgs are on
language_model_min_compound_length | 3 | Minimum length of compound words
language_model_ngram_nonmatch_score | -40 | Average classifier score of a non-matching unichar.
language_model_ngram_on | 0 | Turn on/off the use of character ngram model
language_model_ngram_order | 8 | Maximum order of the character ngram model
language_model_ngram_rating_factor | 16 | Factor to bring log-probs into the same range as ratings when multiplied by outline length
language_model_ngram_scale_factor | 0.03 | Strength of the character ngram model relative to the character classifier
language_model_ngram_small_prob | 1e-06 | To avoid overly small denominators use this as the floor of the probability returned by the ngram model.
language_model_ngram_space_delimited_language | 1 | Words are delimited by space
language_model_ngram_use_only_first_uft8_step | 0 | Use only the first UTF8 step of the given string when computing log probabilities.
language_model_penalty_case | 0.1 | Penalty for inconsistent case
language_model_penalty_chartype | 0.3 | Penalty for inconsistent character type
language_model_penalty_font | 0 | Penalty for inconsistent font
language_model_penalty_increment | 0.01 | Penalty increment
language_model_penalty_non_dict_word | 0.15 | Penalty for non-dictionary words
language_model_penalty_non_freq_dict_word | 0.1 | Penalty for words not in the frequent word dictionary
language_model_penalty_punc | 0.2 | Penalty for inconsistent punctuation
language_model_penalty_script | 0.5 | Penalty for inconsistent script
language_model_penalty_spacing | 0.05 | Penalty for inconsistent spacing
language_model_use_sigmoidal_certainty | 0 | Use sigmoidal score for certainty
language_model_viterbi_list_max_num_prunable | 10 | Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs
language_model_viterbi_list_max_size | 500 | Maximum size of viterbi lists recorded in BLOB_CHOICEs
load_bigram_dawg | 1 | Load dawg with special word bigrams.
load_fixed_length_dawgs | 1 | Load fixed length dawgs (e.g. for non-space delimited languages)
load_freq_dawg | 1 | Load frequent word dawg.
load_number_dawg | 1 | Load dawg with number patterns.
load_punc_dawg | 1 | Load dawg with punctuation patterns.
load_system_dawg | 1 | Load system word dawg.
load_unambig_dawg | 1 | Load unambiguous word dawg.
m_data_sub_dir | tessdata/ | Directory for data files
matcher_avg_noise_size | 12 | Avg. noise blob length
matcher_bad_match_pad | 0.15 | Bad Match Pad (0-1)
matcher_clustering_max_angle_delta | 0.015 | Maximum angle delta for prototype clustering
matcher_debug_flags | 0 | Matcher Debug Flags
matcher_debug_level | 0 | Matcher Debug Level
matcher_debug_separate_windows | 0 | Use two different windows for debugging the matching: One for the protos and one for the features.
matcher_good_threshold | 0.125 | Good Match (0-1)
matcher_great_threshold | 0 | Great Match (0-1)
matcher_min_examples_for_prototyping | 3 | Reliable Config Threshold
matcher_perfect_threshold | 0.02 | Perfect Match (0-1)
matcher_permanent_classes_min | 1 | Min # of permanent classes
matcher_rating_margin | 0.1 | New template margin (0-1)
matcher_sufficient_examples_for_prototyping | 5 | Enable adaption even if the ambiguities have not been seen
max_permuter_attempts | 10000 | Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.
max_viterbi_list_size | 10 | Maximum size of viterbi list.
merge_fragments_in_matrix | 1 | Merge the fragments in the ratings matrix and delete them after merging
min_orientation_margin | 7 | Min acceptable orientation margin
min_sane_x_ht_pixels | 8 | Reject any x-ht lt or eq than this
ngram_permuter_activated | 0 | Activate character-level n-gram-based permuter
noise_cert_basechar | -8 | Hingepoint for base char certainty
noise_cert_disjoint | -1 | Hingepoint for disjoint certainty
noise_cert_factor | 0.375 | Scaling on certainty diff from Hingepoint
noise_cert_punc | -3 | Threshold for new punc char certainty
noise_maxperblob | 8 | Max diacritics to apply to a blob
noise_maxperword | 16 | Max diacritics to apply to a word
numeric_punctuation | ., | Punct. chs expected WITHIN numbers
ocr_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing ocr.
ok_repeated_ch_non_alphanum_wds | -?*= | Allow NN to unrej
oldbl_corrfix | 1 | Improve correlation of heights
oldbl_dot_error_size | 1.26 | Max aspect ratio of a dot
oldbl_holed_losscount | 10 | Max lost before fallback line used
oldbl_xhfix | 0 | Fix bug in modes threshold for xheights
oldbl_xhfract | 0.4 | Fraction of est allowed in calc
outlines_2 | ij!?%":; | Non standard number of outlines
outlines_odd | % | 
output_ambig_words_file |  | Output file for ambiguities found in the dictionary
page_separator |  | 
pageseg_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.
paragraph_debug_level | 0 | Print paragraph debug info.
paragraph_text_based | 1 | Run paragraph detection on the post-text-recognition (more accurate)
permute_chartype_word | 0 | Turn on character type (property) consistency permuter
permute_debug | 0 | Debug char permutation process
permute_fixed_length_dawg | 0 | Turn on fixed-length phrasebook search permuter
permute_only_top | 0 | Run only the top choice permuter
permute_script_word | 0 | Turn on word script consistency permuter
pitsync_fake_depth | 1 | Max advance fake generation
pitsync_joined_edge | 0.75 | Dist inside big blob for chopping
pitsync_linear_version | 6 | Use new fast algorithm
pitsync_offset_freecut_fraction | 0.25 | Fraction of cut for free cuts
poly_allow_detailed_fx | 0 | Allow feature extractors to see the original outline
poly_debug | 0 | Debug old poly
poly_wide_objects_better | 1 | More accurate approx on wide things
preserve_interword_spaces | 0 | Preserve multiple interword spaces
prioritize_division | 0 | Prioritize blob division over chopping
quality_blob_pc | 0 | good_quality_doc gte good blobs limit
quality_char_pc | 0.95 | good_quality_doc gte good char limit
quality_min_initial_alphas_reqd | 2 | alphas in a good word
quality_outline_pc | 1 | good_quality_doc lte outline error limit
quality_rej_pc | 0.08 | good_quality_doc lte rejection limit
quality_rowrej_pc | 1.1 | good_quality_doc gte good char limit
rating_scale | 1.5 | Rating scaling factor
rej_1Il_trust_permuter_type | 1 | Dont double check
rej_1Il_use_dict_word | 0 | Use dictword test
rej_alphas_in_number_perm | 0 | Extend permuter check
rej_trust_doc_dawg | 0 | Use DOC dawg in 11l conf. detector
rej_use_good_perm | 1 | Individual rejection control
rej_use_sensible_wd | 0 | Extend permuter check
rej_use_tess_accepted | 1 | Individual rejection control
rej_use_tess_blanks | 1 | Individual rejection control
rej_whole_of_mostly_reject_word_fract | 0.85 | if >this fract
repair_unchopped_blobs | 1 | Fix blobs that aren't chopped
save_alt_choices | 1 | Save alternative paths found during chopping and segmentation search
save_doc_words | 0 | Save Document Words
save_raw_choices | 1 | Deprecated- backward compatibility only
segment_adjust_debug | 0 | Segmentation adjustment debug
segment_debug | 0 | Debug the whole segmentation process
segment_nonalphabetic_script | 0 | Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch
segment_penalty_dict_case_bad | 1.3125 | Default score multiplier for word matches, which may have case issues (lower is better).
segment_penalty_dict_case_ok | 1.1 | Score multiplier for word matches that have good case (lower is better).
segment_penalty_dict_frequent_word | 1 | Score multiplier for word matches which have good case and are frequent in the given language (lower is better).
segment_penalty_dict_nonword | 1.25 | Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).
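Parameters from this table go into the same options object as `lang`, the way demo 2 passes `tessedit_char_blacklist`. A sketch for an image that contains only digits, combining `classify_bln_numeric_mode` from the table with `tessedit_char_whitelist` (a Tesseract parameter that does not appear in the portion of the table above). Whether these settings improve results for a given image is an assumption to verify, and `digitsOnlyOptions` is a hypothetical helper.

```javascript
// Hypothetical helper: options for recognizing an image that contains only digits.
// Extra parameters can be merged in (or override these) through `overrides`.
function digitsOnlyOptions(lang, overrides) {
  const base = {
    lang: lang || "eng",
    classify_bln_numeric_mode: 1,         // table: "Assume the input is numbers [0-9]."
    tessedit_char_whitelist: "0123456789" // only these characters may be output
  };
  // Later sources win, so overrides replace the defaults above
  return Object.assign({}, base, overrides || {});
}

// Usage: Tesseract.recognize(myImage, digitsOnlyOptions()).then(function (r) { console.log(r.text) })
```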

GitHub repository
