
Lists the distances from a given kanji to its nearest neighbors in terms of a reference distance (currently only the stroke edit distance) and compares them with the corresponding values in terms of another distance (currently only the component transport distance, a.k.a. kanji distance).

Usage

compare_neighborhoods(
  kan,
  refdist = "strokedit",
  refnn = 10,
  compdist = "kanjidist",
  compnn = 0,
  ...
)

Arguments

kan

a kanji (currently only as a single UTF-8 character).

refdist

the name of the reference distance (currently only "strokedit").

refnn

the number of nearest neighbors in terms of the reference distance.

compdist

a character vector. The name(s) of one or several other distances to compare with (currently only "kanjidist").

compnn

the number of nearest neighbors in terms of the other distance(s). If this is positive, the suggested package kanjistat.data is assumed to be available.

...

further parameters that are passed to kanjidist().

Value

A matrix of distances with refnn + compnn columns named by the nearest neighbors of kan (first in terms of the reference distance, then the other distances) and 1 + length(compdist) rows named by the type of distance.
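The layout of the return value can be pictured with a small base R sketch. It does not call compare_neighborhoods(); it only builds an empty matrix of the described shape. The column names ref_nn1, comp_nn1, etc. are placeholders (the real function names the columns by the neighboring kanji), and the row names "strokedit" and "kanjidist" are an assumption about how the distance types are labeled.

```r
# Illustration of the documented shape only; all values are NA placeholders.
refnn <- 3
compnn <- 2
compdist <- "kanjidist"

# Placeholder column names (hypothetical); the real columns are named by kanji.
neighbors <- c(paste0("ref_nn", seq_len(refnn)),
               paste0("comp_nn", seq_len(compnn)))

# One row for the reference distance plus one row per comparison distance.
# The row labels are assumed here, not taken from the package.
res <- matrix(NA_real_,
              nrow = 1 + length(compdist),
              ncol = refnn + compnn,
              dimnames = list(c("strokedit", compdist), neighbors))

dim(res)  # 1 + length(compdist) rows, refnn + compnn columns
```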

Warning

[Experimental]
This is only a first draft of the function; its interface and details may change considerably in the future. As there is currently no precomputed kanjidist matrix, computation time differs hugely between compnn = 0 and any compnn > 0. With compnn = 0, only the kanji distances to the refnn nearest neighbors in terms of refdist have to be computed. With compnn > 0, the kanji distances to all 2135 other Jouyou kanji have to be computed in order to determine the compnn nearest neighbors; depending on the system and parameter settings, this can take roughly anywhere between 2 minutes and an hour.
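As a rough back-of-the-envelope sketch of this cost difference, the number of kanji distance evaluations implied by the text can be counted in base R. The helper n_kanjidist_evals() is hypothetical and not part of kanjistat; it only encodes the two cases stated in the warning and says nothing about the actual time per evaluation.

```r
# Hypothetical helper: counts kanjidist evaluations as described in the warning.
# n_other = 2135 is the number of other Jouyou kanji stated there.
n_kanjidist_evals <- function(refnn, compnn, n_other = 2135) {
  stopifnot(refnn >= 0, compnn >= 0)
  # Any positive compnn requires distances to all other Jouyou kanji;
  # compnn = 0 requires only the refnn reference neighbors.
  if (compnn > 0) n_other else refnn
}

n_kanjidist_evals(refnn = 10, compnn = 0)  # 10
n_kanjidist_evals(refnn = 10, compnn = 5)  # 2135
```

With the default refnn = 10, any positive compnn therefore means about 200 times as many kanji distance computations as compnn = 0.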

Examples

# compare_neighborhoods("晴", refnn=5, compo_seg_depth=4, approx="pcweighted",
#                       compnn=0, minor_warnings=FALSE)