A keyboard layout optimizer for layouts of the

Dario Götz

Last update: Jan 4, 2023

Overview

Keyboard Layout Optimizer

Neo variant layout optimizer written in Rust. The optimizer is based on the "evolve-keyboard-layout" scripts by ArneBab. It supports layouts of the "Neo" family, i.e. permutations of the base layer, where layers 2, 5, and 6 follow the permutation and layers 3 and 4 remain unchanged.

At the heart of the optimization lies a layout evaluation that involves multiple criteria on the frequencies of unigrams, bigrams, and trigrams.

The optimization is implemented using the genevo crate.

Features

evaluation of keyboard layouts of the "Neo" family
evaluation based on prepared unigrams, bigrams, and trigrams or a text
fast evaluation (~100ms per layout for standard corpus)
layout optimization using a genetic algorithm
accounting for higher layer characters (e.g. uppercase letters) by expanding ngrams with modifier keys

Metrics

badly positioned shortcut keys - How many shortcut keys are not easily reachable with the left hand?
asymmetric keys - Which keys are similar (in some sense), but lie in non-consistent locations (e.g. "aou" - "äüö")?
key costs - How do the letter frequencies relate to the "cost" associated to the keys?
hand disbalance - Are left and right hands similarly loaded?
finger balance - Is each finger suitably loaded? Pinkies less than pointers?
finger repeats - How often are fingers in action consecutively?
finger repeats top and bottom - How often does the same finger need to move from top to bottom row (or vice versa) consecutively?
movement pattern - How often are (near-)neighboring fingers used one after the other?
no handswitch after unbalancing key - How often does no handswitch occur after a hand needed to move away from the home row?
unbalancing after neighboring - How often do unbalancing keys occur consecutively?
line changes - How far (vertically) are consecutive keystrokes of the same hand apart?
asymmetric bigrams - How often are consecutive keystrokes of different hands not symmetrical?
manual bigram penalty - How often do some key-combinations occur that are hard to type but do not fall into the other metrics cases?
no handswitch in trigram - How often does no handswitch happen within a trigram (and have a direction change in between)?
irregularity - How often are the first and the second bigram in a trigram "bad" (with respect to all bigram metrics)?
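
As an illustration of how one of these metrics can be computed from weighted unigrams, here is a minimal sketch of a hand-disbalance measure. The `Hand` enum, the function name, and the formula (distance of the left-hand share from 50 %) are assumptions made for this example, not the crate's actual implementation.

```rust
/// Which hand types a key (hypothetical type for this sketch).
#[derive(Clone, Copy, PartialEq)]
pub enum Hand {
    Left,
    Right,
}

/// Returns |share of left-hand load - 0.5| over weighted unigrams.
/// 0.0 means both hands carry the same load; 0.5 means one hand does everything.
pub fn hand_disbalance(unigrams: &[(Hand, f64)]) -> f64 {
    let total: f64 = unigrams.iter().map(|(_, w)| w).sum();
    if total == 0.0 {
        return 0.0; // no data, nothing to penalize
    }
    let left: f64 = unigrams
        .iter()
        .filter(|(h, _)| *h == Hand::Left)
        .map(|(_, w)| w)
        .sum();
    (left / total - 0.5).abs()
}
```

With the hand loads from the sample evaluation output quoted further down this page (54.87 and 45.13), this formula gives 0.0487, which agrees with the value printed there. That agreement alone does not confirm how the crate computes the metric.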

Installation

Clone the repository

git clone https://github.com/dariogoetz/keyboard_layout_optimizer.git --recurse-submodules

Build the binaries (add CC=gcc in the beginning if cc is not installed, but gcc is)
```
cargo build --release
```
The binaries are then located under target/release.
Generate documentation with
```
cargo doc
```

Usage

Specifying Layouts

Some binaries expect layouts as commandline arguments. These layouts are represented as strings specifying the keys of the layout from left to right, top to bottom, i.e. it starts on the top left of the keyboard and lists each letter of the base layer going to the right in the same row. After that the letters of the next row follow, again from left to right.

Whitespace is allowed and will be ignored.

Only those keys shall be specified that are not marked as "fixed" in the layout configuration file "standard_keyboard.yml" (usually 32 keys).
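
The rules above (whitespace is ignored, only the non-fixed keys are listed) can be sketched as a small normalization step. The function name and the error type are assumptions for illustration; the real parsing lives in the keyboard_layout crate and may differ.

```rust
/// Strips all whitespace from a layout string and checks the number of keys.
/// `expected_keys` would be 32 for the usual configuration described above.
pub fn normalize_layout(layout: &str, expected_keys: usize) -> Result<String, String> {
    let keys: String = layout.chars().filter(|c| !c.is_whitespace()).collect();
    // Count characters, not bytes: "ß", "ü", "ä", "ö" are multi-byte in UTF-8.
    let n = keys.chars().count();
    if n == expected_keys {
        Ok(keys)
    } else {
        Err(format!("expected {} keys, found {}", expected_keys, n))
    }
}
```

Calling `normalize_layout("jduax phlmwqß ctieo bnrsg fvüäö yz,.k", 32)` yields the compact form of the Bone layout used in the examples below.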

Layout Plot Binary

The plot binary expects a layout representation as commandline argument.

Example (Bone layout):

RUST_LOG=INFO ./target/release/plot "jduax phlmwqß ctieo bnrsg fvüäö yz,.k"

As an optional parameter --layout-config, a different layout configuration file can be specified.

Layout Evaluation Binary

The evaluate binary expects a layout representation as commandline argument.

Example (Bone layout):

RUST_LOG=INFO ./target/release/evaluate "jduax phlmwqß ctieo bnrsg fvüäö yz,.k"

There are various optional parameters that can be explored using the -h option, e.g. provide a text or file to be used as corpus.

Configuration

Many aspects of the evaluation can be configured in the yaml files standard_keyboard.yml and evaluation_parameters.yml.

standard_keyboard.yml This file contains "physical" properties of the keyboard and information about the Neo layout that serves as an underlying base for the variants to evaluate. It covers for the keyboard:

key positions
key to hand mapping
key to finger mapping
key costs (used for evaluation)
keys that are "unbalancing" the hand's position when hit
symmetries
plot templates

And for the Neo base layout:

the symbols that can be generated in each layer over each key
keys that cannot be permuted
modifiers to be used to access each layer
cost associated to accessing each layer

evaluation_parameters.yml This file contains configuration parameters for all available evaluation metrics, filenames of prepared ngram data to use, and parameters specifying the behavior of post-processing the ngram data for a given layout.

Layout Optimization Binary

The optimize binary can run without any commandline parameter. In that case, it starts with a collection of random layouts and optimizes from there. With commandline options, a "starting layout" can be specified or a list of keys that shall not be permuted (if no starting layout is given, fixed keys relate to the Neo2 layout). Optional commandline parameters can be explored with the -h option.

Example (starting from Bone layout, fixing "," and "."):

RUST_LOG=INFO ./target/release/optimize -s "jduax phlmwqß ctieo bnrsg fvüäö yz,.k" -f ",."

Example for a never ending search for good layouts (appends solutions to a file found_solutions.txt and publishes them to https://keyboard-layout-optimizer.herokuapp.com):

RUST_LOG=INFO ./target/release/optimize -f ",." --run-forever --append-solution-to "found_solutions.txt" --publish-as "<your name>"

Configuration

The parameters of the optimization process can be configured in the file optimization_parameters.yml. This includes sizes of the population, number of generations to evaluate, mutation and insertion rates, and the selection ratio.

Structure

The project includes several binaries within the evolve_keyboard_layout crate:

plot - Plots the six layers of a specified layout
evaluate - Evaluates a specified layout and prints a summary of the various metrics to stdout
optimize - Starts an optimization heuristic to find a good layout
evaluate-random - Evaluates a series of randomly generated layouts (mostly used for benchmarking)

The binaries rely on three library crates providing relevant data structures and algorithms:

keyboard_layout - Provides a representation of keys, keyboards, and layouts and a layout generator that generates layout objects from given strings.
layout_evaluation - Provides functionalities for reading, generating, and processing ngram data and datastructures and traits for evaluating several metrics.
layout_optimization - Provides a connection to the genevo optimization algorithms by implementing a specialized genetic algorithm based on the evaluator in layout_evaluation.

Comments

crashes

Hi

Don't know Rust.

./target/release/evaluate -l config/keyboard/standard_qwerty_us.yml "[]qwertyuiop-=asdfghjkl;'zxcvbnm,./"
thread 'main' panicked at 'called Option::unwrap() on a None value', layout_evaluation/src/metrics/layout_metrics/similar_letters.rs:56:68
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

I have edited standard_qwerty_us.yml and set some true values to false ... I've swapped the [ ] keys to top row and brought - = down in their place.

opened by iandoug 43
Add Simulated Annealing to website via WASM
A great asset could be a website that employs some optimization-algorithm.

This pull request will focus on Simulated Annealing (SA) because, using https://github.com/argmin-rs/argmin, it seems to allow compilation to WASM.

To-Do's

[x] Add Rust function that can be called from JS

[x] Implement SA into JS

[x] Find a way to make SA and the Genetic Algorithm coexist. Currently, the code seems to be heavily geared towards the Genetic Algorithm, which makes it slightly difficult to implement SA

[ ] ~Share one cache between all optimization-implementations (so far it'd be SA and a Genetic Algorithm) to speed up optimizing when repeating runs with only little modifications or switching between algorithms.~

[x] Automatically evaluate&display the new, optimized layout's stats

[x] UI-improvements and bug-fixes

[x] Give feedback when saving optimization parameters

[x] Properly highlight the config-textboxes' parameters

[x] Disable Save-buttons as long as nothing in the config-file has changed

[x] Autofocus textfields
opened by Glitchy-Tozier 22
No option to change the letters that should be used for the layout in the webapp

I stumbled over this app while researching how to generate a custom keyboard layout for the way I write. This seems very promising as I can provide my own corpus and I really like the ease of using the webapp for this purpose.

However it seems to me there is no option to generate a layout that doesn't contain Umlauts and ß - which are keys I don't really want to include in my layout.

Is there an option I'm missing to adjust which keys the app will use for the optimization? Or is it possible to add such an option? I think this would benefit people like me who write in different languages from German a lot.
question

opened by Gotos 18

Speed up `mapped ngrams()`

Closes #29

Turns out the ideas proposed in #29 don't make that big of a difference. A short optimization run went from ~24.5s to ~23.0s, which is roughly a 6% speed increase. This is the command I used for testing: (with 201 iterations)

time RUST_LOG=INFO ./target/release/optimize_sa --no-cache-results -s jduaxphlmwqßctieobnrsgfvüäöyz,.k

In addition to the ideas proposed in #29, I also manually implemented PartialEq for LayerKey, as deriving it means that all fields get compared. This used to result in redundant comparisons (assuming that the LayerKey's index is unique to a particular LayerKey).
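
A manual implementation along those lines could look like the sketch below. The struct is a simplified stand-in: the `index` field and its type are assumptions for this example, and the real LayerKey has more fields.

```rust
/// Simplified stand-in for the crate's `LayerKey` (field set is hypothetical).
#[derive(Debug, Clone)]
pub struct LayerKey {
    pub index: u16, // assumed to identify a LayerKey uniquely
    pub layer: u8,
    pub symbol: char,
    pub modifiers: Vec<u16>,
}

// Compare only the index instead of every field, as `#[derive(PartialEq)]` would.
impl PartialEq for LayerKey {
    fn eq(&self, other: &Self) -> bool {
        self.index == other.index
    }
}

impl Eq for LayerKey {}
```

This is only valid as long as the uniqueness assumption holds: two keys with the same index but different contents would compare as equal.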

As a safety-check, I compared the result of an evaluation before and after this PR:

Old: Cost: 259.1387 (optimization score: 385893)
New: Cost: 259.1387 (optimization score: 385893)

Before (no changes)

cargo build --release && RUST_LOG=INFO ./target/release/evaluate jduaxphlmwqßctieobnrsgfvüäöyz,.k
    Finished release [optimized] target(s) in 0.12s
[2022-04-02T10:44:42Z INFO  evolve_keyboard_layout::common] Reading unigram file: '"corpus/deu_mixed_wiki_web_0.6_eng_news_typical_wiki_web_0.4/1-grams.txt"'
[2022-04-02T10:44:42Z INFO  evolve_keyboard_layout::common] Reading bigram file: '"corpus/deu_mixed_wiki_web_0.6_eng_news_typical_wiki_web_0.4/2-grams.txt"'
[2022-04-02T10:44:42Z INFO  evolve_keyboard_layout::common] Reading trigram file: '"corpus/deu_mixed_wiki_web_0.6_eng_news_typical_wiki_web_0.4/3-grams.txt"'
Layout (layer 1):
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬──────┐
│ ^ │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │ 0 │ - │ ` │ ←    │
├───┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬────┤
│   ⇥ │ j │ d │ u │ a │ x │ p │ h │ l │ m │ w │ q │ ß │ Ret│
├─────┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┐   │
│    ⇩ │ c │ t │ i │ e │ o │ b │ n │ r │ s │ g │ ⇘ │ ´ │ ⏎ │
├────┬─┴─┬─┴─┬─┴─┬─┴─┯─┴─┬─┴─┬─┴─┯─┴─┬─┴─┬─┴─┬─┴─┬─┴───┴───┤
│  ⇧ │ ⇚ │ f │ v │ ü │ ä │ ö │ y │ z │ , │ . │ k │    ⇗    │
├────┼───┴┬──┴─┬─┴───┴───┴───┴───┴───┴─┬─┴──┬┴───┼────┬────┤
│  ♕ │    │ ♔  │           ␣           │  ⇙ │    │    │  ♛ │
└────┴────┴────┴───────────────────────┴────┴────┴────┴────┘

Layout string (layer 1):
jduaxphlmwqßctieobnrsgfvüäöyz,.k

Layout compact (layer 1):
jduax phlmwqß
ctieo bnrsg⇘
fvüäö yz,.k

Layout metrics:
     1.0000 (weighted:    0.3500) Badly positioned shortcut keys      | Bad shortcuts: x
     2.5000 (weighted:    7.5000) Similar Letters                     | Poorly placed pairs: mn
     0.0000 (weighted:    0.0000) Similar Letter-Groups               | 

Unigram metrics:
  Not found: 0.0218% of 116607824.9096
     0.2534 (weighted:   17.4860) Finger Balance                      | Finger loads % (no thumb): 9.6 11.4 10.3 23.5 - 16.3 10.2 10.1 8.5
     0.0487 (weighted:    1.9464) Hand Disbalance                     | Hand loads % (no thumb): 54.87 - 45.13
     0.0000 (weighted:    0.0000) Row Loads                           | Row 1: 28.2%; Row 2: 63.1%; Row 3: 8.7%
     6.9453 (weighted:   52.4369) Key Costs                           | Worst unigrams: a ( 8.72%), ⇧ ( 6.94%), h ( 6.27%)

Bigram metrics:
  Not found: 0.0329% of 143233718.2575
     0.0145 (weighted:   11.3357) Finger Repeats                      | Worst: ea (10.50%), s. ( 8.20%), ⇩⇧ ( 5.76%);  Worst non-fixed: ea (10.50%), s. ( 8.20%), rl ( 4.31%)
     0.0120 (weighted:    9.3452) Finger Repeats Lateral              | Worst: ex (12.74%), ph ( 7.44%), ⇧c ( 7.29%);  Worst non-fixed: ex (12.74%), ph ( 7.44%), nb ( 5.51%)
     0.0014 (weighted:    2.6108) Repeats Top to Bottom               | Worst: m. (20.75%), 0s (18.28%), hy ( 7.11%);  Worst non-fixed: m. (20.75%), hy ( 7.11%), zh ( 5.98%)
     2.7128 (weighted:   14.9206) Line Changes                        | Worst: .0 (18.75%), `⇗ ( 7.47%), .\n ( 7.27%);  Worst non-fixed: yp ( 4.17%), pr ( 2.80%), py ( 1.70%)
     0.0011 (weighted:    2.3937) Manual Bigram Penalty               | Worst: ff (29.57%), ft (22.19%), vi (15.60%);  Worst non-fixed: ff (29.57%), ft (22.19%), vi (15.60%)
     0.3015 (weighted:   30.1462) Movement Pattern                    | Worst: di ( 8.85%), .\n ( 7.66%), ut ( 4.89%);  Worst non-fixed: di ( 8.85%), ut ( 4.89%), ls ( 3.35%)
    -0.0570 (weighted:  -11.4098) Movement Pattern (same row)         | Worst: ie (-19.22%), ei (-14.43%), te (-12.64%);  Worst non-fixed: ie (-19.22%), ei (-14.43%), te (-12.64%)
     0.4047 (weighted:    7.2839) No Handswitch After Unbalancing Key | Worst: ⇧o (18.58%), ⇧a ( 9.78%), p⇗ ( 9.72%);  Worst non-fixed: pr ( 1.65%), ou ( 1.00%), jo ( 0.84%)
     0.0410 (weighted:    8.2068) Unbalancing After Neighboring       | Worst: .\n (11.72%), pr ( 8.12%), ou ( 4.93%);  Worst non-fixed: pr ( 8.12%), ou ( 4.93%), io ( 4.80%)
    -0.0426 (weighted:   -0.0426) Symmetric Handswitches              | Worst: en (-37.40%), st (-15.87%), ne (-12.05%);  Worst non-fixed: en (-37.40%), st (-15.87%), ne (-12.05%)

Trigram metrics:
  Not found: 0.0414% of 162767371.5744
     0.8658 (weighted:    7.1427) Irregularity                        | Worst: .0\n (10.13%), s.\n ( 9.67%), ⇚⇩⇚ ( 8.51%);  Worst non-fixed: out ( 3.46%), exa ( 3.29%), ivi ( 2.83%)
     0.0275 (weighted:   17.8954) No handswitch in trigram            | Worst: ati ( 3.27%), ite ( 2.64%), ate ( 2.61%);  Worst non-fixed: ati ( 3.27%), ite ( 2.64%), ate ( 2.61%)
   140.8356 (weighted:   70.4178) Secondary Bigrams                   | Worst: ⇧sc ( 1.82%), ⇗uß ( 1.42%), ⇗⇩0 ( 1.28%);  Worst non-fixed: ben ( 0.80%), one ( 0.59%), hey ( 0.59%)
     0.0009 (weighted:    9.1731) Trigram Finger Repeats              | Worst: ⇚⇩⇚ (15.24%), exa ( 7.53%), ⇧⇩⇧ ( 5.21%);  Worst non-fixed: exa ( 7.53%), phy ( 4.49%), exe ( 4.04%)

Cost: 259.1387 (optimization score: 385893)

After PR

cargo build --release && RUST_LOG=INFO ./target/release/evaluate jduaxphlmwqßctieobnrsgfvüäöyz,.k
    Finished release [optimized] target(s) in 0.66s
[2022-04-02T10:44:33Z INFO  evolve_keyboard_layout::common] Reading unigram file: '"corpus/deu_mixed_wiki_web_0.6_eng_news_typical_wiki_web_0.4/1-grams.txt"'
[2022-04-02T10:44:33Z INFO  evolve_keyboard_layout::common] Reading bigram file: '"corpus/deu_mixed_wiki_web_0.6_eng_news_typical_wiki_web_0.4/2-grams.txt"'
[2022-04-02T10:44:33Z INFO  evolve_keyboard_layout::common] Reading trigram file: '"corpus/deu_mixed_wiki_web_0.6_eng_news_typical_wiki_web_0.4/3-grams.txt"'
Layout (layer 1):
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬──────┐
│ ^ │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │ 0 │ - │ ` │ ←    │
├───┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬─┴─┬────┤
│   ⇥ │ j │ d │ u │ a │ x │ p │ h │ l │ m │ w │ q │ ß │ Ret│
├─────┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┬──┴┐   │
│    ⇩ │ c │ t │ i │ e │ o │ b │ n │ r │ s │ g │ ⇘ │ ´ │ ⏎ │
├────┬─┴─┬─┴─┬─┴─┬─┴─┯─┴─┬─┴─┬─┴─┯─┴─┬─┴─┬─┴─┬─┴─┬─┴───┴───┤
│  ⇧ │ ⇚ │ f │ v │ ü │ ä │ ö │ y │ z │ , │ . │ k │    ⇗    │
├────┼───┴┬──┴─┬─┴───┴───┴───┴───┴───┴─┬─┴──┬┴───┼────┬────┤
│  ♕ │    │ ♔  │           ␣           │  ⇙ │    │    │  ♛ │
└────┴────┴────┴───────────────────────┴────┴────┴────┴────┘

Layout string (layer 1):
jduaxphlmwqßctieobnrsgfvüäöyz,.k

Layout compact (layer 1):
jduax phlmwqß
ctieo bnrsg⇘
fvüäö yz,.k

Layout metrics:
     1.0000 (weighted:    0.3500) Badly positioned shortcut keys      | Bad shortcuts: x
     2.5000 (weighted:    7.5000) Similar Letters                     | Poorly placed pairs: mn
     0.0000 (weighted:    0.0000) Similar Letter-Groups               | 

Unigram metrics:
  Not found: 0.0218% of 116607824.9096
     0.2534 (weighted:   17.4860) Finger Balance                      | Finger loads % (no thumb): 9.6 11.4 10.3 23.5 - 16.3 10.2 10.1 8.5
     0.0487 (weighted:    1.9464) Hand Disbalance                     | Hand loads % (no thumb): 54.87 - 45.13
     0.0000 (weighted:    0.0000) Row Loads                           | Row 1: 28.2%; Row 2: 63.1%; Row 3: 8.7%
     6.9453 (weighted:   52.4369) Key Costs                           | Worst unigrams: a ( 8.72%), ⇧ ( 6.94%), h ( 6.27%)

Bigram metrics:
  Not found: 0.0329% of 143233718.2575
     0.0145 (weighted:   11.3357) Finger Repeats                      | Worst: ea (10.50%), s. ( 8.20%), ⇩⇧ ( 5.76%);  Worst non-fixed: ea (10.50%), s. ( 8.20%), rl ( 4.31%)
     0.0120 (weighted:    9.3452) Finger Repeats Lateral              | Worst: ex (12.74%), ph ( 7.44%), ⇧c ( 7.29%);  Worst non-fixed: ex (12.74%), ph ( 7.44%), nb ( 5.51%)
     0.0014 (weighted:    2.6108) Repeats Top to Bottom               | Worst: m. (20.75%), 0s (18.28%), hy ( 7.11%);  Worst non-fixed: m. (20.75%), hy ( 7.11%), zh ( 5.98%)
     2.7128 (weighted:   14.9206) Line Changes                        | Worst: .0 (18.75%), `⇗ ( 7.47%), .\n ( 7.27%);  Worst non-fixed: yp ( 4.17%), pr ( 2.80%), py ( 1.70%)
     0.0011 (weighted:    2.3937) Manual Bigram Penalty               | Worst: ff (29.57%), ft (22.19%), vi (15.60%);  Worst non-fixed: ff (29.57%), ft (22.19%), vi (15.60%)
     0.3015 (weighted:   30.1462) Movement Pattern                    | Worst: di ( 8.85%), .\n ( 7.66%), ut ( 4.89%);  Worst non-fixed: di ( 8.85%), ut ( 4.89%), ls ( 3.35%)
    -0.0570 (weighted:  -11.4098) Movement Pattern (same row)         | Worst: ie (-19.22%), ei (-14.43%), te (-12.64%);  Worst non-fixed: ie (-19.22%), ei (-14.43%), te (-12.64%)
     0.4047 (weighted:    7.2839) No Handswitch After Unbalancing Key | Worst: ⇧o (18.58%), ⇧a ( 9.78%), p⇗ ( 9.72%);  Worst non-fixed: pr ( 1.65%), ou ( 1.00%), jo ( 0.84%)
     0.0410 (weighted:    8.2068) Unbalancing After Neighboring       | Worst: .\n (11.72%), pr ( 8.12%), ou ( 4.93%);  Worst non-fixed: pr ( 8.12%), ou ( 4.93%), io ( 4.80%)
    -0.0426 (weighted:   -0.0426) Symmetric Handswitches              | Worst: en (-37.40%), st (-15.87%), ne (-12.05%);  Worst non-fixed: en (-37.40%), st (-15.87%), ne (-12.05%)

Trigram metrics:
  Not found: 0.0414% of 162767371.5744
     0.8658 (weighted:    7.1427) Irregularity                        | Worst: .0\n (10.13%), s.\n ( 9.67%), ⇚⇩⇚ ( 8.51%);  Worst non-fixed: out ( 3.46%), exa ( 3.29%), ivi ( 2.83%)
     0.0275 (weighted:   17.8954) No handswitch in trigram            | Worst: ati ( 3.27%), ite ( 2.64%), ate ( 2.61%);  Worst non-fixed: ati ( 3.27%), ite ( 2.64%), ate ( 2.61%)
   140.8356 (weighted:   70.4178) Secondary Bigrams                   | Worst: ⇧sc ( 1.82%), ⇗uß ( 1.42%), ⇗⇩0 ( 1.28%);  Worst non-fixed: ben ( 0.80%), one ( 0.59%), hey ( 0.59%)
     0.0009 (weighted:    9.1731) Trigram Finger Repeats              | Worst: ⇚⇩⇚ (15.24%), exa ( 7.53%), ⇧⇩⇧ ( 5.21%);  Worst non-fixed: exa ( 7.53%), phy ( 4.49%), exe ( 4.04%)

Cost: 259.1387 (optimization score: 385893)

enhancement

opened by Glitchy-Tozier 16

Improve optimization for "diacritics-modifiers"
At the moment, keys like `, ^, ˇ, ', ¸, ~, etc. are only treated as their individual characters. This drastically reduces this optimizer's potential to be used in other languages.

Example: Spanish uses

´ + a

´ + e

´ + i

´ + o

´ + u

These acute accents are not optimized for. There are similar, but worse problems in languages such as French, Portuguese, Swedish, Polish, and so on.

My suggestion: Properly optimize for all possible combinations that key might produce. Example with the ^-key:

^ + space = ^ (Caret)

^ + ^ = ̂ (Circumflex)

^ + allowed letters, such as e = ê

^ + disallowed letters, such as r = nothing
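
One way to picture the suggestion is to expand accented characters in the corpus into the keystrokes that produce them before evaluating n-grams. The sketch below uses a tiny hand-written table; the function names and the table itself are assumptions for illustration, and the special cases listed above (dead key plus space, doubled dead key, disallowed letters) are not covered.

```rust
/// Splits a precomposed character into (dead key, base letter) keystrokes.
/// Characters without an entry are returned unchanged.
pub fn expand_dead_key(c: char) -> Vec<char> {
    match c {
        'á' => vec!['´', 'a'],
        'é' => vec!['´', 'e'],
        'í' => vec!['´', 'i'],
        'ó' => vec!['´', 'o'],
        'ú' => vec!['´', 'u'],
        'ê' => vec!['^', 'e'],
        other => vec![other],
    }
}

/// Expands every character of an n-gram into its keystroke sequence.
pub fn expand_ngram(ngram: &str) -> Vec<char> {
    ngram.chars().flat_map(expand_dead_key).collect()
}
```

Under this scheme the trigram "qué" would be evaluated as the four keystrokes q, u, ´, e, so the position of the ´ key starts to matter for Spanish text.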

enhancement
opened by Glitchy-Tozier 11
Modify ortho_bored config to more closely match Moonlander.
Please check whether my modifications actually make sense. There is a good deal of guesswork involved in my changes.

Intended changes:

Introduced another thumb key. Updated tables and graphical representation accordingly.

Reassigned mods, space, and BACKSPACE to thumb keys. No need to define them for both hands.

Introduced another mod key: ⑤

In an effort to reduce the high ratio of not found trigrams (see below), I added grave and acute accents to layer 3. But this did not help.

Observations:

The keyboard produces a very low cost for some layouts, e.g. 136,84 for zluaqßbdmwjhrieocntsgfyxüäövp,.k, which is surprisingly low.

There is a suspicious looking output in the evaluation that might hint at some sort of issue. I do not understand it though: "Trigram metrics: Not found: 6.8399% of 486487579.7860". This is high compared to standard keyboard layout.
opened by natsume42 11
Add Simulated Annealing Algorithm

See #3 for discussion.

This pull request is an attempt to add the simulated annealing algorithm to this project. According to my current understanding, it may be the most productive algorithm for this kind of problem.

opened by Glitchy-Tozier 9
Upload scripts used to create corpus

@dariogoetz, could you please add the scripts you used for creating corpus-files? It would be neat if users had the ability to easily create new corpora themselves.
enhancement

opened by Glitchy-Tozier 6
Improve `asymmetric_keys`-metric
I started thinking about this when I saw you removed ["gbd", "kpt"] from the metric. On one hand I think those letters are related and thus should be grouped with each other. On the other hand I understand that it is unnecessary to have b→p be in the same relative positions as g→k. With those non-diacritics-letters I'd actually expect them to be in "related positions". Thus I propose improving the evaluation of similar letters:

1. Related position

This metric checks whether the specified letters are in sensible related positions. Something like this:

0% cost if they are next to each other and in the same row or the same column (not diagonal, though)

((50% cost if they are in the same column (the same finger), but there's a row separating the two keys))

((50% cost if they are in the same row and have symmetric or similar positions (by similar positions, I mean something like k&g in KOY)))

100% cost if none of the criteria apply

similar_letters:
  - ["aä"]
  - ["oö"]
  - ["uü"]
  - ["gk"]
  - ["pb"]
  - ["dt"]
  - ["mn"]
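
The proposed cost tiers can be sketched as a function of two matrix positions. The `Pos` type, the function name, and the use of the column index as a proxy for "same finger" are assumptions for this example. The third rule (same row with symmetric or similar positions) is left out because it needs keyboard-specific symmetry data.

```rust
/// Matrix position of a key as (column, row); hypothetical representation.
pub type Pos = (i32, i32);

/// Cost of a pair of similar letters according to the proposed tiers.
pub fn related_position_cost(a: Pos, b: Pos) -> f64 {
    let dx = (a.0 - b.0).abs();
    let dy = (a.1 - b.1).abs();
    match (dx, dy) {
        (1, 0) | (0, 1) => 0.0, // direct neighbors in the same row or column
        (0, 2) => 0.5,          // same column with one row in between
        _ => 1.0,               // everything else, including diagonals
    }
}
```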

2. Position-groups

This is the same as what the current metric does, according to my understanding. Symbols of specified groups shall have the same location relative to each other.

similar_letter_groups:
  - ["auo", "äüö"]

This change allows us to

Make sure similar letters are in related positions to each other. This also is an important improvement to aä oö uü and also allows us to improve placements of gk pb dt mn without blocking too many of the other metrics.

Still make sure aou äöü are grouped in clusters.

enhancement
opened by Glitchy-Tozier 6
matrix_positions in manual_bigram_penalty

When evaluating the "manual_bigram_penalty" score, the key positions seem to be interpreted in the inverse order, [to_position, from_position], rather than the intended order [from_position, to_position] that the comments describe.

I think the culprit is this line in manual_bigram_penalty.rs: .map(|(((x1, y1), (x2, y2)), w)| (((*x2, *y2), (*x1, *y1)), *w)),

Should it be this instead? .map(|(((x1, y1), (x2, y2)), w)| (((*x1, *y1), (*x2, *y2)), *w)),
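
To make the difference concrete, here is a small stand-alone sketch of a lookup that keeps the configured [from_position, to_position] order. The `Pos` alias and the function name are assumptions for illustration and do not mirror the actual code in the metric.

```rust
use std::collections::HashMap;

/// Matrix position as (x, y); hypothetical representation.
type Pos = (u8, u8);

/// Builds a penalty lookup keyed by (from_position, to_position),
/// preserving the order in which the pair appears in the configuration.
pub fn build_penalties(config: &[((Pos, Pos), f64)]) -> HashMap<(Pos, Pos), f64> {
    config
        .iter()
        .map(|((from, to), w)| ((*from, *to), *w))
        .collect()
}
```

With this ordering, a penalty configured for moving from (1, 2) to (3, 4) is found under that key only, and not under the reversed pair.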

Please let me know if I'm wrong, Thanks and have a nice day

opened by mbooga 5
Re-test all layouts in https://keyboard-layout-optimizer.herokuapp.com/ every week

Seeing as how our criteria change every now and then, it would be a good idea to make the website re-test all uploaded Layouts. Otherwise, that website would default to being a collection of found layouts with outdated scores. Which, unfortunately, wouldn't be very helpful.

(As an example, the layout ,so.xvdcljqßzaeiubtnrgyäöhüpfmwk achieves a different score on both pages)

opened by Glitchy-Tozier 5
Option to permute symbols in all layers

My understanding is that currently there is only an option to permute symbols in the base layer, and specifying which layers move with the base layer.

I would like to be able to permute symbols in all layers, and specifying which layers change with the base layer (for capital letters in particular).

Does this option exist?

opened by nimr0d 1
Support combos

Combos are a feature in QMK/ZMK where two or more keys pressed in a small window of time results in a specific output. For example, looking at the "mine" layout on the webapp, a combo may be hitting both G+Y to output ß. The keyboard yml suggests that the total keycost will be 5+14 < 25 and there seems to be room for significant reductions in costs. Another example may be changing an u to ü when combo'd with another key. In my experience, combos never misfire and are easy to press without disrupting the flow (esp. middle and ring finger combos since those fingers like to move together).

I'm not well versed in rust at all, hence I can't comment much about the implementation, but combos seem straight forward as a key with 2+ fingers in 2+ positions. Single finger bigram penalties will then be extended to avoiding bigrams that share the same fingers. I suppose just having all two finger combos as extra key positions will allow the simulated annealing algo to choose which ones they like.

This seems to be the best layout optimizer project that I have found and is actively developed. So it will be great if combos can be incorporated into the layouts!

opened by rayduck 2
Idea for reducing redundant calculations
I'll start by oversimplifying and then move onto potential complications.

The Concept

Let's see what evaluation results depend on. We're comparing these two layouts:

Layout 1: abcde fghij ... ...

Layout 2: zyxwv utsrq ... ...

Let's compare the evaluation-result for two bigrams with letters on the same positions (same Keys and same layers):

Layout 1: "ab"

Layout 2: "zy"

We get our evaluation result by calling individual_cost() and supplying parameters to it. The main ones are:

Weight

LayerKeys

The LayerKey are what actually is used in most of the calculation.

    pub struct LayerKey {
        /// Layer of the layout which the symbol belongs to
        pub layer: u8,
        /// Key to press for the symbol
        pub key: Key,
        /// Symbol belonging to a layout
        pub symbol: char,
        /// Vec of modifiers required to activate the layer (in terms of a [`LayerKeyIndex`] for a layout)
        pub modifiers: Vec<LayerKeyIndex>,
        /// If the key shall not be permutated for optimization
        pub is_fixed: bool,
        /// If the symbol itself is a modifier
        pub is_modifier: bool,
    }

How do the LayerKeys of "ab" and "zy" differ? layer is equal, key is equal, modifiers is equal, is_fixed is equal, is_modifier is equal. Only symbol is different, I'll address that later. The LayerKeys are derived from a certain index of a layout. Thus, in the evaluation of "ab" and "zy", only (or mainly) two things change:

The bigram's weight

The indices of the bigram's letters on our current layout.

To jump straight to the point: There is only a certain number of possible index-arrangements. Thus, there's only a certain number of scores that need to be multiplied with weight before returning them from individual_cost(). Due to the limited variability, we could actually

Pre-calculate the scores for every metric for every possible index-arrangement.

Place those scores into a Vec.

Instead of costly evaluation (individual_cost(LayerKey1, LayerKey2, weight)), we could reduce this to an index-access. (precalculated_scores[idx1][idx2] * weight)
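
The three steps above can be sketched for bigrams as a table that is filled once and then only read. `BigramTable` and its methods are hypothetical names for this example. The scoring closure stands in for whatever a metric computes from two positions.

```rust
/// Pairwise scores for every (index, index) combination, computed once.
pub struct BigramTable {
    n: usize,
    scores: Vec<f64>, // flattened n x n matrix, row-major
}

impl BigramTable {
    /// Pre-calculates `score(i, j)` for all index pairs.
    pub fn new(n: usize, score: impl Fn(usize, usize) -> f64) -> Self {
        let mut scores = Vec::with_capacity(n * n);
        for i in 0..n {
            for j in 0..n {
                scores.push(score(i, j));
            }
        }
        Self { n, scores }
    }

    /// Replaces a full metric evaluation with one lookup and one multiplication.
    pub fn cost(&self, idx1: usize, idx2: usize, weight: f64) -> f64 {
        self.scores[idx1 * self.n + idx2] * weight
    }
}
```

A flat `Vec` is used instead of nested vectors to keep the lookup to a single memory access; `precalculated_scores[idx1][idx2]` would work equally well for a sketch.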

This would remove a lot of code:

Most of the evaluation taking place for each layout

get_filtered_layerkeys()

The creation of LayerKeys for each layout

Possibly some additional stuff I'm not aware of

RAM

Shouldn't be a problem. The number of possible index-combinations we need to store scores for is roughly the following. We assume that there's 60 different keys/positions/indices (due to the fact that we split all higher-layer-ngrams into L1-ngrams):

Unigrams: 60^1 different sequences ~5 metrics – 5 vecs with len 60

Bigrams: 60^2 different sequences, ~10 metrics – 10 vecs with len 3600

Trigrams: 60^3 different sequences, 5 metrics – 5 vecs with len 216000 (comparable to the number of Trigrams we process)
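
The estimate can be checked with a few lines of arithmetic. The sketch assumes one f64 per stored score, which is an assumption of this example, and uses the metric counts given above.

```rust
/// Number of precomputed scores for `keys` positions and the given metric counts.
pub fn table_entries(keys: u64, unigram_metrics: u64, bigram_metrics: u64, trigram_metrics: u64) -> u64 {
    unigram_metrics * keys + bigram_metrics * keys.pow(2) + trigram_metrics * keys.pow(3)
}

/// Memory needed if every score is stored as an f64.
pub fn table_bytes(entries: u64) -> u64 {
    entries * std::mem::size_of::<f64>() as u64
}
```

For 60 positions with 5, 10, and 5 metrics this gives 1,116,300 entries, or roughly 8.9 MB, dominated by the trigram tables.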

Caveats

Sometimes, the layerkey.symbol information is used as well. This seems to happen in two scenarios:

To create informative messages ("Worst Bigrams: …")

To filter out unwanted n-grams (the pause-indicator-stuff)

I'm not sure whether this is doable (the rare need for symbols makes some things slightly more complicated) and worthwhile (there would be speed improvements, but we would still need to use split_trigram_modifiers(), which probably makes up at least half of all computational work).
enhancement question
opened by Glitchy-Tozier 2
Improvements to corpus
A random collection of ideas that would improve usage or optimization.

[ ] Add a README to /corpus, which explains where the sentences for those word-lists came from and what was done to them.

[ ] Add a README to every folder, explaining what files were used in what proportions (for example, with which percentages does "deu_mixed_wiki_web…" incorporate "wiki" and "web").

[ ] Include different countries for existing languages: Some collections (for example "web-public") separate the results into different countries.

English: I suggest taking files from the US, Great Britain, and Australia and weighting them 1:1:1.

German: As there's way less Austrians than Germans, we could tilt the weighting in favor of German corpora when adding "Austrian German". While I'd prefer going 1:1, I know many would not agree with this. Currently, the ratio of the two populations is roughly 9:1, so this might be an acceptable starting-point.

[ ] In http://www.adnw.de/index.php?n=Main.Bewertungsverfahren, multiple sources of n-gram frequencies are mentioned. It would be interesting to incorporate some of them, to further solidify the validity of our corpus.

enhancement
opened by Glitchy-Tozier 1
Filtering out >99.99% of layouts before even testing them
First the basic idea, then the implementation. Of course, as all other evaluation-modes, this should be optional and configurable.

With a problem space of ~10^35 layouts, we can't solely rely on the fact that we use one of the fastest programming languages there is (and a great project-structure, I've been told). Optimization algorithms are great and help a lot, but even they are bound by the time it takes to test layouts. Therefore the idea:

Idea

Add a mechanism that checks a layout for minimum requirements. Do this using only unigrams. Only use criteria that can be true or false (instead of getting a cost). This check should be very fast compared to the bigram & trigram metrics currently in use.

Only if the found layout meets those basic requirements should it be accepted for regular evaluation.

Implementation

It might be tempting to use this method inside the regular evaluation. We might think to stop evaluation midway if we see that costs become too high. I think this is not optimal, mainly because it might mess up algorithms. Stopping evaluation wouldn't absolve us of returning a total_cost(), which we can't do precisely after stopping early. This will impact any algorithm that requires precise comparisons of layout-costs (SA, possibly ABC and Genevo).

Instead, I propose a function such as meets_min_requirements, which might be used like this: evaluator.meets_min_requirements(&layout). This function would be called inside layout_generator.generate_random(), common::perform_n_swaps() and common::switch_n_keys().

The goal is to make those functions check if their generated layout meets our minimum requirements, and if it doesn't, try another swap/switch/random layout.

Basic example:

    /// Takes in a Layout, switches [nr_switches] keys in that layout, then returns it.
    /// Layout, in this case, is a [Vec<usize>].
    pub fn perform_n_swaps(&self, permutation: &[usize], nr_switches: usize, evaluator: Evaluator) -> Vec<usize> {
        get indices-vec
        loop {
            1. swap indices from original indices-vec
            2. let meets_min_requirements = evaluator.meets_min_requirements(&swapped_indices)
            3. if meets_min_requirements { break }
        }
        indices
    }
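
A compilable variant of this pseudocode might look as follows. It is a sketch under several assumptions: the evaluator is replaced by a plain closure, a tiny xorshift generator stands in for a real random number source so that no external crate is needed, and a `max_attempts` limit (not part of the proposal) guards against looping forever when no candidate passes.

```rust
/// Minimal xorshift generator for this sketch; the seed must be non-zero.
pub struct XorShift(pub u64);

impl XorShift {
    /// Returns a pseudo-random value in `0..n` (n must be > 0).
    pub fn below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

/// Swaps `nr_switches` random pairs and retries until the candidate passes
/// the check. Falls back to the unchanged permutation after `max_attempts`.
pub fn perform_n_swaps(
    permutation: &[usize],
    nr_switches: usize,
    max_attempts: usize,
    rng: &mut XorShift,
    meets_min_requirements: impl Fn(&[usize]) -> bool,
) -> Vec<usize> {
    if permutation.len() < 2 {
        return permutation.to_vec(); // nothing to swap
    }
    for _ in 0..max_attempts {
        // Always start from the original layout, as in the pseudocode.
        let mut candidate = permutation.to_vec();
        for _ in 0..nr_switches {
            let i = rng.below(candidate.len());
            let j = rng.below(candidate.len());
            candidate.swap(i, j);
        }
        if meets_min_requirements(&candidate) {
            return candidate;
        }
    }
    permutation.to_vec()
}
```

Returning the original permutation on failure is one possible design choice; an alternative would be to return the last candidate regardless, which keeps the search moving at the price of occasionally evaluating a rejected layout.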

Minimum requirements

Of course, those requirements should be carefully and conservatively considered. The ones I propose as a starting-point are the following: ("top keys" refers to the most frequent letters. "top positions" are read from key_costs:)

key_costs:
  - [80, 70, 60, 50, 50, 50, 60, 60, 50, 50, 50, 60, 70, 80]
  - [24, 16, 10, 5, 12, 17, 20, 13, 5, 9, 11, 20, 36]
  - [ 9, 5, 3, 3, 3, 6, 6, 3, 3, 3, 5, 9, 30, 6]
  - [20, 16, 19, 24, 20, 9, 30, 10, 8, 22, 22, 17, 19]
  - [ 0, 0, 0, 3, 7, 0, 0, 0]

Key Costs

top 2 keys are on the top 16 positions

top 6 keys are on the top 20 positions

bottom 6 keys are NOT on the top 6 positions

This would only leave approximately (16/32)^2 * (20/32)^4 * (16/32)^6 = 0,0006 = 0.06% of layouts
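
The three key-cost criteria could be checked with a boolean function like the one below. The input representation is an assumption of this sketch: `rank_of_position[k]` holds the cost rank (0 = cheapest position) of the position occupied by the k-th most frequent key.

```rust
/// Returns true if the layout satisfies the three key-cost criteria above.
/// `rank_of_position[k]` is the cost rank of the position holding the
/// k-th most frequent key (hypothetical input format).
pub fn meets_key_cost_requirements(rank_of_position: &[usize]) -> bool {
    let n = rank_of_position.len();
    // The `count` most frequent keys must sit on the `limit` cheapest positions.
    let top = |count: usize, limit: usize| {
        rank_of_position.iter().take(count).all(|&r| r < limit)
    };
    // The 6 least frequent keys must not occupy the 6 cheapest positions.
    let bottom_ok = rank_of_position
        .iter()
        .skip(n.saturating_sub(6))
        .all(|&r| r >= 6);
    top(2, 16) && top(6, 20) && bottom_ok
}
```

Because the check only walks a handful of entries, it costs next to nothing compared with a full bigram and trigram evaluation.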

Hand disbalance, Hand switching

top 4 keys are not all on the same hand

This would probably (I'm not 100% sure about the math) leave approximately (0,5^4) * 2 = 0,125 = 12% of layouts

Finger loads

Each finger's load is less than twice and more than half the optimal load

This is a good one, I'm not sure how to calculate its advantage though.

Bad positioned shortcut keys

(This is a controversial one!! Remember, it should be configurable!)

All the desired shortcut-keys should be placed on the [x] leftmost columns of the keyboard

This would only leave approximately 0,5^4 = 0,0625 = 6% of layouts.

All of this would come down to 0,0006 * 0,125 * 0,?? * 0,0625 = 0,000004688. I don't want to count how many % of layouts that leaves, but it's helpful to say the least.

EDIT: It's about 1 in 200_000.
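
The combined figure can be reproduced with a short calculation. The sketch simply multiplies the per-criterion fractions as stated in the post, leaves out the unknown finger-load factor, and treats the criteria as independent, which is an assumption of the estimate itself.

```rust
/// Multiplies per-criterion surviving fractions.
/// Returns (combined fraction, n) where roughly one in n layouts survives.
pub fn combined_fraction(fractions: &[f64]) -> (f64, f64) {
    let f: f64 = fractions.iter().product();
    (f, 1.0 / f)
}
```

For the fractions 0.0006, 0.125, and 0.0625 the product is about 0.0000046875, i.e. roughly one layout in 213,000, in line with the "about 1 in 200_000" above.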

Feedback?

There's one small logical caveat that we need to consider when implementing this idea, but that's a question for later. What are your thoughts on the proposal, @dariogoetz?
enhancement
opened by Glitchy-Tozier 10