1. Tags instead of tokens. That means they would be more interactive, separate blocks, but we would keep compatibility with the text interface and the ability to switch between the two.
2. Keeping all the existing features and displaying them in the UI, including weights, blend, swap. Plan the ability to control them with a click rather than keyboard input.
2.1 Don't divide fields into positive and negative prompts; it's better to separate tags by colour or form.
3. Autocomplete tokens as you type.
4. Suggest tokens, depending on the context. Not sure exactly how to implement it yet. But at least we know what the user has entered before, and could suggest frequently used tokens, for example.
5. Presets: saving and loading prompts.
6. Libraries: tokens and TI. As in that screenshot, but without a separate field for them, because in my opinion they are part of the prompt. Search, display suggestions with pictures, add several with a click.
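To make goals 1 and 2 more concrete, here is a minimal sketch of switching between the text interface and tags. It assumes a comma-separated prompt in which a run of trailing `+` or `-` characters marks the weight; the `Tag` class, `parse_prompt`, `to_text` and the cap of three steps (taken from the tentative "+++ or - - -" remark further down) are illustrative names and assumptions, not the project's actual parser.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_WEIGHT = 3  # assumption: no more than three steps up or down


@dataclass
class Tag:
    text: str
    weight: int = 0  # positive = "+" steps, negative = "-" steps


def parse_prompt(prompt: str) -> list[Tag]:
    """Split a comma-separated prompt into tags, reading trailing +/- as weight."""
    tags = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces such as a trailing comma
        body = chunk.rstrip("+")
        if len(body) < len(chunk):
            weight = len(chunk) - len(body)
        else:
            body = chunk.rstrip("-")
            weight = -(len(chunk) - len(body))
        body = body.strip()
        if not body:
            continue  # the chunk consisted only of sign characters
        weight = max(-MAX_WEIGHT, min(MAX_WEIGHT, weight))
        tags.append(Tag(body, weight))
    return tags


def to_text(tags: list[Tag]) -> str:
    """Serialise tags back into the text form."""
    parts = []
    for tag in tags:
        sign = "+" if tag.weight > 0 else "-"
        parts.append(tag.text + sign * abs(tag.weight))
    return ", ".join(parts)


tags = parse_prompt("a cat++, forest, blurry-, ")
print(tags)
print(to_text(tags))
```

Because both directions go through the same `Tag` list, the user can switch between the tag view and the plain text view without losing weights. Blend and swap would need their own tag types on top of this.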
Design sources
https://www.dropbox.com/sh/hwetq9hcnftswk2/AACopwF3GBkCApUn3nS99iY4a?dl=0
1. Make the positive tags rounded and the negative tags square.
2. Input: type a word the usual way; a word or group of words turns into a tag after a comma or Enter. Double-clicking the centre of the tag, on the word inside, opens it for editing.
When the input field has focus, a block appears on the right with a selection of concepts and a search. A separate tab in the same block holds token and string management: save, load, search, select. When focus leaves the input field, this block disappears.
The context menu (right-click) contains the tag types and the usual operations on them, as with objects in any editor: copy, paste, cut, delete, and so on. This is necessary for some cases, and it duplicates features to make them more accessible.
When hovering over a tag, the tag block enlarges, so it's easier to hit; this solves the problem of editing very short words and typos. The text or token, the word inside the tag, stays in place and should not move; only the size of the shape around the tag changes. A good reference is the Dock's response in macOS, which is magnetised to the cursor.
3. Change token type (weight, blend, swap): left-click the tag to cycle through the types. This changes the tag visually; for example, I want to assign each tag type a different colour, or rather a gradient, so it can be told apart from other tags without extra markers. The change is also accessible through the context menu.
4. Change the weight: by clicking on the right or left side of the tag. The side is highlighted and turns into a + or - button. When you change the weight, the tag shows an indicator below it: dots. I’m not sure there is a need for more than 3 (+++ or - - -). Also, if a user types our text syntax, we can turn it into a tag of the matching type.
5. Swap: a tag with two words, between which there is an icon like refresh. Each word is edited separately, with the mechanics described above. You write one word, a tag is created, and the focus moves to the part of the swap for the other word. Alternatively, write two words with an unused character between them (like & or %). Swap with hover:
6. Blend: a tag with two words, each of which has a weight indicator for blending. The mechanics are as described above.
7. Selecting multiple tags: possible via Shift or Ctrl, then selecting a type via the context menu. For example, to change the weight of a group of tags or change them to negative tags.
Moving and merging tags: drag one tag onto another to merge them into a swap or blend. Drag a tag to any part of the field and leave it there.
Option to automatically arrange the tags. This refers to grouping negative tags after positive ones. It would be convenient, but not necessary.
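The automatic arrangement can be sketched as a stable sort that moves negative tags behind positive ones while keeping the order the user chose inside each group. The `Tag` class with a `negative` flag and the `arrange` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    text: str
    negative: bool = False


def arrange(tags):
    """Group negative tags after positive ones, preserving relative order.

    sorted() is stable and False sorts before True, so sorting on the
    `negative` flag alone is enough.
    """
    return sorted(tags, key=lambda tag: tag.negative)


field = [Tag("blurry", True), Tag("cat"), Tag("lowres", True), Tag("forest")]
print([tag.text for tag in arrange(field)])
# ['cat', 'forest', 'blurry', 'lowres']
```

Since the sort is stable, switching the option on never shuffles tags the user has deliberately ordered within the positive or negative group.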
8. Suggestions while typing: a few tokens in a separate block below the cursor (optional, can be disabled).
Suggestions include token suggestions for when the user doesn't know what to type but we know the context of the prompt. Also, if the user doesn't know how to start and has no ideas at all, we can offer a “Random prompt”, for example.
9. Autocomplete tokens while typing: show the rest of the token to the right of the cursor, and add it as a token with Enter.
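One possible way to drive both suggestions and autocomplete is to count how often each token appears in the prompts the user has entered before, then rank prefix matches by that count. This is only a sketch under that assumption; `build_usage`, `autocomplete` and `completion_tail` are made-up names, and the history is assumed to be a list of comma-separated prompt strings.

```python
from collections import Counter


def build_usage(history):
    """Count how often each token occurs in previously entered prompts."""
    counts = Counter()
    for prompt in history:
        for token in prompt.split(","):
            token = token.strip().lower()
            if token:
                counts[token] += 1
    return counts


def autocomplete(prefix, usage, limit=5):
    """Return known tokens starting with `prefix`, most frequently used first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [t for t in usage if t.startswith(prefix) and t != prefix]
    matches.sort(key=lambda t: (-usage[t], t))  # frequency, then A-Z
    return matches[:limit]


def completion_tail(prefix, usage):
    """The part of the best match that would appear to the right of the cursor."""
    best = autocomplete(prefix, usage, limit=1)
    return best[0][len(prefix.strip()):] if best else ""


history = ["cat, forest, oil painting", "forest, fog", "cat, forest"]
usage = build_usage(history)
print(autocomplete("fo", usage))     # ['forest', 'fog']
print(completion_tail("for", usage)) # 'est'
```

The suggestion block below the cursor could show the list from `autocomplete`, while the inline completion uses `completion_tail` and is accepted with Enter. Context-aware suggestions would need something smarter than raw frequency.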
Tokens library consists of four types:
1. Single token — a keyword, tag
2. Style concept — several tokens for some style, which can be added to any prompt
3. Prompt — complete prompt to save it outside an image and to share
4. Embedding — TI trained
There are local and remote (Hugging Face) embeddings. Local embeddings can be marked as favourites (if a remote HF embedding is marked, it is downloaded to local storage).
The “Save prompt” button is always there. If no token is selected, it saves the full prompt. If one token is selected, it saves that token. If multiple tokens are selected, it saves them as a style.
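The button's behaviour reduces to a small decision on the number of selected tokens. A minimal sketch, assuming tags are plain strings and the selection is a list of indices; `save_selection` is a hypothetical helper, not an existing function.

```python
def save_selection(tags, selected):
    """Decide what "Save prompt" stores, returning (kind, text).

    tags     -- all tag texts in the prompt, in order
    selected -- indices of the currently selected tags
    """
    chosen = [tags[i] for i in sorted(set(selected))]
    if not chosen:
        return "prompt", ", ".join(tags)   # nothing selected: whole prompt
    if len(chosen) == 1:
        return "token", chosen[0]          # one tag: a single token
    return "style", ", ".join(chosen)      # several tags: a style concept


tags = ["cat", "forest", "oil painting"]
print(save_selection(tags, []))      # ('prompt', 'cat, forest, oil painting')
print(save_selection(tags, [1]))     # ('token', 'forest')
print(save_selection(tags, [2, 0]))  # ('style', 'cat, oil painting')
```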
Tokens are organised into categories: folders, or just virtual categories in some local file. Favourites hold all frequently used tokens. Tokens are sorted A–Z. The main categories are:
1. Subject — landscape, portrait, still life, some objects, people, animals, etc.
2. Setting — location & context: cityscape, rural, interior, studio…
3. Style — art styles, such as surrealism, dada, installation, futurism and so on
4. Artist — artists' names
5. Medium — what and how the object is drawn or made of. Drawing, photo, ceramics, steel, oil painting, wood, etc.
6. Composition — arrangement of the elements, abstracts like symmetry, balance, emphasis (if they work at all).
7. Camera & Light — available camera models, films, focal lengths, lighting names and other photographic terms.
8. Emotion — happy, sad, horror, smart, beautiful…
Prompts are organised in “folders” created by the user. “Previously used” prompts are those in the user's log file. Some of the prompts already have pictures (previously generated); maybe we can show them, too.
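The library described above could be stored as a flat list of entries, with the category kept as a virtual label rather than a real folder. The sketch below assumes exactly that; `EntryType`, `LibraryEntry`, `list_category` and `favourites` are illustrative names, and marking favourites with a simple flag is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class EntryType(Enum):
    TOKEN = "single token"
    STYLE = "style concept"
    PROMPT = "prompt"
    EMBEDDING = "embedding"


@dataclass
class LibraryEntry:
    name: str
    kind: EntryType
    category: str          # e.g. "Subject", "Style", "Medium"
    favourite: bool = False


def list_category(entries, category):
    """Entries of one category, sorted A-Z regardless of letter case."""
    found = [e for e in entries if e.category == category]
    return sorted(found, key=lambda e: e.name.lower())


def favourites(entries):
    """All entries marked as favourites, sorted A-Z."""
    return sorted((e for e in entries if e.favourite), key=lambda e: e.name.lower())


library = [
    LibraryEntry("surrealism", EntryType.TOKEN, "Style"),
    LibraryEntry("Futurism", EntryType.TOKEN, "Style", favourite=True),
    LibraryEntry("dada", EntryType.TOKEN, "Style"),
    LibraryEntry("oil painting", EntryType.TOKEN, "Medium", favourite=True),
]
print([e.name for e in list_category(library, "Style")])
print([e.name for e in favourites(library)])
```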
• • •
NOTE: All this work was done for the InvokeAI project in 2023. The original document is here. You can use it for free, and reach out to me if you want to develop something.