aka_FA

Aka-FA Journal #1: Making a Dialogue System in Game Maker Studio (Patreon)

Published: 2020-01-30 21:46:18
Edited: 2020-03-18 20:04:21
Imported: 2021-09


Making a Dialogue System in Game Maker Studio

This is how I spent my Christmas break

Abstract

Most videogame engines do not have built-in systems for advanced text rendering. This paper looks into using HTML-like tags in the text to add life and character to the text on screen, as well as other effects related to the dialogue. It also explores the methods created to interpret the text and to render it attractively on screen.

Introduction

Game Maker has incredibly basic text rendering capabilities. You pick a font with a baked-in size, and tell it what to render to the screen. There are very basic line-breaking capabilities. But that's it. If you want to make the text look fancy, you'll have to write your own code to manipulate the plain text you're drawing to the screen.

Normally the dialogue is something of an afterthought to actual game systems when I'm programming. For a change, I wanted to focus on just the dialogue system and give it all the polish it needs. While this endeavor is far from complete, I have set up a flexible system that could easily be expanded to fit more functionality.

Requirements

Text will appear over time, character by character.
The input for the dialogue system should be readable by humans (read: me) and easily writable by humans (read: also me). No extra tools should be necessary to reasonably work with the system. (read: I don't want to make those)
Different parts of the text should be able to be rendered differently.
The minimum set of text modifiers is:
- faster scrolling text
- slower scrolling text
- shaking text
- bold text
- coloured text
Portraits of the speaking characters should be on screen as dictated by the input.
Two portraits can appear at the same time, one on the left and one on the right. They will face each other. More functionality would be nice, but is outside of the current scope.
The portraits should be able to be moved by the input.
The minimum set of portrait modifiers is:
- hopping around
The UI that houses the text should not use drawn elements, and should instead rely on Game Maker's primitive drawing functions, so that it can scale to any size without getting blurry.
The UI elements should slide into the screen dynamically.

Implementation

Input Syntax

I could try to reinvent the wheel (and some would argue I'm doing just that by not picking up a visual novel engine), but instead I based my input syntax on something used worldwide to format and style text: HTML.

To be a bit more human-readable, I did opt to define my own tags instead of using HTML tags. So <bold> instead of <b>. Lazy me already curses the 3 extra letters.

Mind you, my syntax isn't nearly as complex as HTML. It only needs to tell the text how it looks. Positioning is handled, well, like how text tends to work in novels: from left to right and top to bottom. (Don't comment on my problematic eurocentric bias there.)

Functionally the syntax contains three different kinds of tags: opening tags, closing tags, and standalone tags. Opening tags look like '<name>'. Closing tags look like '</name>'. You can't have an opening tag without an associated closing tag. Everything between the opening and closing tag is modified in some way, defined by the tag. Standalone tags look like '<name/>'. In XML (and XHTML), a standalone tag is equivalent to writing '<name></name>'.

Tags can include attributes like so: '<slow speed=0.1>'. This would tell the slow tag exactly how slow it goes. At the moment attributes are only used by the effect nodes, to make them more flexible. When attributes are used, they usually have default values, and writing them down overrides said values. This way the writer doesn't have to write them down unless the default behaviour isn't good enough.
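
To make the tag and attribute rules a bit more concrete, here is a minimal sketch of reading a single tag. It is written in Python rather than GML, and it is not the code from the project. The function name parse_tag, the regular expression, and the default values in TAG_DEFAULTS are all assumptions made up for illustration, since the article does not list the real defaults.

```python
import re

# Hypothetical defaults; the article does not list the real default values.
TAG_DEFAULTS = {"slow": {"speed": "0.5"}, "fast": {"speed": "2"}}

# Matches <name key=value ...>, </name> and <name key=value .../>.
TAG_PATTERN = re.compile(r"<(/?)(\w+)((?:\s+\w+=[^\s/>]+)*)\s*(/?)>")


def parse_tag(tag):
    """Split a single tag into (kind, name, attributes)."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a valid tag: {tag!r}")
    closing, name, raw_attributes, standalone = match.groups()
    if closing and (standalone or raw_attributes):
        raise ValueError(f"a closing tag takes no attributes: {tag!r}")
    if closing:
        return "closing", name, {}
    # Start from the defaults, then let written attributes override them.
    attributes = dict(TAG_DEFAULTS.get(name, {}))
    for pair in raw_attributes.split():
        key, value = pair.split("=", 1)
        attributes[key] = value  # values are kept as plain strings here
    kind = "standalone" if standalone else "opening"
    return kind, name, attributes
```

With these made-up defaults, '<slow>' comes back with speed "0.5", while '<slow speed=0.1>' overrides it with "0.1".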

Text Parsing

The next step is for code to read the input and parse it into a structure the engine can easily use to render the text to screen. With regular HTML the text is parsed into a mathematical tree structure. I originally considered doing the same for my text parser, but as it turns out, that's too much unnecessary complication for simple text. This text is always going to be sequential, and thus a list structure is easier to work with than a tree.

The list contains nodes. There are text nodes and effect nodes. Text nodes contain text, and all the modifiers to apply to said text, while effect nodes hold instructions for other parts of the dialogue system. The goal of the parser is to translate the input string into a list of nodes to be used by the rendering system.

The text parser will run recursively through the input string. It is provided the entire input string, as well as an object containing all the text modifiers. This object will be empty at the start.

If no tag is found:
1. Create a text node and put the string in. You're done!
If an opening tag is found:
1. The parser looks for the associated closing tag. All the text before the opening tag is put into a text node, along with the provided text modifiers. That text node goes at the end of the list.
2. The parser makes a copy of the text modifiers and adds the new modifier defined by the opening tag. It then recursively calls the text parser again, providing it the text between the opening and closing tags, as well as the expanded collection of text modifiers.
3. Finally, the part after the closing tag is also handed to the text parser recursively, with the unmodified text modifier object.
If a standalone tag is found:
1. An effect node is created. It reads the associated attributes and adds them to the node.
2. The parser recursively continues with the text after the standalone tag.
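
The steps above can be sketched roughly as follows. This is Python rather than GML and not the project's actual code. The names parse and TAG, and the tuple layout of the nodes, are assumptions for illustration. The sketch also makes a few choices the article does not spell out: empty text nodes are skipped, text in front of a standalone tag becomes a text node too, attributes on opening tags are ignored, and a tag is assumed never to be nested inside a tag of the same name.

```python
import re

# Matches an opening tag <name ...> or a standalone tag <name .../>.
TAG = re.compile(r"<(\w+)((?:\s+\w+=[^\s/>]+)*)\s*(/?)>")


def parse(text, modifiers=frozenset()):
    """Turn an input string into a flat list of text and effect nodes."""
    match = TAG.search(text)
    if match is None:
        # No tag left: the whole string becomes one text node.
        return [("text", text, modifiers)] if text else []

    name, raw_attributes, standalone = match.groups()
    before = text[:match.start()]
    nodes = [("text", before, modifiers)] if before else []

    if standalone:
        # Standalone tag: create an effect node carrying its attributes.
        attributes = dict(pair.split("=", 1) for pair in raw_attributes.split())
        nodes.append(("effect", name, attributes))
        return nodes + parse(text[match.end():], modifiers)

    # Opening tag: find the associated closing tag (no same-name nesting).
    closing = f"</{name}>"
    close_at = text.find(closing, match.end())
    if close_at == -1:
        raise ValueError(f"<{name}> has no closing tag")
    inner = text[match.end():close_at]
    rest = text[close_at + len(closing):]

    # The inside gets a copy of the modifiers plus the new one;
    # the remainder is parsed with the unmodified set.
    nodes += parse(inner, modifiers | {name})
    nodes += parse(rest, modifiers)
    return nodes
```

Calling parse("<portrait character=Miriel/>Hi <bold>there</bold>!") gives one effect node for the portrait followed by three text nodes: "Hi " with no modifiers, "there" with bold, and "!" with no modifiers.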

Let's look at an example input string for the dialogue system:

<portrait character=Miriel/><fast>Hey, let's have a <red>new year's party</red>!</fast>

This input string contains a standalone tag (portrait), and two sets of opening and closing tags (fast and red).

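We then translate these three elements into text and effect nodes. These nodes are put together sequentially in a list, looking like this:

1. effect node: portrait (character=Miriel)
2. text node: "Hey, let's have a " (modifiers: fast)
3. text node: "new year's party" (modifiers: fast, red)
4. text node: "!" (modifiers: fast)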

Let's look at the two types of nodes in more detail.

Text nodes

A text node consists of text, a collection of modifiers, and a state. The modifiers have been predefined and, to quickly repeat them from the requirements list, are as follows: fast, slow, red, bold, shake.

The state is used to decide whether the text node starts scrolling through the text. By default, the text node's state is inactive. At some point, the node's parent will tell it to activate, at which point it starts scrolling through the text, at a pace defined by the modifiers. When it has scrolled through the entire text, the state is set to complete. This tells the parent that it can activate the next node.
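
As a rough illustration of that life cycle, here is a minimal sketch in Python rather than GML. It is not the project's code. The class TextNode, the function update_dialogue, and the speed numbers are assumptions, because the article gives no concrete scroll speeds.

```python
from dataclasses import dataclass

INACTIVE, ACTIVE, COMPLETE = "inactive", "active", "complete"

# Hypothetical speeds in characters per frame; the article gives no numbers.
BASE_SPEED = 1.0
SPEED_FACTORS = {"fast": 2.0, "slow": 0.5}


@dataclass
class TextNode:
    text: str
    modifiers: frozenset = frozenset()
    state: str = INACTIVE
    progress: float = 0.0  # characters revealed so far (may be fractional)

    def activate(self):
        if self.state == INACTIVE:
            self.state = ACTIVE

    def update(self):
        """Reveal more characters; mark the node complete at the end."""
        if self.state != ACTIVE:
            return
        speed = BASE_SPEED
        for modifier in self.modifiers:
            speed *= SPEED_FACTORS.get(modifier, 1.0)
        self.progress = min(self.progress + speed, len(self.text))
        if self.progress >= len(self.text):
            self.state = COMPLETE

    def visible_text(self):
        return self.text[:int(self.progress)]


def update_dialogue(nodes):
    """Parent step, once per frame: advance the first unfinished node."""
    for node in nodes:
        if node.state == COMPLETE:
            continue
        node.activate()
        node.update()
        return
```

The parent only ever touches the first node that is not complete yet, so a node stays inactive until everything before it has finished scrolling.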

Rendering a text node to screen is fairly simple. You add up all the modifiers and use Game Maker's basic text rendering tools. The parent object that contains the list of nodes tells the node the screen location.

Note: I should've been a good boy and made something cool happen when stacking fasts on fasts or fasts on slows, but I didn't. Because I have faith in the end user (me) not being enough of a moron to mistakenly write these modifiers together.

I may very well be wrong.

Effect nodes

An effect node has an action defined by the tag's name, and a collection of attributes telling it how and where to perform that action. Each effect node has a default set of attributes. For example, the portrait effect node has the attributes character, side, facing, and mood. Character is the only required attribute. It tells the system which character sprite to draw to the screen. Side defaults to right, so if it is not specified the character will be on the right. Facing defaults to inwards, meaning the character looks towards the center of the screen, and mood defaults to default (wow). If you want the character to smile you'll have to set the mood attribute to 'happy'. And draw a happy character sprite. That's the hard part.
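
The portrait defaults described above could be filled in along these lines. This is a small Python sketch, not the GML from the project. The function name make_portrait_effect and the dictionary layout are assumptions. The default values themselves (right, inwards, default) are the ones named in the text.

```python
# Defaults as described in the text; character has none because it is required.
PORTRAIT_DEFAULTS = {"side": "right", "facing": "inwards", "mood": "default"}
PORTRAIT_REQUIRED = ("character",)


def make_portrait_effect(attributes):
    """Build a portrait effect node, filling in defaults for missing attributes."""
    missing = [name for name in PORTRAIT_REQUIRED if name not in attributes]
    if missing:
        raise ValueError(f"portrait tag is missing: {', '.join(missing)}")
    # Later entries win, so written attributes override the defaults.
    return {"action": "portrait", **PORTRAIT_DEFAULTS, **attributes}
```

So '<portrait character=Miriel mood=happy/>' would end up on the right, facing inwards, with only the mood changed.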

Drawing the UI

I wanted a sleek and simple look. I decided to look at visual novels for inspiration and I ended up with a partially translucent black box surrounded by white bars on the top and bottom. The white bars cast a shadow to add depth to the UI. The sides of the white bars and the black text box fade out.

Game Maker Studio does not have a good way to draw primitive shapes like rectangles with an opacity gradient. I had two options:

- Define a script that draws the gradient by drawing a new rectangle for each horizontal pixel at a slightly higher opacity.
- Define a shader that is applied while drawing a single rectangle and sets the opacity for each pixel.
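
For reference, the first option boils down to something like the sketch below. It is Python rather than GML and only computes the strips instead of drawing them. The function name gradient_strips and its parameters are made up for illustration.

```python
def gradient_strips(x, y, width, height, alpha_from=0.0, alpha_to=1.0):
    """Return one (x, y, w, h, alpha) strip per horizontal pixel of the rectangle."""
    strips = []
    for column in range(width):
        # Fraction of the way across the rectangle, from 0.0 to 1.0.
        t = column / (width - 1) if width > 1 else 1.0
        alpha = alpha_from + (alpha_to - alpha_from) * t
        strips.append((x + column, y, 1, height, alpha))
    return strips
```

Each strip would then be drawn as its own one-pixel-wide rectangle, which means one draw call per column.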

The first one would've been a lot easier. But I wanted to do it with a single draw_rectangle call. So I made a shader. It kinda sucks. I taught shaders to fellow students. I should be embarrassed of myself.

I don't even want to write what I did for the shaders, so let me tell you how I'd do it now instead:

Define some sprites of 256 by 256 pixels that each contain a black-to-white opacity map. Such a sprite can be given to the shader as a texture. The darker the pixel value in the map, the lower the opacity. The shader has no trouble finding the pixel in the texture that corresponds to the pixel of the rectangle it is drawing. It is a flexible method which can fade all sides, both sides, the center, or any weird thing you can draw in black and white on the opacity map.
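
The lookup the shader would do for each pixel can be imitated on the CPU. The sketch below is Python, not shader code, and is only meant to show the mapping from a rectangle pixel to a pixel of the opacity map. All names (make_horizontal_fade_map, sample, pixel_alpha) are hypothetical, and nearest-neighbour sampling is an assumption.

```python
MAP_SIZE = 256


def make_horizontal_fade_map():
    """Opacity map that fades in from the left: column 0 is black, column 255 white."""
    return [[x for x in range(MAP_SIZE)] for _ in range(MAP_SIZE)]


def sample(opacity_map, u, v):
    """Nearest-neighbour lookup with u and v in the range 0..1, returning 0.0..1.0."""
    x = min(int(u * MAP_SIZE), MAP_SIZE - 1)
    y = min(int(v * MAP_SIZE), MAP_SIZE - 1)
    return opacity_map[y][x] / 255.0


def pixel_alpha(opacity_map, px, py, rect_width, rect_height, base_alpha=1.0):
    """Opacity of rectangle pixel (px, py): darker map pixels mean lower opacity."""
    u = (px + 0.5) / rect_width   # centre of the pixel, as a fraction of the width
    v = (py + 0.5) / rect_height
    return base_alpha * sample(opacity_map, u, v)
```

Swapping in a different map changes the fade without touching the lookup code, which is the flexibility described above.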

Why 256 by 256 pixels? Colours are defined by 4 bytes. A byte can contain any number from 0 to 255 (which makes 256 distinct values). One of the 4 bytes signifies opacity. With a 256 by 256 map, every level of opacity the hardware can support can be drawn. If you have a smaller opacity map, the gradient can look blocky. Larger opacity maps are pointless for linear gradients (but can be useful for other maps!).
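
The arithmetic can be checked with a few lines of Python. This is only a sketch: the function name is made up, and it assumes a plain left-to-right linear gradient stored with one byte of opacity per pixel.

```python
def distinct_opacity_levels(map_width):
    """Count the 8-bit opacity values a linear gradient of this width contains."""
    if map_width < 2:
        raise ValueError("a gradient needs at least two pixels")
    levels = {round(255 * x / (map_width - 1)) for x in range(map_width)}
    return len(levels)
```

A 64 pixel wide map holds 64 different levels, a 256 pixel wide map holds all 256, and a 1024 pixel wide map still holds only 256 because a byte cannot express more.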

About animating the UI

To animate the UI, I introduced a Tween object. Tween is a term often used in modern animation. It's short for in-between, and usually refers to animation software automatically moving an image from one point to another over a certain number of frames. That is exactly what this object does. You give the object a starting location, an ending location, the time it has to get there, and a tween type (more on that later). In the object itself you just tell it each frame to update its screen location to the new tween result. A parent object could simplify using tweens even more by handling the screen location update each frame. However, this requires setting a parent for the object and making sure to tell the child to run the parent's code as well. These two steps make the benefit of a parent object currently not worth it.

Tween types

There are multiple tween types. The most basic one is a linear tween. If an object has to move 100 pixels over 5 frames, a linear tween moves the object 20 pixels per frame. It's a functional and sometimes useful tween, but often, this type of tween feels stiff and lifeless compared to other types. I still programmed it. Because it's easy.
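
A Tween object with a linear tween type could look roughly like this. The sketch is in Python rather than GML and is not the project's code. The class layout, the step method, and passing the tween type in as a function are all assumptions.

```python
def linear(t):
    """Linear tween type: progress equals the fraction of time passed."""
    return t


class Tween:
    def __init__(self, start, end, duration, ease=linear):
        if duration <= 0:
            raise ValueError("duration must be at least one frame")
        self.start = start
        self.end = end
        self.duration = duration  # in frames
        self.ease = ease          # the tween type, as a function of 0..1
        self.frame = 0

    def step(self):
        """Advance one frame and return the new position."""
        self.frame = min(self.frame + 1, self.duration)
        t = self.frame / self.duration
        return self.start + (self.end - self.start) * self.ease(t)

    @property
    def finished(self):
        return self.frame >= self.duration
```

With Tween(0, 100, 5), five calls to step() move the position in equal steps of 20 pixels, ending at 100. An object would call step() once per frame and copy the result into its screen location.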

For the other tween types I ended up making a generic Bézier-tween. A Bézier curve is a curving line defined by multiple points. In my scenario I used four: the starting point, the ending point, and two points that define how the curve moves. There's a big, long, and scary-looking formula to use with Bézier curves that I couldn't explain correctly even if I wanted to. The point is, fancy curvy lines over which I have a lot of control. Then by using the Bézier formula, I can get the y value for the x value, where x is the time passed since the start of the tween, and y symbolizes how close the coordinates should be to the end point.
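
For the curious, the standard formula for a four-point (cubic) Bézier curve is shown below. This assumes the four-point curve used here is the usual cubic form, which the article does not state outright. P_0 and P_3 are the starting and ending points, P_1 and P_2 are the two points that shape the curve, and t runs from 0 to 1.

```latex
B(t) = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) \, t^2 P_2 + t^3 P_3, \qquad 0 \le t \le 1
```

The formula is applied once to the x coordinates and once to the y coordinates. For a tween, a natural choice is P_0 = (0, 0) and P_3 = (1, 1), so that both time and progress run from 0 to 1.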

Using the Bézier-tween, I could easily create tweens that ease in (accelerate over time), ease out (decelerate over time), do both, and also do what the scientific community calls a 'sproing' tween, where the animation first takes a step back, then overshoots the goal, to settle into position afterward.
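
Here is a minimal sketch of such a Bézier-tween in Python rather than GML. It is not the project's code. It assumes a cubic curve from (0, 0) to (1, 1), finds the curve parameter for a given time by bisection (which requires both x control values to lie between 0 and 1), and uses control points borrowed from common easing presets. The real values used in the project are not given in the article.

```python
def bezier(t, p1, p2):
    """One coordinate of a cubic Bézier curve whose end values are 0 and 1."""
    u = 1.0 - t
    return 3 * u * u * t * p1 + 3 * u * t * t * p2 + t ** 3


def bezier_ease(x1, y1, x2, y2):
    """Build a tween type from the two middle control points (x1, y1), (x2, y2)."""
    def ease(x):
        # Find the curve parameter t whose x coordinate equals the time passed.
        low, high = 0.0, 1.0
        for _ in range(40):
            mid = (low + high) / 2
            if bezier(mid, x1, x2) < x:
                low = mid
            else:
                high = mid
        # The y coordinate at that t is the progress towards the end point.
        return bezier((low + high) / 2, y1, y2)
    return ease


# Hypothetical control points, not taken from the article.
EASE_IN = bezier_ease(0.42, 0.0, 1.0, 1.0)
EASE_OUT = bezier_ease(0.0, 0.0, 0.58, 1.0)
EASE_IN_OUT = bezier_ease(0.42, 0.0, 0.58, 1.0)
SPROING = bezier_ease(0.68, -0.55, 0.27, 1.55)
```

The y values of the 'sproing' control points lie outside the 0 to 1 range, which is what makes the result dip below the start early on and overshoot the goal near the end.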

Don't tell the scientific community I said they call it a 'sproing' tween.

Conclusion

While not perfect, the system is flexible and leaves a lot of room to enhance it over time. As you might have noticed, the text modifiers are not complete, and there are only very few standalone modifiers at the moment.

A sound system would be an easy expansion. Both text nodes and effect nodes could make use of this. Text nodes could have a 'blip' with each letter to symbolize talking, and this blip could be changed to fit the tone of the character.

Meanwhile, effect nodes could easily be used to include a single sound, like a sound bite for a character (“Hei'na!”) or a sound effect, like an audible “!”. They could even be used to run a music system, and cut the background music halfway through a sentence.

The UI is simple, but I like to believe it is the good kind of simple. It doesn't detract and leaves the player to focus on the writing (or, let's be real here, the portraits). The tween system used to move UI elements into and out of place is flexible and the code to use it is easily readable. The Bézier-tween could be improved even more by allowing a dynamic number of points to define the Bézier curve. But for most intents and purposes, the four-point Bézier should be enough.

For the UI, the previously mentioned shader could be greatly improved. I would have already, if I wasn't really lazy and didn't hate debugging shaders.

Files

Dialogue System example

The most beautiful dialogue system imaginable, put to the test.