FAP with a Keyboard

Previous Post: FAP says Hello World

Next post: FAP reborn – Backplane

Github repo

Now that FAP’s video card is finished, it’s time to move on. Although we’ve come so far, FAP is still missing something crucial in order to be called a proper computer, it still needs at least one input device. That will be the keyboard. More specifically, a PS/2 keyboard.

Using PS/2 keyboard for retro computers is nothing new thanks to its simple protocol and wide availability. It uses a simple synchronous serial interface, one clock line and one data line, and the documentation is all over the internet so I’m not going to bother to explain here again. Actually I didn’t bother to read much about it at all because of a reason you’ll see soon enough.

Before we start working on keyboard though, we need to first figure out some way for keyboard to talk to the CPU. Generally there are two means of communicating with peripherals, memory mapped I/O or port I/O. In memory-mapped I/O peripherals’ registers are mapped to the memory address space, and CPU writes or reads its memory address to control the peripheral. I used this method for FAP’s video card. For the keyboard I decided to use port I/O, since it’s usually used with external peripherals, and I feel it’s a good exercise to try a little bit of everything with Z80, which is why I’m doing this in the first place. The Z80 supports 256 ports, when the user calls in or out instruction it places the lower 8 bit of the register on to the address bus, and activates IORQ and one of the RD or WR lines to read or write a byte from that port.

With the I/O method sorted, next issue is how to let CPU know the keyboard events. One way is polling, in which the CPU constantly asks keyboard if it has something. This approach is simple but extremely wasteful, since CPU will spent most of its time reading from keyboard instead of doing actual work. A much better way is using interrupts. As its name suggests, when there is actually data from keyboard, the CPU will be interrupted from its work, it then can store the keystroke into a buffer and do something with it later. This way the CPU does not waste anytime polling the keyboard, and only acts when there is an actual key press, and that is what I’m going to use today.

The Z80 has a (seen from today)somewhat rudimentary but still rather clever interrupt system. There are two interrupt pins on the package, INT and NMI, the former is for regular interrupt that can be disabled in software, the latter is non-maskable interrupt and always occurs when the line goes active. For the regular interrupt, 3 interrupt modes are provided, mode 0 is a 8080-compatible mode where when interrupted, you have to put some instruction on the data bus for the CPU to execute. It’s basically witchcraft so I’m not going to use it. Mode 1 is much simpler, it just jumps to address 0x38 when interrupted. This makes designing simple systems very easy since you can just put your interrupt handler there, or a jump instruction to jump to somewhere else if you need more space. However, if in mode 1 there are more than one interrupting devices, the CPU will have to figure out who is the one that initialized the interrupt, which gets complicated real fast.

That’s where interrupt mode 2 comes into play. I’m going to list how mode 2 works off the top of my head since I worked with it for so long, it’s going to be a mouthful and I hope I get it right: When interrupt fires, the interrupting device puts a byte onto the data bus, the CPU then combines the 8 bit in the interrupt vector base register and the 8 bit on the data bus to form a 16 bit address, jumps to that, read a 16-bit word at that address, which is the interrupt service routine address, then make another jump to that. For example, if I load interrupt register with 0x12, and put 0x10 on the bus after interrupting the CPU, the CPU will jump to 0x1210. If I put data 0x3000 there beforehand, the CPU will then read a word from 0x1210, gets 0x3000, and jump to 0x3000, which is where the handler is. It’s extremely confusing at first, but once you get the hang of it you’ll realized this vectorized interrupt system is much more powerful than other modes, you can change the location of ISR on the fly, and it supports a huge number of interrupting devices. It might be just a keyboard for now, but I’m also going to add a WiFi module in the future, which takes 2 ports, a timer wouldn’t hurt either. Anyway, the thing is that FAP is going to have a couple more peripherals later on, and interrupt mode 2 is the best way for it.

Now let’s recap the entire interrupt process to see what my keyboard controller needs to do: When user presses a key, the controller pulls the INT line low, wait until M1′ and IORQ’ both goes low (we’ll call it INTACK’, short for interrupt acknowledge), put the interrupt vector on the data bus, wait until INTACK goes high, then stop driving the data bus. Sounds like a lot of work, but I can just use 2 chips for that. A 74HC32 OR gate to generate INTACK’, which is then tied to OE’ of  a 74HC245 buffer, it’s bidirectional but in this case I’ll just set it to a single direction. This way, the interrupt vector on the other side of the buffer is gated to the data bus the moment INTACK’ goes active. The CPU will then go to the ISR after 2 jumps.

Things gets slightly more complicated at the ISR though. The CPU needs to read the keyboard port to see what key user just pressed, that needs some decoding logic, the keyboard controller also needs to respond to CPU’s port read just in time, otherwise the CPU will get some garbage data. To achieve this I’m going to use an additional 2 chips, a 74HC688 8-bit equality detector and a 74HC573 transparent latch. The keyboard controller will latch the keyboard data , and when CPU performs a read, the ‘688 compares the port address and put the latched data onto the data bus.

So to sum it all up again, here is how my keyboard controller will work: The controller receives a byte of the key press, put the data on the latch, enable the latch, pull down INT’, wait for INTACK’, put interrupt vector on the data bus, and pull up INT’ and turn off latch when INTACK’ goes away. The cpu will then try to read from port 0, an IORD’ signal is generated when both IORQ’ and RD’ are low, IORD’ is tied to the OE’ of the latch, which enables it when it’s active, releasing the keyboard data onto the data bus for CPU to read.

For reading PS/2 signal itself I used an Arduino Pro Micro that I have laying around, while the AVR chip in Arduino itself might seem slow and out of date compared to the 32-bit microcontroller today, what’s unbeatable is its incredible communities. There is probably already an Arduino library written for every single thing you can think of, and for PS/2 it’s really a doddle. 2 minutes of google search got me the library, and the rest is just hooking up 2 wires. The Arduino reads the keypress and send out a byte of ASCII code to the main controller, which is a STM32F103 dev board, that manages the interrupt activity. Below is the schematics:

dfgsdfgsdfgsdfg

As you can see there are already 6 chips right off the bat, and it’s just the start. Looks like the I/O board is going to be the most complicated  board in my FAP computer.

Here is the finished board:IMG_1422

I also wrote a short test program for the new I/O board:

Note how my program start from 0x100 now, it’s customary to start Z80 program at that location because there are reset vectors at $0000, $0008, $0010, $0018, $0020, $0028, $0030, $0038, I guess you can put code there if you’re not using RST instruction, but it’s just good practice to leave them alone. My code sets up stack, load interrupt register with 0x0, select interrupt mode 2 and enters an idle loop. When keyboard interrupt comes in the cpu will jump to 0x10, then jump to 0x3000, where the ISR will read a byte from I/O port 0 and print it on screen.

Here is the corresponding STM32 interrupt controller code snippet:

The interrupt controller turns off latch and interrupt on startup, then start listening from the Arduino that decodes PS/2 commands, once it receives a keypress byte it put that byte on lower 8 bits of PORTA, enable the keyboard data latch, then activities Z80’s INT. It then waits for INTACK’ to finish, after which it pulls INT line high again and disables the keyboard data latch.

Does it work? Watch the video:

As it turned out it works surprisingly well, of course that’s not how it looked like the first time round. At first the FAP would print a few characters, and goes back and print from the beginning, it was clock speed depend as well, as it didn’t happen quite as often at slower clock speeds. A bit of debugging later I found that the CPU would randomly jump to 0x66, and since there’s nothing there, it would slide a bunch of NOP’s all the way to 0x100, and start executing the main program from beginning. As it happens 0x66 address is where NMI will jump to if the line is active. It’s the goddamn noises again, and it also looks like the pull up from main stm32 controller is too weak. A few more filtering caps and a dedicated pullup resistor on NMI later, it’s working like a dream.

Next step? WiFi. More specifically, I want to use my FAP as a IRC client to use on Twitch chats, now we have the video card and keyboard, all that’s left is some way to stream IRC data to FAP, which I’ll tackle in the next post. It’s all getting very close now.

Previous Post: FAP says Hello World

Next post: FAP reborn – Backplane

Github repo

FAP says Hello World

Previous Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

Next Post: FAP with a Keyboard

 

It has been a week since my last post, when we left the action last time the FAP’s video card was working with double buffering and color attributes, and a test run resulted in this:

Screen Shot 2016-03-15 at 6.23.37 PM copy.jpg

Remember the program was filling up attribute memory with 0x1b, a purple color, and character memory with letter A. However as you can see above, after running the program some character cells did not turn purple, and some others turned purple, but letters did not change to A. This was because the CPU was trying to write to VRAM while VRAM copying was under way, and the operation was ignored by the GPU. To prevent this from happening we need some way to let CPU know that GPU is busy so it can wait a little until the copying is done. I decided to make a few memory mapped virtual register for my GPU, this way the Z80 can ask GPU if it’s busy first, if it is the CPU will wait, otherwise CPU can write to VRAM right away, here is the updated virtual register code:

Basically I just added a single line of code, now when CPU tries to read address 0x92c0 it will get the value of copy_in_progress, which is 1 when VRAM is underway. This way, the CPU first poll this register every time it wants to write to VRAM, and wait if its value is 1, problem solved! Well not quite:

IMG_1367

After running the VRAM-filling program again, there are still a bunch of problem character cells, although much fewer than before. Actually it’s not hard to figure out why: Sometimes the CPU will ask right before VRAM copy starts, so the GPU replies that it’s idle, but when CPU tries to write to VRAM a few clock cycles later the copying will already be underway, and the write gets ignored as a result. We need a way to let CPU know a little while before copy actually starts. I did a dirty little hack by creating another signal that goes active a few scanlines before VBLANK actually starts, this way the CPU will see that GPU is busy a little earlier, so if it happens to try to write before the copy it will have time to do finish the operation. After that, the FAP is finally rendering a beautiful screen full of purple A’s. Time to write a proper “hello world” program:

I created a ‘check’ subroutine which gets called every time before CPU tries to write to VRAM. The program first fill the attribute with yellow, then clear the screen, then print “hello world” at the first row on the screen. Here is how it look like:

Nice isn’t it, it’s the moments like this that keeps me going. A few weeks ago I have no idea how to build a computer, and FAP was just a bunch of 20 year old chips, and now it’s running my program and saying hello to the world like any other proper computers!

Celebration aside though, notice how this hello world program is remarkably slow. I ran it at a slower clock to see the progress, but even at a full 2MHz it still takes like half a second to complete, which is forever in microcomputers. The reason is that the Z80 has to read the gpu_busy register before every single write, which wastes a huge amount of time. A better solution would be let CPU be able to disable GPU copying all together, do the write, then enable copying again. This way the CPU does not have to wait at all, and only has to write to GPU register twice instead of 4800 times. The new copy_enable register is at 0x92c1. Here is the updated code:

I added another if condition so now CPU can read from VRAM too. However most importantly now when CPU writes 0 to 0x92c1, VRAM copy will be disabled. The CPU then can write to VRAM at full speed without interruption. And when it’s done the CPU can enable VRAM copy again and the result will be displayed on the screen on the next frame. I wrote another program to test it.

I put together a “print” function, you put character you want to print in c, attribute in b, index on screen in de, and call it. I first disable the VRAM copy, then clear the screen with char 0 and write “Hello World!” to it, then enable the VRAM copy again. Here is the result:

As you can see this is much faster than the first attempt. The text appears on screen instantly. Looks like disabling copy during bulk VRAM write is the way to go.

However, if you look at the video carefully the supposedly white text appears bit yellow or red. I spent days trying to fix this issue, tweaking the FPGA code and swapping out VRAM chips. In the end I don’t think it’s the VRAM or the copy routine, since the character and attribute are copied together, and all the texts seems fine. I think it’s the noise again, with more than 100 spaghetti wires running around. It might be better than breadboard, but apparently still not good enough. I’ll probably have to design a proper PCB for the video card to see if it gets better. I think I’ll move on for now, I’m just kind of tired with working on the video card right now.

 

Now the video card is working, next step is setting up the keyboard input for FAP. Find out what happens in my next post!

Previous Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

Next Post: FAP with a Keyboard

Got My Mojo Working: Character Attributes, VGA board, Double Buffering, and CPU Interface

Previous Post: VGA Character Generator

Next Post: FAP says Hello World

In my last post I designed and built an early stage of FAP’s video card on breadboard,  I decided to use VGA for video output, running 80×30 text mode at 640×480. The VRAM was working, so was the character ROM, and now the board is rendering garbage data in VRAM on start up as characters on screen, all is good.

IMG_1232
garbage data never looked so happy

However, there was still some glaring issues left untouched in the last post, for example the lack of colors. That would be the job of the attribute byte. In my design(and a lot of other early PCs) each text mode character is accompanied by an attribute, which describes what color is the said character, should it be blinking, etc. I ignored it last time to get the character generator working faster, now since it does, it’s time to go back to that.

First a little bit about the VRAM layout, the 80×30 text mode needs 2400 bytes for character, and another 2400 for attribute. Those 4800 bytes will be mapped to between 0x8000 and 0x92c0 in Z80’s memory, first 2400 for char and second half for attribute.

Here is updated verilog file with attribute fetching:

This block is now being clocked by the 50MHz internal clock, and the fetch_attribute alternates between 0 and 1 at each clock cycle. At the clock cycle where fetch_attribute is 0 the FPGA gives the character address to the VRAM, and when fetch_attribute is 1 the attribute address. Then the character is looked up in character ROM, while lower 6 bits of attribute is sent to DAC directly, giving colors. Here is what it looks like:

IMG_1283

However, the fuzziness is back again. I can think of several reasons: the VRAM is now being accessed twice for every pixel, once for character once for attribute. This could mean it might be too slow again. The pixel clock is 25MHz, which means each clock cycle is 40ns, two memory access during this period means each access only has 20ns, and my VRAM is rated 15ns, so it’s very close. The faster speed also generates more noise, especially on the breadboard. Lastly, the monitor’s auto optimization seems a bit off on the signal of my video card. I bought some 12ns IS61C256 chips, it’s only 3 ns faster, but in FAP, every nanosecond counts.

IMG_1246
Every nanosecond counts

I put it on, and it looks a little bit better, but not by much, since there is still the noisy breadboard. I’ll have to wait until I build the thing on the stripboard.

Now that the video card is mostly working by itself, we come to the issue of how to interface it with the CPU. As you know the VRAM holds the characters and attributes that will be rendered on screen, however that requires CPU writing something into it in the first place, but the VRAM is being read by the video card most of the time so the only time that CPU have the chance of writing into it is one of the blanking periods, namely Horizontal Blanking Interval and Vertical Blanking Interval. HBLANK happens every scanline but its duration is extremely short, only around 6us if I remember correctly, our 1970s Z80 won’t have enough time to push a lot of useful things into VRAM during that kind of time, and even if it does it will probably cause screen tearing or visual artifacts, since it change the content of a single pixel line instead of an entire frame. So that leaves us with VBLANK, that lasts around 1.6ms every single frame, which is plenty of time, but that means CPU will have to spend 90% of its time doing nothing but waiting for VBLANK, which is extremely wasteful, that is what Quinn chose to do with her Veronica. One way to get around this is to use a Dual Port RAM which allows write and read at the same time, but that stuff is pretty hard to find. I decided to use my existing parts and implement the tried-and-true method of freeing CPU from waiting for the beam: Double Buffering.

The principle of double buffering is actually pretty simple: two VRAMs, called back and front VRAM, are used. The CPU writes to back VRAM, and during the VBLANK period the content of back VRAM is copied to the front VRAM and subsequently rendered. This way CPU can write to VRAM at anytime it wants, apart from a miniscule amount of time during copying. Screen tearing and artifacts are also eliminated since the entire frame is being modified. The only possible downsides are probably cost and complexity, but I think it would be worth it in the end, when I won’t need to race the beam while writing my programs.

However, at this stage the noise problem on the breadboard is pretty obvious now, and adding another VRAM and 28 wires will only make it worse, so I decided to just build it on the board.

And to make sure I don’t make wiring mistakes I decided to start again and retest everything from the beginning again, firstly just the FPGA and DAC, displaying color bars.

IMG_1261

IMG_1263
Color bars on the FPGA board

Good, that works. Next step is hooking up one VRAM and try rendering the garbage data again.

IMG_1282

IMG_1297.jpg

Notice how all the noises are gone, good stuff, a lot of U’s for some reason.

Coming up next is the main course of this post: double buffering.

The module waits for the vblank interval, and when it arrives it enables the read of back VRAM and write of the front VRAM, then starts a counter that goes to 4800, which doubles as the address for both front and back VRAM. This way the content of back VRAM is copied to the front as counter increments. The copy_in_progress signal is made available so that other modules as well as the CPU can know if a buffer copy is under way. The total copy time is around 192us, this means instead of having to wait 90% of the time to write to the VRAM, the CPU now only have to wait 1.1% of the time instead. A pretty big improvement.

I added another slot on FAP’s backplane and plugged the video card in, it’s practically indistinguishable from a GTX980 to be honest.

IMG_1317.jpg

Until now the video card has been working by itself, copying buffers and rendering characters. However sooner or later CPU will have to be able to control it. There are some ways that CPU can talk to the GPU, certain computers map their VRAM directly to the addressable memory (NES and Gameboy comes to mind), while others chose to not expose the VRAM and instead have memory mapped GPU registers, MOS Technology 8568 and VIC-II belong to this kind. For FAP, I decided to use both. VRAM will be mapped to between 0x8000 and 0x92c0, while virtual registers somewhere after that. This way I can easily test something by writing directly into VRAM, while I can also ask GPU to do heavylifting operations like text scrolling, it’s the best of both words.

For now though, it’s just the write-only VRAM mapping, here is the code:

This simple module checks if the MREQ and WR is low and address is in the correct range, and connects the CPU bus to back VRAM if it does. I’ll implement VRAM read and virtual registers later, I can already print some pretty text with just write for now.

Time to write a Z80 program to try it out, if I start to write a value from the back of the VRAM to front, the screen should first change color, since addribute bytes are at the back of the VRAM, once attribute bytes have been filled, text will start appearing as the value fills up the character portion of the VRAM. Here is the test program:

The program first fills the bottom half of the VRAM with 0x1b, binary 011011, the first 01 goes to red DAC, middle 10 goes to green DAC, and last two bit 11 goes to blue DAC, the result should be a violet color. After filling up attribute the program then fills the first 2400 bytes of VRAM with 65, ASCII of letter A. So we should expect the garbage text on screen first turn violet, then fill up with letter A. Here is what happens:

Well it’s mostly what should be happening, however, some characters didn’t turn violet, and there are few missed A’s here and there. This was because CPU was trying to write during the VRAM copy operation and was ignored. My next step would be implementing a few virtual registers so CPU can check if GPU is busy before trying to write to VRAM. But anyhow, progress is progress.

There was actually a extremely frustrating backstory that I didn’t mention, after trying to get the above code work at around 500Hz, I tried to bump up the processor speed to around 500KHz and it simply would not work. Sometimes it enters the end loop early, sometimes it ignores the loop entirely and continues past the end of the program. I thought there was a loose connection or weak short somewhere along the bus but after inspecting and cleaning the board the problem still happens from time to time. I eventually let my STM32 controller print out the address and data content after each clock cycle, here is one of the executing traces:

Screen Shot 2016-03-10 at 5.49.52 AM.png

As you can see address bus is missing bits, 0x8960 turned into 0x8160, and jumping to 0xb turned into jumping to 0x3, that’s why my program was going all over the place. The culprit? Noise. After adding a couple of caps bewteen VCC and GND of the Z80 and memory chip, the problem went away. However I did spent a stupidly long amount of time trying to nail it down. It’s one of the things I should have known better, oh well.

Anyway, FAP’s video card is halfway working now. I’ll finish up the GPU virtual register in the next post, and have my FAP finally say “hello world” like all the great computer did when they were born.

Previous Post: VGA Character Generator

Next Post: FAP says Hello World

Putting the F in FAP: VGA Character Generator

Previous Post: Programming FAP

Next Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

As I mentioned in my first post, Steve Ciarcia, in his 1981 book Build Your Own Z80 Computer, called his computer ZAP as it stands for Z80 Application Processor. Since I was planning to use FPGA as a part of my own Z80 computer, it’s only natural to name mine FPGA Assisted Processor, or FAP in short. And now after getting CPU to work, adding memory, and programming the processor, the time has finally come for me to put the F in FAP and starting designing the video interface of my computer.

Looking back at the history of personal computers, it’s not hard to see that the graphical capability is one of the most important aspects of determining the popularity of a said computer, everything was hooked up to a monitor or TV, and no one wanted to have a computer that only communicates through a row of 16 LED lights. And it’s the same case with FAP. Right now it has no means of input/output whatsoever, the only way to see if a program is executing correctly is halting the CPU and examining the RAM content. My plan is to adding keyboard as input, and video out as output. As for the video, there are a number of standards to choose from. There’s RF, and composite video, then the slightly more modern VGA, and after that comes DVI, DisplayPort, HDMI and all the HD standards. I picked VGA because it’s a rather straightforward interface, and it is still being supported by a lot of monitors. I need a horizontal sync pulse, a vertical sync pulse, and 3 analog color signals between 0 and 0.7V. A pixel clock pushes out pixels at each line from left to right, at the end of the line is the HSYNC to tell monitor to move down to the next line, and VSYNC is generated when the last line has reached, signalling monitor to start a new frame. Standard resolution of VGA is 640×480, however, there are additional pixels and lines called front and back porch, those are not rendered on screen and were used to allow time for electron beam to move to the next line/frame. Those adds another 100 or so pixels on each line, and another 40 new lines. The porch also gives time to update video buffer.

When it comes to actually designing the video interface, you have to keep in mind that Z80’s resources is actually rather limited. If we are to drive the VGA at 640×480, we need to put what each pixel is displaying somewhere in a memory, and that is called a Video RAM. For 640×480 we’ll need 307200 bits just to store a 1-bit black and white image, and if we want colors, it will use more than 300K of memory if I use one byte for each pixel, and Z80 can only natively address 64K of memory. What’s more, the processor will have to update more than 300K dots at every single frame, which is way too slow for the 1970s CPU. Since most of what was to be displayed was text anyway, a lot of person computers at the time implemented what called the Text Mode. Instead of individual pixels, the screen is divided in to a number of text cells, each one containing a character, and with character ROM and some special circuits, a text screen can be rendered much faster, and the content of a screen can be manipulated easier too. And most importantly, this method saves a lot of memory, a 80×30 text mode screen in 640×480 only needs 2400 bytes, compared to 300K in bitmap mode. If I want color/underline/blinking/etc I could use an additional byte for those functions, this is often called the attribute. Therefore two bytes are often used in VRAM in text mode, one byte to specify what character it is, and another to specify the color and a number of other attributes of the said character.

I’m using FPGA for the VGA video controller, since it’s much faster than bigbanging VGA with a microcontroller. A class I took a few years ago used a FPGA board for a few projects, but I have forgotten most it by now, that class was also in VHDL, while a lot of the resources online are in Verilog, so this is basically starting anew for me. I used Mojo V3 with Spartan 6 chip. I like it because it’s open source, has a lot of pins, cheap, and it uses a ATmega32u4 microcontroller to configure the FPGA so I don’t need to spend more to buy a programmer, most importantly though it doesn’t have random peripherals that I don’t need taking up pins, it’s just a simple, clean, minimalist board with all the pins broken out and nothing else, just how I like it.

mojov3front_2
The Mojo V3 FPGA Board, image source embeddedmicro.com

Here is FAP’s video spec that I planned: 640×480 resolution, 80×30 text mode with 8×16 monospace font, 64 colors, 2.4K character RAM and 2.4K attribute RAM. It’s rather basic, but I can always add more fancy stuff later.

I wanted to start with something simple like displaying some patterns, and then go from there. Fortunately for me VGA signal generation is one of the most common tasks for FPGA, and there are tons of resources online. I used some code of this rather excellent example, it simply displays some color bars on screen. I tweaked the code so it uses 2-bit color instead of 3, giving the video card 4*4*4 = 64 colors. The color outputs are digital, with 6 bits in total, which means I need to build a simple DAC to convert it to analog signal bewteen 0 and 0.7V, a R-2R DAC is enough, I used 1K-2K for the job.

IMG_1157
R-2R DAC for color output

With everything hooked up, I uploaded the code and hooked it up to an old LCD monitor that I picked up for $7 just for this project.

IMG_1154

Voilà, Color bars! However, the colors in question looks extremely dark, despite the monitor on full brightness. I tried another monitor and it was still the same, after measuring the output of DAC it turns out it’s only putting out 0.09V instead of 0.7V at full intensity, it looks like 1K-2K DAC can’t provide enough current to drive VGA inputs, which I should have known. I tried to use an op-amp buffer, but it was too slow for the 25MHz pixel clock. In the end I just used smaller value resistors for the DAC, I used 150 and 330 ohm, it’s not exactly double but it’ll have to do, and the peak output with VGA connected is 0.68V, pretty close to 0.7V in the specification. With the new DAC, the image is back to full brightness.

IMG_1155
Proper brightness with the new DAC

Eagle-eyed viewers might have spotted the color bars have changed colors, I did it to see the DAC performance, in 00, 01, 10, 11 increments, first only red, then all channels. Well it looks exactly like how it should be like, reds are red, and greys are grey without color casts.

Then it’s time to tackle the details of the text rendering. First I need two counters to know which pixel I’m at, one for each axis, hpos[9:0] is the horizontal counter that goes from 0 to 639, and vpos[9:0] is the vertical counter that goes from 0 to 479. As the controller draws each pixel I also need to know which pixel in the 8*16 font I am at, this is simply the lower 3/4 bit of the hpos/vpos counter. While the upper 7/6 bits of hpos/vpos is the index of the current character being rendered in the 80×30 grid. The controller will fetch a byte from character RAM, which contains the character to render, as well as an attribute byte to see what color that character is. For start I’ll ignore attribute for now, and the memory location of each character would be 80 * vchar + hchar, where vchar and hchar is the coordinate of the letter. And with the memory address, the video controller can give it to character RAM and get a byte of data back to render. I changed the character rendering code to see if it works.

As you can see it calculates a memory address vram_addr based on the pixel counter, and feeds the lower 6 bits of data to the VGA output, I hooked up a leftover SRAM, and here’s what greeted me when I ran the code:

IMG_1158

Beautiful isn’t it? You can see each character cell, where letters will go in, only right now it’s just random colors from junk data inside the SRAM.  Say I want to display “A” in a character cell, the content of memory of that cell will be 65, the ASCII code of A. Right now the VGA controller gets the data of 65 and just put its lower 6 bit out as color, next step is making a character ROM, so the VGA controller will look up 65 in it, and see which pixel it should render to put the letter A on screen instead of just a block of color.

And what do you know? Open source community to the rescue again! I found the exactly thing I need on github, a 8*16 code page 437 font. After dropping it in and giving it column and row data, I got this:

IMG_1171

We actually got text! Instead of color blobs the VGA controller is rendering characters from garbage data in the SRAM. The characters do look somewhat fuzzy though, and some characters are not rendered properly. To compare with a known example, I filled up a EEPROM with 0 to 255 and dropped it in, it should look like this:

codepage-437
image source wikipedia

And here is what I got:IMG_1177.jpg

It looks even worse than before! If you look really closely you can see it follows the pattern of the code page 437, but every single character is mangled, sometimes beyond recognition. I first thought it was a breadboard noise problem, as I’m running 25MHz on it, and Quinn had this issue while designing hers. However looking closer I can tell each character are mangled in the exactly the same way, and they don’t change when I wiggle some wires, noise tends to be random, so it is not the case. I then thought something might be wrong with the character ROM code, but that seemed unlikely. I then thought it was my crappy monitor, but it looks the same on a better one I tried. Drunk and out of ideas, I though I hooked my wires wrong and basically started messing around with the SRAM data lines, while doing so I found something interesting, this is what the screen looks like while hooking up the 2nd data line directly to the address line:

IMG_1199

Happy faces! And most importantly it looks perfect! So it’s not noise nor character ROM problem. I then hooked up the same wire to the data output of the EEPROM:

IMG_1200

It looks like the faces are shifted to the left, looks like a timing problem to me. Time to break out the logic analyzer.

Screen Shot 2016-03-05 at 8.14.00 AM

Channel 4 is pixel clock, channel 0 is address, channel 2 is data, here you can see data lags behind address for a whopping 140ns, that is more than 4 pixel clocks, and that’s why those faces looks shifted 4 pixels, it turned out my SRAM and EEPROM is too slow, who would have thought that? The AT28C256-15PU I have is rated 150ns, so it’s pretty close, the SRAM I have is rated 50ns, which is still not fast enough, that’s why the image with SRAM is still fuzzy, but not as bad as the one with EEPROM.

Time to buy some new parts, IS61c256 looks promising, it’s the same 32K SRAM but with only 10 – 15ns of delay, which is 14 times faster than the one I’m using. It’s in SOJ package though, so I ought a fee breakout board too, I also picked up a newer Z80 CPU to replace the current one I’m using, this one is CMOS so it uses less power and can run up to 8MHz.

IMG_1226.jpg
New parts

Soldered the new SRAM, hope I didn’t burn it out.

IMG_1229.jpg

And after plugging it in, here’s how it looks like:

IMG_1232.jpg

The speed does make a difference! All the character are sharply rendered now, it’s still displaying garbage because that’s what in the SRAM and the time, but the system is working.

Now that the text rendering is working, next part is the interface between the processor and the VGA controller. I’m so drunk and tired now so I’m just going to end it here, stay tuned for the next post!

 

Previous Post: Programming FAP

Next Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

Programming FAP

Previous Post: Memories of FAP

Next Post: VGA Character Generator

 

Project Github Repo

 

It has been a few days since I last wrote about FAP, it’s not that I have given up on it, although I did cool off a bit and spent more time on other stuff. I’ve been developing a video card for FAP, but it’s not finished yet so I’ll leave it to another post. This time, it’s all about writing programs for FAP to execute.

With 16KB of RAM and 16KB of ROM, FAP is ready to function like any other proper computers! Although without any means of input or output, it’s not entirely useful at this stage, but I’m not going to let that stop me from at least making it execute a simple program to see if it actually works.

However, before doing that, some more housekeeping to do. I need a streamlined way to write an assembly program, have it assembled and uploaded to FAP and have it executed. I began by adding a “programming mode”, which is entered by default upon powering up. In this mode the STM32 controller kicks the CPU off the bus, takes control of the memory and listens on the serial input, I can then issue read or write commands via bluetooth serial and have it program the EEPROM. When I actually want FAP to start executing the program, I reset the STM32 controller while holding down the second button, and it will enter execute mode and start running the program, easy. While I was at it, I also added another button as well as labeling them. From left to right are reset, single clock, single instruction, and full speed start/stop.

IMG_1210.jpg
The new program mode, and updated buttons.

 

Here is a code snippet of the program mode function, in the loop it checks if there is a command in the UART receive buffer, and execute it if there is. The command could be read, write, or zeroing the EEPROM.

With that done, time to focus on the PC side. I need a Z80 cross assembler, debugger, and some way to upload my program. The assembler I picked is zmac, a long lived and feature rich cross assembler, and open source too. It comes with a windows executable but you can compile it from source yourself. The debugger is z80sim, I downloaded it but haven’t got a chance to use it because all my programs are really simple right now. Anyway, that only leaves the matter of uploading the program. I wrote a simple and rather crappy python script for it:

It basically go through each byte in a binary file and send write commands for each, while making sure it has been written properly. It’s pretty ugly but it works.

Now we have everything set up, time to write a proper program, and what a program it is!

The program starts at address 0x0, and just jump to itself endlessly. The assembled program is 3 bytes: C3 00 00, C3 is JMP instruction, and 00 is the address its supposed to jump to. If everything goes well, we should see the address bus go in a loop forever.

 

Here you can see me uploading the program wirelessly to FAP via bluetooth, reset the STM32 controller while holding the second button to enter execute mode, and then a few single clock steps. You can see the address went from 0x0 to 0x1, then 0x2, and then back to 0x0 again, also notice the C3 and 00 on the data bus as Z80 fetches instructions.  After that I let it run at full speed and you can see the address looping. Success!

Now that FAP is going, I wrote a slightly more complicated program for it to test writing to SRAM:

After clearing A and setting up stack pointer, it adds 0x10 to A, then push the content of AF to stack every loop. If everything goes well I can let it run for a while, reset the STM32 controller to go into program mode, and read back values in SRAM, and it should be 0, 10, 20, 30 etc. This program is also a whooping 10 bytes, so it must be at least 3 times better than the last one, right?

I upload the program, let it run for a while, and read back the beginning of the stack. You can see the address bus looping, and the SRAM address flashing by in each loop. Now let’s examine the content:

0x7fff: 0x0
0x7ffe: 0x10
0x7ffd: 0x0
0x7ffc: 0x20
0x7ffb: 0x20
0x7ffa: 0x30
0x7ff9: 0x20
0x7ff8: 0x40
0x7ff7: 0x0
0x7ff6: 0x50
0x7ff5: 0x0
0x7ff4: 0x60
0x7ff3: 0x20
0x7ff2: 0x70
0x7ff1: 0x20

At first glance it might look like I cocked it up, since it’s not strictly increasing as the code suggests. However, push instruction pushes both A and F to stack, F first then A, so the value of A is at even addresses, i.e. 0x7ffe, 0x7ffc, 0x7ffa etc.  And if you look at the numbers you can see they are indeed increasing. The odd addresses are contents of flag register, it should be 0 for the first few loops, but some of them are 0x20. 0x20 is flag bit 5 set, which is actually a undocumented flag that is “a copy of bit 5”, which is exactly what happened here, so after all, everything is alright.

And there you have it! FAP, built from a 8 ancient chips and spaghetti of wires, actually  running proper programs that I wrote for it. It’s a big milestone in this build, and as you expect I’m thrilled by it. However, I’m going to put the F in FAP as I develop a VGA graphics card for the Z80 computer in the next post, stay tuned!

Previous Post: Memories of FAP

Next Post: VGA Character Generator