dekuNukem's Log

We have finally come to the big one, the new I/O card. This is the one I designed completely from scratch, without any influence from the half-assed old I/O card, so I’m going to go into detail in this article.

So far, we have the backplane, CPU, memory, and video card. I can program FAP to display some messages on the screen but there is no way to do anything else because no input methods are available, and as a result FAP can’t communicate with the outside world. It needs some way of input/output and that’s why I’m designing the I/O card.

A quick recap of the previous article about FAP I/O: the Z80 uses the in/out instructions for port read/write. 256 ports are available, and the port address is the lower 8 bits of the address bus when in/out is executed. A port is read when both the IORQ and RD signals go low, and written when both IORQ and WR go low. I’m also using interrupts instead of polling. The Z80 has 3 interrupt modes: mode 0 is an 8080-compatible mode that no one uses, mode 1 just jumps to 0x38, and mode 2 makes your head explode the first time you learn about it but is the most powerful, and is what my I/O board is going to use.
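
As a rough illustration of the read/write conditions above, here is a small C sketch. It is not part of FAP’s firmware; the names `z80_ctrl_t`, `classify_cycle` and `port_from_address` are made up for this example. It shows why OR-ing two active-low lines gives a strobe that is low only when both are low.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Z80 control lines are active-low: false (0) means asserted. */
typedef struct {
    bool iorq_n;
    bool rd_n;
    bool wr_n;
} z80_ctrl_t;

typedef enum { CYCLE_OTHER, CYCLE_IO_READ, CYCLE_IO_WRITE } z80_cycle_t;

/* Hypothetical helper: classify the current bus cycle from the control lines. */
z80_cycle_t classify_cycle(z80_ctrl_t c)
{
    /* OR of two active-low lines is low only when both lines are low. */
    bool iord_n = c.iorq_n || c.rd_n;
    bool iowr_n = c.iorq_n || c.wr_n;

    if (!iord_n) return CYCLE_IO_READ;
    if (!iowr_n) return CYCLE_IO_WRITE;
    return CYCLE_OTHER;
}

/* The port number is the low byte of the 16-bit address bus. */
uint8_t port_from_address(uint16_t addr)
{
    return (uint8_t)(addr & 0xff);
}

/* Usage example. */
void classify_cycle_example(void)
{
    z80_ctrl_t port_read = { .iorq_n = false, .rd_n = false, .wr_n = true };
    z80_ctrl_t mem_read  = { .iorq_n = true,  .rd_n = false, .wr_n = true };

    assert(classify_cycle(port_read) == CYCLE_IO_READ);
    assert(classify_cycle(mem_read) == CYCLE_OTHER);
    assert(port_from_address(0x1234) == 0x34);
}
```

In hardware this is just one OR gate per strobe, which is the kind of glue logic the rest of this article moves into the CPLD.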

The last time I worked on the I/O board I used an Arduino just for reading the PS/2 keyboard, an STM32F103 as the main I/O interrupt controller, and a bunch of 74 series chips for the interface between the STM32 and the Z80 bus. The slapped-together result had 2 I/O ports and 2 interrupt vectors, but even with just that it accumulated nine 74 series chips. Just look at this mess:

Notice the piggybacking everywhere and the general messiness. If I were to continue designing the new I/O board this way, it would end up a horrendous board to route, with no flexibility at all, high power consumption, possible noise problems, and a single bug would mean making an entire new board.

So instead I’m going to do it properly and go the modern route: using a CPLD. CPLD stands for Complex Programmable Logic Device. Like an FPGA it has logic that can be configured via HDL, but unlike an FPGA it has built-in non-volatile configuration memory, so it works right away when powered up. A CPLD is also cheaper and less complex than an FPGA, often with only hundreds of gate-equivalents instead of the millions in an FPGA, and as a result CPLDs are often used as glue logic instead of as an active device. Personally I feel that FPGAs are mainly for high speed, high bandwidth active applications, and a CPLD is basically a replacement for 74 series chips.

And replace 74 series chips it will. The CPLD I picked is the Altera EPM570. It’s a slightly older part, but it has plenty of pins and is cheap. There is an even cheaper version, the EPM240, which is exactly the same but with fewer logic cells. Because my EPM570 has 144 pins, the plan is to just connect everything on the bus to the CPLD and figure it out in HDL. As for the I/O controller, I’m using an STM32F051C8T6 this time.

I went all out on the STM32 side, adding as many peripherals as possible. In the end I have an SD card, I2C EEPROM, PS/2 keyboard, RTC, ESP8266, and a general purpose UART header all hooked up to the STM32. It might be a bit overkill but I guess it’s better to have them just in case than not having them at all.

The STM32 will talk to the CPLD via a mini-bus with 4 bits of address, 8 bits of data, and a handful of control signals, while the CPLD contains all the glue logic to interact with the Z80 bus. Below is the diagram of the I/O board structure.

To program the CPLD I bought a cheap Chinese knockoff programmer which seems to work well enough. Also needed is Altera Quartus Prime Lite, which is free. I’m going to use schematic capture for this instead of connecting everything in Verilog because I feel that being able to see and edit a schematic is more intuitive in this case. But first I need to write a couple of components that I’m going to use inside the CPLD, chief among which is the 74HC573 8-bit transparent latch, a simple matter of 30 lines of code:

	module dlatch8(
	input wire [7:0] data,
	input wire LE_H,
	input wire OE_L,
	output reg [7:0] q
	);

	reg [7:0] q_internal;

	// activates when LE_H, OE_L or data changes
	// (data must be listed so the latch is also transparent in simulation)
	always @ (LE_H or OE_L or data)

	// output active, load inactive
	if (OE_L == 0 && LE_H == 0) begin
	// hold the stored value and drive it onto the output
	q <= q_internal;
	end

	// output active, load active
	else if (OE_L == 0 && LE_H == 1) begin
	// save the data to internal latch, and mirror it on the output
	q_internal <= data;
	q <= data;
	end

	// output and load both inactive
	else if (OE_L == 1 && LE_H == 0) begin
	// do nothing, output high impedance
	q <= 8'bzzzzzzzz;
	end

	// output inactive, load active
	else begin
	// load internal latch while output high impedance
	q <= 8'bzzzzzzzz;
	q_internal <= data;
	end

	endmodule

Another one we’re going to use is a 4-to-16 line decoder:

	module decoder_4to16 (
	input enable,
	input [3:0] binary_in,
	output reg [15:0] decoder_out
	);

	always @ (enable or binary_in)
	begin
	if (enable) begin
	case (binary_in)
	4'h0 : decoder_out = 16'h0001;
	4'h1 : decoder_out = 16'h0002;
	4'h2 : decoder_out = 16'h0004;
	4'h3 : decoder_out = 16'h0008;
	4'h4 : decoder_out = 16'h0010;
	4'h5 : decoder_out = 16'h0020;
	4'h6 : decoder_out = 16'h0040;
	4'h7 : decoder_out = 16'h0080;
	4'h8 : decoder_out = 16'h0100;
	4'h9 : decoder_out = 16'h0200;
	4'hA : decoder_out = 16'h0400;
	4'hB : decoder_out = 16'h0800;
	4'hC : decoder_out = 16'h1000;
	4'hD : decoder_out = 16'h2000;
	4'hE : decoder_out = 16'h4000;
	4'hF : decoder_out = 16'h8000;
	default : decoder_out = 0;
	endcase
	end
	else begin
	decoder_out = 0;
	end
	end

	endmodule

And with that, we can start laying out our glue logic inside the CPLD. Instead of laying down physical 74 chips, we can just do it in software and watch the magic happen. This I/O board needs to handle interrupts, port write, and port read. So let’s start with the first one:

While it might look complicated at first, this is actually not that bad. The centrepiece is just an 8-bit latch. When the STM32 controller wants to start a Z80 interrupt, it first puts the interrupt vector on the STM32-CPLD data bus, then activates the INTVECT_LOAD signal, which loads the vector into the latch. The STM32 then pulls down the INT line on the Z80 to start the interrupt. The Z80 acknowledges the interrupt once the current instruction finishes, with a cycle where both IORQ and M1 go low. They are OR’ed together to give INTACK, which activates the output enable of the latch, putting the stored interrupt vector onto the CPU bus. The Z80 then combines it with the I register to find the handler’s entry in the interrupt vector table and jumps to the handler. INTACK also generates an interrupt on the STM32, upon which it deactivates the INT line. Here is the STM32’s code snippet:

	void interrupt_activate(uint8_t vector)
	{
	// put the interrupt vector on STM32-CPLD bus
	CPLD_DATA_PORT->ODR &= 0xff00;
	CPLD_DATA_PORT->ODR |= vector;
	// load the interrupt vector latch inside CPLD
	data_output();
	vect_load_activate();
	vect_load_deactivate();
	data_input();
	// start the Z80 interrupt
	HAL_GPIO_WritePin(Z80_INT_GPIO_Port, Z80_INT_Pin, GPIO_PIN_RESET);
	}

	// ...meanwhile in STM32 pin change ISR
	void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
	{
	// INTACK interrupt
	if(GPIO_Pin == INTACK_Pin)
	HAL_GPIO_WritePin(Z80_INT_GPIO_Port, Z80_INT_Pin, GPIO_PIN_SET); // turn off Z80 interrupt
	}

With that out of the way, next up is port write. It uses 2 transparent latches and is actually very similar to the interrupt logic above, only the data is loaded from the Z80 side. When the Z80 wants to write to a port, both IORQ and WR go low, and they are OR’ed to give the IOWR signal. When IOWR is low, there is a valid port address and data on the CPU bus, so I made IOWR simply load the address and data into the corresponding latches. IOWR also fires an interrupt on the STM32, which then activates the latch1 signal, reads the data and address off the latches, and processes them. Below is the diagram and code snippet.

	// STM32 pin change interrupt handler
	void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
	{
	// IOWR interrupt
	if(GPIO_Pin == IOWR_Pin)
	{
	// switch to input
	data_input();
	addr_input();
	// enable address and data latch outputs
	latch1_activate();
	// process port address and data
	vport_raw = CPLD_DATA_PORT->IDR;
	vport_addr = (uint8_t)((vport_raw & 0xf00) >> 8);
	vport_data = (uint8_t)(vport_raw & 0xff);
	latch1_deactivate();
	}
	}

Next up is port read. This is the most complex of the three because of the timing constraints. Let’s look back at port write first: when the CPU does a port write, the port address and data are loaded into two transparent latches, while the STM32 is notified at the same time. So even if the STM32 is bogged down for some reason, the port address and data will still be available in the latches, and nothing is lost.

There is no such luxury in the case of port read. When the Z80 wants to read a port, both IORQ and RD go low, and they are OR’ed together to give IORD. Port data must be valid the moment IORD goes active, otherwise the CPU gets garbage data. That means I can’t make the STM32 respond to IORD as an interrupt, since it would already be too late. So as you have probably guessed, transparent latches to the rescue again.

This is much more complicated than the first two so bear with me: There are 16 transparent latches, corresponding to 16 available Z80 ports. When Z80 wants to read one of these ports, a valid port address will be present on the CPU bus, then the IORD signal goes active. When that happens a 4-to-16 decoder is activated and selects the corresponding latch out of the 16 based on the address on the CPU bus, this select signal along with IORD itself enables the output of that latch, putting its content onto the data bus instantly.
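
To make the read path a bit more concrete, here is a behavioural C model of the latch bank. It is only a sketch, not the CPLD design itself: `port_bank_t`, `bank_load`, `bank_read` and `BUS_FLOATING` are names invented for this example, and using only the low 4 bits of the port address to pick a latch is an assumption on my part. A small load helper is included so the example can be run on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 16
#define BUS_FLOATING (-1)  /* hypothetical marker: no latch is driving the data bus */

typedef struct {
    uint8_t latch[NUM_PORTS];  /* one 8-bit transparent latch per port */
} port_bank_t;

/* 4-to-16 decoder: one-hot select when enabled, all zeros otherwise. */
uint16_t decode_4to16(bool enable, uint8_t sel)
{
    return enable ? (uint16_t)(1u << (sel & 0x0f)) : 0;
}

/* Controller side: store a byte so the Z80 can read it later. */
void bank_load(port_bank_t *bank, uint8_t port, uint8_t data)
{
    assert(port < NUM_PORTS);
    bank->latch[port] = data;
}

/* Z80 side: what appears on the data bus for a given address and IORD level.
 * iord_n is active-low, so the decoder is enabled when it is false. */
int bank_read(const port_bank_t *bank, uint8_t port_addr, bool iord_n)
{
    uint16_t select = decode_4to16(!iord_n, port_addr);

    for (int i = 0; i < NUM_PORTS; i++) {
        if (select & (1u << i)) {
            return bank->latch[i];  /* this latch's output is enabled */
        }
    }
    return BUS_FLOATING;
}

/* Usage example. */
void bank_example(void)
{
    port_bank_t bank = { {0} };

    bank_load(&bank, 3, 0x41);
    assert(decode_4to16(true, 3) == 0x0008);
    assert(bank_read(&bank, 3, false) == 0x41);         /* IORD active */
    assert(bank_read(&bank, 3, true) == BUS_FLOATING);  /* IORD inactive */
}
```

The point of the model is that `bank_read` involves no waiting: the byte was stored ahead of time, so the data is on the bus as soon as IORD goes active.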

Of course we need to be able to write data into those latches first for the CPU to read. When the STM32 wants to load data into a latch, it puts the address and data onto the STM32-CPLD bus, then activates the LATCH16 signal. This enables another 4-to-16 decoder inside the CPLD that selects the corresponding latch. The select signal together with the LATCH16 signal loads the data into said latch, which will then be available for the CPU to read. The STM32 code is actually pretty simple:

	// loads a port for CPU to read
	void load16(uint8_t address, uint8_t data)
	{
	// put the address and data onto STM32-CPLD bus
	uint16_t value = 0;
	value = (address & 0xf) << 8;
	value = value | data;
	CPLD_DATA_PORT->ODR &= 0xf000;
	CPLD_DATA_PORT->ODR |= value;
	data_output();
	addr_output();
	// load the latch
	latch16_activate();
	latch16_deactivate();
	data_input();
	addr_input();
	}

With all those out of the way, here is the finished board:

Again, I made some mistakes in the rush to get the board made, the footprint of PS/2 connector is flipped, the JTAG connector pinout is all wrong, and somehow I missed the TDO line. Those are all fixed in the repo. And with just one jumper wire, the board functions perfectly.

Previous article: CPU board, memory board, video card.

Github repo

FAP reborn: CPU board, memory board, video card.

Previous post: FAP reborn – Backplane

Next post: the new I/O card

Github repo

With the backplane finished, the next step is to design a number of modules that plug into it. It will be the same as the old FAP, consisting of a CPU board, memory board, video card, and I/O board. In this post I’ll get the first three out of the way, since they have not changed much.

And because they are still largely the same, albeit laid out nicely on a PCB, I’m not going into details about how they work in this article, check out the older entries for those details.

FAP reborn – Backplane

Previous Post: FAP with a Keyboard

Next post: CPU board, memory board, video card

Github repo

Can’t believe it has been more than 9 months since my last update. Work picked up, and although I was still working on FAP on and off, I just didn’t bother to update the blog until I made some progress.

Last time I was here I had just finished the I/O board. The mode 2 interrupt and port read were working, the keyboard was working, and so was the serial input. So I started coding some simple print routines in assembly, stuff like putc and puts. It quickly turned out that my programs would only work intermittently, sometimes going into NMI for no reason, sometimes resetting by themselves, sometimes jumping to the wrong address. And the faster the clock speed, the more often those happened. It was getting extremely frustrating, since when it went haywire I wouldn’t know if it was a bug in my program or just FAP being unhappy.

As you have probably guessed, it was the noise problem from having no ground planes, hundreds of wires bunched together cross-talking like crazy, and maybe even one or two cold solder joints. I thought I could get away with it but I should have known better. Just look at this spaghetti shitshow running at 8MHz, it’s amazing it worked at all.

Clearly, in order for FAP to work, I had to ditch hand-assembly and go full PCB. This way, it will look much more professional, and I don’t have to cut, strip and solder miles of red wires. That’s exactly what I did. And now, after 9 months of hiatus, FAP rises again, stronger than ever.

First things first, I need to design a brand new backplane. The original one is basically a stripboard with 4 dozen wires soldered from an STM32 dev board. Since I’m doing it again, I’m going to do the whole thing properly. The uC used on the old FAP was an STM32F103VCT6, and there are 3 problems with it: it’s a 100-pin part, which is too much for this application; it’s a relatively old member of the STM32 family, missing a lot of nice features; and there is a 32KB code size limit in uVision5. So back to the parts bin it goes, and instead I’ll be using an STM32F072R8T6. It’s an F0 so there is no code size limit in uVision, it has a lot of new features that the F1 doesn’t (32-bit timer, built-in USB pull-up, swappable RX/TX, just to name a few), it’s a 64-pin part so it’s easier to solder, and in the end it’s cheaper too.

Now comes the problem with the size of the backplane. The board size limit of the free version of EAGLE is only 8cm x 10cm, so that won’t do. Luckily EAGLE offers an educational version, free as well if you have a .edu email, that has a 10cm x 16cm limit. That’s what I ordered, and as a result 10cm x 16cm will be the dimension of the FAP backplane.

The next part to reconsider is the bus connector. In the old FAP it was a double row female pin header with 2 rows of 40 pins, so in theory it should have had 80 signals. But because it was on a stripboard the two rows were connected, and only 40 signals were actually available. And because I had to cram most of the CPU signals on there, a few control signals had to be omitted and only one pin was used for GND, resulting in non-existent noise immunity. This time I decided to still use a double row female pin header, but 38 pins wide because of the board size. Because it’s on a PCB I have all 76 signals, so every single Z80 signal is on the bus this time. Here is the pinout of FAP’s new bus connector.

As you can see, most of the first row is GND to reduce noise, and the clock signal is surrounded by 3V3 for that reason too. The signals are arranged in the order of control outputs, control inputs, data, and address.

With those problems out of the way, here is the finished design of FAP’s new backplane:

5 bus connectors spaced 2cm apart, the microcontroller on the right side of the board, a USB connector that provides both power and communication, and a 3.3V regulator and 6 buttons round out the design. I’m also putting the LCD above the uC instead of letting it dangle off the side of the board.

Here is the assembled board, much neater than the old one, and hopefully a lot less noisy too.

The LCD is secured via 4 mounting holes, and the microcontroller hides below it.

The backplane firmware needed an update too. On the old backplane communication was via a serial port, this time we’re using USB, which is much faster. But overall not much has changed. You can find the up-to-date resources on the Github repo of this project.

FAP with a Keyboard

Previous Post: FAP says Hello World

Next post: FAP reborn – Backplane

Github repo

Now that FAP’s video card is finished, it’s time to move on. Although we’ve come so far, FAP is still missing something crucial in order to be called a proper computer, it still needs at least one input device. That will be the keyboard. More specifically, a PS/2 keyboard.

Using PS/2 keyboard for retro computers is nothing new thanks to its simple protocol and wide availability. It uses a simple synchronous serial interface, one clock line and one data line, and the documentation is all over the internet so I’m not going to bother to explain here again. Actually I didn’t bother to read much about it at all because of a reason you’ll see soon enough.

Before we start working on the keyboard though, we first need to figure out some way for the keyboard to talk to the CPU. Generally there are two means of communicating with peripherals, memory-mapped I/O or port I/O. In memory-mapped I/O the peripherals’ registers are mapped into the memory address space, and the CPU writes or reads those memory addresses to control the peripheral. I used this method for FAP’s video card. For the keyboard I decided to use port I/O, since it’s usually used with external peripherals, and I feel it’s a good exercise to try a little bit of everything with the Z80, which is why I’m doing this in the first place. The Z80 supports 256 ports. When the program executes an in or out instruction, the CPU places the 8-bit port address onto the lower half of the address bus, and activates IORQ and one of the RD or WR lines to read or write a byte from that port.

With the I/O method sorted, the next issue is how to let the CPU know about keyboard events. One way is polling, in which the CPU constantly asks the keyboard if it has something. This approach is simple but extremely wasteful, since the CPU will spend most of its time reading from the keyboard instead of doing actual work. A much better way is using interrupts. As the name suggests, when there is actually data from the keyboard, the CPU will be interrupted from its work, and it can then store the keystroke into a buffer and do something with it later. This way the CPU does not waste any time polling the keyboard, and only acts when there is an actual key press, and that is what I’m going to use today.
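
The "store the keystroke into a buffer and do something with it later" idea is usually done with a small ring buffer. Here is a minimal sketch in C, just to show the pattern; on FAP the real handler would be Z80 assembly, and the names `key_buf_t`, `key_buf_push` and `key_buf_pop` and the buffer size are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KEY_BUF_SIZE 16  /* assumed size; must be a power of two */

typedef struct {
    volatile uint8_t head;  /* next slot to write, advanced by the interrupt handler */
    volatile uint8_t tail;  /* next slot to read, advanced by the main program */
    uint8_t data[KEY_BUF_SIZE];
} key_buf_t;

/* Interrupt side: store one keystroke. Returns false and drops the key if full. */
bool key_buf_push(key_buf_t *b, uint8_t key)
{
    uint8_t next = (uint8_t)((b->head + 1) & (KEY_BUF_SIZE - 1));
    if (next == b->tail) {
        return false;  /* one slot is kept empty to tell full from empty */
    }
    b->data[b->head] = key;
    b->head = next;
    return true;
}

/* Main program side: fetch the oldest keystroke, if there is one. */
bool key_buf_pop(key_buf_t *b, uint8_t *key)
{
    if (b->head == b->tail) {
        return false;  /* buffer is empty */
    }
    *key = b->data[b->tail];
    b->tail = (uint8_t)((b->tail + 1) & (KEY_BUF_SIZE - 1));
    return true;
}

/* Usage example. */
void key_buf_example(void)
{
    key_buf_t buf = {0};
    uint8_t key = 0;
    bool ok;

    ok = key_buf_push(&buf, 'h');  assert(ok);
    ok = key_buf_push(&buf, 'i');  assert(ok);
    ok = key_buf_pop(&buf, &key);  assert(ok && key == 'h');
    ok = key_buf_pop(&buf, &key);  assert(ok && key == 'i');
    ok = key_buf_pop(&buf, &key);  assert(!ok);  /* nothing left */
}
```

The handler only ever touches `head` and the main program only ever touches `tail`, which is what lets the two sides share the buffer without any locking.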

The Z80 has a (seen from today) somewhat rudimentary but still rather clever interrupt system. There are two interrupt pins on the package, INT and NMI. The former is for regular interrupts that can be disabled in software, the latter is the non-maskable interrupt and always occurs when the line goes active. For the regular interrupt, 3 interrupt modes are provided. Mode 0 is an 8080-compatible mode where, when interrupted, you have to put some instruction on the data bus for the CPU to execute. It’s basically witchcraft so I’m not going to use it. Mode 1 is much simpler, it just jumps to address 0x38 when interrupted. This makes designing simple systems very easy since you can just put your interrupt handler there, or a jump instruction to somewhere else if you need more space. However, if there is more than one interrupting device in mode 1, the CPU will have to figure out which one raised the interrupt, which gets complicated real fast.

That’s where interrupt mode 2 comes into play. I’m going to list how mode 2 works off the top of my head since I worked with it for so long, it’s going to be a mouthful and I hope I get it right: when the interrupt fires, the interrupting device puts a byte onto the data bus. The CPU then combines the 8 bits in the interrupt vector base register (I) and the 8 bits on the data bus to form a 16-bit address, reads a 16-bit word at that address, which is the address of the interrupt service routine, and jumps to that. For example, if I load the I register with 0x12 and put 0x10 on the bus after interrupting the CPU, the CPU will look at address 0x1210. If I put the word 0x3000 there beforehand, the CPU will read a word from 0x1210, get 0x3000, and jump to 0x3000, which is where the handler is. It’s extremely confusing at first, but once you get the hang of it you’ll realize this vectored interrupt system is much more powerful than the other modes: you can change the location of an ISR on the fly, and it supports a huge number of interrupting devices. It might be just a keyboard for now, but I’m also going to add a WiFi module in the future, which takes 2 ports, and a timer wouldn’t hurt either. Anyway, the thing is that FAP is going to have a couple more peripherals later on, and interrupt mode 2 is the best way for it.
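
The worked example above (I register 0x12, vector byte 0x10, handler at 0x3000) can be checked with a few lines of C. This is only a model of the address arithmetic, with `im2_table_entry` and `im2_handler` as made-up names; the one extra fact it relies on is that the Z80 stores 16-bit words low byte first.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the vector table entry: I register is the high byte,
 * the byte supplied by the interrupting device is the low byte. */
uint16_t im2_table_entry(uint8_t i_reg, uint8_t vector)
{
    return (uint16_t)(((uint16_t)i_reg << 8) | vector);
}

/* Handler address: the 16-bit little-endian word stored at the table entry.
 * mem must point to a full 64KB image of the Z80 address space. */
uint16_t im2_handler(const uint8_t *mem, uint8_t i_reg, uint8_t vector)
{
    uint16_t entry = im2_table_entry(i_reg, vector);
    uint8_t lo = mem[entry];
    uint8_t hi = mem[(uint16_t)(entry + 1)];
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Usage example with the numbers from the text. */
void im2_example(void)
{
    static uint8_t mem[65536];

    mem[0x1210] = 0x00;  /* low byte of 0x3000 */
    mem[0x1211] = 0x30;  /* high byte of 0x3000 */

    assert(im2_table_entry(0x12, 0x10) == 0x1210);
    assert(im2_handler(mem, 0x12, 0x10) == 0x3000);
}
```

Moving an ISR at runtime is then just a matter of writing a different word into the table entry, which is the flexibility mentioned above.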

Now let’s recap the entire interrupt process to see what my keyboard controller needs to do: when the user presses a key, the controller pulls the INT line low, waits until M1’ and IORQ’ both go low (we’ll call it INTACK’, short for interrupt acknowledge), puts the interrupt vector on the data bus, waits until INTACK’ goes high, then stops driving the data bus. Sounds like a lot of work, but I can just use 2 chips for that: a 74HC32 OR gate to generate INTACK’, which is then tied to OE’ of a 74HC245 buffer. The buffer is bidirectional, but in this case I’ll just set it to a single direction. This way, the interrupt vector on the other side of the buffer is gated onto the data bus the moment INTACK’ goes active. The CPU will then look up the handler address in the vector table and jump to the ISR.

Things get slightly more complicated at the ISR though. The CPU needs to read the keyboard port to see what key the user just pressed, and that needs some decoding logic. The keyboard controller also needs to respond to the CPU’s port read just in time, otherwise the CPU will get garbage data. To achieve this I’m going to use an additional 2 chips, a 74HC688 8-bit equality detector and a 74HC573 transparent latch. The keyboard controller will latch the keyboard data, and when the CPU performs a read, the ’688 compares the port address and puts the latched data onto the data bus.

So to sum it all up again, here is how my keyboard controller will work: the controller receives a byte for the key press, puts the data on the latch, enables the latch, pulls down INT’, waits for INTACK’, puts the interrupt vector on the data bus, then pulls up INT’ and turns off the latch when INTACK’ goes away. The CPU will then try to read from port 0. An IORD’ signal is generated when both IORQ’ and RD’ are low, and IORD’ is tied to the OE’ of the latch, which enables it when it’s active, releasing the keyboard data onto the data bus for the CPU to read.
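
The sequence above is essentially a three-state machine, which can be sketched in C like this. It is a simplified model for illustration, not the controller firmware: the type and function names are invented here, and ignoring new keys while an interrupt is still pending is my assumption, not something the design specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { KB_IDLE, KB_WAIT_INTACK, KB_VECTOR_ON_BUS } kb_state_t;

typedef struct {
    kb_state_t state;
    bool int_n;         /* Z80 INT' line, active-low */
    bool latch_enable;  /* keyboard data latch enable, active-high */
    uint8_t latched;    /* byte held for the Z80's port read */
} kb_ctrl_t;

void kb_init(kb_ctrl_t *k)
{
    k->state = KB_IDLE;
    k->int_n = true;
    k->latch_enable = false;
    k->latched = 0;
}

/* A key byte arrived: latch it and request an interrupt. */
bool kb_key_received(kb_ctrl_t *k, uint8_t key)
{
    if (k->state != KB_IDLE) {
        return false;  /* assumption: drop keys while an interrupt is pending */
    }
    k->latched = key;
    k->latch_enable = true;
    k->int_n = false;  /* pull INT' low */
    k->state = KB_WAIT_INTACK;
    return true;
}

/* Call whenever INTACK' changes level (active-low). */
void kb_intack_changed(kb_ctrl_t *k, bool intack_n)
{
    if (k->state == KB_WAIT_INTACK && !intack_n) {
        /* the buffer gated by INTACK' is now driving the vector onto the bus */
        k->state = KB_VECTOR_ON_BUS;
    } else if (k->state == KB_VECTOR_ON_BUS && intack_n) {
        k->int_n = true;          /* release INT' */
        k->latch_enable = false;  /* latch keeps holding the key byte */
        k->state = KB_IDLE;
    }
}

/* Usage example: one complete key press cycle. */
void kb_example(void)
{
    kb_ctrl_t k;
    bool accepted;

    kb_init(&k);
    accepted = kb_key_received(&k, 'A');
    assert(accepted && !k.int_n && k.latch_enable);

    kb_intack_changed(&k, false);  /* Z80 acknowledges */
    assert(k.state == KB_VECTOR_ON_BUS);

    kb_intack_changed(&k, true);   /* acknowledge cycle ends */
    assert(k.state == KB_IDLE && k.int_n && k.latched == 'A');
}
```

Note that the key byte stays in `latched` after the machine returns to idle, which matches the hardware: the latch keeps its contents so the later port 0 read still sees the key.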

For reading the PS/2 signal itself I used an Arduino Pro Micro that I had lying around. While the AVR chip in the Arduino might seem slow and out of date compared to today’s 32-bit microcontrollers, what’s unbeatable is its incredible community. There is probably already an Arduino library written for every single thing you can think of, and for PS/2 it’s really a doddle. 2 minutes of Google searching got me the library, and the rest was just hooking up 2 wires. The Arduino reads the keypress and sends out a byte of ASCII code to the main controller, an STM32F103 dev board, which manages the interrupt activity. Below is the schematic:

As you can see there are already 6 chips right off the bat, and it’s just the start. Looks like the I/O board is going to be the most complicated board in my FAP computer.

Here is the finished board:

I also wrote a short test program for the new I/O board:

	include helper.z80

	org 0x0
	jp program_start

	org 0x10 ; interrupt vector table
	.dw 0x3000 ; keyboard interrupt at 0x3000

	org 0x100
	program_start:
	ld a, 0
	ld i, a ; load interrupt register
	ld sp, 0x7fff ; set up stack
	im 2 ; select interrupt mode 2
	call enable_vram_copy

	ei ; enable interrupt

	ld d, 0
	ld e, 0
	ld b, 0x39 ; set text color to orange

	end: jp end ; do nothing

	org 0x3000
	kb_isr: ; keyboard ISR at 0x3000
	di ; disable interrupt first
	in a, (0) ; read keyboard port
	ld c, a ; print it on screen
	call print
	ei ; enable interrupt again
	reti

Note how my program starts from 0x100 now. It’s customary to start Z80 programs at that location because there are restart (RST) vectors at $0000, $0008, $0010, $0018, $0020, $0028, $0030, $0038. I guess you can put code there if you’re not using the RST instruction, but it’s just good practice to leave them alone. My code sets up the stack, loads the interrupt register with 0x0, selects interrupt mode 2 and enters an idle loop. When a keyboard interrupt comes in, the CPU will read the handler address from 0x10, then jump to 0x3000, where the ISR will read a byte from I/O port 0 and print it on screen.

Here is the corresponding STM32 interrupt controller code snippet:

	int main(void)
	{
	/* Reset of all peripherals, Initializes the Flash interface and the Systick. */
	HAL_Init();
	/* Configure the system clock */
	SystemClock_Config();
	/* Initialize all configured peripherals */
	MX_GPIO_Init();
	MX_USART1_UART_Init();
	MX_USART3_UART_Init();

	HAL_UART_MspInit(&huart1);
	HAL_UART_MspInit(&huart3);

	// turn off keyboard data latch
	HAL_GPIO_WritePin(PORT_CTRL_PORT, KB_LATCH_H, LOW);
	// deselect keyboard interrupt
	HAL_GPIO_WritePin(PORT_CTRL_PORT, KB_INT_SEL_L, HIGH);
	// deselect Z80 global interrupt
	HAL_GPIO_WritePin(PORT_CTRL_PORT, GLOBAL_INT_L, HIGH);
	// start receiving data from keyboard
	HAL_UART_Receive_IT(&huart1, kb_recv_buf, RECV_BUF_SIZE);

	while (1)
	{
	;
	}

	}

	// UART receive complete
	void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
	{
	// if we received a byte from keyboard...
	if(huart->Instance==USART1)
	{
	// latch keyboard data
	CPU_DATA_PORT->ODR = kb_recv_buf[0];
	HAL_GPIO_WritePin(PORT_CTRL_PORT, KB_LATCH_H, HIGH);
	// select keyboard interrupt
	HAL_GPIO_WritePin(PORT_CTRL_PORT, KB_INT_SEL_L, LOW);
	HAL_GPIO_WritePin(PORT_CTRL_PORT, GLOBAL_INT_L, LOW);
	HAL_UART_Receive_IT(huart, kb_recv_buf, RECV_BUF_SIZE);
	}
	}

	// INTACK interrupt handler, rising edge
	void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
	{
	// when CPU's INTACK' goes from active to inactive...
	if(GPIO_Pin == GPIO_PIN_6)
	{
	// clear interrupt
	HAL_GPIO_WritePin(PORT_CTRL_PORT, GLOBAL_INT_L, HIGH);
	HAL_GPIO_WritePin(PORT_CTRL_PORT, KB_INT_SEL_L, HIGH);
	// turn off latch
	HAL_GPIO_WritePin(PORT_CTRL_PORT, KB_LATCH_H, LOW);
	}
	}

The interrupt controller turns off the latch and interrupt on startup, then starts listening to the Arduino that decodes PS/2 commands. Once it receives a keypress byte it puts that byte on the lower 8 bits of PORTA, enables the keyboard data latch, then activates the Z80’s INT. It then waits for INTACK’ to finish, after which it pulls the INT line high again and disables the keyboard data latch.

Does it work? Watch the video:

As it turned out it works surprisingly well. Of course that’s not how it looked the first time round. At first the FAP would print a few characters, then go back and print from the beginning. It was clock speed dependent as well, as it didn’t happen quite as often at slower clock speeds. A bit of debugging later I found that the CPU would randomly jump to 0x66, and since there’s nothing there, it would slide through a bunch of NOPs all the way to 0x100 and start executing the main program from the beginning. As it happens, 0x66 is the address the CPU jumps to when the NMI line goes active. It’s the goddamn noise again, and it also looks like the pull-up from the main STM32 controller is too weak. A few more filtering caps and a dedicated pull-up resistor on NMI later, it’s working like a dream.

Next step? WiFi. More specifically, I want to use my FAP as an IRC client for Twitch chat. Now that we have the video card and keyboard, all that’s left is some way to stream IRC data to FAP, which I’ll tackle in the next post. It’s all getting very close now.

FAP says Hello World

Previous Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

Next Post: FAP with a Keyboard

It has been a week since my last post, when we left the action last time the FAP’s video card was working with double buffering and color attributes, and a test run resulted in this:

Remember the program was filling up attribute memory with 0x1b, a purple color, and character memory with the letter A. However, as you can see above, after running the program some character cells did not turn purple, and some others turned purple but the letters did not change to A. This was because the CPU was trying to write to VRAM while VRAM copying was under way, and the operation was ignored by the GPU. To prevent this from happening we need some way to let the CPU know that the GPU is busy so it can wait a little until the copying is done. I decided to make a few memory-mapped virtual registers for my GPU. This way the Z80 can ask the GPU if it’s busy first: if it is, the CPU will wait, otherwise the CPU can write to VRAM right away. Here is the updated virtual register code:

	module cpu_vreg(
	input wire clk,
	input wire copy_in_progress,
	input wire cpu_rd,
	input wire cpu_wr,
	input wire cpu_mreq,
	input wire [15:0] cpu_addr,
	inout wire [7:0] cpu_data,
	output reg back_vram_wr_low,
	output reg back_vram_rd_low,
	output reg [12:0] back_vram_addr,
	output reg [7:0] back_vram_data
	);

	assign cpu_data = (cpu_wr == 1 && cpu_rd == 0 && cpu_mreq == 0 && cpu_addr == 16'h92c0) ? copy_in_progress : 8'bzzzzzzzz;

	always @(posedge clk)
	begin
	if(copy_in_progress == 0 && cpu_rd == 1 && cpu_wr == 0 && cpu_mreq == 0 && cpu_addr >= 16'h8000 && cpu_addr <= 16'h92bf) begin
	back_vram_addr = cpu_addr[12:0];
	back_vram_data = cpu_data;
	back_vram_wr_low = 0;
	end
	else begin
	back_vram_wr_low = 1;
	back_vram_addr = 13'bzzzzzzzzzzzzz;
	back_vram_data = 8'bzzzzzzzz;
	end
	end

	endmodule

Basically I just added a single line of code. Now when the CPU tries to read address 0x92c0 it will get the value of copy_in_progress, which is 1 when a VRAM copy is underway. This way, the CPU first polls this register every time it wants to write to VRAM, and waits if its value is 1. Problem solved! Well, not quite:

After running the VRAM-filling program again, there are still a bunch of problem character cells, although far fewer than before. Actually it’s not hard to figure out why: sometimes the CPU will ask right before the VRAM copy starts, so the GPU replies that it’s idle, but when the CPU tries to write to VRAM a few clock cycles later the copying will already be underway, and the write gets ignored as a result. We need a way to let the CPU know a little while before the copy actually starts. I did a dirty little hack by creating another signal that goes active a few scanlines before VBLANK actually starts. This way the CPU will see that the GPU is busy a little earlier, so if it happens to be in the middle of a write just before the copy, it will have time to finish the operation. After that, the FAP is finally rendering a beautiful screen full of purple A’s. Time to write a proper “hello world” program:

	org 0x0
	xor d
	xor e
	xor b
	ld sp, 0x7fff

	ld hl, 0x92bf ; fill attribute RAM with 0x3c, yellow
	ld de, 2400
	ld b, 0x3c
	clear_attri:
	call check
	ld (hl), b
	dec hl
	dec de
	ld a,d
	or e
	jp nz,clear_attri

	ld hl, 0x895f ; fill char RAM with 0, effectively clearing screen
	ld de, 2400
	ld b, 0
	clear_char:
	call check
	ld (hl), b
	dec hl
	dec de
	ld a,d
	or e
	jp nz,clear_char

	ld hl, 0x8001 ; then print "hello world"
	call check
	ld (hl), 'h'

	ld hl, 0x8002
	call check
	ld (hl), 'e'

	ld hl, 0x8003
	call check
	ld (hl), 'l'

	ld hl, 0x8004
	call check
	ld (hl), 'l'

	ld hl, 0x8005
	call check
	ld (hl), 'o'

	ld hl, 0x8007
	call check
	ld (hl), 'w'

	ld hl, 0x8008
	call check
	ld (hl), 'o'

	ld hl, 0x8009
	call check
	ld (hl), 'r'

	ld hl, 0x800a
	call check
	ld (hl), 'l'

	ld hl, 0x800b
	call check
	ld (hl), 'd'

	end: jp end


	; check if GPU copy is underway, wait if it is
	check: ld a, (0x92c0)
	or a
	jp nz,check
	ret


I created a ‘check’ subroutine which gets called every time before the CPU writes to VRAM. The program first fills the attribute RAM with yellow, then clears the screen, then prints “hello world” on the first row of the screen. Here is how it looks:

Nice, isn’t it? It’s moments like this that keep me going. A few weeks ago I had no idea how to build a computer, and FAP was just a bunch of 20-year-old chips, and now it’s running my program and saying hello to the world like any other proper computer!

Celebration aside though, notice how this hello world program is remarkably slow. I ran it at a slower clock to see the progress, but even at the full 2MHz it still takes something like half a second to complete, which is forever for a microcomputer. The reason is that the Z80 has to read the gpu_busy register before every single write, which wastes a huge amount of time. A better solution is to let the CPU disable GPU copying altogether, do the writes, then enable copying again. This way the CPU does not have to wait at all, and only has to write to a GPU register twice instead of polling one 4800 times. The new copy_enable register is at 0x92c1. Here is the updated code:


	module cpu_vreg(
	input wire clk,
	input wire copy_in_progress,
	input wire close_to_vlank,
	input wire cpu_rd,
	input wire cpu_wr,
	input wire cpu_mreq,
	output reg [7:0] copy_enable,
	input wire [15:0] cpu_addr,
	inout wire [7:0] cpu_data,
	output reg back_vram_wr_low,
	output reg back_vram_rd_low,
	output reg [12:0] back_vram_addr,
	output reg [7:0] back_vram_data
	);

	reg vram_read;

	assign cpu_data = (cpu_wr == 1 && cpu_rd == 0 && cpu_mreq == 0 && cpu_addr == 16'h92c0) ? copy_in_progress || close_to_vlank : 8'bzzzzzzzz;
	assign cpu_data = vram_read ? back_vram_data : 8'bzzzzzzzz;

	always @(posedge clk)
	begin
	if(cpu_rd == 1 && cpu_wr == 0 && cpu_mreq == 0 && cpu_addr == 16'h92c1) begin
	copy_enable = cpu_data;
	end
	else if(copy_in_progress == 0 && cpu_rd == 1 && cpu_wr == 0 && cpu_mreq == 0 && cpu_addr >= 16'h8000 && cpu_addr <= 16'h92bf) begin
	back_vram_addr = cpu_addr[12:0];
	back_vram_data = cpu_data;
	back_vram_wr_low = 0;
	vram_read = 0;
	end
	else if(copy_in_progress == 0 && cpu_rd == 0 && cpu_wr == 1 && cpu_mreq == 0 && cpu_addr >= 16'h8000 && cpu_addr <= 16'h92bf) begin
	back_vram_addr = cpu_addr[12:0];
	back_vram_rd_low = 0;
	vram_read = 1;
	end
	else begin
	back_vram_wr_low = 1;
	back_vram_addr = 13'bzzzzzzzzzzzzz;
	back_vram_data = 8'bzzzzzzzz;
	back_vram_rd_low = 1'bz;
	vram_read = 0;
	end
	end

	endmodule


I added another if condition so the CPU can now read from VRAM too. Most importantly, though, when the CPU writes 0 to 0x92c1, VRAM copy is disabled. The CPU can then write to VRAM at full speed without interruption, and when it’s done it can enable VRAM copy again and the result will be displayed on the next frame. I wrote another program to test it.

	vram_char_base_addr .equ 0x8000
	vram_attri_base_addr .equ 0x8960

	org 0x0
	xor b
	xor c
	xor d
	xor e
	ld sp, 0x7fff

	call disable_vram_copy

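	; clear screen (cell indices 2399 down to 0)
	ld de, 2400
	ld b, 0
	ld c, 0
	clear: dec de ; decrement first so index 0 is cleared and index 2400 is never touched
	call print
	ld a,d
	or e
	jp nz,clear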

	; now print "Hello World!" in white (0x3f)
	ld de, 1
	ld b, 0x3f

	ld c, 'H'
	call print
	inc de

	ld c, 'e'
	call print
	inc de

	ld c, 'l'
	call print
	inc de

	ld c, 'l'
	call print
	inc de

	ld c, 'o'
	call print
	inc de

	inc de

	ld c, 'W'
	call print
	inc de

	ld c, 'o'
	call print
	inc de

	ld c, 'r'
	call print
	inc de

	ld c, 'l'
	call print
	inc de

	ld c, 'd'
	call print
	inc de

	ld c, '!'
	call print
	inc de

	call enable_vram_copy

	end: jp end

	enable_vram_copy:
	ld hl, 0x92c1
	ld (hl), 0xff
	ret

	disable_vram_copy:
	ld hl, 0x92c1
	ld (hl), 0x0
	ret

	; print character
	; c: char
	; b: attri
	; de: char index
	; destroys: hl and flags (b, c and de are preserved)
	print:
	ld hl, vram_attri_base_addr
	add hl, de
	ld (hl), b

	ld hl, vram_char_base_addr
	add hl, de
	ld (hl), c

	ret


I put together a “print” function: put the character you want to print in c, the attribute in b, and the index on screen in de, then call it. I first disable the VRAM copy, then clear the screen with char 0 and write “Hello World!” to it, then enable the VRAM copy again. Here is the result:

As you can see this is much faster than the first attempt. The text appears on screen instantly. Looks like disabling copy during bulk VRAM write is the way to go.
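To put a rough number on what the polling version was paying, here is a small Python sketch that adds up the T-states of the `check` routine from the first program. The per-instruction counts are the standard documented Z80 timings; the estimate assumes the GPU is never actually busy, so it is a lower bound on the overhead, not a measurement.

```python
# Z80 T-states for one pass through `check` when the GPU reports idle.
CHECK_T_STATES = {
    "call check": 17,
    "ld a,(0x92c0)": 13,
    "or a": 4,
    "jp nz,check": 10,   # conditional JP costs 10 T-states taken or not
    "ret": 10,
}

CPU_HZ = 2_000_000    # full-speed clock mentioned in the post
VRAM_WRITES = 4800    # 2400 attribute bytes + 2400 character bytes


def polling_overhead_seconds(writes=VRAM_WRITES, cpu_hz=CPU_HZ):
    """Time spent inside `check` alone, assuming it never has to wait."""
    per_call = sum(CHECK_T_STATES.values())
    return writes * per_call / cpu_hz


if __name__ == "__main__":
    print(sum(CHECK_T_STATES.values()), "T-states per check")
    print(round(polling_overhead_seconds(), 4), "seconds of pure polling")
```

Under those assumptions the polling alone costs on the order of a tenth of a second per full-screen fill, before counting any time spent actually waiting for a copy to finish.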

However, if you look at the video carefully, the supposedly white text appears a bit yellow or red. I spent days trying to fix this issue, tweaking the FPGA code and swapping out VRAM chips. In the end I don’t think it’s the VRAM or the copy routine, since the character and attribute are copied together, and all the text seems fine. I think it’s noise again, with more than 100 spaghetti wires running around. It might be better than the breadboard, but apparently it’s still not good enough. I’ll probably have to design a proper PCB for the video card to see if it gets better. I think I’ll move on for now; I’m just kind of tired of working on the video card.

Now that the video card is working, the next step is setting up keyboard input for FAP. Find out what happens in my next post!

Previous Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

Next Post: FAP with a Keyboard

Got My Mojo Working: Character Attributes, VGA board, Double Buffering, and CPU Interface

Previous Post: VGA Character Generator

Next Post: FAP says Hello World

In my last post I designed and built an early stage of FAP’s video card on a breadboard. I decided to use VGA for video output, running an 80×30 text mode at 640×480. The VRAM was working, and so was the character ROM, and the board now renders whatever garbage data is in VRAM on startup as characters on screen. All is good.

However, there were still some glaring issues left untouched in the last post, for example the lack of colors. That is the job of the attribute byte. In my design (and in a lot of early PCs) each text mode character is accompanied by an attribute, which describes what color the character is, whether it should blink, and so on. I ignored it last time to get the character generator working faster; now that it does, it’s time to go back to that.

First a little bit about the VRAM layout: the 80×30 text mode needs 2400 bytes for characters, and another 2400 for attributes. Those 4800 bytes are mapped to 0x8000 through 0x92bf in the Z80’s memory, the first 2400 bytes for characters and the second 2400 for attributes.
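As a quick sanity check of that layout, here is a minimal Python sketch that turns a (row, column) cell into its two Z80 addresses. The function and constant names are made up for illustration; only the base address and the 80×30 geometry come from the post.

```python
VRAM_BASE = 0x8000
COLS, ROWS = 80, 30
CELLS = COLS * ROWS                # 2400 cells
CHAR_BASE = VRAM_BASE              # characters: first 2400 bytes
ATTR_BASE = VRAM_BASE + CELLS      # attributes: next 2400 bytes


def cell_addresses(row, col):
    """Return (character address, attribute address) for one text cell."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("cell is outside the 80x30 grid")
    index = row * COLS + col
    return CHAR_BASE + index, ATTR_BASE + index


if __name__ == "__main__":
    for cell in [(0, 0), (0, 1), (29, 79)]:
        char_addr, attr_addr = cell_addresses(*cell)
        print(cell, hex(char_addr), hex(attr_addr))
```

The last cell, row 29 column 79, lands on 0x895f for the character and 0x92bf for the attribute, which is why 0x92bf is the last VRAM byte.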

Here is updated verilog file with attribute fetching:

	always @(posedge clk50)
	begin
	// inside active region
	if ((vc >= vbp && vc < vfp) && (hc >= hbp && hc < hfp))
	begin
	front_vram_rd_low = 0;
	hpos = hc - hbp;
	vpos = vc - vbp;
	hdot = hpos[2:0];
	vdot = vpos[3:0];
	hchar = hpos[9:3];
	vchar = vpos[9:4];
	if (fetch_attribute) begin
	front_vram_addr = 80 * vchar + hchar;
	attribute_data = front_vram_data;
	end
	else begin
	front_vram_addr = 80 * vchar + hchar + 2400;
	char_data = front_vram_data;
	end
	fetch_attribute = ~fetch_attribute;
	red = font_pixel ? attribute_data[5:4] : 0;
	green = font_pixel ? attribute_data[3:2] : 0;
	blue = font_pixel ? attribute_data[1:0] : 0;
	end
	// outside active region
	else
	begin
	fetch_attribute = 1;
	front_vram_rd_low = 1;
	front_vram_addr = 13'bzzzzzzzzzzzzz;
	red = 0;
	green = 0;
	blue = 0;
	end
	end


This block is now clocked by the 50MHz internal clock, and fetch_attribute alternates between 0 and 1 on each clock cycle. On a cycle where fetch_attribute is 1, the FPGA puts the character address on the VRAM address bus and latches the attribute byte that was requested on the previous cycle; on a cycle where it is 0, it puts out the attribute address (the character address plus 2400) and latches the character byte. The character is then looked up in the character ROM, while the lower 6 bits of the attribute are sent directly to the DAC, giving colors. Here is what it looks like:

However, the fuzziness is back again. I can think of several reasons. The VRAM is now being accessed twice for every pixel, once for the character and once for the attribute, so it might be too slow again: the pixel clock is 25MHz, which means each clock cycle is 40ns, and two memory accesses in that period leave only 20ns each. My VRAM is rated at 15ns, so it’s very close. The faster speed also generates more noise, especially on the breadboard. Lastly, the monitor’s auto adjustment seems a bit off with my video card’s signal. I bought some 12ns IS61C256 chips. That’s only 3ns faster, but in FAP, every nanosecond counts.
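The timing budget above is simple enough to write down as a Python sketch. The helper names are hypothetical, and the "margin" here only compares the rated access time against the slot length; it ignores FPGA output delays and wiring, which eat into the real margin.

```python
def access_budget_ns(pixel_clock_hz, accesses_per_pixel):
    """Time available for each VRAM access within one pixel period."""
    pixel_period_ns = 1e9 / pixel_clock_hz
    return pixel_period_ns / accesses_per_pixel


def margin_ns(rated_access_ns, pixel_clock_hz=25_000_000, accesses_per_pixel=2):
    """Slack between the per-access slot and the chip's rated access time."""
    return access_budget_ns(pixel_clock_hz, accesses_per_pixel) - rated_access_ns


if __name__ == "__main__":
    print("slot per access:", access_budget_ns(25_000_000, 2), "ns")
    for rating in (15, 12):
        print(f"{rating} ns SRAM leaves {margin_ns(rating)} ns of slack")
```

With two accesses per 40ns pixel, a 15ns part has 5ns of slack on paper and a 12ns part has 8ns, which is why the faster chip helps a little but does not fix a noisy breadboard.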

I put it on, and it looks a little bit better, but not by much, since there is still the noisy breadboard. I’ll have to wait until I build the thing on the stripboard.

Now that the video card is mostly working by itself, we come to the issue of how to interface it with the CPU. The VRAM holds the characters and attributes that are rendered on screen, but the CPU has to write something into it in the first place. The VRAM is being read by the video card most of the time, so the only time the CPU has a chance to write to it is during one of the blanking periods, namely the Horizontal Blanking Interval and the Vertical Blanking Interval. HBLANK happens every scanline but is extremely short, only around 6us if I remember correctly. Our 1970s Z80 won’t have enough time to push much of anything useful into VRAM in that window, and even if it did it would probably cause screen tearing or visual artifacts, since it would change the content partway through a frame instead of between frames. That leaves VBLANK, which lasts around 1.6ms every frame. That is plenty of time, but it means the CPU has to spend 90% of its time doing nothing but waiting for VBLANK, which is extremely wasteful. That is what Quinn chose to do with her Veronica. One way around this is a dual-port RAM, which allows a write and a read at the same time, but that stuff is pretty hard to find. I decided to use my existing parts and implement the tried-and-true method of freeing the CPU from waiting for the beam: double buffering.

The principle of double buffering is actually pretty simple: two VRAMs, called the back and front VRAM, are used. The CPU writes to the back VRAM, and during the VBLANK period the content of the back VRAM is copied to the front VRAM and subsequently rendered. This way the CPU can write to VRAM at any time it wants, apart from a minuscule amount of time during copying. Screen tearing and artifacts are also eliminated since the front VRAM only changes between frames. The only downsides are probably cost and complexity, but I think it will be worth it in the end, when I won’t need to race the beam while writing my programs.
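The idea can be modelled in a few lines of Python. This is only a behavioural sketch with made-up class and method names; it says nothing about how the real hardware sequences the copy, just who reads and writes which buffer.

```python
class DoubleBufferedVram:
    """Toy model: the CPU writes the back buffer, the display reads the front."""

    def __init__(self, size=4800):
        self.back = bytearray(size)    # written by the CPU at any time
        self.front = bytearray(size)   # read by the video scan-out

    def cpu_write(self, offset, value):
        self.back[offset] = value & 0xFF

    def scanout(self, offset):
        return self.front[offset]

    def vblank(self):
        # Once per frame the whole back buffer is copied to the front buffer.
        self.front[:] = self.back


if __name__ == "__main__":
    vram = DoubleBufferedVram()
    vram.cpu_write(1, ord("h"))
    print(vram.scanout(1))   # still the old content: the frame is not torn
    vram.vblank()
    print(vram.scanout(1))   # the new character appears on the next frame
```

The point of the model is that a write never shows up mid-frame: it becomes visible only after the next VBLANK copy.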

However, at this stage the noise problem on the breadboard is pretty obvious now, and adding another VRAM and 28 wires will only make it worse, so I decided to just build it on the board.

And to make sure I don’t make wiring mistakes, I decided to start over and retest everything from the beginning: first just the FPGA and DAC, displaying color bars.

Good, that works. The next step is hooking up one VRAM and rendering the garbage data again.

Notice how all the noise is gone. Good stuff. A lot of U’s for some reason.

Coming up next is the main course of this post: double buffering.


	module buffer_copier(
	input wire clk,
	input wire vblank,
	output reg front_vram_wr_low,
	output reg back_vram_rd_low,
	output reg copy_in_progress,
	inout wire [7:0] front_vram_data,
	inout wire [7:0] back_vram_data,
	output wire [12:0] front_vram_addr,
	output wire [12:0] back_vram_addr
	);

	reg [12:0] counter;

	assign front_vram_data = copy_in_progress ? back_vram_data : 8'bzzzzzzzz;

	always @(posedge clk)
	begin
	if (vblank == 0) begin
	back_vram_rd_low <= 1;
	front_vram_wr_low <= 1;
	copy_in_progress <= 0;
	counter <= 0;
	end
	else if (counter <= 4800) begin
	back_vram_rd_low <= 0;
	front_vram_wr_low <= 0;
	copy_in_progress <= 1;
	counter <= counter + 1;
	end
	else begin
	copy_in_progress <= 0;
	back_vram_rd_low <= 1;
	front_vram_wr_low <= 1;
	counter <= 0;
	end
	end

	assign back_vram_addr = copy_in_progress ? counter : 13'bzzzzzzzzzzzzz;
	assign front_vram_addr = copy_in_progress ? counter : 13'bzzzzzzzzzzzzz;

	endmodule


The module waits for the vblank interval, and when it arrives it enables reads of the back VRAM and writes of the front VRAM, then starts a counter that goes to 4800, which doubles as the address for both the front and back VRAM. This way the content of the back VRAM is copied to the front as the counter increments. The copy_in_progress signal is made available so that other modules, as well as the CPU, can know if a buffer copy is under way. The total copy time is around 192us (4800 bytes at one byte per 40ns clock), which means that instead of waiting 90% of the time to write to the VRAM, the CPU now only has to wait about 1.1% of the time. A pretty big improvement.
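The copy-time figures can be reproduced with a short Python sketch. It assumes one byte is copied per 25MHz clock and a nominal 60 frames per second; both are inferred from the numbers in the text rather than stated as the module's actual clocking.

```python
COPY_BYTES = 4800
CLOCK_HZ = 25_000_000   # assumed copy clock: one byte per 40 ns cycle
FRAME_HZ = 60           # nominal VGA refresh rate


def copy_time_ns(nbytes=COPY_BYTES, clock_hz=CLOCK_HZ):
    """Duration of one full buffer copy in nanoseconds."""
    period_ns = 1_000_000_000 // clock_hz
    return nbytes * period_ns


def busy_fraction(frame_hz=FRAME_HZ):
    """Fraction of each frame during which the copy blocks CPU writes."""
    return copy_time_ns() * frame_hz / 1e9


if __name__ == "__main__":
    print(copy_time_ns() / 1000, "us per copy")
    print(round(busy_fraction() * 100, 2), "% of each frame")
```

Under these assumptions the copy takes 192us, a little over one percent of a frame, in line with the figure quoted above.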

I added another slot on FAP’s backplane and plugged the video card in, it’s practically indistinguishable from a GTX980 to be honest.

Until now the video card has been working by itself, copying buffers and rendering characters. However, sooner or later the CPU will have to be able to control it. There are a few ways a CPU can talk to the GPU: some computers map their VRAM directly into the CPU’s address space (the Game Boy comes to mind), while others do not expose the VRAM and instead provide memory-mapped GPU registers (the NES’s PPU and the MOS Technology 8568 work this way). For FAP, I decided to use both. VRAM is mapped to 0x8000 through 0x92bf, with virtual registers somewhere after that. This way I can easily test something by writing directly into VRAM, while I can also ask the GPU to do heavy-lifting operations like text scrolling. It’s the best of both worlds.

For now though, it’s just the write-only VRAM mapping, here is the code:


	module cpu_vreg(
	input wire clk,
	input wire copy_in_progress,
	input wire cpu_rd,
	input wire cpu_wr,
	input wire cpu_mreq,
	input wire [15:0] cpu_addr,
	inout wire [7:0] cpu_data,
	output reg back_vram_wr_low,
	output reg [12:0] back_vram_addr,
	output reg [7:0] back_vram_data
	);

	always @(posedge clk)
	begin
	if(copy_in_progress == 0 && cpu_wr == 0 && cpu_mreq == 0 && cpu_addr >= 16'h8000) begin
	back_vram_addr = cpu_addr[12:0];
	back_vram_data = cpu_data;
	back_vram_wr_low = 0;
	end
	else begin
	back_vram_wr_low = 1;
	back_vram_addr = 13'bzzzzzzzzzzzzz;
	back_vram_data = 8'bzzzzzzzz;
	end
	end

	endmodule


This simple module checks whether MREQ and WR are low and the address is in the correct range, and connects the CPU bus to the back VRAM if so. I’ll implement VRAM read and virtual registers later; I can already print some pretty text with just writes for now.

Time to write a Z80 program to try it out. If I write a value from the back of the VRAM to the front, the screen should first change color, since the attribute bytes are at the back of the VRAM. Once the attribute bytes have been filled, text will start appearing as the value fills up the character portion of the VRAM. Here is the test program:


	org 0x0
	start: xor d
	xor e
	xor b ; note: these only XOR a with d, e, b; the registers are set by the loads below
	ld hl, 0x92bf ; load hl with the address of the last byte of VRAM
	ld de, 2400 ; 2400 bytes of attributes
	ld b, 0x1b ; color, 0x1b is 011011, should be purple
	attri: ld (hl), b ; store the color into VRAM address at hl
	dec hl ; decrease hl to move on to next VRAM address
	dec de ; decrease the counter
	ld a,d
	or e ; check if counter is 0
	jp nz,attri ; continue looping if it's not

	ld de, 2400 ; now attributes have been filled, reload the counter
	ld b, 65 ; ASCII code for letter A
	char: ld (hl), b ; same loop as above, fill the character VRAM
	dec hl
	dec de
	ld a,d
	or e
	jp nz,char

	end: jp end ; all done


The program first fills the second half of the VRAM with 0x1b, binary 011011: the first two bits, 01, go to the red DAC, the middle 10 goes to the green DAC, and the last two bits, 11, go to the blue DAC. The result should be a violet color. After filling up the attributes, the program fills the first 2400 bytes of VRAM with 65, the ASCII code for the letter A. So we should expect the garbage text on screen to first turn violet, then fill up with the letter A. Here is what happens:

Well, it’s mostly what should be happening. However, some characters didn’t turn violet, and there are a few missed A’s here and there. This is because the CPU was trying to write during the VRAM copy operation and the write was ignored. My next step will be implementing a few virtual registers so the CPU can check if the GPU is busy before trying to write to VRAM. But anyhow, progress is progress.
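For reference, the attribute bit layout used by the test program (red in bits 5:4, green in bits 3:2, blue in bits 1:0) can be unpacked with a few shifts. This is a Python sketch with a hypothetical helper name; the upper two bits are simply ignored here, since the post does not assign them a meaning yet.

```python
def decode_attribute(attr):
    """Split a 6-bit colour attribute into (red, green, blue), each 0..3."""
    red = (attr >> 4) & 0b11
    green = (attr >> 2) & 0b11
    blue = attr & 0b11
    return red, green, blue


if __name__ == "__main__":
    for value in (0x1B, 0x3C, 0x3F):
        print(hex(value), format(value, "06b"), decode_attribute(value))
```

So 0x1b decodes to red 1, green 2, blue 3 (the violet of the test program), while 0x3c and 0x3f are the full yellow and full white used by the hello world programs.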

There was actually an extremely frustrating backstory that I didn’t mention. After getting the above code to work at around 500Hz, I tried to bump the processor speed up to around 500kHz and it simply would not work. Sometimes it entered the end loop early, sometimes it ignored the loop entirely and continued past the end of the program. I thought there was a loose connection or weak short somewhere along the bus, but after inspecting and cleaning the board the problem still happened from time to time. I eventually had my STM32 controller print out the address and data content after each clock cycle. Here is one of the execution traces:

[Screenshot: execution trace printed by the STM32 controller]

As you can see, the address bus is missing bits: 0x8960 turned into 0x8160, and a jump to 0xb turned into a jump to 0x3. That’s why my program was going all over the place. The culprit? Noise. After adding a couple of caps between VCC and GND of the Z80 and the memory chip, the problem went away. However, I did spend a stupidly long amount of time trying to nail it down. It’s one of those things I should have known better. Oh well.

Anyway, FAP’s video card is halfway working now. I’ll finish up the GPU virtual registers in the next post, and have FAP finally say “hello world” like all the great computers did when they were born.
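A quick way to see which address lines were dropping out is to XOR the expected and observed values. Here is a minimal Python sketch of that check, using the two pairs of addresses quoted from the trace; the helper name is made up.

```python
def dropped_bits(expected, observed, width=16):
    """Bit positions that were 1 in `expected` but read back as 0."""
    diff = expected ^ observed
    return [bit for bit in range(width)
            if ((diff >> bit) & 1) and ((expected >> bit) & 1)]


if __name__ == "__main__":
    print(dropped_bits(0x8960, 0x8160))   # address line 11
    print(dropped_bits(0x000B, 0x0003))   # address line 3
```

In both cases exactly one high bit read back low, which fits a marginal signal being pulled down by supply noise better than it fits a wiring mistake.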


Putting the F in FAP: VGA Character Generator

Previous Post: Programming FAP

Next Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

As I mentioned in my first post, Steve Ciarcia, in his 1981 book Build Your Own Z80 Computer, called his computer ZAP, which stands for Z80 Application Processor. Since I was planning to use an FPGA as part of my own Z80 computer, it’s only natural to name mine FPGA Assisted Processor, or FAP for short. And now, after getting the CPU to work, adding memory, and programming the processor, the time has finally come for me to put the F in FAP and start designing the video interface of my computer.

Looking back at the history of personal computers, it’s not hard to see that graphical capability was one of the most important factors in a computer’s popularity. Everything was hooked up to a monitor or TV, and no one wanted a computer that only communicated through a row of 16 LEDs. It’s the same with FAP. Right now it has no means of input or output whatsoever, and the only way to see if a program is executing correctly is to halt the CPU and examine the RAM content. My plan is to add a keyboard as input and video as output. As for the video, there are a number of standards to choose from. There’s RF and composite video, then the slightly more modern VGA, and after that come DVI, DisplayPort, HDMI and all the HD standards. I picked VGA because it’s a rather straightforward interface, and it is still supported by a lot of monitors. I need a horizontal sync pulse, a vertical sync pulse, and 3 analog color signals between 0 and 0.7V. A pixel clock pushes out the pixels of each line from left to right. At the end of the line, HSYNC tells the monitor to move down to the next line, and VSYNC is generated when the last line has been reached, signalling the monitor to start a new frame. The standard resolution of VGA is 640×480. However, there are additional pixels and lines around the visible area, the front and back porches and the sync pulses themselves, which are not rendered on screen and were originally there to give the electron beam time to move to the next line or frame. In the standard 60Hz timing they add another 160 pixel periods to each line and another 45 lines to each frame. The blanking time also gives us time to update the video buffer.

When it comes to actually designing the video interface, you have to keep in mind that the Z80’s resources are rather limited. If we are to drive VGA at 640×480, we need to store what each pixel is displaying somewhere in memory, and that memory is called Video RAM. For 640×480 we need 307200 bits just to store a 1-bit black and white image, and if we want colors, it takes 300K of memory at one byte per pixel, while the Z80 can only natively address 64K. What’s more, the processor would have to update more than 300K dots every single frame, which is way too much for a 1970s CPU. Since most of what was displayed was text anyway, a lot of personal computers at the time implemented what is called text mode. Instead of individual pixels, the screen is divided into a number of text cells, each one containing a character. With a character ROM and some special circuits, a text screen can be rendered much faster, and the content of the screen can be manipulated more easily too. Most importantly, this method saves a lot of memory: an 80×30 text mode screen at 640×480 only needs 2400 bytes, compared to 300K in bitmap mode. If I want color, underline, blinking and so on, I can use an additional byte per cell for those functions, often called the attribute. So text mode typically uses two bytes of VRAM per cell, one to specify which character it is, and another to specify the color and other attributes of that character.

I’m using an FPGA for the VGA video controller, since it’s much faster than bit-banging VGA with a microcontroller. A class I took a few years ago used an FPGA board for a few projects, but I have forgotten most of it by now. That class was also in VHDL, while a lot of the resources online are in Verilog, so this is basically starting anew for me. I used a Mojo V3 with a Spartan 6 chip. I like it because it’s open source, has a lot of pins, is cheap, and uses an ATmega32U4 microcontroller to configure the FPGA so I don’t need to spend more on a programmer. Most importantly though, it doesn’t have random peripherals that I don’t need taking up pins. It’s just a simple, clean, minimalist board with all the pins broken out and nothing else, just how I like it.
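To make the line and frame structure concrete, here is a Python sketch using the commonly published 640×480 at 60Hz timing. The porch and sync widths below are those industry-standard values, not numbers taken from this post, and the clock is the 25MHz used here rather than the nominal 25.175MHz.

```python
# Assumed standard 640x480 @ 60 Hz timing (pixels per line, lines per frame).
H_VISIBLE, H_FRONT_PORCH, H_SYNC, H_BACK_PORCH = 640, 16, 96, 48
V_VISIBLE, V_FRONT_PORCH, V_SYNC, V_BACK_PORCH = 480, 10, 2, 33
PIXEL_CLOCK_HZ = 25_000_000   # this project's clock; the nominal value is 25.175 MHz

H_TOTAL = H_VISIBLE + H_FRONT_PORCH + H_SYNC + H_BACK_PORCH
V_TOTAL = V_VISIBLE + V_FRONT_PORCH + V_SYNC + V_BACK_PORCH


def line_time_us():
    """Duration of one full scanline, blanking included."""
    return H_TOTAL * 1_000_000 / PIXEL_CLOCK_HZ


def refresh_hz():
    """Frames per second implied by the totals and the pixel clock."""
    return PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)


if __name__ == "__main__":
    print("pixels per line:", H_TOTAL, "lines per frame:", V_TOTAL)
    print("line time:", line_time_us(), "us")
    print("refresh:", round(refresh_hz(), 2), "Hz")
```

With these values a line is 800 pixel periods and a frame is 525 lines, so a 25MHz clock gives a refresh rate just under 60Hz, which monitors generally accept.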
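The memory comparison is easy to check with a few lines of Python. The function names are made up for illustration; the resolutions and cell counts are the ones from the paragraph above.

```python
WIDTH, HEIGHT = 640, 480
Z80_ADDRESS_SPACE = 64 * 1024   # bytes the Z80 can address directly


def bitmap_bytes(bits_per_pixel):
    """Frame buffer size for a 640x480 bitmap at the given colour depth."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


def text_mode_bytes(cols=80, rows=30, bytes_per_cell=2):
    """VRAM needed for text mode: one character plus one attribute per cell."""
    return cols * rows * bytes_per_cell


if __name__ == "__main__":
    print("1 bpp bitmap:", bitmap_bytes(1), "bytes")
    print("8 bpp bitmap:", bitmap_bytes(8), "bytes")
    print("text mode   :", text_mode_bytes(), "bytes")
    print("fits in 64K :", text_mode_bytes() <= Z80_ADDRESS_SPACE)
```

Even the monochrome bitmap would take more than half of the Z80's address space, while text mode with attributes needs under 5K.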

The Mojo V3 FPGA board (image source: embeddedmicro.com)

Here is FAP’s video spec that I planned: 640×480 resolution, 80×30 text mode with 8×16 monospace font, 64 colors, 2.4K character RAM and 2.4K attribute RAM. It’s rather basic, but I can always add more fancy stuff later.

I wanted to start with something simple like displaying some patterns, and then go from there. Fortunately for me, VGA signal generation is one of the most common tasks for FPGAs, and there are tons of resources online. I used some code from this rather excellent example, which simply displays some color bars on screen. I tweaked the code so it uses 2 bits per color channel instead of 3, giving the video card 4*4*4 = 64 colors. The color outputs are digital, with 6 bits in total, which means I need to build a simple DAC to convert them to analog signals between 0 and 0.7V. An R-2R DAC is enough, and I used 1K and 2K resistors for the job.

	// display 100% saturation colorbars
	// ------------------------
	// Combinational "always block", which is a block that is
	// triggered when anything in the "sensitivity list" changes.
	// The asterisk implies that everything that is capable of triggering the block
	// is automatically included in the sensitivity list. In this case, it would be
	// equivalent to the following: always @(hc, vc)
	// Assignment statements can only be used on type "reg" and should be of the "blocking" type: =
	always @(*)
	begin
	// first check if we're within vertical active video range
	if (vc >= vbp && vc < vfp)
	begin
	// now display different colors every 80 pixels
	// while we're within the active horizontal range
	// -----------------
	// display white bar
	if (hc >= hbp && hc < (hbp+80))
	begin
	red = 2'b11;
	green = 2'b11;
	blue = 2'b11;
	end
	// display yellow bar
	else if (hc >= (hbp+80) && hc < (hbp+160))
	begin
	red = 2'b11;
	green = 2'b11;
	blue = 2'b00;
	end
	// display cyan bar
	else if (hc >= (hbp+160) && hc < (hbp+240))
	begin
	red = 2'b00;
	green = 2'b11;
	blue = 2'b11;
	end
	// display green bar
	else if (hc >= (hbp+240) && hc < (hbp+320))
	begin
	red = 2'b00;
	green = 2'b11;
	blue = 2'b00;
	end
	// display magenta bar
	else if (hc >= (hbp+320) && hc < (hbp+400))
	begin
	red = 2'b11;
	green = 2'b00;
	blue = 2'b11;
	end
	// display red bar
	else if (hc >= (hbp+400) && hc < (hbp+480))
	begin
	red = 2'b11;
	green = 2'b00;
	blue = 2'b00;
	end
	// display blue bar
	else if (hc >= (hbp+480) && hc < (hbp+560))
	begin
	red = 2'b00;
	green = 2'b00;
	blue = 2'b11;
	end
	// display black bar
	else if (hc >= (hbp+560) && hc < (hbp+640))
	begin
	red = 2'b00;
	green = 2'b00;
	blue = 2'b00;
	end
	// we're outside active horizontal range so display black
	else
	begin
	red = 0;
	green = 0;
	blue = 0;
	end
	end
	// we're outside active vertical range so display black
	else
	begin
	red = 0;
	green = 0;
	blue = 0;
	end
	end


With everything hooked up, I uploaded the code and hooked it up to an old LCD monitor that I picked up for $7 just for this project.

Voilà, color bars! However, the colors look extremely dark, despite the monitor being on full brightness. I tried another monitor and it was the same. After measuring the output of the DAC, it turns out it’s only putting out 0.09V instead of 0.7V at full intensity. It looks like the 1K-2K DAC can’t provide enough current to drive the VGA inputs, which I should have known. I tried to use an op-amp buffer, but it was too slow for the 25MHz pixel clock. In the end I just used smaller resistors for the DAC, 150 and 330 ohm. It’s not exactly double but it’ll have to do, and the peak output with VGA connected is 0.68V, pretty close to the 0.7V in the specification. With the new DAC, the image is back to full brightness.

Eagle-eyed viewers might have spotted that the color bars have changed colors. I did it to check the DAC performance, stepping through 00, 01, 10, 11, first only red, then all channels. It looks exactly how it should: reds are red, and greys are grey without color casts.

Then it’s time to tackle the details of text rendering. First I need two counters to know which pixel I’m at, one for each axis: hpos[9:0] is the horizontal counter that goes from 0 to 639, and vpos[9:0] is the vertical counter that goes from 0 to 479. As the controller draws each pixel I also need to know which pixel of the 8×16 font cell I am at, which is simply the lower 3 bits of hpos and the lower 4 bits of vpos. The upper 7 bits of hpos and the upper 6 bits of vpos give the position of the current character in the 80×30 grid. The controller will fetch a byte from character RAM, which contains the character to render, as well as an attribute byte to see what color that character is. To start with I’ll ignore the attribute, so the memory location of each character is 80 * vchar + hchar, where vchar and hchar are the coordinates of the character. With that memory address, the video controller can ask the character RAM for a byte of data to render. I changed the rendering code to see if it works.
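For reference, these are the four levels a 2-bit channel should ideally produce if full scale is 0.7V. This Python sketch is an idealised calculation only: it assumes perfectly even steps and ignores both the resistor tolerance and the loading by the monitor input, so the real voltages will differ somewhat.

```python
FULL_SCALE_V = 0.7   # target peak level for a VGA colour channel


def ideal_level(code):
    """Ideal output voltage for a 2-bit colour code (0..3), evenly spaced."""
    if not 0 <= code <= 3:
        raise ValueError("colour code must be 0..3")
    return FULL_SCALE_V * code / 3


def resistor_ratio(r_small=150, r_large=330):
    """Ratio of the two resistor values; an exact R-2R ladder needs 2.0."""
    return r_large / r_small


if __name__ == "__main__":
    print([round(ideal_level(code), 3) for code in range(4)])
    print("resistor ratio:", round(resistor_ratio(), 2))
```

The 150/330 ohm pair gives a ratio of 2.2 instead of 2, so the steps are slightly uneven, but as the test pattern shows that is hard to notice with only four levels per channel.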

	always @(*)
	begin
	// inside active region
	if ((vc >= vbp && vc < vfp) && (hc >= hbp && hc < hfp))
	begin
	vram_rd_low = 0;
	hpos = hc - hbp;
	vpos = vc - vbp;
	hdot = hpos[2:0];
	vdot = vpos[3:0];
	hchar = hpos[9:3];
	vchar = vpos[9:4];
	vram_addr = 80 * vchar + hchar;
	red = vram_data[1:0];
	green = vram_data[3:2];
	blue = vram_data[5:4];
	end
	// outside active region
	else
	begin
	vram_rd_low = 1;
	red = 0;
	green = 0;
	blue = 0;
	end
	end


As you can see it calculates a memory address vram_addr based on the pixel counter, and feeds the lower 6 bits of data to the VGA output, I hooked up a leftover SRAM, and here’s what greeted me when I ran the code:

Beautiful, isn’t it? You can see each character cell, where the letters will go, only right now it’s just random colors from junk data inside the SRAM. Say I want to display “A” in a character cell: the memory content for that cell will be 65, the ASCII code of A. Right now the VGA controller gets the value 65 and just puts its lower 6 bits out as a color. The next step is making a character ROM, so the VGA controller can look up 65 in it and see which pixels to light to put the letter A on screen instead of just a block of color.

And what do you know? The open source community to the rescue again! I found exactly what I needed on GitHub, an 8×16 code page 437 font. After dropping it in and giving it column and row data, I got this:

We actually got text! Instead of color blobs the VGA controller is rendering characters from the garbage data in the SRAM. The characters do look somewhat fuzzy though, and some are not rendered properly. To compare with a known example, I filled an EEPROM with 0 to 255 and dropped it in. It should look like this:
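Here is a small Python model of the two lookups involved: pixel position to character cell, and character code to font pixel. The glyph data and the "most significant bit is the leftmost pixel" convention are assumptions made for the sketch; the real font ROM may be laid out differently.

```python
COLS = 80


def locate_pixel(hpos, vpos):
    """Map a visible pixel to (VRAM cell index, column in cell, row in cell)."""
    hdot, vdot = hpos & 0b111, vpos & 0b1111     # position inside the 8x16 cell
    hchar, vchar = hpos >> 3, vpos >> 4          # position in the 80x30 grid
    return COLS * vchar + hchar, hdot, vdot


# Hypothetical 8x16 glyph for 'A' (one byte per row); not the real font data.
GLYPH_A = [0x00, 0x00, 0x10, 0x38, 0x6C, 0xC6, 0xC6, 0xFE,
           0xC6, 0xC6, 0xC6, 0xC6, 0x00, 0x00, 0x00, 0x00]
FONT = {65: GLYPH_A}


def font_pixel(char_code, hdot, vdot):
    """1 if the glyph has a lit pixel at (hdot, vdot), assuming MSB = leftmost."""
    row_bits = FONT[char_code][vdot]
    return (row_bits >> (7 - hdot)) & 1


if __name__ == "__main__":
    print(locate_pixel(8, 16))      # second cell of the second text row
    for vdot in range(16):
        print("".join("#" if font_pixel(65, h, vdot) else "." for h in range(8)))
```

The color-block test above stops after `locate_pixel`; the character ROM adds the second step, turning the byte stored in the cell into an on/off decision for each pixel.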

And here is what I got:

It looks even worse than before! If you look really closely you can see it follows the pattern of code page 437, but every single character is mangled, sometimes beyond recognition. I first thought it was a breadboard noise problem, as I’m running 25MHz on it, and Quinn had this issue while designing hers. However, looking closer I can tell each character is mangled in exactly the same way, and they don’t change when I wiggle some wires. Noise tends to be random, so that is not the cause. I then thought something might be wrong with the character ROM code, but that seemed unlikely. I then thought it was my crappy monitor, but it looks the same on a better one I tried. Drunk and out of ideas, I thought I had hooked up my wires wrong and basically started messing around with the SRAM data lines. While doing so I found something interesting. This is what the screen looks like with the 2nd data line hooked directly to the address line:

Happy faces! And most importantly it looks perfect! So it’s not noise nor character ROM problem. I then hooked up the same wire to the data output of the EEPROM:

It looks like the faces are shifted to the left, looks like a timing problem to me. Time to break out the logic analyzer.

[Screenshot: logic analyzer capture]

Channel 4 is the pixel clock, channel 0 is address, and channel 2 is data. Here you can see data lags behind address by a whopping 140ns, which is about three and a half pixel clocks at 40ns each, and that’s why those faces look shifted by roughly 4 pixels. It turned out my SRAM and EEPROM are too slow. Who would have thought? The AT28C256-15PU I have is rated at 150ns, so that’s pretty close to what I measured. The SRAM I have is rated at 50ns, which is still not fast enough; that’s why the image with the SRAM is still fuzzy, but not as bad as the one with the EEPROM.

Time to buy some new parts. The IS61C256 looks promising: it’s the same 32K SRAM but with only 10 to 15ns of delay, up to 14 times faster than the 140ns I measured. It’s in an SOJ package though, so I bought a few breakout boards too. I also picked up a newer Z80 CPU to replace the one I’m using. This one is CMOS, so it uses less power and can run at up to 8MHz.
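The relationship between access time and the visible shift can be sketched in Python. The pass/fail rule here, that a memory must answer within one 40ns pixel period, is a simplifying assumption for this stage of the design, where each pixel needs a single fetch.

```python
PIXEL_PERIOD_NS = 40   # 25 MHz pixel clock


def lag_in_pixels(lag_ns):
    """How many pixel periods the data arrives late."""
    return lag_ns / PIXEL_PERIOD_NS


def fast_enough(access_ns):
    """True if the part can answer within one pixel period (assumed budget)."""
    return access_ns <= PIXEL_PERIOD_NS


if __name__ == "__main__":
    print("measured lag:", lag_in_pixels(140), "pixels")
    for name, access_ns in [("EEPROM", 150), ("old SRAM", 50), ("new SRAM", 15)]:
        print(name, access_ns, "ns ->", "ok" if fast_enough(access_ns) else "too slow")
```

By this rule both the 150ns EEPROM and the 50ns SRAM miss the budget, while a 10 to 15ns part clears it comfortably.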

Soldered the new SRAM, hope I didn’t burn it out.

And after plugging it in, here’s how it looks like:

The speed does make a difference! All the characters are sharply rendered now. It’s still displaying garbage because that’s what is in the SRAM at the time, but the system is working.

Now that the text rendering is working, next part is the interface between the processor and the VGA controller. I’m so drunk and tired now so I’m just going to end it here, stay tuned for the next post!

Previous Post: Programming FAP

Next Post: Character Attributes, VGA board, Double Buffering, and CPU Interface

Programming FAP

Previous Post: Memories of FAP

Next Post: VGA Character Generator

Project Github Repo

It has been a few days since I last wrote about FAP, it’s not that I have given up on it, although I did cool off a bit and spent more time on other stuff. I’ve been developing a video card for FAP, but it’s not finished yet so I’ll leave it to another post. This time, it’s all about writing programs for FAP to execute.

With 16KB of RAM and 16KB of ROM, FAP is ready to function like any other proper computer! Without any means of input or output it’s not entirely useful at this stage, but I’m not going to let that stop me from at least making it execute a simple program to see if it actually works.

However, before doing that, there is some more housekeeping to do. I need a streamlined way to write an assembly program, have it assembled and uploaded to FAP, and have it executed. I began by adding a “programming mode”, which is entered by default upon powering up. In this mode the STM32 controller kicks the CPU off the bus, takes control of the memory and listens on the serial input. I can then issue read or write commands via bluetooth serial and have it program the EEPROM. When I actually want FAP to start executing the program, I reset the STM32 controller while holding down the second button, and it will enter execute mode and start running the program, easy. While I was at it, I also added another button and labeled them all. From left to right they are reset, single clock, single instruction, and full speed start/stop.

The new program mode, and updated buttons.

Here is a code snippet of the program mode function. In the loop it checks if there is a command in the UART receive buffer, and executes it if there is. The command could be read, write, or zeroing the EEPROM.

	void program_mode()
	{
	    // pull down BUSREQ
	    HAL_GPIO_WritePin(CPU_CTRL_PORT, CPU_BUSREQ_PIN, LOW);
	    // cycle clock until BUSACK is low, now CPU is kicked off the bus
	    while(HAL_GPIO_ReadPin(CPU_CTRL_PORT, CPU_BUSACK_PIN) != LOW)
	        cycle_clock(1);

	    // switch memory control signals to output to take over the bus
	    GPIO_InitTypeDef GPIO_InitStruct = {0};
	    GPIO_InitStruct.Pin = CPU_MREQ_PIN | CPU_RD_PIN | CPU_WR_PIN;
	    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
	    GPIO_InitStruct.Speed = GPIO_SPEED_HIGH;
	    HAL_GPIO_Init(CPU_CTRL_PORT, &GPIO_InitStruct);

	    // enable memory read
	    HAL_GPIO_WritePin(CPU_CTRL_PORT, CPU_MREQ_PIN, LOW);
	    HAL_GPIO_WritePin(CPU_CTRL_PORT, CPU_RD_PIN, LOW);
	    HAL_GPIO_WritePin(CPU_CTRL_PORT, CPU_WR_PIN, HIGH);

	    addr_output();
	    data_input();
	    CPU_ADDR_PORT->ODR = 0x0;

	    lcd_say("cls WHITE");
	    lcd_say("xstr 10,10,310,45,2,BLACK,YELLOW,0,0,1,\"Program Mode\"");
	    lcd_say("xstr 10,60,310,45,1,BLACK,YELLOW,0,0,1,\"Awaiting commands...\"");

	    // start receiving serial commands
	    memset(debug_recv_buf, 0, DEBUG_RECV_SIZE);
	    HAL_UART_Receive_IT(&huart2, debug_recv_buf, DEBUG_RECV_SIZE-1);

	    while(1)
	    {
	        char *cmd_start = has_command(debug_recv_buf, DEBUG_RECV_SIZE);

	        // if something is in recv buffer
	        if(cmd_start != NULL)
	        {
	            // print the buffer to LCD, truncating if it would not fit
	            memset(lcd_xmit_buf, 0, LCD_XMIT_SIZE);
	            snprintf(lcd_xmit_buf, LCD_XMIT_SIZE, "xstr 10,120,310,45,2,BLACK,YELLOW,0,0,1,\"%s\"", cmd_start);
	            lcd_say(lcd_xmit_buf);

	            // eeprom read: "r <addr>"
	            if(strncmp(cmd_start, "r ", 2) == 0)
	            {
	                int16_t arg1_pos = goto_next_arg(0, cmd_start);
	                uint16_t addr = atoi(cmd_start + arg1_pos);
	                uint8_t data = read_eeprom(addr);
	                memset(debug_xmit_buf, 0, DEBUG_XMIT_SIZE);
	                sprintf(debug_xmit_buf, "rd:a=%d,d=%d\r\n", addr, data);
	                HAL_UART_Transmit(&huart2, debug_xmit_buf, strlen(debug_xmit_buf), 1000);
	            }

	            // eeprom write: "w <addr> <data>", replies with the value read back
	            if(strncmp(cmd_start, "w ", 2) == 0)
	            {
	                int16_t arg1_pos = goto_next_arg(0, cmd_start);
	                uint16_t addr = atoi(cmd_start + arg1_pos);
	                int16_t arg2_pos = goto_next_arg(arg1_pos, cmd_start);
	                uint8_t data = atoi(cmd_start + arg2_pos);
	                write_eeprom(addr, data);
	                uint8_t data_readback = read_eeprom(addr);
	                memset(debug_xmit_buf, 0, DEBUG_XMIT_SIZE);
	                sprintf(debug_xmit_buf, "wr:a=%d,d=%d\r\n", addr, data_readback);
	                HAL_UART_Transmit(&huart2, debug_xmit_buf, strlen(debug_xmit_buf), 1000);
	            }

	            // zeroing eeprom: announce first, then clear the 16KB ROM region
	            if(strncmp(cmd_start, "z\r\n", 3) == 0)
	            {
	                HAL_UART_Transmit(&huart2, "zeroing..\r\n", strlen("zeroing..\r\n"), 1000);
	                for (int i = 0; i < 0x4000; i++)
	                    write_eeprom(i, 0);
	                HAL_UART_Transmit(&huart2, "zero complete\r\n", strlen("zero complete\r\n"), 1000);
	            }

	            // re-arm the receiver for the next command
	            memset(debug_recv_buf, 0, DEBUG_RECV_SIZE);
	            HAL_UART_Receive_IT(&huart2, debug_recv_buf, DEBUG_RECV_SIZE-1);
	        }
	    }
	}

With that done, time to focus on the PC side. I need a Z80 cross assembler, a debugger, and some way to upload my program. The assembler I picked is zmac, a long-lived and feature-rich cross assembler, and open source too. It comes with a Windows executable, but you can compile it from source yourself. The debugger is z80sim; I downloaded it but haven’t had a chance to use it because all my programs are really simple right now. Anyway, that only leaves the matter of uploading the program. I wrote a simple and rather crappy Python script for it:

	import sys
	import serial

	# usage: python FAP_uploader.py <serial port> <binary file>
	ser = serial.Serial(sys.argv[1], 115200, timeout=0.5)
	print("connected")

	def read_eep(addr):
	    # keep asking until a well-formed "rd:a=<addr>,d=<data>" reply arrives
	    while 1:
	        print("reading addr " + str(addr) + "...")
	        ser.write(('r ' + str(addr) + '\r\n').encode())
	        result = ser.readline().decode().replace('\r\n', '')
	        try:
	            if result.startswith("rd:"):
	                result = result[3:].split(',')  # drop the "rd:" prefix
	                if int(result[0].split("a=")[1]) != addr:
	                    print("eepread: address mismatch")
	                    continue
	                return int(result[1].split("d=")[1])
	        except Exception as e:
	            print("--------------- read exception -------------")
	            print(e)
	            print("----------")
	            continue
	        print("read timeout, retrying..")

	def write_eep(addr, data):
	    # keep writing until the controller echoes back the same address and data
	    while 1:
	        print("writing addr " + str(addr) + " " + str(data))
	        ser.write(('w ' + str(addr) + " " + str(data) + '\r\n').encode())
	        result = ser.readline().decode().replace('\r\n', '')
	        try:
	            if result.startswith("wr:"):
	                result = result[3:].split(',')  # drop the "wr:" prefix
	                if int(result[0].split("a=")[1]) != addr:
	                    print("eepwrite: address mismatch")
	                    continue
	                if int(result[1].split("d=")[1]) != data:
	                    print("eepwrite: data mismatch")
	                    continue
	                return
	        except Exception as e:
	            print("--------------- write exception -------------")
	            print(e)
	            print("----------")
	            continue
	        print("write timeout, retrying..")

	def zero_eeprom():
	    print("zeroing EEPROM...")
	    ser.write(('z\r\n').encode())
	    # zeroing takes a while, so poll until the controller reports completion
	    while 1:
	        result = ser.readline().decode().replace('\r\n', '')
	        if "zero complete" in result:
	            print("done")
	            return

	zero_eeprom()
	start_addr = 0
	with open(sys.argv[2], "rb") as f:
	    while 1:
	        byte = f.read(1)
	        if byte == b"":
	            break
	        print(byte[0])
	        write_eep(start_addr, byte[0])
	        start_addr += 1

It basically goes through each byte in a binary file and sends a write command for each, while making sure it has been written properly. It’s pretty ugly but it works.

Now we have everything set up, time to write a proper program, and what a program it is!

	org 0x0
	start jmp start

The program starts at address 0x0 and just jumps to itself endlessly. The assembled program is 3 bytes: C3 00 00. C3 is the opcode for an absolute jump (JP in Z80 mnemonics, JMP in the 8080-style mnemonic used here), and 00 00 is the 16-bit address it’s supposed to jump to, low byte first. If everything goes well, we should see the address bus go in a loop forever.
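
To make the byte layout concrete, here is a tiny Python sketch that hand-assembles the jump. The helper `encode_jp` is a made-up name for illustration, not something zmac or this project provides:

```python
def encode_jp(target):
    """Encode an absolute Z80 jump (opcode 0xC3) to a 16-bit target address."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("target must fit in 16 bits")
    # The operand is stored little-endian: low byte first, then high byte.
    return bytes([0xC3, target & 0xFF, target >> 8])


print(encode_jp(0x0000).hex(" "))  # the 3-byte program above
print(encode_jp(0x1234).hex(" "))  # a non-zero target shows the byte order
```

With a target of 0 both operand bytes are zero, so the byte order only becomes visible with a target like 0x1234.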

Here you can see me uploading the program wirelessly to FAP via bluetooth, resetting the STM32 controller while holding the second button to enter execute mode, and then taking a few single clock steps. You can see the address go from 0x0 to 0x1, then 0x2, and then back to 0x0 again. Also notice the C3 and 00 on the data bus as the Z80 fetches instructions. After that I let it run at full speed and you can see the address looping. Success!

Now that FAP is going, I wrote a slightly more complicated program for it to test writing to SRAM:

	org 0x0 ; program starts at address 0x0
	xor A ; clear A
	ld sp, 0x7fff ; set up stack
	start add 0x10 ; add 0x10 to A
	push AF ; push AF to stack
	jmp start

After clearing A and setting up the stack pointer, it adds 0x10 to A, then pushes the content of AF to the stack every loop. If everything goes well I can let it run for a while, reset the STM32 controller to go into program mode, and read back the values in SRAM, which should be 0x10, 0x20, 0x30 etc. This program is also a whopping 10 bytes, so it must be at least 3 times better than the last one, right?

I upload the program, let it run for a while, and read back the beginning of the stack. You can see the address bus looping, and the SRAM address flashing by in each loop. Now let’s examine the content:

0x7fff: 0x0
0x7ffe: 0x10
0x7ffd: 0x0
0x7ffc: 0x20
0x7ffb: 0x20
0x7ffa: 0x30
0x7ff9: 0x20
0x7ff8: 0x40
0x7ff7: 0x0
0x7ff6: 0x50
0x7ff5: 0x0
0x7ff4: 0x60
0x7ff3: 0x20
0x7ff2: 0x70
0x7ff1: 0x20

At first glance it might look like I cocked it up, since it’s not strictly increasing as the code suggests. However, the push instruction pushes both A and F to the stack, A first and then F, each at the next lower address. Since SP starts at 0x7fff, the value of A ends up at even addresses, i.e. 0x7ffe, 0x7ffc, 0x7ffa etc., and 0x7fff itself is never written. And if you look at the numbers you can see they are indeed increasing. The odd addresses hold the contents of the flag register. It should be 0 for the first few loops, but some of them are 0x20. 0x20 is flag bit 5 set, which is actually an undocumented flag that is a copy of bit 5 of the result, and bit 5 of A is indeed set whenever A is 0x20, 0x30, 0x60 or 0x70. That is exactly what happened here, so after all, everything is alright.
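
If you want to convince yourself without the hardware, the loop can be modelled in a few lines of Python. This is a simplified sketch of just the two instructions involved, not a real Z80 emulator, and the function names are made up for illustration:

```python
def add_flags(a, n):
    """Return (result, F) for 'ADD A, n', including undocumented bits 3 and 5."""
    total = a + n
    result = total & 0xFF
    f = 0
    f |= result & 0x80                                      # S: sign
    f |= 0x40 if result == 0 else 0                         # Z: zero
    f |= result & 0x28                                      # copies of result bits 5 and 3
    f |= 0x10 if (a & 0x0F) + (n & 0x0F) > 0x0F else 0      # H: half carry
    f |= 0x04 if (~(a ^ n) & (a ^ result) & 0x80) else 0    # P/V: signed overflow
    f |= 0x01 if total > 0xFF else 0                        # C: carry
    return result, f                                        # N stays 0 after ADD


def run_stack_test(loops, sp=0x7FFF):
    """Model 'add 0x10' followed by 'push AF', repeated; returns the written memory."""
    mem = {}
    a = 0                      # xor A
    for _ in range(loops):
        a, f = add_flags(a, 0x10)
        sp = (sp - 1) & 0xFFFF
        mem[sp] = a            # high byte (A) goes to the higher address
        sp = (sp - 1) & 0xFFFF
        mem[sp] = f            # low byte (F) goes below it
    return mem


for addr, value in sorted(run_stack_test(7).items(), reverse=True):
    print(f"0x{addr:04x}: 0x{value:x}")
```

Seven loops of this model write the same A and F values as the dump above for 0x7ffe down to 0x7ff1, and never touch 0x7fff.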

And there you have it! FAP, built from 8 ancient chips and a spaghetti of wires, is actually running proper programs that I wrote for it. It’s a big milestone in this build, and as you’d expect I’m thrilled by it. Next, I’m going to put the F in FAP as I develop a VGA graphics card for the Z80 computer in the next post, stay tuned!

Memories of FAP

Next Post: Programming FAP

Previous Post: Bus Board, CPU Board, and Freerunning FAP

Project Github Repo

Now that FAP’s Z80 is free running with a hardwired NOP, the next step is to add some memory so I can store programs in it and have them executed by the CPU. So in this part of the build log I’m going to add some ROM and RAM to my computer.

I purchased a couple of AT28C256 EEPROMs as well as HM62256 static RAMs. Both are 32KB chips in DIP, and fairly often used in homebrew computer builds. But before that, there is some housekeeping to do. Since I’m going to make another card with both memories on it, I need to add another slot on the backplane. And because the backplane strips are 5 holes wide, which is too close to put another card right next to the CPU card, I bridged it with male double row headers and some jumpers.

From top to bottom: CPU card, bridge, memory card slot.

Now comes the difficult part. Before I start building I need to plan out the memory map of FAP; only when that is done can I start designing the memory decoding logic and all the rest of it. After some reading I decided to keep it simple, and the memory map of FAP will be like this: 16KB ROM from address 0x0 to 0x3FFF, and 16KB RAM from 0x4000 to 0x7FFF. Since the Z80 can address 64KB of memory directly, that leaves me with 32KB of unmapped memory, which I can use for VRAM or some other memory mapped IO device. I guess 16KB each of RAM and ROM is plenty for now; if I need more I can make another card.

Now comes the memory decoding. I’m going to use address lines for the chip select signals, and MREQ’, RD’ and WR’ for OE’ and WE’. For chip select, notice how the two most significant bits of the address select the memory. When both A15 and A14 are 0, the address is between 0x0000 and 0x3FFF, which selects the ROM. Since both A15 and A14 have to be low for CS’ on the ROM chip to be low, it’s easy to see a simple OR gate is enough. Memory read is enabled when both MREQ’ and RD’ are low, so again, an OR gate. Memory write is exactly the same, active when both MREQ’ and WR’ are low, another OR gate. 3 OR gates, perfect for one 74HC32, which I have. The OE’ and WE’ signals are shared between the two chips, while each chip gets its own CS’. And since we’re using only half of each chip, A14 on each chip is tied to ground.

Now for selecting RAM, its CS’ needs to be low when A15 is low and A14 is high, which means inverted A15 NAND A14, I can use a 74HC00 for that.
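
The decoding described above can be sketched as a truth function in Python. All signals are active-low as on the real bus; the function name is mine, and using a spare NAND gate as the A15 inverter is my assumption about how the inversion is done:

```python
def decode(addr, mreq_n, rd_n, wr_n):
    """Return active-low (rom_cs_n, ram_cs_n, oe_n, we_n) for a 16-bit address."""
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    rom_cs_n = a15 | a14              # OR gate: low only when A15 = A14 = 0
    not_a15 = 1 - a15                 # assumed: a NAND gate wired as an inverter
    ram_cs_n = 1 - (not_a15 & a14)    # NAND gate: low only when A15 = 0 and A14 = 1
    oe_n = mreq_n | rd_n              # OR gate: low only during a memory read
    we_n = mreq_n | wr_n              # OR gate: low only during a memory write
    return rom_cs_n, ram_cs_n, oe_n, we_n


# Walk the address space in 0x4000 steps during a memory read cycle.
for addr in (0x0000, 0x4000, 0x8000, 0xC000):
    rom_cs_n, ram_cs_n, _, _ = decode(addr, mreq_n=0, rd_n=0, wr_n=1)
    chip = "ROM" if rom_cs_n == 0 else "RAM" if ram_cs_n == 0 else "nothing"
    print(f"0x{addr:04X} selects {chip}")
```

Because the two chip selects depend only on A15 and A14, the upper 32KB selects neither chip, which is what leaves it free for VRAM or other devices later.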

So, below is the schematic for FAP’s memory card, only 4 chips, not bad.

Now comes the question of how to program the EEPROM. Fortunately it’s largely not a problem because I can just pull BUSREQ’ low and clock the CPU until BUSACK’ is low. This puts the address, data and control lines into high-Z and kicks the CPU off the bus, then I just write to the EEPROM by driving the bus with the STM32 board. This is much simpler than Veronica’s solution with a dedicated programmer and bus arbitration switches. Everything is done in the STM32F1 board, which is the big advantage I talked about in my first post.

Time to build the board on a breadboard first and test it. There’s not much to write about the building process, apart from it consuming my entire collection of jumpers. I also put two LEDs on the CS’ lines so I can see which chip is being selected. So does it work? Look at the video below.

In this demo I simply put some addresses on the bus in 0x1000 increments. You can see the ROM chip select’s LED light up when the address is between 0x0 and 0x3FFF, and the RAM’s LED light up when the address is between 0x4000 and 0x7FFF.

Next up is trying to read and write some values in the ROM. It’s relatively slow to write to EEPROM: each byte may take up to 10ms, which means it’ll take a whopping 160 seconds to write 16KB into it. It does support a page write mode that writes 64 bytes at the same time, but that’s not implemented for now.
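
For a rough idea of what page write would buy, here is the arithmetic as a Python sketch. It assumes the worst case 10ms per write cycle, and that a full 64-byte page costs a single write cycle; the names are made up for illustration:

```python
WRITE_CYCLE_S = 0.010      # worst case write cycle time quoted above
ROM_BYTES = 16 * 1024      # the 16KB ROM region
PAGE_BYTES = 64            # page size in page write mode


def byte_write_seconds(n_bytes=ROM_BYTES):
    """One write cycle per byte."""
    return n_bytes * WRITE_CYCLE_S


def page_write_seconds(n_bytes=ROM_BYTES, page=PAGE_BYTES):
    """One write cycle per page (assumed), rounding partial pages up."""
    pages = -(-n_bytes // page)  # ceiling division
    return pages * WRITE_CYCLE_S


print(f"byte writes: {byte_write_seconds():.2f}s")
print(f"page writes: {page_write_seconds():.2f}s")
```

Under those assumptions page mode would cut a full 16KB upload from minutes to seconds, so it is worth implementing once programs get bigger.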

As reading EEPROM is fast while writing is slow, I wrote my EEPROM write function to check the existing data first; if it’s the same, a costly write can be skipped. In the video below I’m writing each of the first 256 bytes of the ROM with its own address, as in writing 0x1 to address 0x1, 0x2 to address 0x2 etc. In the first pass you can see it’s writing into the EEPROM; in the second pass the value is already the same so it’s skipping every single write.
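
The read-before-write idea looks roughly like this in Python. `FakeEeprom` is a stand-in I made up so the sketch runs without hardware; the real logic lives in the STM32 firmware and may differ in detail:

```python
class FakeEeprom:
    """Stand-in for the real chip that counts how many slow writes happen."""

    def __init__(self, size=0x4000):
        self.cells = bytearray([0xFF] * size)  # assume an erased chip reads 0xFF
        self.slow_writes = 0

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, data):
        self.cells[addr] = data
        self.slow_writes += 1


def write_if_changed(eep, addr, data):
    """Skip the costly write when the cell already holds the value."""
    if eep.read(addr) == data:
        return False
    eep.write(addr, data)
    return True


eep = FakeEeprom()
for pass_number in (1, 2):
    written = sum(write_if_changed(eep, addr, addr) for addr in range(256))
    print(f"pass {pass_number}: wrote {written} bytes")
```

In this model the second pass writes nothing, matching what the video shows. The first pass also skips address 0xFF, but only because the fake chip starts out filled with 0xFF.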

So far so good. Things went a bit wrong when trying to write to SRAM though. The write seemed to work, but when read back the data was close to, but not exactly the same as, what had been written. I thought it was the timing difference between the ROM and RAM write. After almost an hour of tinkering with the code to no avail, I decided to check the connections and discovered that I had wired two of the SRAM’s address lines wrong. I should have known, since as you can see it’s such a mess. After correcting the wiring, it worked like a charm, which means it’s time to move on to the actual card.

Bit of soldering later, I put ROM and its decoding logic on the card.

FAP memory card with just ROM and its decoding logic

Next step is RAM. However, if you look at the datasheets, you’ll realize the pinouts of the two chips are exactly the same, so instead of soldering another 28 wires I decided to simply piggyback the RAM on top of the ROM, a technique Quinn also used in her Veronica.

By soldering every pin of the two chips together apart from the chip select, I can put the piggyback chips back in the socket and only hook up the cs line for the new RAM chip. Below is the completed memory card.

To test it, I loaded up the ROM with 0 and let it run. It was exactly the same as the free run last time, so I didn’t take a video, but it worked nevertheless!

Now that my computer has its own memory, I can finally upload some programs into it and let it actually do stuff. However, without any means of input or output, FAP is going to be a hard sell for anyone else. That would be my next step, maybe making a VGA video card, or a UART, or even reading a keyboard! Stay tuned to find out on the next episode of the Adventure of FAP!

As usual, you can find the up-to-date resources on the Github repo of this project.

Bus Board, CPU Board, and Freerunning FAP

Project Github Repo

Previous Post: Long Time Coming: Building the FAP Z80 Computer

Next Post: Memories of FAP

Finally we’re going to actually start building the computer! I’m probably going to compare what I’m doing to what Quinn Dunki did with Veronica rather frequently, since a large part of this project’s inspiration comes from hers.

Let’s start from the very basics, the construction material. Quinn etches, drills and solders her own circuit boards. I don’t have such luxury here, so I’m going to make mine out of prototyping boards. The first that comes to mind is perfboard; I already have a couple of them lying around and thought it would be a good idea to use them. As it turned out 3 hours later: no, just say no.

If you don’t know about perfboard, it’s a regular PCB with drilled holes in a square grid of 0.1 inch spacing, with each hole electrically isolated from the others. That means you have to either wire wrap, which I don’t have the tools for, or use point to point soldering, which means soldering every single connection using 30AWG wires, which are hard to strip and position because they won’t stay in place. It’s alright for smaller projects, but if I did the whole thing on perfboard I might well be dead before I even got to the F part of FAP.

A bit of internet search later, I got some new prototyping boards that I’ll be using to build the computer, first of all is the large “motherboard” or “backplane” where all the modules will be plugged in. It’s a board with 5-hole conductive strips grouped together, so it’s sort of like a stripboard but not really. This board is around 6 x 8 inches, enough for a couple of expansions later on.

Now that we have the motherboard, next up are the boards that our modules use. For that I bought what some may call “solderable breadboards”, which are basically PCBs with breadboard patterns. This is much better than perfboard because you can solder wires to the holes adjacent to component pins, and it already has power buses available. This board is 5.5 x 3.7 inches, a perfect fit for plugging into the backplane vertically.

Now a little bit about bus design, Quinn uses ISA connectors for her Veronica with 31 pins available, it works when you make your own PCB, but obviously not here. So I’m going to go with the tried and true pin headers. Double row female pin headers on the backplane, double row right angle male pin headers on the modules, and it’ll plug right in. And because the two rows are connected on the module card, it will bridge the gap between 5-hole groups on the backplane as well. What’s more, pin header comes 40 pins wide, a nice number for a system bus, and I can add more later if I want.

With the basics done, it’s time to design the bus pinout and the CPU board. Because it’s my first time doing this, I’m going to play it safe and put almost all CPU signals onto the bus. Pin 1 will be 5V, pins 2 to 9 are DATA[0:7], pins 10 to 25 are ADDRESS[0:15], pins 26 to 37 are every single CPU control signal apart from REFRESH and HALT (we’re not using dynamic memories, so REFRESH isn’t needed), and pin 40 is GND. This leaves 2 pins free on the 40 pin bus, and I have the option to make it even wider by just adding more pin headers, pretty good.
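
Written out as a table in Python, the pinout looks like this. The pin ranges come from the paragraph above, but the order of the 12 control signals within pins 26 to 37, and counting CLK among them, are my assumptions; check the schematic in the repo for the real order:

```python
def build_bus_pinout():
    """Map bus pin number (1..40) to signal name."""
    pins = {1: "5V", 40: "GND"}
    for i in range(8):
        pins[2 + i] = f"D{i}"        # pins 2..9: DATA[0:7]
    for i in range(16):
        pins[10 + i] = f"A{i}"       # pins 10..25: ADDRESS[0:15]
    # Assumed order; everything except REFRESH and HALT.
    control = ["M1", "MREQ", "IORQ", "RD", "WR", "WAIT",
               "INT", "NMI", "RESET", "BUSREQ", "BUSACK", "CLK"]
    for i, name in enumerate(control):
        pins[26 + i] = name          # pins 26..37: control signals
    return pins


pinout = build_bus_pinout()
free_pins = [pin for pin in range(1, 41) if pin not in pinout]
print("free pins:", free_pins)
```

The two unassigned pins fall out as 38 and 39, right next to GND.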

For the CPU board itself, it’s pretty straightforward. ADDRESS and DATA lines are buffered with 3 74HC245s, while control signals are connected to the bus directly; if it turns out those need buffering too I can add it later. Since ADDRESS is output only, the DIR pin on the ’245 is tied to GND so the B side is the input. As for the bidirectional DATA lines, I connected the DIR pin to WR’. When WR’ is active (low), DATA lines are outputs and data in the 245 goes from B to A; when WR’ is inactive (high), DATA lines are inputs and data in the 245 goes from A to B. The output enables of the 245s are connected to an inverted BUSACK’ signal, so when BUSACK’ is low the output enables are high and ADDR and DATA go into high impedance, kicking the CPU off the bus. Below is the schematic.

Time to start building. Here are all the chips in place with all the power connections, as well as the DATA line from the buffer to the header. I had to cut a single line off the bottom of the board so the male header actually sticks out.

One hour later, all the connections are done, it looks like a mess, but that’s what happens with this many signals.

After making sure with the continuity tester that there are no shorts between the power buses and that all the connections are correct, the CPU board is done. The logical next thing to do is test it. It’s not as easy as it sounds though: the CPU needs a clock signal, data inputs, and 6 control signals, and I need a way to see what’s on the ADDRESS and DATA buses to see what the CPU is doing. We need some way to do all of that.

What Quinn did was design a dedicated clock circuit with a monostable 555 for the single-step clock, a crystal oscillator for the full speed clock, and an SPDT switch to select between the two. And for visualizing the bus content, there is the HexOut display module. It’s fun and all designing all those support circuits, but I’m going to skip all that trouble and use a microcontroller to control the FAP in the early stage of the development. Since the microcontroller is programmable I can clock the CPU as fast or slow as I want, put anything I want on the DATA bus, see what’s on the ADDRESS bus, achieve reliable power-on resets, monitor and change all the control signals, and it also acts as a USB power supply. It’s much more robust and customizable than using dedicated circuits, takes much less space, and is probably cheaper in the end too.

The microcontroller in question is the STM32F103VCT6 on its Minimum System Development Board from ebay. This little beast of a uC has 5 16-bit GPIO ports, a whopping total of 80 GPIO pins, plenty for FAP. It also has 256KB of flash memory and 64KB of SRAM, which means I can fit the entire addressable space of the Z80 inside the flash memory of this uC 4 times if I want. It also runs at 72MHz, 18 times faster than the Z80, and has a million peripherals that I’m not going to use in this project. And the best thing about it is that it doesn’t have all the bullshit components tacked on like they do in the Discovery boards ST makes themselves, just a simple minimal working system with all the pins broken out, and nothing else. How much for this? $6.50. If not for Arduino’s extensive libraries, there would be almost no reason left to use Arduino now that we have this. All in all, not bad at all.

I thought I would finish the backplane board before testing the CPU board. The backplane is pretty simple too: all the bus signals go into the microcontroller, there are a couple of buttons for reset and clock control, and two UARTs will be used, one for debug output and one for the serial LCD. The entire PORTE will be used for ADDRESS, the lower 8 bits of PORTD for DATA, PORTC for control signals, PORTB for buttons, and PORTA for the 2 UARTs.

The construction is similar to the CPU board, just tons of wires to solder, which isn’t very interesting. I needed to cut across two traces so that the double row headers on the controller board don’t get connected together.

Finished backplane, the 40-pin double-row header is for CPU card, the controller goes to the lower right, and buttons are for reset, single step, and run/stop.

Well not actually quite finished, I forgot to add the two UART headers. I used a bluetooth module, so I don’t have to attach another wire to it. Here is the really finished board:

Now it’s finally time to test the CPU board. We want the CPU to execute some instructions, but since we don’t actually have any memories, I’ll start with a simple one. What I’ll do is tie the DATA lines to GND and all the control inputs to VCC, hold RESET low for at least 3 clock cycles to properly reset it, then clock it normally. The CPU will start executing from address 0x0 and fetch an instruction from the DATA bus, which will be 0x0 since I tied all of the lines to GND. The Z80 will therefore execute the 0x0 instruction, which happens to be NOP, which doesn’t do anything at all. It will simply go to the next address and fetch another instruction, which is still 0x0, another NOP. In the end it just keeps going to the next address, and we can see the address increase on the ADDRESS bus. This is called free running the CPU, a simple way to make sure my CPU is working.

I set the lower 8 bits of PORTD, which are connected to the DATA bus, to output 0, and PORTE, connected to the ADDRESS bus, as input. The contents of the ADDRESS and DATA buses are displayed on the LCD screen, and the CPU is clocked by pressing a button, or set to run continuously at the press of another button.

Well does it work? See for yourself:

As you can see, I started by single stepping the clock; each time I press the button the clock advances 1 cycle. It takes 4 cycles for the address to advance by 1, which is in line with what the datasheet states.

After letting it free run for a while, you might notice the address is fluctuating instead of simply going up. This is due to the memory refresh operation that is built into the Z80 to make it easier to use with cheaper DRAM. I stopped the clock to single step again, and you can see the address changes every two clock cycles now, just like the datasheet says. In the first two cycles it is the PC, and in the last two it is the refresh address.
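
Here is a small Python model of what the address bus does while free running on NOPs. It is a simplified sketch under my own assumptions (PC, I and R all start at 0 after reset, and the refresh counter R only counts in its lower 7 bits), not a cycle-accurate simulation:

```python
def free_run(n_instructions, pc=0, i_reg=0, r_reg=0):
    """Return (t_state, address) pairs for back-to-back NOP fetches."""
    trace = []
    for _ in range(n_instructions):
        refresh = (i_reg << 8) | r_reg
        # First two clocks show the PC, last two show the refresh address.
        trace += [("T1", pc), ("T2", pc), ("T3", refresh), ("T4", refresh)]
        pc = (pc + 1) & 0xFFFF
        r_reg = (r_reg & 0x80) | ((r_reg + 1) & 0x7F)  # 7-bit refresh counter
    return trace


for t_state, addr in free_run(130)[-8:]:
    print(t_state, f"0x{addr:04X}")
```

In this model the refresh address tracks the PC for the first 128 fetches, so the address appears to simply count up at first. Only once the PC passes 0x7F does the 7-bit counter wrap and the bus start jumping between two different values, which would explain why the fluctuation only shows up after it has run for a while.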

After thinking long and hard about the design, my FAP finally lives! All is well, I’m glad it works after all the work. Next step is to add some ROM and RAM, so it can actually execute some meaningful instructions. Stay tuned to find out more!

Again, you can find the up-to-date resources on the Github repo of this project.
