2016-09-15

Installing Linux on a SolidRun Clearfog+eMMC board

A few months ago Marvell and SolidRun kindly offered me a Clearfog-A1 (now called Clearfog Pro) board. It is a nice development and/or production board that adds impressive connectivity to the fairly powerful Armada 388 SoC. Like its predecessors, this SoC contains two Cortex-A9 CPU cores and high-performance I/O, but it provides 3 GigE ports. It reminds me of the Armada XP but consumes much less power and is significantly faster. I've been keeping that board in my bag with all my stuff and using it for kernel testing and various performance tests on ARM.

While browsing the SolidRun web site recently, I noticed that SolidRun had made a new board half the size that keeps most of the connectivity (they removed one PCIe slot and the Ethernet switch). The board is amazingly appealing as a small network development board, being fanless, affordable, and supporting a wide voltage range. So I ordered one. I decided to pick the eMMC version, which would save me from losing the micro-SD card all the time.

When I received it I was quite disappointed. It wouldn't boot. No message, nothing. I swapped the CPU modules between my two boards and found that the CPU module was the culprit. While leaving the board powered on and unattended, I noticed a message "Trying UART" which made me think it wouldn't boot from my micro-SD card and was trying the UART port instead. After a few exchanges with SolidRun's support, they confirmed that the same lines are used for the eMMC and the micro-SD, so the CPU cannot use the micro-SD at all when eMMC is soldered on the board. Not fun at all. I was disappointed that no bootloader was flashed on the board before it was shipped, but in their defence, the board is very new and apparently still being worked on.

But this "Trying UART" message reminded me of the Mirabox. Thus I thought I would try the same procedure I used a few years ago to unbrick it. I started by trying all 32 possible combinations of the SW1 DIP switches to find out which ones boot from which device. Some combinations never returned anything, but the apparently valid ones are reported below. It's a very long and tedious process because most of the time the messages only appear after a failed attempt. Values are indicated with switch 1 on the left, switch 5 on the right, OFF = 0, ON = 1.


First value  Last value  Device attempted to boot from
00000        00101       SPI flash, not working
00010        -           SPI flash, working!
00110        -           MMC but unusable ("card doesn't respond to voltage select")
00111        -           MMC, working!
01001        -           UART
01010        -           NOR
01100        01101       NOR
10010        10111       NAND
11000        -           NOR
11010        11011       PEX0

Thus I've set the board to value "01001" to enable booting from the UART. And now here's how to proceed.

What you need

This howto assumes that you have a full-featured, networked Linux-based machine with superuser privileges, a properly working ARM toolchain built with soft float (ARMv5 will work fine), Git, a micro USB cable, the usb-serial driver supporting the FTDI chips, a terminal client like "screen", "minicom", "cu", a TFTP server and a network cable.

Build the U-Boot boot loader

Let's first download the Marvell-enabled U-Boot boot loader :
$ git clone https://github.com/MarvellEmbeddedProcessors/u-boot-marvell
$ cd u-boot-marvell
$ git checkout u-boot-2013.01-15t1-clearfog

Pick a soft-float toolchain. Here we use an ARMv5 toolchain. If your toolchain was built with hard-float only support, the build will fail due to some unexpected VFP registers at the end. Configure and build the boot loader for the clearfog board :
$ make CROSS_COMPILE=/toolchain_prefix armada_38x_clearfog_config
$ make CROSS_COMPILE=/toolchain_prefix -j 8 u-boot.mmc
...
 Ext. headers = 1, Header size = 79360 bytes Hdr-to-Img gap = 0 bytes
New image size = 0xd6350[877392] Source image size = 0xd634c[877388]
====>>>> u-boot.mmc was created

The image now needs to be repackaged for U-Boot for booting over UART and from the on-board SPI flash. It requires changing the first byte of the image header and recomputing the image signature. The "doimage" utility does it automatically for us :
$ ./tools/marvell/doimage -T uart -D 0x0 -E 0x0 -G tools/marvell/bin_hdr/bin_hdr.uart.bin u-boot.bin u-boot.uart
$ ./tools/marvell/doimage -T flash -D 0x0 -E 0x0 -G tools/marvell/bin_hdr/bin_hdr.bin u-boot.bin u-boot.flash
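
As a quick sanity check after repackaging, the first byte of each image can be inspected from the shell. This is only a sketch: the helper name first_byte is hypothetical, and the only value this article confirms is "ae" for the MMC image (visible in the memory dump further down). The bytes used by the UART and flash variants are not stated here, so the point is simply to confirm that the three outputs differ.

```shell
# Print the first byte of a file as two hex digits.
# (helper name is hypothetical, not part of the Marvell tools)
first_byte() {
    od -An -tx1 -N1 "$1" | tr -d ' \n'
}

# Show the boot-source byte of each image that exists in the current directory.
for img in u-boot.mmc u-boot.uart u-boot.flash; do
    if [ -f "$img" ]; then
        printf '%s: %s\n' "$img" "$(first_byte "$img")"
    fi
done
```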

Configure the board to boot from the UART

As indicated in the table above, the SW1 DIP switches have to be set to 01001 or OFF,ON,OFF,OFF,ON. You can use a toothpick for this, it's better than a pen and will not leave ink on the switches. If you don't do this, it will still work but will take ages because the BootROM code will first try to boot from the configured devices. If you run a terminal emulator on the serial port, you should see the following appear after a few seconds when powering the board up :
BootROM - 1.73

Trying Uart

If it doesn't appear, it may indicate that the SW1 DIP switches are not properly set and that the board is trying to boot from another device. Don't worry, it will eventually try the UART after it times out on the other devices. It can sometimes take up to 3-4 minutes, which is why it's best to configure the switches properly.

Upload U-Boot to the board

Now connect the board to your development machine via the micro-USB connector. A usb-serial device should appear :
# lsusb
Bus 001 Device 101: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC

The "usbserial" and "ftdi_sio" modules usually need to be loaded to support this board.
In order to upload the image, it is necessary to send a "magic" header to the serial port (assumed to be /dev/ttyUSB0 here) and wait for a NAK (0x15) from the board, which is how an Xmodem receiver signals that it is ready, then start to send the data using the Xmodem protocol. Since we don't have any flow control here and we don't know when the header will be detected, we send it in a loop over the serial port until we receive that byte, indicating the board is waiting for us to upload the boot loader :
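# (while :; do
    # keep sending the 8-byte magic pattern until the BootROM notices it
    printf "\xbb\x11\x22\x33\x44\x55\x66\x77"
    # if a byte is pending on the port, read it and convert it to hex
    if read -t 0 && read -rN1 && [ -n "$REPLY" ]; then
      set -- $(echo -n "$REPLY" | od -tx1 -An);
      # 0x15 means the board is ready for the Xmodem transfer
      [ "$1" = "15" ] && break
    fi
  done) </dev/ttyUSB0 >/dev/ttyUSB0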

Once this command is started, connect the power to the board. After about 5 seconds, the loop above returns to the shell. It means the board is ready to receive the boot loader.
If you make a mistake and have to try again, do not worry, just press the reset button (close to the power connector), and run the script above again.
We can then use the "sx" utility to send the UART image of the boot loader using Xmodem. It will take about 90 seconds to upload about 950 kB at 115200 bps :
# sx u-boot.uart </dev/ttyUSB0 >/dev/ttyUSB0
Sending u-boot.uart, 7464 blocks: Give your local XMODEM receive command now. Xmodem sectors/kbytes sent: 2556/319k
After the file is transferred, the board immediately boots from this boot loader sitting in memory. Since there's nothing installed on the eMMC, the boot loader cannot boot and will stop at the prompt, waiting for your commands.
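
The upload time quoted above can be cross-checked with a little arithmetic. The sketch below assumes classic Xmodem framing (128-byte blocks plus 4 bytes of overhead in checksum mode), 10 bits on the wire per byte (8N1), and no latency between blocks, so it gives a lower bound. The function name is hypothetical.

```shell
# Estimate an Xmodem transfer time in seconds (rounded up).
# Assumptions: 128-byte blocks, 4 framing bytes per block (checksum mode),
# 10 bits per byte on the wire (8N1), no per-block acknowledgement latency.
xmodem_seconds() {
    size=$1
    baud=$2
    blocks=$(( (size + 127) / 128 ))
    bits=$(( blocks * 132 * 10 ))
    echo $(( (bits + baud - 1) / baud ))
}

# 7464 blocks of 128 bytes, as reported by sx above, at 115200 bps
xmodem_seconds 955392 115200
```

This prints 86, which is consistent with the roughly 90 seconds observed once acknowledgement delays are added.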

Connect to the boot loader

Use your favorite client to connect to the boot loader via the serial port, for example here with "screen". Press Enter, you should see the "Marvell>>" prompt :
$ screen /dev/ttyUSB0 115200
Marvell>>

Now the board is alive and you control it. There are multiple possibilities from here, one of which consists in loading a kernel and an initrd via TFTP, booting the machine and completing the installation. However, the risk of getting it wrong is high, and having to reboot via the serial port again is painfully slow. The board also features an SPI flash, but U-Boot doesn't seem to be able to use it at the moment: every attempt results in a system freeze. So we'll flash the boot loader to the eMMC instead.

Flash the new boot loader to eMMC

Before having to connect many cables, we'll reuse the "sx" utility to send the boot loader over the serial port again, but the MMC version this time, that can be flashed. First, let's make U-Boot wait for a file to be sent over Xmodem :
Marvell>> loadx
## Switch baudrate to 115200  115200 bps and press ENTER ...
## Ready for binary (xmodem) download to 0x02000000 at 115200 bps...

Now you need to quit the terminal client. For "screen", it will be Ctrl-A, "\" then "y". Or "killall screen" from another window will do it fine. Then from the command line we send the MMC image :
# sx u-boot.mmc </dev/ttyUSB0 >/dev/ttyUSB0

At the end of the transfer, reconnect the terminal client, and verify that the image was properly transferred. "AE 01" at the beginning indicates an MMC image :
Marvell>> md.b 0x02000000
02000000: ae 01 00 00 98 69 0d 00 01 01 00 36 00 36 01 00    .....i.....6.6..
02000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 a6    ................
02000020: 02 01 50 35 02 00 00 00 5b 00 00 00 00 00 00 00    ..P5....[.......
02000030: ff 5f 2d e9 c1 02 00 fa 00 00 a0 e3 ff 9f bd e8    ._-.............

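The mmc command works on 512-byte blocks. Here we've uploaded almost 1 MB, or 2048 blocks. Note that U-Boot parses numbers as hexadecimal by default, so the 2048 below is read as 0x2048, i.e. 8264 blocks as shown in the output; writing a bit more than needed is harmless here :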
Marvell>> mmc write 0x02000000 0 2048
MMC write: dev # 0, block # 0, count 8264 ... 8264 blocks write: OK
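
Since U-Boot reads numeric arguments as hexadecimal by default, the exact block count is best computed on the development machine and passed with an explicit 0x prefix. A minimal sketch, assuming the 877392-byte image size reported by the build output earlier; the helper name blocks_hex is hypothetical.

```shell
# Number of 512-byte blocks needed to hold a given number of bytes,
# printed in hexadecimal so it can be pasted into a U-Boot command.
blocks_hex() {
    printf '0x%x\n' $(( ($1 + 511) / 512 ))
}

# Size of u-boot.mmc as reported by the build (877392 bytes)
blocks_hex 877392
```

This prints 0x6b2 (1714 blocks), so "mmc write 0x02000000 0 0x6b2" would cover the whole image without writing more than necessary.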

Verify the copy by reading to another address and dumping it again :
Marvell>> mmc read 0x03000000 0 2048
Marvell>> md.b 0x03000000
03000000: ae 01 00 00 98 69 0d 00 01 01 00 36 00 36 01 00    .....i.....6.6..
...

Configure the board to boot from eMMC

As indicated in the table above, the SW1 DIP switches have to be set to 00111 or OFF,OFF,ON,ON,ON.
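
The 0/1 notation used in the table maps mechanically to switch positions, which helps avoid mistakes when trying several combinations. A small sketch (the function name is hypothetical), following the convention stated earlier: switch 1 on the left, 0 = OFF, 1 = ON.

```shell
# Expand a 5-digit SW1 value (switch 1 first) into OFF/ON positions.
sw1_positions() {
    echo "$1" | sed -e 's/0/OFF,/g' -e 's/1/ON,/g' -e 's/,$//'
}

sw1_positions 00111   # eMMC boot
sw1_positions 01001   # UART boot
```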

Boot the board from eMMC

Start your terminal and press reset to boot the board from eMMC. It will display a lot of useful information and wait at the prompt. From now on the board can be rebooted as often as needed without going through this complex procedure :
$ screen /dev/ttyUSB0 115200
BootROM - 1.73

Booting from MMC
BootROM: Bad header at offset 00000000


General initialization - Version: 1.0.0
Detected Device ID 6828
High speed PHY - Version: 2.0

Init Customer board board SerDes lanes topology details:
 | Lane # | Speed|    Type     |
 ------------------------------|
 |   0    |  3   |  SATA0      |
 |   1    |  0   |  SGMII1     |
 |   2    |  5   |  PCIe1      |
 |   3    |  5   |  USB3 HOST1 |
 |   4    |  5   |  PCIe2      |
 |   5    |  0   |  SGMII2     |
 -------------------------------
PCIe, Idx 1: detected no link
PCIe, Idx 2: detected no link
High speed PHY - Ended Successfully
DDR3 Training Sequence - Ver TIP-1.39.0
DDR3 Training Sequence - Switching XBAR Window to FastPath Window 
DDR3 Training Sequence - Ended Successfully
BootROM: Image checksum verification PASSED

 __   __                      _ _
|  \/  | __ _ _ ____   _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| |  | | (_| | |   \ V /  __/ | |
|_|  |_|\__,_|_|    \_/ \___|_|_|
         _   _     ____              _
        | | | |   | __ )  ___   ___ | |_ 
        | | | |___|  _ \ / _ \ / _ \| __| 
        | |_| |___| |_) | (_) | (_) | |_ 
         \___/    |____/ \___/ \___/ \__| 
 ** LOADER **


U-Boot 2013.01-gc1d6f3e (Sep 28 2015 - 00:17:00) Marvell version: 2015_T1.0p11

Board: A38x-Customer-Board-1
SoC:   MV88F6828 Rev A0
       running 2 CPUs
CPU:   ARM Cortex A9 MPCore (Rev 1) LE
       CPU 0
       CPU    @ 1600 [MHz]
       L2     @ 800 [MHz]
       TClock @ 250 [MHz]
       DDR3    @ 800 [MHz]
       DDR3 32 Bit Width,FastPath Memory Access, DLB Enabled, ECC Disabled
DRAM:  1 GiB
MMC:   mv_sdh: 0
sdhci_transfer_data: Error detected in status(0x408000)!
PCI-e 0: Detected No Link.
PCI-e 1: Detected No Link.
USB2.0 0: Host Mode
USB3.0 0: Host Mode
USB3.0 1: Host Mode

Map:   Code:                    0x3fed1000:0x3ff974d4
       BSS:                     0x3ffef15c
       Stack:                   0x3f9c0f20
       Heap:                    0x3f9c1000:0x3fed1000
       U-Boot Environment:      0x000f0000:0x00100000 (MMC)

Board configuration detected:
Net:   
|  port  | Interface | PHY address  |
|--------|-----------|--------------|
| egiga0 |   RGMII   |     0x00     |
| egiga1 |   SGMII   |   In-Band    |
| egiga2 |   SGMII   |   In-Band    |
egiga0 [PRIME], egiga1, egiga2
Hit any key to stop autoboot:  0 
Marvell>> 

If nothing comes, wait a few minutes and you may see that it tried to boot from a different device, indicating you got the switches wrong. Just fix them and reboot.

Boot a Linux kernel

With a working boot loader, it becomes possible to boot a kernel from the network. Not everything is perfect yet but it's getting good. There are a few issues to be aware of that will save you a lot of time. First, regarding the network boot, U-Boot defaults to using port 0 (the closest to the USB port). But after any network transfer error, U-Boot automatically switches to the second port, which never works. So it's important to always start a sequence by forcing the active Ethernet port to port 0. Second, there is a bug in this version of U-Boot. It can load a kernel, an initrd and a device tree. But if you pass it an initrd, it ignores the device tree and the boot silently fails without any message on the serial port (since the kernel doesn't even know where to speak). The workaround is to always append the DTB to the zImage and never try to load the DTB from U-Boot.
Here is the boot sequence I used and which works for me :
Marvell>> setenv ethact egiga0
Marvell>> setenv bootargs 'root=/dev/ram0 rootfstype=squashfs console=ttyS0,115200n8'
Marvell>> kerneladdr=0x2000000 ; ramdiskaddr=0x6000000 ;
Marvell>> tftpboot ${kerneladdr} zImage-38x.dtb
Marvell>> tftpboot ${ramdiskaddr} uInitrd-clearfog
Marvell>> bootz ${kerneladdr} ${ramdiskaddr}

Your TFTP server will have to accept connections on IP address 10.4.50.38 from address 10.4.50.170.
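
Appending the DTB to the zImage, as recommended above, is a plain concatenation; the kernel must have been built with CONFIG_ARM_APPENDED_DTB=y for it to find the blob. A minimal sketch: the function name and the input file names in the example are assumptions, only the output name zImage-38x.dtb comes from the boot sequence above.

```shell
# Concatenate a zImage and a device tree blob into a single bootable file.
# Requires a kernel built with CONFIG_ARM_APPENDED_DTB=y.
#   $1 = zImage, $2 = DTB, $3 = output file
append_dtb() {
    cat "$1" "$2" > "$3"
}

# Example (input file names are assumptions, adjust to your build):
# append_dtb zImage armada-388-clearfog.dtb zImage-38x.dtb
```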

Moving the boot loader

Since I wanted to partition my eMMC, and the partition table would overwrite U-Boot at the start of the device, the boot loader had to move. I noticed in the error messages I faced initially that the BootROM code looks for the boot loader at two places on the eMMC memory :
  • 0x00000000 : this is where an MBR could be installed.
  • 0x00200000 : address 2 MB, there will usually be nothing there
It makes sense to properly partition the eMMC and skip the first 4 MB so that the address at 2 MB can contain the boot loader. That's what I did with a first partition starting at sector 8192 (4 MB), and I flashed the boot loader at the address above. It can also be done during the very first installation with the mmc write command.
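
From a running Linux system, placing the boot loader at the 2 MB mark can be sketched with dd. The offset comes from the list above (0x00200000 bytes, i.e. sector 4096 with 512-byte sectors); the function name and the device name in the example are assumptions, so double-check the target before running anything like this.

```shell
# Write a boot loader image at the 2 MB offset of a block device or image file.
# 0x200000 bytes / 512 bytes per sector = sector 4096.
#   $1 = boot loader image, $2 = target device or file
flash_at_2mb() {
    image=$1
    target=$2
    dd if="$image" of="$target" bs=512 seek=$(( 0x200000 / 512 )) conv=notrunc 2>/dev/null
}

# Example (device name is an assumption, verify it first):
# flash_at_2mb u-boot.mmc /dev/mmcblk0
```

From the U-Boot prompt, the same location corresponds to block 0x1000 as the second argument of mmc write.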
Additionally I thought it would be a nice safety measure to install the boot loader on the SPI flash (4 MB). I did it from Linux since U-Boot cannot see it. But by default the DTS doesn't enable the SPI flash, so it needs to be modified for this to be done. I will cover this in a later post.

That's all for now

Please note that for now I'm using my own kernel and initrd, which work fine on my Clearfog board. I may document later how to build a working kernel for this board, but as long as you use a kernel more recent than 4.5.x, you won't even need to patch it and a working default config is present in mainline. Otherwise you can download some Debian or Ubuntu images from the SolidRun site, and extract the kernel, device tree and initrd from there.

Migration to blogspot

After many missed opportunities to share some quick information about new hacks in progress or discoveries, due to the difficulty of updating my web site, I decided to migrate the articles to a platform like blogspot. I'm not fond of the blog format but the editor is hassle-free, uploading photos is very easy, and I can put labels on the articles to help navigate through them, so I hope that in the future I'll feel less reluctant to post news when I have things to share.

I tried hard to keep the original post dates, and it seems to have worked. There is quite a lot of other (now old) stuff I never published that I may or may not add later (mostly electronics stuff).

Future experimentations with build farms and small servers will be posted here instead of polluting the Ant-Computing Wiki with something looking like a changelog.

2015-12-07

Repairing Korg Krome's blank screen

I'm posting this here in hope that it helps other people facing the same issue.

This week-end, my Korg Krome synth's display became blank again. Given that it's a touch device, it becomes pretty useless once the display doesn't work anymore. It's not the first time it happens, I even disassembled it once hoping it was only a cable or solder issue, but it worked again after being reassembled for no apparent reason. I noticed that often after it failed, it would re-appear after a few hours/days, and sometimes it would disappear again. I thought the LCD was dead. I ordered a new one on the net (almost any 7" 50-pin LCD with a resolution of 800x480 and about 15cm of cable will work). It cost me $15, it worked and failed again after one week. 

Finally I disassembled the Krome again and observed the LCD power board, and found that the LCD reset pin (44) had a strange voltage of 1.5V instead of 3.3V. The reason is a design error in the choice of resistor R28. It's 10K while it should be around 1K. With 10K it doesn't have enough strength to completely release the reset and it depends on the LCD's tolerance (which probably changes with aging). 

I simply soldered a 1K resistor on top of it, verified that the voltage on the reset pin is now 3.3V, and the problem is now fixed. 

In order to fix it, one must proceed like this : 

1) put the Krome upside down on a soft surface like a bed. Take care not to put too much pressure on the joystick 

2) remove all screws on the back. There are a lot, something like 37. Important : there's no hidden screw, so it's not needed to remove the rubber feet. 

3) gently pull the back vertically, it will very easily come. If it doesn't, you forgot a screw. 

4) you'll see the mainboard at the center, close to the back where the SD card is. You'll have to remove the screw with the plastic washer, and gently pull the soft plastified tin foil which protects against radio emissions I guess. 

5) then remove all cables going to the motherboard (they are hard to confuse later, though it's better to take a photo). In order to remove the flat ribbon at the bottom, you first need to pull the brown part of the connector outwards to unlock it (do not force, it will come by alternately pulling each end with your nail). 

6) remove the 6 screws from the motherboard, then take the motherboard out of the system. 

7) remove the metal frame that supported the motherboard. 6 screws again IIRC. 

8) you now see the small board with its cables like in this photo :
[Photo: LCD controller board]



It is not strictly necessary to remove the board to fix it but it's better as static electricity could destroy the LCD. The large flat cable at the top goes to the LCD. The small one on the top right is for the touch pad. The other small one at the bottom right goes to the audio board on the right. In order to remove the LCD cable, you first need to pull the white part of the connector upwards to unlock it (do not force, it will come by pulling with your nail). The other ones need to be pulled in their own direction without bending them. 

9) remove the 2 screws holding the board, and slightly unscrew the 4 other ones holding the metal frame so that you can get about 2-3 mm of clearance. That will be enough to release the board from the 2 plastic tips which hold it. 

10) find the resistor on the board. It's called R28 and marked "103" (10*10^3 ohms = 10K). There are two such resistors at the bottom left of the "CN17A" marking. The upper one is R29 and doesn't need to be touched. The bottom one is your friend. These two photos help locate it better : 


11) DO NOT REMOVE IT! It's a small component, and if you're not at ease with soldering small components, you'll certainly destroy the board by pulling off a copper trace. Instead, just find a 1k to 2.2k resistor on another unused board such as a dead motherboard. Such a resistor is marked 102 to 222. Note that the first two digits are less important than the 3rd one which is critical (exponent). It MUST in fact be exactly "10" to "22" followed by a "2". If you have something smaller than a 2 on the last digit, reset will never work. If you have something larger, your fix is useless. 

12) Once you've found this new resistor, directly solder it on top of R28. If you're having trouble soldering something that small, first stick it on top of the other one using some superglue then put a very small drop of solder on each side and that will be all. You must not heat it more than 1 or 2 seconds per side. The amount of tin to add is around 1mm * 1mm only. 

13) verify with an ohm-meter that you read slightly less than the value of the resistor you added when you measure across the original 10k resistor (about 0.9k for a 1k resistor, about 1.8k for a 2.2k one). If so you did it fine. 

14) reinstall the board, then tighten the metal frame's screws, screw the board, reconnect all connectors (be careful with the small ones, you don't want to bend them or they could break). Then route the motherboard wires that you might have moved back close to the board, reinstall the top metal frame, then the motherboard, reconnect all wires and power on. It must work. Once it's OK, you can finish reassembling everything. 
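
The resistor arithmetic used in steps 10 to 13 can be checked quickly. The sketch below decodes a 3-digit SMD marking (two significant digits followed by a power of ten, as with "103") and computes what an ohm-meter should read across two resistors in parallel. The function names are hypothetical and the results are rounded down to whole ohms.

```shell
# Decode a 3-digit SMD resistor marking: two significant digits,
# then the number of zeroes to append (e.g. 103 = 10 followed by 3 zeroes).
smd_ohms() {
    digits=${1%?}      # first two digits
    exp=${1#??}        # last digit
    value=$digits
    while [ "$exp" -gt 0 ]; do
        value=$(( value * 10 ))
        exp=$(( exp - 1 ))
    done
    echo "$value"
}

# Resistance of two resistors in parallel, in ohms (integer, rounded down).
parallel_ohms() {
    echo $(( $1 * $2 / ($1 + $2) ))
}

smd_ohms 103                                        # original R28
parallel_ohms "$(smd_ohms 103)" "$(smd_ohms 102)"   # with a 1k on top
parallel_ohms "$(smd_ohms 103)" "$(smd_ohms 222)"   # with a 2.2k on top
```

This prints 10000, 909 and 1803, i.e. slightly below the value of the added resistor in both cases.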

I noticed that black screws are used outside and grey ones are used inside, except a few holding the LCD frame and the keybed which are also black. 

Try not to mark your screws. A screwdriver made for PC parts is perfect and will not mark. Count around 10-15 minutes just to disassemble everything, about as much to reassemble, and as much to fix the board once you have found a suitable resistor. 

If you don't have a resistor, you can find one at an electronics components shop. Just go there with your board and they'll find the proper size. I guess it's a "0603" type of resistor (it indicates the size) though I'm not certain and forgot to measure. 

I forgot to say, I noticed that the original Krome's LCD is of very good quality. When you replace it with a cheap one, images are really ugly. But that was better than nothing. 

Hoping this helps other people! I've read on various forums that many other people got in trouble with the same issue, and it's too bad that some repair shops charge them hundreds of bucks to fix this. So do not hesitate to spread the message and to share your experience.

2013-12-13

Line rate HTTP server on the OpenBlocks AX3

This article explains how I'm using the OpenBlocks AX3 as a line-rate HTTP server for testing purposes.

Original idea

In 2003, while doing some performance tests on Netfilter, I realized how frustrating it was to always be limited by the load generators' performance. You generally need at least 4-6 machines to load a firewall, with 2-3 HTTP clients and 2-3 HTTP servers. The second one of each is here to ensure that the bandwidth is never limited by a single machine, and the third one is here to prove that the limit reached with the first two cannot be overcome with more clients. And it's generally hard to find that many similar machines: you generally know that some are faster for sending, others for receiving, or that some are more efficient with large packets and others with small packets. In practice you're never totally confident in your own tests.

Two years later, while running some network benchmark to compare several firewall products for a customer, I faced the same issue again, especially when trying to stress the firewall with many short requests to maximize the connection rate. Then I got the idea of a dummy HTTP server which would only work in packet mode, without creating real TCP sessions. That would make it lighter and improve its ability to get close to line rate. Unfortunately, working with SOCK_PACKET by then was not really faster than the local TCP stack so I temporarily gave up on this idea.

After I recently became the lucky owner of an OpenBlocks AX3/4 microserver, the idea of fully exploiting its high networking capabilities immediately revived my old idea of a stateless server. The platform is very recent and I needed to go deep into some kernel drivers, which explains why it took quite some time to reach a point where it's working.

The OpenBlocks AX3/4 microserver

The OpenBlocks AX3/4 microserver is a very neat device built by Japanese company Plat'Home.
[Photo: the microserver compared to a 3.5" floppy disk for scale]
This fanless device runs a dual-core 1.33 GHz Marvell Armada XP CPU (ARMv7), has 3 GB of RAM, 128 MB of NOR flash, 20 GB of SATA SSD, and, best of all, 4 true Gigabit Ethernet ports (I mean not over USB nor an internal switch nor crippled by design like in many Cortex-A9 based CPUs). In terms of average performance, it is comparable to a dual-core Atom running at the same frequency, though it consumes 4x less power. And indeed, even at full load, it becomes just warm to the touch. The design is robust and compact, so I now carry it everywhere with me as it's a very convenient device for many usages. The only criticism I could make is that it's a bit expensive; it clearly targets the enterprise market, which will value its benefits for building an ideal router, firewall, web server or monitoring device. But even then, many companies will prefer a cheaper low-end x86 box if they don't value the device's strong differentiators.

Where this device really shines is in the area of network communications. The 4 GigE ports are included in the Armada XP itself, so they're much closer to the CPU caches than usual devices which communicate via a PCIe bus. And this design pays off. After hacking the mvneta driver a little bit, it becomes obvious that each port is capable of both sending and receiving in parallel at line rate for all packet sizes, which means up to 1.488 million packets per second (Mpps) in each direction with the smallest frames. This is something rare and very hard to achieve with more conventional hardware, so that made me want to try to port some network stress testing tools to this platform.
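
The 1.488 Mpps figure follows directly from Ethernet framing: on the wire, every frame is accompanied by 8 bytes of preamble and 12 bytes of inter-frame gap, so a minimum-size 64-byte frame occupies 84 byte times. A small sketch of the calculation (the function name is hypothetical):

```shell
# Maximum Ethernet frame rate for a given link speed (bit/s) and frame size
# (bytes, FCS included). Each frame also costs 8 bytes of preamble and
# 12 bytes of inter-frame gap on the wire.
line_rate_pps() {
    echo $(( $1 / (($2 + 20) * 8) ))
}

line_rate_pps 1000000000 64     # smallest frames
line_rate_pps 1000000000 1518   # largest standard frames
```

This prints 1488095 and 81274 packets per second respectively.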

Note that there are other devices using the same family of CPU. I also have a Mirabox running an Armada 370, which is a low-end single-core CPU with a 16-bit memory bus and a smaller cache. It includes two of the same network controllers. What I'm describing here also works with the Mirabox to a certain extent. The limited memory bandwidth and the fact it's a single core prevent this from scaling to multiple ports. The peak performance is also about 10% lower.

Stateless HTTP server : principle

HTTP is a pretty simple protocol when you only look at the exchanges on the wire. It's what I call a "ping-pong" protocol : each side sends one thing and waits for the other side to respond. This is only true for small data transfers, and does not take pipelining into consideration. But for what I need in tests, it's very simple.

I've long been wondering if it was possible to use this "ping-pong" property to build a totally stateless server, which means a server which would only consider the information it gets from the packets and which would not store any session. Looking at what a transfer looks like at the TCP level, it's clear that it is possible. Even when optimized, there's everything there for the job (please consult RFC793 if you have difficulties following these exchanges, as I won't paraphrase it here) :
[Diagram: basic HTTP fetch]
[Diagram: faster HTTP fetch]
For the server, all the information is provided in the client's ACK. If you look at the ACK and compare it to the initial SEQ sent by the server, you can determine exactly what step is being processed, so how to act accordingly. The problem is that after the response is sent, the server does not necessarily know how long the response was, so by how much it could have shifted the next sequence numbers. So the idea was to use only the lower bits of the sequence numbers to store the state. That way, each response size just needs to be adjusted so that the next sequence number matches the value we want to assign it.

For this first implementation, I wanted to support multi-packet responses, so I decided to have a limit of 16 states, resulting in 4 bits for the state and the rest for the transfers. That means that responses have to be rounded up to the next multiple of 16 bytes plus or minus the shift to reach the desired state. In HTTP we can easily do this using headers. So I added an "X-Pad" header which serves exactly that purpose. Another point is that the size of the Content-Length header varies with the size of the response. So we need to adjust X-Pad last. Both the SYN flag and the FIN flag count as one unit in sequence numbers (just like one byte), so when we plan on sending either of them, we must also count one unit. This imposes some constraints on the ordering of the states, but they are easily met. For example, the response contains both the data and the FIN packet. Some clients will ACK the data first, then the FIN. This results in two ACKs offset by exactly one. So in order to properly handle these two different acknowledgements, the two respective states must have values that differ by exactly one.
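
The padding arithmetic described above can be sketched as follows. This is only an illustration of the principle, not the actual module code: the function name is hypothetical, and how the padding bytes are laid out inside the X-Pad header is not specified here. The response length passed in is assumed to already include the final Content-Length header, since X-Pad is adjusted last.

```shell
# Number of padding bytes to add so that the sequence number following the
# response lands on the wanted state (stored in the low 4 bits).
#   $1 = sequence number of the first response byte
#   $2 = response length in bytes before padding
#   $3 = 1 if a FIN is sent with the response (it consumes one unit), else 0
#   $4 = wanted state, 0..15
pad_for_state() {
    next=$(( $1 + $2 + $3 ))
    echo $(( (($4 - next) % 16 + 16) % 16 ))
}

# A 230-byte response starting at sequence 1000, sent with a FIN,
# that must leave the connection in state 5
pad_for_state 1000 230 1 5
```

This prints 6: without padding the next sequence number would be 1231 (state 15), and 6 more bytes bring it to 1237, whose low 4 bits are 5.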

The beauty of this mechanism is that it even supports HTTP keep-alive (serving multiple objects over the same connection) and resists packet loss, since the client will retransmit either a request or an acknowledgement and the server will always do the same thing in response. Note that the multi-packet feature is not totally reliable for two reasons :
  • clients generally wait 40ms before acknowledging one segment, so the transfer is slow, unless segments are sent two at a time, but then we need a reliable way to distinguish their acks and to recover from partial losses
  • if a client's ACK for an intermediate packet is lost, the session will remain stuck as nobody will retransmit.

I found one ugly solution to all of these issues, which can work when the client supports the SACK extension. The principle is to send all segments but the first one so that the client constantly acks the first one and indicates in the SACK extension what parts were received. But this becomes complex, is not universally usable and in the end does not provide much benefit. Indeed, when I designed this mechanism, I had objects up to 5-10kB in mind in order to try to fill the wire; I didn't imagine I would saturate a wire with single-packet objects! So a next implementation will probably only use 2 bits to store the 4 states needed to perform a single-packet transfer and will not support the multi-packet mode anymore. Also with only 4 states, we'll be able to send even-sized packets more often than now. The complete state machine looks like this :
[Diagram: complete state machine]

Stateless HTTP server : first implementation

The first implementation of this server was made as a module for Linux kernel 3.10.x. This module registered a dummy interface which responds to any TCP port accessed through it. The concept is ugly but it was easy to implement. The performance was quite good. On the OpenBlocks, 42000 connections per second were achieved this way, using a single external NIC bound to a single CPU core. This means that about 84kcps could be reached with incoming traffic split on two NICs, which was confirmed. This is not bad at all, it's basically the same level of performance that httpterm gives me on a Core2 Duo at 2.66 GHz. But it's not huge. The issue is that the packets have to pass via all the routing stack, defeating a little bit the purpose of the server. However this mode is convenient to run locally because there is no inter-cpu communications, a response packet is produced for each incoming packet in the context of the sending process.

Stateless HTTP server : NFQueue implementation

The second implementation was done using NFQueue (Netfilter Queue). It's very easy to use and allows packets to be returned very early (in the raw table). So I wanted to give it a try. The result is basically the same as with the interface, except that two CPU cores are involved this time, one for the network and the other one for the user process acting as the server. However for local tests when you have lots of spare cores, it becomes more interesting than the interface version because it reduces the overhead in the network stack, increasing the limit of performance a single process may observe (typically 105k conn/s vs 73k on a Core2 Quad 3 GHz, with one CPU at 100% for the server).

Ndiv framework to the rescue

These numbers are both encouraging and frustrating. They're encouraging because they prove that the mechanism is good and efficient. And they're frustrating because we spend most of our time at places we'd prefer to avoid as much as possible.

So I decided it was time for me to be brave and finish the work I started 6 months ago on my ndiv framework. This is the Ethernet Diverter framework with which I could verify that the mvneta NICs are able to saturate the wire in both directions. Basically it consists in intercepting incoming packets as close as possible to where they're collected in the drivers, and deciding whether to let them pass, drop them or emit another packet in response. I already had an unfinished line-rate packet capture module using it. I temporarily stopped developing it for lack of time, need, and feedback. I needed to implement the ability to forge response packets but I was not happy with its API, which was already difficult to use and inefficient. I presented it in detail to my coworker Emeric Brun, with whom we could define a new "ideal" API that would be optimal for hardware assisted drivers and well balanced so that neither the application nor the driver has too much work to do.

After one full day of work, I could adapt the mvneta driver to the new ndiv API and make it respond to packets! The driver looks like the diagram below with the framework plugged into it. The beige part is the ndiv "application" called by the ndiv-compatible driver.
[Figure: How NDIV is inserted into the network stack]

Among the cool things provided by the framework, we can mention the fact that it considers it the role of the driver (or NIC) to validate incoming protocols and checksums, and to compute outgoing checksums if the application needs it. This makes sense because nowadays, most NICs do all this stuff for free and we'd rather not have the application do it. Similarly, if some checksums have to be computed by the driver or NIC on outgoing packets, it's the responsibility of the application to indicate the various header lengths because it already knows them.
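
To make the three possible outcomes more concrete, here is a purely illustrative sketch in Python. The real framework is kernel C code and none of the names below (Verdict, Frame, handle) come from its API; they are hypothetical. The only thing it tries to show is the division of work: the handler receives a frame that the driver has already validated and only returns a decision, leaving checksums to the driver or NIC.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Verdict(Enum):
    # Hypothetical names: the real ndiv API is C code and is not shown here.
    PASS = auto()   # let the frame continue into the regular network stack
    DROP = auto()   # discard the frame
    REPLY = auto()  # transmit a frame built by the application


@dataclass
class Frame:
    dst_port: int
    payload: bytes


def handle(frame: Frame, served_port: int = 80) -> Tuple[Verdict, Optional[Frame]]:
    """Decide what the driver should do with one already-validated frame."""
    if frame.dst_port != served_port:
        return Verdict.PASS, None      # not for us: normal processing
    if not frame.payload:
        return Verdict.DROP, None      # arbitrary policy chosen for this demo
    # No checksum is computed here: that job is left to the driver or NIC.
    return Verdict.REPLY, Frame(frame.dst_port, b"response to " + frame.payload)
```

The drop policy is arbitrary and only there so that each verdict appears once.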

Stateless HTTP server as an Ndiv application

After completing the port of ndiv to mvneta, I was absolutely impatient to see the stateless server run directly in the driver as an ndiv application. It did not take long to port it, just a few hours, and these hours were spent changing the sequencing of the code to clean it up since it was not needed anymore to compute checksums in the application.

The results are astonishing. First, when bombarded with a SYN flood from 5 machines, the theoretical limit is immediately reached with 1.488 Mpps in both directions. The CPU usage remains invisible since the periods are too short for the system to measure them. I developed a tool just for this instead.

Second, it appears that line rate is almost always achieved whatever the object size. In keep-alive mode, line rate is achieved for objects of 64 bytes and above, at 564000 requests per second and 94% of one CPU core. Empty responses go higher, 663000 requests per second, but the wire is not full (816 Mbps). The reason is that Ethernet frames are padded to 64 bytes, so for responses that are too short, some padding is automatically appended. It is also important at these rates not to forget about Ethernet's preamble (8 bytes) and Inter-Packet-Gap (IPG) of 12 bytes, totaling 20 bytes. This overhead is represented in yellow on the diagram below.
[Figure: Performance at various object sizes]
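
As a quick check of where the 1.488 Mpps figure comes from, here is a minimal sketch that only uses the numbers given above (64-byte minimum frame, 8-byte preamble, 12-byte gap) and assumes a 1 Gbps link. The function names are mine.

```python
PREAMBLE = 8                # bytes sent before each frame
IPG = 12                    # bytes of mandatory silence between frames
MIN_FRAME = 64              # shorter frames are padded up to this size
LINK_BPS = 1_000_000_000    # gigabit Ethernet


def wire_bytes(frame_len: int) -> int:
    """Bytes of wire time consumed by one frame, padding included."""
    return max(frame_len, MIN_FRAME) + PREAMBLE + IPG


def max_pps(frame_len: int, link_bps: int = LINK_BPS) -> float:
    """Theoretical number of frames per second for a given frame length."""
    return link_bps / (wire_bytes(frame_len) * 8)


print(wire_bytes(40))        # 84: a 40-byte frame costs as much as a 64-byte one
print(round(max_pps(64)))    # 1488095, i.e. the 1.488 Mpps limit
```
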
The transfers in HTTP close mode are excellent as well. The OpenBlocks reaches 340000 HTTP connections per second. Each connection means a connection establishment, an HTTP request, and a fast close (FIN then RST). This is 3 packets in one direction, 2 in the other one. The theoretical limit for this test is 496000 connections per second (1.488 M/3). It happens that my client (inject36) sends very large requests (about 166 IP bytes). So if we do the math, we have :
  • 64 + 8 + 12 bytes for the SYN packet = 84 bytes
  • 166 + 14 + 8 + 12 bytes for the request = 200 bytes
  • 64 + 8 + 12 bytes for the RST packet = 84 bytes
So for each request, the clients have to upload 368 bytes on the wire. This times 340000 equals almost exactly one gigabit per second (125000000 bytes per second). So in practice we're still not saturating the device nor its CPU, just the wire again. Just for comparison, it's 3 times as fast as what I can achieve on a Core i7 3.4 GHz using httpterm.
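
The same arithmetic written out as a small sketch, using only the per-packet sizes from the list above. It shows that 340000 connections per second is what the upload direction of a gigabit link allows with such requests.

```python
OVERHEAD = 8 + 12                # preamble + inter-packet gap, in bytes
LINK_BYTES_PER_S = 125_000_000   # 1 Gbps expressed in bytes per second

syn = 64 + OVERHEAD              # minimum-size frame
request = 166 + 14 + OVERHEAD    # 166 IP bytes + 14-byte Ethernet header
rst = 64 + OVERHEAD              # minimum-size frame

per_conn = syn + request + rst

print(per_conn)                       # 368
print(per_conn * 340_000)             # 125120000
print(LINK_BYTES_PER_S // per_conn)   # 339673
```

The last line is the number of connections per second at which the wire is exactly full, which matches the measured value once rounded.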

Conclusion

First thing is that one may note that I rarely spoke about CPU usage. That's the beauty of this device. The CPU is fast enough so that a whole HTTP request parsing + response takes less than 1.4 microseconds and supports being done at line rate. The second point is that the network connectivity inside it is fantastic. I can achieve with this device packet rates that I cannot achieve with some very respectable 10G NICs. Now I urge Marvell to develop a next generation of Armada XP with a 10G NIC on chip! Now what is absolutely cool is that I finally know I won't ever have any problem anymore in benchmarks with the components being too short. Well I still need the clients... By the way, in theory it is possible to develop a client on the same model. The only thing is that the applications I implement in ndiv are reactive, which means they need some traffic to respond to. So we won't initiate a connection this way. One elegant solution however could be to use a classical SYN flooder on the device to initiate connections to the server, which in turn will respond and solicit the client. But I'm still not completely convinced.

Another thing I'd like to experiment with in the near future is porting the ndiv framework to more NICs (at least my laptop's e1000e) and to the loopback interface, so that we can even use the stateless server when developing on the local machine. I've started the ndiv project with a line-rate packet capture module which is not complete. I'm wondering if other uses can arise from this framework (e.g. accelerators, load balancing, bridges, routing, IDS/IPS, etc.). Thus I'm not sure whether it's worth submitting for mainline. Any feedback would be much appreciated.

Concerning the stateless HTTP server itself, it has limited uses beyond test environments. But still I can think about delivering very small objects (favicon, redirects, ...) that fit in a single TCP segment and do not require any security. It can also be used for various types of monitoring devices which are Ethernet-connected and which prefer to report measurements using HTTP to make it easier for their clients to retrieve them. Some system identification or configuration might also be retrieved using such a mechanism embedded in very dumb devices which don't even have an IP stack.

Downloads

The code is available here in the form of Linux kernel patches. They are supposed to be applied on top of 3.13-rc3..rc7. There are 6 distinct series in these patches :
  • 01xx : add support for retrieving the device's MAC address from the boot loader. Not strictly needed but quite convenient as this avoids running with random MAC addresses ;
  • 02xx : some fixes for the mvneta driver ; they are required.
  • 03xx : improvements for the mvneta driver ; they are required as well.
  • 04xx : the NDIV network frame diverter framework. Required of course!
  • 05xx : driver support for the NDIV framework (currently only mvneta).
  • 06xx : the SLHTTPD server.

2013-02-24

Mirabox: much better than GuruPlug

This quick review aims at describing my first contact with GlobalScale's Mirabox, how it compares with the other machines I've used before, namely the GuruPlug and the Dockstar, then how I managed to unbrick it.

Switching to the Mirabox

The Dockstar is nice but still limited to one port and has limited performance. After all the overheating issues, GlobalScale finally abandoned the GuruPlug Server Plus and replaced it with the DreamPlug which was a much nicer and safer design. I had one in my hands and was considering buying one. I'm among the people who complain a lot about poor quality and point the finger at companies who put awful products on the market and do nothing to fix them. But when these people go back to the blackboard to completely redesign the product, I applaud. So I was OK with ordering a new product from them again.

When wandering on GlobalScale's web site last year, I noticed some teasing for the upcoming Smile Box, then the Mirabox. Both were using the same platform, a new Marvell Armada370 (ARMv7) at 1.2 GHz. The Mirabox has 1GB of RAM, 1GB of NAND flash, 2 GigE ports, a PCIe port, a MicroSD slot, an RTC with a battery, 2 USB3 ports, well it looked really nice. It was not much more expensive than the DreamPlug, so I finally decided to order one as well as a JTAG adapter in case things go wrong.

As soon as I received the box, I couldn't resist opening it. I was quite impressed by the quality of the hardware design. There is a very clean PCB with BGA chips on both sides, not a single wire at all, not even a heatsink. The device is very thin, basically the thickness of the RJ45 ports. There are jumpers inside, as well as serial and JTAG connectors that are compatible with the GuruPlug's adapter. Be careful when opening it: the small plastic part which conducts the light from the LEDs sits in an unstable position and is annoying to reinstall. I finally glued it to the case.

[Photos: board inside, enclosure, unstable parts, WiFi antenna, board bottom, board top]

Among some nice things I noted, I found that the internal serial ports were connected to the same serial port as the USB console (which goes to a PL2303 chip), and since they're using pull-ups, both are usable simultaneously. The PL2303 chip is powered by the USB and not by the Mirabox so that you don't lose the ttyUSB from the client when you power-cycle the Mirabox. This is much appreciated, the Snowball should adopt such a design. The device does not heat much and the CPU can always be touched. The MicroSD and MMC internal connectors are directly connected to the USB2.0 ports of the SoC. The jumpers are there to change the CPU/cache/DRAM frequencies though I only identified a few of them at the moment.

The Good, the Bad and the Ugly

Debian is installed on the device. I quickly installed haproxy to see if the network performance was any better than with the Dockstar. I noticed that traffic would not flow at all! After installing strace, I discovered that the splice() system call would systematically fail in an unexpected way, meaning that some nasty untested patches were applied to the 2.6.35.9 kernel. So I went to their site to download the sources and found none at all. The only things I could find there were a binary kernel image and a broken boot loader image (Note that a few days ago, the boot loader image was fixed there, and the kernel and U-Boot sources were finally released on Plugcomputer.org).

So I continued the tests by disabling splice() in haproxy and found that the performance was very low due to iptables and conntrack being hard-linked and impossible to disable.

So I looked on the net to find another kernel. I found one in Arch Linux ARM. Fine! I tried to boot it: it booted correctly, then UBIFS complained about a lot of errors, and the kernel died because it was unable to mount the root FS. After that the original kernel would also fail to mount the rootfs. Thanks to the captures I had taken earlier, I finally found that the config in the boot loader is wrong about the partition sizes, which must have been hard-coded into their proprietary kernel that ignores the boot loader's settings. And it seems that UBIFS performs some recovery attempts before failing, resulting in a corrupted FS. Pfff.....

I could boot it with a Formilux rootfs and their proprietary kernel to recover the installation by reformatting the partition and reflashing the original rootfs which is provided on their site. I got a bit angry at the product because it's fully proprietary and bogus. I want to use it as a gigabit network sniffer and its network performance sucks because of the proprietary kernel!

First hope

Looking at the kernel 3.7-rc sources, I found that the platform was recently introduced by the cool guys at Free Electrons, namely Thomas and Grégory, who are already known for porting Linux to a number of devices. So I contacted them to get some pointers and they told me about the Git repository where all their work is. I could test their latest work and could start hacking the device and providing them with some feedback on some of their patches.

Second crash

Unfortunately I managed to boot a kernel with the incorrect partition table a second time and it again destroyed my rootfs and marked a lot of blocks as bad... Except that this time even after several passes of flash_erase and nand_write, there were some remains of "UBI" on some blocks, and I couldn't get rid of the fake bad blocks. The NAND driver is probably still a bit young... I finally used the nand scrub option in the boot loader to completely erase the rootfs partition... Error! This one is bogus too, it randomly erases other areas, and it marked block 0 (the boot loader) as bad! So now I must not cut power! I only had one attempt left! I reformatted the whole flash using nand scrub again and all went fine. I had to reflash the U-Boot boot loader, but the one on the site above was defective. I finally found one on another site. It looked right, and contained references to the reference design board. I crossed my fingers and flashed it, checked that it was correctly flashed, then rebooted... Aie, Bricked!

BootROM 1.08                                                                    
Booting from NAND flash
BootROM: Bad header at offset 00000000
BootROM: Bad header at offset 00004000
BootROM: Bad header at offset 00008000
BootROM: Bad header at offset 0000C000
...

Second disappointment

My only hope was flashing the NAND via JTAG. I took my GuruPlug JTAG adapter which directly connects to this board and to the GPIO board. But the Armada370 is totally unknown to OpenOCD, and no patches are available. Then I was really angry at GlobalScale. They sold me a JTAG device to unbrick the Mirabox which I cannot use at all because the software to use it does not exist! I contacted the support, they offered to reprogram it for $25, except that shipping costs are as high as the device's price. And they'd reinstall the same bogus kernel that I cannot use. So I declined and thought that I'd rather find some time to try to reverse-engineer the JTAG TAP and the flash controller on a spare week-end. (Note: I was recently told that GlobalScale is considering sponsoring a port to OpenOCD, which is nice then).

New hope

One or two months after leaving the device unused on my desk, I decided to get it to work using OpenOCD. I downloaded all the doc, read it, started editing some board and target files, and could detect the TAP ID, which means the JTAG link is OK. I managed to reset the board via JTAG, which was good.

While reading the OpenOCD doc, I noticed something had changed in an xterm in the background. It was my minicom connected to the Mirabox which had stopped scrolling on the "BootROM Bad Header" messages, and which displayed "Trying to boot from UART" ! UART ? I searched the net and found references to kwboot which is a tool made for loading firmware into Marvell SoCs via the serial port. In fact, Thomas had suggested it to me a while ago but I forgot since I couldn't figure out how I could use it. Anyway now I got hope again and left OpenOCD to completely focus on kwboot.

Booting an image using kwboot

So I downloaded kwboot. I initially didn't find it alone, it's integrated into U-Boot in this tree and depends on a number of includes from this tree. I was about to clone the whole repository when I found a precompiled binary version here.

This utility sends a "magic" sequence to the BootROM boot loader installed in the Armada370's ROM (or maybe in the small I2C flash that's next to it, I don't know). The magic is an 8-byte sequence : 0xBB 0x11 0x22 0x33 0x44 0x55 0x66 0x77. The boot loader looks for this sequence before the first attempt to boot, and each time it loops over the whole flash image (which takes quite long on the 1 GB NAND). So it's better to start sending the sequence with the Mirabox powered down, and then to power it up. That's why it's very important that USB consoles are powered by the PC and not by the device!
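
To illustrate the idea only, here is a minimal sketch that writes the 8-byte pattern quoted above a number of times to a binary stream. It is not how kwboot is implemented: the real tool also waits for an answer from the BootROM before sending the image, which is not reproduced here. The function name and the use of an in-memory stream in place of the serial port are my own choices for the demonstration.

```python
import io

# The 8-byte pattern quoted above.
BOOT_MAGIC = bytes([0xBB, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])


def send_boot_magic(port, repeats: int) -> int:
    """Write the pattern `repeats` times to a binary stream, return bytes written."""
    written = 0
    for _ in range(repeats):
        written += port.write(BOOT_MAGIC)
    return written


# An in-memory stream stands in for the serial port here.
fake_port = io.BytesIO()
print(send_boot_magic(fake_port, 3))   # 24
```
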

Fixing the boot loader image

The original U-Boot image could not be loaded via kwboot. The reason is that the first byte of the image indicates the boot device. Here I have 0x8B which indicates the image boots from the NAND. To boot from the serial port, we need 0x69. Changing it by hand will not work because there's a checksum. kwboot is able to patch this byte and recompute the checksum. But it looks like the checksumming algorithm has changed on this new platform, because a fixed image does not boot either, and the original image does not show the correct checksum.
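
A small sketch of the check described above: it only looks at the first byte of an image and names the boot device, using the two values given in the text. It does not touch the checksum, since its algorithm on this platform is precisely what was unclear. The sample bytes are the first eight bytes of the original image as they appear in the memory dump later in this article.

```python
# Values taken from the text above; other boot sources are not listed there.
BOOT_SOURCES = {0x8B: "NAND", 0x69: "UART"}


def boot_source(image: bytes) -> str:
    """Name the boot device encoded in the first byte of an image."""
    if not image:
        raise ValueError("empty image")
    return BOOT_SOURCES.get(image[0], "unknown (0x%02X)" % image[0])


print(boot_source(bytes.fromhex("8b000000604e0a00")))   # NAND
```
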

I found another tool, kwuartboot. I tried it in case it would handle a different checksumming algorithm, but that did not work either, it failed very similarly. So I concluded that I had to find how to regenerate my boot loader image to boot from the serial port and write the correct checksum.

Find new doimage

I found various incompatible versions of the "doimage" utility used to produce the boot image. I finally found that a version packaged for ArmadaXP was compatible with my Mirabox. (Note that since the patched U-Boot sources have finally been released, there is no need anymore to use the ArmadaXP image). Here's how I managed to rebuild a working doimage utility :

1. retrieve this U-Boot patch

2. extract the sources from the patch :

$ mkdir tmp-tools-doimage
$ cd tmp-tools-doimage
$ patch -Ntp1 < ../u-boot-2009.08-mv78460-20110404.patch >/dev/null 2>&1
$ cd tools
$ tar cf - doimage_armada_xp | gzip -9 > ../../doimage_armada_xp.tar.gz
$ cd ../..
$ rm -rf tmp-tools-doimage

3. rebuild the executable from the sources :

$ tar zxf doimage_armada_xp.tar.gz
$ cd doimage_armada_xp
$ make

Find the various parts in the original image

The sources helped me a lot to understand the image format. There are two binary images included in the U-Boot image, one is the DDR3 initialization code, and the other one is the boot loader itself. There are several headers in the image and checksums that are computed on the global image. So it is possible to extract the embedded images, to modify them if necessary, then to reassemble them together and put the 32-bit checksum on them. After probably more than a hundred attempts, I found that I needed the following parts of my original mtd0 partition :

Type    Offset  Length  md5sum       Begins with
DDR3    0x24    48584   f4b165ce...  02 00 00 00 5B 00 00 00
U-Boot  0xC000  675420  c739a005...  12 00 00 EA 14 F0 9F E5

These images can then be assembled together using the shiny new doimage utility, to produce a new u-boot.bin image :
$ doimage -T uart -D 0x600000 -E 0x6A0000 -G mtd0-hdr.bin \
  mtd0-uboot.bin u-boot.bin
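
The extraction step itself is just a matter of slicing the mtd0 dump at the right offsets. Here is a minimal sketch of it, with helper names of my own; the md5 helper is only there to compare the result with the md5sum column of the table. With the values from the table, the U-Boot part would for example be obtained with extract(mtd0, 0xC000, 675420), mtd0 being the contents of the dump read in binary mode.

```python
import hashlib


def extract(image: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes of `image` starting at `offset`, refusing short reads."""
    part = image[offset:offset + length]
    if len(part) != length:
        raise ValueError("image too short for offset 0x%X + %d" % (offset, length))
    return part


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, to compare against the md5sum column of the table."""
    return hashlib.md5(data).hexdigest()


# Demonstration on synthetic data, not on a real flash dump.
demo = bytes(range(256)) * 4
print(extract(demo, 0x10, 4).hex())   # 10111213
```
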

Still fails to load at 48 kB and prints DDR3 messages on the console

I then tried to load the image this way :
$ kwboot -b u-boot.bin /dev/ttyUSB0

Unfortunately, the Mirabox would still reject this image, but in a new and more consistent way : it systematically rejects the image after 48kB of data, and the green LED "D5" turns on just at the moment of the hang. I switched to kwuartboot to see if I got the same error, and it behaved exactly the same way. So I modified it to display the invalid characters it got that confused it, and I saw "DDR3 Training Sequence". Wow! This means that the DDR3 code was executed, because this line is normally displayed at boot when the Mirabox boots. So I suspected that the DDR3 initialization code is loaded into the cache or some SRAM on the device, which must then initialize DDR3 to load the rest. So if this code is executed early during the boot sequence and displays things on the console port I'm using to upload an image, it is conceivable that it breaks the upload sequence...

Patch the DDR3 thing to talk to ttyS1 instead

So I wondered how to shut that DDR3 code up. Since the device has two serial ports, I thought that it could be easier to make it chat on ttyS1 instead. ttyS0 is located at 0x10000 and ttyS1 is at 0x12000. I looked for occurrences of 0x10000 in the image and patched them to use 0x12000 instead. I was a bit scared because on ARM you don't have many bits for immediate values, and I was afraid I would not be able to add a 2 there if some higher bit was set. This was not the case, the values were used as absolute values. I found 4 locations which needed to be patched in mtd0-hdr.bin : 0x1BC0, 0x1C08, 0x1D0C, 0x1D30. I then rebuilt the whole image using doimage :
$ doimage -T uart -D 0x600000 -E 0x6A0000 -G mtd0-hdr-uart1.bin \
  mtd0-uboot.bin u-boot-uart1.bin
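
The patching step can be sketched as follows. This rests on assumptions the text does not confirm: that each listed location holds a 32-bit little-endian word, and that the low 20 bits of that word carry the UART register offset. The function name and the demo value are mine; the check is there so that nothing is modified if the word found does not look like a UART0 address.

```python
import struct

UART0 = 0x10000   # ttyS0 register offset, from the text above
UART1 = 0x12000   # ttyS1 register offset, from the text above


def patch_uart_base(blob: bytes, offsets) -> bytes:
    """Return a copy of `blob` where the 32-bit little-endian word at each
    offset has its UART0 register offset replaced by the UART1 one."""
    out = bytearray(blob)
    for off in offsets:
        (word,) = struct.unpack_from("<I", out, off)
        if word & 0xFFFFF != UART0:   # assumption: low 20 bits hold the offset
            raise ValueError("no UART0 address at 0x%X (found 0x%08X)" % (off, word))
        struct.pack_into("<I", out, off, (word & ~0xFFFFF) | UART1)
    return bytes(out)


# Synthetic example: one word to patch followed by one unrelated word.
demo = struct.pack("<II", 0xD0010000, 0xDEADBEEF)
print(patch_uart_base(demo, [0]).hex())   # 002001d0efbeadde
```

On the real header, the offsets would be the four locations listed above, applied to the contents of mtd0-hdr.bin.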

New attempt to boot

Last step was to try to boot the image again :
$ ./kwboot  -b u-boot-uart1.bin -t /dev/ttyUSB0
Sending boot message. Please reboot the target...\
Sending boot image...
0 % [......................................................................]
1 % [......................................................................]
2 % [......................................................................]
...
Well, it goes further than 48kB this time, let's cross fingers... After about 45s, I got this :
99 % [.........................................................]
[Type Ctrl-\ + c to quit]

__   __                      _ _
|  \/  | __ _ _ ____   _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| |  | | (_| | |   \ V /  __/ | |
|_|  |_|\__,_|_|    \_/ \___|_|_|
        _   _     ____              _
       | | | |   | __ )  ___   ___ | |_ 
       | | | |___|  _ \ / _ \ / _ \| __| 
       | |_| |___| |_) | (_) | (_) | |_ 
        \___/    |____/ \___/ \___/ \__| 
** LOADER **

U-Boot 2009.08 (Sep 16 2012 - 22:50:06)Marvell version: 1.1.2 NQ
U-Boot Addressing:
       Code:            00600000:006AFFF0
       BSS:             006F8E40
       Stack:           0x5fff70
       PageTable:       0x8e0000
       Heap address:    0x900000:0xe00000
Board: DB-88F6710-BP
SoC:   MV6710 A1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 600Mhz
       DDR @ 600Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access
PEX 0: Detected No Link.
PEX 1: Root Complex Interface, Detected Link X1
DRAM:   1 GB
       CS 0: base 0x00000000 size 512 MB
       CS 1: base 0x20000000 size 512 MB
       Addresses 14M - 0M are saved for the U-Boot usage.
NAND:  1024 MiB
Bad block table found at page 262016, version 0x01
Bad block table found at page 261888, version 0x01
FPU not initialized
USB 0: Host Mode
USB 1: Host Mode
Modules/Interfaces Detected:
       RGMII0 Phy
       RGMII1 Phy
       PEX0 (Lane 0)
       PEX1 (Lane 1)
phy16= 72 
phy16= 72 
MMC:   MRVL_MMC: 0
Net:   egiga0 [PRIME], egiga1
Hit any key to stop autoboot:  0 
Marvell>>

Yesss! For those who want to experiment with this without bricking their devices, I'm putting here this working image with a modified prompt to display "Recover>>" instead of "Marvell>>" so that it's always easy to tell what boot loader you're running from.

Now reflash the recovered device

Now that I'm on the device again for the first time in a few months, I don't want to see it go away. It's urgent to erase and reflash it. WARNING! do not copy-paste what follows if you don't understand what it is about, it will wipe out your whole device and may render it unusable! First, erase the whole flash :
Marvell>> nand erase clean
Marvell>>

Let's check that the flash correctly shows 0xff everywhere :
Marvell>> nand dump 0
Page 00000000 dump:
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ...
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
OOB:                                                                            
        ff ff ff ff ff ff ff ff
        ...
        ff ff ff ff ff ff ff ff
Marvell>>

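OK. We'll pre-fill the memory with 0xFF before loading the boot loader there, to avoid flashing crap (the count is given in 32-bit words, so 0x00100000 words cover the 4 MB that will be written to the flash) :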
Marvell>> mw.l 0x7000000 0xffffffff 0x00100000
Marvell>> md.b 0x7000000 100
07000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ...............
...
070000f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ...............
Marvell>>
Now copy the original mtd0 image from a TFTP server (from a file called "mtd0" there) to that location :
Marvell>> tftpboot 0x7000000 mtd0
Marvell>> md.b 0x7000000 100
07000000: 8b 00 00 00 60 4e 0a 00 01 00 00 c0 00 c0 00 00    ....`N..........
07000010: 00 00 60 00 00 00 6a 00 00 02 01 00 00 00 01 0a    ..`...j.........
...
070000f0: aa 25 ad 00 ed 19 2b 60 00 23 ab 20 80 00 c0 19    .%....+`.#. ....
Marvell>>

The contents look fine (note the 8b at the beginning which means that this is a NAND flash image). If everything looks OK and only in this case, you can write the memory contents down to the flash then check that it looks similar to the dump above :
Marvell>> nand write 0x7000000 0 0x00400000
Marvell>> nand dump 0

Then reset the device, it should reboot directly from the flash. If it fails, you just have to retry the procedure above and figure out what you got wrong.
Marvell>> reset

Back to hacking the device again

Since the device is fixed and I don't fear losing it anymore, I'm back playing with it. I'm running it with a 3.8 kernel with all development patches from Free-Electrons, as well as a recent attempt I made to port Marvell's NAND Flash Controller driver to this kernel. Right now it works but the code is ugly, contains many copy-pastes and I don't trust it a lot. I found it safer to buy a microSD card and install my FS there.

I could also remove some of Marvell's patches from their ugly kernel (I could get splice() to work again) and rebase it on 2.6.35.14, but at this point in time, I think it does not make sense to spend more time with this dead kernel, better try to get most features working with 3.8 and 3.9-rc. Oh, and BTW, UBIFS managed to destroy my rootfs again using Marvell's kernel when it tried to mount a rootfs that had not been cleanly shut down! I don't know if it's the FS or the Flash controller which is to blame, but what's certain is that a filesystem driver should not destroy the data it's responsible for, and that it should at least offer tools to fix devices which report errors. Here, after any minimal error, the file system is permanently lost. So that's one more reason for switching to a replaceable microSD for the root FS.

There are still a number of issues that I would like to see fixed in future versions :
  • the mtdparts variable in the boot loader does not match the real partition size, which apparently is responsible for UBIFS corrupting my root FS.
  • the Ethernet MAC addresses at the bottom of the box are different from those on the sticker on the board ! I don't know which ones are the right ones, so if I plug this device into a network with another one sharing the same addresses, it could cause trouble.
  • their bogus kernel needs some fixing. I don't understand why they patched (and broke) the splice() system call. Also, having iptables+conntrack hard-linked really is problematic (and there is no NOTRACK target).
  • the boot loader only reserves 4 MB for the kernel, which is too small for development kernels. Anyway you'll probably have to repartition the flash after the rootfs gets corrupted.
  • the internal jumpers and connectors are not documented.
  • I tried a dual-gigE mini-PCIe board (i350-based Jetway ADMPEIDLA) and it did not work, it was not even detected. I don't know which one is the culprit since the NIC works on an Atom board and other cards work on this board.
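
Since the partition sizes are at the heart of the first issue, here is a minimal sketch that turns an mtdparts string into offsets and sizes, which makes it easier to compare a layout with what a kernel expects. The string used is the one visible in the kernel command line of the dmesg capture further down, and the device size is the 1 GB of the NAND. The parser is deliberately simplified: it only understands sizes with an optional k/m/g suffix and a final "-", not the full syntax supported by the kernel.

```python
import re

SUFFIX = {"": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_mtdparts(spec: str, device_size: int):
    """Return [(name, offset, size)] for a simple 'dev:size(name),...' string."""
    _, parts = spec.split(":", 1)
    layout, offset = [], 0
    for item in parts.split(","):
        m = re.fullmatch(r"(-|\d+)([kmg]?)\((\w+)\)", item.strip())
        if m is None:
            raise ValueError("unsupported partition spec: %r" % item)
        size, suffix, name = m.groups()
        # "-" means "whatever remains on the device"
        nbytes = device_size - offset if size == "-" else int(size) * SUFFIX[suffix]
        layout.append((name, offset, nbytes))
        offset += nbytes
    return layout


spec = "armada-nand:4m(uboot),4m(uimage),8m(nv),16m(rescue),480m(rootfs),-(pad)"
for name, offset, size in parse_mtdparts(spec, 1 << 30):
    print("%-7s offset 0x%08X size %4d MiB" % (name, offset, size >> 20))
```
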

A few captures

Here come a few captures of what people always want to see from a new board :-)

cpuinfo

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
BogoMIPS        : 1196.85
Features        : swp half fastmult vfp edsp vfpv3 vfpv3d16 tls
CPU implementer : 0x56
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0x581
CPU revision    : 1

Hardware        : Marvell Armada 370/XP (Device Tree)
Revision        : 0000
Serial          : 0000000000000000

meminfo

# cat /proc/meminfo
MemTotal:        1034348 kB
MemFree:          998484 kB
Buffers:            4476 kB
Cached:             9820 kB
SwapCached:            0 kB
Active:             7116 kB
Inactive:           8576 kB
Active(anon):       1768 kB
Inactive(anon):       76 kB
Active(file):       5348 kB
Inactive(file):     8500 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        270336 kB
HighFree:         248016 kB
LowTotal:         764012 kB
LowFree:          750468 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1424 kB
Mapped:             2484 kB
Shmem:               448 kB
Slab:               4388 kB
SReclaimable:        952 kB
SUnreclaim:         3436 kB
KernelStack:         280 kB
PageTables:          152 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      703356 kB
Committed_AS:       5236 kB
VmallocTotal:     245760 kB
VmallocUsed:        6120 kB
VmallocChunk:     237500 kB

lspci

# lspci -nnv
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (prog-if 00 [Normal decode])
        Flags: bus master, 66MHz, user-definable features, ?? devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Capabilities: [fc] <chain broken>

00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (prog-if 00 [Normal decode])
        Flags: bus master, 66MHz, user-definable features, ?? devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Memory behind bridge: c1000000-c10fffff
        Capabilities: [fc] <chain broken>

02:00.0 USB Controller [0c03]: Device [1b73:1009] (rev 02) (prog-if 30)
        Subsystem: Device [1b73:0000]
        Flags: bus master, fast devsel, latency 0, IRQ 105
        Memory at c1000000 (64-bit, non-prefetchable) [size=64K]
        Memory at c1010000 (64-bit, non-prefetchable) [size=4K]
        Memory at c1011000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd

iomem

# cat /proc/iomem
00000000-3fffffff : System RAM
  00008000-0044b073 : Kernel code
  00738000-007f6393 : Kernel data
c1000000-c100ffff : xhci_hcd
d0010300-d001031f : d0010300.rtc
d0012000-d001201f : serial
d0018100-d001813f : /soc/gpio@d0018100
d0018140-d001817f : /soc/gpio@d0018140
d0018180-d00181bf : /soc/gpio@d0018180
d0050000-d00504ff : ehci_hcd
d0051000-d00514ff : ehci_hcd

interrupts

# cat /proc/interrupts
           CPU0
 16:      29435  armada_370_xp_irq  armada_370_xp_per_cpu_tick
 17:       2592  armada_370_xp_irq  serial
 23:        684  armada_370_xp_irq  mvneta
 26:          1  armada_370_xp_irq  d0010300.rtc
 27:        231  armada_370_xp_irq  ehci_hcd:usb1
 28:          0  armada_370_xp_irq  ehci_hcd:usb2
105:          0  armada_370_xp_irq  xhci_hcd:usb3
106:          2  armada_370_xp_irq  d0060800.xor
107:          2  armada_370_xp_irq  d0060800.xor
108:          2  armada_370_xp_irq  d0060900.xor
109:          2  armada_370_xp_irq  d0060900.xor
Err:          0

dmesg

Booting Linux on physical CPU 0x0
Linux version 3.8.0-mbx (willy@pcw) (gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41) ) #2 Sun Feb 24 11:58:31 CET 2013
CPU: ARMv7 Processor [561f5811] revision 1 (ARMv7), cr=10c53c7d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine: Marvell Armada 370/XP (Device Tree), model: Globalscale Mirabox
Memory policy: ECC disabled, Data cache writeback
On node 0 totalpages: 262144
free_area_init_node: node 0, pgdat c075ba60, node_mem_map c0b90000
  Normal zone: 1520 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 193040 pages, LIFO batch:31
  HighMem zone: 528 pages used for memmap
  HighMem zone: 67056 pages, LIFO batch:15
pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
pcpu-alloc: [0] 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
Kernel command line: console=ttyS0,115200 mtdparts=armada-nand:4m(uboot),4m(uimage),8m(nv),16m(rescue),480m(rootfs),-(pad) ubi.mtd=4 root=/dev/ram0
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
__ex_table already sorted, skipping sort
Memory: 1024MB = 1024MB total
Memory: 1020756k/1020756k available, 27820k reserved, 270336K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc044b074   (4365 kB)
      .init : 0xc044c000 - 0xc0736934   (2987 kB)
      .data : 0xc0738000 - 0xc075c480   ( 146 kB)
       .bss : 0xc075c480 - 0xc07f6394   ( 616 kB)
NR_IRQS:16 nr_irqs:16 16
Aurora cache controller enabled
l2x0: 4 ways, CACHE_ID 0x00000100, AUX_CTRL 0x1a086302, Cache size: 262144 B
sched_clock: 32 bits at 18MHz, resolution 53ns, wraps every 229064ms
Calibrating delay loop... 1196.85 BogoMIPS (lpj=5984256)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x354c20 - 0x354c78
devtmpfs: initialized
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 1024 KiB pool for atomic coherent allocations
irq: Cannot allocate irq_descs @ IRQ33, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ69, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ102, assuming pre-allocated
Initializing Coherency fabric
bio: create slab <bio-0> at 0
mvebu-pcie pcie-controller.1: PCIe0.0: link down
mvebu-pcie pcie-controller.1: PCIe1.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
pci_bus 0000:00: root bus resource [mem 0xc1000000-0xc8ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: scanning bus
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
pci 0000:00:01.0: calling pci_fixup_ide_bases+0x0/0x11c
pci 0000:00:02.0: [11ab:7846] type 01 class 0x060400
pci 0000:00:02.0: calling pci_fixup_ide_bases+0x0/0x11c
pci_bus 0000:00: fixups for bus
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:01: scanning bus
pci_bus 0000:01: fixups for bus
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: bus scan returning with max=01
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:02: scanning bus
pci 0000:02:00.0: [1b73:1009] type 00 class 0x0c0330
pci 0000:02:00.0: reg 10: [mem 0x42000000-0x4200ffff 64bit]
pci 0000:02:00.0: reg 18: [mem 0x42010000-0x42010fff 64bit]
pci 0000:02:00.0: reg 20: [mem 0x42011000-0x42011fff 64bit]
pci 0000:02:00.0: calling pci_fixup_ide_bases+0x0/0x11c
pci 0000:02:00.0: supports D1
pci 0000:02:00.0: PME# supported from D0 D1 D3hot
pci 0000:02:00.0: PME# disabled
pci_bus 0000:02: fixups for bus
PCI: bus2: Fast back to back transfers disabled
pci_bus 0000:02: bus scan returning with max=02
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
pci_bus 0000:00: bus scan returning with max=02
pci 0000:00:01.0: fixup irq: got 135
pci 0000:00:01.0: assigning IRQ 135
pci 0000:00:02.0: fixup irq: got 135
pci 0000:00:02.0: assigning IRQ 135
pci 0000:02:00.0: fixup irq: got 105
pci 0000:02:00.0: assigning IRQ 105
pci 0000:00:02.0: BAR 8: assigned [mem 0xc1000000-0xc10fffff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:02:00.0: BAR 0: assigned [mem 0xc1000000-0xc100ffff 64bit]
pci 0000:02:00.0: BAR 0: set to [mem 0xc1000000-0xc100ffff 64bit] (PCI address [0xc1000000-0xc100ffff])
pci 0000:02:00.0: BAR 2: assigned [mem 0xc1010000-0xc1010fff 64bit]
pci 0000:02:00.0: BAR 2: set to [mem 0xc1010000-0xc1010fff 64bit] (PCI address [0xc1010000-0xc1010fff])
pci 0000:02:00.0: BAR 4: assigned [mem 0xc1011000-0xc1011fff 64bit]
pci 0000:02:00.0: BAR 4: set to [mem 0xc1011000-0xc1011fff 64bit] (PCI address [0xc1011000-0xc1011fff])
pci 0000:00:02.0: PCI bridge to [bus 02]
pci 0000:00:02.0:   bridge window [mem 0xc1000000-0xc10fffff]
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
pci 0000:00:01.0: enabling bus mastering
PCI: enabling device 0000:00:02.0 (0140 -> 0143)
pci 0000:00:02.0: enabling bus mastering
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource armada_370_xp_clocksource
NET: Registered protocol family 2
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP: reno registered
UDP hash table entries: 512 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
NET: Registered protocol family 1
pci 0000:02:00.0: calling quirk_usb_early_handoff+0x0/0x8c4
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (junk in compressed archive); looks like an initrd
Freeing initrd memory: 10608K
bounce pool size: 64 pages
squashfs: version 4.0 (2009/01/31) Phillip Lougher
msgmni has been set to 1486
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
armada-370-pinctrl d0018000.pinctrl: registered pinctrl driver
mv_xor d0060800.xor: Marvell XOR driver
mv_xor d0060800.xor: Marvell XOR: ( xor cpy )
mv_xor d0060800.xor: Marvell XOR: ( xor fill cpy )
mv_xor d0060900.xor: Marvell XOR driver
mv_xor d0060900.xor: Marvell XOR: ( xor cpy )
mv_xor d0060900.xor: Marvell XOR: ( xor fill cpy )
Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
d0012000.serial: ttyS0 at MMIO 0xd0012000 (irq = 17) is a 8250
console [ttyS0] enabled
brd: module loaded
loop: module loaded
libphy: orion_mdio_bus: probed
mvneta d0070000.ethernet eth0: mac: f0:ad:4e:01:a5:f3
mvneta d0074000.ethernet eth1: mac: f0:ad:4e:01:a5:f4
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
orion-ehci d0050000.usb: Marvell Orion EHCI
orion-ehci d0050000.usb: new USB bus registered, assigned bus number 1
orion-ehci d0050000.usb: irq 27, io mem 0xd0050000
orion-ehci d0050000.usb: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: Marvell Orion EHCI
usb usb1: Manufacturer: Linux 3.8.0-mbx ehci_hcd
usb usb1: SerialNumber: d0050000.usb
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
orion-ehci d0051000.usb: Marvell Orion EHCI
orion-ehci d0051000.usb: new USB bus registered, assigned bus number 2
orion-ehci d0051000.usb: irq 28, io mem 0xd0051000
orion-ehci d0051000.usb: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: Marvell Orion EHCI
usb usb2: Manufacturer: Linux 3.8.0-mbx ehci_hcd
usb usb2: SerialNumber: d0051000.usb
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
ehci-pci: EHCI PCI platform driver
xhci_hcd 0000:02:00.0: enabling bus mastering
xhci_hcd 0000:02:00.0: xHCI Host Controller
xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:02:00.0: enabling Mem-Wr-Inval
xhci_hcd 0000:02:00.0: irq 105, io mem 0xc1000000
usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: xHCI Host Controller
usb usb3: Manufacturer: Linux 3.8.0-mbx xhci_hcd
usb usb3: SerialNumber: 0000:02:00.0
xHCI xhci_add_endpoint called for root hub
xHCI xhci_check_bandwidth called for root hub
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
xhci_hcd 0000:02:00.0: xHCI Host Controller
xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 4
usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: xHCI Host Controller
usb usb4: Manufacturer: Linux 3.8.0-mbx xhci_hcd
usb usb4: SerialNumber: 0000:02:00.0
xHCI xhci_add_endpoint called for root hub
xHCI xhci_check_bandwidth called for root hub
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
rtc-mv d0010300.rtc: rtc core: registered d0010300.rtc as rtc0
i2c /dev entries driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
IPv4 over IPv4 tunneling driver
TCP: bic registered
TCP: cubic registered
NET: Registered protocol family 17
8021q: 802.1Q VLAN Support v1.8
VFP support v0.3: implementor 56 architecture 2 part 20 variant 9 rev 6
UBI error: ubi_init: UBI error: cannot initialize UBI, error -19
rtc-mv d0010300.rtc: setting system clock to 2013-02-24 14:03:59 UTC (1361714639)
Warning: unable to open an initial console.
Freeing init memory: 2984K
usb 1-1: new high-speed USB device number 2 using orion-ehci
usb 1-1: New USB device found, idVendor=1a40, idProduct=0101
usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
usb 1-1: Product: USB 2.0 Hub
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usb 1-1.1: new high-speed USB device number 3 using orion-ehci
usb 1-1.1: New USB device found, idVendor=05e3, idProduct=0723
usb 1-1.1: New USB device strings: Mfr=3, Product=4, SerialNumber=0
usb 1-1.1: Product: USB Storage
usb 1-1.1: Manufacturer: Generic
usb-storage 1-1.1:1.0: Quirks match for vid 05e3 pid 0723: 8000
scsi0 : usb-storage 1-1.1:1.0
usb 1-1.2: new high-speed USB device number 4 using orion-ehci
usb 1-1.2: New USB device found, idVendor=05e3, idProduct=0723
usb 1-1.2: New USB device strings: Mfr=3, Product=4, SerialNumber=0
usb 1-1.2: Product: USB Storage
usb 1-1.2: Manufacturer: Generic
usb-storage 1-1.2:1.0: Quirks match for vid 05e3 pid 0723: 8000
scsi1 : usb-storage 1-1.2:1.0
scsi 0:0:0:0: Direct-Access     Generic  STORAGE DEVICE   9451 PQ: 0 ANSI: 0
sd 0:0:0:0: [sda] Attached SCSI removable disk
scsi 1:0:0:0: Direct-Access     Generic  STORAGE DEVICE   9451 PQ: 0 ANSI: 0
sd 1:0:0:0: [sdb] 15523840 512-byte logical blocks: (7.94 GB/7.40 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1 sdb2 sdb3 < sdb5 > sdb4
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] Attached SCSI removable disk
mvneta d0070000.ethernet eth0: link up

Jumpers

Zoom on jumpers
Jumpers are numbered like this: J7 J4 J6 J5 J9 J8 J1 J2 J3. Pin 1 is at the top, pin 2 in the middle, and pin 3 at the bottom. We'll note 1 for a jumper between pins 1 and 2, and 0 for a jumper connecting pins 2 and 3.

The only combinations I found to work so far are the following ones. Overall we could say that J6/J5/J2 affect the CPU/L2/DDR frequencies, and that J7/J4/J9 prevent the system from booting if changed.
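
To make the 0/1 notation easier to read back onto the board, here is a small Python sketch that expands a 9-character setting into the pins each jumper bridges. The jumper order and the meaning of 0 and 1 are taken from the description above; the function and variable names are just illustrative.

```python
# Jumper order as given in the text, left to right.
JUMPER_ORDER = ["J7", "J4", "J6", "J5", "J9", "J8", "J1", "J2", "J3"]


def decode_jumpers(bits):
    """Map a 9-character string of 0/1 to the pins bridged on each jumper."""
    if len(bits) != len(JUMPER_ORDER) or set(bits) - {"0", "1"}:
        raise ValueError("expected 9 characters, each 0 or 1")
    # 1 = jumper between pins 1 and 2, 0 = jumper between pins 2 and 3.
    return {
        name: ("1-2" if bit == "1" else "2-3")
        for name, bit in zip(JUMPER_ORDER, bits)
    }


if __name__ == "__main__":
    # Default setting from the first listing.
    for name, pins in decode_jumpers("011000101").items():
        print(f"{name}: pins {pins}")
```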

CPU@1200 MHz, L2@600, DDR@600 (Default settings)

J7J4J6J5J9J8J1J2J3
011000101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 600Mhz
       DDR @ 600Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1200, L2@800, DDR@400

J7J4J6J5J9J8J1J2J3
011000111
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 800Mhz
       DDR @ 400Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1000, L2@500, DDR@500

J7J4J6J5J9J8J1J2J3
010000101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1000Mhz, L2 @ 500Mhz
       DDR @ 500Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1067, L2@534, DDR@534

J7J4J6J5J9J8J1J2J3
010100101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1067Mhz, L2 @ 534Mhz
       DDR @ 534Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1333, L2@667, DDR@667

Note: this configuration did not boot.
J7J4J6J5J9J8J1J2J3
011100101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1333Mhz, L2 @ 667Mhz
       DDR @ 667Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access
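
The observation that only J6, J5 and J2 matter for the frequencies can be checked mechanically by comparing each setting listed above with the default one. The sketch below does that in Python; the bit strings and frequencies are copied from the five listings, while the names (SETTINGS, changed_jumpers) are only illustrative.

```python
JUMPER_ORDER = ["J7", "J4", "J6", "J5", "J9", "J8", "J1", "J2", "J3"]
DEFAULT = "011000101"

# Bit string -> (CPU, L2, DDR) in MHz, as printed in the listings above.
SETTINGS = {
    "011000101": (1200, 600, 600),
    "011000111": (1200, 800, 400),
    "010000101": (1000, 500, 500),
    "010100101": (1067, 534, 534),
    "011100101": (1333, 667, 667),
}


def changed_jumpers(bits, reference=DEFAULT):
    """Return the jumpers whose position differs from the reference setting."""
    return [
        name
        for name, bit, ref in zip(JUMPER_ORDER, bits, reference)
        if bit != ref
    ]


if __name__ == "__main__":
    for bits, (cpu, l2, ddr) in SETTINGS.items():
        moved = ", ".join(changed_jumpers(bits)) or "none"
        print(f"{bits}: CPU {cpu} / L2 {l2} / DDR {ddr} MHz, moved: {moved}")
```

Across the five settings, the only jumpers that ever differ from the default are J6, J5 and J2.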

Useful links