<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="font-family:Arial,Helvetica,sans-serif">On Wed, Oct 21, 2020 at 6:49 PM Raoul Duke <<a href="mailto:raould@gmail.com">raould@gmail.com</a>> wrote:</span><br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">hah somebody should make a startup to make universal assembly code translator. </div><div dir="auto"><br></div></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">That has almost been done before. See e.g. DEC's FX!32 [1], though it handled only one source ISA: it translated binaries for x86, the dominant ISA at the time, to run on Alpha.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">However, the more modern approach seems to attack the problem at a different layer, probably because source code is more readily available, which removes the need for a binary translation pass. LLVM's IR is essentially a typed assembly language that is largely target-independent. Instruction selection then decides which real machine instructions each IR operation (or tile of IR operations) maps to. Another example is Go's assembler, which derives from Plan 9's assembler. Here, many instructions are common to all architectures and share one coherent syntax, and each architecture adds its own specialized set of instructions on top. 
This means you can often get a new ISA working by targeting the common instruction set first, and then blend in the more specialized, optimized instructions later to make the architecture shine.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Real-world examples are plentiful:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Apple uses a "bitcode" format for iOS, presumably a variant of LLVM IR, so they can recompile apps for new hardware instructions later. It also gives them good statistics on which instructions to add.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Java's bytecode is a famous example of this scheme.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Microsoft's managed code (CIL) for the CLR does the same.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">All of these make it possible to retranslate older code later, e.g. to optimize it for newer hardware. It isn't perfect, but neither is direct binary translation of assembly, and higher-level representations often retain structure that is easier to exploit. In the case of the JVM and the CLR, though, the translation happens at run time through a JIT compiler (in HotSpot's case, after an initial interpreted phase).</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">[1] <a href="https://dl.acm.org/doi/10.1109/40.671403">https://dl.acm.org/doi/10.1109/40.671403</a></div><br></div></div></div>
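<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">To make the "common instructions first, specialized tiles later" idea a bit more concrete, here is a toy Python sketch. Everything in it is hypothetical and is not taken from LLVM or Go: the IR tuple format, the mnemonics (ADD, SUB, MUL, MADD), the target names and the function names. Every target falls back to a shared one-op-to-one-instruction lowering. A more mature target adds a tile that covers two IR ops (a multiply feeding an add) with one fused instruction.</div>

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy IR: (opcode, dest, src1, src2). Not LLVM's or Go's format.
Op = Tuple[str, str, str, str]
# A tile tries to match IR ops starting at index i and returns
# (instructions, number of ops consumed), or None if it does not match.
Tile = Callable[[List[Op], int], Optional[Tuple[List[str], int]]]

def lower_common(op: Op) -> List[str]:
    """Generic one-op-to-one-instruction lowering shared by every target."""
    opcode, dst, a, b = op
    mnemonic = {"add": "ADD", "sub": "SUB", "mul": "MUL"}[opcode]
    return [f"{mnemonic} {dst}, {a}, {b}"]

def fuse_madd(ops: List[Op], i: int) -> Optional[Tuple[List[str], int]]:
    """Target-specific tile: 'mul t, a, b' then 'add d, t, c' becomes 'MADD d, a, b, c'.

    Only fires when the temporary t is not read again, because the fused
    instruction never materializes t in a register.
    """
    if i + 1 >= len(ops):
        return None
    (op1, t, a, b), (op2, d, x, c) = ops[i], ops[i + 1]
    if op1 != "mul" or op2 != "add" or x != t or c == t:
        return None
    if any(t in (s1, s2) for (_, _, s1, s2) in ops[i + 2:]):
        return None
    return [f"MADD {d}, {a}, {b}, {c}"], 2

# Hypothetical targets: a freshly ported ISA uses only the common lowering,
# while a mature one blends in a specialized tile.
TARGETS: Dict[str, List[Tile]] = {"new_isa": [], "mature_isa": [fuse_madd]}

def select(ops: List[Op], target: str) -> List[str]:
    """Greedy instruction selection: try the target's tiles first, else fall back."""
    tiles, out, i = TARGETS[target], [], 0
    while i < len(ops):
        for tile in tiles:
            match = tile(ops, i)
            if match is not None:
                insns, consumed = match
                out.extend(insns)
                i += consumed
                break
        else:  # no specialized tile matched (or the target has none)
            out.extend(lower_common(ops[i]))
            i += 1
    return out

program = [("mul", "t0", "r1", "r2"), ("add", "r3", "t0", "r4")]
print(select(program, "new_isa"))     # ['MUL t0, r1, r2', 'ADD r3, t0, r4']
print(select(program, "mature_isa"))  # ['MADD r3, r1, r2, r4']
```

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">The same IR program runs on both targets; only the mature one emits the fused instruction. Greedy first-match tiling is about the simplest strategy possible; LLVM's real selectors (SelectionDAG and GlobalISel) are considerably more involved.</div>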