[kwlug-disc] about silicon
Doug Moen
doug at moens.org
Sun Dec 13 14:06:37 EST 2020
libre-soc.org also pivoted from designing an open source GPU to an integrated CPU/GPU SoC with a unified memory architecture, a decision that the Apple M1 also seems to validate.
On Sun, Dec 13, 2020, at 2:01 PM, Doug Moen wrote:
> It would be awesome if somebody builds this kind of architecture for RISC-V. But that would require someone with deep pockets and a business case. I don't follow RISC-V so I dunno if anything like that can happen. The only libre GPU project I know of is https://libre-soc.org/ and they abandoned RISC-V a year or two ago for a different architecture with massive superscalar out-of-order execution, something they apparently thought was lacking in RISC-V. What Jason said about OOE on the M1 puts their decision in context for me, since I am not a CPU nerd.
>
> On Sun, Dec 13, 2020, at 1:44 PM, jason.eckert wrote:
>> Beefing up the out-of-order execution prediction is definitely the main reason why the M1 SoC performs well - but this sort of execution can only be efficient if the instruction size remains constant, as is the case with RISC-only architectures like ARM. This is where SGI MIPS was headed before they died.
>>
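To make the constant-instruction-size point concrete, here is a minimal sketch in Rust. It uses a made-up toy encoding (the first byte of each instruction gives its total length), not real ARM or x86 encodings, and both function names are hypothetical. With a fixed width, every instruction start can be computed up front, so several decoders could each take one. With variable lengths, each start is only known after the previous instruction has been examined.

```rust
/// Fixed-width case: instruction i starts at i * width, so all
/// boundaries are known without looking at the bytes at all.
/// `width` must be non-zero.
fn fixed_boundaries(code_len: usize, width: usize) -> Vec<usize> {
    (0..code_len).step_by(width).collect()
}

/// Variable-length case, using a toy encoding (an assumption for
/// illustration): the first byte of each instruction is its total
/// length in bytes. Each start depends on the previous length, so
/// the scan is inherently sequential.
fn variable_boundaries(code: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut pos = 0;
    while pos < code.len() {
        starts.push(pos);
        let len = code[pos] as usize;
        if len == 0 {
            break; // malformed input: stop rather than loop forever
        }
        pos += len;
    }
    starts
}
```

For a 16-byte stream of 4-byte instructions, `fixed_boundaries(16, 4)` gives the starts 0, 4, 8, 12 directly. For the toy stream `[2, 0, 3, 0, 0, 1]`, `variable_boundaries` has to walk the bytes in order to find the starts 0, 2, 5.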
>> The main takeaway here IMO is that now that Apple has demonstrated that a phone SoC can be beefed up to perform general-purpose computing well, we'll start seeing more of this hit the market in the workstation space. And when those fast SoC systems start running Linux, developers will flock to them and that will accelerate the adoption of ARM in the cloud/datacenter. Yes, Amazon has their nice Graviton platform, but without developers running ARM on their workstations, adoption of ARM in the cloud/datacenter is not going to gain a lot of traction.
>>
>> If Apple allowed Linux to run natively on their M1 SoC, it would actually be a game-changer in this space. But that would require they release their SoC documentation to the open source community, as well as digitally sign Linux boot components in their secure enclave (neither of which is likely because Apple is as closed as Oracle's wallet ;-)
>>
>> What I'm most interested in seeing in the coming years is what Nvidia is planning for ARM (no matter what they say, they definitely have a plan in mind if they bought ARM).
>>
>>
>>
>> Sent from my Samsung device running Android (basically Linux in drag)
>>
>>
>> -------- Original message --------
>> From: Mikalai Birukou via kwlug-disc <kwlug-disc at kwlug.org>
>> Date: 2020-12-13 13:06 (GMT-05:00)
>> To: kwlug-disc at kwlug.org
>> Cc: Mikalai Birukou <mb at 3nsoft.com>
>> Subject: Re: [kwlug-disc] about silicon
>>
>>
>>> Found a nice blog post explaining why M1 is fast.
>>> https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2
>> I knew it! I felt it all my life! It takes an inordinate amount of time to prep a place for painting, more than the painting itself takes. ... Eight preppers of micro-ops in M1 versus four in Intel/AMD.
>> I still have a feeling that co-locating memory also helps the preppers' results, besides the benefit of RISC's constant instruction length.
>> It also explains the talk of AMD going with ARM. RISC-y business :)
>>>> Rust provides both Atomic Reference Counting (called Arc) and non-atomic Reference Counting (called Rc). You choose the one that makes sense. Hopefully the type system complains if you use Rc in a context where atomicity is required, but I don't use Rust. C++ provides only atomic refcounting in the standard library; for the other kind you roll your own (which I have done).
>>>>
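A minimal sketch of the Rc/Arc distinction described above, using only the Rust standard library. The two function names are hypothetical, made up for this illustration. The type system does complain: `Rc<T>` does not implement `Send`, so moving one into a spawned thread is rejected at compile time.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Share one vector across `n` threads through Arc and add up the
/// sums each thread computes. Arc::clone bumps the count atomically.
fn sum_across_threads(n: usize) -> i32 {
    let data = Arc::new(vec![1, 2, 3]);
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let data = Arc::clone(&data); // atomic increment
            thread::spawn(move || data.iter().sum::<i32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

/// Single-threaded sharing through Rc: returns the strong count
/// after one clone. Rc::clone is a plain, non-atomic increment.
fn rc_count_after_clone() -> usize {
    let a = Rc::new(String::from("local"));
    let _b = Rc::clone(&a);
    Rc::strong_count(&a)
}

// The following does not compile, because Rc<i32> is not Send:
//
//     let r = Rc::new(5);
//     thread::spawn(move || println!("{}", r));
```

Here `sum_across_threads(4)` returns 24 (four threads each summing 1 + 2 + 3) and `rc_count_after_clone()` returns 2.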
>>>>> <moving into discussing silicon and near it>
>>>>>> Another trick is that Apple's dev languages and frameworks (Swift and Objective-C) use reference counting, which requires atomic increments and decrements. On Intel, these operations are five times slower than non-atomic operations; on Apple Silicon they run at the same speed. This is something I wish the other CPU vendors would get right, because refcounting has some technical advantages over tracing GC, and I use it in software I write. C++ and Rust, both "performance" languages, provide refcounting but not tracing GC.
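For illustration, the two kinds of increment look like this in Rust. This is only a sketch with hypothetical function names, and it makes no attempt to measure the speed difference mentioned above, which depends on the CPU. The plain version is only valid on one thread; the atomic version stays correct when several threads bump the same counter, which is what a thread-safe refcount needs.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Non-atomic increments: an ordinary read-modify-write on a Cell.
/// Cell is not Sync, so this counter cannot be shared across threads.
fn plain_increments(n: usize) -> usize {
    let count = Cell::new(0usize);
    for _ in 0..n {
        count.set(count.get() + 1);
    }
    count.get()
}

/// Atomic increments: fetch_add is a single indivisible operation,
/// so no updates are lost even with several threads incrementing.
fn atomic_increments(threads: usize, per_thread: usize) -> usize {
    let count = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    count.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // join() orders each thread's increments before this load.
    count.load(Ordering::Relaxed)
}
```

With four threads doing 1000 increments each, `atomic_increments(4, 1000)` always returns 4000.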
>>>>>>> Regarding the M1: my understanding is that placing the RAM inside the processor package/silicon is the trick that makes it run fast. Is there anything else?
>>>>>>>
>>>>>>>> The Apple M1 looks decent, but since Apple no longer lets you run Linux on their hardware, I have no desire to ever buy one.
>>>>> Does Rust's standard refcounting, or the implementation of such pointers, need to use atomic increments/decrements? Can't it use something non-atomic, given more detailed knowledge of ownership? Just wondering.
>>>>>
>>>>> _______________________________________________
>>>>> kwlug-disc mailing list
>>>>> kwlug-disc at kwlug.org
>>>>> https://kwlug.org/mailman/listinfo/kwlug-disc_kwlug.org
>>>>>
>> --
>> Mikalai Birukou
>> CEO | 3NSoft Inc.
>