The Cryptonight algorithm is described as ASIC resistant, in particular because of one feature:
A megabyte of internal memory is almost unacceptable for the modern ASICs.
EDIT: Each instance of Cryptonight requires 2MB of RAM. Therefore, any Cryptonight multi-processor is required to have 2MB per instance. Since CPUs are incredibly well loaded with RAM (e.g. 32MB L3 on Threadripper, 16MB L3 on Ryzen, and plenty of L2+L3 on Skylake Servers), it seems unlikely that ASICs would be able to compete well vs CPUs.
In fact, a large number of people seem to be incredibly confident in Cryptonight's ASIC resistance. And indeed, anyone who knows how standard DDR4 works knows that DDR4 is unacceptable for Cryptonight. GDDR5 similarly doesn't look like a very good technology for Cryptonight, since it focuses on high bandwidth instead of low latency.
This suggests only an ASIC's on-chip RAM would be able to handle the 2MB that Cryptonight uses. Solid argument, but in my eyes it seems to be missing a critical point of analysis.
What about "exotic" RAM, like RLDRAM3 ?? Or even QDR-IV?
QDR-IV SRAM
QDR-IV SRAM is absurdly expensive. However, it's a good example of "exotic RAM" that is available on the marketplace. I'm focusing on it because QDR-IV is really simple to describe.
QDR-IV costs roughly $290 for 16Mbit x 18 bits. It is true Static-RAM. 18-bits are for 8-bits per byte + 1 parity bit, because QDR-IV is usually designed for high-speed routers.
QDR-IV has none of the speed or latency issues with DDR4 RAM. There are no "banks", there are no "refreshes", there are no "obliterate the data as you load into sense amplifiers". There's no "auto-charge" as you load the data from the sense-amps back into the capacitors.
Anything that could have caused latency issues is gone. QDR-IV is about as fast as you can get latency-wise. Every clock cycle, you specify an address, and QDR-IV will generate a response every clock cycle. In fact, QDR means "quad data rate" as the SRAM generates 2 reads and 2 writes per clock cycle. There is a slight amount of latency: 8 clock cycles for reads (7.5 nanoseconds), and 5 clock cycles for writes (4.6 nanoseconds). For those keeping track at home:
AMD Zen's L3 cache has a latency of 40 clocks: aka 10 nanoseconds at 4GHz.
Basically, QDR-IV BEATS the L3 latency of modern CPUs. And we haven't even begun to talk software or ASIC optimizations yet.
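The latency comparison above is just a cycles-to-nanoseconds conversion, which can be sketched as below. The 1066 MHz QDR-IV clock is an assumption, back-computed from the quoted "8 clocks = 7.5 ns"; it is not taken from a datasheet, and the function name is made up for illustration.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a latency measured in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9

# AMD Zen L3: 40 clocks at 4 GHz (both figures from the post).
zen_l3_ns = cycles_to_ns(40, 4.0e9)

# ASSUMPTION: QDR-IV clocked at ~1066 MHz, inferred from "8 clocks = 7.5 ns".
QDR_CLOCK_HZ = 1066e6
qdr_read_ns = cycles_to_ns(8, QDR_CLOCK_HZ)   # read latency, 8 clocks
qdr_write_ns = cycles_to_ns(5, QDR_CLOCK_HZ)  # write latency, 5 clocks

print(f"Zen L3 read:  {zen_l3_ns:.1f} ns")
print(f"QDR-IV read:  {qdr_read_ns:.1f} ns")
print(f"QDR-IV write: {qdr_write_ns:.1f} ns")
```

Under that assumed clock, the read latency comes out below the 10 ns L3 figure, which is the whole point of the comparison.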
CPU inefficiencies for Cryptonight
Now, if that weren't bad enough... CPUs have a few problems with the Cryptonight algorithm.
- AMD Zen and Intel Skylake CPUs transfer from L3 -> L2 -> L1 cache. Each of these transfers is a 64-byte chunk. Cryptonight only uses 16 of these bytes. This means that 75% of L3 cache bandwidth is wasted on 48 bytes that are never used per inner-loop of Cryptonight. An ASIC would transfer only 16 bytes at a time, instantly increasing the RAM's effective speed by 4-fold.
- AES-NI instructions on Ryzen / Threadripper can only be done one-per-core. This means a 16-core Threadripper can at most perform 16 AES encryptions per clock tick. An ASIC can perform as many as you'd like, up to the speed of the RAM.
- CPUs waste a ton of energy: there's L1 and L2 caches which do NOTHING in Cryptonight. There are floating-point units, memory controllers, and more. An ASIC which strips things out to only the bare necessities (basically: AES for Cryptonight core) would be way more power efficient, even at ancient 65nm or 90nm designs.
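The 75% and 4-fold figures in the first bullet fall straight out of the two sizes quoted there. A tiny sketch of that arithmetic (the constant names are mine, the numbers are from the post):

```python
CACHE_LINE_BYTES = 64  # bytes moved per L3 -> L2 -> L1 transfer, per the post
USED_BYTES = 16        # bytes Cryptonight actually uses per access, per the post

# Fraction of every cache-line transfer that is never touched.
wasted_fraction = (CACHE_LINE_BYTES - USED_BYTES) / CACHE_LINE_BYTES

# Effective speedup if the memory interface moved only the bytes needed.
speedup_if_exact = CACHE_LINE_BYTES / USED_BYTES

print(f"wasted bandwidth: {wasted_fraction:.0%}")
print(f"speedup with 16-byte transfers: {speedup_if_exact:.0f}x")
```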
Ideal RAM access pattern
For all y'all who are used to DDR4, here's a special trick with QDR-IV or RLDRAM: you can pipeline accesses. What does this mean?
First, it should be noted that Cryptonight has the following RAM access pattern:
- Read
- Write
- Read #2
- Write #2
QDR-IV and RLDRAM3 still have latency involved. Assuming 8-clocks of latency, the naive access pattern would be:
- Read
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Write
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Read #2
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Write #2
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
- Stall
This isn't very efficient: the RAM sits around waiting. Even with "latency reduced" RAM, you can see that the RAM still isn't doing very much. In fact, this is why people thought Cryptonight was safe against ASICs.
But what if we instead ran four instances in parallel? That way, there is always data flowing.
- Cryptonight #1 Read
- Cryptonight #2 Read
- Cryptonight #3 Read
- Cryptonight #4 Read
- Stall
- Stall
- Stall
- Stall
- Stall
- Cryptonight #1 Write
- Cryptonight #2 Write
- Cryptonight #3 Write
- Cryptonight #4 Write
- Stall
- Stall
- Stall
- Stall
- Stall
- Cryptonight #1 Read #2
- Cryptonight #2 Read #2
- Cryptonight #3 Read #2
- Cryptonight #4 Read #2
- Stall
- Stall
- Stall
- Stall
- Stall
- Cryptonight #1 Write #2
- Cryptonight #2 Write #2
- Cryptonight #3 Write #2
- Cryptonight #4 Write #2
- Stall
- Stall
- Stall
- Stall
- Stall
Notice: we're doing 4x the Cryptonight in the same amount of time. Now imagine if the stalls were COMPLETELY gone. DDR4 CANNOT do this. And that's why most people thought ASICs were impossible for Cryptonight.
Unfortunately, RLDRAM3 and QDR-IV can accomplish this kind of pipelining. In fact, that's what they were designed for.
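The two schedules listed above follow a simple rule: each phase (read, write, read #2, write #2) issues one command per instance, then waits until the first result is back. Here is a minimal model of that rule, assuming the 8-clock latency used in the lists; `schedule` is a made-up helper, not anything from a real memory controller.

```python
def schedule(instances: int, latency: int, phases: int = 4) -> dict:
    """Count clocks and stalls for one Cryptonight inner-loop iteration.

    Each phase issues one command per instance on consecutive clocks.
    The next phase cannot start until `latency` clocks after the first
    command of this phase, so a phase lasts max(instances, latency + 1).
    """
    phase_clocks = max(instances, latency + 1)
    stalls_per_phase = phase_clocks - instances
    total_clocks = phases * phase_clocks
    return {
        "clocks": total_clocks,
        "stalls": phases * stalls_per_phase,
        "utilization": phases * instances / total_clocks,
    }

for n in (1, 4, 9):
    result = schedule(n, latency=8)
    print(f"{n} instance(s): {result['clocks']} clocks, "
          f"{result['stalls']} stalls, "
          f"{result['utilization']:.0%} of clocks doing useful work")
```

In this model the stalls disappear once the number of parallel instances exceeds the latency in clocks, which is the "stalls COMPLETELY gone" case.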
RLDRAM3
As good as QDR-IV RAM is, it's way too expensive. RLDRAM3 is almost as fast, but is way more complicated to use and describe. Due to the lower cost of RLDRAM3 however, I'd assume any ASIC for CryptoNight would use RLDRAM3 instead of the simpler QDR-IV.
RLDRAM3 32Mbit x36 bits costs $180 at quantities == 1, and would support up to 64-Parallel Cryptonight instances (In contrast, a $800 AMD 1950x Threadripper supports 16 at the best).
Such a design would basically operate at the maximum speed of RLDRAM3. In the case of a x36-bit bus and 2133MT/s, we're talking about 2133 million transfers per second / (burst length 4 x 4 reads/writes x 524288 inner-loop iterations) == 254 full Cryptonight hashes per second.
254 Hashes per second sounds low, and it is. But we're talking about literally a two-chip design here. 1-chip for RAM, 1-chip for the ASIC/AES stuff. Such a design would consume no more than 5 Watts.
If you were to replicate the ~5W design 60-times, you'd get 15240 Hash/second at 300 Watts.
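The hash-rate estimate above can be reproduced in a few lines. This is only the napkin math from the post written out; it assumes the RAM is the sole bottleneck and runs at its full transfer rate, and the constant and function names are mine.

```python
INNER_LOOP_ITERATIONS = 524_288  # Cryptonight inner-loop count, from the post
ACCESSES_PER_ITERATION = 4       # read, write, read #2, write #2
BURST_LENGTH = 4                 # transfers per access, from the post

def hashes_per_second(transfers_per_second: float) -> float:
    """Upper bound on hash rate if RAM transfers are the only limit."""
    transfers_per_hash = (
        BURST_LENGTH * ACCESSES_PER_ITERATION * INNER_LOOP_ITERATIONS
    )
    return transfers_per_second / transfers_per_hash

rldram3 = hashes_per_second(2133e6)  # 2133 MT/s
board_60x = 60 * int(rldram3)        # sixty ~5 W units, ~300 W total

print(f"one RLDRAM3 chip: {rldram3:.1f} H/s")
print(f"60 units: {board_60x} H/s")
```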
RLDRAM2
Depending on cost calculations, going cheaper and "making more" might be a better idea. RLDRAM2 is widely available at only $32 per chip at 800 MT/s.
Such a design would theoretically support 800 million / (4 x 4 x 524288) == 95 Cryptonight hashes per second.
The scary part: The RLDRAM2 chip there only uses 1W of power. Together, you get 5 Watts again as a reasonable power-estimate. x60 would be 5700 Hashes/second at 300 Watts.
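To see why "going cheaper and making more" might win, here is a rough side-by-side of the two two-chip designs using only the figures quoted in the post. It counts RAM cost only; the ASIC's cost is unknown and left out, so treat the per-dollar numbers as illustrative.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    hashes_per_s: float  # per two-chip unit, from the post
    watts: float         # ~5 W per unit, the post's estimate
    ram_cost_usd: float  # single-quantity RAM price, from the post

def hashes_per_watt(d: Design) -> float:
    return d.hashes_per_s / d.watts

def hashes_per_ram_dollar(d: Design) -> float:
    return d.hashes_per_s / d.ram_cost_usd

rldram3 = Design("RLDRAM3", 254, 5, 180)
rldram2 = Design("RLDRAM2", 95, 5, 32)

for d in (rldram3, rldram2):
    print(f"{d.name}: {hashes_per_watt(d):.1f} H/s per watt, "
          f"{hashes_per_ram_dollar(d):.2f} H/s per RAM dollar")
```

On these numbers RLDRAM3 is ahead per watt while RLDRAM2 is ahead per RAM dollar, so the better choice depends on whether power or capital is the tighter constraint.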
Here's Micron's whitepaper on RLDRAM2: https://www.micron.com/~/media/documents/products/technical-note/dram/tn4902.pdf . RLDRAM3 is the same but denser, faster, and more power efficient.
Hybrid Memory Cube
Hybrid Memory Cube is "stacked RAM" designed for low latency. As far as I can tell, Hybrid Memory Cube allows an insane amount of parallelism and pipelining. It'd be the future of an ASIC Cryptonight design. Hybrid Memory Cube is more of a "Generation 2" or later option. In effect, its existence demonstrates that future designs can be lower-power and higher-speed.
Realistic ASIC Sketch: RLDRAM3 + Parallel Processing
The overall board design would be the ASIC, which would be a simple pipelined AES ASIC that talks with RLDRAM3 ($180) or RLDRAM2 ($30).
It's hard for me to estimate an ASIC's cost without the right tools or design. But a multi-project wafer service like MOSIS offers "cheap" access to 14nm and 22nm nodes. Rumor is that this is roughly $100k per run for ~40 dies, suitable for research-and-development. Mass production would require further investments, but mass production at the ~65nm node is rumored to be in the single-digit millions of dollars or maybe even just 6-figures or so.
So realistically speaking: it'd take ~$10 Million investment + a talented engineer (or team of engineers) who are familiar with RLDRAM3, PCIe 3.0, ASIC design, AES, and Cryptonight to build an ASIC.
TL;DR:
- Current CPUs waste 75% of L3 bandwidth because they transfer 64-bytes per cache-line, but only use 16-bytes per inner-loop of CryptoNight.
- Low-latency RAM exists for only $200 for ~128MB (aka: 64-parallel instances of 2MB Cryptonight). Such RAM has an estimated speed of 254 Hash/second (RLDRAM 3) or 95 Hash/second (Cheaper and older RLDRAM 2)
- ASICs are therefore not going to be capital friendly: between the higher costs, the ASIC investment, and the literally millions of dollars needed for mass production, this would be a project that costs a lot more than a CPU per-unit per hash/sec.
- HOWEVER, a Cryptonight ASIC seems possible. Furthermore, such a design would be grossly more power-efficient than any CPU. Though the capital investment is high, the rewards of mass-production and scalability are also high. Data-centers are power-limited, so any Cryptonight ASIC would be orders of magnitude lower-power than a CPU / GPU.
- EDIT: Greater discussion throughout today has led me to napkin-math an FPGA + RLDRAM3 option. I estimated roughly ~$5000 (+/- 30%, it's a very crude estimate) for a machine that performs ~3500 Hashes / second, on an unknown number of Watts (maybe 75 Watts?). $2000 FPGA, $2400 RLDRAM3, $600 on PCBs, misc chips, assembly, etc. A more serious effort may use Hybrid Memory Cube to achieve much higher FPGA-based hashrates. My current guess is that this is an overestimate on the cost, so -30% if you can achieve some bulk discounts, optimize the hypothetical design, and manage to accomplish the design on cheaper hardware.
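For what it's worth, the FPGA napkin math in the EDIT adds up as follows. The 75 W figure is the post's own guess, so the per-watt number is at best a placeholder; variable names are made up for illustration.

```python
# Component estimates in USD, from the post.
parts_usd = {"FPGA": 2000, "RLDRAM3": 2400, "PCB, misc chips, assembly": 600}

total_usd = sum(parts_usd.values())
low_usd = total_usd * 70 // 100    # -30% (bulk discounts, cheaper hardware)
high_usd = total_usd * 130 // 100  # +30%

HASHES_PER_S = 3500  # estimated hash rate, from the post
GUESSED_WATTS = 75   # ASSUMPTION: the post's "maybe 75 Watts?"

print(f"cost: ${total_usd} (range ${low_usd} to ${high_usd})")
print(f"{HASHES_PER_S / total_usd:.2f} H/s per dollar")
print(f"{HASHES_PER_S / GUESSED_WATTS:.1f} H/s per watt (if the guess holds)")
```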
submitted by _trendspotter at
/goodcoin brought up an evaluation we can use to at least measure, rate, or review cryptocurrency based on how good they are.
Here at scamcoin I think we should come up with an evaluation of how BAD these guys are, just the opposite of his scale. I suggest you also use this guide as a reference when you are looking to INVEST in a cryptocurrency.
If you guys have suggestions/ideas feel free to bring them up. It is in no way objective, but at least we can try. Power to the people.
The problem I'm still trying to sort out is how to weight some features properly.
Anyway, the proposal
Scammy Scale Rating with simple Yes or No:
1) Not decentralized & not open source
2) Not open source at release
3) "Public offering" period where one has to pay to mine early
4) No fair launch announcement, unreasonable pre-mine and/or any insta-mine
5) Extremely low starting difficulty
6) No adoption among merchants/vendors/region/country
7) No trading at big exchanges (e.g. not on BTC-e)
8) Does not offer any new feature. Clonecoins or Litecoin forks
9) Bad developer support, lack of community or forum
10) Releasing bad software that impedes certain users from using/mining the coin. Or "faulty" start, "nodes" problem
11) Slow transaction or confirmation time
12) Short block time when mining
13) Inflated to billions of shares/coins or unreasonable supply
14) Pseudo mumbo jumbo descriptions of the cryptocurrency
15) Poor official website, hastily done
16) No anonymity support (no Zerocoin implementation)
17) Weak security, vulnerable to 51% attack (no POS or Proof of Stake)
18) Aggressive marketing campaign, hijacking forums and threads, tons of giveaways and faucets
19) Bloated future blockchain, e.g. a 2GB-60-100GB wallet you have to update even though your wallet has 2 ABC coins
20) Weak to ASIC, GPU, FPGA miners, BotNet, or does not give everyone at least a fair chance to mine
21) Reversible transaction -- Not sure if I should leave this out
22) Low mining profitability (vs mining Bitcoin)
23) Designed to be inflationary in nature
24) No interest rate of earning coin per year
25) No Multi-hashing algorithms
26) Recent release
27) Pump and dump announcements (Twitter, Facebook, subreddits or forums) associated with said cryptocurrency
28) "Rebooting" the coin, or "coin makeover" to make it fair mining/distribution again
29) Shady developers' history and/or shady major fund backers' history/intention
30) No trendsetter or no noise around the web. Use Google Trends as a way to monitor buzz. Example: http://www.google.com/trends/explore#q=quark%20coin%2C%20bitcoin&cmpt=q
31) Not much liquidity and being dependent. How dependent is that alt-coin on Bitcoin? Meaning, if Bitcoin goes down 20% in value, will it also dip 20% or more?
32) Unknown or lesser coins' volume/share percentage jumping through the roof while other known coins have modest or small gains. Example: SexCoin jumped 1200% in a single day while Bitcoin gained 3.8%
33) "Pay first, deliver product [Bitcoin 2.0] later" - essentially investors are convinced they have to pay first in USD/bitcoin because they are promised by the developers/company to have an "EARLY START" or "EARLY SHARES" or "PRE-SHARES" on the best next-generation "Bitcoin 2.0" coin that will make Bitcoin obsolete and will be the next biggest thing in the universe. Except for the part where "the biggest thing in the universe" is nowhere to be seen.
34) Interoperability
35) Purely Proof-of-Stake (POS) coins
We can then break this down further, give a weight to each feature, and start ranking the coins.
For example:
CrappyCoin: 10 Yes, 15 No. A simple solution is each Yes = 1 point and each No = 0 points. However, subjectively some features should be weighted more than others. I should point out that merchant adoption, trading/buying/selling at big exchanges, multi-hashing, and fair mining for everyone should be scored higher. It goes hand-in-hand with the coin's release date.
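The Yes = 1 / No = 0 scheme, with optional heavier weights on some criteria, could be sketched like this. The criterion keys and the weight values are hypothetical placeholders; the post does not fix any actual weights.

```python
from typing import Dict, Optional

def scam_score(answers: Dict[str, bool],
               weights: Optional[Dict[str, float]] = None) -> float:
    """Sum the weights of every criterion answered Yes.

    Criteria without an explicit weight count as 1 point, so with no
    weights at all this is just the number of Yes answers.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0)
               for name, is_yes in answers.items() if is_yes)

# CrappyCoin from the example: 10 Yes answers, 15 No answers.
crappycoin = {f"criterion_{i}": i <= 10 for i in range(1, 26)}

# HYPOTHETICAL weights: count "no merchant adoption" (6) and
# "no trading at big exchanges" (7) double.
example_weights = {"criterion_6": 2.0, "criterion_7": 2.0}

print(scam_score(crappycoin))                   # unweighted
print(scam_score(crappycoin, example_weights))  # weighted
```

A higher score means a scammier coin, so ranking is just sorting coins by this number.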
Feel free to discuss.
submitted by
An open-source FPGA miner has just been released, running at 80Mhps but at a cost of $585. The efficiency is described below, quoted from a post in the thread.
At 80 MHps, I will need at least 3 of these to achieve a single 5830 hashrate. That is $595.-x 3 = $1785.- at full price, vs. $190.- for the 5830.
Giving the 5830 is consuming $11.- a month in electricity, and assuming this board will consume zero electricity, it will take more than 145 months, or 12 years to recover the investment, always comparing to a 5830.
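The quoted payback estimate can be checked with a short calculation. All prices come from the quote; treating the FPGA's electricity cost as zero is the quote's own simplifying assumption, and `payback_months` is a made-up helper.

```python
def payback_months(fpga_cost: float, gpu_cost: float,
                   gpu_power_per_month: float,
                   fpga_power_per_month: float = 0.0) -> float:
    """Months until electricity savings cover the extra hardware cost."""
    extra_cost = fpga_cost - gpu_cost
    monthly_saving = gpu_power_per_month - fpga_power_per_month
    if monthly_saving <= 0:
        return float("inf")  # the FPGA never pays for itself
    return extra_cost / monthly_saving

# Three FPGA boards at $595 each vs. one $190 HD 5830 using $11/month of power.
months = payback_months(fpga_cost=3 * 595, gpu_cost=190, gpu_power_per_month=11)

print(f"{months:.0f} months, about {months / 12:.1f} years")
```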
BUT:
In this thread, someone mentioned he is doing 210Mhash/sec after some optimization but he will cease public posting of his development.
Apologies but no more development information will be posted. I've been offered a 25% share from someone that owns 2 FPGA clusters. If you haven't seen that type of hardware before think a 156 FPGAs per machine.
From those posts, what we can understand is that the factors that affect FPGAs now are high procurement cost, low running cost, and ease of scalability. What this means is that with the increasing total hash rate of the network (30Ghash/day at the last difficulty adjustment), the question becomes: when will the difficulty render GPUs inefficient relative to their running cost?
Remember to take into account FPGAs are usually run in clusters and even though it would not be beneficial to buy one outright, those who have access to FPGA are the first movers and eventual dominant forces of the mining market.
Of course, in the end, ASIC is where it's at. Anyone? =D
Edit: read more stuff, added info.
submitted by
I've been attempting to read up on the recent acquisition by Intel of Altera, and I understand what Altera does from the standpoint that they make FPGA chips versus Intel's Server/Workstation/Mobility chips. But I can barely explain the difference between a Xeon chip and a Core i7; wrapping my head around the tech of FPGA is really confusing.
I understand they're made for specific applications; Wikipedia, for example, lists them as used in guided missiles, switches, MRI machines, etc.
But why couldn't someone just code firmware for an intel chip? Are current-tech intel chips not able to use firmware? Is firmware where the "field programmable" part comes in?
I read where the CEO of intel made a comment talking about how this acquisition will enable the continuation of Moore's law. And I'm wondering how he's going to integrate or hybridize the current intel lineup with this brand new set of toys(patents) they just bought.
As always, any insight would be great. I learn best with examples, e.g. I understood the whole bitcoin mining GPU vs CPU situation really well. I just don't have a sense of anything that's using FPGA technology, and where its advantages/disadvantages lie. Any insight into that decision making process would be fantastic as well.
As always, thank you reddit and thank you /askscience
submitted by