Myujth

Question

I was inspired after this thread on Stack Overflow to create a random 8 character alphanumeric password generator.

Sadly, it is closed, so I cannot provide an answer there. Anyway, here is the code. Let me know if there are any bugs/bias going on.

using System.Collections.Generic;
using System.Security.Cryptography;

public class AlphanumericGen

 public virtual string Generate(byte length)
 
 ICollection<char> chars = new LinkedList<char>();
 var buffer = new byte[1];
 short counter = 0;
 using (var rng = new RNGCryptoServiceProvider())
 
 while (counter < length)
 
 rng.GetBytes(buffer);
 var nextChar = (char)buffer[0];

 if (nextChar >= '0' && nextChar <= '9')
 
 goto addChar;
 

 if (nextChar >= 'A' && nextChar <= 'Z')
 
 goto addChar;
 

 if (nextChar >= 'a' && nextChar <= 'z')
 
 goto addChar;
 

 continue;

 addChar:
 chars.Add(nextChar);
 ++counter;
 
 

 return string.Concat(chars);

Why aren't you using Char.IsDigit() and Char.IsLetter() instead of your own range tests? — Nov 8 at 3:57
@Barmar: In this case Char.IsDigit() and Char.IsLetter() are probably the wrong choice, since the aim is to create a password with characters from a known range. — 2 days ago
Of course it's slightly more complex because you may need to concatenate mulitple calls but its should be trivial. Given that you need 8 characters and time spent to generate 256 random bytes is negligible...you may even optimize your code for the best case (just use StringBuilder to store chars and put above code in a while (stringBuffer.length < length) (change Take() to actually take only required number of characters...) — 2 days ago
@Shelby - Of course they include Unicode characters (0, for example). In this case, we seem to be selecting from the Latin-1 set, given that buffer contains a byte. We could mask that down to ASCII using & 0x7F (and increase the number of random values we actually use, rather than discard). — 2 days ago
Moderator note: Not everyone has to mention how bad goto is. Many of the comments that were just moved to chat did not add anything constructive at all. You don't have to come up with new ways to say how bad goto is. — yesterday

Wai Ha Lee 1506 · Answer 1 · 2018-11-08 10:19:24Z

up vote
49
down vote

As a general rule of thumb, any time you feel the need to use goto, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.

In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.

I think too a StringBuilder would do a lot better for storing the characters than a LinkedList.

edited 2 days ago

Wai Ha Lee

1506

answered Nov 7 at 20:37

tinstaafl

6,240727

Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday

As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't support break n syntax construct
– Agnius Vasiliauskas
17 hours ago

1

@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
16 hours ago

@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
13 hours ago

@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
3 hours ago

add a comment |

score 20 · Answer 2 · 2018-11-07 21:03:08Z

Data structure

Why are you using a LinkedList<char>? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char array?

Actually, you may want to consider returning the result as a char rather than as a string. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.

Flow control

goto should be avoided, and its use is not justified here.

The loop is a bit clumsy, referring to counter all over the place.

char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())

 for (int counter = 0; counter < length; )
 
 return new string(chars);

Algorithm

The algorithm is rather inefficient:

It reads a byte at a time from the random number generator. .GetBytes() is thread safe, so each call should incur some synchronization overhead.

It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.

A smarter approach would be to fetch a bit more than $3 over 4 mathrmlength$ random bytes, base64-encode it, and discard just the resulting + and / characters.

Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast. — Nov 7 at 20:49
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than a char for this application. — Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-) — Nov 7 at 21:00
@t3chb0t That makes no sense. Both versions should call GetBytes() the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment out rng.GetBytes() and produce the same character every time, mine is much much faster. So, the running time is completely dominated by rng.GetBytes(), and my Base64 suggestion should make the biggest difference of all. — Nov 7 at 21:43
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong. — 2 days ago

score 12 · Answer 3 · 2018-11-09 00:03:11Z

Readability & GoTo

If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.

rng.GetBytes(buffer);
var nextChar = (char)buffer[0];

var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';

if (isNumber || isUppercaseLetter || isLowercaseLetter)

 chars.Add(nextChar);
 ++counter;

Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.

public bool IsAlphaNumericCharacter(char c)
 isUppercaseLetter

Then your loop becomes shorter and clearer.

while (counter < length)

 rng.GetBytes(buffer);
 var nextChar = (char)buffer[0];

 if (IsAlphaNumericCharacter(nextChar))
 
 chars.Add(nextChar);
 ++counter;

Bytes

Is there a particular reason you're using byte for length instead of int? Also, was there a reason for var buffer = new byte[1]; being an array instead of just var buffer = new byte();

If the answer to both of those questions is no, then you could have a character array of size length instead of a LinkedList<char>.

StringBuilder

StringBuilder could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).

You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course. — Nov 7 at 20:53
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha. — Nov 8 at 2:18
"|| is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note. — 2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad. — 2 days ago

Morgen 63139 · Answer 4 · 2018-11-08 03:26:28Z

Avoid premature optimizations, they're really hurting this code.

In the comments you defend the use of goto using the argument that it avoids branch prediction.

Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if statements than it would from a conventional if-else construct, lets look at the context of that optimization to see what it might get you.

As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.

Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.

That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.

The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.

score 5 · Answer 5 · 2018-11-09 18:19:26Z

In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.

It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:

using (var rng = new RNGCryptoServiceProvider())

 var buffer = new byte[length * 5];

 var password = new StringBuilder(length);
 while (password.Length < length)
 
 rng.GetBytes(buffer);
 password.Append(buffer
 .Select(x => (char)x)
 .Where(Char.IsLetterOrDigit)
 .Take(length - password.Length)
 .ToArray()
 );

Notes:

Decision to use length * 5 is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call to GetBytes() is not enough.

It performs pretty well (GetBytes() overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder is highly optimized to build strings).

It should be slightly more fast than base64 encoding (which otherwise is a clever idea).

It's really easy to optimize for the best case scenario adding some more code (for example when ToArray() returns the required string you can avoid StringBuilder() - and its initial allocation - all together).

It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).

Given that your input is a single byte then you can use Char.IsLetterOrDigit directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).

The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.

Are we done? There are some security concerns you should consider, see last section.

As you did imagine goto is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.

goto usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto and if/else differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto (because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.

I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto for most of your career (with few more chances for its little brother goto case). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.

Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:

while (chars.Count < length)

 rng.GetBytes(buffer);
 char nextChar = (char)buffer[0];

 if (Char.IsLetterOrDigit(nextChar))
 chars.Add(nextChar);

What did we do? We moved some logic into a separate function, this has two advantages:

It makes logic easier to follow because we can read the function name instead of understanding the code.

If frees us from the need of that goto because we can reduce those multiple ifs.

What if there wasn't Char.IsLetterOrDigit()? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto): counter is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.

Security considerations

You're correctly using RNGCryptoServiceProvider to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.

However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).

I'm pretty sure the outher while could be squeezed into a nice linq chain by using TakeUntil ;-] — 2 days ago
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced to rng.Generate()....Take().ToArray()! — 2 days ago
I'm not sure, you can make that cast from byte to char using Cast<char>() - you'll have to use Select(b => (char)b). If you move rng.GetBytes(buffer); inside the while-loop as the first thing you'll have a fresh set of bytes for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits — 2 days ago
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why? — 2 days ago
Btw, coincidentally the optimization with getting a larger buffer and callig GetBytes fewer time beats any other version presented here. — 2 days ago

score 4 · Answer 6 · 2018-11-09 15:13:06Z

There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.

I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.

Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator that is the base class of the RNGCryptoServiceProvider. It would enumerate only chars that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of chars.

public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)

 var buffer = new byte[bufferSize];
 while (true)
 
 rng.GetBytes(buffer);
 foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
 
 yield return c;

By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.

From here, it's now very easy to build your password generator. Create any random-number-generator, Take as many chars as you need and create the string.

public static string Generate(int length)

 using (var rng = new RNGCryptoServiceProvider())
 
 return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());

And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes (discovered by @Adriano?).

So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!

There is one thing about goto that hasn't been said yet... and you're using something very similar everyday, it's called switch:

public static void Main()

 var num = 1;

 switch (num)
 
 case 1:
 goto case 3;
 case 2:
 Console.WriteLine("Hi goto!");
 break;
 case 3:
 goto case 2;

Yes, you can jump within its scope from case to case in any order.

score 3 · Answer 7 · 2018-11-08 20:54:07Z

goto, part deux

I don't agree with the mass consciousness that goto is a code smell.

code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.

Why goto could be bad

We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.

Why goto is bad

goto murders the bedrock of all good programs - structured programming and modular construction. goto literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.

As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto is a code lobotomy. Many factors contribute to crumbling code but the more gotos the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."

So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful

A very readable, practical, nicely concise "using goto wisely" discussion

These are the best arguments against goto I've ever seen.
– t3chb0t
yesterday — yesterday

score 1 · Answer 8 · 2018-11-08 18:47:55Z

I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.

The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte based random selection be biased because 256 % 62 != 0. The problem can be solved by either adding two more chars resulting in 64 char:

 const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";

or reduce the valid chars to 32:

const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

There are pros and cons to both obviously, but both can be used i an efficient algorithm like:

public string Generate(int length)

 const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";

 using (var rng = new RNGCryptoServiceProvider())
 
 char result = new char[length];
 byte buffer = new byte[length];
 int range = chars.Length;
 rng.GetBytes(buffer);

 for (int i = 0; i < length; i++)
 
 result[i] = chars[buffer[i] % range];
 

 return new string(result);

This should be unbiased.

I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.

stephan mantler 112 · Answer 9 · 2018-11-10 13:40:52Z

Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.

An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.

As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.

score -3 · Answer 10 · 2018-11-09 10:37:57Z

up vote
-3
down vote

I would use a HashSet:

HashSet<byte> validChars = new HashSet<byte> 48, 49, 50 ; //add all valid

........

rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))

 chars.Add(char(buffer[0]));
 ++counter;

edited yesterday

answered 2 days ago

paparazzo

4,9911733

3

Mind elaborating on why you'd use a HashSet?
– Shelby115
2 days ago

@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago

2

@paparazzo: but this aspect of the original code is also O(1). Using a HashSet here is slightly slower due to extra overhead.
– Pieter Witvoet
yesterday

@PieterWitvoet Not buying. Good day.
– paparazzo
yesterday

2

Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
yesterday

|
show 1 more comment

Wai Ha Lee 1506 · Answer 11 · 2018-11-08 10:19:24Z

up vote
49
down vote

As a general rule of thumb, any time you feel the need to use goto, take a couple of aspirin and lay down until the feeling passes. They probably should have been deprecated decades ago.

In this particular case, using a string of allowed characters and randomly picking an index in that string would do away with your goto's.

I think too a StringBuilder would do a lot better for storing the characters than a LinkedList.

edited 2 days ago

Wai Ha Lee

1506

answered Nov 7 at 20:37

tinstaafl

6,240727

Comments are not for extended discussion; this conversation has been moved to chat.
– Jamal♦
yesterday

As a general rule of thumb, any time you feel the fear of using goto, drink several drops of valerian and lay down until the feeling passes - then start using goto's. Sometimes goto is an invaluable tool - for example to break N nested loops if some language doesn't support break n syntax construct
– Agnius Vasiliauskas
17 hours ago

1

@AgniusVasiliauskas if you feel the need to break several loops at once, your code might benefit from some factoring or a custom iterator
– NieDzejkob
16 hours ago

@NieDzejkob 1) In general - employer don't wants to pay salary for code refactoring - it gives no added value for a business, so you can't refactor just because "code looks ugly" for you. 2) There are a lot of legacy code which are very complex in structure and may be not-refactorable easily at all. Big TTA (Time-To-Accomplish) is a no go for business 3) "Refactoring" means different things to different people (no single cure). I for example- would refactor bunch of iterators into several goto's, if that would help to decrease code size, etc.
– Agnius Vasiliauskas
13 hours ago

@AgniusVasiliauskas factoring. Not refactoring.
– NieDzejkob
3 hours ago

add a comment |

score 20 · Answer 12 · 2018-11-07 21:03:08Z

Data structure

Why are you using a LinkedList<char>? Linked lists involve a lot of overhead for each node. You know exactly how long the result should be, so why not write to a char array?

Actually, you may want to consider returning the result as a char rather than as a string. A security-sensitive program may wish to wipe the contents of the secret by overwriting it with NUL bytes, rather than waiting for the garbage collector to clean up the string.

Flow control

goto should be avoided, and its use is not justified here.

The loop is a bit clumsy, referring to counter all over the place.

char chars = new char[length];
using (var rng = new RNGCryptoServiceProvider())

 for (int counter = 0; counter < length; )
 
 return new string(chars);

Algorithm

The algorithm is rather inefficient:

It reads a byte at a time from the random number generator. .GetBytes() is thread safe, so each call should incur some synchronization overhead.

It throws away most of the bytes it reads, keeping each byte with a probability of $62 over 256$.

A smarter approach would be to fetch a bit more than $3 over 4 mathrmlength$ random bytes, base64-encode it, and discard just the resulting + and / characters.

Linked lists involve a lot of overhead for each node. - do they? I many cases they are pretty damn fast. — Nov 7 at 20:49
@t3chb0t "Another disadvantage of linked lists is the extra storage needed for references, which often makes them impractical for lists of small data items such as characters or boolean values, because the storage overhead for the links may exceed by a factor of two or more the size of the data. … It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools." Definitely much much worse than a char for this application. — Nov 7 at 20:53
ok, in general this is true but for a small password generator not really critical - I wouldn't panic - unless OP is generating several millions of them - they are like linq, convinient and with convenience there is a price :-) — Nov 7 at 21:00
@t3chb0t That makes no sense. Both versions should call GetBytes() the same number of times (statistically speaking). When I measured, mine is marginally faster. Furthermore, if I comment out rng.GetBytes() and produce the same character every time, mine is much much faster. So, the running time is completely dominated by rng.GetBytes(), and my Base64 suggestion should make the biggest difference of all. — Nov 7 at 21:43
@t3chb0t When the results make no sense, then it is time to consider whether your test is wrong. — 2 days ago

score 12 · Answer 13 · 2018-11-09 00:03:11Z

Readability & GoTo

If you want to do the same action for 3 different if-statements make them one if-statement or make them a function. goto causes massive headaches when it's been months since you've touched the code and you come back to it not remembering a thing.

rng.GetBytes(buffer);
var nextChar = (char)buffer[0];

var isNumber = nextChar >= '0' && nextChar <= '9';
var isUppercaseLetter = nextChar >= 'A' && nextChar <= 'Z';
var isLowercaseLetter = nextChar >= 'a' && nextChar <= 'z';

if (isNumber || isUppercaseLetter || isLowercaseLetter)

 chars.Add(nextChar);
 ++counter;

Now with the conditions named even an idiot like me can come in here and see that you're checking if the character is alpha-numeric. If you wanted to make it even more obvious you could go one step further and make it a function.

public bool IsAlphaNumericCharacter(char c)
 isUppercaseLetter

Then your loop becomes shorter and clearer.

while (counter < length)

 rng.GetBytes(buffer);
 var nextChar = (char)buffer[0];

 if (IsAlphaNumericCharacter(nextChar))
 
 chars.Add(nextChar);
 ++counter;

Bytes

Is there a particular reason you're using byte for length instead of int? Also, was there a reason for var buffer = new byte[1]; being an array instead of just var buffer = new byte();

If the answer to both of those questions is no, then you could have a character array of size length instead of a LinkedList<char>.

StringBuilder

StringBuilder could be beneficial, but if you're at a low amount of characters (like your link suggests) then it's probably not going to make a difference performance-wise (always best to test yourself instead of listening to a stranger on the internet though :P).

You might be doing more work than the usual short-circuit equivalent, by storing comparison results in those (nicely named!) variables before testing. Unless your compiler optimises that away, of course. — Nov 7 at 20:53
I trust the compiler to do great things, hasn't caused me problems yet. That said, 99% of my performance issues are database bottlenecks (mostly do ASP.NET) so I've never really needed to care about my code's performance haha. — Nov 8 at 2:18
"|| is short-circuit" even though you use variables -- Are you sure about that? No, really, that's not me being sarcastic! I'm curious now if the same bytecode is generated. If it is, that'd be worth adding to your answer as a side note. — 2 days ago
uhh whoops, slight oversight on my part. Good catch, my bad. — 2 days ago

Morgen 63139 · Answer 14 · 2018-11-08 03:26:28Z

Avoid premature optimizations, they're really hurting this code.

In the comments you defend the use of goto using the argument that it avoids branch prediction.

Accepting the dubious premise that the compiler produces more optimized code from a series of sequential if statements than it would from a conventional if-else construct, lets look at the context of that optimization to see what it might get you.

As other answers have pointed out, the data structure is poorly chosen, as the cost of a node allocation is going to dwarf the cost of a missed branch prediction.

Assuming you swap that out, as pointed out in another answer, you'll also want to grab multiple bytes at a time from the RNG. That'll help, but even fully optimized that call will cost more than you could possibly save using a questionable branch prediction optimization.

That final bit converting the characters to a string? Yep, that's a traversal so you're looking at one conditional per node, plenty of chances for a bad branch prediction.

The bottom line is that, by focusing on something which makes you feel like you're "thinking for yourself" and "questioning authority" but doesn't actually help, you've given away readability for no measurable benefit.

score 5 · Answer 15 · 2018-11-09 18:19:26Z

In this case, if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version.

It might even be something like this (completely and absolutely untested) which - in average - should also perform better than original implementation:

using (var rng = new RNGCryptoServiceProvider())

 var buffer = new byte[length * 5];

 var password = new StringBuilder(length);
 while (password.Length < length)
 
 rng.GetBytes(buffer);
 password.Append(buffer
 .Select(x => (char)x)
 .Where(Char.IsLetterOrDigit)
 .Take(length - password.Length)
 .ToArray()
 );

Notes:

Decision to use length * 5 is arbitrary (you take only 58 values from the 256 possible). For small strings there are good chances that it doesn't work well, run this few thousands times and see how many times one call to GetBytes() is not enough.

It performs pretty well (GetBytes() overhead is - typically - payed only once) and you're not building lists of expensive structures (StringBuilder is highly optimized to build strings).

It should be slightly more fast than base64 encoding (which otherwise is a clever idea).

It's really easy to optimize for the best case scenario adding some more code (for example when ToArray() returns the required string you can avoid StringBuilder() - and its initial allocation - all together).

It scales well (parallel generation of N passwords or even generation of a huge password in parallel from M chunks).

Given that your input is a single byte then you can use Char.IsLetterOrDigit directly because all the other non Latin letters and digits are > 255 when UTF-16 encoded).

The very next (or previous?) step is to test this code. Can you reliably an extensively test a function which uses a random number generator? Hardly. t3schb0t covers this aspect in his answer.

Are we done? There are some security concerns you should consider, see last section.

As you did imagine goto is the controversial part of your code. Let me address this separately because, it seems, you're prone to use it.

goto usually hurts legibility and you're also (probably) preventing some optimizations compiler might do. Depending where in the compiler pipeline the optimization is applied (near the front-end where goto and if/else differ, or near the back-end - or even after JIT - where they already generated the exact same code) a compiler is probably not optimizing goto (because it's uncommon to be worth of the time and when you're using it then you know what you're doing). In short: if it's because of performance than 99/100 compiler can do a better job than you.

I'm not saying that there are not use-cases (there are) but hey're so rare that hat you may live without writing a goto for most of your career (with few more chances for its little brother goto case). Please do not generalize this to every language, in C - for example - it might be useful for clean-up code and error handling but in higher level languages we have other powerful tools.

Does it make the code shorter or more readable? In your specific case, even without changing anything else, the very same code might be rewritten as:

while (chars.Count < length)

 rng.GetBytes(buffer);
 char nextChar = (char)buffer[0];

 if (Char.IsLetterOrDigit(nextChar))
 chars.Add(nextChar);

What did we do? We moved some logic into a separate function, this has two advantages:

It makes logic easier to follow because we can read the function name instead of understanding the code.

If frees us from the need of that goto because we can reduce those multiple ifs.

What if there wasn't Char.IsLetterOrDigit()? You write your own and 90% you will get rid of those jumps. There are more things in the original code (beside goto): counter is useless, you can use an array instead of a linked list (size is known), it's preferable to avoid var for primitive types, you're hard-coding magic constants, and...you're reducing the entropy of the generated password.

Security considerations

You're correctly using RNGCryptoServiceProvider to generate the random numbers however you're not using a proper one-to-one encoding algorithm to transform those numbers into text. Simply dropping some values reduces the entropy. This is unavoidable if you want a fixed length alphanumeric string.

However if your goal is to generate a password with the desired amount of entropy then for this step you should use base62 (as elsewhere suggested base64 for performance reasons), a Diceware algorithm, or a perfect hash function (no, you can't use modulus for this because it, obviously, has multiple collisions).

I'm pretty sure the outher while could be squeezed into a nice linq chain by using TakeUntil ;-] — 2 days ago
definitely! If working with RNG OP has already (probably) - a generator function and everything might be reduced to rng.Generate()....Take().ToArray()! — 2 days ago
I'm not sure, you can make that cast from byte to char using Cast<char>() - you'll have to use Select(b => (char)b). If you move rng.GetBytes(buffer); inside the while-loop as the first thing you'll have a fresh set of bytes for every iteration to avoid (or at least minimize) the risk of having the buffer filled with only non-nums/digits — 2 days ago
if performance are not a MEASURED issue (read: you're not generating 1,000,000 passwords) then I'd definitely pick the most readable version. - this is the wisest sentence here - OP didn't say anthing about performance but all reviews are extremely focused on it and everyone tries to write the fastest code... why? — 2 days ago
Btw, coincidentally the optimization with getting a larger buffer and callig GetBytes fewer time beats any other version presented here. — 2 days ago

score 4 · Answer 16 · 2018-11-09 15:13:06Z

There is only one answer so far, that by @Adriano Repetti where clean code is mentioned. All others are havily focused on performane.

I'd like to add to this one answer and say that reusibility comes always first. I agree that if performance is not of great importance, you should primarily create an API that is easy to use, mantain and test.

Organizing your code from this point of view would mean to first create an extension for the RandomNumberGenerator that is the base class of the RNGCryptoServiceProvider. It would enumerate only chars that you are interested in. It doesn't have to know how many, that's what we have LINQ for so it creates an endless collection of chars.

public static IEnumerable<char> EnumerateLetterOrDigit(this RandomNumberGenerator rng, int bufferSize = 256)

 var buffer = new byte[bufferSize];
 while (true)
 
 rng.GetBytes(buffer);
 foreach (var c in buffer.Select(b => (char)b).Where(char.IsLetterOrDigit))
 
 yield return c;

By doing so you could use a different provider or even mock it for testing and always return the same values. The options are there.

From here, it's now very easy to build your password generator. Create any random-number-generator, Take as many chars as you need and create the string.

public static string Generate(int length)

 using (var rng = new RNGCryptoServiceProvider())
 
 return new string(rng.EnumerateLetterOrDigit().Take(length).ToArray());

And even though it havily uses LINQ, it's accidentally faster then your code because of the reduced number of calls to GetBytes (discovered by @Adriano?).

So saying that LINQ is always slow is once again proven wrong because most of the time there will be something else that isn't optimal. LINQ is just convenient and this is so great about it!

There is one thing about goto that hasn't been said yet... and you're using something very similar everyday, it's called switch:

public static void Main()

 var num = 1;

 switch (num)
 
 case 1:
 goto case 3;
 case 2:
 Console.WriteLine("Hi goto!");
 break;
 case 3:
 goto case 2;

Yes, you can jump within its scope from case to case in any order.

score 3 · Answer 17 · 2018-11-08 20:54:07Z

goto, part deux

I don't agree with the mass consciousness that goto is a code smell.

code smell means a possible problem, not that is is a problem. So yes, it is a code smell. I too have used goto, only once in the last decade - and it was with C#.

Why goto could be bad

We do not trust ourselves to use it wisely and well. This is borne out in history. There are millllions of COBOL LOC demonstrating that when used routinely goto tends to corrupt flow control, create inter and intra method coupling, and spawns "one off" code duplication; all to such a degree that one should assume the code will break when changed. In other words the code becomes unmaintainable.

Why goto is bad

goto murders the bedrock of all good programs - structured programming and modular construction. goto literally goes around them. Anything you can say about modern languages, coding principles, object orientation, etc. is fundamentally rooted in these two precepts. full stop.

As for history: the unstructured nature of COBOL, for example, gave rise to an "hard jump" coding style and mind set. It's fair to say the liberal use of goto is a code lobotomy. Many factors contribute to crumbling code but the more gotos the more schizophrenic the code becomes. I have worked on a system that gave a dreadful feeling of random execution when stepping through with the debugger. Many existing bugs could not be duplicated and almost no bug could be reproduced reliably. 99 percent of code tweaks induced bugs. I am not exaggerating at all here! Our in house customer head guy told me "we don't bring up problems anymore because we know they won't be fixed."

So wide spread was the problem that Edsger Dijkstra wrote a seminal paper GOTO statement considered harmful

A very readable, practical, nicely concise "using goto wisely" discussion

These are the best arguments against goto I've ever seen.
– t3chb0t
yesterday — yesterday

score 1 · Answer 18 · 2018-11-08 18:47:55Z

I can only add to the already said, that your algorithm has the potential to run for ever - theoretically speaking.

The most effective approach is to select chars from a list of valid chars. If using the numbers [0, 9] and the English letters [a, z] and [A, Z] we end up with 62 entries which will cause a byte based random selection be biased because 256 % 62 != 0. The problem can be solved by either adding two more chars resulting in 64 char:

 const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";

or reduce the valid chars to 32:

const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

There are pros and cons to both obviously, but both can be used i an efficient algorithm like:

public string Generate(int length)

 const string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-+";

 using (var rng = new RNGCryptoServiceProvider())
 
 char result = new char[length];
 byte buffer = new byte[length];
 int range = chars.Length;
 rng.GetBytes(buffer);

 for (int i = 0; i < length; i++)
 
 result[i] = chars[buffer[i] % range];
 

 return new string(result);

This should be unbiased.

I deliberately avoid the use of LINQ, because it is nearly always slower than more "native" code. This not to say, I in general avoid using LINQ, because I use it very often and gladly, but it is my experience that when it comes to performance, LINQ is often significantly slower than comparable, well written traditional code.

stephan mantler 112 · Answer 19 · 2018-11-10 13:40:52Z

Testing for character ranges presumes ordering and continuity of those code points. Assuming the Latin alphabet, the algorithm of course works as expected using ASCII and Unicode, but breaks down if another alphabet should be used or in the highly unlikely case of more obscure character sets such as EBCDIC.

An improved solution would be defining an array of allowable code points (as in "ABC...YZabc...xyz012..89", and then using the random number to index into this array. This resolves all above issues, and the function is trivially extendable to support additional characters as desired.

As regards the goto statements, I don't see any real fault here. It is as easy to read and logically identical to a similarly constructed switch() statement with range cases.

score -3 · Answer 20 · 2018-11-09 10:37:57Z

up vote
-3
down vote

I would use a HashSet:

HashSet<byte> validChars = new HashSet<byte> 48, 49, 50 ; //add all valid

........

rng.GetBytes(buffer);
if(validChars.Contains(buffer[0]))

 chars.Add(char(buffer[0]));
 ++counter;

edited yesterday

answered 2 days ago

paparazzo

4,9911733

3

Mind elaborating on why you'd use a HashSet?
– Shelby115
2 days ago

@Shelby115 Uh, O(1) speed?
– paparazzo
2 days ago

2

@paparazzo: but this aspect of the original code is also O(1). Using a HashSet here is slightly slower due to extra overhead.
– Pieter Witvoet
yesterday

@PieterWitvoet Not buying. Good day.
– paparazzo
yesterday

2

Feel free to measure for yourself if you don't believe me.
– Pieter Witvoet
yesterday

|
show 1 more comment

搜尋此網誌

Myujth

Random alphanumeric password generator with GOTOs

10 Answers
10

Data structure

Flow control

Algorithm

Security considerations

goto, part deux

Your Answer

Post as a guest

10 Answers
10

10 Answers
10

Data structure

Flow control

Algorithm

Data structure

Flow control

Algorithm

Data structure

Flow control

Algorithm

Data structure

Flow control

Algorithm

Security considerations

Security considerations

Security considerations

Security considerations

goto, part deux

goto, part deux

goto, part deux

goto, part deux

Post as a guest

Popular posts from this blog

Top Tejano songwriter Luis Silva dead of heart attack at 64

ReactJS Fetched API data displays live - need Data displayed static

政党

Random alphanumeric password generator with GOTOs

10 Answers 10

Data structure

Flow control

Algorithm

Security considerations

goto, part deux

Your Answer

Sign up or log in

Post as a guest

Post as a guest

10 Answers 10

10 Answers 10

Data structure

Flow control

Algorithm

Data structure

Flow control

Algorithm

Data structure

Flow control

Algorithm

Data structure

Flow control

Algorithm

Security considerations

Security considerations

Security considerations

Security considerations

goto, part deux

goto, part deux

goto, part deux

goto, part deux

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Top Tejano songwriter Luis Silva dead of heart attack at 64

ReactJS Fetched API data displays live - need Data displayed static

政党

10 Answers
10

10 Answers
10

10 Answers
10