Saturday, March 30, 2013
Saturday, March 23, 2013
Software and IT Law
Finally, the slides from Telerik Academy's "Software and IT Law" presentation are up! Only in Bulgarian, unfortunately, but I might translate them at some point if the authors agree to that.
Copyright 2013 (c) Georgi Dimitrov, Teodor Sarakchiev
Wednesday, March 13, 2013
Chlock
Another little donationware app. This time it's the chess clock simulator that I promised my sisters.
chlock.exe - A simple and convenient chess clock.
Readme: https://github.com/staafl/chlock/blob/master/README.md
Application: https://raw.github.com/staafl/chlock/master/build/chlock-latest.zip
Source: https://github.com/staafl/chlock
Sunday, March 10, 2013
Making Sense of Endianness
Machine endianness is one of those tricky concepts that are easy enough to define, but hard to really understand and conceptualize - like numeral systems, polymorphism, pointers and recursion, to give a few more examples. In this brief meditation I hope to share some insights on the topic that I've managed to reach through learning from others and long reflection. Considering that even experienced IT professionals occasionally make false statements regarding what endianness is and how it affects - or doesn't - a particular situation or software component, this overview might help someone reach a comfortable level of understanding more quickly and with less pain and confusion along the way.
Basics: The Definition of Endianness
Wikipedia defines endianness as "the ordering of components (such as bytes) of a data item (such as an integer), as stored in memory or sent on a serial connection." When a software professional talks about endianness, he is referring to some structured binary data which may be interpreted differently depending on how the individual bytes are ordered, just as date notation has a different meaning depending on whether you expect 01-12 to be the 1st of December or the 12th of January. This is the kind of thing most people don't normally think about until something unexpectedly goes wrong - in life and in software.
Whenever an Intel processor reads a 4-byte unsigned integer, it expects the first byte to represent the number modulo 2^8, the second byte to represent the number divided by 2^8, modulo 2^8, and so on (this convention is called little-endian). For example:
A0 FF 7D 14 = A0(16) + 256(10) * FF(16) + 65536(10) * 7D(16) + 16777216(10) * 14(16) = 343,801,760(10)
A big-endian processor expects the opposite: there, the same byte sequence would signify the number 2,701,098,260. Similarly, if the same bytes were read as two adjacent 2-byte unsigned integers, the result would be A0 + 256*FF, 7D + 256*14 on a Pentium, and 256*A0 + FF, 256*7D + 14 on a big-endian machine such as a PlayStation 3 or an Itanium running in big-endian mode. The specific endiannesses of the various architectures and the reasons for the different conventions are well described elsewhere, so we will not dwell on them here. Note that in the example above, if the data were read as 4 individual bytes, the result would be the same regardless of the CPU.
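To make the arithmetic above concrete, here's a small C# sketch that decodes the same four bytes under both conventions using explicit shifts, so the answer doesn't depend on the CPU running it. ByteOrder and its methods are hypothetical helpers made up for this illustration, not a library API.

```csharp
// Hypothetical helper class for illustration: decodes unsigned integers from
// a byte array with an explicitly chosen byte order, independent of the CPU.
static class ByteOrder
{
    // Little-endian: the first byte is the least significant one.
    public static uint ReadUInt32LE(byte[] b, int offset) =>
        (uint)b[offset]
        | ((uint)b[offset + 1] << 8)
        | ((uint)b[offset + 2] << 16)
        | ((uint)b[offset + 3] << 24);

    // Big-endian: the first byte is the most significant one.
    public static uint ReadUInt32BE(byte[] b, int offset) =>
        ((uint)b[offset] << 24)
        | ((uint)b[offset + 1] << 16)
        | ((uint)b[offset + 2] << 8)
        | (uint)b[offset + 3];

    // The same two conventions for 2-byte values.
    public static ushort ReadUInt16LE(byte[] b, int offset) =>
        (ushort)(b[offset] | (b[offset + 1] << 8));

    public static ushort ReadUInt16BE(byte[] b, int offset) =>
        (ushort)((b[offset] << 8) | b[offset + 1]);
}
```

With data = { 0xA0, 0xFF, 0x7D, 0x14 }, ReadUInt32LE(data, 0) gives 343,801,760 and ReadUInt32BE(data, 0) gives 2,701,098,260; reading two 2-byte values at offsets 0 and 2 gives 65,440 and 5,245 little-endian, or 41,215 and 32,020 big-endian.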
Standard networking protocols use big-endian notation ("network byte order") for integer header fields such as port numbers, while the payload itself has no predefined endianness other than the one assumed by the application. Various data formats and devices have conventions about how numeric data (i.e., all data) is to be stored, which occasionally leads to painful complications for unsuspecting software engineers.
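For instance, here's roughly how a port number could be written in network byte order in C#. This is a sketch assuming .NET Core 2.1 or later (for System.Buffers.Binary.BinaryPrimitives); NetworkOrderDemo and its method names are invented for the example. The second method shows the older IPAddress.HostToNetworkOrder route, which swaps the bytes only on little-endian hosts.

```csharp
using System;
using System.Buffers.Binary;
using System.Net;

// Hypothetical demo class for illustration only.
static class NetworkOrderDemo
{
    // Write the port big-endian ("network byte order"), whatever the host CPU.
    public static byte[] EncodePort(ushort port)
    {
        var buffer = new byte[2];
        BinaryPrimitives.WriteUInt16BigEndian(buffer, port);
        return buffer;
    }

    // Older equivalent: convert the value to network order (a no-op on
    // big-endian hosts), then let BitConverter write it in host order.
    public static byte[] EncodePortLegacy(ushort port)
    {
        short networkOrder = IPAddress.HostToNetworkOrder((short)port);
        return BitConverter.GetBytes(networkOrder);
    }
}
```

Either method turns port 8080 (0x1F90) into the bytes 1F 90 on any host.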
Insight: The Meaning of Endianness
In its essence, "endianness" is the way in which the CPU interprets binary sequences in memory. The processor has no concept of "number", unlike humans, who associate numbers with intuition and experience. However, if we design a bijective correspondence between numbers (which are interesting to us) and binary strings (which the CPU can handle), we can use the processor to manipulate the binary strings according to its instruction set, and then decode the results back into abstract numbers, or have IO devices help us by representing the bytes in some other fashion, e.g., as characters on a screen.
In this sense, the endianness of a CPU is the way it handles strings of bytes with respect to the basic arithmetic operations, as well as some basic primitives such as comparison, jumping and memory addressing. Everything else (looping and branching, array indexing, functions, input and output) is built on top of that, and operates correctly as long as the CPU's operations honor the bijection.
Basically, endianness is an encoding between (abstract) numbers and (concrete) binary strings in the same way that UCS is an encoding between (abstract) characters and (abstract) numbers and UTF-8 is an encoding between (abstract) numbers and (concrete) sequences of bytes.
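The two layers of the text analogy can be seen side by side in C#. In this small sketch (EncodingLayers is a made-up name), the first method maps a character to its abstract number, the Unicode code point, and the second maps the string to concrete bytes using UTF-8:

```csharp
using System.Text;

// Hypothetical illustration class.
static class EncodingLayers
{
    // Layer 1: character -> abstract number (its Unicode code point).
    public static int CodePoint(string s) => char.ConvertToUtf32(s, 0);

    // Layer 2: characters -> concrete bytes, here via UTF-8 (no byte order mark).
    public static byte[] Utf8Bytes(string s) => Encoding.UTF8.GetBytes(s);
}
```

For "é" (U+00E9), CodePoint returns 233, while Utf8Bytes returns the two bytes C3 A9.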
This is why, just as a piece of text is meaningless unless we know its encoding and language, binary data is meaningless without knowing the endianness with which it was encoded: even if we know that the bytes 00 00 00 02 represent a 4-byte signed integer, we can't tell whether it's the number 2 (big-endian) or 33,554,432 (little-endian).
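The ambiguity is easy to reproduce. This sketch (assuming BinaryPrimitives from .NET Core 2.1 or later, with a made-up AmbiguousBytes class) decodes the bytes 00 00 00 02 under both assumptions:

```csharp
using System.Buffers.Binary;

// Hypothetical illustration class.
static class AmbiguousBytes
{
    // Four bytes with no endianness information attached.
    public static readonly byte[] Bytes = { 0x00, 0x00, 0x00, 0x02 };

    // Most significant byte first.
    public static int AsBigEndian() => BinaryPrimitives.ReadInt32BigEndian(Bytes);

    // Least significant byte first.
    public static int AsLittleEndian() => BinaryPrimitives.ReadInt32LittleEndian(Bytes);
}
```

AsBigEndian() returns 2 and AsLittleEndian() returns 33,554,432; nothing in the bytes themselves tells you which one was meant.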
Practical issues: When does endianness matter?
Short answer: when doing IO. In the case of user input/output, it's not much of a concern, since your OS and device drivers will transparently take care of the matter and make sure you get the correct byte strings in memory. Regardless of the technology stack you're working with, a correct program operating on properly represented input data will produce correct, properly represented output. No one except the OS needs to know how the CPU does arithmetic on binary strings: once you have a properly constructed data object, the logic will work.
However, when dealing with files and networks, if the data is in an endianness-sensitive format and originated on another computer with a different or unknown endianness, you need to take steps to ensure it's properly interpreted at your end. Just as, for security, you need to watch your input and sanitize possibly tainted data, for correctness you need to keep track of which data may have been encoded on a device with a different endianness, and convert it accordingly before feeding it into the business logic of your application. You should also be aware of when some supposedly helpful intermediate component does the conversion for you, so that you don't corrupt your data by converting it a second time.
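One way to follow that advice in .NET, sketched under the assumption of a file format that stores 4-byte big-endian integers: BinaryReader.ReadInt32 always decodes little-endian regardless of the CPU, so the hypothetical helper below reads the raw bytes and converts them exactly once, at the input boundary (BinaryPrimitives requires .NET Core 2.1 or later).

```csharp
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper for a format assumed to store big-endian Int32 values.
static class BigEndianInput
{
    // Reads exactly 4 bytes and decodes them as a big-endian Int32,
    // whatever the byte order of the CPU running this code.
    public static int ReadBigEndianInt32(Stream stream)
    {
        var buffer = new byte[4];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0)
                throw new EndOfStreamException();
            read += n;
        }
        // Convert once, here at the input boundary, and nowhere else.
        return BinaryPrimitives.ReadInt32BigEndian(buffer);
    }
}
```

For the bytes 00 00 01 00, this returns 256, whereas BinaryReader.ReadInt32 on the same bytes would return 65,536.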
The other case in which you need to keep endianness in mind is when, for some reason, you're reading raw memory. Here's a tiny C# program for testing your processor's byte order:
using System;

class Program
{
    // Compile with /unsafe (AllowUnsafeBlocks) because of the pointer cast.
    // Alternatively, skip the pointers and check BitConverter.IsLittleEndian.
    unsafe static void Main()
    {
        int num = 1;
        // On a little-endian CPU the lowest-addressed byte of 1 is 0x01;
        // on a big-endian CPU it is 0x00.
        bool le = *(byte*)&num == 1;
        Console.WriteLine("{0} endian", le ? "LITTLE" : "BIG");
    }
}
I plan to give some practical examples of endianness-related issues a bit later.
As a footnote, UTF-16 "Big Endian" and UTF-16 "Little Endian" have nothing to do with CPU endianness: they're two different number-to-bytestring encodings. The fact that the byte strings are related in such a way that a wrongly assumed encoding and a different CPU endianness cancel each other out is, in this sense, almost a coincidence, so don't let it confuse you.
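To see that UTF-16LE and UTF-16BE are fixed byte-level encodings rather than a property of the machine, compare .NET's Encoding.Unicode (UTF-16LE) and Encoding.BigEndianUnicode (UTF-16BE); Utf16Demo is just a wrapper made up for this example:

```csharp
using System.Text;

// Hypothetical wrapper for illustration.
static class Utf16Demo
{
    // Both encodings map 'A' to the same 16-bit number, 0x0041; they only
    // differ in which byte of that unit is written first.
    public static byte[] Le(string s) => Encoding.Unicode.GetBytes(s);
    public static byte[] Be(string s) => Encoding.BigEndianUnicode.GetBytes(s);
}
```

For "A" (U+0041), Le returns 41 00 and Be returns 00 41 on every CPU. GetBytes does not emit a byte order mark; that is what Encoding.GetPreamble is for.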
Edit history:
2013-03-14 - I'd wrongly stated that AMD processors tend to be big-endian. I now know that all x86/x64 CPUs are little-endian, as are most modern architectures, in addition to bi-endian ones (sic) such as SPARC, ARM and PowerPC. Also corrected a misleading statement about data endianness in network transmissions.
Labels: encoding, endianness, hardware, low level, meditations
Thursday, March 7, 2013
Mind Map - Input and Output in .NET
I love it when I get assigned to do something cool and meaningful. Here is the Mind Map I had to chart for Telerik Academy's "Teamwork and Knowledge Sharing" course, on the topic of "Input and Output in .NET" (click on the images to see them full-sized).
Doing them was fun! I'm so glad I got introduced to this concept and had the motivation to practice it. Not only is mind mapping a fun exercise and great for getting a sense of the structure of a topic, it also seems like a really good way to document an API.
I have so many ideas for maps right now, perhaps I'll start with Ruby's standard library so I won't have to open the documentation so often.
PS: Apparently, Blogspot has decided there is a worldwide shortage of pixels and downscaled the detailed mind map. Here it is in its original HD glory.
Overview, with images
Detailed, without images
Tuesday, March 5, 2013
Blink!
blink.exe - An extremely simple and lightweight utility that protects your eyes and helps you avoid Computer Vision Syndrome - by reminding you to blink more often!
Readme: https://github.com/staafl/blink/blob/master/README.md
Application: https://raw.github.com/staafl/blink/master/build/blink-latest.zip
Source: https://github.com/staafl/blink
Sched!
sched.exe - an extremely simple and lightweight task scheduling utility
Readme: https://github.com/staafl/sched/blob/master/README.md
Application: https://raw.github.com/staafl/sched/master/build/sched-latest.zip
Source: https://github.com/staafl/sched
Wake Up!
My latest exciting donationware project is here: https://github.com/staafl/wakeup/blob/master/README.md
No more painful morning snooze marathons! Wake up when you want to!
Application: https://raw.github.com/staafl/wakeup/master/build/wakeup-latest.zip
Source: https://github.com/staafl/wakeup
Saturday, March 2, 2013
C# Riddles - Part II - Common Mistakes
This time it's a batch of nasty gotchas, some common, some obscure, that the C# compiler may or may not catch for you. The intended purpose of each snippet is described in its comment; try to figure out what goes wrong.
// replace A with B
var mystr = Console.ReadLine();
mystr.Replace("A", "B");
Console.WriteLine(mystr);
// has more than 1 second elapsed since last time?
var do_it_again = (DateTime.Now - last_time).Seconds > 1;
// replace sequences of vowels with "VOWELS"
words = Regex.Replace(words, "\b[aoeiu]+\b", "VOWELS");
// print out a matrix
for(var ii = 0; ii < rows; ++ii)
{
    for(var jj = 0; jj < columns; ++ii)
    {
        Console.Write("{0} ", matrix[ii,jj]);
    }
    Console.WriteLine();
}
// rethrow exception
var can_recover = false;
try
{
    DoStuff(ref can_recover);
}
catch(ApplicationException ex)
{
    if(!can_recover)
    {
        throw ex;
    }
}
// spinwait for 100 ticks
long now = DateTime.Now.Ticks;
while((DateTime.Now.Ticks - now).Ticks < 100);
Retry();
// get a datestamp
DateTime.Today.ToString("dd/mm/yyyy")
// add items to work queue for processing
var list = GetDbValues();
lock(work_queue)
{
    for(var ii = 0; ii < list.Count; ++ii)
        work_queue.Add(() => Process(list[ii]));
}
// run an action after 60 seconds
var timer = System.Timers.Timer(60.0);
timer.Elapsed += (_sender, _args) => Console.WriteLine("60 seconds have elapsed!");
// let's try again
var timer = System.Windows.Forms.Timer();
timer.Interval = my_interval;
timer.Tick += (_sender, _args) => Console.WriteLine("Time's up!");
// remove unwanted elements from a collection
foreach(var elem in list)
{
    if(ComplicatedLogic(elem))
    {
        list.Remove(elem);
    }
}
// hash the contents of a dictionary
public long HashDict(Dictionary dict)
{
    long ret = 0;
    unchecked
    {
        foreach(var kvp in dict)
        {
            ret += kvp.Key.GetHashCode() * kvp.Value;
            ret << 1;
        }
    }
    return ret;
}
// raise event
public event EventHandler StateChanged;
void OnStateChanged()
{
    StateChanged(this, new EventArgs());
}
// did it work?
bool worked = DidItWork();
if(worked = true)
{
    MessageBox.Show("Success!")
}
else
{
    MessageBox.Show("Oops!")
}
// let's make a simple struct
public struct MyStruct
{
    int field1;
    bool field2;
    public override bool Equals(MyStruct struct2)
    {
        return field1 == struct2.field1 && field2 == struct2.field2;
    }
}
// ok, let's try with a class
public class MyClass
{
    public int field1;
    public bool field2;
    public int GetHashCode()
    {
        return field1;
    }
    public override bool Equals(MyClass obj2)
    {
        return field1 == obj2.field1 && field2 == obj2.field2;
    }
}
// last try
public class MyClass
{
    public readonly int field1;
    public readonly bool field2;
    public int GetHashCode()
    {
        return field1;
    }
    public override bool Equals(MyClass obj2)
    {
        return field1 == obj2.field1 && field2 == obj2.field2;
    }
}
// order by item1, then by item2
var ordered = from x in enumerable
              orderby x.Item1
              orderby x.Item2
              select x;
// simple assignment
class Class
{
    public Struct Struct { get; private set; }
}
struct Struct
{
    public int field;
}
...
var obj = new Class();
obj.Struct.field = 5;
// find intersection between two users
int customerID = 12345;
var productsForFirstCustomer = from o in Orders
                               where o.CustomerID = customerID
                               select o.ProductID;
// change customer ID and compose another query...
customerID = 56789;
var productsForSecondCustomer = from o in Orders
                                where o.CustomerID = customerID
                                select o.ProductID;
if( productsForFirstCustomer.Any( productsForSecondCustomer ) )
{
    ...
}
// start a new worker thread after some time
void MyThreadMethod(object data)
{
    var start_after = 100.0 + (double)data;
}
...
var thread_number = 10;
Thread.Start(MyThreadMethod, thread_number);
// set sender's text to default
(sender as TextBox).Text = default_text;
// simple time formatting
Console.WriteLine("{0:уууу-ММ-dd}", DateTime.Today);
// using a default format string
public void Print(object obj)
{
    const string default_format = "<{0}>";
    Print(default_format, obj);
}
public void Print(string str, params object[] objs)
{
    Console.WriteLine(str, objs);
}
...
Print("hello");
// read file 10 lines at a time
public IEnumerator<string> ReadLines(string file)
{
    using(var sr = new StringReader(file))
        while(!sr.EndOfStream)
            yield return sr.ReadLine();
}
public bool Print10(IEnumerator enm)
{
    for(var ii = 0; ii < 10; ++ii)
    {
        if(!enm.MoveNext())
            return false;
        Console.WriteLine(enm.Current);
    }
    return true;
}
...
var enm = ReadLines("input.txt");
while(true)
{
    if(!Print10(enm))
        break;
    if(Console.Read() != 'y' && Console.Read() != 'Y')
        break;
}
// do something if conditions are met
if(SomethingIsTheCase() && IFeelLikeIt() && TheTimeIsRight(() => { return DateTime.Now.AddDays(10); }));
{
    DoSomething();
}
// parse a datestamp
var date = dt.ToString("dd/MM/yyyy");
...
var day = int.Parse(date.Split('/')[0]);
// apply rotation transform
// mygraphics is http://msdn.microsoft.com/en-us/library/system.drawing.graphics.aspx
mygraphics.RotateTransform(-Math.Atan(tan));
// modify a list
struct Point { public int x; public int y; }
List<Point> mypoints = ...;
mypoints[i].x = 10;
// write files then delete them
var files = Enumerable.Range(0, 5)
    .Select(i => Path.GetTempFileName());
foreach (var file in files)
    File.WriteAllText(file, "HELLO WORLD!");
/* ... many lines of code later ... */
foreach (var file in files)
    File.Delete(file);
// tracing logic
private void DumpError(Exception exception, Stack context)
{
    if (context.Any())
    {
        Trace.WriteLine(context.Pop());
        Trace.Indent();
        this.DumpError(exception, context);
        Trace.Unindent();
    }
    else
    {
        Trace.WriteLine(exception.Message);
    }
}
// run an action after 60 seconds
private void Schedule(Action action)
{
    new System.Threading.Timer(state => action, null, 60000, Timeout.Infinite);
}
// a serializable class
[Serializable]
class Hello
{
    readonly object accountsLock = new object();
}
// open a file
using(var sw = new StreamWriter("resources\temp\textfile.txt"))
{
    ...
}
DateTime.ToString("dd/mm/yyyy")
// unit test
static IEnumerable<char> CapitalLetters(string input)
{
    if (input == null)
    {
        throw new ArgumentNullException(input);
    }
    foreach (char c in input)
    {
        yield return char.ToUpper(c);
    }
}
// Test that null input is handled correctly
CapitalLetters(null).ExpectThrows();
// just a property
private int myVar;
public int MyVar
{
    get { return MyVar; }
}
// get a path
var prefix = "C:\\MyFolder\\MySubFolder";
var suffix = "\\log\\";
var path = Path.Combine(prefix, suffix);
// vips = gold clients + some silver ones
Collection silver_clients = GetClientsVips(ClientStatus.Silver);
Collection gold_clients = GetClientsVips(ClientStatus.Silver)
Collection vips = new Collection (gold_clients);
foreach(var client in silver_clients)
{
    if(ClientMatchesVipRequirements(client))
    {
        vips.Add(client);
    }
}
Sources:
- my own experience
- http://blog.roboblob.com/tag/dotnetgotcha/
- http://stackoverflow.com/questions/3703681/common-linq-standard-query-operator-mistakes-mis-steps
- http://der-waldgeist.blogspot.com/2011/04/common-linq-mistakes.html
- http://stackoverflow.com/questions/241134/what-is-the-worst-gotcha-in-c-sharp-or-net