Information Technology Research Update
by Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business (AUEB)
http://www.dmst.aueb.gr/dds/

Volume 3, Issue 1, July 27th, 2003

A free periodic newsletter providing summaries, analyses, insights, and commentaries on information technology research, brought to you by the Information Systems Technology laboratory:
http://istlab.dmst.aueb.gr

In this issue:
- Automatic Program Verification
- Information Technology and Economic Performance
- Peer-to-Peer Networks Used for Tracking Zebras
- ACM Transactions on Embedded Computing Systems
- Code Reading

Automatic Program Verification
------------------------------
(report from Microsoft's Faculty Summit 2003)

What is the state of the art in automatic program verification? Two presentations given at the 2003 Microsoft Faculty Summit, held in Cambridge, England, in the middle of July, provide some interesting insights.

Sir Tony Hoare, in an inspiring keynote presentation, proposed the goal of a verifying compiler as a Grand Challenge for computing research. According to Hoare's definition, a verifying compiler will use automated mathematical and logical reasoning to check the correctness of the programs it compiles. The criterion of correctness will be specified by types, assertions, and other redundant annotations associated with the program's code. The capabilities of the verifying compiler can be evaluated by applying it to legacy open-source code. The challenge was abandoned in the 1970s, but current hardware speed and capacity, our experience with safety-critical software, the open-source movement, advances in global program analysis and theorem-proving technology, and a unifying theory of concurrency and object orientation now make it feasible. The challenge is of a magnitude similar to that of other Grand Challenges, such as putting a man on the moon, curing cancer, or mapping the human genome, and easily satisfies the attributes a Grand Challenge should have: fundamental, astonishing, testable, inspiring, understandable, useful, historical, international, revolutionary, research-directed, challenging, incremental, co-operative, competitive, effective, and risky.

In a different talk, James Larus described tools developed and used internally at Microsoft to improve its software development. Two widely employed heuristic tools are PREfix and PREfast. PREfix performs a path-by-path interprocedural analysis and is computationally expensive, taking four weeks to run on the Windows kernel. PREfast works by running user-supplied plugins while traversing the compiler's abstract syntax tree; it runs on desktop machines and is easily customized. The two tools are widely deployed within Microsoft: 1/6 of the bugs fixed in Windows Server 2003 were apparently found by them. (Note that this statement is quite different from "the two tools uncovered 1/6 of the bugs in Windows Server 2003".)

PREfix locates errors by traversing the call graph through a limited set of paths, taking great care to be conservative and to avoid false positives. Some of the errors it can detect include:
- NULL pointer use
- Memory allocation errors
- Use of uninitialized values
- Library usage problems
- Invalid resource states
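To make these error classes concrete, here is a small contrived C program of my own devising (not an example taken from the tools themselves) that contains three of them; a PREfix-style path-by-path analysis can flag each defect together with the execution path that triggers it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char *dup_name(const char *name)
    {
        /* Memory allocation error: malloc's result is never checked,
           so strcpy can write through a NULL pointer when memory
           is exhausted. */
        char *copy = malloc(strlen(name) + 1);
        strcpy(copy, name);
        return copy;
    }

    int main(void)
    {
        int count;                  /* never initialized */
        char *name = NULL;

        if (count > 0)              /* use of an uninitialized value */
            name = dup_name("zebra");
        printf("%c\n", name[0]);    /* NULL pointer use on the path where
                                       the branch above is not taken */
        free(name);
        return 0;
    }

Note that the NULL dereference is a defect of one specific path (the one on which dup_name() is never called); reporting it per path, rather than as a blanket warning about the function, is what keeps the false-positive rate down.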
PREfast works at a local level, examining the compiler's abstract syntax tree for error idioms. It is based on plugins written by individual developers; if I remember correctly, hundreds of plugins are in use within Microsoft. PREfast also uses an annotated version of the Windows APIs to detect buffer overflow errors (most WIN32 APIs receive a memory buffer and its size as separate arguments; I assume the annotations bind the two).

PREfix and PREfast were termed first-generation tools, in contrast to the second-generation tools, which are sound and based on declarative specifications and models.

SLAM performs software model checking. It receives as input the C source code and the API specification rules mentioned above and, through a systematic exploration of the program's state space, can reason about whether a feasible path can lead to a violation of the API specification. It works by converting the C source into a boolean program (removing all non-boolean elements) that can then be checked for specific paths. Typically the model checker is assisted by annotating the program with additional predicates. When applied to a set of 100 device drivers, SLAM uncovered more than 60 bugs.
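The following minimal sketch, written in plain C with hypothetical instrumented wrappers (SLAM's rules are actually written in a separate specification language, and the instrumentation is automatic), shows the kind of API usage rule involved and the single boolean of state needed to check it:

    #include <assert.h>

    /* The abstract rule state: the one boolean that survives in the
       boolean program after all non-boolean data is removed. */
    static int locked = 0;

    /* Hypothetical wrappers encoding the rule "acquire and release
       must strictly alternate". */
    static void acquire_lock(void)
    {
        assert(!locked);            /* violation: double acquire */
        locked = 1;
    }

    static void release_lock(void)
    {
        assert(locked);             /* violation: release without acquire */
        locked = 0;
    }

    int main(void)
    {
        int contended = 1;

        acquire_lock();
        if (contended)
            acquire_lock();         /* a feasible path violating the rule;
                                       a model checker reports its trace */
        release_lock();
        return 0;
    }

Everything except the predicates needed to decide the property (here, locked and the branch condition) is discarded by the abstraction, which is what makes exhaustive exploration of the state space tractable.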
ESP performs scalable error detection through a path-sensitive analysis. The goal is to locate program states that signify errors. The problem, of course, is taming the explosion of the state space: at each branch the tracked program state can potentially double. The combinatorial explosion is managed through three observations:
a) knowing the previous state can often determine a branch's direction,
b) states can be merged at control-flow join points, and
c) little of the code affects the tracked state.
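Again, a contrived C fragment of my own (not an example from the ESP work) shows how these observations play out when the tracked property is "every file opened is eventually closed":

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        FILE *log = NULL;
        int debug = argc > 1;
        long sum = 0;
        int i;

        if (debug) {                /* the tracked state splits here:
                                       file open vs. no file */
            log = fopen("run.log", "w");
            if (log == NULL)
                return 1;
        }

        for (i = 0; i < 1000; i++)  /* observation c): this code never
                                       touches the tracked file state,
                                       so it adds no new states */
            sum += i;

        if (debug) {                /* observation a): this branch is
                                       correlated with the first one, so
                                       the tracked state determines its
                                       direction */
            fprintf(log, "sum = %ld\n", sum);
            fclose(log);            /* the file is closed on exactly the
                                       paths on which it was opened */
        }
        return 0;
    }

A path-insensitive analysis would merge the two states after the first branch and might then report a missing fclose() or a NULL log passed to fprintf(); ESP keeps the states apart only where the distinction matters and merges them everywhere else.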
Finally, Fugue checks the API-related behavior of programs by performing a static analysis on annotated versions of the program. It works on Microsoft's managed languages that support the specification of attributes (C#, VB, Managed C++, VJ#). Attributes are arbitrary programmer-specified declarations that can be added to the language's objects and are tracked throughout the compilation cycle. Custom attributes are added to the source to specify, for example, the state of a resource or the "nullness" of a value. Declarative API usage rules are then applied to verify the implementation against the contract.

More details are available at:
http://research.microsoft.com/spt
http://research.microsoft.com/pprc

Information Technology and Economic Performance
-----------------------------------------------

Since the early 1980s, researchers of information systems and economics have been debating whether the information technology revolution is paying off in higher productivity. The first studies found no correlation between IT investment and productivity at the level of firms, industries, or the economy. This finding was aptly termed the productivity paradox, and it has been an object of intense study ever since. In a recent issue of ACM Computing Surveys (35(1):1-28, March 2003), Dedrick, Gurbaxani, and Kraemer critically review the empirical evidence of more than 50 articles on computers and productivity, based on a solid framework for evaluating the research. The review concludes that the productivity paradox has been effectively refuted at both the firm and the country level.

Peer-to-Peer Networks Used for Tracking Zebras
----------------------------------------------

If you think that peer-to-peer networks are used only for swapping CD tracks, you are wrong. In a paper presented at the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), Philo Juang and his colleagues discuss how the movement of zebras on the open lands of Kenya can be tracked by fitting each animal with a collar carrying a GPS receiver, two transmitters, and memory for storing the collected data. The zebras exchange their positional data with one another when they meet, for example near a watering hole, using the low-power transmitter. Researchers can thus obtain the data of multiple zebras by locating a single one and querying it through the high-powered transmitter. The system's design goal is to allow the zebras to move without any human intervention for periods of at least one year. (Energy-Efficient Computing for Wildlife Tracking: Design Tradeoffs and Early Experiences with ZebraNet, pp. 96-107.)

ACM Transactions on Embedded Computing Systems
----------------------------------------------

Embedded computers, which now drive everything from toys and cars to household appliances and telephones, are the subject of ACM's recently launched quarterly journal, Transactions on Embedded Computing Systems:
http://www.acm.org/tecs/

Code Reading
------------

[With apologies for the blatant self-promotion.]

My book "Code Reading: The Open Source Perspective" (Addison-Wesley, 2003) is now available at online and bricks-and-mortar bookstores. The book details how to read and understand software source code; all 600 examples used in it are drawn from widely used open-source projects. The publication inaugurates Addison-Wesley's Effective Software Development Series, edited by Scott Meyers. I have set up a web site for the book, linked from my home page.

We read computer code when maintaining or extending a software system. We also read code when scavenging another project for code to reuse. Sometimes we read code when inspecting the work of others, when learning how our colleagues solved a particular problem, or even as literature, to expand our minds. Reading code is likely to be one of the most common activities of a computing professional, yet it is seldom taught as a subject or formally used as a method for learning how to design and program.

One reason for this sad situation may originally have been the lack of real-world, high-quality code to read. Companies often protect source code as a trade secret and rarely allow others to read, comment on, experiment with, and learn from it. In the few cases where important proprietary code was allowed out of a company's closet, it spurred enormous interest and creative advancements. As an example, a generation of programmers benefited from John Lions's Commentary on the Unix Operating System, which listed and annotated the complete source code of the sixth-edition Unix kernel. Although Lions's book was originally written under a grant from AT&T for use in an operating systems course and was not available to the general public, copies of it circulated for years as bootleg nth-generation photocopies.

In the last few years, however, the popularity of open-source software has provided us with a large body of code we can all freely read. Some of the most popular software systems in use today, such as the Apache Web server, the Perl language, the GNU/Linux operating system, the BIND domain name server, and the sendmail mail-transfer agent, are in fact available in open-source form.

I was thus fortunate to be able to use open-source software such as the above to write this book as a primer and reader for software code. My goal was to provide the background knowledge and techniques needed for reading code written by others. By using real-life examples taken from working open-source projects, I tried to cover most concepts related to code that are likely to appear before a software developer's eyes, including programming constructs, data types, data structures, control flow, project organization, coding standards, documentation, and architectures. This book is, as far as I know, the first to deal exclusively with code reading as a distinct activity, one worthy of study in its own right.