WannaCry: The Story and Lessons

In the wake of last week’s cyber attack on the NHS and other large organisations around the world, Dr Mahdi Aiash explains how the WannaCry ransomware was able to do such widespread damage, and how it was ultimately stopped.

On Friday 12 May 2017, organisations around the world were hit by a new ransomware strain. The ransomware was particularly successful in part because it used an SMB vulnerability to spread inside networks. The vulnerability had been patched by Microsoft in March for supported versions of Windows. The exploit, known as EternalBlue, was released in April as part of a leak of NSA tools.

The massive impact of the ransomware was due to three primary factors:

  • This variant of ransomware can spread itself as a so-called worm
  • It exploits a known vulnerability in Windows
  • It uses the Server Message Block (SMB) network protocol, which is often unfiltered inside corporate networks

WannaCry borrowed from leaked NSA exploits and spread across at least 75,000 PCs in less than 24 hours. Upon infection, files with specific extensions are encrypted and the ransomware installs the DOUBLEPULSAR backdoor, which gives remote access to the system via port 445. The underlying exploit targets unpatched vulnerabilities that were addressed in Microsoft Security Bulletin MS17-010. The SMB protocol provides a way for client applications on a computer to read and write files and to request services from server programs across a network or over the Internet.

Historically, SMB has had a poor security reputation (the Null Session attack, for example). Its functionality within a network appeals to hackers and cybercriminals because it provides an easy way to spread and to maximise the damage, as SMB traffic is often unfiltered inside corporate networks.

Image by Yuri Samoilov (CC 2.0)

WannaCry in Action

For cyber criminals to gain access to a system, they first need to get malicious software onto a device within the network. This is often done by tricking a victim into clicking a link or into downloading the malware from the web or an email attachment.

Phase 1:

Once on the victim’s machine, and before starting its malicious activities, the ransomware has a small piece of housekeeping to perform. Each time WannaCry infected a new machine, a command in its code told it to try to contact an obscure web address: a long string of gibberish characters. Because the URL was unregistered and inactive, the ransomware expected no answer, and silence was its signal to proceed. If a response did come back, the ransomware would go dormant. This very feature, apparently intended to shield the ransomware, turned out to be the attack’s Achilles heel, but for the first few hours it went unnoticed and WannaCry was left to propagate unhindered.
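
The check can be illustrated with a minimal Python sketch. This is a reconstruction of the logic described above, not WannaCry’s actual code, and the domain shown is a made-up placeholder:

import urllib.request
import urllib.error

# Hypothetical placeholder, NOT the real hardcoded kill-switch domain.
KILL_SWITCH_URL = "http://example-killswitch-domain.invalid/"

def no_response_from_kill_switch() -> bool:
    """Return True when the probe gets no answer, the malware's cue to proceed."""
    try:
        urllib.request.urlopen(KILL_SWITCH_URL, timeout=5)
    except (urllib.error.URLError, OSError):
        return True   # domain unregistered/inactive: carry on
    return False      # a live page answered: go dormant

if no_response_from_kill_switch():
    print("proceed (encrypt and spread)")   # stands in for the malicious payload
else:
    print("go dormant")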

Phase 2:

In this phase of infection, the ransomware inspects the file-sharing arrangements on the infected computer and begins exploiting them. To do so, it deploys its secret weapon, or rather a weapon that was once someone else’s secret: a repurposed cyber-espionage tool known as EternalBlue, stolen from the US National Security Agency and leaked online in April.

EternalBlue exploits a security flaw, a buffer overflow in the SMB implementation of Windows operating systems, that allows malicious code to spread through structures set up to share files, without any permission from users.

Whose fault is it?

The vulnerability that allowed the ransomware to propagate had been known to the NSA for quite some time. This was disclosed by the Shadow Brokers leak in April, and sample exploit code was quickly released on GitHub that could be integrated as a module of Metasploit, a well-known exploitation framework. Unfortunately, with EternalBlue in use and the SMB vulnerability left unpatched, WannaCry became one of the most destructive cyber attacks ever seen.

It should not come as a surprise that the NSA (or any other agency) was aware of this type of “yet to be known” vulnerability. Such agencies may use these vulnerabilities quietly for surveillance against specific targets. However, it is only a matter of time before such a vulnerability becomes known to other, undesired groups. The NSA should therefore have reported the vulnerability under responsible disclosure; unfortunately, that did not happen in this instance.

The attack raises serious concerns that there may be other unreported vulnerabilities currently being used for surveillance by government agencies, which could lead to similarly destructive attacks.

How has it been temporarily fixed?

A UK-based researcher known as “MalwareTech” shut the operation down, albeit by a stroke of good fortune. Analysing the malware, he found that its programmers had built it to check whether a certain gibberish URL led to a live web page, and that the domain in question had never been registered. The researcher registered the site and took control of the domain (for $10.69), and started seeing connections from infected victims, which allowed him to track the ransomware’s spread. In doing so, he also took down the WannaCry operation without meaning to.
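
In effect, the registered domain acted as a sinkhole: any web server answering on it both logs the probes (revealing infections) and returns the response that makes the malware stand down. A minimal sketch of such a listener, purely for illustration:

from http.server import BaseHTTPRequestHandler, HTTPServer

class SinkholeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Each probe is a likely infection; logging the source IP tracks the spread.
        print(f"probe from {self.client_address[0]} for {self.path}")
        # Any valid response is enough to trigger the malware's shutdown path.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"sinkholed")

if __name__ == "__main__":
    # Port 80 usually needs elevated privileges; 8080 is used here for the sketch.
    HTTPServer(("0.0.0.0", 8080), SinkholeHandler).serve_forever()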

Analysis of the ransomware shows that as long as the probed, hardcoded domain is unregistered and inactive, the ransomware spreads; once the domain becomes active, the ransomware shuts down. Competing theories exist as to why WannaCry’s perpetrators built it this way:

Theory #1:

The functionality of probing an inactive URL was put in place as an intentional “kill switch” feature, in case the creators ever wanted to rein in the monster they’d created or in case something went wrong.

Theory #2:

This theory holds that the hackers included the feature to shield the ransomware from analysis by security professionals in a “sandbox”. Within a sandbox, all malware requests (even to unregistered domains) are intercepted and answered by dummy sandbox IP addresses. Since the ransomware probes an unregistered domain, it expects no response; if a response is received, it assumes it is running in a sandbox and shuts down. Once the probed domain was registered, it began receiving and answering requests from infected machines all over the world. As a result, each communicating copy of the ransomware assumed it was running in the middle of a forensic analysis, and shut down.

How to Mitigate Infection: Patch

Newer Windows versions (Windows Vista, 7-10 and Windows Server 2008-2016) can be patched with MS17-010, released by Microsoft in March. On Friday, Microsoft also released a patch for older systems going back to Windows XP and Windows Server 2003.

At the network level, a number of steps could help:

  • Segment the network
    • Prevent internal spreading via port 445 and RDP (a quick reachability check is sketched below).
    • Block port 445 at the perimeter.
  • Disable SMBv1
  • Implement internal “kill switch” domains and do not block them
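
As a quick sanity check on that segmentation and perimeter filtering, a short script can report whether TCP port 445 is actually reachable from a given vantage point. A minimal sketch; the target addresses are hypothetical placeholders:

import socket

# Hypothetical hosts that should NOT accept SMB traffic from this network segment.
TARGETS = ["10.0.1.15", "10.0.2.20"]

def smb_port_open(host: str, port: int = 445, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the SMB port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in TARGETS:
    state = "OPEN - check your filtering" if smb_port_open(host) else "filtered/closed"
    print(f"{host}:445 -> {state}")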

It is crucial to note that even if you have mitigated the effects of this particular strain of malware, it is only a matter of time until attackers alter its behaviour or infection path. Patching this vulnerability will not remove the danger of ransomware: this flavour of ransomware uses a vulnerability that can be patched, but there are other avenues malware can use to cause havoc in your organisation. WannaCry is similar to previous large-scale attacks and highlights the need for a collective effort from security researchers and experts, system and network admins, security agencies, and security-tool vendors and providers to face cyber criminals. Hackers are not magicians; they simply make use of our mistakes. Following simple mitigation steps makes the next cyber attack less likely, but never impossible.

Find out more about studying Network Security and Pen Testing at Middlesex.

News from the computational lab – now what?

Dr Giuseppe Primiero (pictured right), Senior Lecturer in Computing Science and a member of the Foundations of Computing research group at Middlesex University, and Professor Viola Schaffonati, of the Politecnico di Milano, Italy, are working on a philosophical analysis of the methodological aspects of computer science.

In February 2016 science hit the news again: the merger of a binary black hole system had been detected by the Advanced LIGO twin instruments, one in Hanford, Washington, and the other 3,000 km away in Livingston, Louisiana, USA. The signal, detected in September 2015, confirmed the gravitational waves famously predicted by Einstein’s general theory of relativity. The phenomenon has also been numerically modelled on supercomputers since at least 2005 – a typical example of a computational experiment.

Computational experiments

The term ‘computational experiment’ refers to a computer simulation of a real scientific experiment. A simpler example: to test some macroscopic property of a liquid that is hard to measure directly, or where the equipment is too expensive to purchase (for instance in an educational setting), a simulation is a more feasible solution than the real experiment. Computational experiments are widely used in disciplines such as chemistry, biology and the social sciences. As experiments are the essence of scientific methodology, computer simulations indirectly raise interesting questions: how do computational experiments affect results in the other sciences? And what kind of scientific method do computational experiments support?
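
As a toy illustration of the idea (not a model of any real liquid), the sketch below runs a simple ‘computational experiment’: it estimates a macroscopic quantity, the mean square displacement of a random walk, by sampling a microscopic model many times.

import random

def mean_square_displacement(steps: int, walkers: int, seed: int = 0) -> float:
    """Estimate a 'macroscopic' quantity by simulating many 'microscopic' random walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walkers):
        position = 0
        for _ in range(steps):
            position += rng.choice((-1, 1))
        total += position ** 2
    return total / walkers

# For an unbiased walk the theoretical expectation equals the number of steps.
print(mean_square_displacement(steps=1000, walkers=5000))  # close to 1000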

These questions highlight the much older problem of the status and methodology of computer science (CS) itself. Today we are acquainted with CS as a well-established discipline. Given the pervasiveness of computational artefacts in everyday life, we can even consider computing a major actor in academic, scientific and social contexts. But the status enjoyed today by CS has not always been granted. CS, since its early days, has been a minor god. At the beginning computers were instruments for the ‘real sciences’: physics, mathematics, astronomy needed to perform calculations that had reached levels of complexity unfeasible for human agents.

Computers were also instruments for social and political aims: the US army used them to compute ballistic tables and, notoriously, mechanical and semi-computational methods were at work in solving cryptographic codes during the Second World War.

The UK and the US were pioneers in the transformation that brought CS into the higher education system: the first degree in CS was established at the University of Cambridge Computer Laboratory in 1953 by the mathematics faculty, to meet the demand for competencies in mechanical computation applied to scientific research. It was followed by Purdue University in 1962. The academic birth of CS was thus the result of creating technical support for other sciences, rather than the acknowledgement of a new science. Subsequent decades brought a quest for the scientific status of the discipline. The role of computational experiments in supporting results in other sciences, a topic which has been investigated at length, seems to perpetuate this ancillary role of computing.

The collision of two black holes – a tremendously powerful event detected for the first time ever by the Laser Interferometer Gravitational-Wave Observatory, or LIGO – is seen in this still from a computer simulation. Photo by the SXS (Simulating eXtreme Spacetimes) Project.

A science?

But what, then, is the scientific value of computational experiments? Can they be used to assert that computing is a scientific discipline in its own right? The natural sciences have a codified method of investigation: a problem is identified; a predictable and testable hypothesis is formulated; a study to test the hypothesis is devised; analyses are performed and the results of the test are evaluated; on that basis, the hypothesis and the tests are modified and repeated; finally, a theory that confirms or rejects the hypothesis is formulated. One important consideration is therefore the applicability of this so-called hypothetico-deductive method to CS. This, in turn, hides several smaller issues.

The first concerns which ‘computational problems’ would fit such a method. Intuitively, when one refers to the use of computational techniques to address some scientific problem, the problem can come from a variety of backgrounds. We might be interested in computing the value of some equations to test the stability of a bridge. Or we might want to know the best-fit curve for the increase of some disease, economic behaviour or demographic factor in a given social group. Or we might be interested in investigating a biological entity. These cases highlight the old role of computing as a technique to facilitate and speed up the extraction of data, and possibly to suggest correlations, within a well-specified scientific context: computational physics, chemistry, econometrics, biology.


But besides understanding a ‘computational experiment’ as the computational study of a non-computational phenomenon, the computational sciences themselves offer problems that can be addressed computationally: how stable is your internet connection? How safe is your installation process when external libraries are required? How consistent are the data extracted from some sample? To name just a few. These problems (or their formal models) are investigated through computational experiments, yet they are less easily identified with scientific problems.
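
For instance, a property of a computational artefact itself, such as how much the running time of a routine varies from run to run, can be measured experimentally. A minimal sketch, with an arbitrary placeholder routine:

import statistics
import time

def routine(n: int) -> int:
    """Arbitrary placeholder computation whose behaviour we want to measure."""
    return sum(i * i for i in range(n))

# Repeat the measurement: the running system itself is the object of the experiment.
samples = []
for _ in range(30):
    start = time.perf_counter()
    routine(200_000)
    samples.append(time.perf_counter() - start)

print(f"mean {statistics.mean(samples):.4f}s, stdev {statistics.stdev(samples):.4f}s")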

The second: how do we formulate a good hypothesis for a computational experiment? Scientific hypotheses depend on the system of reference and, when translating them to a computational setting, we have to be careful that the relevant properties of the system under observation are preserved. An additional complication arises when the observation itself concerns a computational system, which might include a formal system, a piece of software, or implemented artefacts. Each of the levels of abstraction pertaining to computing reveals a specific understanding of the system, and they can all be taken as essential in the definition of a computing system. Is a hypothesis about such a system admissible if it is formulated at only one level of abstraction, e.g. considering a piece of code but not its running instances? And is such a hypothesis still well formulated if it tries instead to account for all the different aspects that a computational system presents?

Finally, an essential characteristic of scientific experiments is their repeatability. In computing, this criterion can be understood and interpreted in different ways: should an experiment be repeatable under exactly the same circumstances for exactly the same computational system? Should it be repeatable for a whole class of systems of the same type? How do we characterise what counts as the same type in the case of software? And in the case of hardware?
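
One narrow but concrete reading of repeatability is reproducing a stochastic run exactly by fixing the pseudo-random seed; whether that counts as repeating the ‘same’ experiment on the ‘same’ system is precisely the kind of question at issue. A minimal sketch:

import random

def simulate(seed: int, trials: int = 10) -> list:
    """A stochastic 'experiment' whose outcome is pinned down entirely by the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(trials)]

# On this implementation the run is exactly repeatable with the same seed...
assert simulate(42) == simulate(42)
# ...but a different seed, library version or platform may not reproduce it.
print(simulate(42)[:3])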

Irregularities

All of the above questions underpin our understanding of what a computational experiment is. Although we are used to expecting some scientific uniformity in the notion of an experiment, the case of CS evades such strict criteria. First of all, several sub-disciplines categorise experiments in very specific ways, each not easily applicable by the research group next door: testing a piece of software for requirements satisfaction is essentially very different from testing whether a robotic arm can identify its own positioning.

Experiments in the computational domain do not offer the same regularities that can be observed in the physical, biological and even social sciences. The notion of an experiment is often conflated with the more basic, domain-related activity of performing tests. For example, model-based testing is a well-defined formal and theoretical method that differs from computer simulation in its admissible techniques, recognised methodology, assumptions and verifiability of results. Accordingly, the process of checking a hypothesis that characterises the scientific method described above is often intended simply as testing or checking some functionality of the system at hand, while in other cases it carries a much stronger theoretical meaning. Here the notion of repeatability (of an experiment) merges with the replicability (of an artefact) – a distinction that has already appeared in the literature (Drummond).

Finally, benchmarking is understood as an objective performance evaluation of computer systems under controlled conditions: is it in some sense characterising the quality of computational experiments, or simply identifying the computational artefacts that can be validly subject to experimental practices?

A philosophical analysis

The philosophical analysis of the methodological aspects of CS, of which the above is an example, is a growing research area. The set of research questions that need to be approached is large and diversified. Among them, the analysis of the role of computational experiments in the sciences is not new; less well understood is the methodological role of computer simulations within CS itself, rather than as a support method for testing hypotheses in other sciences.

The Department of Computer Science at Middlesex University is leading both research and teaching activities in this area, in collaboration with several European partners, including the Dipartimento di Elettronica, Informazione e Bioingegneria at Politecnico di Milano in Italy, which offers similar activities and has a partnership with Middlesex through the Erasmus+ network.

In an intense one-week visit, we drafted initial research questions and planned future activities. The following questions represent a starting point for our analysis:

  • Do experiments on computational artefacts (e.g. a simulation of a piece of software) differ in any significant way from experiments performed on engineering artefacts (like a bridge), social (a migration) or physical phenomena (fluid dynamics)?
  • Does the nature of computational artefacts influence the definition of a computational experiment? In other words, is running an experiment on a computer significantly different from running it in a possibly smaller-scale but real-world scenario?
  • Does the way in which a computational experiment is implemented influence the validity and generality of its results? In which way does the coding, its language and choice of algorithms affect the results?

These questions require considering the different types of computer simulations, as well as other types of computational experiments, along with the specificities of the problems treated. For example, an agent-based simulation of a messaging system raises problems and offers results that are inherently different from testing a monitoring system for privacy on social networks with real users. The philosophical analysis of the methodological aspects of CS has an impact not only on the discussion within the discipline, but also on how its disciplinary status is acknowledged by a larger audience.

Nowadays we are getting used to reading about the role of computational experiments in scientific research and how computer-based results affect the progress of science. It is about time that we become clear about their underlying methodology, so that we might say with some degree of confidence what their real meaning is.

Of men and machines (doing mathematics)

Dr Giuseppe Primiero is Senior Lecturer in Computing Science and a member of the Foundations of Computing research group at Middlesex University. Here Giuseppe discusses the recent British Colloquium for Theoretical Computer Science, which he helped organise.

How many of the functionalities located today on your standard desktop or mobile computer were first the object of theoretical study by mathematicians and computer scientists? And which of today’s theoretical results will be essential to tomorrow’s computing technologies?

Often this theoretical work is at such a high level of abstraction that you would hardly recognise the relevance to the final working application, but essential it is.

By its own nature, Computer Science is a field of research that uniquely combines theory and applications, in a way no other scientific field really does. Computers are, after all, physical artefacts ruled by logical principles – a marriage of mathematics and technology born out of many distinct theoretical and practical results.

Notable among these figures is the British mathematician Alan Mathison Turing. His theoretical device first proved the logical possibility of a general-purpose machine at a time when a ‘computer’ was principally a human doing computations. Only many years later did the word become widely, and exclusively, used for mechanical calculators.

Today, research in Computer Science still grows out of theoretical results that are at first motivated by computational problems. In this first, traditional sense, mathematics is at the core of computation, and this theoretical research may take long routes before its results become essential to technologies of interest and profit for all mankind.

On the other hand though (and several decades after the birth of the first calculating machines), today’s research in mathematics and many other sciences is also led by machines working alongside their human counterparts. Their computational power, speed and large (although finite) memory are increasingly of aid when performing otherwise impossible calculations.

Hence, the progress of theoretical research (particularly in mathematics) relies heavily on the physical computations performed by machines. In this second and entirely novel sense, mathematics itself becomes a product of computation, and how this inverted relation will affect the nature and principles of our knowledge and technology in the future is still to be discovered.

Alan Turing (image by parameter_bond, Creative Commons 2.0)

British Colloquium for Theoretical Computer Science

As a result of this, research in theoretical computer science is of ever more direct and immediate relevance to fields outside academia. It was with this in mind that the Foundations of Computing Group at Middlesex recently hosted the 31st edition of the British Colloquium for Theoretical Computer Science (BCTCS).

This meeting traditionally welcomes PhD students from across the country to present their work alongside talks from internationally renowned researchers – offering an overview of the most relevant trends in the research area.

This year, the remarkable list of invited speakers boasted two Turing Award winners and one Fields Medalist (roughly speaking these are like the Nobel Prizes in Informatics and Mathematics, respectively).

Sir Tony Hoare FRS (Microsoft Research, University of Cambridge) opened the colloquium discussing the interaction between concurrent and sequential processes.

Tony is well known for developing the formal language CSP (Communicating Sequential Processes) to specify the interactions of concurrent processes, which enables programmers to make machines execute processes in overlapping time periods (i.e. concurrently) as opposed to one after another (i.e. sequentially).

This has led to crucial improvements in speeding up computing technologies, and represents some of the essential mechanisms hidden in today’s computing technology which came about through theoretical investigations.
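
The flavour of CSP-style interaction can be approximated in a few lines, with threads standing in for concurrent processes and a queue for the channel they communicate over (a loose Python illustration, not Hoare’s notation):

import threading
import queue

channel = queue.Queue()  # stands in for a CSP channel

def producer():
    for i in range(5):
        channel.put(i)    # send a message
    channel.put(None)     # signal completion

def consumer():
    while True:
        item = channel.get()  # receive (blocks until a message arrives)
        if item is None:
            break
        print(f"received {item}")

# The two processes run in overlapping time periods, synchronising
# only through the messages they exchange.
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()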

Per Martin-Löf (Stockholm University) opened proceedings on the second day with a talk on the mathematical structures underlying repetitive patterns and functional causal models. This very abstract talk was not surprising from a logician best-known for his type theory – a mathematical result that has been at the very basis of computer programs used today to perform proofs and obtain logical deduction in an automated way.

In the afternoon Samson Abramsky FRS (University of Oxford) offered his fascinating views on contextuality, a key feature of quantum mechanics that permits quantum information processing and computation to transcend the boundaries of classical computation. This could possibly be the next stage of our interaction with machines.

On the third day we welcomed, from Pittsburgh, USA, Thomas Hales. Tom is best known for proving the Kepler Conjecture about the close packing of spheres.

Imagine a grocer selling oranges who wants to stack as many as possible in a small space. Most people naturally build an arrangement known as face-centred cubic, and the famous 17th-century astronomer Johannes Kepler conjectured that this arrangement, filling space at a density of around 74%, could not be beaten.
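
The density Kepler had in mind can be written down exactly: for the face-centred cubic arrangement it is π / (3√2) ≈ 0.7405, i.e. roughly 74% of space is filled.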

Remarkably, it took more than 370 years before a proof appeared. The most trusted proof was given by Tom, but it ran to 250 pages and involved three gigabytes of computer programs, data and results. Reviewers of his work noted they were 99% certain of its accuracy, but incredibly Tom was not satisfied with this and spent a decade using computer proof assistants to verify all parts of his proof. The climax of this tour de force was the announcement in 2015, by Tom and his many collaborators, of a formal proof of the Kepler Conjecture.

The theorem-proving theme continued in the afternoon when renowned mathematician Sir Tim Gowers FRS (University of Cambridge) spoke about his idea of an extremely human-oriented, heuristic-based automated theorem prover that would be of use to everyday mathematicians. Tim described his programme to remedy the lack of engagement between most working mathematicians and the automated theorem proving community.

The final day welcomed another star of Computer Science as Joseph Sifakis (University of Lausanne) spoke about rigorous system design. This talk was of interest to theorists and the more application-oriented alike, before the colloquium finished with Andrei Krokhin (Durham University) discussing recent research on the valued constraint satisfaction problem.

Theoretical Computer Science is today experiencing an exciting phase in its research methodology: the collaboration of men and machines is offering unforeseen possibilities, as well as problems. Theory and technology are becoming less distant, and the effects of this collaboration are being felt ever more strongly and quickly in everyday life.