Tag Archives: php

PHP’s cURL and how to use CURLPROXY_SOCKS5_HOSTNAME

So, you’re running a proxy (say, something like Tor) on your local loopback, but you keep getting this error (you must have CURLOPT_VERBOSE set to true to see it):

* Hostname was NOT found in DNS cache
* Can’t complete SOCKS5 connection to 0.0.0.0:0. (2)
* Closing connection 0

Your code looks simple enough.. but it doesn’t work:

A little research later you find that the proper SOCKS5 variant for this is SOCKS5H (hostname), where the proxy resolves the hostname instead of your own machine, so you use the proper flag..

[…]
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
[…]

only to get told that Tor doesn’t want to be used as an HTTP proxy (what’s the deal?):


It appears you have configured your web browser to use Tor as an HTTP proxy.
This is not correct: Tor is a SOCKS proxy, not an HTTP proxy.
Please configure your client accordingly.

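Have no fear, script kiddies.. PHP as of 5.5.x does not seem to include this CRITICAL cURL flag. An undefined constant does not select SOCKS5H, so cURL most likely falls back to treating the proxy as HTTP, which would explain the complaint above. You can still reference the flag directly by its integer value of 7. So… if you change your code to this… you’re as golden as the proverbial goose!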

curl_setopt($ch, CURLOPT_PROXYTYPE, 7); // 7 = CURLPROXY_SOCKS5_HOSTNAME
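For anyone who wants the whole thing in one place, here is a minimal sketch of a complete request. It assumes Tor is listening on its usual SOCKS port at 127.0.0.1:9050 (adjust for your setup); the function name fetch_via_socks5h is my own invention for illustration, not part of cURL or PHP. The define() guard only kicks in on PHP builds that lack the constant.

```php
<?php
// Older PHP builds may not define this constant; 7 is the value used above.
if (!defined('CURLPROXY_SOCKS5_HOSTNAME')) {
    define('CURLPROXY_SOCKS5_HOSTNAME', 7);
}

/**
 * Fetch a URL through a SOCKS5 proxy, letting the proxy resolve the hostname.
 * Hypothetical helper: returns the response body, or false on failure.
 * The default proxy address is an assumption (Tor's usual local SOCKS port).
 */
function fetch_via_socks5h($url, $proxy = '127.0.0.1:9050')
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_VERBOSE, true);         // needed to see the SOCKS errors

    $body = curl_exec($ch);
    if ($body === false) {
        error_log('cURL error: ' . curl_error($ch));
    }
    return $body;
}
```

Calling fetch_via_socks5h() with the URL you want and echoing the result should be enough to confirm the proxy is being used as SOCKS rather than HTTP.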

So… now you know!


Installing Gearman PHP components for OSX

Locating the latest PHP Components

The Gearman.org page has links to the PHP code on the Downloads page, however the link is very old. The latest code is located at: http://pecl.php.net/package/gearman.

As of 23-OCT-2014, the current stable version is gearman-1.1.2.

I like to drop these files in my /opt directory, then unpack the package and work on it there.

mv ~/Downloads/gearman-1.1.2.tgz /opt/.
cd /opt
tar xvzf gearman-1.1.2.tgz
cd gearman-1.1.2

Configuring for Build

The following commands prepared the PHP package to build on OSX Yosemite (10.10).

phpize
Configuring for:
PHP Api Version: 20121113
Zend Module Api No: 20121212
Zend Extension Api No: 220121212

./configure
checking for grep that handles long lines and -e… /usr/bin/grep
checking for egrep… /usr/bin/grep -E
checking for a sed that does not truncate output… /usr/bin/sed
[…]
appending configuration tag "CXX" to libtool
configure: creating ./config.status
config.status: creating config.h

Building the Library

Next step is to run the compile and install the built objects:

make
/bin/sh /opt/gearman-1.1.2/libtool --mode=compile cc -I. -I/opt/gearman-1.1.2 -DPHP_ATOM_INC -I/opt/gearman-1.1.2/include -I/opt/gearman-1.1.2/main -I/opt/gearman-1.1.2 -I/usr/include/php -I/usr/include/php/main -I/usr/include/php/TSRM -I/usr/include/php/Zend -I/usr/include/php/ext -I/usr/include/php/ext/date/lib -I/usr/local/include -I/usr/local/include -DHAVE_CONFIG_H -g -O2 -Wall -c /opt/gearman-1.1.2/php_gearman.c -o php_gearman.lo
mkdir .libs
[…]
Build complete.
Don't forget to run 'make test'.

make install
Installing shared extensions: /usr/lib/php/extensions/no-debug-non-zts-20121212/

Telling PHP about gearman

You will need to identify your relevant php.ini file and edit it, letting PHP know where the library files are located.

Typically under OSX, this file does not exist and must be created.

Edit the file:

vi /etc/php.ini

Whether the file already existed or you just created it, make sure these two lines are in it:

include_path=.:/mnt/crawler
extension=gearman.so

DONE

At this point you should be able to reference the Gearman library in your PHP code.

These lines of code should not throw an error:

$client = new GearmanClient(); // instance
$worker = new GearmanWorker(); // instance
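If those lines do throw an error, a quick check like the one below can narrow it down. It is only a sketch: gearman_status is a made-up helper name, and it simply reports which php.ini PHP actually loaded and whether the gearman extension made it in.

```php
<?php
// Collect a few facts about how PHP sees the gearman extension.
// gearman_status() is a hypothetical helper, not part of the extension.
function gearman_status()
{
    return array(
        'ini_file' => php_ini_loaded_file(),       // false if no php.ini was loaded
        'loaded'   => extension_loaded('gearman'), // true once extension=gearman.so took effect
        'version'  => phpversion('gearman'),       // extension version string, or false
    );
}

$status = gearman_status();
echo 'php.ini:  ', ($status['ini_file'] ? $status['ini_file'] : '(none loaded)'), PHP_EOL;
echo 'gearman:  ', ($status['loaded'] ? 'loaded, version ' . $status['version'] : 'NOT loaded'), PHP_EOL;
```

If the php.ini line shows a different file than the one you edited, that is usually the place to look first.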

PHP array_shift() not a function for general use

While looking at metrics and monitoring the processing of a CBMHDHS*, I noticed that the active job queue would become quiescent for minutes at a time, most markedly with one specific task list. My gut told me it was an issue with the way PHP was handling my simple list (700,000 items give or take). So some performance testing was required. What I found was astonishing. PHP’s array_shift is horribly inefficient.

Here is an example to demonstrate this.

When shifting 2500 items (one at a time) off the array (list), it can take 30 or more seconds. Keep in mind this is all in memory! This test is with only 108,000 items:


2013-08-16 18:21:06 # STARTING ARRAY_SHIFT TEST: Q:108581
2013-08-16 18:21:43 # DONE removed 2500 Q:106081

It took 37 seconds, to be exact. To me, that seemed like a long time. Doing a little research, I found that when you rewind an array to its beginning with PHP’s reset(), one of its side effects is that it returns the element at that pointer. Reset looks like a scary operation, but it does not ‘reset’ the array in the sense of flushing it out; it just resets the pointer to the top. At any rate, the code looked like this:


$element = reset($array);

So, in theory one could use this to get the first element off an array without removing it. OK, but array_shift() does this AND removes the element, so this is not really the same action. However… (and you know there is a point here), there is another PHP function with a useful side effect: the key() function, which returns the key at the current pointer (in this case the same location we returned the element from above). It looks like this:


$key = key($array);

Now, with those two lines of code I have the element (value) and the key at the top of the array. So this is 2 lines of code where array_shift() is just one.. but we’re not even done yet.. we have to perform the most important part (for this exercise) and remove the element. So.. that’s a 3rd line of code like this, right?


unset($array[$key]);

OK.. so taking a line of code right out of the processor, there is a direct comparison: array_shift first, then reset, key, unset.

Array Shift method:
$key     = array_shift($this->QUEUE);
$payload = $this->DATA[$key];

Array Reset, Key and Unset method:
$appkey  = reset($this->QUEUE);
$payload = $this->DATA[$appkey];
$key     = key($this->QUEUE);
unset($this->QUEUE[$key]);

What I didn’t expect to find is how dramatically FASTER the more complex (looking) code is. The numbers don’t lie.. see for yourself.

Array Shift method.

Total time for 2500 items removed is 37 seconds:


2013-08-16 18:21:06 # STARTING ARRAY_SHIFT TEST: Q:108581
2013-08-16 18:21:43 # DONE removed 2500 Q:106081

Array Reset, Key + Unset method.

Total time for 2500 items removed is <1 second:


2013-08-16 18:21:46 # STARTING ARRAY_RESET TEST: Q:108581
2013-08-16 18:21:46 # DONE removed 2500 Q:106081

Those numbers include getting the payload data out of the 2nd array.

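Mind blowing in my opinion. It’s not even a contest. Going by the timestamps above (37 seconds versus under 1 second), writing your own is at least 37 times faster than PHP’s native operation that, for all intents, accomplishes the same objective. The likely reason is that array_shift() renumbers every remaining numeric key after each call, while unset() leaves the remaining keys alone.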
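If you want to try this on your own box, here is a small self-contained sketch of the same comparison. The function names (drain_with_shift, drain_with_unset) and the 100,000-item list are my own choices for illustration, not the processor’s real code, and your timings will depend on PHP version and hardware.

```php
<?php
// Remove up to $count items from the front of $queue using array_shift().
function drain_with_shift(array &$queue, $count)
{
    $removed = array();
    for ($i = 0; $i < $count && $queue; $i++) {
        $removed[] = array_shift($queue);   // numeric keys are renumbered each call
    }
    return $removed;
}

// Same job using reset() + key() + unset(); remaining keys are left untouched.
function drain_with_unset(array &$queue, $count)
{
    $removed = array();
    for ($i = 0; $i < $count && $queue; $i++) {
        $removed[] = reset($queue);         // first element, pointer rewound
        unset($queue[key($queue)]);         // drop it by its key
    }
    return $removed;
}

// Time both approaches on a copy of the same list.
$items = range(1, 100000);
foreach (array('drain_with_shift', 'drain_with_unset') as $fn) {
    $queue   = $items;
    $start   = microtime(true);
    $removed = $fn($queue, 2500);
    printf("%-18s removed %d, %d left, %.4f s\n",
        $fn, count($removed), count($queue), microtime(true) - $start);
}
```

One thing to keep in mind: after the unset version runs, a numerically indexed list no longer starts at key 0, so any code that assumes $queue[0] exists will need adjusting.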

The point of all this?

If you are seeing an inexplicable slowdown somewhere in PHP.. and you can narrow it down to 1-2 operations… it’s probably worth your while to try to do it another way, even if that way looks more complex!

I could not believe this when I tried it. When I implement this in the processor, I should be able to remove hours of useless waiting to ‘shift’ items off the list. At some point I’d hope Zend would fix this. Keep in mind I ran this test on PHP 5.5.3 (the latest stable release I could get my mitts on, at the time).

*Cloud Based Massively Horizontal Distributed Harvesting System

Install Gearman + Gearman-PHP on AWS ec2

The fun and games continue!! As with every Gearman implementation I’ve done, there are tricks for each environment. Here are the Cliff Notes (originally sourced from [Planet MySQL page] with my own twist) for getting Gearman set up on an Amazon Web Services (AWS) EC2 (Elastic Compute Cloud) node running the default AWS distribution. As always, your experience may vary.

Install required libraries

First, get all the required libraries installed using yum:

[ec2-user@]$ sudo yum install -y gcc
[ec2-user@]$ sudo yum install -y gcc-c++
[ec2-user@]$ sudo yum install -y gperf
[ec2-user@]$ sudo yum install -y boost
[ec2-user@]$ sudo yum install -y boost-devel
[ec2-user@]$ sudo yum install -y memcached
[ec2-user@]$ sudo yum install -y libuuid
[ec2-user@]$ sudo yum install -y libuuid-devel
[ec2-user@]$ sudo yum install -y libevent-devel
[ec2-user@]$ sudo yum install -y php-devel
[ec2-user@]$ sudo yum install -y php-xml

Compile Gearmand from Source

Very straightforward config and build.

[ec2-user@]$ cd gearmand-1.1.9
[ec2-user@]$ sudo ./configure --with-boost=/usr/include --prefix=/usr
[ec2-user@]$ sudo make
[ec2-user@]$ sudo make install

Compile Gearman PHP Library from Source

Fairly simple build, but you must first phpize.

[ec2-user@]$ cd gearman-1.1.1
[ec2-user@]$ sudo phpize
[ec2-user@]$ sudo ./configure --prefix=/usr
[ec2-user@]$ sudo make
[ec2-user@]$ sudo make install

Run ldconfig to Reload Dynamic Library Cache

If you don’t run ldconfig, you’re going to get errors when PHP tries to load the extension after you edit the php.ini file (last step).


bad:
PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib64/php/modules/gearman.so' - libgearman.so.8: cannot open shared object file: No such file or directory in Unknown on line 0

[ec2-user@]$ sudo ldconfig

Edit the PHP ini file

This is the last step.

Find the location of your php.ini file:
[ec2-user@]$ php -i | grep php.ini
Configuration File (php.ini) Path => /etc
Loaded Configuration File => /etc/php.ini

Edit your config file, adding these lines:

[ec2-user@]$ sudo vi /etc/php.ini
[Gearman]
; Add Gearman shared object to config
extension="gearman.so"

Now your install is complete!!
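To see the whole chain actually working, something like the following smoke test can help. It is a sketch under a few assumptions: gearmand is already running on this node on its default port 4730, the job name 'reverse' and the helper reverse_payload are made up for the example, and you run the script twice (once with the argument worker, then in another shell with client).

```php
<?php
// Pure job logic, kept separate from Gearman so it is easy to check on its own.
// reverse_payload() and the 'reverse' job name are hypothetical examples.
function reverse_payload($text)
{
    return strrev($text);
}

$mode = isset($argv[1]) ? $argv[1] : '';

if ($mode === 'worker') {
    // Register the job and block waiting for work.
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);   // assumed local gearmand, default port
    $worker->addFunction('reverse', function (GearmanJob $job) {
        return reverse_payload($job->workload());
    });
    while ($worker->work());
} elseif ($mode === 'client') {
    // Submit one foreground job and print the worker's answer.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    echo $client->doNormal('reverse', 'Hello Gearman'), PHP_EOL;
}
```

If the client hangs, the usual suspects are that gearmand is not running or that no worker has registered the job yet.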

Installing Gearman PHP components for OSX (less fun than it should be)


NOTES: Installing Gearman’s PHP components on OSX is a frustrating and rather complex task. Know this going in. The only way I was finally able to do this was by reading this page [HERE] from one of PHP’s own engineers, which got me pretty close but was still not a full solution. YOU HAVE BEEN WARNED!!

FIRST THINGS FIRST — Get the right versions!!

Do not use gearmand-1.1.7!
As of this writing (7-JUN-2013) the current version of gearmand (gearmand-1.1.7) has a bug that prevents it from building properly on OSX. I wasted probably 2 days before, in a deep corner of the mind, I thought.. “If 1.1.7 has a known bug on OSX, and they have not fixed it yet.. let’s try 1.1.6!” And that worked!! The main Gearman download page has multiple versions, so just avoid 1.1.7.

Get the latest PHP source (gearman-1.1.2), NOT the one linked off the Gearman page!
Yes.. this wasted even more time. The Gearman page didn’t have a quick easy link to the full set of available versions, and the linked version from the page was very much out of date. There is another build bug regarding PHP. One of the engineers decided to get fancy and change the privacy of the object’s members somewhere in the 1.0.x tree. This BREAKS the build on OSX. As recently as this year they released a FIX for this, which is version gearman-1.1.2. All of them can be had on this page [HERE].

Getting Library Dependencies Worked Out

After fighting with source code, screaming at the screen, and even getting completely frustrated with what I was able to (or not able to) install with MacPorts.. I decided to install HomeBrew and give that a run. It’s not a big deal but I moved that to its own page located [HERE].

libevent must be built/installed

You’re going to need libevent, and installing it straight up from brew (or MacPorts) did not do the job for me. Check out my previous pages on installing libevent located [HERE] for details on that exciting exercise.

You Must Install Gearmand (server) regardless

Regardless of how you plan to use Gearman with PHP, you must have the GearmanD server compiled to create the required libraries. There are no two ways about it, just resign yourself to that and keep moving forward!

First, obtain the Gearmand source code for compile from [HERE]. I dropped mine in /usr/local

Untar the file and cd into the source code directory, then configure and build with the following commands:

david$ cd gearmand-1.1.12/
david$ ./configure --disable-shared --prefix=/usr/local
david$ make && make install

A couple of adjunct notes

If you are having problems locating libevent.. or basically seeing this error:

checking test for a working libevent… no
configure: error: Unable to find libevent

Try setting these two environment variables, to tell the configurator exactly where to locate these libraries, if you’ve managed to build libevent from source:

[gearmand]$ export CPPFLAGS='-I/usr/local/include'
[gearmand]$ export LDFLAGS='-L/usr/local/lib'

Gearman — Starting the Java-Gearman-Service process

These notes apply to testing on a Mac OSX laptop, and may or may not apply to your implementation. They are provided as an adjunct to my main Getting Gearman Going post elsewhere in this blog.

The project page for Java-Gearman-Service is located [HERE] on Google Code.

The full set of instructions for starting up Gearman’s Java-Gearman-Service was not clearly linked from the main Gearman project page, so I’m including the link [HERE] to save you the few Google searches I did to find it.

Starting up the Java service should be as simple as this:


java -jar java-gearman-service-0.6.6.jar

HOWEVER, I received this instead.. an error:


david$ java -jar java-gearman-service-0.6.6.jar
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/gearman/impl/Main : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

The “Unsupported major.minor version 51.0” message means the jar was compiled for Java 7. Checking my version shows that I am on 1.6, not 1.7 as I had thought:


david$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06-451-11M4406)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01-451, mixed mode)

To get the proper version, I navigated to the Oracle page located [HERE], agreed to their terms (do I really have a functional choice… not if I want/need to use Java…) and pushed forward.

If you are running the install for 1.7, you should see the installer dialog:
[Screenshot: Java 1.7 installer dialog, 2013-06-03]

That has at least resolved this part of the issue, so I will now attempt to restart the server.


david$ java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b12)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Now I’m going to restart, specify a custom port and get it kicked off in the background:


david$ java -jar java-gearman-service-0.6.6.jar -p6315 &

At this point the process is running on my Job Service box and next step will be to craft some code to see how it all works.

Updated main website with Feed Dividers

Released an enhancement today to display daily dividers in those RSS Feed aggregators that show data more than 1 day old. The most obvious of these are my Blog Updates and the USGS data feed.

This update works best in the USGS feed. The posting to my blogs is infrequent enough that the dividers are just as prevalent as the posts themselves.

Here is a screen shot showing this first implementation worked out. Loving it in the USGS Quakes parsers/aggregators but NOT loving it in my own Blog Roll.

[Screenshot: New daily dividers]
More changes coming, I’m fairly certain of that.