The Softmax Function Derivative (Part 3)

On Machine Intelligence - 1 July, 2020 - 23:00

Previously I’ve shown how to work out the derivative of the Softmax Function combined with the summation function, typical in artificial neural networks.

In this final part, we’ll look at how the weights in a Softmax layer change with respect to a Loss Function. The Loss Function is a measure of how “bad” the estimate from the network is. We’ll then modify the weights in the network in order to improve the “Loss”, i.e. make it less bad.

The Python code is based on the excellent article by Eli Bendersky.

Cross Entropy Loss Function

There are different kinds of Cross Entropy functions, depending on what kind of classification you want your network to estimate. In this example, we’re going to use the Categorical Cross Entropy. This function is typically used when the network is required to estimate which class something belongs to, when there are many classes. The output of the Softmax Function is a vector of probabilities; each element represents the network’s estimate that the input is in that class. For example:

[0.19091352 0.20353145 0.21698333 0.23132428 0.15724743]

The first element, 0.19091352, represents the network’s estimate that the input is in the first class, and so on.

Usually, the input is in one class, and we can represent the correct class for an input as a one-hot vector. In other words, the class vector is all zeros, except for a 1 in the index corresponding to the class.

[0 0 1 0 0]

In this example, the input is in class 3, represented by a 1 in the third element.

The multi-class Cross Entropy Function is defined as follows:

$$XE = -\sum_{c=1}^{M} y_{o,c} \log(S_{o,c})$$

where M is the number of classes, y is the one-hot vector representing the correct classification c for the observation o (i.e. the input). S is the Softmax output for the class c for the observation o. Here is some code to calculate that (which continues from my previous posts on this topic):

    def x_entropy(y, S):
        return np.sum(-1 * y * np.log(S))

    S = h # the Softmax output from the previous post (named h there)
    y = np.zeros(5)
    y[2] = 1 # picking the third class for example purposes
    xe = x_entropy(y, S)
    print(xe)

    1.5279347484961026

Cross Entropy Derivative

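Just like the other derivatives we’ve looked at before, the Cross-Entropy derivative is a vector of partial derivatives with respect to its input:

$$\frac{\partial XE}{\partial S} = \left[\frac{\partial XE}{\partial S_1}, \frac{\partial XE}{\partial S_2}, \ldots, \frac{\partial XE}{\partial S_M}\right]$$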

We can make this a little simpler by observing that since Y (i.e. the ground truth classification vector) is zeros, except for the target class, c, then the Cross Entropy derivative vector is also going to be zeros, except for the class c.

To see why this is the case, let’s examine the Cross Entropy function itself. We calculate it by summing up a product. Each product is the value from Y multiplied by the log of the corresponding value from S. Since all the elements in Y are actually 0 (except for the target class, c), then the corresponding derivative will also be 0. No matter how much we change the values in S, the result will still be 0.
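
The argument above can also be checked numerically. The following is a minimal sketch and not part of the original code: it nudges each element of S in turn and measures the change in the Cross Entropy with a central finite difference. The helper name `numerical_gradient` and the step size `eps` are assumptions made for illustration, and S is hard-coded to the example Softmax output.

```python
import numpy as np

def x_entropy(y, S):
    # categorical cross entropy for a one-hot target y
    return np.sum(-1 * y * np.log(S))

def numerical_gradient(f, S, eps=1e-6):
    # hypothetical helper: central finite difference of f
    # with respect to each element of S
    grad = np.zeros_like(S)
    for i in range(S.shape[0]):
        step = np.zeros_like(S)
        step[i] = eps
        grad[i] = (f(S + step) - f(S - step)) / (2 * eps)
    return grad

# example Softmax output and one-hot target (third class)
S = np.array([0.19091352, 0.20353145, 0.21698333, 0.23132428, 0.15724743])
y = np.zeros(5)
y[2] = 1

grad = numerical_gradient(lambda s: x_entropy(y, s), S)
print(np.round(grad, 5))
```

Only the entry for the target class should come out non-zero, because every other element of S is multiplied by a 0 in y before it reaches the sum.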


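We can rewrite this a little, expanding out the XE function:

$$\frac{\partial XE}{\partial S_c} = \frac{\partial}{\partial S_c}\left(-y_c \log(S_c)\right)$$

We already know that $y_c$ is 1, so we are left with:

$$\frac{\partial XE}{\partial S_c} = -\frac{\partial \log(S_c)}{\partial S_c}$$

So we are just looking for the derivative of the log of $S_c$:

$$\frac{\partial XE}{\partial S_c} = -\frac{1}{S_c}$$

The rest of the elements in the vector will be 0. Here is the code that works that out: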

    def xe_dir(y, S):
        return (-1 / S) * y

    DXE = xe_dir(y, S)
    print(DXE)

    [-0. -0. -4.60864 -0. -0. ]

Bringing it all together

When we have a neural network layer, we want to change the weights in order to make the loss as small as possible. So we are trying to calculate:

$$\frac{\partial XE}{\partial W}$$

for each of the input instances X. Since XE is a function that depends on the Softmax function, which itself depends on the summation function in the neurons, we can use the calculus chain rule as follows:

$$\frac{\partial XE}{\partial W} = \frac{\partial XE}{\partial S} \cdot \frac{\partial S}{\partial Z} \cdot \frac{\partial Z}{\partial W}$$

In this post, we’ve calculated dXE/dS, and in the previous posts, we calculated dS/dZ and dZ/dW. To calculate the overall changes to the weights, we simply carry out a dot product of all those matrices:

    print(np.dot(DXE, DL_shortcut).reshape(W.shape))

    [[ 0.01909135  0.09545676  0.07636541  0.02035314  0.10176572]
     [ 0.08141258 -0.07830167 -0.39150833 -0.31320667  0.02313243]
     [ 0.11566214  0.09252971  0.01572474  0.07862371  0.06289897]]

Shortcut

Now that we’ve seen how to calculate the individual parts of the derivative, we can look for a shortcut that avoids all that matrix multiplication, especially since so many of the elements are zero.

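Previously, we had established that the elements in the matrix dS/dW can be calculated using:

$$\frac{\partial S_t}{\partial w_{ij}} = S_t (1 - S_i) x_j$$

where the input and output indices are the same (i = t), and

$$\frac{\partial S_t}{\partial w_{ij}} = -S_t S_i x_j$$

where they are different.

Using this result, we can see that an element in the derivative of the Cross Entropy function XE, with respect to the weights W is (swapping c for t):

$$\frac{\partial XE}{\partial w_{ij}} = \frac{\partial XE}{\partial S_c} \cdot \frac{\partial S_c}{\partial w_{ij}}$$

We’ve shown above that the derivative of XE with respect to S is just $-1/S_c$. So each element in the derivative where i = c becomes:

$$\frac{\partial XE}{\partial w_{ij}} = -\frac{1}{S_c} \cdot S_c (1 - S_c) x_j$$

This simplifies to:

$$\frac{\partial XE}{\partial w_{ij}} = (S_c - 1) x_j$$

Similarly, where i ≠ c:

$$\frac{\partial XE}{\partial w_{ij}} = -\frac{1}{S_c} \cdot (-S_c S_i x_j) = S_i x_j$$

Here is the corresponding Python code for that: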

    def xe_dir_shortcut(W, S, x, y):
        dir_matrix = np.zeros((W.shape[0] * W.shape[1]))
        for i in range(0, W.shape[1]):
            for j in range(0, W.shape[0]):
                dir_matrix[(i*W.shape[0]) + j] = (S[i] - y[i]) * x[j]
        return dir_matrix

    delta_w = xe_dir_shortcut(W, h, x, y)

Let’s verify that this gives us the same results as the longer matrix multiplication above:

    print(delta_w.reshape(W.shape))

    [[ 0.01909135  0.09545676  0.07636541  0.02035314  0.10176572]
     [ 0.08141258 -0.07830167 -0.39150833 -0.31320667  0.02313243]
     [ 0.11566214  0.09252971  0.01572474  0.07862371  0.06289897]]

Now we have a simple function that will calculate the changes to the weights for a seemingly complicated single-layer of a neural network.
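
As a cross-check, the same result can be written as a single outer product and compared against a finite-difference estimate of the loss gradient. This is a minimal sketch rather than the original author's code: the names `loss`, `grad_W`, `numeric`, `eps` and `learning_rate` are assumptions introduced for illustration, and the final gradient descent step is only there to show the loss going down.

```python
import numpy as np

x = np.array([0.1, 0.5, 0.4])
W = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
              [0.6, 0.7, 0.8, 0.9, 0.1],
              [0.11, 0.12, 0.13, 0.14, 0.15]])
y = np.zeros(5)
y[2] = 1  # third class is the target

def softmax(Z):
    eZ = np.exp(Z)
    return eZ / np.sum(eZ)

def loss(W):
    # cross entropy of the Softmax layer for the single example (x, y)
    S = softmax(np.dot(np.transpose(W), x))
    return np.sum(-1 * y * np.log(S))

S = softmax(np.dot(np.transpose(W), x))

# shortcut as an outer product: entry [j][i] is (S[i] - y[i]) * x[j],
# so the result has the same layout as W (inputs x classes)
grad_W = np.outer(x, S - y)

# finite-difference estimate for every weight (eps is an assumed step size)
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.shape[0]):
    for i in range(W.shape[1]):
        step = np.zeros_like(W)
        step[j][i] = eps
        numeric[j][i] = (loss(W + step) - loss(W - step)) / (2 * eps)

print(np.allclose(grad_W, numeric, atol=1e-6))

# one gradient descent step; the learning rate is an assumed value
learning_rate = 0.5
W_new = W - learning_rate * grad_W
print(loss(W), loss(W_new))
```

Note that the flat vector returned by xe_dir_shortcut is ordered class by class (index i*W.shape[0] + j), so under this layout it corresponds to `grad_W.T.flatten()`.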

Category: Transhumanism

The Softmax Function Derivative (Part 2)

On Machine Intelligence - 14 June, 2020 - 18:58

In a previous post, I showed how to calculate the derivative of the Softmax function. This function is widely used in Artificial Neural Networks, typically in the final layer, in order to estimate the probability that the network’s input is in one of a number of classes.

In this post, I’ll show how to calculate the derivative of the whole Softmax Layer rather than just the function itself.

The Python code is based on the excellent article by Eli Bendersky.

The Softmax Layer

A Softmax Layer in an Artificial Neural Network is typically composed of two functions. The first is the usual sum of all the weighted inputs to the layer. The output of this is then fed into the Softmax function which will output the probability distribution across the classes we are trying to predict. Here’s an example with three inputs and five classes:

For a given output zi, the calculation is very straightforward:

$$z_i = \sum_{j} w_{ij} x_j = w_{i1} x_1 + w_{i2} x_2 + w_{i3} x_3$$

We simply multiply each input to the node by its corresponding weight. Expressing this in vector notation gives us the familiar:

The vector w is two dimensional so it’s actually a matrix and we can visualise the formula for our example as follows:

I’ve already covered the Softmax Function itself in the previous post, so I’ll just repeat it here for completeness:

$$S(z_i) = \frac{e^{z_i}}{\sum_{k} e^{z_k}}$$

Here’s the Python code for that:

    import numpy as np

    # input vector
    x = np.array([0.1,0.5,0.4])

    # using some hard coded values for the weights
    # rather than random numbers to illustrate how
    # it works
    W = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                  [0.6, 0.7, 0.8, 0.9, 0.1],
                  [0.11, 0.12, 0.13, 0.14, 0.15]])

    # Softmax function
    def softmax(Z):
        eZ = np.exp(Z)
        sm = eZ / np.sum(eZ)
        return sm

    # weighted sum of the inputs: one value per class
    Z = np.dot(np.transpose(W), x)
    h = softmax(Z)
    print(h)

Which should give us the output h (the hypothesis):

    [0.19091352 0.20353145 0.21698333 0.23132428 0.15724743]

Calculating the Derivative

The Softmax layer is a combination of two functions, the summation followed by the Softmax function itself. Mathematically, this is usually written as:

$$h = S(Z(x))$$

The next thing to note is that we will be trying to calculate the change in the hypothesis h with respect to changes in the weights, not the inputs. The overall derivative of the layer that we are looking for is:

$$\frac{\partial S}{\partial W}$$

We can use the chain rule of differentiation to calculate the derivative of the layer as follows:

$$\frac{\partial S}{\partial W} = \frac{\partial S}{\partial Z} \cdot \frac{\partial Z}{\partial W}$$

In the previous post, I showed how to work out dS/dZ and just for completeness, here is a short Python function to carry out the calculation:

    def sm_dir(S):
        S_vector = S.reshape(S.shape[0],1)
        S_matrix = np.tile(S_vector,S.shape[0])
        S_dir = np.diag(S) - (S_matrix * np.transpose(S_matrix))
        return S_dir

    DS = sm_dir(h)
    print(DS)

The output of that function is a matrix as follows:

    [[ 0.154465 -0.038856 -0.041425 -0.044162 -0.030020]
     [-0.038856  0.162106 -0.044162 -0.047081 -0.032004]
     [-0.041425 -0.044162  0.1699015 -0.050193 -0.034120]
     [-0.044162 -0.047081 -0.050193  0.177813 -0.036375]
     [-0.030020 -0.032004 -0.034120 -0.036375  0.132520]]

Derivative of Z

Let’s next look at the derivative of the function Z() with respect to W, dZ/dW. We are trying to find the change in each of the elements of Z(), zk when each of the weights wij are changed.

So right away, we are going to need a matrix to hold all of those values. Let’s assume that the output vector of Z() has K elements. There are (i × j) individual weights in W. Therefore, our matrix of derivatives is going to be of dimensions (K, (i × j)). Each of the elements of the matrix will be a partial derivative of the output zk with respect to the particular weight wij:

$$\frac{\partial z_k}{\partial w_{ij}}$$

Taking one of those elements, using our example above, we can see how to work out the derivative:

$$z_1 = w_{11} x_1 + w_{12} x_2 + w_{13} x_3$$

None of the other weights are used in z1. The partial derivative of z1 with respect to w11 is x1. Likewise, the partial derivative of z1 with respect to w12 is x2, and with respect to w13 is x3. The derivative of z1 with respect to the rest of the weights is 0.

This makes the whole matrix rather simple to derive, since it is mostly zeros. Where the elements are not zero (i.e. where i = k), then the value is xj. Here is the corresponding Python code to calculate that matrix.

    # derivative of the Summation Function Z w.r.t weight matrix W given inputs x
    def z_dir(Z, W, x):
        dir_matrix = np.zeros((W.shape[0] * W.shape[1], Z.shape[0]))
        for k in range(0, Z.shape[0]):
            for i in range(0, W.shape[1]):
                for j in range(0, W.shape[0]):
                    if i == k:
                        dir_matrix[(i*W.shape[0]) + j][k] = x[j]
        return dir_matrix

If we use the example above, then the derivative matrix will look like this:

    DZ = z_dir(Z, W, x)
    print(DZ)

    [[0.1 0.  0.  0.  0. ]
     [0.5 0.  0.  0.  0. ]
     [0.4 0.  0.  0.  0. ]
     [0.  0.1 0.  0.  0. ]
     [0.  0.5 0.  0.  0. ]
     [0.  0.4 0.  0.  0. ]
     [0.  0.  0.1 0.  0. ]
     [0.  0.  0.5 0.  0. ]
     [0.  0.  0.4 0.  0. ]
     [0.  0.  0.  0.1 0. ]
     [0.  0.  0.  0.5 0. ]
     [0.  0.  0.  0.4 0. ]
     [0.  0.  0.  0.  0.1]
     [0.  0.  0.  0.  0.5]
     [0.  0.  0.  0.  0.4]]
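
Because the matrix is just the input vector repeated along a block diagonal, it can also be built without loops. The following is a minimal sketch using a Kronecker product, not the original author's method; the names `K` and `DZ_kron` are assumptions introduced for illustration.

```python
import numpy as np

x = np.array([0.1, 0.5, 0.4])
K = 5  # number of outputs (classes) in the example

# block (i, k) of the Kronecker product is eye[i][k] * x as a column,
# so row (i * len(x) + j), column k holds x[j] when i == k and 0 otherwise
DZ_kron = np.kron(np.eye(K), x.reshape(-1, 1))

print(DZ_kron.shape)
print(DZ_kron[:6])
```

This gives the same (15, 5) layout as the loop version, with one copy of x per output.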

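Going back to the formula for the derivative of the Softmax Layer:

$$\frac{\partial S}{\partial W} = \frac{\partial S}{\partial Z} \cdot \frac{\partial Z}{\partial W}$$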

We now just take the dot product of both of the derivative matrices to get the derivative for the whole layer:

    DL = np.dot(DS, np.transpose(DZ))
    print(DL)

    [[ 0.01544  0.07723  0.06178 -0.00388 -0.01942
      -0.01554 -0.00414 -0.02071 -0.01657 -0.00441
      -0.02208 -0.01766 -0.00300 -0.01501 -0.01200]
     [-0.00388 -0.01942 -0.01554  0.01621  0.08105
       0.06484 -0.00441 -0.02208 -0.01766 -0.00470
      -0.02354 -0.01883 -0.00320 -0.01600 -0.01280]
     [-0.00414 -0.02071 -0.01657 -0.00441 -0.02208
      -0.01766  0.01699  0.08495  0.06796 -0.00501
      -0.02509 -0.02007 -0.00341 -0.01706 -0.01364]
     [-0.00441 -0.02208 -0.01766 -0.00470 -0.02354
      -0.01883 -0.00501 -0.02509 -0.02007  0.01778
       0.08890  0.07112 -0.00363 -0.01818 -0.01455]
     [-0.00300 -0.01501 -0.01200 -0.00320 -0.01600
      -0.01280 -0.00341 -0.01706 -0.01364 -0.00363
      -0.01818 -0.01455  0.01325  0.06626  0.05300]]

Shortcut!

While it is instructive to see the matrices being derived explicitly, it is possible to manipulate the formulas to make it easier. Starting with one of the entries in the matrix DL, it looks like this:

$$\frac{\partial S_t}{\partial w_{ij}} = \sum_{k} \frac{\partial S_t}{\partial z_k} \cdot \frac{\partial z_k}{\partial w_{ij}}$$

Since the matrix dZ/dW is mostly zeros, we can try to simplify it. dZ/dW is non-zero when i = k, and then it is equal to xj as we worked out above. So we can simplify the non-zero entries to:

$$\frac{\partial S_t}{\partial w_{ij}} = \frac{\partial S_t}{\partial z_i} \cdot x_j$$

In the previous post, we established that when the indices are the same (i = t), then:

$$\frac{\partial S_t}{\partial w_{ij}} = S_t (1 - S_i) x_j$$

When the indices are not the same, we use:

$$\frac{\partial S_t}{\partial w_{ij}} = -S_t S_i x_j$$

What these two formulas show is that it is possible to calculate each of the entries in the derivative matrix by using only the input values X and the Softmax output S, skipping the matrix dot product altogether.

Here is the Python code corresponding to that:

    def l_dir_shortcut(W, S, x):
        dir_matrix = np.zeros((W.shape[0] * W.shape[1], W.shape[1]))
        for t in range(0, W.shape[1]):
            for i in range(0, W.shape[1]):
                for j in range(0, W.shape[0]):
                    dir_matrix[(i*W.shape[0]) + j][t] = S[t] * ((i==t) - S[i]) * x[j]
        return dir_matrix

    DL_shortcut = np.transpose(l_dir_shortcut(W, h, x))

To verify that, we can cross-check it with the matrix we derived from first principles:

    print(DL_shortcut)

    [[ 0.01544  0.07723  0.06178 -0.00388 -0.01942
      -0.01554 -0.00414 -0.02071 -0.01657 -0.00441
      -0.02208 -0.01766 -0.00300 -0.01501 -0.01200]
     [-0.00388 -0.01942 -0.01554  0.01621  0.08105
       0.06484 -0.00441 -0.02208 -0.01766 -0.00470
      -0.02354 -0.01883 -0.00320 -0.01600 -0.01280]
     [-0.00414 -0.02071 -0.01657 -0.00441 -0.02208
      -0.01766  0.01699  0.08495  0.06796 -0.00501
      -0.02509 -0.02007 -0.00341 -0.01706 -0.01364]
     [-0.00441 -0.02208 -0.01766 -0.00470 -0.02354
      -0.01883 -0.00501 -0.02509 -0.02007  0.01778
       0.08890  0.07112 -0.00363 -0.01818 -0.01455]
     [-0.00300 -0.01501 -0.01200 -0.00320 -0.01600
      -0.01280 -0.00341 -0.01706 -0.01364 -0.00363
      -0.01818 -0.01455  0.01325  0.06626  0.05300]]
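
The same matrix can also be produced without any explicit loops. This is a minimal sketch under the layout used in this post (columns ordered as i * 3 + j), not the original author's code; the name `DL_vectorised` is an assumption, and S is hard-coded to the example Softmax output.

```python
import numpy as np

x = np.array([0.1, 0.5, 0.4])
S = np.array([0.19091352, 0.20353145, 0.21698333, 0.23132428, 0.15724743])

# Softmax Jacobian dS/dZ: entry [t][i] is S[t] * ((i == t) - S[i])
DS = np.diag(S) - np.outer(S, S)

# layer derivative: entry [t][i * len(x) + j] is DS[t][i] * x[j]
DL_vectorised = np.kron(DS, x.reshape(1, -1))

print(DL_vectorised.shape)
print(np.round(DL_vectorised[0, :3], 5))
```

The Kronecker product simply scales a copy of x by each entry of the Softmax Jacobian, which is what the two shortcut formulas describe.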

Lastly, it’s worth noting that in order to actually modify each of the weights, we need to sum up the individual adjustments in each of the corresponding columns.

Category: Transhumanism

Make Music A Full Body Experience With A “Vibro-Tactile” Suit

Futurism - Enhanced Humans - 27 September, 2018 - 17:09

Tired: Listening to music.
Wired: Feeling the music.

A mind-bending new suit straps onto your torso, ankles and wrists, then uses actuators to translate audio into vivid vibration. The result: a new way for everyone to experience music, according to its creators. That’s especially exciting for people who have trouble hearing.


The Music: Not Impossible suit was created by design firm Not Impossible Labs and electronics manufacturing company Avnet. The suit can create sensations to go with pre-recorded music, or a “Vibrotactile DJ” can adjust the sensations in real time during a live music event.

Billboard writer Andy Hermann tried the suit out, and it sounds like a trip.

“Sure enough, a pulse timed to a kickdrum throbs into my ankles and up through my legs,” he wrote. “Gradually, [the DJ] brings in other elements: the tap of a woodblock in my wrists, a bass line massaging my lower back, a harp tickling a melody across my chest.”


To show the suit off, Not Impossible and Avnet organized a performance this past weekend by the band Greta Van Fleet at the Life is Beautiful Festival in Las Vegas. The company allowed attendees to don the suits. Mandy Harvey, a deaf musician who stole the show on America’s Got Talent last year, talked about what the performance meant to her in a video Avnet posted to Facebook.

“It was an unbelievable experience to have an entire audience group who are all experiencing the same thing at the same time,” she said. “For being a deaf person, showing up at a concert, that never happens. You’re always excluded.”

READ MORE: Not Impossible Labs, Zappos Hope to Make Concerts More Accessible for the Deaf — and Cooler for Everyone [Billboard]

More on accessible design: New Tech Allows Deaf People To Sense Sounds

The post Make Music A Full Body Experience With A “Vibro-Tactile” Suit appeared first on Futurism.

Category: Transhumanism

“Synthetic Skin” Could Give Prosthesis Users a Superhuman Sense of Touch

Futurism - Enhanced Humans - 20 September, 2018 - 21:37

Today’s prosthetics can give people with missing limbs the ability to do almost anything — run marathons, climb mountains, you name it. But when it comes to letting those people feel what they could with a natural limb, the devices, however mechanically sophisticated, invariably fall short.

Now researchers have created a “synthetic skin” with a sense of touch that not only matches the sensitivity of natural skin, but in some cases even exceeds it. Now the only challenge is getting that information back into the wearer’s nervous system.


When something presses against your skin, your nerves receive and transmit that pressure to the brain in the form of electrical signals.

To mimic that biological process, the researchers suspended a flexible polymer, dusted with magnetic particles, over a magnetic sensor. The effect is like a drum: Applying even the tiniest amount of pressure to the membrane causes the magnetic particles to move closer to the sensors, and they transmit this movement electronically.

The research, which could open the door to super-sensitive prosthetics, was published Wednesday in the journal Science Robotics.


Tests show that the skin can sense extremely subtle pressure, such as a blowing breeze, dripping water, or crawling ants. In some cases, the synthetic skin responded to pressures so gentle that natural human skin wouldn’t be able to detect them.

While the sensing ability of this synthetic skin is remarkable, the team’s research doesn’t address how to transmit the signals to the human brain. Other scientists are working on that, though, so eventually this synthetic skin could give prosthetic wearers the ability to feel forces even their biological-limbed friends can’t detect.

READ MORE: A Skin-Inspired Tactile Sensor for Smart Prosthetics [Science Robotics]

More on synthetic skin: Electronic Skin Lets Amputees Feel Pain Through Their Prosthetics

The post “Synthetic Skin” Could Give Prosthesis Users a Superhuman Sense of Touch appeared first on Futurism.

Category: Transhumanism

People Are Zapping Their Brains to Boost Creativity. Experts Have Concerns.

Futurism - Enhanced Humans - 19 September, 2018 - 21:56

There’s a gadget that some say can help alleviate depression and enhance creativity. All you have to do is place a pair of electrodes on your scalp and the device will deliver electrical current to your brain. It’s readily available on Amazon or you can even make your own.

But in a new paper published this week in the Creativity Research Journal, psychologists at Georgetown University warned that the practice is spreading before we have a good understanding of its health effects, especially since consumers are already buying and building unregulated devices to shock them. They also cautioned that the technique, which scientists call transcranial electrical stimulation (tES), could have adverse effects on the brains of young people.

“There are multiple potential concerns with DIY-ers self-administering electric current to their brains, but this use of tES may be inevitable,” said co-author Adam Green in a press release. “And, certainly, anytime there is risk of harm with a technology, the scariest risks are those associated with kids and the developing brain.”


Yes, there’s evidence that tES can help patients with depression, anxiety, Parkinson’s disease, and other serious conditions, the Georgetown researchers acknowledge.

But that’s only when it’s administered by a trained health care provider. When administering tES at home, people might ignore safety directions, they wrote, or their home-brewed devices could deliver unsafe amounts of current. And because it’s not yet clear what the effects of tES might be on the still-developing brains of young people, the psychologists advise teachers and parents to resist the temptation to use the devices to encourage creativity among children.

The takeaway: tES is likely here to stay, and it may provide real benefits. But for everyone’s sake, consumer-oriented tES devices should be regulated to protect users.

READ MORE: Use of electrical brain stimulation to foster creativity has sweeping implications [Eurekalert]

More on transcranial electrical stimulation: DARPA’s New Brain Device Increases Learning Speed by 40%

The post People Are Zapping Their Brains to Boost Creativity. Experts Have Concerns. appeared first on Futurism.

Category: Transhumanism

Military Pilots Can Control Three Jets at Once via a Neural Implant

Futurism - Enhanced Humans - 19 September, 2018 - 16:25

The military is making it easier than ever for soldiers to distance themselves from the consequences of war. When drone warfare emerged, pilots could, for the first time, sit in an office in the U.S. and drop bombs in the Middle East.

Now, one pilot can do it all, just using their mind — no hands required.

Earlier this month, DARPA, the military’s research division, unveiled a project that it had been working on since 2015: technology that grants one person the ability to pilot multiple planes and drones with their mind.

“As of today, signals from the brain can be used to command and control … not just one aircraft but three simultaneous types of aircraft,” Justin Sanchez, director of DARPA’s Biological Technologies Office, said, according to Defense One.


Sanchez may have unveiled this research effort at a “Trajectory of Neurotechnology” session at DARPA’s 60th anniversary event, but his team has been making steady progress for years. Back in 2016, a volunteer equipped with a brain-computer interface (BCI) was able to pilot an aircraft in a flight simulator while keeping two other planes in formation — all using just his thoughts, a spokesperson from DARPA’s Biological Technologies Office told Futurism.

In 2017, Copeland was able to steer a plane through another simulation, this time receiving haptic feedback — if the plane needed to be steered in a certain direction, Copeland’s neural implant would create a tingling sensation in his hands.


There’s a catch. The DARPA spokesperson told Futurism that because this BCI makes use of electrodes implanted in and on the brain’s sensory and motor cortices, experimentation has been limited to volunteers with varying degrees of paralysis. That is: the people steering these simulated planes already had brain electrodes, or at least already had reason to undergo surgery.

To try and figure out how to make this technology more accessible and not require surgical placement of a metal probe into people’s brains, DARPA recently launched the NExt-Generation Nonsurgical Neurotechnology (N3) program. The plan is to make a device with similar capabilities, but it’ll look more like an EEG cap that the pilot can take off once a mission is done.

“The envisioned N3 system would be a tool that the user could wield for the duration of a task or mission, then put aside,” said Al Emondi, head of N3, according to the spokesperson. “I don’t like comparisons to a joystick or keyboard because they don’t reflect the full potential of N3 technology, but they’re useful for conveying the basic notion of an interface with computers.”

READ MORE: It’s Now Possible To Telepathically Communicate with a Drone Swarm [Defense One]

More on DARPA research: DARPA Is Funding Research Into AI That Can Explain What It’s “Thinking”

The post Military Pilots Can Control Three Jets at Once via a Neural Implant appeared first on Futurism.

Category: Transhumanism

Lab-Grown Bladders Can Save People From a Lifetime of Dialysis

Futurism - Enhanced Humans - 12 September, 2018 - 22:54

Today, about 10 people on Earth have bladders they weren’t born with. No, they didn’t receive bladder transplants — doctors grew these folks new bladders using the recipients’ own cells.

On Tuesday, the BBC published a report on the still-nascent procedure of transplanting lab-grown bladders. In it, the publication talks to Luke Massella, who underwent the procedure more than a decade ago. Massella was born with spina bifida, which carries with it a risk of damage to the bladder and urinary tract. Now, he lives a normal life, he told the BBC.

“I was kind of facing the possibility I might have to do dialysis [blood purification via machine] for the rest of my life,” he said. “I wouldn’t be able to play sports, and have the normal kid life with my brother.”

All that changed after Anthony Atala, a surgeon at Boston Children’s Hospital, decided he was going to grow a new bladder for Massella.


To do that, Atala first removed a small piece of Massella’s own bladder. He then removed cells from this portion of bladder and multiplied them in a petri dish. Once he had enough cells, he coated a scaffold with the cells and placed the whole thing in a temperature controlled, high oxygen environment. After a few weeks, the lab-created bladder was ready for transplantation into Massella.

“So it was pretty much like getting a bladder transplant, but from my own cells, so you don’t have to deal with rejection,” said Massella.

The number of people with lab-grown bladders might still be low enough to count on your fingers, but researchers are making huge advances in growing everything from organs to skin in the lab. Eventually, we might reach a point when we can replace any body part we need to with a perfect biological match that we built ourselves.

READ MORE: “A New Bladder Made From My Cells Gave Me My Life Back” [BBC]

More on growing organs: The FDA Wants to Expedite Approval of Regenerative Organ Therapies

The post Lab-Grown Bladders Can Save People From a Lifetime of Dialysis appeared first on Futurism.

Category: Transhumanism