import pandas as pd
from scipy.spatial.distance import cosine
from openai import OpenAI
pd.set_option('display.max_colwidth', None)
Warning: If you’re reading this post more than 3 hours after it was posted, some (maybe all) of the information it contains will probably be out of date.
Hello friends, welcome to the new rapidly evolving world of AI. Even the wacky image that you see at the top of this blog was created using AI (specifically, using dall-e-3), just for this blog post!
Like me, I’m sure you’re dwelling on the question of whether AI is going to take all of our jobs. I think, for now, I’ve settled on the opinion that for those of us who adapt and learn to work with AI, our jobs are pretty safe. However, there is no question that AI is going to fundamentally change our jobs, hopefully for the better. In my day-to-day I have already found that GitHub Copilot has boosted my coding efficiency, chatGPT has boosted my efficiency for mundane writing tasks, and the various other ways that AI has snuck into my workday are probably helping me out in other ways too.
And, as you’ll see in this blog post, with the tools provided by OpenAI, NLP tasks like generating synthetic text and labeling large amounts of text data are now fairly straightforward too.
In this blog post, I will walk through an example of generating a small collection of synthetic doctor’s notes using chatGPT and then using OpenAI’s text embedding models to automatically label these doctor’s notes in terms of whether they correspond to a patient experiencing a chronic or an acute event. Fortunately, the documentation provided on the OpenAI website is very good, and that will certainly be the best point of reference.
Note that, for now, if you want to use these tools effectively, you’ll want to be fairly proficient in Python (if you’re an R user who is interested in learning Python, check out my “An introduction to Python for R Users” blog post).
Getting set up
As usual, the first thing you’ll want to do is to import the libraries you’ll need.
Then you’ll need to set up your OpenAI API key. An API key is literally a jumble of letters that OpenAI uses to identify you so that they can track your usage. After you create an OpenAI account, see the OpenAI API keys website for information on creating an API key.
Once you have created an API key, and you have saved it somewhere on your computer, say in a file called “api_key.txt”, you will need to use it to connect to the OpenAI client.
To do this, either create a local variable called your_key or read in your key from your text file using code like the following:
with open('api_key.txt', 'r') as file:
    your_key = file.read().strip()
Then you can set up your OpenAI client which is how you will connect to the OpenAI API:
# Define a local variable called `your_key` that contains your OpenAI key.
# Try really hard to avoid writing your key in your notebook
# And when you do, try even harder to avoid uploading it to GitHub
client = OpenAI(api_key=your_key)
Note that it is really, really important that you do not write your API key in plain text in your notebook file, especially if you are going to upload your notebook publicly to GitHub. This would allow someone else to pretend to be you and use your OpenAI account.
Instead, you will want to only ever define your key in a local variable (that is not saved in your code) or load it in from a file that should not be uploaded to GitHub!
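One common way to keep the key out of your notebook is to read it from an environment variable, falling back to a local file only if the variable is not set. The sketch below is just one way to do that: the helper name load_api_key(), the variable name OPENAI_API_KEY, and the file name are assumptions made for illustration and are not part of the code in this post.

```python
import os
from pathlib import Path


def load_api_key(env_var="OPENAI_API_KEY", key_file="api_key.txt"):
    """Return the API key from an environment variable, else from a local file.

    `load_api_key`, `env_var`, and `key_file` are illustrative names,
    not part of the OpenAI library.
    """
    key = os.environ.get(env_var)
    if key and key.strip():
        return key.strip()
    path = Path(key_file)
    if path.exists():
        return path.read_text().strip()
    raise RuntimeError(f"No API key found in {env_var} or {key_file}")


# Usage (commented out so the sketch has no side effects):
# your_key = load_api_key()
# client = OpenAI(api_key=your_key)
```

If you do keep the key in a file, adding that file's name to your .gitignore helps make sure it never gets uploaded to GitHub by accident.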
Once you’re set up, let’s generate some fake doctor’s notes to analyze!
Creating synthetic doctor’s notes with chatGPT
The first thing I did was use chatGPT-4 to create a collection of synthetic doctor’s notes.
Note that at the time of writing, you have to pay to access chatGPT-4 through the API. If you don’t want to pay OpenAI, you can replace model="gpt-4" with model="gpt-3.5-turbo", which is currently free (but is also less good).
The following code uses chatGPT-4 to generate 50 doctor’s notes using the prompt:
“Provide a collection of 50 doctors notes that resemble the kind that you would enter into an EHR for your patients on a day-to-day basis. Each note should have around 3 sentences. Each note should appear on a new line. Do not add any numbers or superfluous text.”
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a doctor working in a large hospital."},
        {"role": "user",
         "content": """Provide a collection of 50 doctors notes that
                       resemble the kind that you would enter into an
                       EHR for your patients on a day-to-day basis.
                       Each note should have around 3 sentences.
                       Each note should appear on a new line.
                       Do not add any numbers or superfluous text."""}
    ]
)
The text created by ChatGPT can be extracted using completion.choices[0].message.content. But since this is just one big string value, I want to use the .split method to separate each note into a list element.
# extract the notes and place them in a list, where each list element is detected by the presence of a line break ('\n')
notes_list = completion.choices[0].message.content.split('\n')
Since the output provided tended to add additional blank lines between the notes (even when I asked it not to), I used the following code to remove any entries in my notes_list that contain blank strings:
# remove any empty notes
# may or may not be required for you
notes_list = [note for note in notes_list if note != '']
# place the notes in a DataFrame
notes_df = pd.DataFrame({'notes': notes_list})
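If the model also adds stray whitespace or list numbering despite the prompt, a slightly more defensive cleanup may help. The following is only a sketch: clean_notes() is a hypothetical helper that is not used elsewhere in this post, and the example string is made up.

```python
import re


def clean_notes(raw_text):
    """Split a block of text into notes, dropping blanks and leading numbering."""
    notes = []
    for line in raw_text.split("\n"):
        line = line.strip()
        # Remove a leading "1." / "2)" / "-" style prefix, if present
        line = re.sub(r"^(\d+[.)]|[-*•])\s+", "", line)
        if line:
            notes.append(line)
    return notes


# Made-up example text, not real model output
example = "1. Patient reports cough.\n\n  2) Follow-up for asthma.  \n- Knee pain after fall.\n"
print(clean_notes(example))
```

With the real output, this could be called as clean_notes(completion.choices[0].message.content) in place of the split-and-filter steps above.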
Let’s take a look at the notes it created:
# look at the first 5
notes_df.head(5)
| | notes |
---|---|
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. |
However, since chatGPT returns a different collection of notes every time, you can load the notes I created above using the following code (if you prefer to follow along with my doctor’s notes, that is):
notes_df = pd.read_csv("https://raw.githubusercontent.com/rlbarter/personal-website-quarto/main/blog/data/doctors_notes.csv", index_col=0)
If you just loaded the notes using the URL above, take a look at the first 5 notes.
# look at the first 5
notes_df.head(5)
| | notes |
---|---|
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. |
Computing text embeddings
My goal in the rest of this post will be to identify whether each note corresponds to a patient with a chronic condition (which I define as a condition that they have been experiencing for more than one month) or an acute condition (e.g., a new condition).
To do this, I will use something called text embeddings. The idea is that I will use OpenAI’s pre-trained text embedding models to embed each note into a numeric 1,536-dimensional space that somehow approximately respects semantic distance.
The equivalent of this in a two-dimensional world would be taking each individual doctor’s note and placing it on a scatterplot so that notes that we might intuitively consider to be more “similar” to one another are closer together in the plot, and notes that we might consider to be more “different” from one another are further apart. Unfortunately, it is unlikely that we would be able to come up with any two quantifiable values that we could use to define the two axes of our scatterplot, such that when we place each note in the scatterplot we achieve our desired property that “similar” notes are closer together. But it turns out that once your space has, say, 1,536 dimensions, this task somehow becomes possible (although what each dimension/axis represents is not necessarily going to be meaningful to us).
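To make the scatterplot intuition concrete, here is a toy sketch with made-up two-dimensional “embeddings”. The coordinates are invented for illustration and do not come from any model; the point is only that notes pointing in a similar direction get a cosine similarity close to 1, while dissimilar notes get a much smaller value.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented 2-D coordinates, purely for illustration
toy_embeddings = {
    "ongoing hypertension follow-up": (0.9, 0.1),
    "long-standing diabetes check": (0.8, 0.3),
    "sprained ankle this morning": (0.1, 0.9),
}

a = toy_embeddings["ongoing hypertension follow-up"]
b = toy_embeddings["long-standing diabetes check"]
c = toy_embeddings["sprained ankle this morning"]

print(cosine_similarity(a, b))  # close to 1: similar direction
print(cosine_similarity(a, c))  # much smaller: different direction
```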
Fortunately, OpenAI has already trained some general text embedding models that we can use to embed each of the doctor’s notes in such a 1,536-dimensional space. In this post, I will be using the “text-embedding-3-small” OpenAI model to compute the text embeddings.
After computing the embeddings, our task becomes determining how close each doctor’s note is to the embedding of the following text: “Patient presenting with ongoing chronic condition defined as ongoing for more than one month”, which I will call the target. The closer a note’s embedding is to the target’s embedding, the more likely the note corresponds to a patient who is presenting with a chronic condition!
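The comparison to the target can be sketched as follows. The vectors here are invented three-dimensional stand-ins for real embeddings, and rank_by_distance() is a hypothetical helper. The distance used is cosine distance (one minus cosine similarity), which is the quantity computed by the cosine function from scipy.spatial.distance imported at the top of this post; it is written out by hand here so the sketch has no dependencies.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_distance(note_embeddings, target_embedding):
    """Return (note, distance) pairs sorted from closest to furthest."""
    scored = [
        (note, cosine_distance(emb, target_embedding))
        for note, emb in note_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Invented stand-in vectors, not real model output
target = (1.0, 0.2, 0.0)
notes = {
    "note A": (0.9, 0.3, 0.1),
    "note B": (0.0, 0.2, 1.0),
    "note C": (0.5, 0.5, 0.5),
}

for note, dist in rank_by_distance(notes, target):
    print(f"{note}: {dist:.3f}")
```

Notes at the top of the ranking are the ones most likely to describe a chronic condition under this approach; where to draw the cutoff between “chronic” and “acute” is a separate judgment call.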
I hear you. That sounds crazy, right? But it works… eerily well! Let me show you.
First, I need to compute the embeddings for each of the chatGPT-generated doctor’s notes. I can do that using the get_embedding() function below, which I literally just copy-and-pasted from the OpenAI documentation:
def get_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input = [text], model=model).data[0].embedding
This function basically takes a string, text, and uses the “text-embedding-3-small” model to create a list of 1,536 numbers corresponding to the 1,536-dimensional embedding of the text entry.
For example, the following code creates an embedding of the text “hello, how are you”:
get_embedding('hello, how are you')
[0.020681990310549736,
-0.03974681720137596,
-0.000452458014478907,
0.029285797849297523,
-0.013973292894661427,
-0.059632860124111176,
0.0007248803740367293,
0.02625361829996109,
0.0012578805908560753,
-0.039999499917030334,
0.004143978469073772,
-0.032242171466350555,
-0.011370671913027763,
-0.04505313187837601,
-0.005956968758255243,
0.06473702937364578,
-0.02131369337439537,
0.05190080404281616,
-0.0056916531175374985,
0.02920999377965927,
0.058015696704387665,
0.01499665342271328,
0.009684022516012192,
0.04065646976232529,
0.0349964015185833,
0.025268161669373512,
0.023827875033020973,
0.01146542839705944,
0.02449748106300831,
-0.017624542117118835,
0.05356850102543831,
-0.03800331428647041,
0.012665665708482265,
0.013291052542626858,
0.002278872299939394,
0.013834318146109581,
-0.0029911184683442116,
0.009267098270356655,
-0.001953544793650508,
-0.060896266251802444,
-0.006733964662998915,
-0.01929224096238613,
0.0412629060447216,
0.016411669552326202,
0.01106745470315218,
0.0036165055353194475,
0.003145886119455099,
-0.018572097644209862,
0.04836326092481613,
0.020088188350200653,
0.02371416985988617,
-0.01297520101070404,
0.014529192820191383,
0.13715557754039764,
0.04500259459018707,
-0.04282953217625618,
0.036563027650117874,
0.00012782136036548764,
0.005268411710858345,
0.03388460353016853,
0.002237811451777816,
-0.04341069981455803,
0.01293729804456234,
0.027264345437288284,
-0.0006838195840828121,
-0.030422866344451904,
-0.013758513145148754,
0.012728836387395859,
-0.016828594729304314,
3.400344212423079e-05,
0.02913418971002102,
0.04070700705051422,
-0.008610125631093979,
-0.018572097644209862,
-0.03648722544312477,
0.016007380560040474,
-0.0004595646751113236,
-0.032823339104652405,
0.01567889377474785,
0.02673371322453022,
-0.0628671869635582,
-0.023524656891822815,
-0.03421308845281601,
-0.006399161648005247,
-0.032444316893815994,
0.010827407240867615,
-0.014794507995247841,
0.01747293397784233,
0.005975920241326094,
0.004292428959161043,
-0.02663264237344265,
0.012634080834686756,
-0.02045457623898983,
0.005565312225371599,
0.022488662973046303,
-0.04763048142194748,
4.67115496576298e-05,
0.02693585865199566,
0.036360882222652435,
-0.00024103456235025078,
0.05119329318404198,
-0.02443431131541729,
-0.0230319295078516,
-0.002187275094911456,
0.05336635559797287,
0.03901404142379761,
0.021717984229326248,
-0.020290333777666092,
-0.014124901965260506,
0.011219063773751259,
-0.10142640024423599,
-0.002395737450569868,
-0.012570910155773163,
0.006550770718604326,
0.04090915247797966,
0.03982262313365936,
0.05958232283592224,
-0.04581117630004883,
0.025470305234193802,
-0.02701166458427906,
0.007056133821606636,
-0.015982111915946007,
0.03939306363463402,
-0.04042905569076538,
-0.061755385249853134,
-0.010941113345324993,
-0.014491289854049683,
0.02185695990920067,
-0.03727053850889206,
-0.006525502540171146,
-0.008875441737473011,
0.023082464933395386,
0.009854583069682121,
0.0023704692721366882,
-0.028376145288348198,
0.013152077794075012,
-0.020745160058140755,
0.03964574262499809,
-0.045659568160772324,
0.014327047392725945,
0.061654310673475266,
-0.04992988705635071,
-0.013632172718644142,
0.01441548578441143,
0.005697970278561115,
-0.06215967610478401,
-0.0039228820241987705,
0.04992988705635071,
0.010953747667372227,
-0.014251242391765118,
0.03524908423423767,
0.012861493974924088,
-0.0747937560081482,
0.023309879004955292,
0.002479438204318285,
0.008066860027611256,
0.030448133125901222,
-0.021086279302835464,
-0.023486755788326263,
0.03287387639284134,
0.013000468723475933,
0.0094060730189085,
-0.03125671669840813,
-0.04914657399058342,
-0.029285797849297523,
0.00018635268497746438,
-0.021490570157766342,
0.04257684946060181,
0.0014047517906874418,
0.017422396689653397,
-0.03153466433286667,
0.03924145549535751,
-0.04664502665400505,
-0.02923526242375374,
0.0100061921402812,
-0.02749175950884819,
0.028805702924728394,
-0.01360690500587225,
-0.05020783469080925,
0.003398567670956254,
0.023865777999162674,
0.02221071347594261,
0.019608093425631523,
-0.015249335207045078,
-0.013291052542626858,
0.016310598701238632,
-0.025167088955640793,
0.010656846687197685,
-0.028199266642332077,
0.017460299655795097,
0.010770553722977638,
0.0555899553000927,
0.004349282011389732,
0.018799511715769768,
0.025697719305753708,
-0.04399186745285988,
0.07878612726926804,
-0.0081489821895957,
-0.04550795629620552,
0.023385683074593544,
-0.008824905380606651,
-0.008212151937186718,
0.00013364487676881254,
-0.02683478593826294,
0.0020340869668871164,
0.007744691334664822,
-0.02515445463359356,
0.06332200765609741,
0.008654344826936722,
-0.0013494776794686913,
0.04816111549735069,
-0.020568283274769783,
0.003951308783143759,
0.04667029157280922,
-0.009046001359820366,
-0.03171154111623764,
0.04404240474104881,
0.03380879759788513,
0.00862907711416483,
0.004924132954329252,
-0.06377683579921722,
0.04295587167143822,
0.08070650696754456,
-0.02221071347594261,
0.026278886944055557,
0.012071863748133183,
0.027087468653917313,
0.005540044046938419,
0.013316321186721325,
-0.003521749982610345,
0.007605716586112976,
0.028780434280633926,
0.025621915236115456,
-0.021440034732222557,
-0.03997423127293587,
0.011484378948807716,
0.015779966488480568,
-0.04282953217625618,
0.019140630960464478,
-0.00490834005177021,
-0.021819056943058968,
-0.026657909154891968,
-0.02341095171868801,
0.017510835081338882,
0.03393514081835747,
-0.005634800065308809,
0.021149450913071632,
-0.0350722074508667,
-0.00835744384676218,
0.011901304125785828,
-0.003910247702151537,
0.01939331367611885,
0.014781873673200607,
-0.041338711977005005,
-0.003625981044024229,
-0.002013556659221649,
-0.008231103420257568,
0.004658817313611507,
0.00835744384676218,
-0.02357519418001175,
-0.028881506994366646,
-0.037523217499256134,
-0.008104762993752956,
-0.03054920583963394,
-0.021149450913071632,
-0.045836444944143295,
-0.015552553348243237,
0.01571679674088955,
0.012002376839518547,
0.02059355191886425,
0.04214729368686676,
-0.0374726839363575,
0.03459211066365242,
0.01058735977858305,
-0.022981392219662666,
0.02759283222258091,
0.02728961408138275,
0.008679613471031189,
0.03075135126709938,
-0.008603808470070362,
-0.0325959287583828,
-0.009791412390768528,
0.003357506822794676,
-0.02835087664425373,
-0.043056946247816086,
0.030877692624926567,
-0.024901771917939186,
-0.02213490940630436,
0.035678643733263016,
0.019191168248653412,
0.01446602214127779,
-0.00023274344857782125,
-0.002964271232485771,
-0.028805702924728394,
0.0007979211513884366,
0.01111799106001854,
0.036739904433488846,
-0.049980420619249344,
0.01627269573509693,
-0.012949932366609573,
0.052456703037023544,
0.004175563808530569,
0.013240516185760498,
-0.01747293397784233,
-0.0157546978443861,
0.04368865117430687,
0.021402131766080856,
-0.0054516056552529335,
0.0010115160839632154,
0.0004481150535866618,
-0.03982262313365936,
0.03297495096921921,
0.05766194313764572,
-0.02741595357656479,
-0.014604996889829636,
-0.010833724401891232,
0.02855302207171917,
0.0026926384307444096,
0.004750414285808802,
-0.034263625741004944,
0.020201895385980606,
-0.0313577875494957,
-0.016866497695446014,
-0.018812146037817,
-0.032444316893815994,
-0.014289145357906818,
0.024825967848300934,
-0.003780748462304473,
-0.04176827147603035,
-0.021490570157766342,
-0.0008843856048770249,
-0.06908315420150757,
0.006279137916862965,
-0.005394752137362957,
0.04414347559213638,
0.008717515505850315,
-0.03236851468682289,
0.033303435891866684,
0.002240970032289624,
0.014920849353075027,
-0.015186164528131485,
-0.0399489626288414,
-0.019936578348279,
-0.051218561828136444,
-0.003907089587301016,
-0.019090095534920692,
0.01955755613744259,
0.002414688700810075,
-0.031281981617212296,
-0.021920129656791687,
0.018989022821187973,
0.01154754962772131,
-0.0017545579466968775,
-0.027643367648124695,
-0.012659348547458649,
-0.0206314530223608,
0.07332820445299149,
-0.0028174000326544046,
-0.013253150507807732,
0.025280794128775597,
0.0007900248165242374,
-0.008420614525675774,
2.5046078008017503e-05,
-0.024067923426628113,
0.04245050996541977,
0.022804515436291695,
-0.06023929640650749,
-0.008572223596274853,
0.018445758149027824,
0.049197107553482056,
0.025204990059137344,
0.0005176024860702455,
0.0021714826580137014,
0.03476898744702339,
-0.06498970836400986,
0.02979116141796112,
0.037220001220703125,
0.004434562288224697,
0.0020909402519464493,
0.011724426411092281,
0.0931384414434433,
0.03727053850889206,
-0.013948025181889534,
-0.025722987949848175,
0.027062200009822845,
0.00029196569812484086,
-0.009791412390768528,
-0.006696062628179789,
-0.03974681720137596,
0.033758264034986496,
0.08611389249563217,
-0.012867811135947704,
0.015514650382101536,
-0.037801168859004974,
0.0338340662419796,
0.007668886799365282,
0.028022389858961105,
0.0259756688028574,
-0.04942452162504196,
-0.04308221489191055,
0.03171154111623764,
-0.055135127156972885,
0.009551364928483963,
-0.004949401132762432,
0.014807142317295074,
0.035880789160728455,
-0.03792750835418701,
-0.011105356737971306,
0.04760521650314331,
0.012166619300842285,
-0.0026942177210003138,
-0.05745979771018028,
0.03643668815493584,
0.060694120824337006,
-0.010220970958471298,
-0.018281513825058937,
-0.027820244431495667,
-0.02971535734832287,
-0.0034522623755037785,
-0.0050504738464951515,
0.025167088955640793,
0.012318228371441364,
0.026607373729348183,
0.02683478593826294,
-0.031004033982753754,
0.019026925787329674,
0.025760889053344727,
-0.04502786323428154,
0.03600712865591049,
-0.06888100504875183,
-0.03355611860752106,
0.0016124245012179017,
0.009835631586611271,
0.014327047392725945,
0.009785095229744911,
-0.017864590510725975,
0.02855302207171917,
0.015640990808606148,
0.013569002039730549,
0.003704944159835577,
0.029563747346401215,
-0.027087468653917313,
-0.037801168859004974,
-0.048060040920972824,
-0.009349219501018524,
-0.01725815422832966,
0.004706195089966059,
0.020151358097791672,
-0.01653801091015339,
-0.012810957618057728,
-0.039797354489564896,
0.03161047026515007,
0.013303686864674091,
0.018243612721562386,
0.00651918537914753,
0.01388485450297594,
-0.027138004079461098,
-0.006377052050083876,
0.017662445083260536,
-0.005874847527593374,
-0.0029579540714621544,
-0.00447878148406744,
-0.034920599311590195,
0.06140163168311119,
0.00862907711416483,
-0.008003690280020237,
-0.012539324350655079,
-0.004687243606895208,
-0.005878006108105183,
-0.011730743572115898,
0.003411201760172844,
0.003979735542088747,
0.044320352375507355,
0.013960658572614193,
-0.03719473257660866,
-0.03140832483768463,
-0.027466490864753723,
0.020543014630675316,
0.013240516185760498,
-0.02327197603881359,
-0.013695343397557735,
0.02025243081152439,
0.0024004753213375807,
0.04816111549735069,
-0.01360690500587225,
0.019519655033946037,
-0.002253604121506214,
0.014832410030066967,
-0.06049197539687157,
0.03178734704852104,
0.018129905685782433,
0.045634299516677856,
0.04022691026329994,
-0.013695343397557735,
-0.03676517307758331,
0.05013203248381615,
-0.03724526986479759,
-0.00802895799279213,
-0.007321449462324381,
-0.019759701564908028,
-0.00988616794347763,
-0.024585921317338943,
-0.000546818773727864,
0.002975326031446457,
0.07534965872764587,
0.019987115636467934,
-0.007908934727311134,
-0.02797185443341732,
0.004289270378649235,
-0.04237470403313637,
0.005915908142924309,
0.006209650542587042,
-0.02289295382797718,
-0.016222158446907997,
-0.007005597464740276,
0.02595040202140808,
0.01771298050880432,
0.03219163790345192,
-0.010239922441542149,
-0.050814270973205566,
0.025647183880209923,
0.010378897190093994,
0.018824780359864235,
-0.013392125256359577,
-0.02563454955816269,
0.006449698004871607,
-0.05948125198483467,
-0.027921317145228386,
-0.01785195618867874,
0.018167806789278984,
0.014693435281515121,
-0.03727053850889206,
-0.02776970900595188,
0.012596177868545055,
-0.00855958927422762,
-0.013569002039730549,
0.055893171578645706,
-0.02739068679511547,
0.006822403520345688,
-0.06468649208545685,
0.014314413070678711,
0.04141451418399811,
-0.003040075534954667,
-0.014377583749592304,
-0.007902617566287518,
0.01255827583372593,
0.01733395829796791,
-0.013720611110329628,
0.014554460532963276,
0.014440753497183323,
0.018951119855046272,
-0.0007438315078616142,
-0.015097726136446,
0.004245051182806492,
0.004570378456264734,
0.006569721736013889,
-0.02741595357656479,
-0.03168627247214317,
0.023688901215791702,
0.03547649830579758,
-0.03229270875453949,
-0.014743971638381481,
-0.005846420768648386,
0.0059853955172002316,
0.006525502540171146,
0.04712511971592903,
-0.0026926384307444096,
-0.0008496419177390635,
-0.004990461748093367,
-0.011692841537296772,
-0.04543215408921242,
0.0009396597160957754,
-0.013177345506846905,
-0.08060543239116669,
0.03340451046824455,
0.007706788834184408,
0.00399868655949831,
0.021187352016568184,
-0.02203383669257164,
0.016121085733175278,
-0.012463520281016827,
-0.03840760514140129,
-0.022147543728351593,
0.0014797666808590293,
-0.03643668815493584,
0.030448133125901222,
0.0022630796302109957,
0.014655533246695995,
0.028856240212917328,
0.05250723659992218,
0.0012602495262399316,
-0.0028237169608473778,
0.029058385640382767,
-0.02749175950884819,
0.015640990808606148,
-0.04328436031937599,
-0.032722268253564835,
-0.02883097156882286,
-0.006446539424359798,
0.011162210255861282,
-0.023082464933395386,
-0.0059032742865383625,
-0.0058053601533174515,
-0.0521029494702816,
0.02055564895272255,
-0.003995527978986502,
-0.019759701564908028,
0.009317634627223015,
-0.024585921317338943,
-0.049879349768161774,
-0.021048378199338913,
-0.003600712865591049,
0.01647484116256237,
-0.04368865117430687,
0.0026452606543898582,
0.019671263173222542,
-0.019683897495269775,
0.013227881863713264,
-0.0014758185716345906,
-0.0006309144082479179,
-0.042399972677230835,
0.009159708395600319,
0.016424303874373436,
0.021528473123908043,
0.03423835709691048,
0.01139594055712223,
-0.018685804679989815,
-0.019279606640338898,
-0.03843287378549576,
0.016904398798942566,
0.01715708151459694,
0.01257722731679678,
0.00628861365839839,
-0.01911536417901516,
-0.008332176133990288,
-0.012387716211378574,
0.0033827750012278557,
-0.02645576372742653,
0.04495205730199814,
-0.03269699960947037,
0.024863870814442635,
0.0015310925664380193,
0.019443849101662636,
-0.023448852822184563,
0.029816430062055588,
0.006396003067493439,
-0.013922756537795067,
0.0100061921402812,
-0.016386402770876884,
-0.01237508188933134,
0.031105106696486473,
-0.026379959657788277,
-0.020025016739964485,
-0.028502484783530235,
0.004845169838517904,
0.021098913624882698,
-0.042197827249765396,
0.019848139956593513,
-0.0049367668107151985,
0.0067402818240225315,
0.014743971638381481,
0.03734634071588516,
0.013126809149980545,
0.007195108570158482,
-0.005631641484797001,
0.019709166139364243,
0.028325608000159264,
-0.012962566688656807,
-0.006165431346744299,
-0.01204659603536129,
-0.013758513145148754,
-0.004668292589485645,
0.05265884846448898,
-0.037017855793237686,
0.036259811371564865,
-0.014301778748631477,
-0.014390217140316963,
-0.0026373642031103373,
-0.0060296147130429745,
-0.023549925535917282,
-0.03661356493830681,
0.018205709755420685,
-0.034162554889917374,
-0.002649998292326927,
0.0009459767607040703,
0.038938235491514206,
0.06498970836400986,
-0.016727522015571594,
-0.02779497765004635,
0.0223370548337698,
-0.018951119855046272,
-0.0023657316341996193,
0.0200123842805624,
0.020429307594895363,
0.029487943276762962,
0.01567889377474785,
0.023284610360860825,
0.030827155336737633,
0.03724526986479759,
-0.01245720311999321,
-0.014339681714773178,
-0.05020783469080925,
0.009159708395600319,
-0.003430152777582407,
0.003888138337060809,
-0.0002218860317952931,
-0.028376145288348198,
0.017346592620015144,
0.023966850712895393,
0.05291152745485306,
-0.017763517796993256,
0.015236700884997845,
-0.0033038121182471514,
-0.005919066723436117,
-0.004374550189822912,
0.019001657143235207,
-0.0017719297902658582,
-0.016790693625807762,
0.005116802640259266,
-0.007984738796949387,
-0.011503330431878567,
-0.0035185914020985365,
-0.03388460353016853,
-0.026986395940184593,
0.005028363782912493,
-0.02523025870323181,
-0.03229270875453949,
-0.021124182268977165,
0.013430027291178703,
-0.013164712116122246,
0.010707383044064045,
-0.0272138100117445,
0.01049892045557499,
0.02409319207072258,
0.020163992419838905,
0.012640397064387798,
-0.009551364928483963,
-0.00644338084384799,
-0.03633561357855797,
-0.029260531067848206,
-0.008142665028572083,
0.015742063522338867,
0.020366137847304344,
0.0010249398183077574,
0.010606310330331326,
0.011225380003452301,
0.007624667603522539,
0.009747193194925785,
-0.03181261569261551,
-0.012804640457034111,
0.004769365303218365,
0.00830690748989582,
-0.020947305485606194,
0.022842416539788246,
-0.011743377894163132,
-0.04894442856311798,
0.01647484116256237,
0.009494511410593987,
0.011143258772790432,
0.014150169678032398,
0.01977233588695526,
0.018963754177093506,
-0.0009965130593627691,
0.006702379789203405,
-0.002929527312517166,
0.011452794075012207,
-0.028856240212917328,
0.004219783004373312,
-0.022867685183882713,
0.03294968232512474,
0.05508458986878395,
-0.022349689155817032,
-0.006557087879627943,
-0.024901771917939186,
0.04502786323428154,
-0.017447665333747864,
-0.021819056943058968,
0.02011345513164997,
-0.040681738406419754,
-0.0010296775726601481,
0.022943489253520966,
0.019671263173222542,
-0.0050346809439361095,
0.021263157948851585,
-0.0057074460200965405,
-0.012090815231204033,
-0.0011962894350290298,
-0.012292960658669472,
0.029159458354115486,
-0.0006060410523787141,
0.02031560055911541,
-0.01542621199041605,
0.015502016991376877,
-0.044118206948041916,
-0.007409888319671154,
0.022589735686779022,
0.007814179174602032,
-0.009412390179932117,
-0.0090586356818676,
0.02271607704460621,
-0.028224535286426544,
-0.004188197664916515,
0.002675266470760107,
-0.015224066562950611,
0.019949212670326233,
-0.01637376844882965,
-0.0326717309653759,
0.011073771864175797,
0.012103448621928692,
-0.02059355191886425,
-0.00830690748989582,
0.014604996889829636,
-0.015565186738967896,
0.024762798100709915,
0.014402851462364197,
-0.03178734704852104,
0.020429307594895363,
0.004017637576907873,
0.037624292075634,
0.00696769542992115,
0.057813551276922226,
0.016310598701238632,
-0.032722268253564835,
0.004052381496876478,
0.0035312254913151264,
0.006816086359322071,
0.00017786415992304683,
-0.030726084485650063,
-0.01891321875154972,
-0.020126089453697205,
0.009715607389807701,
-0.013948025181889534,
-0.011332769878208637,
-0.00497782789170742,
0.0064812833443284035,
0.03570391237735748,
-0.013177345506846905,
-0.002182537456974387,
-0.015552553348243237,
-0.013442661613225937,
0.002740016207098961,
-0.017182350158691406,
0.0411112979054451,
-0.008654344826936722,
-0.01685386337339878,
0.041237637400627136,
0.006301247514784336,
-0.036840979009866714,
0.003932357300072908,
0.009980923496186733,
-0.005675860680639744,
0.02089676819741726,
0.06493917107582092,
0.01785195618867874,
-0.009153391234576702,
0.007460424676537514,
-0.029083652421832085,
-0.014200706034898758,
0.035602837800979614,
-0.018989022821187973,
-0.017750883474946022,
0.022021202370524406,
0.03292441368103027,
-0.005682177841663361,
0.017359226942062378,
-0.01637376844882965,
-0.00912812352180481,
-0.016007380560040474,
-0.016121085733175278,
0.027744440361857414,
-0.02701166458427906,
0.0036607247311621904,
-0.02069462463259697,
-0.024990210309624672,
0.02817399986088276,
0.03666410222649574,
-0.04495205730199814,
0.005634800065308809,
-0.03342977538704872,
-0.020568283274769783,
0.033859334886074066,
-0.0026547361630946398,
0.010113581083714962,
0.014781873673200607,
0.007371985819190741,
-0.004090283531695604,
0.038255997002124786,
0.03350558131933212,
-0.0020656720735132694,
0.016209525987505913,
0.033859334886074066,
-0.0010202019475400448,
-0.010922162793576717,
-0.001951965386979282,
0.013910122215747833,
-0.00596328591927886,
-0.05569102615118027,
-0.0019803920295089483,
-0.021010475233197212,
0.00425452645868063,
-0.0124256182461977,
0.003106404561549425,
-0.03075135126709938,
-0.022905588150024414,
0.003079557092860341,
0.01771298050880432,
0.004879913758486509,
0.01921643689274788,
0.011762328445911407,
-0.006784501019865274,
-0.025520842522382736,
-0.03901404142379761,
0.033202365040779114,
0.03651249408721924,
-0.01949438638985157,
0.056853361427783966,
-0.015363042242825031,
0.015312505885958672,
0.0073530348017811775,
-0.015186164528131485,
-0.004355599172413349,
0.008591175079345703,
0.00958926696330309,
0.011534915305674076,
0.020985208451747894,
-0.01925433799624443,
-0.02549557387828827,
0.026127278804779053,
-0.0447499118745327,
0.01547674834728241,
-0.0005452395416796207,
0.034920599311590195,
-0.0223370548337698,
0.030245987698435783,
0.04232417047023773,
0.015742063522338867,
-0.007100353017449379,
0.018850047141313553,
-0.019418582320213318,
-0.021528473123908043,
-0.018925853073596954,
0.025242893025279045,
0.018951119855046272,
0.031105106696486473,
0.03782643750309944,
0.014213340356945992,
-0.0230319295078516,
0.0040207961574196815,
0.0049367668107151985,
-0.013733245432376862,
-0.010170434601604939,
0.03380879759788513,
0.008660661987960339,
0.02573562227189541,
-0.01134540420025587,
-0.025861961767077446,
0.049980420619249344,
-0.005249460227787495,
0.010890576988458633,
0.03858448192477226,
-0.014743971638381481,
-0.011263282969594002,
-0.022589735686779022,
-0.003392250509932637,
-0.02824980393052101,
0.008869124576449394,
-0.00402395473793149,
0.00573271419852972,
0.0229561235755682,
0.0387108214199543,
-0.022084372118115425,
0.003357506822794676,
0.008534321561455727,
-0.007409888319671154,
0.027163272723555565,
-0.01154754962772131,
0.007270913105458021,
0.016045281663537025,
0.025647183880209923,
-0.014263876713812351,
0.03254539147019386,
0.025242893025279045,
0.03461737930774689,
0.014478656463325024,
-0.036461956799030304,
0.00034664757549762726,
0.0006245973636396229,
0.026102010160684586,
-0.013316321186721325,
-0.012261374853551388,
0.004469305742532015,
-0.011768645606935024,
0.02317090332508087,
0.02327197603881359,
-0.011206429451704025,
-0.01527460291981697,
-0.00888807512819767,
-0.02693585865199566,
0.017308689653873444,
0.006279137916862965,
...]
Next, I can apply this function to each of the notes stored in the notes column of the notes_df DataFrame, and save the result in a new column of the DataFrame called embedding:
notes_df['embedding'] = notes_df['notes'].apply(lambda x: get_embedding(x))
Below, I show the first 5 rows. You can see that there is a new column containing, for each note, a list of 1,536 numbers: that note’s embedding.
notes_df.head(5)
 | notes | embedding |
---|---|---|
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. | [0.007875490933656693, -0.0006737920339219272, -0.048676371574401855, 0.016742711886763573, -0.01964789256453514, 0.007379627320915461, -0.0036548112984746695, 0.01582098752260208, -0.00981811247766018, 0.05731024220585823, -0.05661020055413246, 0.05338999629020691, -0.0089722266420722, 0.005787027534097433, 0.009333916008472443, 0.002053461503237486, 0.020791297778487206, -0.013055814430117607, -0.03626226261258125, 0.06151050329208374, -0.0013227908639237285, 0.06081046164035797, 0.021164653822779655, 0.0037043977063149214, -0.004646540153771639, 0.009497258812189102, 0.006889596581459045, -0.02360313944518566, 0.04398607835173607, -0.06272391229867935, 0.02380148507654667, -0.01809612847864628, 0.003272704314440489, -0.017582762986421585, 0.030848590657114983, 0.023124776780605316, 0.022611411288380623, -0.046296220272779465, -0.006352896336466074, -0.006597911473363638, -0.023031437769532204, -0.01648602820932865, -0.01208742056041956, -0.006551241967827082, -0.030731918290257454, 0.0008546366589143872, 0.04699626564979553, -0.018119463697075844, 0.004751546308398247, -0.02417484112083912, 0.010763171128928661, -0.026694998145103455, 0.006452069152146578, 0.008348020724952221, 0.010535657405853271, 0.02921515703201294, 0.02755838632583618, 0.012624121271073818, -0.013405836187303066, 0.0015488466015085578, 0.0514298751950264, -0.03217867389321327, 0.02334645763039589, 0.013347499072551727, -0.004766130819916725, -0.013499175198376179, 0.03761567920446396, 0.01333583239465952, 0.026461651548743248, 0.010774838738143444, 0.006154550705105066, -0.028328433632850647, 0.025574930012226105, 0.003345625475049019, -0.025994954630732536, 0.04412608593702316, 0.019204530864953995, -0.041139233857393265, 0.024034833535552025, 0.014700917527079582, 0.0011288204696029425, -0.012075753882527351, 
0.043052684515714645, -0.0011660102754831314, -0.023159777745604515, -0.05987706780433655, -0.033765438944101334, -0.023778149858117104, 0.027441712096333504, -0.031735312193632126, 0.031525298953056335, 0.03994916006922722, 0.006761255208402872, -0.028771795332431793, -0.013347499072551727, -0.02634497731924057, 0.02550492435693741, 0.02359147183597088, -0.03294872120022774, 0.03000853955745697, ...] |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. | [-0.0345405712723732, -0.010776856914162636, -0.010913430713117123, 0.01344623975455761, -0.0371975414454937, 0.01843736506998539, 0.022373152896761894, -0.01380629651248455, 0.06128406524658203, -0.0055250017903745174, -0.02246006391942501, 0.03488821163773537, -0.060837097465991974, 0.019294051453471184, -0.005022164434194565, 0.017419274896383286, 0.014824386686086655, -0.016761241480708122, -0.030120572075247765, 0.0056708864867687225, 0.036700911819934845, 0.03622911125421524, -0.006679664831608534, -0.0573606938123703, 0.024992873892188072, 0.01998933218419552, 0.030319223180413246, -0.023142928257584572, 0.029872257262468338, -0.03113866224884987, 0.004680731799453497, -0.020436298102140427, 0.014042195864021778, -0.020846018567681313, -0.02521635591983795, 0.04705563187599182, 0.04226315766572952, 0.006909355986863375, -0.010733402334153652, -0.015283768996596336, -0.006704496685415506, -0.02394995093345642, 0.01163354329764843, -0.01688539795577526, -0.030865514650940895, -0.037445854395627975, -0.023502985015511513, -0.040425632148981094, 0.07345148175954819, 0.03406877443194389, -0.022149669006466866, 0.016016297042369843, 0.0220627598464489, -0.022075176239013672, 0.038935743272304535, -0.040624283254146576, 0.05368563532829285, 0.010013289749622345, -0.003945099655538797, 0.031188324093818665, 0.03250439092516899, 0.03357214480638504, 0.04186585545539856, 0.007896406576037407, 0.0016187013825401664, -0.0722595751285553, 0.01103137992322445, 0.01600388064980507, 0.008777923882007599, 0.043728217482566833, -0.005543625447899103, 0.010950678028166294, 0.005773316603153944, 0.05368563532829285, 0.01149076223373413, 0.024744559079408646, 0.006235802546143532, -0.028754839673638344, -0.00626994576305151, 0.009876716881990433, -0.02686764858663082, 0.006273049861192703, 
0.009839469566941261, 0.046385183930397034, 0.0031737720128148794, -0.02080877125263214, -0.047105297446250916, -0.06217799708247185, 0.01106862723827362, -0.05676473677158356, 0.021963434293866158, -0.0022767353802919388, -0.00857306458055973, 0.015159611590206623, 0.005546729080379009, -0.050954174250364304, -0.0011461274698376656, 0.02709113247692585, -0.00894553679972887, 0.04352956265211105, ...] |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. | [-0.017208361998200417, 0.006888406351208687, -0.005633207969367504, 0.04822390526533127, -0.014728333801031113, 0.025873279199004173, -0.03370814397931099, -0.0009527865331619978, 0.03423452004790306, -0.010254159569740295, -0.04486321285367012, -0.008685161359608173, 0.01806878112256527, 0.019941454753279686, 0.028039507567882538, 0.016347944736480713, 0.0295174028724432, -0.05652441084384918, -0.020892975851893425, 0.009252025745809078, 0.053163718432188034, 0.015467280521988869, 0.02342361770570278, -0.0332222618162632, 0.04615890234708786, -0.051422636955976486, 0.01870650239288807, -0.00034986119135282934, -0.023038960993289948, -0.024010727182030678, 0.047697532922029495, -0.0066960775293409824, 0.03565167635679245, -0.026075730100274086, -0.035894621163606644, 0.06778070330619812, 0.03374863415956497, 0.0055269212462008, -0.015720345079898834, -0.02858612686395645, 0.009727786295115948, 0.024597834795713425, 0.019010178744792938, -0.020235009491443634, -0.007941152900457382, 0.025245679542422295, 0.008472587913274765, -0.02548862062394619, 0.02113591879606247, 0.0035783271305263042, 0.0027204395737499, 0.03320201858878136, 0.021115673705935478, -0.01451575942337513, 0.006088723428547382, -0.05980411916971207, -0.02872784249484539, -0.0322909876704216, 0.03806084766983986, 0.025144454091787338, 0.004529848229140043, 0.03957923501729965, 0.06259794533252716, 0.05907529592514038, -0.010821023024618626, -0.07883454114198685, 0.05713176354765892, 0.040915410965681076, 0.044458311051130295, -0.02583278901875019, 0.013078355230391026, 0.02773583121597767, -0.010238975286483765, 0.02148008532822132, 0.07474502921104431, 0.00839666835963726, 0.08215474337339401, -0.026865290477871895, 0.010861513204872608, -0.0009135616128332913, 0.004243885632604361, 
0.019081037491559982, 0.005435817874968052, 0.03941727057099342, 0.014819436706602573, 0.02615671046078205, -0.0274321548640728, -0.034760892391204834, 0.02696651592850685, 0.00028643698897212744, 0.001967573771253228, 0.011448622681200504, 0.04247428849339485, -0.0023927215952426195, -0.006756812799721956, -0.03441672399640083, 0.001920756883919239, 0.03672466799616814, -0.031015543267130852, 0.02471930719912052, ...] |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. | [-0.01908160373568535, -0.007425420451909304, 0.017478106543421745, 0.0633997693657875, -0.03739846125245094, 0.0564923994243145, -0.014924848452210426, 0.03949534147977829, -0.022670967504382133, -0.008227168582379818, 0.0023235275875777006, 0.01861288957297802, -0.05338408425450325, 0.021585524082183838, 0.032809995114803314, 0.0043664430268108845, 0.05175592005252838, -0.01282796822488308, -0.01846487447619438, 0.010040352120995522, -0.01032404787838459, -0.01255044061690569, 0.03727511689066887, -0.03327871114015579, -0.01349403616040945, -0.02069743350148201, 0.026494689285755157, -0.05005374550819397, 0.042801011353731155, -0.04746348410844803, 0.04080280661582947, -0.02429913356900215, 0.025137884542346, 0.0548148974776268, -0.005189776886254549, -0.0036479535046964884, 0.0019658245146274567, -0.0015927032800391316, 0.022905325517058372, 0.0035554443020373583, -0.0008025189745239913, -0.028912268579006195, -0.020956460386514664, 0.0659160241484642, -0.03490687534213066, 0.02893693745136261, 0.017613787204027176, -0.03764515370130539, 0.0155539121478796, 0.02130182832479477, -0.019982028752565384, 0.010428891517221928, -0.023115012794733047, 0.019328294321894646, 0.03290867432951927, 0.015183874405920506, 0.03014572709798813, 0.020487746223807335, -0.007838629186153412, 0.031132493168115616, 0.014184772968292236, -0.013975084759294987, 0.022300930693745613, 0.025729944929480553, -0.04578598216176033, -0.02073443867266178, 0.015072863548994064, -0.011446495540440083, 0.03140385448932648, 0.021425174549221992, -0.014949517324566841, -0.030022380873560905, -0.030269071459770203, 0.05358143895864487, -0.0035708623472601175, -0.022892991080880165, -0.005436468403786421, -0.03009638749063015, -0.003157653845846653, 0.055308278650045395, -0.029603004455566406, -0.02204190380871296, -0.009065920487046242, 
0.04381244629621506, 0.009756657294929028, -0.06902433931827545, 0.0014246446080505848, 9.824304288486019e-05, -0.005482723005115986, -0.08269105851650238, 0.0053686280734837055, 0.005710912868380547, -0.0006240529473870993, 0.004471287131309509, -0.03041708655655384, 0.003200824838131666, -0.04183891415596008, 0.03251396492123604, -0.0017438019858673215, -0.0027629469987004995, ...] |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. | [-0.010823288932442665, 0.03450469672679901, -0.0003536372387316078, 0.01706509292125702, -0.033481039106845856, 0.030859481543302536, -0.00017135703819803894, 0.05228135362267494, -0.010186624713242054, 0.017614372074604034, 0.003529740497469902, 0.021334487944841385, -0.02444290742278099, 0.054828010499477386, -0.012496092356741428, -0.005037136375904083, 0.008351534605026245, -0.02938641607761383, -0.005436611827462912, 0.02893700636923313, 0.03158353269100189, 0.06366640329360962, 0.05123273283243179, 0.015329872258007526, 0.0488358773291111, 0.01000561285763979, 0.010242801159620285, -0.041695255786180496, 0.03263215348124504, -0.022308209910988808, 0.02629048004746437, -0.00510579627007246, -0.018038814887404442, 0.03467946499586105, -0.005889142397791147, 0.022645266726613045, 0.025504013523459435, -0.004253789782524109, -0.013145240023732185, -0.0078896414488554, 0.016728036105632782, -0.020210962742567062, 0.016815422102808952, 0.01926220953464508, -0.015641963109374046, -0.046139419078826904, 0.02104736492037773, -0.03822481259703636, 0.051582273095846176, 0.006853501312434673, -0.039173565804958344, 0.012858116999268532, 0.02134697139263153, 0.013245109468698502, 0.014643273316323757, -0.010473747737705708, 0.003486047964543104, -0.024005981162190437, 0.06111975014209747, 0.02057298831641674, 0.052980437874794006, -0.003170836716890335, 0.02688969485461712, 0.030260268598794937, -0.03732599318027496, -0.037800367921590805, 0.03233254700899124, -0.04114597663283348, 0.06072027608752251, -0.013956675305962563, -0.019536849111318588, 0.005062103737145662, 0.04409210756421089, 0.004054052289575338, 0.008794702589511871, -0.013170207850635052, 0.05472814291715622, -0.06970847398042679, -0.025416627526283264, 0.05397912487387657, -0.0226327832788229, 0.025591399520635605, 0.008863362483680248, 0.04014728590846062, 
-0.012508576735854149, -0.05412892997264862, -0.029186679050326347, -0.03198300674557686, -0.015254970639944077, -0.03430495783686638, -0.024230685085058212, 0.012764490209519863, 0.010411330498754978, 0.012957986444234848, -0.016128823161125183, -0.012021715752780437, 0.015854183584451675, 0.00357655412517488, -0.015666930004954338, 0.02527930773794651, ...] |
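If get_embedding sends one API request per note (an assumption, since that depends on how it was defined earlier), the apply call above makes one round trip for every row. The OpenAI embeddings endpoint also accepts a list of strings, so a batched variant is possible. The sketch below is only an illustration: embed_batch, its batch_size default, and the way the client and model name are passed in are hypothetical choices, not part of the original workflow.

```python
def embed_batch(texts, client, model, batch_size=100):
    """Embed a list of strings, sending up to batch_size texts per request.

    client is assumed to be an openai.OpenAI instance, and model should be
    the same embedding model name that get_embedding uses.
    """
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(input=batch, model=model)
        # Sort by index so each embedding lines up with its input text
        ordered = sorted(response.data, key=lambda item: item.index)
        embeddings.extend(item.embedding for item in ordered)
    return embeddings

# Hypothetical usage with the objects from this post:
# notes_df['embedding'] = embed_batch(notes_df['notes'].tolist(), client, model=<same model as get_embedding>)
```

For 50 or so short notes the difference is small, but for a large collection of notes batching should noticeably cut down the number of requests.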
Since I will want to compute the cosine distance to the “target” (“Patient presenting with ongoing chronic condition defined as ongoing for more than one month”), I will also need to compute the embedding of this target text.
target = 'Patient presenting with ongoing chronic condition defined as ongoing for more than one month'
# Compute embedding of the target
target_embedding = get_embedding(target)
Computing the cosine distance from the target
Now that I have all of my embeddings, I need to compute the cosine distance from each doctor’s note’s embedding to this target embedding. Below, I compute and save this cosine distance in a new column of notes_df called cosine_dist (note that a lower cosine_dist value corresponds to a doctor’s note that is more similar to the target: “Patient presenting with ongoing chronic condition defined as ongoing for more than one month”).
notes_df['cosine_dist'] = notes_df.embedding.apply(lambda x: cosine(x, target_embedding))
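For intuition about what this number means: the cosine function from scipy.spatial.distance returns one minus the cosine similarity of the two vectors, so 0 means they point in the same direction, 1 means they are orthogonal, and 2 means they point in opposite directions. Here is a minimal pure-Python sketch of that formula on toy 2-dimensional vectors (cosine_distance is a hypothetical helper written for illustration, not what scipy runs internally):

```python
import math

def cosine_distance(u, v):
    """Return 1 - (u . v) / (||u|| * ||v||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

same_direction = cosine_distance([1, 0], [2, 0])   # length does not matter
orthogonal = cosine_distance([1, 0], [0, 1])
opposite = cosine_distance([1, 0], [-1, 0])
print(same_direction, orthogonal, opposite)
```

Because only the direction of the vectors matters, the length of the embedding vectors does not affect the result.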
You can see that the final column in notes_df is now the cosine distance:
notes_df.head(5)
 | notes | embedding | cosine_dist |
---|---|---|---|
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. | [0.007875490933656693, -0.0006737920339219272, -0.048676371574401855, 0.016742711886763573, -0.01964789256453514, 0.007379627320915461, -0.0036548112984746695, 0.01582098752260208, -0.00981811247766018, 0.05731024220585823, -0.05661020055413246, 0.05338999629020691, -0.0089722266420722, 0.005787027534097433, 0.009333916008472443, 0.002053461503237486, 0.020791297778487206, -0.013055814430117607, -0.03626226261258125, 0.06151050329208374, -0.0013227908639237285, 0.06081046164035797, 0.021164653822779655, 0.0037043977063149214, -0.004646540153771639, 0.009497258812189102, 0.006889596581459045, -0.02360313944518566, 0.04398607835173607, -0.06272391229867935, 0.02380148507654667, -0.01809612847864628, 0.003272704314440489, -0.017582762986421585, 0.030848590657114983, 0.023124776780605316, 0.022611411288380623, -0.046296220272779465, -0.006352896336466074, -0.006597911473363638, -0.023031437769532204, -0.01648602820932865, -0.01208742056041956, -0.006551241967827082, -0.030731918290257454, 0.0008546366589143872, 0.04699626564979553, -0.018119463697075844, 0.004751546308398247, -0.02417484112083912, 0.010763171128928661, -0.026694998145103455, 0.006452069152146578, 0.008348020724952221, 0.010535657405853271, 0.02921515703201294, 0.02755838632583618, 0.012624121271073818, -0.013405836187303066, 0.0015488466015085578, 0.0514298751950264, -0.03217867389321327, 0.02334645763039589, 0.013347499072551727, -0.004766130819916725, -0.013499175198376179, 0.03761567920446396, 0.01333583239465952, 0.026461651548743248, 0.010774838738143444, 0.006154550705105066, -0.028328433632850647, 0.025574930012226105, 0.003345625475049019, -0.025994954630732536, 0.04412608593702316, 0.019204530864953995, -0.041139233857393265, 0.024034833535552025, 0.014700917527079582, 0.0011288204696029425, -0.012075753882527351, 
0.043052684515714645, -0.0011660102754831314, -0.023159777745604515, -0.05987706780433655, -0.033765438944101334, -0.023778149858117104, 0.027441712096333504, -0.031735312193632126, 0.031525298953056335, 0.03994916006922722, 0.006761255208402872, -0.028771795332431793, -0.013347499072551727, -0.02634497731924057, 0.02550492435693741, 0.02359147183597088, -0.03294872120022774, 0.03000853955745697, ...] | 0.697187 |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. | [-0.0345405712723732, -0.010776856914162636, -0.010913430713117123, 0.01344623975455761, -0.0371975414454937, 0.01843736506998539, 0.022373152896761894, -0.01380629651248455, 0.06128406524658203, -0.0055250017903745174, -0.02246006391942501, 0.03488821163773537, -0.060837097465991974, 0.019294051453471184, -0.005022164434194565, 0.017419274896383286, 0.014824386686086655, -0.016761241480708122, -0.030120572075247765, 0.0056708864867687225, 0.036700911819934845, 0.03622911125421524, -0.006679664831608534, -0.0573606938123703, 0.024992873892188072, 0.01998933218419552, 0.030319223180413246, -0.023142928257584572, 0.029872257262468338, -0.03113866224884987, 0.004680731799453497, -0.020436298102140427, 0.014042195864021778, -0.020846018567681313, -0.02521635591983795, 0.04705563187599182, 0.04226315766572952, 0.006909355986863375, -0.010733402334153652, -0.015283768996596336, -0.006704496685415506, -0.02394995093345642, 0.01163354329764843, -0.01688539795577526, -0.030865514650940895, -0.037445854395627975, -0.023502985015511513, -0.040425632148981094, 0.07345148175954819, 0.03406877443194389, -0.022149669006466866, 0.016016297042369843, 0.0220627598464489, -0.022075176239013672, 0.038935743272304535, -0.040624283254146576, 0.05368563532829285, 0.010013289749622345, -0.003945099655538797, 0.031188324093818665, 0.03250439092516899, 0.03357214480638504, 0.04186585545539856, 0.007896406576037407, 0.0016187013825401664, -0.0722595751285553, 0.01103137992322445, 0.01600388064980507, 0.008777923882007599, 0.043728217482566833, -0.005543625447899103, 0.010950678028166294, 0.005773316603153944, 0.05368563532829285, 0.01149076223373413, 0.024744559079408646, 0.006235802546143532, -0.028754839673638344, -0.00626994576305151, 0.009876716881990433, -0.02686764858663082, 0.006273049861192703, 
0.009839469566941261, 0.046385183930397034, 0.0031737720128148794, -0.02080877125263214, -0.047105297446250916, -0.06217799708247185, 0.01106862723827362, -0.05676473677158356, 0.021963434293866158, -0.0022767353802919388, -0.00857306458055973, 0.015159611590206623, 0.005546729080379009, -0.050954174250364304, -0.0011461274698376656, 0.02709113247692585, -0.00894553679972887, 0.04352956265211105, ...] | 0.583531 |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. | [-0.017208361998200417, 0.006888406351208687, -0.005633207969367504, 0.04822390526533127, -0.014728333801031113, 0.025873279199004173, -0.03370814397931099, -0.0009527865331619978, 0.03423452004790306, -0.010254159569740295, -0.04486321285367012, -0.008685161359608173, 0.01806878112256527, 0.019941454753279686, 0.028039507567882538, 0.016347944736480713, 0.0295174028724432, -0.05652441084384918, -0.020892975851893425, 0.009252025745809078, 0.053163718432188034, 0.015467280521988869, 0.02342361770570278, -0.0332222618162632, 0.04615890234708786, -0.051422636955976486, 0.01870650239288807, -0.00034986119135282934, -0.023038960993289948, -0.024010727182030678, 0.047697532922029495, -0.0066960775293409824, 0.03565167635679245, -0.026075730100274086, -0.035894621163606644, 0.06778070330619812, 0.03374863415956497, 0.0055269212462008, -0.015720345079898834, -0.02858612686395645, 0.009727786295115948, 0.024597834795713425, 0.019010178744792938, -0.020235009491443634, -0.007941152900457382, 0.025245679542422295, 0.008472587913274765, -0.02548862062394619, 0.02113591879606247, 0.0035783271305263042, 0.0027204395737499, 0.03320201858878136, 0.021115673705935478, -0.01451575942337513, 0.006088723428547382, -0.05980411916971207, -0.02872784249484539, -0.0322909876704216, 0.03806084766983986, 0.025144454091787338, 0.004529848229140043, 0.03957923501729965, 0.06259794533252716, 0.05907529592514038, -0.010821023024618626, -0.07883454114198685, 0.05713176354765892, 0.040915410965681076, 0.044458311051130295, -0.02583278901875019, 0.013078355230391026, 0.02773583121597767, -0.010238975286483765, 0.02148008532822132, 0.07474502921104431, 0.00839666835963726, 0.08215474337339401, -0.026865290477871895, 0.010861513204872608, -0.0009135616128332913, 0.004243885632604361, 
0.019081037491559982, 0.005435817874968052, 0.03941727057099342, 0.014819436706602573, 0.02615671046078205, -0.0274321548640728, -0.034760892391204834, 0.02696651592850685, 0.00028643698897212744, 0.001967573771253228, 0.011448622681200504, 0.04247428849339485, -0.0023927215952426195, -0.006756812799721956, -0.03441672399640083, 0.001920756883919239, 0.03672466799616814, -0.031015543267130852, 0.02471930719912052, ...] | 0.748090 |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. | [-0.01908160373568535, -0.007425420451909304, 0.017478106543421745, 0.0633997693657875, -0.03739846125245094, 0.0564923994243145, -0.014924848452210426, 0.03949534147977829, -0.022670967504382133, -0.008227168582379818, 0.0023235275875777006, 0.01861288957297802, -0.05338408425450325, 0.021585524082183838, 0.032809995114803314, 0.0043664430268108845, 0.05175592005252838, -0.01282796822488308, -0.01846487447619438, 0.010040352120995522, -0.01032404787838459, -0.01255044061690569, 0.03727511689066887, -0.03327871114015579, -0.01349403616040945, -0.02069743350148201, 0.026494689285755157, -0.05005374550819397, 0.042801011353731155, -0.04746348410844803, 0.04080280661582947, -0.02429913356900215, 0.025137884542346, 0.0548148974776268, -0.005189776886254549, -0.0036479535046964884, 0.0019658245146274567, -0.0015927032800391316, 0.022905325517058372, 0.0035554443020373583, -0.0008025189745239913, -0.028912268579006195, -0.020956460386514664, 0.0659160241484642, -0.03490687534213066, 0.02893693745136261, 0.017613787204027176, -0.03764515370130539, 0.0155539121478796, 0.02130182832479477, -0.019982028752565384, 0.010428891517221928, -0.023115012794733047, 0.019328294321894646, 0.03290867432951927, 0.015183874405920506, 0.03014572709798813, 0.020487746223807335, -0.007838629186153412, 0.031132493168115616, 0.014184772968292236, -0.013975084759294987, 0.022300930693745613, 0.025729944929480553, -0.04578598216176033, -0.02073443867266178, 0.015072863548994064, -0.011446495540440083, 0.03140385448932648, 0.021425174549221992, -0.014949517324566841, -0.030022380873560905, -0.030269071459770203, 0.05358143895864487, -0.0035708623472601175, -0.022892991080880165, -0.005436468403786421, -0.03009638749063015, -0.003157653845846653, 0.055308278650045395, -0.029603004455566406, -0.02204190380871296, -0.009065920487046242, 
0.04381244629621506, 0.009756657294929028, -0.06902433931827545, 0.0014246446080505848, 9.824304288486019e-05, -0.005482723005115986, -0.08269105851650238, 0.0053686280734837055, 0.005710912868380547, -0.0006240529473870993, 0.004471287131309509, -0.03041708655655384, 0.003200824838131666, -0.04183891415596008, 0.03251396492123604, -0.0017438019858673215, -0.0027629469987004995, ...] | 0.644881 |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. | [-0.010823288932442665, 0.03450469672679901, -0.0003536372387316078, 0.01706509292125702, -0.033481039106845856, 0.030859481543302536, -0.00017135703819803894, 0.05228135362267494, -0.010186624713242054, 0.017614372074604034, 0.003529740497469902, 0.021334487944841385, -0.02444290742278099, 0.054828010499477386, -0.012496092356741428, -0.005037136375904083, 0.008351534605026245, -0.02938641607761383, -0.005436611827462912, 0.02893700636923313, 0.03158353269100189, 0.06366640329360962, 0.05123273283243179, 0.015329872258007526, 0.0488358773291111, 0.01000561285763979, 0.010242801159620285, -0.041695255786180496, 0.03263215348124504, -0.022308209910988808, 0.02629048004746437, -0.00510579627007246, -0.018038814887404442, 0.03467946499586105, -0.005889142397791147, 0.022645266726613045, 0.025504013523459435, -0.004253789782524109, -0.013145240023732185, -0.0078896414488554, 0.016728036105632782, -0.020210962742567062, 0.016815422102808952, 0.01926220953464508, -0.015641963109374046, -0.046139419078826904, 0.02104736492037773, -0.03822481259703636, 0.051582273095846176, 0.006853501312434673, -0.039173565804958344, 0.012858116999268532, 0.02134697139263153, 0.013245109468698502, 0.014643273316323757, -0.010473747737705708, 0.003486047964543104, -0.024005981162190437, 0.06111975014209747, 0.02057298831641674, 0.052980437874794006, -0.003170836716890335, 0.02688969485461712, 0.030260268598794937, -0.03732599318027496, -0.037800367921590805, 0.03233254700899124, -0.04114597663283348, 0.06072027608752251, -0.013956675305962563, -0.019536849111318588, 0.005062103737145662, 0.04409210756421089, 0.004054052289575338, 0.008794702589511871, -0.013170207850635052, 0.05472814291715622, -0.06970847398042679, -0.025416627526283264, 0.05397912487387657, -0.0226327832788229, 0.025591399520635605, 0.008863362483680248, 0.04014728590846062, 
-0.012508576735854149, -0.05412892997264862, -0.029186679050326347, -0.03198300674557686, -0.015254970639944077, -0.03430495783686638, -0.024230685085058212, 0.012764490209519863, 0.010411330498754978, 0.012957986444234848, -0.016128823161125183, -0.012021715752780437, 0.015854183584451675, 0.00357655412517488, -0.015666930004954338, 0.02527930773794651, ...] | 0.688291 |
And that’s it! Let’s take a look at the notes in order of smallest to largest cosine_dist.
Ideally, I will see that the notes at the top of the rearranged DataFrame correspond to patients with chronic conditions, and the notes at the bottom correspond to patients with acute conditions.
The top 5 notes are:
pd.set_option('display.max_colwidth', None)
notes_df.sort_values('cosine_dist')[['notes', 'cosine_dist']].head(5)
 | notes | cosine_dist |
---|---|---|
43 | Patient with chronic bronchitis showed increasingly frequent flare-ups. Reinforced importance of smoking cessation and upgraded prescribed inhalers. | 0.483188 |
19 | Patient visited for follow-up on hypertension medication. Blood pressure readings stable. Emphasized continued adherence to treatment. | 0.547638 |
27 | Patient reported persistent cough and difficulty breathing. Referred for radiological imaging to rule out pneumonia or tuberculosis. | 0.550634 |
22 | Patient with a history of heart disease complained of palpitations. Recommended further cardiology work-up and emphasized continued adherence to treatment. | 0.558663 |
31 | Chronic back pain patient showed improvement after physiotherapy. Recommended continuing sessions and daily exercises at home. | 0.565519 |
Those all seem like patients with chronic conditions to me!
And the bottom 5 notes are:
notes_df.sort_values('cosine_dist')[['notes', 'cosine_dist']].tail(5)
 | notes | cosine_dist |
---|---|---|
39 | Child patient presented with red, itchy eyes. Diagnosed as viral conjunctivitis. Instructed parents about proper eye care and hand hygiene. | 0.743466 |
16 | Patient suffered a minor burn in the kitchen. Prescribed topical ointment and advised on wound care. | 0.745391 |
26 | Patient presented with severe knee pain. Suspected ligament tear. Recommended MRI scan. | 0.746433 |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. | 0.748090 |
49 | Patient presented with dislocated shoulder. After successful relocation, advised sling use and gentle mobility exercises for rehabilitation. | 0.748190 |
These definitely seem like new/acute conditions!
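To turn these distances into actual chronic/acute labels, one simple option is to pick a cutoff on cosine_dist. The sketch below assumes a cutoff of 0.6, which is a hypothetical value chosen by eye for illustration and not something derived in this post. The label_note helper is likewise made up for the example.

```python
# Hypothetical cutoff: chosen by eye for illustration, not taken from the post
CHRONIC_THRESHOLD = 0.6

def label_note(cosine_dist, threshold=CHRONIC_THRESHOLD):
    """Return 'chronic' if the note is close to the chronic target, else 'acute'."""
    return 'chronic' if cosine_dist < threshold else 'acute'

# Distances copied from the top and bottom of the sorted output above,
# keyed by the note's row index
examples = {43: 0.483188, 19: 0.547638, 2: 0.748090, 49: 0.748190}
labels = {idx: label_note(dist) for idx, dist in examples.items()}
print(labels)

# With the DataFrame from this post, the same rule could be applied as:
# notes_df['label'] = notes_df['cosine_dist'].apply(label_note)
```

Any such cutoff is a judgment call: notes near the boundary can land on either side, so it is worth checking the labels against a handful of notes labeled by hand before relying on them.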
If you’re interested in looking at all of the distances, see the following output (remember that the smaller the distance, the closer the note to the chronic condition target):
notes_df.sort_values('cosine_dist')[['notes', 'cosine_dist']]
 | notes | cosine_dist |
---|---|---|
43 | Patient with chronic bronchitis showed increasingly frequent flare-ups. Reinforced importance of smoking cessation and upgraded prescribed inhalers. | 0.483188 |
19 | Patient visited for follow-up on hypertension medication. Blood pressure readings stable. Emphasized continued adherence to treatment. | 0.547638 |
27 | Patient reported persistent cough and difficulty breathing. Referred for radiological imaging to rule out pneumonia or tuberculosis. | 0.550634 |
22 | Patient with a history of heart disease complained of palpitations. Recommended further cardiology work-up and emphasized continued adherence to treatment. | 0.558663 |
31 | Chronic back pain patient showed improvement after physiotherapy. Recommended continuing sessions and daily exercises at home. | 0.565519 |
15 | Asthmatic patient presented with increased shortness of breath. Stepped-up asthma control medication. | 0.578419 |
11 | Patient admitted for chronic fatigue. Bloodwork ordered to rule out anemia, thyroid disorders, and diabetes. Recommended nutritional counseling. | 0.580687 |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. | 0.583531 |
5 | Patient came in for annual physical. All markers within normal limits. Encouraged to continue healthy lifestyle for prevention of non-communicable diseases. | 0.592007 |
13 | Patient managing COPD came in for routine check-up. Oxygen levels stable. Expressed importance of avoiding respiratory irritants. | 0.597322 |
35 | Patient presented with swollen (oedemic) ankles. Adjusted diuretic medication dosage and recommended follow-up visit next week. | 0.598728 |
30 | Elderly patient presented with signs of Parkinson's disease. Referred to a neurologist for specialized care and management. | 0.601647 |
51 | Patient complained of periodic blurry vision. Referred to ophthalmology for proper ocular examination and management. | 0.607228 |
24 | Patient managed for rheumatoid arthritis reported increased inflammation in joints. Adjusted medication and recommended physiotherapy for managing symptoms. | 0.610967 |
28 | Patient presented with severe migraine episodes. Prescribed stronger painkillers and referred to a neurologist for a complete neurological work-up. | 0.613839 |
41 | Patient presented with painful and swollen joints. Diagnosed as gout and recommended dietary adjustment along with pain management. | 0.618245 |
25 | Elderly patient with a history of diabetes complained of numbness in feet. Diagnosed peripheral neuropathy, referred to podiatrist for further management. | 0.619704 |
37 | Elderly patient exhibited signs of Alzheimer's. Informed family and referred to neurology specialist for comprehensive management. | 0.619889 |
10 | Patient presented with chest pain. Preliminary assessment ruled out heart conditions. Recommended gastroenterology consult to check for possible reflux disease. | 0.625152 |
46 | Patient with anxiety showed improvement with recent medication schedule. Encouraged continued therapy and stress management techniques. | 0.626963 |
18 | Patient reports persistent vertigo. Referred to an Ear, Nose, and Throat (ENT) specialist for detailed evaluation to rule out Meniere's disease. | 0.628280 |
42 | Patient reported sleep apnea symptoms. Advised to undergo a sleep study, prescribed initial treatment using CPAP device at night during sleep. | 0.636255 |
48 | Child patient presented with symptoms of ADHD. Referred to pediatric psychiatrist for cognitive and behavioral evaluation plus treatment. | 0.636682 |
44 | Patient with history of stroke presented with dizziness. Follow-up brain scan scheduled to rule out recurrence. | 0.637395 |
32 | Patient came in for weight management consultation. Diet plan adjusted and encouraged increased physical activity. | 0.639904 |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. | 0.644881 |
20 | Historically anxious patient reported increased panic attacks recently. Referred to psychiatrist for therapeutic support, prescribed low-dose anxiolytic. | 0.645849 |
45 | Patient reported feeling low, indicating depression. Counseled and referred to mental health services for further treatment and support. | 0.647704 |
12 | Elderly patient with poor vision due to cataracts. Scheduled for ophthalmology consultation. | 0.648220 |
38 | Patient with history of irritable bowel syndrome showed signs of relief with diet modification. Reinforced importance of hydration and fiber intake. | 0.657894 |
36 | Teenage patient came in with signs of acne. Prescribed topical retinoid and advised consistent skin care routine. | 0.658439 |
9 | Patient returned for post-operative wound check. Incision healing well with no signs of infection. Advised continued wound care protocol and scheduled follow-up in a week. | 0.660693 |
8 | Arthritic patient reports increased joint pain. Advised gentle physical therapy and adjustment of pain medication. | 0.661807 |
14 | Patient presented with signs of depression. Recommended seeking a psychologist for cognitive behavioral therapy and prescribed low dose antidepressant. | 0.666589 |
52 | Elderly patient with hearing difficulty. Scheduled an appointment for audiology consultation and potential hearing aids. | 0.667571 |
6 | Routine consultation with patient managing type II diabetes. Blood glucose levels fairly controlled. Advised to keep monitoring sugar levels and follow diet regimen. | 0.669839 |
21 | Patient has concerning mole on the back. Scheduled for dermatology consult to rule out potential skin cancer. | 0.675400 |
29 | Patient suffering from high fevers and weight loss. Ordered comprehensive bloodwork to assess for possible infectious disease. | 0.676904 |
53 | Patient presented with signs of Bell's Palsy. Referred to neurologist and prescribed corticosteroid medication to reduce inflammation. | 0.682327 |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. | 0.688291 |
50 | Pregnant patient at full term demonstrated normal signs. Advised to prepare for labor and familiarize with signs of onset. | 0.696364 |
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. | 0.697187 |
33 | Pregnant patient showed normal progression in second trimester. Advised rest, proper nutrition and regular exercise. | 0.702353 |
47 | Type 1 diabetic patient showed increased blood sugar levels. Insulin doses were adjusted, advised dietary check. | 0.707895 |
40 | Patient showed symptoms of urinary tract infection. Collected sample for testing and started preliminary antibiotic treatment. | 0.710449 |
7 | Patient presented with symptoms of influenza including fever, cough, and body aches. Prescribed antiviral medication and recommended rest and plenty of fluids. | 0.715153 |
23 | Post-surgical patient reported redness and discomfort around stitches. Diagnosed with a mild post-operative infection. Prescribed antibiotics and wound care guidance. | 0.726964 |
17 | Rash observed on patient's arm. Allergenic contact dermatitis suspected. Prescribed topical steroid and recommended patch test. | 0.730986 |
34 | Postnatal patient showed healthy recovery from C-section. Wound healing well without signs of infection. | 0.742733 |
39 | Child patient presented with red, itchy eyes. Diagnosed as viral conjunctivitis. Instructed parents about proper eye care and hand hygiene. | 0.743466 |
16 | Patient suffered a minor burn in the kitchen. Prescribed topical ointment and advised on wound care. | 0.745391 |
26 | Patient presented with severe knee pain. Suspected ligament tear. Recommended MRI scan. | 0.746433 |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. | 0.748090 |
49 | Patient presented with dislocated shoulder. After successful relocation, advised sling use and gentle mobility exercises for rehabilitation. | 0.748190 |
It definitely seems like it did a pretty good job here (although, to be fair, OpenAI did all the heavy lifting!)
Zero-shot binary classification
The above approach gives each doctor’s note a score (its cosine distance) relative to a target (“Patient presenting with ongoing chronic condition defined as ongoing for more than one month”). But if I wanted to use this score to create a binary label, I would need to choose a threshold: every note whose cosine distance to the target is below, say, 0.6 would be classified as “presenting with a chronic condition”, and every note whose distance is above 0.6 would be classified as “presenting with an acute condition”.
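To make the thresholding idea concrete, here is a minimal sketch. It uses four distances copied from the table above, and the column names `cosine_score` and `threshold_label` are placeholders chosen just for this example (they are assumptions, not names used elsewhere in this post). The 0.6 cut-off is equally arbitrary.

```python
import pandas as pd

# Toy data: four distances copied from the table above (index = note number).
# The column name `cosine_score` is a placeholder for this sketch.
toy_df = pd.DataFrame(
    {"cosine_score": [0.592007, 0.597322, 0.601647, 0.748190]},
    index=[5, 13, 30, 49],
)

# Arbitrary cut-off; 0.6 is only an example and would need tuning.
threshold = 0.6

# Notes closer to the chronic target than the cut-off are labeled chronic.
toy_df["threshold_label"] = [
    "chronic" if score < threshold else "acute"
    for score in toy_df["cosine_score"]
]
print(toy_df)
```

Notice that the Parkinson's note (index 30, distance 0.601647) lands just on the acute side of the cut-off, which hints at how sensitive the labels are to the choice of threshold.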
That approach sounds okay, but it turns out that there’s a (hypothetically) better way.
Instead of just computing the distance to a single target, I can compute two distances for each doctor’s note:
the distance to a positive label (corresponding to the original target: “Patient presenting with ongoing chronic condition defined as ongoing for more than one month”)
the distance to a negative label (“Patient presenting with acute or new condition”)
Then, each note is classified based on which label text its embedding is closest to.
The code for doing this is very similar to the code above.
Fortunately, I don’t need to recompute the embeddings for the notes, but I do need to compute the embedding for the new negative label.
labels = [
    'Patient presenting with ongoing chronic condition defined as ongoing for more than one month',
    'Patient presenting with acute or new condition'
]
# Compute embedding of the labels
label_embeddings = [get_embedding(label) for label in labels]
Then I need to compute the difference between the cosine distance to the first label and the cosine distance to the second label (to determine which label the note is closest to).
# compute the difference in the cosine distances to the two labels for each note
notes_df['cosine_score_binary'] = [cosine(x, label_embeddings[1]) - cosine(x, label_embeddings[0]) for x in notes_df['embedding']]
I have saved this cosine_score_binary in a new column in the DataFrame:
notes_df[['notes', 'cosine_score_binary']].head()
| | notes | cosine_score_binary |
---|---|---|
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. | -0.097242 |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. | 0.061629 |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. | -0.074738 |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. | -0.009268 |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. | -0.013830 |
A positive cosine_score_binary value means that the note is closer to the first label, “Patient presenting with ongoing chronic condition defined as ongoing for more than one month” (i.e., the distance to the second label is greater than the distance to the first label), and a negative value means that the note is closer to the second label, “Patient presenting with acute or new condition”.
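The sign convention is easy to get backwards, so here is a small worked sketch. The three-dimensional vectors are made up purely for illustration (real embeddings from the API have many more dimensions), and the variable names are assumptions for this example only.

```python
from scipy.spatial.distance import cosine

# Made-up 3-dimensional "embeddings", purely for illustration;
# real embeddings from the API have many more dimensions.
label_embeddings = [
    [1.0, 0.0, 0.0],  # stands in for the chronic label
    [0.0, 1.0, 0.0],  # stands in for the acute label
]
note_embedding = [0.8, 0.3, 0.1]

dist_to_chronic = cosine(note_embedding, label_embeddings[0])
dist_to_acute = cosine(note_embedding, label_embeddings[1])

# Same formula as in the post: distance to acute minus distance to chronic.
score = dist_to_acute - dist_to_chronic
print(f"chronic: {dist_to_chronic:.3f}, acute: {dist_to_acute:.3f}, score: {score:.3f}")
```

The toy note points mostly in the direction of the chronic label, so its distance to the acute label is the larger of the two and the score comes out positive.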
I can use this to create a binary classification for each note.
notes_df['chronic_assessment'] = ['chronic' if score > 0 else 'acute' for score in notes_df['cosine_score_binary']]
Then I can view the results in descending order of cosine_score_binary, so that the first notes are those that are deemed most chronic and least acute:
notes_df[['notes', 'cosine_score_binary', 'chronic_assessment']].sort_values('cosine_score_binary', ascending=False)
| | notes | cosine_score_binary | chronic_assessment |
---|---|---|---|
31 | Chronic back pain patient showed improvement after physiotherapy. Recommended continuing sessions and daily exercises at home. | 0.110414 | chronic |
43 | Patient with chronic bronchitis showed increasingly frequent flare-ups. Reinforced importance of smoking cessation and upgraded prescribed inhalers. | 0.091766 | chronic |
19 | Patient visited for follow-up on hypertension medication. Blood pressure readings stable. Emphasized continued adherence to treatment. | 0.081715 | chronic |
1 | Patient reports difficulty sleeping. Exhibiting signs of insomnia characterized by frequent night time awakenings. Recommended a sleep study to find underlying cause. | 0.061629 | chronic |
6 | Routine consultation with patient managing type II diabetes. Blood glucose levels fairly controlled. Advised to keep monitoring sugar levels and follow diet regimen. | 0.055009 | chronic |
11 | Patient admitted for chronic fatigue. Bloodwork ordered to rule out anemia, thyroid disorders, and diabetes. Recommended nutritional counseling. | 0.053559 | chronic |
51 | Patient complained of periodic blurry vision. Referred to ophthalmology for proper ocular examination and management. | 0.049252 | chronic |
5 | Patient came in for annual physical. All markers within normal limits. Encouraged to continue healthy lifestyle for prevention of non-communicable diseases. | 0.045713 | chronic |
25 | Elderly patient with a history of diabetes complained of numbness in feet. Diagnosed peripheral neuropathy, referred to podiatrist for further management. | 0.037681 | chronic |
13 | Patient managing COPD came in for routine check-up. Oxygen levels stable. Expressed importance of avoiding respiratory irritants. | 0.035781 | chronic |
27 | Patient reported persistent cough and difficulty breathing. Referred for radiological imaging to rule out pneumonia or tuberculosis. | 0.030508 | chronic |
18 | Patient reports persistent vertigo. Referred to an Ear, Nose, and Throat (ENT) specialist for detailed evaluation to rule out Meniere's disease. | 0.028819 | chronic |
24 | Patient managed for rheumatoid arthritis reported increased inflammation in joints. Adjusted medication and recommended physiotherapy for managing symptoms. | 0.026801 | chronic |
30 | Elderly patient presented with signs of Parkinson's disease. Referred to a neurologist for specialized care and management. | 0.019938 | chronic |
9 | Patient returned for post-operative wound check. Incision healing well with no signs of infection. Advised continued wound care protocol and scheduled follow-up in a week. | 0.018720 | chronic |
42 | Patient reported sleep apnea symptoms. Advised to undergo a sleep study, prescribed initial treatment using CPAP device at night during sleep. | 0.014099 | chronic |
22 | Patient with a history of heart disease complained of palpitations. Recommended further cardiology work-up and emphasized continued adherence to treatment. | 0.009364 | chronic |
37 | Elderly patient exhibited signs of Alzheimer's. Informed family and referred to neurology specialist for comprehensive management. | 0.007949 | chronic |
15 | Asthmatic patient presented with increased shortness of breath. Stepped-up asthma control medication. | 0.002814 | chronic |
32 | Patient came in for weight management consultation. Diet plan adjusted and encouraged increased physical activity. | 0.001463 | chronic |
33 | Pregnant patient showed normal progression in second trimester. Advised rest, proper nutrition and regular exercise. | 0.000649 | chronic |
12 | Elderly patient with poor vision due to cataracts. Scheduled for ophthalmology consultation. | -0.000179 | acute |
36 | Teenage patient came in with signs of acne. Prescribed topical retinoid and advised consistent skin care routine. | -0.000709 | acute |
41 | Patient presented with painful and swollen joints. Diagnosed as gout and recommended dietary adjustment along with pain management. | -0.004671 | acute |
21 | Patient has concerning mole on the back. Scheduled for dermatology consult to rule out potential skin cancer. | -0.007328 | acute |
3 | Hypertensive patient presented with higher blood pressure at 160/100. Adjusted medications and emphasized importance of diet and exercise. | -0.009268 | acute |
38 | Patient with history of irritable bowel syndrome showed signs of relief with diet modification. Reinforced importance of hydration and fiber intake. | -0.011980 | acute |
4 | Examined patient for frequent headaches. No alarming neurological signs detected. Recommended reducing caffeine intake and keeping hydrated. | -0.013830 | acute |
28 | Patient presented with severe migraine episodes. Prescribed stronger painkillers and referred to a neurologist for a complete neurological work-up. | -0.015010 | acute |
52 | Elderly patient with hearing difficulty. Scheduled an appointment for audiology consultation and potential hearing aids. | -0.015633 | acute |
45 | Patient reported feeling low, indicating depression. Counseled and referred to mental health services for further treatment and support. | -0.018623 | acute |
35 | Patient presented with swollen (oedemic) ankles. Adjusted diuretic medication dosage and recommended follow-up visit next week. | -0.020730 | acute |
47 | Type 1 diabetic patient showed increased blood sugar levels. Insulin doses were adjusted, advised dietary check. | -0.020986 | acute |
17 | Rash observed on patient's arm. Allergenic contact dermatitis suspected. Prescribed topical steroid and recommended patch test. | -0.030286 | acute |
34 | Postnatal patient showed healthy recovery from C-section. Wound healing well without signs of infection. | -0.032995 | acute |
8 | Arthritic patient reports increased joint pain. Advised gentle physical therapy and adjustment of pain medication. | -0.033021 | acute |
48 | Child patient presented with symptoms of ADHD. Referred to pediatric psychiatrist for cognitive and behavioral evaluation plus treatment. | -0.038751 | acute |
53 | Patient presented with signs of Bell's Palsy. Referred to neurologist and prescribed corticosteroid medication to reduce inflammation. | -0.042732 | acute |
46 | Patient with anxiety showed improvement with recent medication schedule. Encouraged continued therapy and stress management techniques. | -0.048093 | acute |
44 | Patient with history of stroke presented with dizziness. Follow-up brain scan scheduled to rule out recurrence. | -0.053002 | acute |
14 | Patient presented with signs of depression. Recommended seeking a psychologist for cognitive behavioral therapy and prescribed low dose antidepressant. | -0.057786 | acute |
29 | Patient suffering from high fevers and weight loss. Ordered comprehensive bloodwork to assess for possible infectious disease. | -0.059315 | acute |
16 | Patient suffered a minor burn in the kitchen. Prescribed topical ointment and advised on wound care. | -0.062896 | acute |
23 | Post-surgical patient reported redness and discomfort around stitches. Diagnosed with a mild post-operative infection. Prescribed antibiotics and wound care guidance. | -0.068201 | acute |
2 | Patient came in with complaints of stomachache. Upon examination, no obvious signs of appendicitis or gallstones were found. Recommended patient to undergo an ultrasound to assess internal organs. | -0.074738 | acute |
39 | Child patient presented with red, itchy eyes. Diagnosed as viral conjunctivitis. Instructed parents about proper eye care and hand hygiene. | -0.077558 | acute |
10 | Patient presented with chest pain. Preliminary assessment ruled out heart conditions. Recommended gastroenterology consult to check for possible reflux disease. | -0.080826 | acute |
26 | Patient presented with severe knee pain. Suspected ligament tear. Recommended MRI scan. | -0.081825 | acute |
20 | Historically anxious patient reported increased panic attacks recently. Referred to psychiatrist for therapeutic support, prescribed low-dose anxiolytic. | -0.083702 | acute |
49 | Patient presented with dislocated shoulder. After successful relocation, advised sling use and gentle mobility exercises for rehabilitation. | -0.091579 | acute |
0 | Patient presented signs of common cold; symptoms include a runny nose and mild fever. Advised rest, hydration, and over-the-counter medication to manage symptoms. | -0.097242 | acute |
7 | Patient presented with symptoms of influenza including fever, cough, and body aches. Prescribed antiviral medication and recommended rest and plenty of fluids. | -0.100791 | acute |
40 | Patient showed symptoms of urinary tract infection. Collected sample for testing and started preliminary antibiotic treatment. | -0.101196 | acute |
50 | Pregnant patient at full term demonstrated normal signs. Advised to prepare for labor and familiarize with signs of onset. | -0.101702 | acute |
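Because the label flips at exactly zero, notes whose scores sit very close to zero are essentially coin flips. As a rough sketch of how one might flag them for a manual look, here are four scores copied from the table above. The `borderline` column and the 0.005 margin are assumptions I picked for illustration, not part of the analysis above.

```python
import pandas as pd

# Four scores copied from the table above (index = note number).
toy_df = pd.DataFrame(
    {"cosine_score_binary": [0.110414, 0.000649, -0.000179, -0.101702]},
    index=[31, 33, 12, 50],
)

# Same rule as in the post: positive means chronic, otherwise acute.
toy_df["chronic_assessment"] = [
    "chronic" if score > 0 else "acute"
    for score in toy_df["cosine_score_binary"]
]

# Hypothetical margin: scores this close to zero are treated as uncertain.
margin = 0.005
toy_df["borderline"] = toy_df["cosine_score_binary"].abs() < margin

print(toy_df)
print(toy_df["chronic_assessment"].value_counts())
```

With this margin, the second-trimester pregnancy note (33) and the cataracts note (12) are flagged as borderline, while the chronic back pain note (31) and the full-term pregnancy note (50) are not.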
While the results aren’t perfect, they’re certainly pretty impressive, especially given how little code we had to write to achieve them!
Hopefully, this tutorial has given you some ideas to try in your own work. But keep in mind that this is a rapidly evolving field, and the specific pieces of code used here may or may not work in the future as things continue to shift.